
I am working on a Baby Crying Detection model using logistic regression.

Out of $581$ audio clips, $222$ are of a baby crying. Each clip is $5$ seconds long.

What I have done is convert each audio clip into numbers and write those numbers to a .csv file: first I took $100$ samples from each clip, then $1000$ samples, and finally all $110250$ samples, appending a label of 1 (crying) or 0 (not crying) to the end of each row. Then I trained the model using logistic regression on that .csv file.

The problem I am facing is that with $100$ samples per clip I get 64% accuracy, while with $1000$ samples and with all $110250$ samples (the full clip) it only reaches 66% accuracy. How can I improve the accuracy of my model to around 80% using logistic regression?

I can only use simple logistic regression because I have to deploy the model on an Arduino.


1 Answer


Try Rectification

Improve the features available to your model and remove some of the noise present in the data.

  • In audio data, a common way to do this is to rectify the signal and then smooth it, so that the total amount of sound energy over time is easier to distinguish.
  • Rectification just means taking the absolute value of each time point; the name comes from the fact that it makes every time point positive.

# Rectify the audio signal (audio is assumed to be a pandas Series of samples)
audio_rectified = audio.apply(np.abs)

  • Smooth your data by taking the rolling mean over a window of, say, 50 samples:

audio_rectified_smooth = audio_rectified.rolling(50).mean()
  • Calculating the envelope of each sound and smoothing it removes much of the noise and gives you a cleaner signal; a self-contained sketch of the whole step follows below.
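A minimal, self-contained sketch of this rectify-and-smooth step, assuming the clips are loaded with librosa and wrapped in a pandas Series (the file name and window size are placeholders, not from the question):

```python
import numpy as np
import pandas as pd
import librosa as lr

# Load one 5-second clip (placeholder file name); sfreq is its sampling rate
audio, sfreq = lr.load("baby_clip_001.wav", sr=None)

# Wrap in a pandas Series so .apply() and .rolling() are available
audio_series = pd.Series(audio)

# Rectify: take the absolute value of every sample
audio_rectified = audio_series.apply(np.abs)

# Smooth: a rolling mean over a 50-sample window gives a rough amplitude envelope
audio_rectified_smooth = audio_rectified.rolling(50).mean()
```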

Calculate Spectrogram

  • Calculate a spectrogram of the sound (i.e., a combination of windowed Fourier transforms). This describes what spectral content (e.g., low and high pitches) is present in the sound over time. There is a lot more information in a spectrogram than in a raw audio file, so by computing spectral features you have a much better idea of what's going on.

    • This is similar to how we calculate a rolling mean:

      • We calculate multiple Fourier transforms in a sliding window to see how the spectrum changes over time. For each time point, we take a window of time around it, calculate a Fourier transform for that window, then slide to the next window. The result is a description of how the Fourier transform changes throughout the time series, called the short-time Fourier transform (STFT); a bare-numpy sketch of these steps appears after this list.

        • Choose a window size and shape
        • At a time point, calculate the FFT for that window
        • Slide the window over by one
        • Aggregate the results
    • Calculating the STFT

      • We can calculate the STFT with librosa $\rightarrow$ import librosa as lr
      • There are several parameters we can tweak (such as window size)
      • For our purposes, we'll convert it into decibels, which normalizes the average values across all frequencies.
      • We can then visualize it with the specshow() function
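As a rough illustration of those four sliding-window steps (not how librosa implements them), a bare-numpy version could look like this; the window and hop sizes are arbitrary placeholders:

```python
import numpy as np

def naive_stft(signal, n_fft=128, hop_length=16):
    # Illustration only: slide a window over the signal and FFT each window
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        window = signal[start:start + n_fft]    # take a window of time around this point
        frames.append(np.fft.rfft(window))      # Fourier transform of that window
    # Stack so rows are frequency bins and columns are time steps, like librosa's stft
    return np.array(frames).T
```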

This is how you can calculate the STFT with librosa:

# Calculating the STFT
# Import the functions we'll use for the STFT
from librosa.core import stft, amplitude_to_db
from librosa.display import specshow

Calculate our STFT:

HOP_LENGTH = 2**4
SIZE_WINDOW = 2**7
audio_spec = stft(audio, hop_length=HOP_LENGTH, n_fft=SIZE_WINDOW)

Convert into decibels for visualization:

spec_db = amplitude_to_db(np.abs(audio_spec))

Visualize:

specshow(spec_db, sr=sfreq, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH)

Try Spectral feature engineering

You can also perform spectral feature engineering on your baby audio data:

  • Each time series has a different spectral pattern.
  • We can calculate these spectral patterns by analyzing the spectrogram.
  • For example, the spectral bandwidth and spectral centroid describe where most of the energy is at each moment in time.
# Use the magnitude spectrogram (not the dB version) for the spectral features
spec = np.abs(audio_spec)

# Calculate the spectral centroid and bandwidth for the spectrogram
bandwidths = lr.feature.spectral_bandwidth(S=spec)[0]
centroids = lr.feature.spectral_centroid(S=spec)[0]

Display these features on top of the spectrogram:

# Times (in seconds) of each spectrogram frame, so the features line up on the plot
times_spec = lr.times_like(centroids, sr=sfreq, hop_length=HOP_LENGTH)

ax = plt.gca()  # assumes matplotlib.pyplot has been imported as plt
specshow(spec_db, sr=sfreq, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH, ax=ax)
ax.plot(times_spec, centroids)
ax.fill_between(times_spec, centroids - bandwidths / 2, centroids + bandwidths / 2, alpha=0.5)

Now you can combine the spectral features computed above with temporal features in a classifier:

centroids_all = []
bandwidths_all = []

# spectrograms is assumed to be a list holding one dB-scaled spectrogram per audio clip
for spec in spectrograms:
    bandwidths = lr.feature.spectral_bandwidth(S=lr.db_to_amplitude(spec))
    centroids = lr.feature.spectral_centroid(S=lr.db_to_amplitude(spec))
    # Calculate the mean spectral bandwidth
    bandwidths_all.append(np.mean(bandwidths))
    # Calculate the mean spectral centroid
    centroids_all.append(np.mean(centroids))

Create our X matrix:

X = np.column_stack([means, stds, maxs, tempo_mean, tempo_max, tempo_std, bandwidths_all, centroids_all])
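The `means`, `stds`, `maxs` and `tempo_*` entries above are temporal features that are not defined in this answer; one hedged way to compute them, assuming `audio_clips` is a list of 1-D numpy arrays, one per clip, all at sampling rate `sfreq` (whether you use the raw signal or the smoothed envelope here is up to you):

```python
import numpy as np
import librosa as lr

means, stds, maxs = [], [], []
tempo_mean, tempo_max, tempo_std = [], [], []

for a in audio_clips:
    # Simple summary statistics of each clip
    means.append(np.mean(a))
    stds.append(np.std(a))
    maxs.append(np.max(a))

    # Per-frame tempo estimates for the clip, then summarized per clip
    tempos = lr.beat.tempo(y=a, sr=sfreq, aggregate=None)
    tempo_mean.append(np.mean(tempos))
    tempo_max.append(np.max(tempos))
    tempo_std.append(np.std(tempos))
```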

For your logistic regression model:

  • One way to improve accuracy is to optimize the prediction-probability cutoff (decision threshold) produced by your logit model.
  • Normalize all your features to the same scale before putting them into a machine learning model.
  • Look for class imbalance in your data.
  • You can also optimize for other metrics, such as log loss or F1 score.
  • Tune the hyperparameters of your model; in the case of LogisticRegression, the regularization parameter $C$ is a hyperparameter. A sketch combining these points follows below.
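A minimal sketch of those last points, assuming `X` and `y` are the feature matrix and labels built above (the split size, `C` grid, and threshold value are placeholders, not from the original answer):

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, log_loss

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale all features and account for the mild class imbalance (222 crying vs 359 not)
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000))

# Tune the regularization strength C, scoring on F1 rather than plain accuracy
grid = GridSearchCV(pipe, {'logisticregression__C': [0.01, 0.1, 1, 10, 100]},
                    scoring='f1', cv=5)
grid.fit(X_train, y_train)

# Optionally move the probability cutoff away from 0.5
probs = grid.predict_proba(X_test)[:, 1]
threshold = 0.4  # placeholder; choose it on a validation set, not the test set
preds = (probs >= threshold).astype(int)

print("F1:", f1_score(y_test, preds))
print("Log loss:", log_loss(y_test, probs))
```

The final model is just the scaler's means and scales plus the logistic-regression coefficients and intercept (a handful of floats), so it remains easy to port to an Arduino.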