Continue with Machine Learning - Noise Detection (Classification)
This post hasn't been updated for 6 years
Noise has pattern that we can identify. If our model is good enough to classify a specific group of noises such as gun shot, mirror broken, car horn, ..., then we can use the model in a very useful ways such as to identify crime or abnormality in a running machine by just detecting the noise.
Today we'll build a noise classifier based on the data from https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU The data is the collection of noises from 10 classes. Our goal is to learn from the training data in order to identify one among those noises.
Note that we'll be using librosa libray to audio analysing. We need to convert wave form to interval of frequency. Here we change each noise into MFCCs (Mel Frequency Cepstral Coefficient), which is an array of some numbers.
All we'll be using Deep Learning algorithm to train our data with [keras](https://keras.io/)
and [tensorflow](https://github.com/tensorflow/tensorflow)
libraries.
We'll skip the installation process by going directly into the implementation.
Understand Data
The attributes of data are as follows: ID – Unique ID of sound excerpt Class – type of sound
Each noise is a wave file with .wav extension. Try to listen to some noises.
import IPython.display as ipd
ipd.Audio('data/Train/2022.wav')
Output
Now let's plot some waves to see their patterns.
% pylab inline
import os
import pandas as pd
import librosa
import glob
import librosa.display
data, sampling_rate = librosa.load('data/Train/1045.wav')
plt.figure(figsize=(12, 4))
librosa.display.waveplot(data, sr=sampling_rate)
Output
Let's check some wave plots and observe its patterns. Pay attention to how the patterns of wave of the same class looks similar to each other.
import time
train = pd.read_csv('data/train.csv')
data_dir = os.getcwd() + '/data'
def load_wave():
i = random.choice(train.index)
audio_name = train.ID[i]
path = os.path.join(data_dir, 'Train', str(audio_name) + '.wav')
print('Class: ', train.Class[i])
x, sr = librosa.load(path)
plt.figure(figsize=(12, 4))
librosa.display.waveplot(x, sr=sr)
plt.show()
for i in range(10):
load_wave()
Output
Class: jackhammer
Class: street_music
Class: gun_shot
Class: street_music
Class: siren
Class: engine idling
Class: children_playing
Class: gun_shot
Class: children_playing
Class: children_playing
Let's check number of values for each class:
train.Class.value_counts()
Output:
jackhammer 668
engine_idling 624
siren 607
children_playing 600
street_music 600
drilling 600
air_conditioner 600
dog_bark 600
car_horn 306
gun_shot 230
Name: Class, dtype: int64
Let's create function to convert wave to mfcc.
def parser(row):
# function to load files and extract features
file_name = os.path.join(os.path.abspath(data_dir), 'Train', str(row.ID) + '.wav')
# handle exception to check if there isn't a file which is corrupted
try:
# here kaiser_fast is a technique used for faster extraction
X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
# we extract mfcc feature from data
mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T,axis=0)
except Exception as e:
print("Error encountered while parsing file: ", file)
return None, None
feature = mfccs
label = row.Class
return [feature, label]
temp = train.apply(parser, axis=1)
temp.columns = ['feature', 'label']
temp.head(2)
Output
feature label
0 [-82.12358939071989, 139.5059159813099, -42.43... siren
1 [-15.744005405358056, 124.1199599305049, -29.4... street_music
2 [-123.39365145003913, 15.181946313102896, -50.... drilling
3 [-213.27878814908152, 89.32358896182456, -55.2... siren
4 [-237.92647882472895, 135.90246127730546, 39.2... dog_bark
[array([-82.12358939, 139.50591598, -42.43086489, 24.82786139,
-11.62076447, 23.49708426, -12.19458986, 25.89713885,
-9.40527728, 21.21042898, -7.36882138, 14.25433903,
-8.67870015, 7.75023765, -10.1241154 , 3.2581183 ,
-11.35261914, 2.80096779, -7.04601346, 3.91331351,
-2.3349743 , 2.01242254, -2.79394367, 4.12927394,
-1.62076864, 4.32620082, -1.03440959, -1.23297714,
-3.11085341, 0.32044827, -1.787786 , 0.44295495,
-1.79164752, -0.76361758, -1.24246428, -0.27664012,
0.65718559, -0.50237115, -2.60428533, -1.05346291]), array([-15.74400541, 124.11995993, -29.42888126, 39.44719325,
-23.50191209, 16.55081468, -21.73682007, 16.533573 ,
-16.97172924, 4.48358393, -17.38768904, 0.73712233,
-16.28922845, 5.11214906, -10.55923116, 2.91787297,
-10.39084829, 0.6512996 , -10.04633806, -1.78348022,
-6.09971424, 5.62978658, -4.65111382, -1.3691931 ,
-8.24916556, -2.36192798, -4.79620618, -0.50256975,
-5.41067503, 2.07804459, 7.18600337, 8.1857473 ,
0.76736086, 0.32726166, -2.21366512, -3.1068377 ,
-5.72384895, 0.82370563, 1.7193221 , -0.33146235]), array([-123.39365145, 15.18194631, -50.09332904, 7.14187248,
-26.81703338, -0.69250356, -8.22307572, 13.51293887,
-11.38205589, 19.94935211, -11.19345959, 9.59290493,
-8.26916969, 4.59170834, -4.1160931 , -0.12661012,
-9.26636096, 12.86464874, -6.76813103, 0.17970622,
-5.58614496, 6.82406367, -7.44342262, 6.7138549 ,
0.88696144, 7.95247415, -7.80404736, 4.75135726,
-5.91704383, -0.51082848, -2.89312164, 3.75250478,
-4.3756492 , 5.6246255 , -4.87082627, 1.88768287,
-3.88603327, 1.57439023, -3.9967419 , 3.24574944]), array([-2.13278788e+02, 8.93235890e+01, -5.52561899e+01, 1.26320999e+01,
-4.77753793e+01, 1.47029095e+01, 1.90393420e+01, 1.59744018e+01,
-3.44622589e-01, -3.85278488e+00, -5.71352012e+00, 1.45023797e+01,
7.35625117e+00, 2.77150926e+00, -1.21664120e+01, -7.54264413e+00,
-1.07718201e+00, -8.32686797e+00, 1.13106934e+01, 1.30546411e+01,
-1.24408594e+01, -1.77511089e+01, 5.21535203e-02, 2.10836481e+00,
5.23872284e+00, 9.59365485e+00, -3.76473392e+00, -2.26955796e+00,
-6.74119185e+00, -8.63759137e+00, 1.20566440e+01, -4.87823860e+00,
-5.16628578e+00, 1.11927687e+00, -2.28136843e+00, 5.82950793e+00,
1.19403556e+00, 7.46546796e+00, -1.78587829e+00, -1.50114553e+01]), array([-2.37926479e+02, 1.35902461e+02, 3.92684403e+01, 2.12402387e+01,
9.53132848e+00, 1.38851206e+01, -3.99444661e+00, 1.24814870e+01,
-2.60462664e+00, 6.07091558e+00, 2.23836723e+00, 4.17497228e+00,
-1.90301314e+00, 2.30779460e+00, -2.66080009e+00, -6.64915491e-01,
4.49824368e+00, 3.77204298e+00, 3.37126391e+00, 1.59958680e+00,
-5.34918903e-01, -1.95140379e-01, 6.27361166e-01, 3.20973300e+00,
1.33894133e+00, 1.04329816e-01, 9.07274421e-01, -2.27093950e+00,
-5.29897600e-01, 2.00067343e-01, 5.41832293e-01, -3.01083238e-01,
5.31196471e-01, -5.16148851e-01, 1.73844662e+00, 1.10963680e+00,
2.75794074e+00, 2.15940254e+00, 6.26566792e-01, 6.92017477e-01])]
Let's convert label to dummy variable
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
X = np.array(temp.feature.tolist())
y = np.array(temp.label.tolist())
lb = LabelEncoder()
print(lb.fit_transform(y)[:5])
y = np_utils.to_categorical(lb.fit_transform(y))
print(y.shape)
print(y[:10])
Output
[8 9 4 8 3]
(5435, 10)
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Now let's build the model:
model = Sequential()
model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(X, y, batch_size=32, epochs=200, validation_split=0.1, shuffle=True)
Here is some last few output:
Epoch 196/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3572 - acc: 0.8779 - val_loss: 0.3988 - val_acc: 0.8989
Epoch 197/200
4891/4891 [==============================] - 0s 94us/step - loss: 0.3450 - acc: 0.8867 - val_loss: 0.3363 - val_acc: 0.9044
Epoch 198/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3603 - acc: 0.8796 - val_loss: 0.3784 - val_acc: 0.8934
Epoch 199/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3349 - acc: 0.8902 - val_loss: 0.3603 - val_acc: 0.8952
Epoch 200/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3700 - acc: 0.8814 - val_loss: 0.3686 - val_acc: 0.8952
<keras.callbacks.History at 0x7f6fb6eb6b00>
We'll see that accuracy rate is 0.8814, and validation accuracy 0.8952 the loss is around 0.37.
Pretty good result. But I think we can tune to make better result!
Conclusion
We cannot work on wave sound directly so we need to convert it to digital format first. The we use deep learning to learn from those pattern to predict the other noises.
All Rights Reserved