import os
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
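# Floor (in dB) used to scale log-mel values into [0, 1].
min_level_db = -100


def normalize_mel(S):
    # Linearly map [min_level_db, 0] dB onto [0, 1], clipping anything outside.
    return np.clip((S - min_level_db) / -min_level_db, 0, 1)

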
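def feature_extraction(path):
    # Load the audio, resampled to 16 kHz.
    y, _ = librosa.load(path, sr=16000)
    # 80-band mel power spectrogram. At 16 kHz, win_length=400 is a 25 ms window
    # and hop_length=160 is a 10 ms hop. sr is passed explicitly so the mel
    # filterbank matches the 16 kHz signal instead of librosa's 22050 Hz default.
    S = librosa.feature.melspectrogram(y=y, sr=16000, n_mels=80, n_fft=512, win_length=400, hop_length=160)  # 320/80
    # Convert power to dB relative to the peak, then scale to [0, 1].
    norm_log_S = normalize_mel(librosa.power_to_db(S, ref=np.max))
    return norm_log_S

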
a = feature_extraction('sample1.wav')
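# Pass sr and hop_length so the time and mel axes are labeled for the 16 kHz, 10 ms-hop features.
librosa.display.specshow(a, sr=16000, hop_length=160, y_axis='mel', x_axis='time')
# The plotted values are normalized to [0, 1], so they are no longer in dB.
plt.colorbar(label='Normalized log-mel energy')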
plt.title('Mel-Spectrogram')
plt.tight_layout()
plt.savefig('Mel-Spectrogram example.png')
plt.show()
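
The normalization step in the listing above is easy to check by hand. The following is a minimal sketch that reuses the same `normalize_mel` definition and `min_level_db = -100` floor; the input array `db_values` is hypothetical and chosen only for illustration. It needs NumPy only, so it runs without an audio file.

```python
import numpy as np

min_level_db = -100  # same floor as in the listing above


def normalize_mel(S):
    """Linearly map dB values from [min_level_db, 0] onto [0, 1], with clipping."""
    return np.clip((S - min_level_db) / -min_level_db, 0, 1)


# Hypothetical dB values, chosen only for illustration.
db_values = np.array([-120.0, -100.0, -80.0, -40.0, 0.0])
normalized = normalize_mel(db_values)
print(normalized)
```

Values below the floor are clipped to 0, the floor itself maps to 0, and 0 dB maps to 1. Because `librosa.power_to_db` with `ref=np.max` and its default `top_db=80` returns values between -80 and 0 dB, the features produced by `feature_extraction` should fall between 0.2 and 1.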