Commonly Used Spectral Features in Audio



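This article lists the formula definitions of a number of spectral features in detail, so that when working on a real task you no longer have to fret over a meagre mental stock of features!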

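Spectral features are widely used in machine learning and deep learning, with extensive practical application in tasks such as instrument classification, timbre analysis, endpoint detection, emotion recognition, and voice activity detection.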

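These spectral features are the result of abstracting, summarizing, and quantifying frequency-domain data along various dimensions. They provide fuel for thought in later development work: having them in mind matters, while whether and how to burn that fuel is a separate question, but the fuel has to be stocked first. Fortunately, the audioFlux project supports almost all of the spectral features listed below, and interested readers can use it to run their own tests and deepen their understanding.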

Spectral features

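$b_1, b_2$ are the bin boundaries of the frequency band, $f_k$ is the frequency of bin $k$ in Hz, and $s_k$ is the spectral value, which can be taken from either the magnitude spectrum or the power spectrum.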

1. Spectral Centroid

$$\mu_1=\frac{\sum_{k=b_1}^{b_2} f_k s_k}{\sum_{k=b_1}^{b_2} s_k}$$

2. Spectral Spread

$$\mu_2=\sqrt{\frac{\sum_{k=b_1}^{b_2} (f_k-\mu_1)^2 s_k}{\sum_{k=b_1}^{b_2} s_k}}$$

3. Spectral Skewness

$$\mu_3=\frac{\sum_{k=b_1}^{b_2} (f_k-\mu_1)^3 s_k}{(\mu_2)^3 \sum_{k=b_1}^{b_2} s_k}$$

4. Spectral Kurtosis

$$\mu_4=\frac{\sum_{k=b_1}^{b_2} (f_k-\mu_1)^4 s_k}{(\mu_2)^4 \sum_{k=b_1}^{b_2} s_k}$$
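As a rough illustration of the four moment features above, the following NumPy sketch computes them for a single frame. The function name `spectral_moments` and the toy input are made up for this example and are not part of audioFlux; `f` and `s` are assumed to be already restricted to the band $[b_1, b_2]$.

```python
import numpy as np

def spectral_moments(f, s):
    """Return (centroid, spread, skewness, kurtosis) for one frame.

    f: bin frequencies in Hz, s: non-negative spectral values,
    both already cut down to the band [b_1, b_2] (assumption).
    """
    f = np.asarray(f, dtype=float)
    s = np.asarray(s, dtype=float)
    total = s.sum()
    mu1 = (f * s).sum() / total                   # centroid
    d = f - mu1                                   # deviation from the centroid
    mu2 = np.sqrt((d**2 * s).sum() / total)       # spread
    mu3 = (d**3 * s).sum() / (mu2**3 * total)     # skewness
    mu4 = (d**4 * s).sum() / (mu2**4 * total)     # kurtosis
    return mu1, mu2, mu3, mu4

# Toy symmetric spectrum: the skewness should vanish.
f = np.array([100.0, 200.0, 300.0])
s = np.array([1.0, 2.0, 1.0])
mu1, mu2, mu3, mu4 = spectral_moments(f, s)
```

For a spectrum that is symmetric around its centroid, as in the toy input, the skewness comes out as zero, which is a quick sanity check on the sign conventions.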

5. Spectral Entropy

$$p_k=\frac{s_k}{\sum_{k=b_1}^{b_2}s_k}$$

$$entropy1=\frac{-\sum_{k=b_1}^{b_2} p_k \log(p_k)}{\log(b_2-b_1)}$$

$$entropy2=-\sum_{k=b_1}^{b_2} p_k \log(p_k)$$
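A minimal NumPy sketch of the two entropy variants follows. Two things here are assumptions rather than part of the definitions above: the normaliser $\log(b_2-b_1)$ is read as the log of the number of bins in the band, and a small `eps` is added inside the logarithm so that empty bins do not produce `nan`. The function name is illustrative.

```python
import numpy as np

def spectral_entropy(s, normalize=True, eps=1e-12):
    """Entropy of one frame; s holds the spectral values of the band."""
    s = np.asarray(s, dtype=float)
    p = s / s.sum()                      # p_k: spectrum as a distribution
    h = -np.sum(p * np.log(p + eps))     # entropy2 (unnormalised)
    if normalize:
        # entropy1: divide by the log of the bin count (assumed reading
        # of log(b_2 - b_1) in the text)
        h /= np.log(len(s))
    return h

flat = spectral_entropy([1.0, 1.0, 1.0, 1.0])   # noise-like frame
peak = spectral_entropy([1.0, 0.0, 0.0, 0.0])   # single-tone frame
```

Under this reading a flat spectrum gives a normalised entropy close to 1 and a single-peak spectrum gives a value close to 0.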

6. Spectral Flatness

$$flatness=\frac{\left( \prod_{k=b_1}^{b_2} s_k \right)^{\frac{1}{b_2-b_1}}}{\frac{1}{b_2-b_1} \sum_{k=b_1}^{b_2} s_k}$$

7. Spectral Crest

$$crest=\frac{\max(s_{k \in [b_1,b_2]})}{\frac{1}{b_2-b_1} \sum_{k=b_1}^{b_2} s_k}$$
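Flatness and crest both compare a statistic of the band to its arithmetic mean, so they can be sketched together. In this example the count $b_2-b_1$ is taken to be the number of bins in the band, which is an assumption about how the formulas are meant to be read; the geometric mean is computed in the log domain with a small `eps` to avoid overflow and `log(0)`. The function name is illustrative.

```python
import numpy as np

def flatness_and_crest(s, eps=1e-12):
    """Return (flatness, crest) for one frame of spectral values."""
    s = np.asarray(s, dtype=float)
    arith_mean = s.mean()
    # Geometric mean via exp(mean(log)), numerically safer than a raw product.
    geo_mean = np.exp(np.mean(np.log(s + eps)))
    flatness = geo_mean / arith_mean
    crest = s.max() / arith_mean
    return flatness, crest

flat_f, flat_c = flatness_and_crest([2.0, 2.0, 2.0, 2.0])   # noise-like
peak_f, peak_c = flatness_and_crest([4.0, 0.0, 0.0, 0.0])   # tone-like
```

A noise-like frame gives flatness and crest both near 1, while a tone-like frame drives flatness towards 0 and crest up.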

8. Spectral Flux

$$flux(t)=\left( \sum_{k=b_1}^{b_2} |s_k(t)-s_k(t-1)|^{p} \right)^{\frac{1}{p}}$$

In general, only bins with $s_k(t) \geq s_k(t-1)$ take part in the calculation.
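The flux definition, including the optional restriction to rising bins, can be sketched as below. The input layout (one row per frame) and the `rectify` switch are choices made for this example only.

```python
import numpy as np

def spectral_flux(S, p=2, rectify=True):
    """S: array of shape (frames, bins). Returns flux for frames 1..T-1."""
    S = np.asarray(S, dtype=float)
    diff = np.diff(S, axis=0)               # s_k(t) - s_k(t-1)
    if rectify:
        # Keep only bins where s_k(t) >= s_k(t-1); falling bins contribute 0.
        diff = np.maximum(diff, 0.0)
    return np.sum(np.abs(diff) ** p, axis=1) ** (1.0 / p)

S = np.array([[1.0, 1.0],
              [4.0, 5.0],
              [0.0, 0.0]])
rising_only = spectral_flux(S, p=2, rectify=True)
both_ways = spectral_flux(S, p=2, rectify=False)
```

With rectification, the second transition (where every bin falls) yields zero flux, which is why this variant is popular for onset detection.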

9. Spectral Slope

$$slope=\frac{\sum_{k=b_1}^{b_2}(f_k-\mu_f)(s_k-\mu_s)}{\sum_{k=b_1}^{b_2}(f_k-\mu_f)^2}$$

$\mu_f$ is the mean frequency and $\mu_s$ is the mean spectral value.
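The slope formula is the ordinary least-squares slope of $s_k$ regressed on $f_k$. A small sketch, with an illustrative function name:

```python
import numpy as np

def spectral_slope(f, s):
    """Least-squares slope of spectral value against frequency."""
    f = np.asarray(f, dtype=float)
    s = np.asarray(s, dtype=float)
    df = f - f.mean()          # f_k - mu_f
    ds = s - s.mean()          # s_k - mu_s
    return (df * ds).sum() / (df**2).sum()

f = np.array([0.0, 1.0, 2.0, 3.0])
s = np.array([10.0, 8.0, 6.0, 4.0])   # exactly linear, falling by 2 per unit
slope = spectral_slope(f, s)
```

Because this is the same quantity a degree-1 polynomial fit returns, `np.polyfit(f, s, 1)[0]` can serve as an independent check.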

10. Spectral Decrease

$$decrease=\frac{\sum_{k=b_1+1}^{b_2} \frac{s_k-s_{b_1}}{k-1}}{\sum_{k=b_1+1}^{b_2} s_k}$$
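A sketch of the decrease formula follows. It assumes the band starts at bin index 1 in 1-based numbering, so that the divisor $k-1$ equals the offset from the first bin of the band; if the band starts elsewhere, the divisor would need to be adjusted to match the intended convention.

```python
import numpy as np

def spectral_decrease(s):
    """s: spectral values of the band, s[0] being the first bin s_{b_1}."""
    s = np.asarray(s, dtype=float)
    # Offsets 1, 2, ... from the first bin (k - 1 when b_1 = 1, an assumption).
    offsets = np.arange(1, len(s))
    return np.sum((s[1:] - s[0]) / offsets) / np.sum(s[1:])

dec = spectral_decrease([4.0, 2.0, 1.0])
```

For this falling toy spectrum the value is negative, reflecting that the higher bins sit below the first bin.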

11. Spectral Rolloff

$$\sum_{k=b_1}^{i}|s_k| \geq \eta \sum_{k=b_1}^{b_2}s_k$$

$\eta \in (0,1)$, typically 0.95 or 0.85; the smallest $i$ that satisfies the condition gives the roll-off frequency $f_i$.
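The roll-off condition amounts to a search on the cumulative sum of the spectrum. A minimal sketch, taking the first bin that meets the threshold (the function name is illustrative):

```python
import numpy as np

def spectral_rolloff(f, s, eta=0.85):
    """Return the frequency f_i at which the cumulative spectrum reaches eta."""
    f = np.asarray(f, dtype=float)
    s = np.asarray(s, dtype=float)
    cum = np.cumsum(np.abs(s))
    # First index i with cum[i] >= eta * total (cum is non-decreasing).
    i = np.searchsorted(cum, eta * cum[-1], side="left")
    return f[i]

f = np.array([100.0, 200.0, 300.0, 400.0])
s = np.ones(4)
half = spectral_rolloff(f, s, eta=0.5)
high = spectral_rolloff(f, s, eta=0.85)
```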

12. Spectral Bandwidth

$$centroid=\frac{\sum_{k=b_1}^{b_2} f_k s_k}{\sum_{k=b_1}^{b_2} s_k}$$

$$bandwidth=\left(\sum_{k=b_1}^{b_2} s_k(f_k-centroid)^p \right)^{\frac{1}{p}}$$
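The bandwidth formula can be sketched as follows. One deliberate deviation is labelled in the code: the absolute deviation $|f_k - centroid|$ is used so that odd values of $p$ do not produce a negative sum; for even $p$ this is identical to the formula as written. The spectrum is not normalised here, matching the formula above.

```python
import numpy as np

def spectral_bandwidth(f, s, p=2):
    """p-th order spread of the spectrum around its centroid."""
    f = np.asarray(f, dtype=float)
    s = np.asarray(s, dtype=float)
    centroid = (f * s).sum() / s.sum()
    # abs() is an assumption added for odd p; it is a no-op for even p.
    return np.sum(s * np.abs(f - centroid) ** p) ** (1.0 / p)

bw = spectral_bandwidth([100.0, 200.0, 300.0], [1.0, 2.0, 1.0], p=2)
```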

13. Spectral Energy related

$$energy=\sum_{n=1}^N x^2[n]=\frac{1}{N}\sum_{m=1}^N |X[m]|^2$$

$$rms=\sqrt{\frac{1}{N} \sum_{n=1}^N x^2[n]}=\sqrt{\frac{1}{N^2}\sum_{m=1}^N |X[m]|^2}$$

$$le=\log_{10}(1+\gamma \times energy)$$

with $\gamma \in (0,\infty)$; this is a log compression of the data.

$$p_k=\frac{s_k}{\sum_{k=b_1}^{b_2}s_k}$$

$$entropy2=-\sum_{k=b_1}^{b_2} p_k \log(p_k)$$

$$eef=\sqrt{1+|energy \times entropy2|}$$

$$eer=\sqrt{1+\left| \cfrac{le}{entropy2}\right|}$$
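The energy identities above are Parseval's relation for the unnormalised DFT, which is easy to check numerically. The sketch below uses `numpy.fft.fft` on a random test signal; the choice of $\gamma = 10$, the use of the one-sided magnitude spectrum for $s_k$, and the `eps` inside the logarithm are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)          # arbitrary test frame
N = len(x)
X = np.fft.fft(x)                      # unnormalised DFT

energy_time = np.sum(x**2)
energy_freq = np.sum(np.abs(X)**2) / N            # Parseval: equals energy_time
rms_time = np.sqrt(energy_time / N)
rms_freq = np.sqrt(np.sum(np.abs(X)**2) / N**2)   # same value from the spectrum

gamma = 10.0                                       # assumed compression factor
le = np.log10(1.0 + gamma * energy_time)

s = np.abs(X[: N // 2 + 1])                        # one-sided magnitude spectrum
p = s / s.sum()
entropy2 = -np.sum(p * np.log(p + 1e-12))

eef = np.sqrt(1.0 + abs(energy_time * entropy2))
eer = np.sqrt(1.0 + abs(le / entropy2))
```

Both `eef` and `eer` are at least 1 by construction, since the term under the square root is 1 plus a non-negative quantity.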

14. Spectral Novelty related

$$hfc(t)=\frac{\sum_{k=b_1}^{b_2} s_k(t)\,k}{b_2-b_1+1}$$

$$flux(t)=\left( \sum_{k=b_1}^{b_2} |s_k(t)-s_k(t-1)|^{p} \right)^{\frac{1}{p}}$$

$sd(t)=flux(t)$, computed over bins with $s_k(t) \ge s_k(t-1)$, with $p=2$ and without the final $1/p$ power.

$sf(t)=flux(t)$, computed over bins with $s_k(t) \ge s_k(t-1)$, with $p=1$.

$$mkl(t)=\sum_{k=b_1}^{b_2} \log\left(1+\cfrac{s_k(t)}{s_k(t-1)} \right)$$
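The magnitude-based novelty curves above (hfc, sd, sf, mkl) can be sketched in a few lines of NumPy. The assumptions here are that the band starts at bin index 0, so that the weight $k$ is simply the column index, and that a small `eps` guards the division in mkl. The function name is illustrative.

```python
import numpy as np

def magnitude_novelty(S, eps=1e-12):
    """S: array (frames, bins). Returns hfc per frame and sd, sf, mkl per transition."""
    S = np.asarray(S, dtype=float)
    n_bins = S.shape[1]
    k = np.arange(n_bins)                   # bin index, band assumed to start at 0
    hfc = (S * k).sum(axis=1) / n_bins      # high-frequency content
    cur, prev = S[1:], S[:-1]
    rise = np.maximum(cur - prev, 0.0)      # only bins with s_k(t) >= s_k(t-1)
    sd = (rise**2).sum(axis=1)              # p = 2, no final root
    sf = rise.sum(axis=1)                   # p = 1
    mkl = np.log(1.0 + cur / (prev + eps)).sum(axis=1)
    return hfc, sd, sf, mkl

S = np.array([[1.0, 1.0],
              [4.0, 5.0]])
hfc, sd, sf, mkl = magnitude_novelty(S)
```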

Let $\psi_k(t)$ be the phase of bin $k$ at time $t$.

$$\psi_k^{\prime}(t)=\psi_k(t)-\psi_k(t-1)$$

$$\psi_k^{\prime\prime}(t)=\psi_k^{\prime}(t)-\psi_k^{\prime}(t-1)=\psi_k(t)-2\psi_k(t-1)+\psi_k(t-2)$$

$$pd(t)=\frac{\sum_{k=b_1}^{b_2} \| \psi_k^{\prime\prime}(t) \|}{b_2-b_1+1}$$

$$wpd(t)=\frac{\sum_{k=b_1}^{b_2} \| \psi_k^{\prime\prime}(t) \| s_k(t)}{b_2-b_1+1}$$

$$nwpd(t)=\frac{wpd(t)}{\mu_s}$$

where $\mu_s$ is the mean of $s_k(t)$.

$$\alpha_k(t)=s_k(t)\, e^{j(2\psi_k(t)-\psi_k(t-1))}$$

$$\beta_k(t)=s_k(t)\, e^{j\psi_k(t)}$$

$$cd(t)=\sum_{k=b_1}^{b_2} \| \beta_k(t)-\alpha_k(t-1) \|$$

$rcd(t)=cd(t)$, where only bins with $s_k(t) \geq s_k(t-1)$ take part in the sum.
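The phase-based curves can be sketched from a complex STFT matrix. Two details in this example are assumptions that the formulas above do not state: the second phase difference is wrapped back into $(-\pi, \pi]$ before taking its magnitude, and a small `eps` guards the division in nwpd. The function name and the input layout (one row per frame) are illustrative.

```python
import numpy as np

def phase_novelty(X, eps=1e-12):
    """X: complex STFT, shape (frames, bins). Returns curves for frames t >= 2."""
    X = np.asarray(X)
    s = np.abs(X)
    psi = np.angle(X)
    # psi''_k(t) = psi_k(t) - 2 psi_k(t-1) + psi_k(t-2)
    d2 = psi[2:] - 2.0 * psi[1:-1] + psi[:-2]
    d2 = np.angle(np.exp(1j * d2))          # wrap to (-pi, pi] (assumption)
    n_bins = X.shape[1]
    pd = np.abs(d2).sum(axis=1) / n_bins
    wpd = (np.abs(d2) * s[2:]).sum(axis=1) / n_bins
    nwpd = wpd / (s[2:].mean(axis=1) + eps)
    # alpha_k(t-1): previous magnitude with linearly extrapolated phase
    alpha_prev = s[1:-1] * np.exp(1j * (2.0 * psi[1:-1] - psi[:-2]))
    dist = np.abs(X[2:] - alpha_prev)       # |beta_k(t) - alpha_k(t-1)|
    cd = dist.sum(axis=1)
    rcd = np.where(s[2:] >= s[1:-1], dist, 0.0).sum(axis=1)
    return pd, wpd, nwpd, cd, rcd

# Stationary sinusoids: constant magnitude, linearly advancing phase.
t = np.arange(5)[:, None]
omega = np.array([0.1, 0.5, 1.0])
amp = np.array([1.0, 2.0, 3.0])
X = amp * np.exp(1j * omega * t)
pd, wpd, nwpd, cd, rcd = phase_novelty(X)
```

For a perfectly stationary signal such as the toy input, the phase advances linearly and the magnitude is constant, so all five curves stay at (numerically) zero; any onset breaks that prediction and shows up as a peak.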

15. Novelty Method related

$$sub_k(t)=s_k(t)-s_k(t-1)$$

$$entropy_k(t)=\log \left( \frac{s_k(t)}{s_k(t-1)} \right)$$

$$kl_k(t)=s_k(t) \log \left( \frac{s_k(t)}{s_k(t-1)} \right)$$

$$is_k(t)=\frac{s_k(t)}{s_k(t-1)}-\log \left( \frac{s_k(t)}{s_k(t-1)} \right)-1$$

$$f_k=sub_k, entropy_k, \cdots, is_k \quad g_k=\log(1+\gamma f_k)$$

Here $f_k$ is any one of the per-bin functions above, and $g_k$ is defined for $f_k(t) \ge 0$, $\gamma > 0$.

$$v_k=f_k, g_k$$

That is, $v_k$ is either $f_k$ or $g_k$.

$$\mathcal{V}(t)=\sum_{k=b_1}^{b_2}v_k(t)$$

summed over the bins with $v_k(t) \ge \alpha$, typically $\alpha \ge 0$.

$$\mathcal{V}(t)=i[v_{k \in [b_1,b_2]}(t)]$$

the number of bins with $v_k(t) \ge \alpha$, typically $\alpha \ge 0$.

$broadband$ uses $i[entropy_k]$.
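The novelty recipe in this section (pick a per-bin function, optionally log-compress it, then sum or count the bins above a threshold) can be sketched as one small function. All parameter names (`kind`, `gamma`, `alpha`, `count`) are invented for this example, and `eps` is an added guard against division by zero; none of this is audioFlux's API.

```python
import numpy as np

def novelty(S, kind="sub", gamma=None, alpha=0.0, count=False, eps=1e-12):
    """S: array (frames, bins). Returns one novelty value per frame transition."""
    S = np.asarray(S, dtype=float)
    cur, prev = S[1:], S[:-1]
    ratio = (cur + eps) / (prev + eps)
    if kind == "sub":
        f = cur - prev
    elif kind == "entropy":
        f = np.log(ratio)
    elif kind == "kl":
        f = cur * np.log(ratio)
    elif kind == "is":
        f = ratio - np.log(ratio) - 1.0
    else:
        raise ValueError(f"unknown kind: {kind}")
    valid = np.ones_like(f, dtype=bool)
    v = f
    if gamma is not None:
        # g_k = log(1 + gamma * f_k) is only defined for f_k >= 0.
        valid = f >= 0
        v = np.log1p(gamma * np.where(valid, f, 0.0))
    keep = valid & (v >= alpha)
    if count:
        return keep.sum(axis=1)               # i[...]: number of bins kept
    return np.where(keep, v, 0.0).sum(axis=1)

S = np.array([[1.0, 4.0],
              [2.0, 2.0]])
sub_sum = novelty(S, kind="sub")                       # only the rising bin counts
broadband = novelty(S, kind="entropy", count=True)     # count of rising bins
is_sum = novelty(S, kind="is")
```

In the toy input one bin doubles and the other halves, so the rectified difference and the broadband count each see a single rising bin, while the Itakura-Saito style term is non-negative for both bins.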

Closing remarks

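The spectral features above are only some of the commonly used features of frequency-domain data. More advanced perceptual timbre features such as roughness, hardness, brightness, and all sorts of other ***ness features can be built on top of them.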

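Sections 14 and 15 cover a rich variety of novelty-related methods along different dimensions. They are dense with material, and each one could be the subject of a paper on its own. Testing them in detail with audioFlux is recommended and should be rewarding.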

Below is a plot of some of the features, tested with audioFlux.

[Figure: bi8.png]