Environmental sound classification with dilated convolutions



Authors: Yan Chen, Qian Guo, Xinyan Liang, Jiang Wang, Yuhua Qian
Abstract: In the area of sound information retrieval (SIR), environmental sound classification (ESC) has emerged as a new problem; it aims to classify environments by analysing complex features extracted from varied sound data. As one of the most efficient feature extraction methods, the convolutional neural network (CNN) has succeeded in speech and music signal processing, and CNNs with pooling in particular have worked effectively in classifying environmental and urban sound sources. However, pooling causes information loss. In this paper, a dilated CNN is introduced to the ESC problem and achieves better results than a CNN with max-pooling and other state-of-the-art approaches. We also explore how the dilation rate and the number of dilated convolution layers affect the experimental results, and find that expanding the number of covered frames or enlarging the dilation rate reduces accuracy. A possible explanation is that sound signals have only short-term stability, so the number of frames covered strongly affects feature extraction, and that the dilated convolution model has an inherent “gridding” defect.

Keywords: Sound information retrieval, Environmental sound classification, Dilated convolutions
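
The abstract's two observations about dilation, namely that a larger dilation rate widens the span of covered frames and that dilated models suffer from "gridding", can be illustrated with a small calculation. The following is only a sketch under stated assumptions: it considers one-dimensional, stride-1 convolutions along the frame axis, and the function names, kernel size, and dilation rates are hypothetical choices for illustration, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames spanned by stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def touched_offsets(kernel_size, dilations):
    """Input offsets that actually contribute to a single output position."""
    offsets = {0}
    for d in dilations:
        # Each layer reads kernel_size taps spaced d frames apart.
        offsets = {o + t * d for o in offsets for t in range(kernel_size)}
    return offsets


if __name__ == "__main__":
    # Hypothetical settings: three layers, kernel size 3.
    for dilations in ([1, 1, 1], [2, 2, 2], [1, 2, 4]):
        span = receptive_field(3, dilations)
        used = len(touched_offsets(3, dilations))
        print(dilations, "span:", span, "frames used:", used)
```

With a constant dilation rate of 2 in every layer, the stack spans 13 frames but reads only every second one (7 of 13), which is the gridding effect; mixing rates such as 1, 2, 4 spans 15 frames with no gaps.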


Thu Mar 14 10:37:00 CST 2019