Abstract:
When recording speech signals with a microphone array, speech enhancement techniques aim to suppress interference, such as noise, reverberation and interfering speakers, from the recordings. The key to speech enhancement is to leverage spectral and/or spatial information to discriminate between speech and interference. Spectral information refers to the signal's spectral pattern, which has been extensively exploited in the era of deep-learning-based speech enhancement. Spatial information refers to information about signal propagation and the sound field. Current research in the field mainly focuses on integrating spectral deep learning techniques with conventional linear spatial filters, e.g. beamforming. This talk presents our recent work on using neural networks to directly learn spatial information and perform end-to-end speech enhancement. Based on the narrow-band and cross-band formulation of spatial information, we have designed a series of narrow-band and cross-band networks. These networks can, to some extent, be interpreted as non-linear spatial filters, and they have shown clear performance superiority over linear filters.
Biography:
Dr. Xiaofei Li joined Westlake University in Mar. 2020 as an assistant professor. Before that, he worked in the PERCEPTION team at INRIA Grenoble Rhône-Alpes, France, as a post-doctoral researcher from Feb. 2014 to Jan. 2016, and as a starting research scientist from Feb. 2016 to Dec. 2019. He received his Ph.D. degree in Electronics from Peking University in 2013. He works on acoustic, audio and speech signal processing. His specific research topics include speech enhancement, sound source localization, and semi-supervised and self-supervised learning for audio and speech, among others.
