Date: Sep 10, 2021

Time: 10:00 - 11:00

Location: Zoom 966 6459 3908

DSRC Seminar | New Challenges and Recent Progress on Speech Processing in A Cocktail Party

Although intelligent speech processing has advanced greatly in research and is widely used in many real-life applications, a large performance gap remains between controlled environments and real-life scenarios. One of the core problems in real-world conditions is known as the cocktail party problem: a complicated scenario in which multiple talkers speak simultaneously in the presence of background noise and reverberation. Humans can easily attend to a target source of interest and recognize the speech in such conditions, but the mechanism behind this strong capability has not been well studied. Over the past few decades, researchers have tried to develop algorithms that let machines mimic humans' capability in the cocktail party scenario, but performance is still far from satisfactory. In this talk, we will summarize recent progress and present our efforts on speech processing for the cocktail party problem, especially the new techniques for speech separation and automatic speech recognition (ASR) developed at SJTU. Finally, we will discuss the remaining challenges and potential directions for solving the cocktail party problem.

Dr. Yanmin Qian is an Associate Professor at Shanghai Jiao Tong University (SJTU), China. He received his PhD from the Department of Electronic Engineering at Tsinghua University, China, in 2012. In 2013, he joined the Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2015 to 2016, he also worked as a Research Associate with the Speech Group in the Cambridge University Engineering Department, UK. He was one of the key members in designing and implementing the Cambridge Multi-Genre Broadcast speech processing system, which won all four tasks of the first MGB Challenge in 2015. He is a Senior Member of the IEEE, a member of ISCA, and one of the founding members of the Kaldi Speech Recognition Toolkit. He has published more than 140 papers on speech and language processing, with 8000+ citations, in venues including T-ASLP, Speech Communication, ICASSP, INTERSPEECH, and ASRU. His current research interests include acoustic and language modeling for speech recognition, speaker and language recognition, speech separation and enhancement, natural language understanding, deep learning, and multimedia signal processing.

Learn more about him at: