Audio, Speech and Language Processing covers speaker recognition, speaker diarization, speech synthesis, voice conversion, speech separation, keyword spotting, speech recognition, speech enhancement, language identification, paralinguistic speech attribute recognition, and related tasks. More than 100 papers in this field have been published in top conferences and journals. Collaborations with multiple industry leaders and local companies are ongoing, covering both joint research and technology transfer.
Multimodal Behavior Signal Analysis and Interpretation has been conducted towards AI-assisted diagnosis of and intervention for Autism Spectrum Disorder (ASD). An AI studio has been developed for the early screening of ASD. The studio's four walls are programmable projection screens that can recreate a variety of settings, such as a forest environment, with sound delivered through multichannel audio equipment. The therapist can use the studio to interact with the child, for example by asking the child to point at a certain object projected onto the wall and observing the reaction. At the same time, cameras capture the movements of the child and the therapist, including gestures, gaze and other actions. The studio is equipped with more than 10 technologies that have obtained or are in the process of obtaining patents. These include technologies for gaze detection, human pose estimation, face detection, face recognition, emotion recognition, speech recognition, speaker diarization, paralinguistic attribute detection and text understanding.
Hyperbolic neural networks have achieved considerable success in extracting representations from hierarchical or tree-like data. They have become an important tool for applications in natural language processing, recommendation systems and social network analysis. However, it is difficult to build hyperbolic neural networks with many stacked hyperbolic layers, no matter which coordinate system is used. Moreover, hyperbolic neural networks often lack expressivity or miss important local geometric information. This project studies novel methods and neural network architectures for extracting expressive and robust hyperbolic representations in a stable manner. Applications include molecular generation, network analysis and graph anomaly detection.
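As a rough illustration of why hyperbolic space suits tree-like data, the sketch below computes the geodesic distance in the Poincaré ball model, one common coordinate system for hyperbolic space. The choice of model, the function name `poincare_distance` and the sample points are assumptions made for illustration only; they are not taken from the project itself.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


# Hypothetical embedding: a root at the origin and two leaves near the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

d_root_leaf = poincare_distance(root, leaf_a)
d_leaf_leaf = poincare_distance(leaf_a, leaf_b)
print(f"root-leaf: {d_root_leaf:.3f}, leaf-leaf: {d_leaf_leaf:.3f}")
```

In this toy setting the distance between the two leaves is much closer to the length of the path through the root than their Euclidean separation would suggest, which is the tree-like behaviour that hyperbolic representations exploit.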
Machine learning (ML) and deep neural networks (DNNs) have achieved great success in many applications. However, recent research shows that DNNs are vulnerable to small perturbations of the input data and can generalize poorly to new samples, making them less trustworthy in real-world scenarios. In this research, we aim to establish a systematic framework for building robust ML models that generalize well to future or unseen data. Both theoretical and practical explorations will be conducted from the perspectives of adversarial training and data augmentation. So far, our research has been published in premier ML journals and conferences, including ICML, WWW, ICCV, NeurIPS, and AAAI.
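To make the notion of a small input perturbation concrete, the following minimal sketch applies a fast-gradient-sign style perturbation to a toy logistic-regression model. This is a standard textbook illustration rather than the method developed in this research; the weights, the input and the budget `eps` are hypothetical values chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y):
    """Binary cross-entropy loss for a single example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign_perturb(w, b, x, y, eps):
    """Move each input coordinate by eps in the direction that increases the loss."""
    p = predict(w, b, x)
    # For logistic regression, the gradient of the loss w.r.t. the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical model and input, for illustration only.
w, b = [2.0, -3.0], 0.0
x, y = [0.5, -0.5], 1

x_adv = sign_perturb(w, b, x, y, eps=0.3)
print("clean loss:    ", bce_loss(w, b, x, y))
print("perturbed loss:", bce_loss(w, b, x_adv, y))
```

Adversarial training, in its usual formulation, feeds such perturbed inputs back into the training loop so that the model learns to keep its predictions stable within the perturbation budget.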