Abstract:
The talk will first introduce widely used cross-modal image-text matching models such as CLIP, and then present our research on employing these pre-trained models for weakly supervised semantic segmentation. This will be followed by approaches that fuse text information to improve the performance of thyroid nodule segmentation and the precision of dental CBCT-based implant position prediction. After introducing text-based face and image generation approaches, the talk will discuss the latest work applying large language models (LLMs) to general computer vision tasks such as object detection, image captioning, and visual question answering (VQA).
Speaker Bio:
Linlin Shen is currently a Pengcheng Scholar Distinguished Professor at the School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. He is also an Honorary Professor at the School of Computer Science, University of Nottingham, UK. He serves as Deputy Director of the National Engineering Lab of Big Data Computing Technology, and as Director of the Computer Vision Institute, the AI Research Center for Medical Image Analysis and Diagnosis, and the China-UK Joint Research Lab for Visual Information Processing. He is also Co-Editor-in-Chief of the IET journal Cognitive Computation and Systems and an Associate Editor of Expert Systems with Applications. His research interests include deep learning, facial recognition, analysis and synthesis, and medical image processing. Prof. Shen is listed among Elsevier's “Most Cited Chinese Researchers” and in Stanford University's ranking of the “Top 2% Scientists in the World”. He received the Most Cited Paper Award from the journal Image and Vision Computing. His cell classification algorithms won the International Contest on Pattern Recognition Techniques for Indirect Immunofluorescence Images held at ICIP and ICPR.