Abstract:
Image/video communication serves two fundamental purposes: visual content delivery, which is closely tied to compression technology, and visual content understanding, which relies on efficient semantic analysis and inference. Over the past three decades, techniques for visual content compression and analytics have advanced rapidly, but largely in isolation. Studies of neural vision and the brain suggest that the human visual system (HVS) is a powerful processing unit that performs compression and understanding simultaneously. We therefore hypothesize that the HVS (from eye to brain) extracts compact (for compression) and discriminative (for analytics) features for subsequent decision making. Mimicking the processing steps along the visual pathway, we propose an end-to-end image/video coding method, extend the solution to account for the differences among retinal cells, and show that the compressed latent features can be used directly for high-level semantic understanding. Extensive simulations report the encouraging efficiency of the proposed method: it achieves compression performance on par with the recently emerged H.266/VVC while simultaneously offering competitive accuracy on vision tasks without requiring pixel decoding.
Bio:
Dr. Zhan Ma is a Full Professor in the School of Electronic Science and Engineering, Nanjing University, China. He received his Ph.D. from New York University in 2011, and his B.S. and M.S. from the Huazhong University of Science and Technology, China, in 2004 and 2006, respectively. From 2011 to 2014, he worked in several research labs in the USA, focusing on video coding standardization and codec ASIC design. Dr. Ma's research interests include computational imaging and AI-based future video coding. He received the National Science Fund for Excellent Young Scholars and was recognized as a 2018 PCM Best Paper Finalist; he also received the 2019 IEEE BTS Best Paper Award, the 2020 IEEE MMSP Grand Challenge Best Image Coding Solution award, and the 2020 SPIE ICMV Camera Illumination Contest First Prize.