Keynote Speaker
Dr. Jia-Wei Chang
Associate Professor in the Department of Computer Science and Information Engineering, National Taichung University of Science and Technology, Taiwan

Title of talk
Exploring Multimodal Learning and Generative AI from the Experiences of Natural Language, Image, and Audio Tasks
Bio
 

Dr. Jia-Wei Chang is an Associate Professor in the Department of Computer Science and Information Engineering at the National Taichung University of Science and Technology. He holds several prestigious positions, including Young Professionals Chair and Director of the Institution of Engineering and Technology (IET) - Taipei Network. Additionally, Dr. Chang has been a consultant at the NEXCOM Industry 4.0 Innovation Center since 2017 and at Mobagel Inc. since 2023, and an adjunct professor in the Department of Engineering Science at National Cheng Kung University since 2018.

 Before his academic career, Dr. Chang worked as a data scientist and project manager at IoT BU, Nexcom, from 2016 to 2017. He obtained his Ph.D. from the Department of Engineering Science at National Cheng Kung University in 2017. His research interests encompass natural language processing, the Internet of Things, artificial intelligence, data mining, and e-learning technologies.

Dr. Chang has made significant contributions to the academic community, serving as a Guest Editor for several prestigious SCIE/SSCI journals, including Computer Science and Information Systems, Sensors, Sustainability, and the Journal of Internet Technology. Since 2023, he has been an Associate Editor of the International Journal of System Assurance Engineering and Management, which is indexed in ESCI. Additionally, he won the Outstanding Reviewer Award from Engineering Applications of Artificial Intelligence in 2018.

In recent years, he has also served as a reviewer for many prestigious journals, such as Applied Soft Computing, Computer Communications, and Engineering Applications of Artificial Intelligence.

Abstract of talk

 

In the rapidly advancing field of artificial intelligence, multimodal learning and generative AI have emerged as pivotal research and application areas. These technologies leverage multiple data modalities, such as text, images, and audio, to enhance AI's capabilities in recognition and generation tasks.

 Multimodal learning emphasizes AI systems' ability to recognize and understand information from various sources. For instance, in movies, AI can interpret the storyline by analyzing visual scenes, audio tracks, and textual subtitles. This capability enables comprehensive understanding and detailed explanations of video content. Integrating these data types enhances applications such as automated video summarization, content recommendation, and accessibility features. The core of multimodal learning lies in combining natural language processing (NLP) and computer vision (CV) and utilizing powerful encoder and decoder models.

 Generative AI focuses on creating new content from given prompts, such as text, images, or audio. This involves generating coherent narratives, realistic images, or meaningful audio clips based on input data. Generative AI models, like those based on Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have shown remarkable success in various creative and practical applications. By leveraging these generative models, AI can produce high-quality content that mimics human creativity, opening new possibilities in content creation, design, and data augmentation.

 Advancements in multimodal learning and generative AI are driven by integrating NLP and CV, primarily through encoder-decoder architectures. These frameworks enable the seamless processing and synthesis of diverse data types. Notable models such as Transformers and Diffusion Models have been instrumental in this progress.

Finally, we will cover the sustainability and ethical implications of deep learning models in multimodal learning and generative AI. This keynote provides a thorough overview of multimodal learning and generative AI, highlighting their current capabilities and future potential. By understanding the principles and applications of these technologies, attendees will be better equipped to leverage AI to create intelligent, versatile systems that can transform various industries.
