Forum Review | Nanjing University Lakeside Large Model Forum

Published by: Tang Jingling | Date: 2025-01-13 | Views: 10

On January 4, 2025, the “Nanjing University Lakeside Large Model Forum,” organized by the School of Intelligence Science and Technology at Nanjing University, was successfully held at Room E105 of Nanyong Building, Suzhou Campus, Nanjing University. The forum aimed to explore the applications and development trends of cutting-edge large model technologies, attracting numerous professionals from the industry.




 The forum featured prominent experts and scholars, including Professor Jian Yu from Beijing Jiaotong University, Professor Xin Zhao from Renmin University of China, Young Scientist Wenhai Wang from the Shanghai Artificial Intelligence Laboratory, Assistant Professor Chao Huang from the University of Hong Kong, Senior Researcher Ke Li from Tencent, Assistant Professor Li Yuan from Peking University, Associate Professor Yaxing Wang from Nankai University, Researcher Yuxuan Zhang from Zhipu AI, and Researcher Zonghao Guo from Mianbi Intelligence.

The forum commenced with an opening speech by Professor Caifeng Shan, Vice Dean of the School of Intelligence Science and Technology, and was co-hosted by Assistant Professor Chaoyou Fu of the School of Intelligence Science and Technology and Jing Huo of the School of Computer Science. In his opening remarks, Professor Shan emphasized that large model technology is driving a new wave of transformation and opportunity in the field of artificial intelligence, and that the forum not only serves as a platform for communication and collaboration among industry professionals but is also expected to promote further development of the large model industry.





Professor Xin Zhao, in his presentation titled “Exploring Slow Thinking Methods Based on Large Language Models,” discussed the limitations of large models in the current environment. He highlighted the distinction between fast and slow thinking in large models and illustrated the exploration of slow thinking methods through practical cases, such as tree search strategies and model design. Professor Zhao also analyzed key open issues in slow thinking, offering insights that point toward further research in this area.
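To make the “slow thinking” idea concrete, the sketch below shows one way a tree-search loop over candidate reasoning steps can be organized. It is only an illustration of the general technique, not Professor Zhao's actual method: `propose_steps`, `score_state`, and `is_final` are hypothetical placeholders standing in for an LLM proposal call, a value/reward model, and an answer detector.

```python
# Illustrative sketch of "slow thinking" as best-first search over reasoning chains.
import heapq
from typing import Callable, List, Tuple

def slow_think(
    question: str,
    propose_steps: Callable[[str, List[str]], List[str]],  # LLM proposes next steps
    score_state: Callable[[str, List[str]], float],         # value model scores a partial chain
    is_final: Callable[[str], bool],                         # detects a completed answer step
    beam_width: int = 3,
    max_depth: int = 6,
) -> List[str]:
    """Search over chains of reasoning steps; return the best complete chain found."""
    # Each frontier entry: (negative score for the min-heap helper, chain so far).
    frontier: List[Tuple[float, List[str]]] = [(0.0, [])]
    best_chain: List[str] = []
    best_score = float("-inf")

    for _ in range(max_depth):
        candidates: List[Tuple[float, List[str]]] = []
        for _, chain in frontier:
            for step in propose_steps(question, chain):
                new_chain = chain + [step]
                score = score_state(question, new_chain)
                if is_final(step) and score > best_score:
                    best_chain, best_score = new_chain, score
                else:
                    candidates.append((-score, new_chain))
        if not candidates:
            break
        # Beam pruning: keep only the highest-scoring partial chains.
        frontier = heapq.nsmallest(beam_width, candidates)

    return best_chain
```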



In the talk “Technical Evolution and Application Exploration of the InternVL Multimodal Large Model,” Wenhai Wang explained the fundamental implementation paradigms of multimodal large models. He traced the development of vision-language models from inception to realization and demonstrated the continuous improvement and optimization process behind a high-performance multimodal large model.

Assistant Professor Chao Huang, in his report titled “When Graph Data Meets Large Language Models,” emphasized that graph data more naturally captures the relationships between entities. He introduced the GraphGPT model, which equips large language models with the ability to understand graph data, discussed why a Mixture-of-Experts (MoE) architecture is well suited to graph data models, and highlighted the performance improvements achieved through LightRAG. He also presented the GraphAgent product as an application of these advancements.
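As a rough illustration of why explicit relational structure helps a language model (GraphGPT itself aligns a dedicated graph encoder with the LLM through instruction tuning, which is considerably more involved), the snippet below simply verbalizes a small graph's edges into a prompt; the triples and the `graph_to_prompt` helper are invented for this example.

```python
# Illustration only: expose graph structure to a text-only LLM by serializing
# (head, relation, tail) triples into the prompt, so relations between entities
# are stated explicitly rather than left implicit in free text.
from typing import List, Tuple

def graph_to_prompt(edges: List[Tuple[str, str, str]], question: str) -> str:
    """Serialize edge triples and append the user's question."""
    lines = [f"- {h} --{r}--> {t}" for h, r, t in edges]
    return "Known relations:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    citation_graph = [
        ("Paper A", "cites", "Paper B"),
        ("Paper B", "cites", "Paper C"),
        ("Author X", "wrote", "Paper A"),
    ]
    prompt = graph_to_prompt(
        citation_graph,
        "Which papers does Author X build on, directly or indirectly?",
    )
    print(prompt)  # send this prompt to any chat LLM
```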




Ke Li, in his presentation titled “A New Paradigm for Workflow Execution with Large Models,” provided a clear introduction to the concept of workflows and discussed the challenges of integrating workflows into large models. He explored innovative representations of workflows through the FlowAgent model and introduced PDL, a new language that combines the flexibility of natural language with the rigor of procedural logic. He also showcased the model's strong performance under an automated evaluation framework, offering new perspectives on deploying large models in practical applications.
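The following sketch hints at what it means to combine natural-language steps with procedural control when handing a workflow to an agent. It is a hypothetical Python illustration, not actual PDL syntax or the FlowAgent implementation; the `Step` structure, the outcome labels, and the `run_workflow` loop are all invented for this example.

```python
# Hypothetical illustration: each step's body is a natural-language instruction,
# while branching between steps is expressed as explicit procedural structure.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Step:
    name: str
    instruction: str                                   # natural-language description the agent executes
    next_steps: Dict[str, str] = field(default_factory=dict)  # outcome label -> next step name

def run_workflow(steps: Dict[str, Step], start: str,
                 execute: Callable[[str], str], max_steps: int = 20) -> List[str]:
    """Walk the workflow graph, letting `execute` (e.g. an LLM agent call) decide each outcome."""
    trace: List[str] = []
    current = start
    for _ in range(max_steps):
        step = steps[current]
        outcome = execute(step.instruction)   # the agent returns an outcome label
        trace.append(f"{step.name} -> {outcome}")
        if outcome not in step.next_steps:    # terminal or unknown outcome ends the flow
            break
        current = step.next_steps[outcome]
    return trace

# Example: a refund workflow whose branches are procedural, while each
# step body stays in natural language.
refund_flow = {
    "check_order": Step("check_order",
                        "Verify the order exists and is within the refund window.",
                        {"valid": "issue_refund", "invalid": "reject"}),
    "issue_refund": Step("issue_refund", "Issue the refund and notify the customer."),
    "reject": Step("reject", "Explain to the customer why the refund cannot be processed."),
}
```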


In his talk, “Can Video Generation Based on Diffusion Models Achieve a Visual World Model?” Li Yuan analyzed the differences between visual generation and understanding and explored ways to unify the two within a single framework. To achieve this goal, he presented the Open-Sora Plan, an open-source project for video generation, and shared his insights into the exploration of visual world models. By delving into key issues, Li Yuan provided clear directions and practical references for research on unified large models.




Yaxing Wang, in his report titled “Thoughts on Text and Image Representations in Text-to-Image Models,” highlighted three recent representative papers on the optimization and acceleration of text-to-image diffusion models. The first paper proposed semantic binding optimization, effectively addressing attribute confusion in multi-object scenarios under text prompts and significantly improving generation quality. The second paper focused on negative target suppression, achieving precise exclusion of undesired targets by optimizing text embeddings without model fine-tuning. The third paper presented an in-depth analysis of encoder characteristics and proposed encoder propagation and parallel processing mechanisms, improving inference speed by over 40% while maintaining high-quality outputs. Together, these studies address both quality and efficiency bottlenecks in diffusion models and point to new directions for practical applications.
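The third paper's speedup rests on the observation that encoder features change slowly between adjacent denoising steps and can therefore be reused. The sketch below illustrates that caching idea in a simplified loop; the `unet_encoder`/`unet_decoder` split and the argument shapes are assumptions made for this illustration rather than the paper's actual code, though `scheduler.timesteps` and `scheduler.step(...).prev_sample` follow the common diffusers scheduler convention.

```python
# Simplified sketch of encoder-feature reuse in a diffusion denoising loop:
# recompute the (expensive) encoder only every few steps, run the decoder every step.
import torch

@torch.no_grad()
def denoise_with_encoder_reuse(unet_encoder, unet_decoder, scheduler,
                               latents, text_emb, reuse_every: int = 2):
    """Denoising loop that refreshes encoder features only every `reuse_every` steps."""
    cached_features = None
    for i, t in enumerate(scheduler.timesteps):
        if cached_features is None or i % reuse_every == 0:
            # Full pass: refresh the encoder features.
            cached_features = unet_encoder(latents, t, text_emb)
        # The decoder runs on every step, consuming fresh or cached encoder features.
        noise_pred = unet_decoder(latents, t, text_emb, cached_features)
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```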




Yuxuan Zhang, in his presentation titled “Zhipu Open-Source Large Models,” introduced the open-source achievements and latest progress of Zhipu AI. The report covered the CogVideo series of video generation models, highlighting breakthroughs in inference acceleration with CogVideoX1.5, as well as the low-cost fine-tuning and customization support of CogVideoX. In the field of image generation, he introduced the latest applications of the CogView series. For multimodal visual understanding models, he presented CogAgent, which supports GUI operations, and CogVLM, which achieved a leap in understanding from images to videos. Finally, he provided a detailed explanation of Zhipu AI's open-source ecosystem, including workflow standards and international ecosystem adaptation, offering new directions for the development of the open-source community.

Zonghao Guo, in his report titled “MiniCPM-V: Moving Toward GPT-4V-Level Edge Multimodal Large Models,” systematically reviewed the development history and scientific value of multimodal large models, focusing on the key technologies and applications of MiniCPM-V, an efficient edge-side multimodal large model. MiniCPM-V made breakthroughs in efficient model architecture, training methods, and high-quality data construction: it adopted a high-resolution visual encoding framework and presented an in-depth analysis of systematic flaws in GPT-4V's visual encoding, and it addressed label ambiguity and learning-efficiency challenges in optimizing multimodal feedback data and in multilingual generalization. The model supports joint understanding of multiple images, multimodal in-context learning, and video understanding, demonstrating excellent performance on evaluations such as OpenCompass. He also discussed the practical applications of MiniCPM-V and the growing community interest, providing clear directions for the future development of multimodal models.



The forum provided new directions for the technological development of large models and further promoted in-depth exchanges and cooperation among industry professionals. Attendees participated actively in the discussions, and the thought-provoking exchanges yielded valuable progress. The event concluded successfully at 5:00 PM on January 4.