Optimization and application of vision-based large models in educational scenarios

Bibliographic Details
Main Authors:	XU Yuepeng, XU Chaidi, GUO Jinjun, JIANG Yunqiao, WANG Shijia, LIU Yao
Format:	Article
Language:	Chinese
Published:	China InfoCom Media Group 2025-01-01
Series:	Big Data Research (大数据)
Subjects:	large language model; multimodal; smart education; RAG technology
Online Access:	http://www.j-bigdataresearch.com.cn/zh/article/111999691/

Description
Summary:	With the rapid advancement of artificial intelligence technology, large language models (LLMs) have achieved significant success across many fields. However, their application in education still faces challenges such as difficulty processing multimodal data, insufficient response accuracy, and limited modes of information delivery. To address these issues, a model named VELM was proposed. VELM was trained on public multimodal educational datasets and specialized educational datasets and combined with model optimization techniques. As a result, it both improves response quality in educational scenarios and reduces computational resource consumption. In addition, retrieval-augmented generation (RAG) was used to keep the generated content accurate and rich. For deployment, VELM was implemented on the Dify platform, enabling flexible multi-platform deployment, including WeChat mini programs, a Web cloud platform, and local deployment, to meet the diverse needs of different educational scenarios. Evaluation experiments showed that VELM significantly outperformed open-source large models such as MiniCPM-V, DeepSeek-VL, and Yi-VL on standard benchmark datasets including MathVista, OCRBench, and MMMU. On specialized educational evaluation datasets, VELM's accuracy was 9.78% higher than that of its base model, Qwen2-VL.
ISSN:	2096-0271
