Text this: Optimization and application of vision-based large models in educational scenarios