Evaluating Proprietary and Open-Weight Large Language Models as Universal Decimal Classification Recommender Systems

Bibliographic Details
Main Authors:	Mladen Borovič, Eftimije Tomovski, Tom Li Dobnik, Sandi Majninger
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Applied Sciences
Subjects:	universal decimal classification; large language models; conversational systems; recommender systems; prompt engineering; zero-shot classification
Online Access:	https://www.mdpi.com/2076-3417/15/14/7666

Description
Summary:	Manual assignment of Universal Decimal Classification (UDC) codes is time-consuming and inconsistent as digital library collections expand. This study evaluates 17 large language models (LLMs) as UDC classification recommender systems, including ChatGPT variants (GPT-3.5, GPT-4o, and o1-mini), Claude models (3-Haiku and 3.5-Haiku), Gemini series (1.0-Pro, 1.5-Flash, and 2.0-Flash), and Llama, Gemma, Mixtral, and DeepSeek architectures. Models were evaluated zero-shot on 900 English and Slovenian academic theses manually classified by professional librarians. Classification prompts utilized the RISEN framework, with evaluation using Levenshtein and Jaro–Winkler similarity, and a novel adjusted hierarchical similarity metric capturing UDC’s faceted structure. Proprietary systems consistently outperformed open-weight alternatives by 5–10% across metrics. GPT-4o achieved the highest hierarchical alignment, while open-weight models showed progressive improvements but remained behind commercial systems. Performance was comparable between languages, demonstrating robust multilingual capabilities. The results indicate that LLM-powered recommender systems can enhance library classification workflows. Future research incorporating fine-tuning and retrieval-augmented approaches may enable fully automated, high-precision UDC assignment systems.
ISSN:	2076-3417
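
The summary mentions scoring model output against librarian-assigned codes with string similarity measures. As a rough illustration only, the sketch below computes a length-normalized Levenshtein similarity between a predicted and a reference code string. The function names and the sample codes are assumptions made for this example. The `shared_prefix_ratio` helper is a hypothetical stand-in showing why leading digits matter in a hierarchical notation; it is not the adjusted hierarchical similarity metric from the article, whose definition is not given in this record.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(predicted: str, reference: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest


def shared_prefix_ratio(predicted: str, reference: str) -> float:
    """Hypothetical helper (NOT the article's metric): share of leading
    characters that agree, relative to the longer code."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    shared = 0
    for p, r in zip(predicted, reference):
        if p != r:
            break
        shared += 1
    return shared / longest


if __name__ == "__main__":
    # Illustrative code strings only, not data from the study.
    print(levenshtein_similarity("004.85", "004.8"))
    print(shared_prefix_ratio("004.85", "004.7"))
```

A plain edit distance treats every character position alike, whereas a prefix-based score rewards agreement in the leading digits, where a mismatch changes the broad subject class. That difference is one plausible reason an evaluation would pair string metrics with a hierarchy-aware one.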

Similar Items