An Analysis of the Training Data Impact for Domain-Adapted Tokenizer Performances—The Case of Serbian Legal Domain Adaptation
Various areas of natural language processing (NLP) have greatly benefited from the development of large language models in recent years. This research addresses the challenge of developing efficient tokenizers for transformer-based domain-specific language models. Tokenization efficiency within tran...
| Main Authors: | Miloš Bogdanović, Milena Frtunić Gligorijević, Jelena Kocić, Leonid Stoimenov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
| Online Access: | https://www.mdpi.com/2076-3417/15/13/7491 |
Similar Items
- EDUCATIONAL FUNCTIONS OF STUDENTS’ CREATIVE WORK IN PRIMARY SCHOOL SERBIAN LANGUAGE CLASSES: TEACHERS’ PERSPECTIVE
  by: Iva Medojević, et al.
  Published: (2025-06-01)
- THE SOCIOLINGUISTIC SITUATION IN PRESENT-DAY MONTENEGRO – SERBIAN STUDIES, MONTENEGRIN STUDIES
  by: D. Bojović
  Published: (2018-12-01)
- Artificial intelligence in foreign language teaching: Evaluating the reliability of large language models with a focus on Serbian as a foreign language
  by: Danijela D. Vranješ, et al.
  Published: (2025-07-01)
- SUPPLEMENTARY TEACHING IN THE SERBIAN LANGUAGE ABROAD: PROBLEMS AND EXPERIENCES
  by: Jelena M. Jovanović
  Published: (2025-06-01)
- Machine learning methods (tokenization) in marketing research
  by: E. V. Ganebnykh, et al.
  Published: (2024-06-01)