An Analysis of the Training Data Impact for Domain-Adapted Tokenizer Performances—The Case of Serbian Legal Domain Adaptation

Various areas of natural language processing (NLP) have greatly benefited from the development of large language models in recent years. This research addresses the challenge of developing efficient tokenizers for transformer-based domain-specific language models. Tokenization efficiency within tran...

Bibliographic Details
Main Authors: Miloš Bogdanović, Milena Frtunić Gligorijević, Jelena Kocić, Leonid Stoimenov
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/13/7491