A Semi-Automatic Framework for Practical Transcription of Foreign Person Names in Lithuanian

Bibliographic Details
Main Authors:	Gailius Raškinis, Darius Amilevičius, Danguolė Kalinauskaitė, Artūras Mickus, Daiva Vitkutė-Adžgauskienė, Antanas Čenys, Tomas Krilavičius
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Mathematics
Subjects:	practical transcription; character-level transduction; sequence-to-sequence learning; web-crawled data; Lithuanian
Online Access:	https://www.mdpi.com/2227-7390/13/13/2107

Description
Summary:	We present a semi-automatic framework for transcribing foreign personal names into Lithuanian, aimed at reducing pronunciation errors in text-to-speech systems. Focusing on noisy, web-crawled data, the pipeline combines rule-based filtering, morphological normalization, and manual stress annotation—the only non-automated step—to generate training data for character-level transcription models. We evaluate three approaches: a weighted finite-state transducer (WFST), an LSTM-based sequence-to-sequence model with attention, and a Transformer model optimized for character transduction. Results show that word-pair models outperform single-word models, with the Transformer achieving the best performance (19.04% WER) on a cleaned and augmented dataset. Data augmentation via word order reversal proved effective, while combining single-word and word-pair training offered limited gains. Despite filtering, residual noise persists, with 54% of outputs showing some error, though only 11% were perceptually significant.
ISSN:	2227-7390
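The summary names two concrete techniques: data augmentation by word order reversal on word-pair training data, and evaluation by word error rate (WER). The sketch below illustrates both under stated assumptions and is not the authors' code. It assumes that each training entry is a whitespace-separated (source name, Lithuanian transcription) pair whose words align by position. It also assumes WER is the standard corpus-level word edit distance divided by the number of reference words; the paper's exact definition may differ. The helper names `reverse_word_order` and `word_error_rate`, and the example names, are hypothetical illustrations.

```python
from typing import List, Tuple

# (source name, Lithuanian transcription); words separated by whitespace.
Pair = Tuple[str, str]


def reverse_word_order(pairs: List[Pair]) -> List[Pair]:
    """Hypothetical augmentation: add a copy of each multi-word pair with
    the word order reversed on both sides (e.g. given name <-> surname).

    Assumes source and target words align by position.
    """
    augmented = list(pairs)
    for src, tgt in pairs:
        src_words, tgt_words = src.split(), tgt.split()
        # Reverse only multi-word entries with equal word counts on both
        # sides, so the positional alignment is preserved.
        if len(src_words) > 1 and len(src_words) == len(tgt_words):
            augmented.append((" ".join(reversed(src_words)),
                              " ".join(reversed(tgt_words))))
    return augmented


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word edit distance / total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        # Levenshtein distance over words, computed row by row.
        prev = list(range(len(h) + 1))
        for i, rw in enumerate(r, start=1):
            cur = [i] + [0] * len(h)
            for j, hw in enumerate(h, start=1):
                cur[j] = min(prev[j] + 1,               # deletion
                             cur[j - 1] + 1,            # insertion
                             prev[j - 1] + (rw != hw))  # substitution or match
            prev = cur
        errors += prev[len(h)]
        total += len(r)
    return errors / total if total else 0.0


if __name__ == "__main__":
    data = [("John Smith", "Džonas Smitas")]
    print(reverse_word_order(data))
    # [('John Smith', 'Džonas Smitas'), ('Smith John', 'Smitas Džonas')]
    print(word_error_rate(["Džonas Smitas"], ["Džonas Smitsas"]))  # 0.5
```

Reversal doubles the multi-word examples without requiring new annotation, which fits a pipeline whose only manual step is stress annotation. Counting WER over words rather than characters means any single-character slip makes the whole word wrong, which is a strict measure for names.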
