A survey and evaluation of text-to-speech systems for the Tamil language

Bibliographic Details
Main Authors: Ahrane Mahaganapathy, Kengatharaiyer Sarveswaran
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Natural Language Processing Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2949719125000470
Description
Summary: This survey provides a comprehensive review of existing Tamil Text-to-Speech (TTS) synthesis systems, synthesis approaches, and evaluation approaches, and highlights state-of-the-art approaches and challenges in handling linguistic nuances. Voice-based interfaces are becoming part of everyday life, so it is important to have an expressive TTS system that improves the human experience. Tamil, with its rich linguistic features and diglossic nature, presents significant challenges to speech synthesis. In addition to the survey, this work proposes a perceptual evaluation framework that adds expressiveness, low listening fatigue, and overall quality to the traditional dimensions of intelligibility and naturalness, in order to better capture the human experience. The study uses the Comparative Mean Opinion Score (CMOS) instead of the Mean Opinion Score for subjective evaluation. An evaluation dataset was carefully prepared, and six widely used Tamil TTS systems were evaluated objectively using Word Error Rate and subjectively using the proposed framework with the support of 30 evaluators. The reliability of the subjective evaluation was assessed using Krippendorff’s Alpha. The results indicate that the existing systems have significant room for improvement in all perceptual dimensions. The study underscores the need for evaluation datasets and evaluation approaches that cater to the subjective perceptual dimensions of speech synthesis, and it lays a foundation for future research and development in TTS systems for Tamil and similar languages.
ISSN: 2949-7191