Mixtec–Spanish Parallel Text Dataset for Language Technology Development

This article introduces a freely available Spanish–Mixtec parallel corpus designed to foster natural language processing (NLP) development for an indigenous language that remains digitally low-resourced. The dataset, comprising 14,587 sentence pairs, covers Mixtec variants from Guerrero (Tlacoachist...

Bibliographic Details
Main Authors: Hermilo Santiago-Benito, Diana-Margarita Córdova-Esparza, Juan Terven, Noé-Alejandro Castro-Sánchez, Teresa García-Ramirez, Julio-Alejandro Romero-González, José M. Álvarez-Alvarado
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/10/7/94