A dataset for classifying phrases and sentences into statements, questions, or exclamations based on sound pitch

Bibliographic Details
Main Authors: Ayub Othman Abdulrahman, Shanga Ismail Othman, Gazo Badran Yasin, Meer Salam Ali
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Data in Brief
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925005530
Description
Summary: Speech is the most fundamental and sophisticated channel of human communication, and breakthroughs in Natural Language Processing (NLP) have substantially raised the quality of human-computer interaction. In particular, a new wave of deep learning methods has significantly advanced human speech recognition by capturing fine-grained acoustic cues, including pitch, an acoustic feature that can be a critical ingredient in understanding communicative intent. Pitch variation is particularly important for prosodic classification tasks (i.e., distinguishing statements, questions, and exclamations), which is crucial in tonal and low-resource languages such as Kurdish, where intonation carries significant semantic information. This paper presents the Statements, Questions, or Exclamations Based on Sound Pitch (SQEBSP) dataset, which contains 12,660 professionally recorded speech audio clips from 431 native Kurdish speakers residing in the Kurdistan Region of Iraq. Each speaker articulated 10 new phrases in each of the three prosodic categories: statements, questions, and exclamations. All utterances were digitized at 16 kHz and then manually checked for correctness with respect to pitch-based classification. The dataset contains equal representation of all three classes, about 4,200 samples per class, along with metadata such as speaker gender, age group, and sentence identifiers. The original audio files, alongside resources such as Mel-Frequency Cepstral Coefficients (MFCCs) and waveform visualizations, can be found on Mendeley Data. The dataset offers significant advantages for formulating and testing pitch-based speech classification algorithms, furthers work on pronunciation modelling for languages lacking sufficient resources, and aids the development of dialect-sensitive speech technologies.
ISSN: 2352-3409
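
The summary above highlights 16 kHz recordings, pitch cues, and bundled MFCC resources. As a minimal sketch, the following Python snippet shows how such features might be extracted from a single clip using librosa; the file path, folder layout, and parameter choices are illustrative assumptions, not details taken from the dataset documentation.

# Minimal sketch, assuming librosa is available; the path and parameters below
# are hypothetical and not drawn from the SQEBSP documentation.
import librosa
import numpy as np

AUDIO_PATH = "SQEBSP/question/speaker_001_sentence_01.wav"  # hypothetical path

# The record describes 16 kHz clips; request that sample rate explicitly on load.
y, sr = librosa.load(AUDIO_PATH, sr=16000)

# Mel-Frequency Cepstral Coefficients, one of the resources the summary mentions.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# Frame-level fundamental frequency (pitch) via pYIN; unvoiced frames return NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Utterance-level pitch statistics of the kind a statement/question/exclamation
# classifier might use (questions often show a rising final contour).
f0_voiced = f0[~np.isnan(f0)]
print("mean F0 (Hz):", float(np.mean(f0_voiced)) if f0_voiced.size else float("nan"))
print("F0 range (Hz):", float(np.ptp(f0_voiced)) if f0_voiced.size else float("nan"))
print("MFCC matrix shape:", mfcc.shape)

A classifier would typically be trained on per-clip features like these, with train/test splits organized by the speaker and sentence identifiers provided in the dataset metadata.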