Entropy and type-token ratio in gigaword corpora

There are different ways of measuring diversity in complex systems. In particular, in language, lexical diversity is characterized in terms of the type-token ratio and the word entropy. We here investigate both diversity metrics in six massive linguistic data sets in English, Spanish, and Turkish, c...
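The abstract names two lexical diversity metrics. The sketch below shows one common textbook way to compute each, assuming whitespace tokenization. The type-token ratio is taken as V/N, where V is the number of distinct word types and N the number of tokens. Word entropy uses the naive plug-in (maximum-likelihood) Shannon estimator in bits. The function names are hypothetical, and this is not the authors' procedure: their preprocessing and estimators may differ. Note that the plug-in entropy estimate is known to be biased downward for finite samples.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Hypothetical helper: distinct word types V divided by total tokens N."""
    if not tokens:
        raise ValueError("empty token list")
    return len(set(tokens)) / len(tokens)


def word_entropy(tokens, base=2.0):
    """Hypothetical helper: plug-in Shannon entropy of the word frequency distribution.

    Each word's probability is estimated as count / N (maximum likelihood).
    """
    if not tokens:
        raise ValueError("empty token list")
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


if __name__ == "__main__":
    # Assumption: simple whitespace tokenization, no case folding or punctuation handling.
    tokens = "the cat saw the dog".split()
    print(type_token_ratio(tokens))  # 4 types / 5 tokens -> 0.8
    print(word_entropy(tokens))      # entropy in bits of the empirical word distribution
```

Both quantities are computed from the same token counts. TTR looks only at how many distinct types appear. Entropy also accounts for how evenly the tokens are spread across those types.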


Bibliographic Details
Main Authors: Pablo Rosillo-Rodes, Maxi San Miguel, David Sánchez
Format: Article
Language: English
Published: American Physical Society 2025-07-01
Series: Physical Review Research
Online Access: http://doi.org/10.1103/rxxz-lk3n
