PROPERTIES OF LEXICAL NETWORKS BUILT ON NATURAL AND RANDOM TEXTS

Bibliographic Details
Main Authors: Oleh Kushnir, A. Drebot, D. Ostrikov, O. Kravchuk
Format: Article
Language: English
Published: Ivan Franko National University of Lviv 2024-12-01
Series: Електроніка та інформаційні технології (Electronics and Information Technologies)
Online Access: http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4575
Description
Summary: We study experimentally and analyze phenomenologically the properties of lexical co-occurrence networks. Our linguistic networks are built on a natural text (NT) and a random text (RT) obtained by randomizing the NT on the lexical level. We also consider a random-graph-type (RG) network that has the same number of nodes as the NT network but randomized links among those nodes. We consider non-weighted, non-restricted networks in which words are not filtered by frequency and stop words are not removed. The main parameters of these networks are calculated and compared with the lexical statistics obtained for the NT, RT and RG texts, which are usually expressed by the so-called Zipf, Pareto and Heaps laws. For this purpose, an additional 'RG text' has been built from the RG network. Both the NT and the RT reveal the well-known power-law word frequency distributions with heavy tails. In contrast, the lexical statistics of the RG text are characterized by a nearly logarithmic rank–frequency dependence and a thin-tailed (approximately exponential) frequency distribution. The probability distributions of the node degrees found for the NT and RT networks are close to each other. The degree distributions for the NT and RT networks have heavy (power-law) tails, whereas that for the RG network has a thin (nearly exponential) tail. In other words, the NT and RT networks are scale-free, unlike our RG network. This implies that a heavy (thin) tail of the degree distribution is a consequence of a heavy (thin) tail of the frequency distribution. The average clustering coefficients and path lengths for the networks built on the NT and RT are very close to each other. In contrast to the RG network, the NT and RT networks are small worlds, and their Walsh parameters, which measure small-worldliness, differ by only 2 per cent.
Finally, we analyze a number of consequences of our empirical results and some data known from the literature. In particular, since the RT lacks any semantics or syntax, it would not be proper to attribute the scale-free and small-world properties of the lexical networks to either semantics or syntax.
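The three objects compared in the summary can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on assumptions that the summary does not state: words are linked only when they are immediately adjacent in the text, the RG network keeps the same number of links as the NT network, and the toy sentence and all function names are hypothetical.

```python
import random
from collections import Counter


def cooccurrence_network(tokens):
    """Non-weighted, undirected network linking adjacent words.
    Assumption: a co-occurrence window of one word (not stated in the summary)."""
    adj = {w: set() for w in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def randomize_text(tokens, seed=0):
    """RT: shuffle the word order; every word keeps its frequency."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def random_graph(adj, seed=0):
    """RG: same nodes, links placed uniformly at random.
    Assumption: the number of links is kept equal to that of the input."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_links = sum(len(nbrs) for nbrs in adj.values()) // 2
    rg = {w: set() for w in nodes}
    placed = 0
    while placed < n_links:
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        if b not in rg[a]:
            rg[a].add(b)
            rg[b].add(a)
            placed += 1
    return rg


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each link among the neighbours is seen twice, hence the division by 2.
        links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


# Hypothetical toy text; a real study would use a full-length text.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
nt_net = cooccurrence_network(tokens)
rt_net = cooccurrence_network(randomize_text(tokens))
rg_net = random_graph(nt_net)
for name, net in (("NT", nt_net), ("RT", rt_net), ("RG", rg_net)):
    degrees = sorted((len(nbrs) for nbrs in net.values()), reverse=True)
    print(name, "degrees:", degrees, "clustering:", round(average_clustering(net), 3))
```

Because shuffling preserves every word's frequency, the NT and RT share the same rank-frequency (Zipf) statistics by construction. This is the reason the summary can link the heavy-tailed degree distribution to the frequency distribution rather than to syntax or semantics.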
ISSN: 2224-087X, 2224-0888