January 22, 2007
This paper introduces how human languages can be studied in light of recent development of network theories. There are two directions of exploration. One is to study networks existing in the language system. Various lexical networks can be built based on different relationships between words, being semantic or syntactic. Recent studies have shown that these lexical networks exhibit small-world and scale-free features. The other direction of exploration is to study networks of...
April 24, 2023
Interpreting natural language is an increasingly important task in computer algorithms due to the growing availability of unstructured textual data. Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. The fundamental properties of semantic networks must be taken into account when designing NLP algorithms, yet they remain to be structurally investigated. We study the properties of semantic networks from ConceptNet, ...
June 27, 2002
We define two words in a language to be connected if they express similar concepts. The network of connections among the many thousands of words that make up a language is important not only for the study of the structure and evolution of languages, but also for cognitive science. We study this issue quantitatively, by mapping out the conceptual network of the English language, with the connections being defined by the entries in a Thesaurus dictionary. We find that this netw...
June 5, 2008
We study the self-organization of the consonant inventories through a complex network approach. We observe that the distribution of occurrence as well as cooccurrence of the consonants across languages follow a power-law behavior. The co-occurrence network of consonants exhibits a high clustering coefficient. We propose four novel synthesis models for these networks (each of which is a refinement of the earlier) so as to successively match with higher accuracy (a) the above m...
February 2, 2011
In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity and this relates the semantic space to a wide range of biological, social and linguistics...
October 18, 2013
In this paper, we explore complex network properties of word collocation networks (Ferret, 2002) from four different genres. Each document of a particular genre was converted into a network of words with word collocations as edges. We analyzed graphically and statistically how the global properties of these networks varied across different genres, and among different network types within the same genre. Our results indicate that the distributions of network properties are vis...
August 27, 2020
We study correlations between the structure and properties of a free association network of the English language, and solutions of psycholinguistic Remote Association Tests (RATs). We show that average hardness of individual RATs is largely determined by relative positions of test words (stimuli and response) on the free association network. We argue that the solution of RATs can be interpreted as a first passage search problem on a network whose vertices are words and links ...
March 5, 2018
Language can be described as a network of interacting objects with different qualitative properties and complexity. These networks include semantic, syntactic, or phonological levels and have been found to provide a new picture of language complexity and its evolution. A general approach considers language from an information theory perspective that incorporates a speaker, a hearer, and a noisy channel. The later is often encoded in a matrix connecting the signals used for co...
August 10, 2005
Complex network theory is used to investigate the structure of meaningful concepts in written texts of individual authors. Networks have been constructed after a two phase filtering, where words with less meaning contents are eliminated, and all remaining words are set to their canonical form, without any number, gender or time flexion. Each sentence in the text is added to the network as a clique. A large number of written texts have been scrutinized, and its found that text...
January 5, 2024
The review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science and documents their applicability in identifying both universal and system-specific features of language in its written representation. Three main complexity-related research trends in quantitative linguistics are covered. The first part addresses the issue of word frequencies in texts and demonstrates that taking punctuation into consideration r...