CLD_Polysemous_Basic_Vocabulary

We present the Cross-Linguistic Database of Polysemous Basic Vocabulary, an open-source database of cross-linguistic polysemes in basic vocabulary. It aims to collect common concepts and document their extended senses as comprehensively as possible. The database covers 60 basic vocabulary concepts in 12 semantic domains, drawn from 61 language varieties spanning 16 language families (including one language isolate), for a total of 11,841 meanings across 3,736 entries. It includes classic concrete concept categories such as physical entities, human entities, and body parts; characteristic categories such as qualities, quantity, and physical attributes; and other common or frequent domains, including motion verbs, pronouns, and numerals. We collected the data manually from medium-sized dictionaries. This method captures intricate and multifaceted meanings while avoiding the need to distinguish between polysemous and homonymous words.

The Supplementary Material contains five files. The first is the Cross-Linguistic Database of Polysemous Basic Vocabulary. The second lists the dictionaries used to collect word meanings, together with their publication information and dates. Its column "Number of Entries Contributed by Each Language" gives the number of dictionary entries contributed by each language. Some languages have fewer than the expected 60 entries, primarily because the limited scale of the available dictionaries meant we could not collect an entry for every concept. The third is a converted version of the Cross-Linguistic Database, used as input data for generating the Polysemous Semantic Networks. The conversion method is described in the section "Building the semantic network" of the main text. We also include a script, 'preparing_network_data.py', which processes data from the Cross-Linguistic Database of Polysemous Basic Vocabulary for visualization in Gephi. The fourth file contains the modularity classification of the nodes in Figure 2, which assigns the 60 concepts and their senses to 34 communities. The fifth file contains the modularity classification of the nodes in the filtered network shown in Figure 3, which assigns the 60 concepts and their senses to 10 communities.
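To make the conversion step more concrete, here is a minimal sketch of how concept-sense records could be turned into a weighted edge list that Gephi can import. It is not the logic of 'preparing_network_data.py'; the row layout `(language, concept, sense)`, the function names, and the choice to weight each edge by the number of language varieties attesting it are all assumptions made for illustration. The sample rows are invented placeholders, not data from the database.

```python
import csv
import io
from collections import defaultdict


def build_edge_list(rows):
    """Turn (language, concept, sense) rows into weighted concept-sense edges.

    Assumption: an edge's weight is the number of distinct language
    varieties in which the concept's word also carries that sense.
    """
    languages_per_edge = defaultdict(set)
    for language, concept, sense in rows:
        concept, sense = concept.strip(), sense.strip()
        # Skip empty cells and self-loops.
        if concept and sense and concept != sense:
            languages_per_edge[(concept, sense)].add(language)
    return sorted(
        (source, target, len(languages))
        for (source, target), languages in languages_per_edge.items()
    )


def write_gephi_csv(edges, handle):
    """Write edges with the Source/Target/Weight header Gephi recognizes."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


# Placeholder rows for illustration only (hypothetical layout and values).
rows = [
    ("Lang A", "SUN", "day"),
    ("Lang B", "SUN", "day"),
    ("Lang B", "SUN", "sunlight"),
    ("Lang A", "MOON", "month"),
]

edges = build_edge_list(rows)
buffer = io.StringIO()
write_gephi_csv(edges, buffer)
print(buffer.getvalue())
```

Collecting languages in a set before counting means a sense listed twice for the same language is counted once. To produce a file for Gephi's spreadsheet import, pass an open file handle to `write_gephi_csv` instead of the in-memory buffer.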

If you use this database, please cite:

Liang, Y., Xu, K., & Ran, Q. (2024). Shared structure of fundamental human experience revealed by polysemy network of basic vocabularies across languages. Scientific Reports, 14, 5877. https://doi.org/10.1038/s41598-024-56571-8
