
Developing a linked open data version of Words Matter


cultural-ai/wordsmatter



This work is described in the paper "A Knowledge Graph of Contentious Terminology for Inclusive Representation of Cultural Heritage"

A knowledge graph of contentious terminology

This RDF knowledge graph of contentious terms from the cultural sector is based on the publication "Words Matter", published by the Dutch National Museum of World Cultures.

"Words Matter" provides English and Dutch glossaries of problematic terms often found in museum databases. These glossaries include terms "that are sensitive to particular groups, that can cause offense, that elide important context, and that are understood as derogatory" (as explained in the publication). We call such terms "contentious".

In the knowledge graph, 75 English and 83 Dutch contentious terms are linked to explanations of their usage and to alternatives suggested by domain experts. For example, with a SPARQL query, one could answer "What is an appropriate alternative for the term 'Slave' (when it is used to describe people in slavery in the cultural heritage context)?".

The Jupyter notebook competency_questions.ipynb demonstrates what kinds of information can be retrieved from the knowledge graph.

The knowledge graph concept scheme, with 2 custom classes and 6 custom properties, is shown in the diagram below:

The knowledge graph concept scheme

The knowledge graph has been given persistent W3id.org URIs, documented according to FAIR practices (the documentation properties are included in the .ttl files), and made openly available for reuse under the CC BY-SA 4.0 license. The HTML version of the scheme documentation was generated using WIDOCO:

java -jar java-11-widoco-1.4.17-jar-with-dependencies.jar -ontFile schema.ttl -uniteSections -ignoreIndividuals -getOntologyMetadata -htaccess -rewriteAll -outFolder docs

The placeholder text generated by the tool for the Introduction, Description, and References sections was edited in index.html.

To access the knowledge graph, use https://w3id.org/culco/wordsmatter/. The concept scheme is available at https://w3id.org/culco#
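Persistent W3id.org URIs are typically resolved through redirects, so a client can ask for a machine-readable serialization with an HTTP `Accept` header. The following is a minimal sketch using only the Python standard library. It assumes that the server answers a request for `text/turtle` with Turtle; this README does not confirm which media types are served, and the helper names `build_request` and `fetch` are illustrative.

```python
import urllib.request

# URI taken from this README.
KG_URL = "https://w3id.org/culco/wordsmatter/"


def build_request(url: str, accept: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request that asks for an RDF serialization.

    Assumption: the server honours the Accept header (content negotiation).
    """
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch(url: str, accept: str = "text/turtle", timeout: float = 30.0) -> str:
    """Send the request (redirects are followed) and return the body as text."""
    with urllib.request.urlopen(build_request(url, accept), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires network access; prints the beginning of whatever is returned.
    print(fetch(KG_URL)[:500])
```

If the response turns out to be HTML rather than Turtle, the .ttl files in the repository remain the authoritative source.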

We have also made the knowledge graph available on TriplyDB.

Related matches

Every contentious label in the knowledge graph has a URI, which we link to four LOD resources: controlled vocabularies used by cultural heritage institutions (Wereldculturen Thesaurus (NMVW) and Getty AAT) and commonly used LOD resources (Wikidata and Princeton WordNet 3.1). NMVW is the thesaurus of the Dutch National Museum of World Cultures, which published the original glossaries.

The process of identifying related matches:
  1. Collecting a list of query terms for every contentious label:
  2. Querying LOD-resources:
     - Wikidata, Getty AAT, and Princeton WordNet were queried using their web interfaces
     - Querying NMVW: see the directory NMVW
  3. Selecting related matches in every resource based on guidelines:
     - all related matches per resource per contentious label: rm.csv, rm.json
     - synsets from Princeton WordNet 3.1 are mapped to PWNIDs: synset2pwnid_mappings.json
  4. Getting literal values of related matches in every resource:
     - see LODlitParser; [Updated on 14.09.2023]: see the new version of the LODlit package
     - the resulting files: aat_rm_en.json, aat_rm_nl.json, wikidata_rm_en.json, wikidata_rm_nl.json, pwn_rm.json, nmvw_rm.json.
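The idea of storing "all related matches per resource per contentious label" can be illustrated with a small grouping sketch. The column names (`label`, `resource`, `uri`) and the sample rows are hypothetical; the actual layout of rm.csv and rm.json is not described in this README and may differ.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample: NOT the real content or column layout of rm.csv.
SAMPLE_CSV = """label,resource,uri
slave,wikidata,https://example.org/wikidata/1
slave,aat,https://example.org/aat/1
slave,wikidata,https://example.org/wikidata/2
"""


def group_related_matches(csv_text: str) -> dict:
    """Group related-match URIs by contentious label, then by resource."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["label"]][row["resource"]].append(row["uri"])
    # Convert nested defaultdicts to plain dicts so the result is JSON-friendly.
    return {label: dict(resources) for label, resources in grouped.items()}


if __name__ == "__main__":
    print(json.dumps(group_related_matches(SAMPLE_CSV), indent=2))
```

A nested label-to-resource-to-URIs mapping like this makes it easy to look up every match for one label in one resource, which is the kind of access the later steps need.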

Contentious labels are linked to the URIs of their related matches with the property skos:relatedMatch.
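As a rough illustration of this linking step, the sketch below writes `skos:relatedMatch` statements as N-Triples lines using only the standard library. The label and match URIs are placeholders under example.org, not URIs from the knowledge graph, and the function name `related_match_triples` is illustrative; only the SKOS property URI is standard.

```python
# Standard SKOS property URI for skos:relatedMatch.
SKOS_RELATED_MATCH = "http://www.w3.org/2004/02/skos/core#relatedMatch"


def related_match_triples(label_uri: str, match_uris: list) -> list:
    """Return one N-Triples line per related match of a contentious label."""
    return [
        f"<{label_uri}> <{SKOS_RELATED_MATCH}> <{match_uri}> ."
        for match_uri in match_uris
    ]


if __name__ == "__main__":
    # Hypothetical URIs, used here only to show the shape of the output.
    label = "https://example.org/label/slave_en"
    matches = [
        "https://example.org/wikidata/1",
        "https://example.org/aat/1",
    ]
    for line in related_match_triples(label, matches):
        print(line)
```

In practice an RDF library would handle serialization and escaping; plain string formatting is used here only to keep the sketch dependency-free.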

Citation

Nesterov, A., Hollink, L., van Erp, M., van Ossenbruggen, J. (2023). A Knowledge Graph of Contentious Terminology for Inclusive Representation of Cultural Heritage. In: , et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_30
