Linked Data, Linked Open Data, Semantic Web
Linked Data (LD) refers to information that is published in accordance with the rules of RDF, using HTTP URIs as identifiers. RDF is an abstract data model that can be expressed in several different syntaxes, the most common of which are RDF/XML, JSON-LD, and Turtle.
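To make the point about one abstract model and many syntaxes concrete, here is a minimal sketch that writes the same single statement in Turtle and in JSON-LD using only the Python standard library. The example.org subject URI is a hypothetical placeholder. The FOAF `name` property is a real, widely used vocabulary term. The helper names `to_turtle` and `to_jsonld` are made up for illustration and are not part of any RDF library.

```python
import json

# Hypothetical subject URI (example.org is reserved for examples);
# the FOAF "name" property is a real vocabulary term.
SUBJECT = "http://example.org/person/ella-fitzgerald"
PREDICATE = "http://xmlns.com/foaf/0.1/name"
OBJECT = "Ella Fitzgerald"  # a plain string literal


def to_turtle(s: str, p: str, o: str) -> str:
    """Write one triple in Turtle, using full URIs in angle brackets.

    Only backslashes and double quotes are escaped in this sketch.
    """
    escaped = o.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'


def to_jsonld(s: str, p: str, o: str) -> str:
    """Write the same triple as a JSON-LD node object (no @context)."""
    return json.dumps({"@id": s, p: o}, indent=2)


print(to_turtle(SUBJECT, PREDICATE, OBJECT))
print(to_jsonld(SUBJECT, PREDICATE, OBJECT))
```

The two outputs look very different, but both carry the same subject, predicate, and object. In practice a dedicated RDF library would handle escaping, prefixes, and contexts far more thoroughly than these helpers do.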
Although often used interchangeably and synonymously, the three terms “Linked Data” (LD), “Linked Open Data” (LOD), and the “Semantic Web” refer to distinct things. LD and LOD are conceptually (and technologically) similar. LD refers to information that has been published in a machine-parsable format, but which may be held on a machine that is not online, in a closed online system, behind a paywall, or on a site that requires users to register before they can access it. LOD adds the crucial adjective “Open”, referring specifically to information that is available without restrictions or limitations to any online user or data consumer (be they human or software). This includes the use of non-proprietary file formats (publishing your data as a .CSV file rather than a Microsoft Excel sheet) and licences that enable others to access, consume, reuse, edit, and redistribute it (for example, Creative Commons CC BY-SA 3.0). Both terms refer to a method of publication: a process, an act of doing.
The Semantic Web is also referred to as the Web of Data, and in this respect it differs distinctly from the Web as we currently know it, which is a Web of Documents (Képéklian et al. 2014). As its name implies, the Web of Data is still a form of the Web, built on existing technologies and web architecture. The crucial difference is that rather than connecting vast numbers of individual pages (that is to say, Web documents), the Web of Data uses unique identifiers to point to specific instances of data (people, places, events, etc.) and to the relationships between them.
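One way to picture the difference is to treat each link as a typed relationship between two identified things rather than as a hyperlink between two pages. The sketch below assumes a tiny, hypothetical set of triples (every example.org URI and the `performedAt`, `tookPlaceAt`, and `locatedIn` relationships are invented for illustration). It then follows links outward from one entity, breadth-first, collecting each relationship it crosses.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace; all URIs below are placeholders

# Each tuple is (subject, predicate, object): a typed link between two entities.
TRIPLES = [
    (EX + "person/musician-1", EX + "performedAt", EX + "event/concert-1"),
    (EX + "event/concert-1", EX + "tookPlaceAt", EX + "place/venue-1"),
    (EX + "place/venue-1", EX + "locatedIn", EX + "place/city-1"),
    (EX + "person/musician-2", EX + "performedAt", EX + "event/concert-2"),
]


def follow_links(start, triples):
    """Collect every triple reachable by following links out from `start`."""
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node:
                edges.append((s, p, o))
                if o not in seen:  # avoid revisiting entities (handles cycles)
                    seen.add(o)
                    queue.append(o)
    return edges


for s, p, o in follow_links(EX + "person/musician-1", TRIPLES):
    print(s, p, o)
```

Starting from the musician, the traversal reaches the concert, the venue, and the city, and it says *how* each is related. Unrelated entities such as the second musician are never visited.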
Jargon Busting
- RDF = RDF (Resource Description Framework) is a way of representing data at an abstract level. It is based on triples.
- Triples = The smallest unit of RDF is the triple, so called because every triple has the same three parts: a subject, a predicate, and an object.
- Ontology = Ontologies are machine-readable documents written in RDF that capture the data entities present in a dataset, as well as the relationships between these entities.
- RDF/XML, JSON-LD and Turtle = Arguably three of the most common syntaxes for expressing RDF.
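The tripartite structure of a triple can be sketched as a small Python type, along with a toy pattern-matching function in which `None` acts as a wildcard. This is loosely similar in spirit to a triple pattern in a SPARQL query, but it is only an illustration and not SPARQL. The `rdf:type` URI is the real one from the RDF vocabulary. Everything under example.org, and the `Triple` and `match` names, are hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: always exactly three parts."""
    subject: str
    predicate: str
    object: str


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real rdf:type URI
EX = "http://example.org/"  # hypothetical namespace

GRAPH = [
    Triple(EX + "musician-1", RDF_TYPE, EX + "Person"),
    Triple(EX + "musician-2", RDF_TYPE, EX + "Person"),
    Triple(EX + "concert-1", RDF_TYPE, EX + "Event"),
    Triple(EX + "musician-1", EX + "performedAt", EX + "concert-1"),
]


def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    ]


# "Which things are of type Person?"
people = [t.subject for t in match(GRAPH, p=RDF_TYPE, o=EX + "Person")]
print(people)
```

The entity categories (`Person`, `Event`) and the relationship (`performedAt`) are the kind of thing an ontology would define.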
Prerequisites for a Linked Data project
- One or more datasets, held either in a relational database or in a spreadsheet.
- An ontology, which captures the data categories in the dataset (such as people, places, and events).
- You might have several of each.
Creating an Ontology in Protege
Protege is a free, open-source ontology editor from Stanford University. You can use it to create your own bespoke ontological structures and to populate them with instances. Protege can export your ontology in several RDF syntaxes, but if you intend to use Web-Karma to produce instance-level RDF, you are most likely to succeed if you export your ontology in Turtle.
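To give a sense of what a small ontology looks like once it is written out in Turtle, the sketch below generates OWL class and object-property declarations as text. This is not Protege's exporter, and a real Protege export will contain more (ontology headers, labels, annotations). The `ex:` namespace, the class names, and the `performedAt` property are hypothetical. The `owl:` and `rdfs:` namespace URIs are the standard W3C ones.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/ontology#> .\n"  # hypothetical namespace
)


def ontology_to_turtle(classes, properties):
    """Emit OWL class and object-property declarations as Turtle text.

    `properties` is a list of (name, domain_class, range_class) tuples.
    """
    lines = [PREFIXES]
    for cls in classes:
        lines.append(f"ex:{cls} a owl:Class .")
    for name, domain, range_ in properties:
        lines.append(
            f"ex:{name} a owl:ObjectProperty ;\n"
            f"    rdfs:domain ex:{domain} ;\n"
            f"    rdfs:range ex:{range_} ."
        )
    return "\n".join(lines)


print(ontology_to_turtle(["Person", "Event"], [("performedAt", "Person", "Event")]))
```

The output declares two categories of entity and one relationship that links a `Person` to an `Event`, which is the kind of structure a mapping tool then uses to type the rows of a dataset.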
Producing RDF from a Spreadsheet
Web-Karma is a free, open-source tool from the University of Southern California. You can use it to map your dataset (this works well if your data is in a .CSV file) to your ontology, whether that is an existing one downloaded from the Web or one you have made yourself in Protege.
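The core idea of mapping spreadsheet columns to ontology terms can be sketched in a few lines of Python. This does not reproduce Web-Karma's interface or its saved models. The `COLUMN_MAP` dictionary is a hypothetical stand-in for the mapping you would build interactively. The CSV contents, URI patterns, and ontology terms are all invented for illustration. Each row becomes a subject URI, is typed as a `Person`, and each mapped column becomes one triple.

```python
import csv
import io

# Hypothetical spreadsheet contents, inlined so the sketch is self-contained.
CSV_DATA = """id,name,instrument
1,Musician One,piano
2,Musician Two,trumpet
"""

BASE = "http://example.org/person/"        # hypothetical URI pattern for rows
ONTOLOGY = "http://example.org/ontology#"  # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical column-to-property mapping (the step a mapping tool handles).
COLUMN_MAP = {"name": ONTOLOGY + "name", "instrument": ONTOLOGY + "plays"}


def rows_to_triples(text):
    """Turn each CSV row into a typed subject plus one triple per mapped column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]
        triples.append((subject, RDF_TYPE, ONTOLOGY + "Person"))
        for column, prop in COLUMN_MAP.items():
            if row.get(column):  # skip empty cells rather than emit blank literals
                triples.append((subject, prop, row[column]))
    return triples


for triple in rows_to_triples(CSV_DATA):
    print(triple)
```

Two rows with two mapped columns each yield six triples: one type statement and two property statements per row. These could then be serialized in any RDF syntax.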
Some cool Linked Data projects
- JazzCats: http://jazzcats.cdhr.anu.edu.au/
- Nomisma: http://nomisma.org/
- SotaSampo: https://www.sotasampo.fi/en/
References and things to read
- Nurmikko-Fuller, T. (2018). Publishing Sumerian Literature on the Semantic Web. In CyberResearch on the Ancient Near East and Neighboring Regions (pp. 336-363). Brill.
- Nurmikko-Fuller, T., Bangert, D., Dix, A., Weigl, D., & Page, K. (2018). Building prototypes aggregating musicological datasets on the Semantic Web. Bibliothek Forschung und Praxis, 42(2), 206-221.
- DuCharme, B. (2013). Learning SPARQL: Querying and Updating with SPARQL 1.1. O’Reilly Media.
- Van Hooland, S., & Verborgh, R. (2014). Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata. Facet Publishing.
- Képéklian, G., Curé, O., & Bihanic, L. (2014). From the Web of Documents to the Linked Data. In European Business Intelligence Summer School (pp. 60-87). Springer, Cham.
- Blaney, J. (2017). Introduction to the Principles of Linked Open Data. The Programming Historian, 6. https://doi.org/10.46430/phen0068
SUGGESTED CITATION
Terhi Nurmikko-Fuller, ‘Linked Data’, MetoDHology (2020), https://doi.org/00000000000.
Image from https://unsplash.com/photos/tSlvoSZK77c