It is often claimed that 80% of the work of data science is data cleaning, while only 20% is analysis (Browne-Anderson, 2018). Despite this, what data cleaning actually entails is largely obscured, and it is often dismissed as a tedious and laborious yet necessary exercise (Rawson and Muñoz, 2019). While
The Google Ngram Viewer is an online search engine that charts the frequencies of searched word strings, using a yearly count of n-grams found in Google’s text corpora. In humanities research, it is a useful tool for sociolinguistic study in both historical and contemporary contexts, as
A wide range of humanities data can be analysed, including text (from literature, newspapers and social media), images (from art history and traditional or social media) and material culture (from representations of artefacts to ethnographic reports of their creation). The creation of structured data in the form of data tables, with features