Google Ngram Viewer – MetoDHology

The Google Ngram Viewer is an online search engine that charts the frequencies of searched word strings, using a yearly count of n-grams found in Google’s text corpora. In the context of humanities research, it is a useful tool for social linguistic research for both historical and contemporary context, as it possess the capacity for providing strong data visualization of comparative literary trends in accordance with the changing frequency of data string found within a given time frame, thus providing some tangible insight concerning comparative social trends that existed as proved by its protectory of frequency within literature at the time.

The term Ngram refers to a contiguous sequence of n items from a given sample of text collection, a concept used to predict language sequence based on the frequency of recurring elements based on probability.

Regarding the standard workflow of the examined online search tool, one simply enter the desired search strings and establish the desired time frame between the 1500s and 2019 AD, and the tool outputs finds in the form of a projection line chart, comparatively displaying the projection of Ngram occurrence frequency within the database. The presented data are relatively normalized in the form of published percentages within the respective categorised year. Further parameters settings are available as option to account for specific language sources, case sensitivities, form distinguishment and such. Advanced feature such as “wild cards” search enables further contextualization of data findings undefined by the original search string through comparative categorisation of Ngram sequences (like how the search string is sequentially structured within the sentence of occurrence), though this capacity is only limited to the immediate adjacent sequence.

The operation of said digital tool is facilitated by the multiple text corpora databases in tsv format, complied through continuous book scanning that is in update on regular basis. The quality of findings through said method (assuming the validity of search strings provided) is largely dependent on that of the corpora database, as well as the capacity of its “wild card” function when it comes to the contextualization of extracted data, the latter of which can be quite constraining in practice due to its imposed limitation.

In the context of digital humanities, the Ngram viewer is a clear example of data visualization that enables enhancement in the acquisition of societal knowledge, displaying changes in societal trends as evident by its relative prominence within literary publications at a given time. Its key strength lies in its operational efficiency as well as simplistic input process, allowing considerable degree of general accessibility of compacted datasets and presents it in an informative manner. A significant weakness, however, concerns its lack of capacity in the contextualisation of data presented, placing much of said responsibility on the quality of search string input. The tool itself is not advanced enough to factor language comprehension as part of its function, limiting its capacity to produce solid knowledge without the involvement of human interpretation.

Bibliography:

Basile, P., Caputo, A., Luisi, R., & Semeraro, G. (2016, December). Diachronic Analysis of the Italian Language exploiting Google Ngram. In CLiC-it/EVALITA.

Lin, Y., Michel, J. B., Lieberman, E. A., Orwant, J., Brockman, W., & Petrov, S. (2012, July). Syntactic annotations for the google books ngram corpus. In Proceedings of the ACL 2012 system demonstrations (pp. 169-174).

Zeng, R., & Greenfield, P. M. (2015). Cultural evolution over the last 40 years in China: Using the Google Ngram Viewer to study implications of social and political change for cultural values. International Journal of Psychology, 50(1), 47-55