Text Analysis

lilymunnings — Fri, 27 May 2022 01:09:22 +0000

Overview

Text analysis, also known as “text mining”, is a computational method used to extract information from large amounts of ‘unstructured’ data in documents and texts in their digital forms (Reardon, 2020). Researchers who use text analysis tools typically use them to collect specific information from the texts they are studying. For example, text analysis can detect important phrases, patterns of words and word frequencies.

The term encompasses a wide range of tools and techniques practised across many research areas. Research in the Humanities and Social Sciences uses text analysis methods most consistently, but the approach is gradually being used more in STEM fields for the analysis of metadata. This is because qualitative, ‘unstructured’ datasets occur more frequently within the Humanities.

Any form of written or transcribed text can be used as data for this method, from social media posts and reviews to long novels. A range of publicly available datasets and text collections can be processed with text analysis tools. Project Gutenberg is a site that offers free access to literary texts published before the 1910 copyright cut-off. The British National Corpus, the Internet Movie Script Database, scientific paper collections and the Digital National Security Archive are other databases available for text analysis. Links to all these datasets and more can be found on the website below:

https://onlinelibrary-wiley-com.virtual.anu.edu.au/doi/full/10.1111/cgf.12873

Process of text analysis

The first step in using this method for research is to acquire a dataset (discussed above). The texts are then processed with a text analysis tool. The most common tools used for text analysis in the humanities and social sciences are “Voyant”, “MALLET”, “Topic Modelling Tool” and “WordSeer”, and even “Wordle” (a popular word game site) can be used as a text analysis tool (Gupta, 2022).

Each of these platforms takes data from texts and turns it into visualisations such as word clouds, lists, graphs, tables and micro searches. Some are more analytical than others: MALLET has a range of embedded tools and requires considerable training and understanding to use, whereas Voyant makes it much easier to explore and visualise data. Once the tools have processed the texts, researchers examine and interpret the results themselves to aid their research. The results of text analysis often make good starting points that can be carried into other methodologies.
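As a rough illustration of what happens between "text in" and "list out", the Python sketch below tokenises a passage, drops common function words and ranks what remains. It is not how Voyant or MALLET are implemented; the tokenising regular expression, the tiny STOPWORDS set and the helper names tokenise and top_terms are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny stopword list, assumed for this example; real tools ship far longer lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was", "on"}


def tokenise(text):
    """Lower-case the text and split it into word tokens (letters plus an optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_terms(text, n=5):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    tokens = [t for t in tokenise(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


sample = "The sea was calm. The sea was grey, and the gulls cried over the sea and the shore."
for word, count in top_terms(sample, n=3):
    print(f"{word:<8}{count}")
```

Real tools add many refinements (longer stopword lists, stemming, support for other languages), but a ranked list like this is the same basic kind of output.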

Text analysis does require researchers to have some knowledge of the context of the texts studied. This is because it is a quantitative study of ‘unstructured’ datasets. Texts, whatever their form, are usually written in sentences, which are far less structured than, for example, a list of dates or names. This makes text analysis a good aid to research rather than a method on which to base research alone. However, this is not necessarily a downside: it requires researchers to bring their own research backgrounds to the digital lens, making text analysis a collaboration between disciplines, which is what Digital Humanities is all about.
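One common way to keep context in view is a keyword-in-context (KWIC) concordance, which lists each occurrence of a term with the words around it so the researcher can read how it is actually used. The minimal Python sketch below assumes simple word tokens and a fixed window size; kwic is a hypothetical helper, not part of any particular tool.

```python
import re


def kwic(text, keyword, window=4):
    """List each occurrence of keyword with up to `window` words of context on each side."""
    tokens = re.findall(r"[\w']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}")
    return lines


passage = "She sat on the bank of the river. The bank refused her loan."
for line in kwic(passage, "bank", window=2):
    print(line)
```

A word count would treat both hits as the same "bank"; the concordance lines make it obvious to a reader that one is a riverbank and the other a financial institution.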

Some key words/phrases related to text mining:

Word frequencies: The number of times a word is written or occurs in the text.

An example of how this is used in literary research is counting how many times pronouns such as she/her, they/them and he/him occur in a text to detect a specific gendered lens.
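A minimal Python sketch of this kind of pronoun count is below. The pronoun groups in PRONOUN_GROUPS are an assumption made for illustration, and raw counts cannot tell singular from plural "they", or possessive from object "her", so the numbers still need a reader's interpretation.

```python
import re
from collections import Counter

# Pronoun groups assumed for this example; which forms to include is a research choice.
PRONOUN_GROUPS = {
    "she/her": {"she", "her", "hers", "herself"},
    "he/him": {"he", "him", "his", "himself"},
    "they/them": {"they", "them", "their", "theirs", "themselves"},
}


def pronoun_counts(text):
    """Count how often the words in each pronoun group occur in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {group: sum(tokens[w] for w in words) for group, words in PRONOUN_GROUPS.items()}


passage = "She opened the letter. He watched her read it, and they said nothing."
print(pronoun_counts(passage))
```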

Word Clouds: A visualisation of the most frequent words in a text arranged in a cloud, with more frequent words usually shown larger. This offers a snapshot of the text in a single image.
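Word cloud tools generally size each word according to its frequency. The sketch below shows one simple way to do that, assuming a linear scale between a hypothetical minimum and maximum font size (min_pt and max_pt); actual tools may use other scales and usually remove stopwords first, which this sketch does not.

```python
import re
from collections import Counter


def word_cloud_sizes(text, n=10, min_pt=10, max_pt=48):
    """Map the n most frequent words to font sizes scaled linearly by frequency."""
    counts = Counter(re.findall(r"[a-z]+", text.lower())).most_common(n)
    if not counts:
        return {}
    high = counts[0][1]   # count of the most frequent word
    low = counts[-1][1]   # count of the least frequent word kept
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_pt + (count - low) / span * (max_pt - min_pt))
        for word, count in counts
    }


print(word_cloud_sizes("rain rain rain sun sun fog"))
```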

Topic Modelling: A technique within text analysis that refers specifically to grouping together words or sentences that relate to the same topic.
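Many topic modelling tools, MALLET among them, are based on Latent Dirichlet Allocation (LDA). The Python sketch below is a bare-bones collapsed Gibbs sampler for LDA, written only to show the idea: each word is repeatedly reassigned to a topic in proportion to how much its document already uses that topic and how much that topic already uses the word. The hyperparameters (alpha, beta, iterations), the function name lda_gibbs and the four tiny documents are assumptions, and on a corpus this small the topics found can vary with the random seed.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Fit LDA to a list of token lists with collapsed Gibbs sampling.

    Returns the top_n most heavily assigned words for each of the k topics.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                # tokens in doc d assigned to topic t
    topic_word = [defaultdict(int) for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                              # tokens assigned to topic t overall

    # Start with a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z = [rng.randrange(k) for _ in doc]
        for w, t in zip(doc, z):
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Take this token out of the counts before resampling it.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # P(topic j | all other assignments) is proportional to
                # (doc's use of j) * (j's use of w), smoothed by alpha and beta.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1

    return [sorted(vocab, key=lambda w: -topic_word[t][w])[:top_n] for t in range(k)]


docs = [
    "ship sea sail ship sea".split(),
    "sea sail ship harbour".split(),
    "court judge law judge".split(),
    "law court judge verdict".split(),
]
for number, words in enumerate(lda_gibbs(docs, k=2)):
    print(f"topic {number}: {', '.join(words)}")
```

The tool only produces groups of co-occurring words; naming a group "the sea" or "the law" is still the researcher's interpretation.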

Distant reading: A method that uses data from text analysis to group together passages or themes across texts in a quantitative way. It combines literary analysis with computational methods to read a collection of texts at a “distance” and see the bigger picture. Literary scholar Franco Moretti coined the term in 2000, but versions of the method have been used throughout literary history; the invention of the computer simply sped the process up (Underwood, 2017).
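A toy version of distant reading might compare the same measure across many texts rather than reading any one of them closely. The sketch below assumes two hypothetical theme lexicons (THEMES) and reports, for each text, how often each theme's words occur per 1,000 words, ordered by year; the themes, titles and texts are invented for illustration.

```python
import re

# Hypothetical theme lexicons; a real study would build and justify these carefully.
THEMES = {
    "sea": {"sea", "ship", "wave", "sail", "harbour"},
    "city": {"street", "crowd", "city", "factory", "lamp"},
}


def theme_rates(text):
    """Occurrences of each theme's words per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return {
        theme: 1000 * sum(t in words for t in tokens) / total
        for theme, words in THEMES.items()
    }


def distant_read(corpus):
    """corpus: list of (title, year, text). Returns (year, title, rates) rows ordered by year."""
    return [
        (year, title, theme_rates(text))
        for title, year, text in sorted(corpus, key=lambda item: item[1])
    ]


corpus = [
    ("City Lights", 1890, "The crowd filled the street"),
    ("Open Water", 1850, "The ship met the sea"),
]
for year, title, rates in distant_read(corpus):
    print(year, title, {theme: round(rate, 1) for theme, rate in rates.items()})
```

With thousands of texts instead of two, rows like these can be plotted over time to show broad shifts that no single close reading would reveal.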

References

Gupta, Ravi. “Wordle-Vision: Simple Analytics To Up Your Wordle Game”, Towards Data Science, 2022. https://towardsdatascience.com/wordle-vision-simple-analytics-to-up-your-wordle-game-65daf4f1aa6f

Reardon, Jed. “Text Analysis: An Overview”, MethoDHology, 2020. https://metodhology.anu.edu.au/index.php/content/text-analysis/

Underwood, Ted. “A Genealogy of Distant Reading”, Digital Humanities Quarterly 11, no. 2 (2017). http://digitalhumanities.org/dhq/vol/11/2/000317/000317.html

Text Analysis: Methods, Assessment, and Experience

emiliastandish — Thu, 26 May 2022 03:05:44 +0000

Text analysis is a very general term. It is often used to describe computational tools that analyse text (Reardon, 2020). Although computational text analysis, or machine analysis, is now prevalent, human text analysis has provided its fundamental basis. Comparing the two, along with a personal example of using text analysis tools, can help explain why text analysis is so significant in the modern age.

Computational text analysis has become a far more widely used method of text analysis in recent years, and it has several benefits. Using this technique, the root of a problem within both unstructured and structured data can be identified, trends and limits can be recognised, and digital experiences can be enhanced (Haije, 2019). In addition, once the system behind the computational analysis has been trained to a sufficient level, the process becomes significantly more efficient and quick (Haije, 2019).

By comparison, human text analysis has been used in the past and is still used today, either alongside machine text analysis or in place of it. One benefit of human text analysis is how easily it can begin: once a topic and dictionary have been established, reading and writing annotations can start almost immediately. In addition, human interpretation has been trained and shaped by everything we encounter in everyday life. Humans are also more successful at interpreting anomalies such as irony (Wonderflow, 2019).
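For contrast, the dictionary-based coding step can be automated. The Python sketch below applies a hypothetical CODEBOOK of trigger words to each sentence; the sentence-splitting rule, the codes and the sample text are all assumptions for illustration. The machine version is fast and perfectly consistent, but, as the sample shows, it codes the ironic "Great, another broken kettle!" as praise, which is exactly the kind of anomaly a human reader catches.

```python
import re

# A hypothetical coding dictionary: code -> trigger words.
CODEBOOK = {
    "praise": {"great", "excellent", "love"},
    "complaint": {"broken", "late", "refund"},
}


def annotate(text):
    """Split text into sentences and attach every code whose trigger words appear in each one."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    annotations = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        codes = sorted(code for code, triggers in CODEBOOK.items() if words & triggers)
        annotations.append((sentence, codes))
    return annotations


review = "The delivery was late. Great, another broken kettle!"
for sentence, codes in annotate(review):
    print(codes, sentence)
```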

Though human text analysis has some benefits, it also has many limitations that make computational analysis more accessible in the modern age. Human text analysis often lacks consistency, especially when the process is not repeated, as people evaluate things differently depending on their mood (Wonderflow, 2019). Human memory can also limit how well and how quickly people analyse text: text analysis often involves many firm definitions and parameters, and having to remember these terms can hinder the process (Wonderflow, 2019). Additionally, human text analysis can be slow compared with computational text analysis because everything is done manually.

With the benefits and limitations of computational and human text analysis in mind, I chose to document my own experience with text analysis. Google Ngram Viewer is a tool that shows how often terms are used in literature over time. I used it to research the terms “anxiety” and “depression” from 1800 to 2019. While the results showed a general increase, there was a peak in the use of the word “depression” in the 1930s. Once I realised that this peak was not related to mental health but referred to the Great Depression, one of Google Ngram Viewer’s limitations became clear: context is not taken into account when analysing words. In addition, since Google Ngram Viewer only documents written texts, much of the world’s material cannot be assessed.
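Google Ngram Viewer plots, roughly, a term's yearly frequency relative to all the words in that year's books. The Python sketch below reproduces that idea on a tiny invented corpus of (year, text) pairs; relative_frequency_by_year is a hypothetical helper, not Google's implementation. Both years score the same for “depression”, even though the 1931 text is about the economy, which is the context problem described above.

```python
import re
from collections import Counter, defaultdict


def relative_frequency_by_year(corpus, term):
    """corpus: list of (year, text) pairs. Returns {year: share of that year's tokens equal to term}."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        counts[year].update(re.findall(r"[a-z]+", text.lower()))
    # Skip years with no tokens to avoid dividing by zero.
    return {year: c[term] / sum(c.values()) for year, c in sorted(counts.items()) if c}


books = [
    (1931, "The depression deepened"),
    (1931, "Banks failed"),
    (2010, "Depression and anxiety were discussed"),
]
print(relative_frequency_by_year(books, "depression"))
```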

There are many tools on the internet that can provide basic computational text analysis. These can be web-based applications, like Voyant, or Java-based tools, like MALLET. Either way, there are many ways to begin text analysis, and even more ways to enhance it.

References

Haije, E. G. (2019). What is Text Analytics? And why should I care? Retrieved May 23, 2022, from https://mopinion.com/what-is-text-analytics-benefits/

Reardon, J. (2020). “Text Analysis: An Overview”. METODHOLOGY. Retrieved May 23, 2022, from https://metodhology.anu.edu.au/index.php/content/text-analysis/

Wonderflow. (2019). What are the pros and cons of human text analysis – Part 2. Retrieved May 23, 2022, from https://www.wonderflow.ai/blog/what-are-the-pros-and-cons-of-human-text-analysis-part-2
