Student Writing – MetoDHology
A resource developed by the Centre for Digital Humanities Research at the Australian National University

Play, Experimentation and Collaboration – Reconciling Ecological Conservation through Video Games
Fri, 27 May 2022

Video games are most often thought of as a source of entertainment, and are often scrutinised for unrealistic, fictionalised portrayals of history, culture, and technology. However, as video gaming becomes more ubiquitous, accessible, and diverse in its methods, it becomes crucial for us, as students of digital humanities and as everyday players, not only to understand but also to critique how we play video games. As the nature and medium of video games evolve and adapt, so too must our awareness of our role as more than mere consumers.

According to the digital literacy/fluency distinction introduced by Champion et al. (2015), we need to understand the game designer's digital fluency: how they translate their expertise into a medium that remains legible and understandable to the player. There is also the question of whether, when we learn about history and culture through video games, the design process yields an accurate representation of history or merely a simulated, alternate form that teaches us little and offers lessons with no implications beyond entertainment and enjoyment. Since games are discursive and performative, we form and answer questions about characters, and come to understand actual historical figures, through play.

Consider the depictions of mythology in God of War, Hades, and Okami, all critically acclaimed as entertainment: the argument is that beyond providing a basic outline of characters and plots, they lack the pedagogical design to advance an understanding of mythology past curiosity and a push to learn more. By contrast, some games manage to reconcile the rift between pedagogy and entertainment on ecological issues such as conservation, notably Red Dead Redemption and Pokémon Go.

Video Thumbnail of the Naturalist obtained from: https://www.rockstargames.com/newswire/article/89k8a554551o78/Red-Dead-Online-The-Naturalist-Now-Available, Rockstar Games (2020)

Crowley et al. (2021) have shown that the "realistic" wildlife ecology in Red Dead Redemption 2 was designed such that players of the Naturalist class (whose core gameplay quest revolves around either hunting or conserving the ecosystem) reported being more aware of their ecological impact, not just within the game but also in real life. By contrast, Pergams and Zaradic (2006) previously argued that this may not be effective: the downward trend in national park attendance, correlated with increasing electronic media use, has been a cause for concern. However, given the ever-growing adaptation and evolution of the video game medium, the work of Crowley et al. shows that with time, and with ecologically conscious game design philosophies, games can mobilise individuals, even inadvertently, as in the case of Pokémon Go.

Dorward et al. (2017) recognise that although apps such as Pokémon Go have incentivised outdoor activity and movement and fostered an understanding of species' habitat preferences, there is a fundamental disconnect in Pokémon Go's design philosophy between the conservation of real-life animal species and the capturing and battling of the Pokémon obtained. This is not to say that the Pokémon concept is fundamentally flawed; it draws on its creator's own childhood experiences of catching and recording insect species, a crucial activity for entomologists and conservationists alike. As Balmford et al. (2002) found, children are capable of recognising different species, but they are better at differentiating the 151 first-generation Pokémon than real wildlife.

Although Red Dead Redemption and Pokémon Go offer two very different gameplay experiences, both address, in some measure, learning about ecological issues through their medium. The task that remains is to gauge whether gamifying ecological concepts lessens their impact, and to find ways of reconciling the disconnect between game design for entertainment and game design for pedagogy. Can we learn anything from these video game media and processes?

Further Readings and References

  • Crowley, E. J., Silk, M. J., & Crowley, S. L. (2021). The educational value of virtual ecologies in Red Dead Redemption 2. People and Nature, 3, 1229–1243. https://doi.org/10.1002/pan3.10242
  • Champion, E., Deegan, M., Hughes, L. M., Kalay, Y., Prescott, A., & Short, H. (2015). Critical Gaming: Interactive History and Virtual Heritage. Digital Research in the Arts and Humanities. Farnham: Taylor and Francis. https://library.anu.edu.au/record=b4897627
  • Balmford, A., Clegg, L., Coulson, T., & Taylor, J. (2002). Why conservationists should heed Pokémon. Science, 295(5564), 2367. https://doi.org/10.1126/science.295.5564.2367b
  • Dorward, L. J., Mittermeier, J. C., Sandbrook, C., & Spooner, F. (2017). Pokémon Go: Benefits, costs, and lessons for the conservation movement. Conservation Letters, 10. https://doi.org/10.1111/conl.12326
  • Pergams, O. R. W., & Zaradic, P. A. (2006). Is love of nature in the US becoming love of electronic media? 16-year downtrend in national park visits explained by watching movies, playing video games, internet use, and oil prices. Journal of Environmental Management, 80(4), 387–393. https://doi.org/10.1016/j.jenvman.2006.02.001
  • Rockstar Games (2020). The Naturalist [video thumbnail]. https://www.rockstargames.com/newswire/article/89k8a554551o78/Red-Dead-Online-The-Naturalist-Now-Available
Getting, Making and Cleaning Data: A Practical Experience
Fri, 27 May 2022


The following are some introductions to, and evaluations of, my personal experience in collecting data from different sources and cleaning different types of data.

Text is an important information carrier for human beings (Dragulanescu, 2002; Yang et al., 2018), and text analysis is an important tool for understanding and analysing human interaction (Bernard and Ryan, 1998).

I collected textual data from two kinds of sources: online platforms, and manually by myself. Collecting data online is interesting and important, as people are willing to, and accustomed to, expressing and exchanging their opinions and attitudes on the internet and social media platforms. These data are a useful, up-to-date resource that reveals the public voice and serves as a weathervane of social development. For example, collecting Reddit or Twitter posts about the Australian election allows you to analyse people's attitudes towards candidates, predict trends in the election results, and visualise the online social networks between users. Additionally, online data is mostly open to the public and can be accessed by anyone. Gathering information from the web can be easy: both Twitter and Reddit offer free APIs for collecting posts. It can also be complex, depending on a platform's security and privacy policies, and programming skills are required. For example, Meituan, the largest takeaway platform in China, allows web crawlers to collect data only if official approval is granted, and the collector's account is disabled after a certain number of reviews has been collected.
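As a minimal sketch of what happens after posts have been fetched (the post structure and keyword here are hypothetical, not tied to any real API response), a first pass might filter the collected posts by topic and drop duplicate reposts before analysis:

```python
def filter_posts(posts, keyword):
    """Keep unique posts whose text mentions the keyword (case-insensitive)."""
    seen = set()
    matched = []
    for post in posts:
        text = post["text"].strip()
        if keyword.lower() in text.lower() and text not in seen:
            seen.add(text)       # remember text to skip verbatim reposts
            matched.append(post)
    return matched

posts = [
    {"id": 1, "text": "The Australian election is next week"},
    {"id": 2, "text": "My dog learned a new trick"},
    {"id": 3, "text": "The Australian election is next week"},  # duplicate repost
]
print(filter_posts(posts, "election"))  # only post 1 survives
```

In practice the `posts` list would come from an API client (e.g. a Reddit or Twitter library), but the filtering and deduplication logic stays the same.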

Collecting your own dataset can be time-consuming and cumbersome to document, not least because research involving human participants requires official ethics approval, but it can give you the data most relevant to your experimental purposes. I created my own dataset of 20 interviews in which participants were asked to verbalise every thought they had during a simulated food-ordering process, such as their reasons for choosing or not choosing a restaurant. I recorded each interview and transcribed it into text myself.

Data is simply a collection of facts. There is not only structured data, like numbers, but also unstructured data in the form of text, images, audio, video, and so on (Feldman and Sanger, 2007). Since unstructured data cannot be used directly for research, we need to convert it into structured data so that it can be understood and processed by computers.

Data cleaning is the detection and elimination of errors, inconsistencies, and unanalysable parts of the data in order to improve its quality (Ilyas and Chu, 2019; Rahm and Do, 2000). I taught myself Python; the more challenging task was finding the right list of stop words, the common words in a language that carry little useful information and need to be eliminated. Examples of English stop words are 'a', 'the', 'is', and 'are'. Different languages and topics require different stop word lists, so if you cannot find a suitable one you need to build your own. Similarly, when it comes to sentiment analysis, it is difficult to find a suitable sentiment dictionary or pre-trained model, because the meanings of words differ across languages and topics, and different contexts have their own proper nouns. I compared the accuracy of different stop word lists and sentiment dictionaries on my food delivery review dataset. Training a customised model on my own dataset with the Naive Bayes algorithm, the accuracy was, surprisingly, around 94%.
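To make the pipeline concrete, here is a toy sketch of the two steps described above: stop-word removal followed by a multinomial Naive Bayes classifier with Laplace smoothing. The stop word list, reviews, and labels are all invented for illustration; this is not the author's actual dataset, stop word list, or model.

```python
from collections import Counter
import math

STOP_WORDS = {"a", "the", "is", "are", "was", "and", "it", "i"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def train(labelled_reviews):
    """Count token frequencies per sentiment label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labelled_reviews:
        counts[label].update(tokenize(text))
    return counts

def classify(counts, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        score = 0.0
        for w in tokenize(text):
            score += math.log((c[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

reviews = [
    ("the food was fresh and delicious", "pos"),
    ("fast delivery and great taste", "pos"),
    ("the food was cold and late", "neg"),
    ("terrible packaging and slow delivery", "neg"),
]
model = train(reviews)
print(classify(model, "delicious fresh food"))  # → pos
```

A real experiment would of course use far more data and a proper evaluation split; the point here is only how stop-word removal feeds into the classifier.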

For the interviews, to identify which factors actually influence purchase intention, I converted the written records into numbers, marking key factors as 1 if mentioned and 0 if not. The considerable challenge was prioritising the factors people mentioned. For example, a participant might say that, at the beginning, they wanted to filter restaurants by distance, which shaped their subsequent decisions around distance preferences; or a participant might verbally emphasise the decisive role of a specific factor, such as personal preference. In these cases, the influence of those factors needs more weight. However, the importance people verbally assign to a factor may differ from their actual behaviour, so it is difficult to set a standard, and subjective bias may creep in when converting this information into numbers.
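The basic 1/0 coding step can be sketched as follows (the factor list and transcript are hypothetical; real coding would be done against an agreed codebook, and the weighting problem described above is deliberately left out):

```python
FACTORS = ["distance", "price", "rating", "delivery time"]  # hypothetical coding scheme

def code_transcript(transcript, factors=FACTORS):
    """Mark each factor 1 if it is mentioned in the transcript, else 0."""
    text = transcript.lower()
    return {f: int(f in text) for f in factors}

t = "I filtered restaurants by distance first, then compared price."
print(code_transcript(t))  # {'distance': 1, 'price': 1, 'rating': 0, 'delivery time': 0}
```

A simple substring match like this would miss paraphrases ("how far away it is" never says "distance"), which is exactly why manual coding, and its attendant subjectivity, remains part of the process.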

References

Bernard, H.R. and Ryan, G., 1998. Text analysis. Handbook of Methods in Cultural Anthropology, 613.

Dragulanescu, N.G., 2002. Website quality evaluations: Criteria and tools. The International Information & Library Review, 34(3), pp.247-254.

Ilyas, I.F. and Chu, X., 2019. Data cleaning. Morgan & Claypool.

Feldman, R. and Sanger, J., 2007. The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge university press.

Rahm, E. and Do, H.H., 2000. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull., 23(4), pp.3-13.

Yang, Z., Zhang, P., Jiang, M., Huang, Y. and Zhang, Y.J., 2018, June. Rits: Real-time interactive text steganography based on automatic dialogue model. In International Conference on Cloud Computing and Security (pp. 253-264). Springer, Cham.

Text Analysis
Fri, 27 May 2022

Overview

Text analysis, also known as "text mining", is a computational method for extracting large amounts of 'unstructured' data from documents and texts in their online forms (Reardon, 2020). Many researchers use text analysis tools to collect specific information from the texts they study. For example, text analysis can detect important phrases, patterns of words, and word frequencies.

The term encompasses a wide range of tools and techniques practised across many research areas. The Humanities and Social Sciences use text analysis methods most consistently, because qualitative and 'unstructured' datasets occur more frequently within the Humanities; however, the method is gradually seeing more use in STEM through the analysis of metadata.

Any form of written or transcribed text can serve as data for this method, from social media posts to reviews to long novels. A range of publicly available datasets and collections of texts can be run through text analysis tools. Project Gutenberg offers free access to literary texts published before the 1910 copyright cut-off. The British National Corpus, the Internet Movie Script Database, scientific paper collections, and the Digital National Security Archive are other accessible databases for text analysis. Links to these datasets and more can be found on the website below:

https://onlinelibrary-wiley-com.virtual.anu.edu.au/doi/full/10.1111/cgf.12873

Process of text analysis

The first step in using this method for research is to acquire a dataset (discussed above). The data or text then needs to be put through a text analysis tool. The most common tools in the humanities and social sciences are Voyant, MALLET, Topic Modelling Tool, and WordSeer; even Wordle, a popular word game site, can serve as a text analysis tool (Gupta, 2022).

Each of these platforms takes the data from various texts and turns it into visualisations: word clouds, lists, graphs, tables, micro-searches, and more. Some are more analytical than others; MALLET embeds a range of tools and requires considerable training and understanding, whereas Voyant is much easier to use for visualising data. Once the tools have processed the texts, researchers examine and analyse the output themselves to aid their research. The datasets produced by text analysis can be great starting points for other methodologies.

Text analysis does require researchers to have some knowledge of the context of the texts studied, because it is a quantitative study of 'unstructured' datasets. Texts, whatever their form, are usually written in sentences that are less structured than, say, a list of dates or names. This makes text analysis a good aid to research, not a method on which to base research alone. Nor is this necessarily a downside: it asks researchers to bring their own research backgrounds to the digital lens, making their study with text analysis tools a collaboration between disciplines, which is what Digital Humanities is all about.

Some key words and phrases related to text mining:

Word frequencies: The number of times a word is written or occurs in the text. 

In literary research, for example, one might count how often words like she/her, they/them, and he/him are used in a text to detect a particular gendered lens.
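That kind of frequency count is easy to reproduce with a few lines of standard-library Python; the sample sentence and the grouping of pronouns below are illustrative only:

```python
from collections import Counter
import re

def pronoun_counts(text):
    """Count gendered pronoun occurrences as a crude proxy for a gendered lens."""
    words = re.findall(r"[a-z]+", text.lower())  # split into lowercase word tokens
    freq = Counter(words)
    return {
        "feminine": freq["she"] + freq["her"],
        "masculine": freq["he"] + freq["him"] + freq["his"],
        "neutral": freq["they"] + freq["them"],
    }

sample = "She took her coat; he gave his to them. They thanked him."
print(pronoun_counts(sample))  # {'feminine': 2, 'masculine': 3, 'neutral': 2}
```

Tools like Voyant perform essentially this counting over whole corpora, with the visualisation layered on top.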

Word clouds: a collection of the most frequent words in a text, presented as a cloud visualisation. This offers a snapshot of the text in a single image.

Topic modelling: falls under the text analysis umbrella but refers specifically to grouping together words or sentences relating to the same topic.

Distant reading: a method of collecting data from text analysis and using it to group together bits of text or themes in a quantitative way. It mixes literary analysis with computational methods to read a collection of texts at a "distance" and see the bigger picture. The term was coined by literary scholar Franco Moretti in 2000, though versions of the method have been used throughout literary history; the invention of the computer just sped the process up (Underwood, 2017).

References

Gupta, Ravi.  “Wordle -Vision: Simple Analytics To Up Your Wordle Game” , Towards Data Science, 2022. https://towardsdatascience.com/wordle-vision-simple-analytics-to-up-your-wordle-game-65daf4f1aa6f

Reardon, Jed. “Text Analysis: An Overview”, MethoDHology, 2020. https://metodhology.anu.edu.au/index.php/content/text-analysis/

Underwood, Ted. "A Genealogy of Distant Reading", Digital Humanities Quarterly 11, no. 2 (2017). http://digitalhumanities.org/dhq/vol/11/2/000317/000317.html

Cultural Categories and Data Cleaning
Fri, 27 May 2022

It is often said that 80% of the work of data science is data cleaning, while only 20% is analysis (Bowne-Anderson, 2018). Despite this, what data cleaning actually entails is largely obscured, often dismissed as a tedious and laborious yet necessary exercise (Rawson and Muñoz, 2019). While definitions vary, data cleaning can be described broadly as a process of data standardisation, or 'detecting, diagnosing, and editing faulty data' (Van den Broeck et al., 2005). Implicit in this language of data 'cleaning' is the inverse notion of 'messy' or 'untidy' data that needs to be organised.

At a mechanical level, that might mean filtering out unnecessary variables, or conflating slight variations of the same concept: letter case, typos, word inflections, or 'pruning' words down to their stem. More generally, however, data cleaning can mean imposing a normative order, a process of standardisation that some humanities scholars have seen as reductive, with serious intellectual and ethical implications (Drucker, 2021: 30; Rawson and Muñoz, 2019). Rather than being necessarily reductive, I argue that the act of data cleaning itself has potential as a critical cultural practice.
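The mechanical side of this is easy to illustrate. A minimal sketch of the conflation step, using deliberately crude suffix-stripping rather than a real stemmer, and invented example values:

```python
import re

def normalise(value):
    """Lowercase, trim, collapse whitespace, and strip a few plural/inflection endings."""
    v = re.sub(r"\s+", " ", value.strip().lower())
    for suffix in ("ies", "s", "ing"):  # crude stemming, not a real stemmer
        if v.endswith(suffix) and len(v) > len(suffix) + 2:
            if suffix == "ies":
                return v[:-3] + "y"   # e.g. "categories" -> "category"
            return v[: -len(suffix)]
    return v

raw = ["  Categories", "category", "CATEGORY ", "categories"]
print({normalise(v) for v in raw})  # all four collapse to {'category'}
```

Even this toy version makes normative choices (which suffixes count as 'the same word'?), which is precisely the point of the critique above.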

The cultural challenges of data cleaning (Olteanu et al., 2019) are well exemplified by the processing of data pertaining to the Indigenous population in the Australian census, run by the Australian Bureau of Statistics (ABS). Aboriginal and Torres Strait Islander people are consistently underrepresented in the Census, and methods of data collection and analysis have come under scrutiny from social science and humanities scholars. Frances Morphy (2007a) argued that the Census unsuccessfully tried to model remote Aboriginal social relationships in terms of the foundational Western metaphor of a 'bounded container'.

Observing the Indigenous Processing Team (IPT) at the Data Processing Centre in Melbourne, where forms from remote Indigenous communities were manually processed and standardised, Morphy (2007b) praised these efforts while highlighting the challenges of parameterisation and the ethics of designating Indigenous people as 'disorder' which can be remedied. Her account also reveals an inherent tension between maintaining consistency and commensurability, and making sure the data is coherent in the context of the community it pertains to.

Judgements about how to handle missing data can equally reflect political and cultural factors. For example, when the Census question asking whether people identify as Aboriginal or Torres Strait Islander is left blank, should this value be excluded? Or should it be 'imputed' by assuming what someone might have answered based on similar 'donor' data, or through data-matching with previous Census data? This continues to be debated, although the ABS currently makes a deliberate choice not to impute the missing data, because Indigenous identity is understood as a matter of self-determination, which is ultimately what the question seeks to measure.
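In code, the non-imputation choice amounts to tallying blanks as their own category rather than filling them in. The field values below are hypothetical, simplified stand-ins, not the actual Census response categories:

```python
from collections import Counter

def tabulate_status(responses):
    """Tally responses, keeping blanks as 'not stated' instead of imputing them."""
    return Counter(r.strip() if r.strip() else "not stated" for r in responses)

responses = ["Aboriginal", "Non-Indigenous", "", "Torres Strait Islander", "", "Aboriginal"]
print(tabulate_status(responses))
```

An imputing pipeline would instead replace each blank with a guessed value from 'donor' records; keeping 'not stated' visible preserves the respondent's silence as data in its own right.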

In recognising that cleaning data can suppress diversity, Rawson and Muñoz (2019) argue that as humanities scholars we should consult with the relevant communities, whether it be Indigenous communities, analysts at the Data Processing Centre or librarians to unpack the concepts which structure the data, and its relationship to other data. The tension between standardisation and ensuring data faithfully represents the phenomenon being studied will probably always persist, but being able to clearly articulate and justify decisions in the data cleaning stage is a valuable exercise, one that is uniquely well-suited to digital humanities scholars. Clearly describing the process of data cleaning not only makes it easily reproducible, which is valued in data science, but enriches an understanding of the topic at hand. 

Bibliography

Bowne-Anderson H (2018) What Data Scientists Really Do, According to 35 Data Scientists. Harvard Business Review, 15 August. Available at: https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists (accessed 27 May 2022).

Van den Broeck J, Cunningham SA, Eeckels R, et al. (2005) Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities. PLOS Medicine 2(10). Public Library of Science: e267. DOI: 10.1371/journal.pmed.0020267.

Drucker J (2021) Cleaning and using data. In: The Digital Humanities Coursebook: An Introduction to Digital Methods for Research and Scholarship. Routledge.

Morphy F (2007a) The transformation of input into output: At the Melbourne Data Processing Centre. ANU E Press. Available at: https://openresearch-repository.anu.edu.au/handle/1885/32592 (accessed 27 May 2022).

Morphy F (2007b) Uncontained subjects: Population and household in remote aboriginal Australia. Journal of Population Research 24(2): 163–184. DOI: 10.1007/BF03031929.

Olteanu A, Castillo C, Diaz F, et al. (2019) Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data 2. Available at: https://www.frontiersin.org/article/10.3389/fdata.2019.00013 (accessed 27 May 2022).

Rawson K and Muñoz T (2019) Against Cleaning. In: Gold MK and Klein LF (eds) Debates in the Digital Humanities 2019. University of Minnesota Press.

Text Analysis: Methods, Assessment, and Experience
Thu, 26 May 2022

Text analysis can be a very general term, often used to describe computational tools that analyse text (Reardon, 2020). Though computational (machine) text analysis is now prevalent, human text analysis provided its fundamental basis. Comparing the two, along with a personal example of using a text analysis tool, can help explain why text analysis is so significant in the modern age.

Computational text analysis has become far more widely used in recent years, and it has several benefits. Using this technique, the root of a problem within both unstructured and structured data can be identified, trends and limits can be recognised, and digital experiences can be enhanced (Haije, 2019). In addition, once the system behind the analysis has been trained to a sufficient level, the process becomes significantly more efficient and quick (Haije, 2019).

Human text analysis, by comparison, was used in the past and is still used either alongside or instead of machine text analysis. Its benefits include the ease of getting started: once a topic and dictionary have been established, the reading and writing of annotations can begin almost immediately. Human interpretive capabilities have also been trained and shaped by everything we encounter in everyday life, and humans can interpret anomalies, such as irony, with a higher success rate (Wonderflow, 2019).

Though human text analysis does have its benefits, it also has many limitations that make computational analysis more attractive in the modern age. Human text analysis often lacks consistency, especially without repeating the process, as humans evaluate things differently depending on their mood (Wonderflow, 2019). Human memory also constrains its competence and speed: text analysis involves many firm definitions and parameters, and difficulty remembering these terms can hinder the process (Wonderflow, 2019). Additionally, human text analysis can be slow due to manual input.

With these benefits and limitations in mind, I documented my own experience with text analysis. Google Ngram Viewer is a tool for analysing the relevance of terms in literature over time. I used it to chart the terms "anxiety" and "depression" over the years 1800–2019. While the results showed a general increase, there was a peak in the use of the word "depression" in the 1930s. Realising that this peak referenced the Great Depression rather than mental health made one of Google Ngram's complications clear: context is not taken into account when analysing words. In addition, since Google Ngrams only documents written texts, much of the world's material cannot be assessed.

There are many tools on the internet that provide basic computational text analysis. These can be web-based applications, like Voyant, or Java-based toolkits, like MALLET. Either way, there are many ways to begin text analysis, and even more ways to enhance it.

References 

Haije, E. G. (2019). What is Text Analytics? And why should I care? Retrieved May 23, 2022, from https://mopinion.com/what-is-text-analytics-benefits/ 

Reardon, J. (2020). “Text Analysis: An Overview”. METODHOLOGY. Retrieved May 23, 2022, from https://metodhology.anu.edu.au/index.php/content/text-analysis/ 

Wonderflow. (2019). What are the pros and cons of human text analysis – Part 2. Retrieved May 23, 2022, from https://www.wonderflow.ai/blog/what-are-the-pros-and-cons-of-human-text-analysis-part-2 
