Machine Learning in Archaeology

In a broad sense, machine learning (ML) describes an algorithmic process that derives mathematical classifiers from the statistical analysis of categorized “training data”, enabling a machine to make informed predictions about newly acquired data (Bickler, 2021). As archaeology has long placed emphasis on classification, the increasing application of ML in contemporary field research can be argued to be a natural technical progression.

ML can construct classification models from a large set of established (or “known”) data through testing and tuning, which ensures a considerable degree of internal consistency within the classification process whilst also providing considerable capacity for noise management (Resler et al., 2021). Using the logic models constructed from the classification of training data, an ML application can then predict information from raw data input.
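The train-then-predict cycle described above can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only because it is short enough to read in full; it is not the method used in the cited studies. The vessel types, the two features, and every measurement are hypothetical values invented for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical training data: (rim diameter in cm, wall thickness in mm) -> vessel type.
# All numbers are invented for illustration and come from no real assemblage.
TRAINING = [
    ((12.0, 4.0), "cup"),
    ((11.0, 5.0), "cup"),
    ((13.0, 4.5), "cup"),
    ((30.0, 9.0), "storage jar"),
    ((28.0, 10.0), "storage jar"),
    ((32.0, 9.5), "storage jar"),
]


def train(samples):
    """Compute one centroid (mean feature vector) per labelled class."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid lies closest to the new measurement."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING)
print(predict(model, (12.5, 4.2)))   # a small, thin-walled vessel
print(predict(model, (29.0, 9.2)))   # a large, thick-walled vessel
```

The "known" data fix the model (here, two centroids), and every later prediction is made only by reference to that model, which is why the quality of the training set matters so much.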

Within archaeology, ML has been implemented broadly in the processing of statistical data (such as chemical analysis), textual data (language translation), image data (automated identification and feature reconstruction), and geospatial data (Bickler, 2021). The last of these has been described as the most promising of the existing implementations: by combining various raw information derived from archaeological sites and subterranean scanning, the algorithmic process is capable of creating a reasonably reliable reconstruction of human communal presence and activity in given time periods (Resler et al., 2021).

As the quality of an ML application depends heavily on the quality of its training data, a common difficulty in implementing ML in archaeology is rooted in the nature of archaeological data. Unlike traditional “big data”, such data come primarily as highly contextualized chunks of information that arise from spontaneous discovery and therefore arrive without a consistent flow (Grosman et al., 2014). Poor-quality training data result in flawed logic models that can bias the processing results (especially if the training data cannot account for unfamiliar variables), and the deep complexity of the interaction between different algorithms can produce a “black box” process in which the logic of the ML application is difficult to comprehend. Both are common challenges in ML implementations.
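The effect of training data that cannot account for unfamiliar variables can be sketched in a few lines. The example assumes a simple nearest-centroid model whose two class centroids are hypothetical, as are the measurements and the `max_distance` threshold; it is an illustration of the general problem, not a reconstruction of any cited system.

```python
from math import dist

# Hypothetical centroids learned from training data that covered only two vessel
# types (rim diameter in cm, wall thickness in mm). Values are invented.
CENTROIDS = {"cup": (12.0, 4.5), "storage jar": (30.0, 9.5)}


def predict(features):
    """Always answer with the nearest known class, however poor the fit."""
    return min(CENTROIDS, key=lambda label: dist(CENTROIDS[label], features))


def predict_with_review(features, max_distance=5.0):
    """Return a label only when the sample is near a known class; otherwise flag it.

    max_distance is an assumed threshold chosen for this illustration.
    """
    label = predict(features)
    if dist(CENTROIDS[label], features) > max_distance:
        return "needs manual review"
    return label


# A wide, thin-walled platter: a type absent from the training data.
platter = (45.0, 6.0)
print(predict(platter))              # forced into one of the known classes
print(predict_with_review(platter))  # flagged instead of mislabelled
```

The plain classifier has no way of answering "none of the above", so an unfamiliar object is silently absorbed into a known category. Flagging distant samples for a human to inspect is one simple mitigation.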

Regarding the current state of development, it is fair to conclude that whilst ML has proven to be a useful addition to archaeological fieldwork, it is far from capable of completely replacing manual input and oversight, given the complexity of the variables involved in the raw data being processed (Davis, 2019). ML predicts information based on information it already knows, so one has to question the validity of its predictions when they involve information it cannot account for.


Bickler, S. H. (2021). Machine Learning Arrives in Archaeology. Advances in Archaeological Practice, 9(2), 186-191.

Davis, D. S. (2019). Object‐based image analysis: a review of developments and future directions of automated feature detection in landscape archaeology. Archaeological Prospection, 26(2), 155-163.

Grosman, L., Karasik, A., Harush, O., & Smilansky, U. (2014). Archaeology in three dimensions: Computer-based methods in archaeological research. Journal of Eastern Mediterranean Archaeology and Heritage Studies, 2(1), 48-64.

Hörr, C., Lindinger, E., & Brunnett, G. (2014). Machine learning based typology development in archaeology. Journal on Computing and Cultural Heritage (JOCCH), 7(1), 1-23.

Resler, A., Yeshurun, R., Natalio, F., & Giryes, R. (2021). A deep-learning model for predictive archaeology and archaeological community detection. Humanities and Social Sciences Communications, 8(1), 1-10.
