Digitizing the past
Archaeologists and technologists develop supercomputer-powered solutions to protect and preserve ancient sites and artifacts
Sept. 19, 2011
When Adam Rabinowitz was 15 years old, his aunt, an archaeologist, invited him to join her on a dig in Sicily.
Twenty-three years later, Rabinowitz, now the assistant director at the Institute of Classical Archaeology (ICA) at The University of Texas at Austin, is still traveling around the world getting dirt under his nails. And though much remains the same about archaeology since he first picked up a duster brush, a lot has changed.
“For a long time, the history of archaeology came from the desire to have pretty objects,” he said. “Then, in the 19th century, the desire to prove the historical accuracy of Greek epic poetry, like the Iliad and the Odyssey, became important as well.”
Documentation has always been important to archaeology, but over the course of the 20th century, it became more important to preserve the contextual associations among objects and layers of history. These changes have altered the theoretical underpinnings of archaeology.
“At this point, archaeology is heavily focused on the documentation and recording of contextual associations between things and not just the nice objects themselves,” Rabinowitz said.
Other changes to the field have been brought about by technology. In previous eras, researchers logged their data in notebooks, which were preserved along with photographs, maps and objects in a physical archive. In fact, Rabinowitz can still access the notebooks and negatives of people who conducted research more than a hundred years ago at the same sites he is exploring.
Today, archaeologists are more likely to take hundreds of digital photos, notate the information in a spreadsheet on a laptop, and record careful geographically referenced information that only a computer can interpret.
“The development of digital technologies has exponentially magnified the amount of data we’re collecting, simply because we have the tools now to collect a lot more information much more easily than we did in the past,” Rabinowitz said.
Working with experts and resources at the Texas Advanced Computing Center (TACC), one of the leading academic computing centers in the nation, Rabinowitz is harnessing the richness of digital data to develop a greater understanding of the past. But the evolution of a digital archaeological practice did not occur overnight. In fact, it was arguably decades in the making.
Professor Joseph C. Carter, the director of the Institute of Classical Archaeology, has applied cutting-edge technologies to interdisciplinary projects since he founded ICA in 1974. In the 1990s and 2000s, this focus on new technologies and methods made ICA a leader in the adoption of digital tools such as Geographic Information Systems and relational databases. But the ability to manage technology often lags behind the capability of the technology itself. Rabinowitz knows this personally.
“Digs that I’ve participated in have produced information that is now digitally gone because the platforms and the storage mechanisms became obsolete, and that’s in the space of ten years,” he said. “When we look down the road and ask, ‘What will we leave for people 25 years from now, 100 years from now?’ we’re faced with a huge issue that people are just starting to confront. The use of new tools outpaced the concern about the future.”
Destroying to Understand
When an archaeological team explores a site, they destroy the very thing they are studying. This is archaeology’s dirty little secret. After years of displacing dirt and extracting pottery shards or bones, archaeologists are often left with nothing but documentation, artifacts, and a big pit.
The site itself can never be physically reconstructed. However, with the right information — the right data and data structures — the site can be recreated virtually so future archaeologists can re-examine the raw information in order to ask new questions and draw new conclusions.
“The idea is to preserve this richer dataset in a form that will allow it to be used by people in the future so that they can have more access to an accurate representation of the things that we destroyed as we conducted our research,” Rabinowitz said.
In 1994, Professor Carter began a research project at Chersonesos, a Greek colony on the Crimean peninsula that thrived through the Byzantine age. With the support of the Packard Humanities Institute, ICA was able to apply emerging digital technologies to the investigation of this fascinating site, then little known in the West.
Over the course of 16 years, the ICA team developed an extremely rich dataset related to excavations in both the urban center and the agricultural territory of the ancient city. But by 2008, some of this information sat on a single portable server that they carried back and forth to Ukraine and that “could have blown up at any time.”
“We didn’t have a solution and we were getting very nervous about it,” Rabinowitz said. “Our team knew a lot about geographical information systems, databases, and online presentation, but we had neither the resources nor the skills to deal with a higher-end set of archival solutions. This is when we turned to the Texas Advanced Computing Center.”
One of the leading academic computing centers in the nation, TACC operates some of the most powerful computing systems, including Corral, a storage system with specialized servers for highly reliable digital archiving and large-scale databases. TACC’s staff also have expertise in the computing technologies required to index, preserve, and make available the data about Chersonesos.
Working with Maria Esteva, a digital archivist, and David Walling, a data applications expert, Rabinowitz developed a state-of-the-art data management system and framework for the ICA’s project.
“Research data has to be organized, described and preserved so that people can use it during and after the project is finalized,” Esteva said. “Based on the research lifecycle, we defined best practices at each stage; how data could be managed from the site, all the way to publication and long term preservation.”
The data management system keeps the data safe and replicated for long-term protection; it also brings to life the connections between different objects.
To illustrate the power of this system, Rabinowitz pointed to an interactive map his team created of a Byzantine residential block at Chersonesos that was excavated between 2001 and 2006. The block was pillaged and burned in the middle of the 13th century, and was left as a well-preserved snapshot of life at that moment. A padlock found at the site, dating to the late 12th or early 13th century AD, serves as an example of why context is key.
“There are a number of these locks in museums and private collections, almost none of them with context because they were beautiful objects,” Rabinowitz explained. “People bought them on the market; people dug them up and sold them as objects. So here we have a very rare opportunity to look at this in a contextual setting.”
In addition to the standard pictures taken of the lock after it was pulled from the ground, all of the data from the dig site itself is accessible through the map. Icons show where and how the lock was found: smashed into two pieces, probably by the ax of a looter. The map also shows information about other items found nearby, including an iron bucket and the skeleton of a woman in her fifties left lying in the street under the collapsed rubble of a roof. Together, the objects suggest a story.
“Somebody kicked in the door, grabbed the stuff, ran out in the street, dumped the bucket – it probably wasn’t worth anything – knocked the lock off the strong box, took the gold, and booked leaving the building in flames,” he said.
This depiction of the moment is much more vivid than a traditional archaeological recording system would allow. It is possible because, at the point of ingestion, the system automatically extracts information about the data, preserving the contextual associations between the documented objects and the layers in which they were found on the site.
In addition, the system keeps track of the changes that digital objects undergo, for example as researchers edit images. For each object, a “metadata” document records these associations, allowing future scholars to retrieve original data in relation to other objects and to the history of changes.
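For readers curious about what such a per-object record might look like, the following is a minimal sketch in Python, not the ICA and TACC implementation. It assumes a simple record that is created when a file is ingested, links the object to its excavation context and to nearby finds, and logs each later edit. All names here (MetadataRecord, ingest, record_edit, the identifiers "lock-001", "context-A" and "bucket-001") are hypothetical and chosen only for illustration, and the use of a SHA-256 checksum to identify each version is likewise an assumption.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class MetadataRecord:
    """Hypothetical per-object metadata document (illustrative only)."""
    object_id: str
    file_name: str
    sha256: str                      # checksum of the current version
    size_bytes: int
    context_id: str                  # excavation layer or context of the find
    related_objects: list = field(default_factory=list)
    history: list = field(default_factory=list)


def _now():
    # UTC timestamp in ISO 8601 form
    return datetime.now(timezone.utc).isoformat()


def ingest(object_id, file_name, content, context_id, related_objects=()):
    """Extract basic technical metadata from the bytes at ingestion time."""
    digest = hashlib.sha256(content).hexdigest()
    record = MetadataRecord(
        object_id=object_id,
        file_name=file_name,
        sha256=digest,
        size_bytes=len(content),
        context_id=context_id,
        related_objects=list(related_objects),
    )
    record.history.append({"event": "ingested", "sha256": digest, "timestamp": _now()})
    return record


def record_edit(record, new_content, note):
    """Log a change to the digital object while keeping the earlier entries."""
    digest = hashlib.sha256(new_content).hexdigest()
    record.history.append(
        {"event": "edited", "note": note, "sha256": digest, "timestamp": _now()}
    )
    record.sha256 = digest
    record.size_bytes = len(new_content)
    return record


def to_json(record):
    """Serialize the record as a standalone metadata document."""
    return json.dumps(asdict(record), indent=2)


if __name__ == "__main__":
    lock_photo = ingest(
        "lock-001", "lock.jpg", b"raw image bytes", "context-A", ["bucket-001"]
    )
    record_edit(lock_photo, b"cropped image bytes", "cropped for publication")
    print(to_json(lock_photo))
```

Because the first history entry keeps the checksum of the file as ingested, a later reader of the record could tell which stored version is the original, which objects it was found with, and what was changed afterward.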
“We’re preserving the data gathered at the site, and we’re also preserving the documentation process itself,” said Esteva. “We are documenting the documentation process — that’s why we talk about reflexive archaeology.”
The conception and deployment of such a system were not simple. Three years of collaboration between ICA and TACC were required to complete the data management system and to ingest the collection into Corral. Now that the infrastructure is complete, the team is working to make the primary archaeological data available to other archaeologists. The researchers presented their findings at the 6th International Digital Curation Conference, IS&T Archiving 2010, and Computer Applications and Quantitative Methods in Archaeology 2010, with papers submitted for publication in the proceedings of all three.
The methodology can be generalized to other research topics in the humanities and social sciences where scholars are struggling with the deluge of data.
“Many researchers understand the value in having technical and descriptive metadata for their data. Automating the creation of that metadata helps them to realize that value,” Walling explained. “We are currently using the same automatic information extraction method to gather metadata from digital arts collections.”
From rough notebooks to fragile hard drives to supercomputer-powered archives, the archaeologist’s road to data security and open access has been a long one, but progress is visible in the system created by ICA and TACC.
“We have to take care of the research data collections so they can be reused in the future to answer new questions and to make discoveries,” said Esteva.
“Archaeology is changing rapidly, paralleled by changes in information technology in the broader world,” said Rabinowitz. “The amount of data available has ballooned. The really exciting part is about what’s going to come out of these quantities of data.”
For more information, contact: Aaron Dubrow, Texas Advanced Computing Center, 512 475 9439.