Charles R. Hale, Director SRH 1.310, 2300 Red River Street D0800, Austin, TX 78712 • 512.471.5551

LLILAS-UTL Project to Benefit from Increased Supercomputing Capacity at TACC

Posted: February 28, 2012
[Photo: Nicholas Woodward]

The Latin American Government Documents Archive (LAGDA) is one of numerous projects that will soon be able to take advantage of expanded and enhanced supercomputing capacity at the Texas Advanced Computing Center (TACC) following the recently announced commitment of $10 million to TACC by the O’Donnell Foundation.

LAGDA is a joint initiative of the institute’s Latin American Network Information Center (LANIC), the Benson Latin American Collection, and the University of Texas Libraries. Since 2005, under the direction of LLILAS-Benson Digital Curation Coordinator Kent Norsworthy, LAGDA has run quarterly Web crawls of about 300 Latin American and Caribbean presidency and ministerial websites. The resulting Web archive currently contains over 5.8 TB of data, including thousands of important documents, reports, and speeches that have long since disappeared from the live Web. For example, LAGDA contains an extensive collection of materials from the presidency of Manuel Zelaya of Honduras covering the period between his inauguration in January 2006 and his overthrow in a coup d’état in June 2009.
 
The LAGDA data mining effort with TACC is conducted by LANIC graduate research assistant (GRA) and LLILAS alum Nicholas Woodward (photo above) and supervised by Dr. Weijia Xu, a Research Associate in TACC’s Data and Information Systems group, and by TACC Data Archivist Maria Esteva. Woodward’s research focuses on developing data mining and document representation algorithms that will facilitate the programmatic classification and discovery of specific types of content in large-scale data collections; in this case, ministerial and presidential reports and speeches in the LAGDA Web archive. On TACC’s Longhorn Visualization Cluster, Woodward uses a variety of programming tools and libraries to meet his research goals, including Java, Heritrix, Lucene, Mahout, Hadoop, and LibSVM.
 
Read more about the O’Donnell Foundation’s recent commitment to TACC, which seeks to expand data-intensive science, in the University Communications press release or in the Austin American-Statesman. Explore the full LAGDA collection at: http://lanic.utexas.edu/project/archives/lagda/.
