TACC Introduces New System for Data-Intensive Computing and Storage

April 6, 2009

AUSTIN, Texas — "Corral," a system for data-intensive computing and storage, is the newest resource to be deployed by the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.

A partnership among TACC, DataDirect Networks (DDN) and Dell Inc., Corral went into friendly-user production on March 31 and is available to researchers and educators at The University of Texas at Austin. The resource will soon become available to a wider group of users, including UT System institutions and National Science Foundation TeraGrid users.

Corral will support database, file system and Web-based access, as well as other network protocols for storage and retrieval of data from local and remote sources. Corral's high-performance parallel file system, based on Lustre, will be accessible from TACC's world-class computational resources, Ranger and Lonestar. The system will also be accessible from Stallion, the world's highest-resolution tiled display, and from Spur, TACC's remote visualization system, enabling mathematical and visual analysis of petabyte-scale datasets. Corral will host Web applications and services for access to data from anywhere on the Internet.

"We support world-class science and engineering research, and we are now working with increasingly diverse applications from other domains," TACC Director Jay Boisseau said. "In both our science research support and in our projects from new communities—industry, humanities, etc.—we are seeing a rapidly growing need to be able to host, manage and organize massive data collections, and to support the development and availability of new types of data applications. We're excited to partner with DataDirect Networks and Dell to provide new capabilities for our growing user community."

Paul Bloch, president and co-founder of DDN, said, "DataDirect Networks' storage solutions, such as the S2A9900 ExaScaler system which TACC deployed, are designed for extreme performance, data reliability and scalable capacity, which lend itself to many applications in an HPC datacenter, such as long-term and fast-scratch data storage. We have a strong presence in high performance computing, and are proud to support seven of the top 10 fastest supercomputers in the world. We're honored that TACC has put its trust in us and our research computing storage technologies to support the Corral project."

"Dell has a long-standing commitment of supporting the global research community's efforts to solve major scientific problems with high performance computing," said John Mullen, vice president of Dell education, state and local government. "We are now extending that commitment to affordable, accessible HPC research storage solutions, such as Corral, through our partnership with TACC and DataDirect Networks. Going forward, we will continue to drive standards into the HPC ecosystem, making it simpler for scientists and researchers worldwide to collaborate, share information and address many of society's biggest challenges."

Chris T. Jordan, a senior operating systems specialist in TACC's Advanced Systems Group, said Corral complements TACC's system portfolio, enabling users to gain additional insights from the systems that are already in place. For example, a user can access all of Corral's storage capabilities from HPC systems Ranger or Lonestar, and from TACC's visualization systems, Spur or Stallion, Jordan said.

"We hope that people will use the TACC Visualization Laboratory to visualize data on Corral that may have been generated on Ranger," Jordan said. "Corral provides online storage at the petabyte scale—it's all online, accessible and high-speed so that researchers can store and use much more data as part of their computation or visualization."

Data collection projects that will use Corral include:

  • PECOS Engineering Simulation Project, The University of Texas at Austin—The Center for Predictive Engineering and Computational Sciences (PECOS) is a new Department of Energy-funded Center of Excellence within the Institute for Computational Engineering and Sciences at The University of Texas at Austin. The PECOS project will develop the next generation of advanced computational methods for predictive simulation of multiscale, multiphysics phenomena, and apply these methods to the problem of reentry of vehicles into the atmosphere. PECOS hopes to advance the science and modeling of atmospheric reentry and the science of predictive simulation. Corral will be used to process, manage and store the images and other data generated by the project, and will provide high-speed access to this data for researchers and members of the public anywhere in the world.
  • Herbarium Digitization, The University of Alaska Museum of the North—The Herbarium is one of the world's premier collections of arctic and boreal plants. With support from the National Science Foundation, the Herbarium is taking high-resolution digital photographs of 230,000 pressed plants to capture data about the collection and to make these specimens more accessible for research and education. The images are archived as digital negatives, the most data-intensive file format, preserving all of the data captured by the camera. Making these images publicly available requires four terabytes of rapidly accessible Web storage. Corral will be used to process, manage and store the digital images and other data generated by the project, and will provide high-speed access to this data for researchers and members of the public anywhere in the world.
  • Center for Space Research (CSR), The University of Texas at Austin—CSR will use Corral for two important space-based projects—imagery data and geospatial data for emergency response operations, and high-precision gravity data processing. As part of CESAR (Cyberinfrastructure for Emergency Situation Assessment and Response), Corral will be used to rapidly access the 'framework' geospatial data needed for emergency response operations during natural and man-made disasters. Framework data are the most recent, high-resolution aerial and orbital imagery and elevation data sets. CSR will also use Corral to store the data sets collected during a major event, such as Hurricane Ike, for distribution to state and federal agencies, and universities performing disaster research. The Gravity Recovery and Climate Experiment (GRACE) is providing a continuous, multi-year record of the spatial and temporal variations in the Earth's mass through measurements of its gravity field, and has provided new insights into the evolution of the Earth's climate system. The group expects to collect a few terabytes of original data and 20 to 40 terabytes of analysis results. Corral will house the data online for rapid mission reprocessing and scientific analysis. In addition, Corral will host the output products online for analysis of multi-year data sets.
  • Institute of Classical Archaeology (ICA), Liberal Arts, The University of Texas at Austin—ICA will use Corral to preserve, protect and disseminate two dynamic datasets to the wider academic community and the public. The first dataset contains information gathered during an intensive field survey of ancient sites in the territory of Metaponto in South Italy where data were documented using GPS and incorporated with remote-sensing imagery into a geographic information system. The second dataset involves excavations in an area of the Greek, Roman and Byzantine city of Chersonesos in Crimea (Ukraine). These spatial and contextual datasets also contain extensive data produced in the course of specialist research into forensic anthropology and ancient agriculture and technology.
  • The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), The University of Texas at Austin—The HETDEX project at McDonald Observatory is the first major experiment to probe dark energy, the mysterious force causing the expansion of the universe to speed up over time. Over three years, HETDEX will collect data on at least one million galaxies that are nine billion to 11 billion light-years away, yielding the largest map of the universe ever produced. The map will allow astronomers to measure how fast the universe was expanding at different times in history. The project will generate several tens of terabytes of data in a realm previously unexplored by astronomers, of which the project itself will use only a small fraction. TACC will archive the dataset for use by the wider astronomical community, and provide a public Web portal.

Some of these data collections are as small as five terabytes, while some are as large as 100 terabytes.

"As Corral fills up, we plan to expand it," Jordan said. "It's designed to extend TACC's infrastructure. We now have one unified system that can support all of these applications that can grow to meet future demands."

Technical Specifications

  • 1.2 petabytes of SATA disk in a DataDirect Networks™ S2A9900 controller system, shared via a parallel file system to other TACC systems and via databases such as MySQL, PostgreSQL and SQL Server.
  • 10 Lustre file system server nodes.
  • Six database, Web and application server nodes with 10Gb/sec network connections.
  • The disk system is composed of 1,200 one-terabyte drives, and the controller has eight InfiniBand connections to the server systems. The controller is capable of reading and writing data at up to 6GB/sec.
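
As a rough illustration of what these figures imply, the short Python sketch below checks the raw capacity arithmetic and estimates how long it would take to stream data through the controller. It is a back-of-the-envelope calculation only: it assumes decimal units (1 TB = 10^12 bytes), assumes the 6GB/sec peak rate could be sustained, and ignores file system and RAID overhead. The function and variable names are illustrative and do not come from TACC.

```python
# Back-of-the-envelope arithmetic based on the specifications listed above.
# Assumptions (not from the source): decimal units, sustained peak rate,
# no file system or RAID overhead.

TB = 10**12  # bytes in one terabyte (decimal)
GB = 10**9   # bytes in one gigabyte (decimal)

drive_count = 1200
drive_capacity_bytes = 1 * TB
controller_rate_bytes_per_sec = 6 * GB


def raw_capacity_bytes(drives, per_drive_bytes):
    """Total raw capacity of the disk system, in bytes."""
    return drives * per_drive_bytes


def transfer_hours(size_bytes, rate_bytes_per_sec):
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 3600


total_bytes = raw_capacity_bytes(drive_count, drive_capacity_bytes)

print(f"Raw capacity: {total_bytes / 10**15:.1f} PB")
print(f"Reading everything once at peak rate: "
      f"{transfer_hours(total_bytes, controller_rate_bytes_per_sec):.1f} hours")
print(f"Reading a 100 TB collection at peak rate: "
      f"{transfer_hours(100 * TB, controller_rate_bytes_per_sec):.1f} hours")
```

Under these assumptions, 1,200 one-terabyte drives give the stated 1.2 petabytes, and a single pass over the full system at the peak rate would take on the order of two days. Real-world throughput would likely be lower.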

For questions about Corral, contact Chris T. Jordan, senior operating systems specialist, TACC Advanced Systems Group.

For more information, contact: Faith Singer-Villalobos, Texas Advanced Computing Center, 512 232 5771.