Creating and Working with Spoken Language Corpora in EXMARaLDA
Thu, October 24, 2013 • 5:00 PM - 6:00 PM • BUR 337
with Thomas Schmidt, Institute for the German Language, Mannheim
Spoken language corpora, as used in conversation-analytic research, language acquisition studies and dialectology, pose a number of challenges that corpus-linguistic methodology and technology rarely address. Creating a corpus of spoken language is labor-intensive and time-consuming: making authentic recordings and transcribing them both require sophisticated methodological skills and specialized equipment. The resulting corpora are therefore valuable resources, and it is desirable that the research community be able to reuse and share them. In practice, however, technological obstacles, such as incompatibilities between data formats, software tools and operating systems, make reusing and exchanging corpora difficult. The EXMARaLDA (Extensible Markup Language for Discourse Annotation) system was designed to overcome some of these obstacles.
Thirteen years after its development began, EXMARaLDA is now a stable system used by a large number of researchers, mainly in discourse and conversation analysis, language acquisition studies and dialectology. Several spoken language corpora have been compiled, and others are currently being compiled, with the help of EXMARaLDA.
Dr. Schmidt will explain the system’s architecture and design principles, give an overview of its most important software components (a transcription editor, a corpus management tool and a corpus query tool), and present some corpora that were built with the help of EXMARaLDA.