Enabling Information Retrieval from Conversational Speech Archives via Crowdsourcing
This is an ongoing project.
Contact Details: Matt Lease
Presently, this is an unpaid opportunity only: either independent study for course credit or volunteering.
This project was featured in WIRED: http://www.wired.com/dangerroom/2013/03/darpa-speech
Advances in capture and storage technology now let us archive massive amounts of spontaneous (conversational) speech data. However, effective use of this data requires accurate information retrieval technology designed for and evaluated on spontaneous speech data. Unfortunately, traditional practice for benchmarking search engine accuracy cannot scale to “big data”, especially for conversational speech. This restricts our ability to even measure the effectiveness of existing search engines, much less further advance them.
Evaluating search over spontaneous speech archives is particularly challenging compared with more traditional text collections. Unlike text, speech must first be transcribed. While transcripts of prepared speech (e.g. broadcast news) are very readable, transcripts of spontaneous speech are often difficult to read, even with perfect transcription, because of "disfluency" (self-corrections, trailing off, interruptions, etc.) and the lack of commas and sentence boundaries. Human editing to correct this would require even greater manual effort. As a result, few spontaneous speech IR test collections exist today.
We are investigating the use of nascent crowdsourcing (crowd computing) techniques in concert with "rich transcription" technology. While crowdsourcing offers tremendous potential for time and cost savings, how to achieve these savings without compromising quality remains an open research problem. As a case study, we are examining search of spontaneous speech interviews with Holocaust eye-witnesses collected by the Shoah Foundation.
Now that you've used EUREKA to identify a project of interest to you, read about getting involved in research at The University of Texas at Austin.
The Office of Undergraduate Research recommends that you attend an info session before contacting faculty members or project contacts about research opportunities. We'll cover the steps to getting involved, tips for contacting faculty, funding possibilities, and options for course credit.
If you aren't able to attend an info session, contact the Office of Undergraduate Research to schedule an appointment with an advisor.
Once you have attended an Office of Undergraduate Research info session or spoken to an advisor, you can use the "Contact Details" for this project to get in touch with the project leader and express your interest in getting involved.