Anthony C. Woodbury, Chair CLA 4.304, Mailcode B5100, Austin, TX 78712 • 512-471-1701

Katrin E Erk

Associate Professor Ph.D., Saarland University, Germany

Biography

Dr. Katrin Erk joined the Department of Linguistics at The University of Texas at Austin in 2006. She received her doctorate in computer science from Saarland University in Saarbruecken in 2002. From 2002 to 2006 she worked as a researcher at Saarland University. Her research focuses on computational models for word meaning, the automatic acquisition of lexical information from text corpora, and manual and automatic meaning analysis of free text.

LIN 350 • Anly Txt Data: Ling Stats

41090 • Fall 2014
Meets TTH 330pm-500pm CLA 1.108

Do women produce more words than men do? Does it matter whether you say "I gave Mary the book" or "I gave the book to Mary"? Can you tell, from the way a wine is described, whether it is expensive or cheap? Are singers nowadays saying "I" more than they used to (thereby showing that they are more self-centered)? These and many other language questions can be addressed using statistics. This course is a hands-on introduction to working with statistics. We will use language problems like the ones above to introduce concepts from statistics, and will use linguistic datasets to do statistical analyses.

Prerequisites: Upper-division standing.

Textbooks:

P. R. Hinton (2004): Statistics Explained: A Guide for Social Science Students. Psychology Press; 2nd edition.

R.H. Baayen (2008): Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press.

LIN 392 • Working With Corpora

41195 • Fall 2014
Meets TTH 1230pm-200pm CLA 4.422

Text corpora hold a wealth of information. To make use of it, you need to know how to find relevant examples, count occurrences of relevant phenomena (and interpret the counts), and decode annotation that other people have added. This course provides a practical, hands-on introduction to working with corpora. We will study the design, annotation formats, and analysis of text corpora.

Topics to be discussed include:

- what types of corpora there are, and what kinds of research questions can be answered using a corpus;

- corpus annotation: principles and standards, formats, examples, and tests for annotation guidelines;

- tools for searching corpora

- an introduction to the programming language Python, concentrating on small programs that process text

- and the basics of statistical modeling of corpus phenomena.

This course is aimed at graduate students in linguistics who would like to use text corpora for their own research. Previous programming experience is not required.

(No required texts.)

LIN 353C • Intro Computatnl Linguistics

41490 • Spring 2014
Meets TTH 1100am-1230pm BEN 1.124

In the age of the Internet, there is a considerable demand for technology helping users to manage, search and access the enormous amount of information that is available. Examples of language technology applications are machine translation, the automatic recognition of the language in which a document is written, automatic extraction of information from documents, and the automatic detection of whether a document expresses a positive or a negative opinion on some topic.

The field of computational linguistics deals both with the creation of such applications and with the science behind them. This course provides an introduction to the key representations and algorithms used in computational linguistics, and it discusses some of the main natural language processing applications.

The course will be oriented towards hands-on experience of language processing techniques, using the Python programming language and the Natural Language Toolkit (NLTK, http://www.nltk.org/). No previous programming experience is required. This class is specifically aimed at students of linguistics and related fields with no computational background, and includes an introduction to the basics of programming.

LIN 350 • Words In A Haystack

40945 • Spring 2013
Meets TTH 930am-1100am CLA 0.118

Are singers nowadays saying "I" more than they used to (thereby showing that they are more self-centered)? Do women produce more words than men do? Does it matter whether you say "I gave Mary the book" or "I gave the book to Mary"? Can you tell, from the way a wine is described, whether it is expensive or cheap? These and many other language questions can be addressed using statistics. This course is a hands-on introduction to working with statistics. We will use language problems like the ones above to introduce concepts from statistics, and will use linguistic datasets to do statistical analyses.

Textbooks

P. R. Hinton (2004): Statistics Explained: A Guide for Social Science Students. Psychology Press; 2nd edition.

R.H. Baayen (2008): Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press.

Grading

4 homeworks (40%), Midterm (30%) and Final exam (30%)

LIN 353C • Intro Computatnl Linguistics

40970 • Spring 2013
Meets TTH 1230pm-200pm CLA 0.118

In the age of the Internet, there is a considerable demand for technology helping users to manage, search and access the enormous amount of information that is available. Examples of language technology applications are machine translation, the automatic recognition of the language in which a document is written, automatic extraction of information from documents, and the automatic detection of whether a document expresses a positive or a negative opinion on some topic.

The field of computational linguistics deals both with the creation of such applications and with the science behind them. This course provides an introduction to the key representations and algorithms used in computational linguistics, and it discusses some of the main natural language processing applications.

The course will be oriented towards hands-on experience of language processing techniques, using the Python programming language and the Natural Language Toolkit (NLTK, http://www.nltk.org/). No previous programming experience is required. This class is specifically aimed at students of linguistics and related fields with no computational background, and includes an introduction to the basics of programming.

LIN 386M • Learn Grounded Models Meaning

40855 • Fall 2012
Meets TTH 200pm-330pm CAL 323

Co-taught with Jason Baldridge

Natural language processing applications typically need large amounts of information at the lexical level: words that are similar in meaning, idioms and collocations, typical relations between entities, lexical patterns that can be used to draw inferences, and so on. Today such information is mostly collected automatically from large amounts of data, making use of regularities in the co-occurrence of words. But documents often contain more than just co-occurring words, for example illustrations, geographic tags, or a link to a date. Just like co-occurrences between words, these co-occurrences of words and extra-linguistic data can be used to automatically collect information about meaning. The resulting grounded models of meaning link words to visual, geographic, or temporal information. Such models can be used in many ways: to associate documents with geographic locations or points in time, or to automatically find an appropriate image for a given document, or to generate text to accompany a given image.

In this seminar, we discuss different types of extra-linguistic data and their use for the induction of grounded models of meaning.

LIN 392 • Working With Corpora

40860 • Fall 2012
Meets TTH 1100am-1230pm PAR 10

Text corpora hold a wealth of information. To make use of it, you need to know how to find relevant examples, count occurrences of relevant phenomena (and interpret the counts), and decode annotation that other people have added. This course provides a practical, hands-on introduction to working with corpora. We will study the design, annotation formats, and analysis of text corpora.

Topics to be discussed include:

- what types of corpora there are, and what kinds of research questions can be answered using a corpus;

- corpus annotation: principles and standards, formats, examples, and tests for annotation guidelines;

- tools for searching corpora

- an introduction to the programming language Python, concentrating on small programs that process text

- and the basics of statistical modeling of corpus phenomena.

This course is aimed at graduate students in linguistics who would like to use text corpora for their own research. Previous programming experience is not required.

(No required texts.)

LIN 350 • Intro To Computational Ling

40790 • Spring 2012
Meets TTH 200pm-330pm MEZ 1.202

In the age of the Internet, there is a considerable demand for technology helping users to manage, search and access the enormous amount of information that is available. Examples of language technology applications are machine translation, the automatic recognition of the language in which a document is written, automatic extraction of information from documents, and the automatic detection of whether a document expresses a positive or a negative opinion on some topic.

The field of computational linguistics deals both with the creation of such applications and with the science behind them. This course provides an introduction to the key representations and algorithms used in computational linguistics, and it discusses some of the main natural language processing applications.

The course will be oriented towards hands-on experience of language processing techniques, using the Python programming language and the Natural Language Toolkit (NLTK, http://www.nltk.org/). No previous programming experience is required. This class is specifically aimed at students of linguistics and related fields with no computational background, and includes an introduction to the basics of programming.

LIN 386M • Word Meaning And Concepts

41220 • Spring 2011
Meets TTH 200pm-330pm PAR 10

Word meanings are hard to characterize. There are cases in which word senses are
systematically related and can be derived systematically, in
particular through metonymy. But beyond metonymy, dictionaries do not
even agree on how many senses different words have. There has been
work on characterizing word meaning in different areas: lexicography,
theoretical linguistics, cognitive linguistics, psychology,
computational linguistics, and artificial intelligence. In this
graduate seminar, we will study approaches to word meaning
characterization from these different areas, looking at their
commonalities and differences. We will particularly focus on work in
psychology on human concept representation and its application in
linguistics. Some of the topics of the readings will be:

* In what ways can word senses be related, systematically or otherwise?
* Is it reasonable to assume that each word has a fixed number of
senses and that that number can be detected through tests?
* What is the relation between word meaning and human concept representation?
* Prototype representations of word meaning, and their problems
* Frames and structured representations of word meaning
* The representation of word meaning through attributes,
decomposition, and feature norms
* Geometric models of word meaning

Note that this is not a computational seminar, though we will include
computational approaches in the discussion of geometric models. The
main part of the grade will be determined through a course project,
which may be theoretical, a corpus study, or a computational study.

LIN 392 • Analyzing Linguistic Data

41230 • Spring 2011
Meets TTH 1100am-1230pm PAR 10

Co-taught by Katrin Erk and Colin Bannard.

Course Description
Many areas of linguistics require statistical analysis: phonetics and phonology, psycholinguistics, computational linguistics, or empirical studies in syntax or semantics. Statistical analysis helps detect regularities in the numbers, test whether observed effects are statistically significant, and implement and test hypotheses about the data. This course provides a hands-on introduction to statistics for language, using the R programming language. Using data from existing linguistic studies, we will study the following topics:

* data exploration through visualization
* probability distributions
* mean and standard deviation of a single dataset
* comparing pairs of datasets and hypotheses: testing for statistical significance
* regression modeling
* clustering and classification

Texts
R.H. Baayen (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press.
P.R. Hinton (2004). Statistics Explained: A Guide for Social Science Students. Routledge.

LIN 350 • Intro To Computational Ling

40705 • Fall 2010
Meets MWF 1000am-1100am PAR 206

Course Description

In the age of the Internet, there is a considerable demand for
technology helping users to manage, search and access the enormous
amount of information that is available. There is also a need for
speech interfaces to computer systems of various types, from tutoring
systems to automated customer support lines to robots. Examples of
language-technological applications are the identification of the
correct sense of an ambiguous word like "bass" (fish or musical
instrument), automatic recognition of the language in which a document
is written, machine translation, and automatic extraction of
information from documents.

The field of computational linguistics deals with both the science
behind providing such capabilities and the actual creation of
applications which implement them. This course provides an
introduction to the key representations and algorithms used in
computational linguistics, and it discusses the main natural language
processing applications.

The course will be oriented towards hands-on experience of language
processing techniques, using the Python programming language and the
Natural Language Toolkit (NLTK, http://www.nltk.org/). No previous
programming experience is required. This class is specifically aimed
at students of linguistics and related fields with no computational
background.


There will be two exams.

Grading Policy

Homework assignments: 40% In-class exams: 40% Class participation: 20%

Texts

      Jurafsky, D. and J. H. Martin, Speech and language processing:
An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition. Second Edition. Upper Saddle River, NJ:
Prentice-Hall, 2008.

      Additional required readings will be made available for download
from the course website.

      Recommended additional text: Mark Lutz and David Ascher,
Learning Python, O'Reilly.

LIN 392 • Working With Corpora

40825 • Fall 2010
Meets W 1200pm-300pm PAR 10

Course Description

Text corpora hold a wealth of information. To make use of it, you need
to know how to find relevant examples, count occurrences of relevant
phenomena (and interpret the counts), and decode annotation that other
people have added. This course provides a practical, hands-on
introduction to working with corpora. We will study the design,
annotation formats, and analysis of text corpora. Topics to be
discussed include:

- what types of corpora there are, and what kinds of research
questions can be answered using a corpus;

- corpus annotation: principles and standards, formats, examples, and
tests for annotation guidelines;

- tools for searching corpora

- an introduction to the programming language Python, concentrating on
small programs that process text

- and the basics of statistical modeling of corpus phenomena.

This course is aimed at graduate students in linguistics who would
like to use text corpora for their own research. Previous programming
experience is not required.

(No required texts.)

LIN 350 • Intro To Computational Ling

41470 • Fall 2009
Meets TTH 330pm-500pm PAR 206

Syllabus for Introduction to Computational Linguistics: LIN350

 

Course Information

Course: Introduction to Computational Linguistics, LIN350 - 41470

Semester: Fall 2009

Course page: comp.ling.utexas.edu/courses/2009/fall/introduction_to_computational_linguistics

Course times: Tuesday, Thursday 3:30-5pm

Course location: PAR 206.

Instructor Contact Information

Katrin Erk

office hours: Mon 1-3pm, Tue 10-11am

office: Calhoun 512

phone: 471-9020

fax: 471-4340

email: katrin dot erk at mail dot utexas dot edu

Teaching Assistant Contact Information

Farzan Zaheed

office hours: Wed 3-5pm

office: Calhoun 536A

email: farzanzaheed at yahoo.com

 

Lab information

To get an account on the computational linguistics lab machines, please contact our lab administrator, Joey Frazee, at jfrazee @mail.utexas.edu, and cc me on your mail.

 

Prerequisites

Upper-division standing.

Syllabus and Text

This page serves as the syllabus for this course.

Textbook: Jurafsky, D. and J. H. Martin, Speech and language processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd Edition). Prentice-Hall, 2008.

Additional required readings will be made available for download from the course website.

Recommended additional text: Mark Lutz and David Ascher, Learning Python, O'Reilly.

Exams and Assignments

There will be one mid-term exam and one final exam. The midterm will consist of the material covered in the first half of the class, and the final will be comprehensive, but with a greater emphasis on the contents covered in the second half of the class.

There will be four homework assignments. Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates.

To see your grades go to eGradebook.

Philosophy and Goal

The main aim of this course is to introduce the student to the core algorithms, data structures, and applications of computational linguistics. Students will gain an appreciation for the difficulties inherent in NLP and an understanding of strategies for tackling them. The course will address both theoretical and applied topics.

Some specific goals of the course are to enable students to:

  • understand core algorithms and data structures used in NLP
  • write non-trivial programs for NLP (using the Python programming language)
  • appreciate the relationship between linguistic theory and computational applications
  • write computational grammars and analyze their adequacy
  • gain insight into the possibilities and difficulties of automatic semantic analysis of text

 

This course presents an opportunity for students to gain experience with models and algorithms used in computational linguistics that underlie practical applications, and at the same time to get to know the underlying theoretical questions.

Content Overview

The field of computational linguistics has experienced significant growth in the last ten years. In addition to the hard work of researchers in the field in general, some of the most important factors behind this include the use of statistical techniques, the availability of large (sometimes annotated) corpora (including the web itself), and the availability of relatively cheap and powerful computers. Together, these factors have played a major part in making computational linguistics very relevant in applied settings. This course will discuss many of the core technologies and techniques used in computational linguistics, such as finite-state methods, context-free grammars and parsing, and semantics construction.

This course provides a broad introduction to computational linguistics with a particular emphasis on core algorithms and data structures. Topics include:

  • finite-state automata and transducers
  • computational morphology
  • part-of-speech tagging and chunking
  • context-free grammars and parsing
  • computational semantics
  • lexical semantics
  • grammar engineering
  • applications that use computational linguistics: machine translation, search, information extraction

 

A detailed schedule for the course, with topics for each lecture, is available at http://comp.ling.utexas.edu/courses/2009/fall/introduction_to_computational_linguistics/schedule

Course Requirements and Grading Policy

Assignments (10% each):

A series of four assignments will be assigned during the semester.

Mid-term Exam (20%):

There will be a mid-term exam on October 15 over the material covered in class up to October 13.

Final Exam (30%):

The final exam will be given on December 9 and will cover all course material.

Participation (10%):

There will be exercises and interactions in class which will in part determine your score on participation. Also, while there is no explicit attendance policy, a lower participation score will be given to those who regularly miss class.

Given that homeworks and the exams address the material covered in class, attendance is essential for doing well in this class.

Final grades will not use plus/minus grades.
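
As an illustration only, the weighting above can be written as a short Python function. This is a sketch that assumes every component is scored out of 100; the function name `course_score` is made up for this example, and the mapping from the resulting score to a letter grade is not specified here.

```python
def course_score(assignments, midterm, final, participation):
    """Weighted course score on a 0-100 scale.

    Weights follow the grading policy above: four assignments at
    10% each, midterm 20%, final 30%, participation 10%.
    Assumption: every component score is given out of 100.
    """
    if len(assignments) != 4:
        raise ValueError("expected exactly four assignment scores")
    # Integer percent weights keep the arithmetic exact for integer scores.
    weighted = (
        10 * sum(assignments)
        + 20 * midterm
        + 30 * final
        + 10 * participation
    )
    return weighted / 100


if __name__ == "__main__":
    print(course_score([90, 80, 100, 70], midterm=85, final=90, participation=100))
```

The weights add up to 100%, so a student with full marks on every component receives a score of 100.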

Extension Policy

If you turn in your assignment late, expect points to be deducted. Extensions will be considered on a case-by-case basis, but in most cases they will not be granted.

For other assignments, by default, 5 points (out of 100) will be deducted for lateness, plus an additional 1 point for every 24-hour period beyond 2 that the assignment is late. For example, an assignment due at 2pm on Tuesday will have 5 points deducted if it is turned in late but before 2pm on Thursday. It will have 6 points deducted if it is turned in by 2pm Friday, etc.
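
The default deduction described above can be sketched in Python as follows. This is one reading of the rule, not an official calculator: the function name `late_deduction` is hypothetical, and the treatment of a submission arriving exactly on a 24-hour boundary is an assumption chosen to match the Tuesday/Thursday/Friday example.

```python
import math
from datetime import datetime


def late_deduction(due, submitted):
    """Points (out of 100) deducted for a late submission.

    Rule from the text: 5 points for lateness, plus 1 point for every
    24-hour period beyond the first two.
    Assumption: a submission exactly on a 24-hour boundary counts as
    falling within the period that ends at that boundary.
    """
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time, no deduction
    periods = math.ceil(seconds_late / 86400)  # 24-hour periods started
    return 5 + max(0, periods - 2)


if __name__ == "__main__":
    due = datetime(2009, 10, 6, 14, 0)  # a Tuesday, 2pm
    print(late_deduction(due, datetime(2009, 10, 8, 13, 0)))  # Thursday 1pm
    print(late_deduction(due, datetime(2009, 10, 9, 14, 0)))  # Friday 2pm
```

With a Tuesday 2pm deadline, a Thursday 1pm submission loses 5 points and a Friday 2pm submission loses 6, in line with the example in the text.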

The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. Please contact the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

LIN 392 • Working With Corpora

40660 • Spring 2009
Meets TTH 330pm-500pm MEZ 1.120

Across the field of linguistics, more and more scholars are using experiments to investigate linguistic issues, and to provide evidence that stands up to skeptical scrutiny. This course is a hands-on introduction to how one does this. Students will design and run two small-scale experiments on any linguistic issue they choose: a production experiment oriented toward the speaker, and a corresponding reception experiment oriented toward the listener. The topic of the research can be in any area of linguistics, and involve any particular language.

We will work through the process of doing an experiment step by step from the beginning:

  • Coming up with a hypothesis
  • Designing an experiment
  • Formulating and submitting a human subjects research proposal
  • Recording speech
  • Doing acoustic measurements using Praat
  • Synthesizing speech using Praat
  • Presenting stimuli
  • Running the experiment
  • Doing basic statistical analysis in R
  • Presenting results of experimental work - orally and in print.

In order to learn how your work can be interpreted and evaluated, we will read and critique published work from the experimental literature in various areas of linguistics, including phonetics, phonology, syntax, semantics, language acquisition, language processing, sociolinguistics, and language change. The readings will be made available in electronic form.

The grade for the course will be based on two short papers presenting the results from the two pilot experiments.

The course is open to any graduate student.
