Anthony C. Woodbury, Chair CLA 4.304, Mailcode B5100, Austin, TX 78712 • 512-471-1701

Jason M Baldridge

Associate Professor Ph.D., University of Edinburgh

Biography

Baldridge is a computational linguist whose research focuses on probabilistic computational models of syntax and discourse structure and on machine learning for natural language processing in general. His most recent research is on evaluating the effectiveness of active learning in real-life annotation scenarios (e.g., language documentation) and on hierarchical Bayesian models for supertagging, parsing and discourse structure. Baldridge won the 2003 Beth Dissertation Prize from the European Association for Logic, Language, and Information, and he is a recipient of best paper awards from the European Chapter of the Association for Computational Linguistics and the North American Chapter of the Association for Computational Linguistics.

Interests

Computational Linguistics, Syntax, Parsing, Machine Learning

LIN 313 • Language And Computers

40915 • Spring 2013
Meets MWF 1000am-1100am GAR 1.126

This undergraduate class looks at everyday tasks that involve natural language processing: document classification, spelling and grammar correction, dialogue systems, machine translation, cryptography and forensic linguistics. Students will get insight into how these systems work (and why it is still so difficult to do natural language processing well). We also examine social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.

LIN 386M • Applied Natrl Lang Processing

41030 • Spring 2013
Meets MW 100pm-230pm GAR 1.134

Advances in computational linguistics, machine learning, and computer hardware over the last two decades have produced a powerful set of tools and capabilities for the automatic processing of natural language texts. We now have access to a massive quantity of free-form natural language text available on the Internet and in large text collections (including out-of-copyright books). Increasing quantities of text in a wide variety of languages are being produced every day through news, blogs, and social media. Consequently, the ability to process natural language to categorize and cluster texts, find and visualize patterns in them, or even just to find them at all has become increasingly important. A wide variety of disciplines, from linguistics to psychology to archeology to literature and beyond, are coming to rely on natural language processing tools that enable them to ask new questions of corpora of interest to them. This is particularly evident in the ascendancy of digital humanities, where researchers would often like to be able to identify interesting patterns in corpora that are too large to be manually inspected. There is also a great deal of commercial interest in systems that can process unstructured textual data to extract, categorize, and present the information contained in it, and in some cases, to use it to predict things about the real world, such as the expected opening day revenues for movies based on social media chatter.

This class will provide instruction on applying algorithms in natural language processing and machine learning for experimentation and for real world tasks, including clustering, classification, part-of-speech tagging, named entity recognition, topic modeling, and more. The approach will be practical and hands-on: for example, students will program common classifiers from the ground up, use existing toolkits such as OpenNLP, StanfordNLP, Mallet, and Breeze, construct NLP pipelines with UIMA, and get some initial experience with distributed computation with Hadoop. Guidance will also be given on software engineering, including build tools, git, and testing. It is assumed that students are already familiar with machine learning and/or computational linguistics and that they are already competent programmers. The programming language used in the course will be Scala; no explicit instruction will be given in Scala programming, but resources and assistance will be provided for those new to the language.

LIN 313 • Language And Computers

40750 • Fall 2012
Meets TTH 1100am-1230pm JES A215A

This undergraduate class looks at everyday tasks that involve natural language processing: document classification, spelling and grammar correction, dialogue systems, machine translation, cryptography and forensic linguistics. Students will get insight into how these systems work (and why it is still so difficult to do natural language processing well). We also examine social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.

LIN 386M • Applied Text Analysis

40885 • Spring 2012
Meets W 1100am-200pm PAR 10

Advances in computational linguistics, machine learning, and computer hardware over the last two decades have produced a powerful set of tools and capabilities for the automatic processing of natural language texts. We now have access to a massive quantity of free-form natural language text available on the Internet and in large text collections (including out-of-copyright books). Increasing quantities of text in a wide variety of languages are being produced every day through news, blogs, and social media. Consequently, the ability to process natural language to categorize and cluster texts, find and visualize patterns in them, or even just to find them at all has become increasingly important. A wide variety of disciplines, from linguistics to psychology to archeology to literature and beyond, are coming to rely on natural language processing tools that enable them to ask new questions of corpora of interest to them. This is particularly evident in the ascendancy of digital humanities, where researchers would often like to be able to identify interesting patterns in corpora that are too large to be manually inspected. There is also a great deal of commercial interest in systems that can process unstructured textual data to extract, categorize, and present the information contained in it, and in some cases, to use it to predict things about the real world, such as the movement of stock prices.

This class will provide a practical introduction to many of the core algorithms in natural language processing and machine learning that are useful in a wide variety of text analysis applications, such as authorship attribution, sentiment analysis, information extraction and geolocation. We will cover algorithms for clustering, classification, part-of-speech tagging, topic modeling and named entity recognition, as well as methodologies for evaluating their success and methods for visualizing their outputs. The course will include an introduction to the programming language Scala, which will be used for homework assignments. Assignments will provide experience with the methods as well as with popular open source toolkits such as Apache OpenNLP and Mallet. No prior programming experience is assumed.

More information can be found at the course website: http://ata-s12.utcompling.com/

LIN 386M • Intro To Computational Ling

40778 • Fall 2011
Meets W 1200pm-300pm PAR 10

Advances in computational linguistics have not only led to industrial applications of language technology; they can also provide useful tools for linguistic investigations of large online collections of text and speech, or for the validation of linguistic theories.

Introduction to Computational Linguistics introduces the most important data structures and algorithmic techniques underlying computational linguistics: regular expressions and finite-state methods, context-free grammars and parsing, feature structures and unification, taxonomies, distributional representations and pattern-based approaches. The linguistic levels covered are morphology, syntax, semantics and lexical semantics. While the focus is on the symbolic basis underlying computational linguistics, a high-level overview of statistical techniques in computational linguistics will also be given. We will apply the techniques in actual programming exercises, using the programming language Python and the Natural Language Toolkit. Practical programming techniques, tips and tricks, including version control systems, will also be discussed.

Course site: http://icl-f11.utcompling.com

LIN 313 • Language And Computers

41090 • Spring 2011
Meets MWF 1000am-1100am PAR 206

This undergraduate class looks at everyday tasks that involve natural language processing: document classification, spelling and grammar correction, dialogue systems, machine translation, cryptography and forensic linguistics. Students will get insight into how these systems work (and why it is still so difficult to do natural language processing well). We also examine social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.


Course Requirements

Assignments (45%): A series of six assessed assignments will be given during the semester. The lowest grade will be dropped, so each of the five assignments that count is worth 9%.

Essay (15%): A 1000-1500 word essay on a topic dealing with the social implications of computational applications for language.

Mid-term Exam (20%): There will be a mid-term exam on October 15 over the material covered in class up to October 6.

Final Exam (20%): The final exam will be given during finals week and will cover all course material.

The course will use plus-minus grading, using standard scales. Attendance is not required, and it is not used as part of determining the grade.


Syllabus and Text

Syllabus is here: https://sites.google.com/site/languageandcomputersfall2010/

There is no official textbook for this course since the topic is quite new. Some readings will be assigned and made available for download or copying.

LIN 350 • Natural Language Processing

41110 • Spring 2011
Meets MWF 1200pm-100pm CBA 4.328

In the age of the Internet, there is a considerable demand for technology helping users to manage, search and access the enormous amount of information that is available. There is also a need for speech interfaces to computer systems of various types, from tutoring systems to automated customer support lines to robots. Examples of language-technological applications are the identification of the correct sense of an ambiguous word like “bass” (fish or musical instrument), automatic recognition of the language in which a document is written, machine translation, and automatic extraction of information from documents.

The field of computational linguistics deals with both the science behind providing such capabilities and the actual creation of applications which implement them. This course discusses the main natural language processing applications and provides an introduction to the key representations and algorithms used in computational linguistics and to the use of machine learning for NLP tasks.

The course will be oriented towards hands-on experience of language processing techniques. Previous programming experience is required.


Course Requirements
Assignments (70%): A series of six assessed, equally-weighted assignments will be given out during the semester.
Mid-term Exam (15%): There will be a mid-term exam over the material covered during the first half of the semester.
Final Exam (15%): There will be a final exam over material covered after the mid-term.
The course will use plus-minus grading, using standard scales. Attendance is not required, and it is not used as part of determining the grade.


Syllabus and Text

Here is the syllabus: http://comp.ling.utexas.edu/courses/2010/spring/natural_language_processing

The official course textbook: Jurafsky, D. and J. H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition. Upper Saddle River, NJ: Prentice-Hall, 2008.

LIN 313 • Language And Computers

40675 • Fall 2010
Meets TTH 1100am-1230pm PAR 1

Course Description

This undergraduate class looks at real world tasks that involve natural language processing: authorship attribution, text classification, spelling and grammar correction, machine translation, cryptography and some others. Students will get insight into how these systems work (and why it is still so difficult to do natural language processing well). We also examine social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.

Grading Policy

Assignments (8% each): A series of six assessed assignments will be given during the semester.

Essay (12%): A 1000-1500 word essay on a topic dealing with the social implications of computational applications for language.

Mid-term Exam (20%): There will be a mid-term exam on October 15 over the material covered in class up to October 6.

Final Exam (20%): The final exam will be given during finals week and will cover material discussed after the midterm.

The course will use plus-minus grading, using standard scales.

Texts

There is no appropriate textbook for this course. Reading material will be made available through the course website.

LIN 386M • Semi-Supv Learn For Comp Ling

40803 • Fall 2010
Meets TTH 200pm-330pm PAR 101

Course Description

The field of computational linguistics has undergone a major shift over the last two decades toward statistical methods. For some tasks, such as language modeling, there is a wealth of data available for training models, but for many tasks, the performance of models is severely limited by the amount of relevant labeled training material. Semisupervised learning seeks to use small amounts of annotated data in combination with (possibly) large amounts of raw text to improve performance over just using the annotated data by itself. This class will look at the theory and methods behind semisupervised learning methods in the context of computational linguistics.

Texts

Abney. Semisupervised Learning for Computational Linguistics. We will also make use of other readings from books, articles and lecture notes, which will be made available on the course website.

LIN 312 • Language And Computers

41090 • Spring 2010
Meets MWF 1000-1100 PAR 206

LIN 350 • Natural Language Processing

41130 • Spring 2010
Meets MWF 1100-1200 CBA 4.328
(also listed as C S 378 )

LIN 392 • Analyzing Linguistic Data

41590 • Fall 2009
Meets TTH 1230pm-200pm PAR 10

Across the field of linguistics, more and more scholars are using experiments to investigate linguistic issues, and to provide evidence that stands up to skeptical scrutiny. This course is a hands-on introduction to how one does this. Students will design and run two small-scale experiments on any linguistic issue they choose: a production experiment oriented toward the speaker, and a corresponding reception experiment oriented toward the listener. The topic of the research can be in any area of linguistics, and involve any particular language.

We will work through the process of doing an experiment step by step from the beginning:

  • Coming up with a hypothesis
  • Designing an experiment
  • Formulating and submitting a human subjects research proposal
  • Recording speech
  • Doing acoustic measurements using Praat
  • Synthesizing speech using Praat
  • Presenting stimuli
  • Running the experiment
  • Doing basic statistical analysis in R
  • Presenting results of experimental work - orally and in print.

In order to learn how your work can be interpreted and evaluated, we will read and critique published work from the experimental literature in various areas of linguistics, including phonetics, phonology, syntax, semantics, language acquisition, language processing, sociolinguistics, and language change. The readings will be made available in electronic form.

The grade for the course will be based on two short papers presenting the results from the two pilot experiments.

The course is open to any graduate student.

Publications

 

For a complete list of publications, click link below:

http://www.jasonbaldridge.com/papers
