Syllabus for Introduction to Computational Linguistics I: LIN386

Instructor Contact Information

office hours: Thur/Fri, 1:30-3pm or by appointment
office: Calhoun 512
phone: 471-9020
fax: 471-4340
email: jbaldrid@mail.utexas.edu

Prerequisites

Graduate standing. Syntax I or consent of instructor.

Syllabus and Text

This page serves as the syllabus for this course.

The official course textbook:

Jurafsky, D. and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Upper Saddle River, NJ: Prentice-Hall, 2000.

Selected readings from this text will be suggested, along with relevant research papers.

Assignments

Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates. To see your grades, go to eGradebook.

Philosophy and Goal

The foremost goal of this course is to expose the student to the core techniques and applications of computational linguistics, with a primary focus on symbolic approaches. Students will gain an appreciation for the difficulties inherent in NLP and an understanding of strategies for tackling them. The course will address both theoretical and applied topics.

Some specific goals of the course are to enable students to:

  • understand core algorithms and data structures used in NLP
  • write non-trivial programs for NLP (using the Python programming language)
  • build and use finite state transducers with XFST
  • appreciate the relationship between linguistic theory and computational applications, especially with respect to syntax
  • write computational grammars and analyze their adequacy
  • complete a non-trivial NLP project and write a report in the format of submissions to computational linguistics conferences

This course presents an opportunity for students to gain experience with the models and algorithms of computational linguistics that underlie practical applications, while gaining an appreciation for the theoretical questions that these models raise and can help us tackle. It will thus help prepare the student both for jobs in industry and for doing original research in computational linguistics.

Evaluation will be based on the project and homeworks. There will be no exams.

Content Overview

The field of computational linguistics has experienced significant growth in the last ten years. In addition to the hard work of researchers in the field in general, some of the most important factors behind this include the use of statistical techniques, the availability of large (sometimes annotated) corpora (including the web itself), and the availability of relatively cheap and powerful computers. Together, these factors have played a major part in making computational linguistics very relevant in applied settings. This course will focus on many of the core technologies and techniques used in computational linguistics, such as finite-state methods, context-free grammars, and parsing. It will also serve as an introduction to Python programming and programming for NLP.

This course provides a broad introduction to computational linguistics with a particular emphasis on core algorithms and data structures. Topics include:

  • Python programming
  • finite-state automata and transducers
  • morphology
  • context-free grammars, categorial grammars, and parsing
  • feature structures and unification
  • computational semantics

There will be four programming assignments and a project. There will be a project proposal halfway through the semester with an opportunity for revisions in the form of a progress report. The grade for the final project will be based largely on the written report due at the end of the semester and a presentation on the project given during the final week of class.

The sequel to this course, Computational Linguistics II, addresses empirical methods (primarily statistical) and applications of natural language processing.

Content Objective

With respect to content, the goal of this course is to give the student an appreciation for the broad research topics currently being pursued in the field of computational linguistics. By the end of the course, the student should be able to:

  • identify and discuss the characteristics of different NLP techniques; and
  • implement finite-state transducers and context-free parsers.

The course is designed to include key activities engaged in by computational linguistics researchers, including generation of ideas and programs, critical oral discussion of ideas, and written evaluation and presentation of ideas.

The course is designed to help students make the transition to doing real research in the field. For interested students, it could lead to subsequent research opportunities.

Course Requirements

Assignments (15% each):

There will be four assignments during the semester. Their purpose is to give you direct experience with the tools and techniques covered in class and in the readings. Some assignments will be done individually, and others will be done by students working in groups of two. For the paired assignments, however, each student must still turn in an independent write-up and an evaluation of their partner.

Project proposal draft (5%):

Midway through the semester, you will propose a topic for your final project. You are encouraged to discuss this with the instructor in advance. Suggested topics will also be made available for you to choose from. The proposal will be in written form and should be roughly 2-3 single-spaced pages, preferably done using LaTeX and the ACL submission style. The draft will be evaluated primarily on written expression and coherence of argument. Feedback will be given on both writing and content.

Project progress report (5%):

The progress report is mainly a revision of the proposal. It should take into account both types of comments given on the proposal (writing and content). Expect it to require significant rewriting, as opposed to just editing of the proposal. In addition, it should include an update on progress to date. It will be graded primarily on written expression and coherence of argument. Feedback will be given on content.

Project final report (20%):

The final report builds on the progress report and presents the project results and conclusions. It should be 4-8 pages in length, using LaTeX and the ACL submission style. The grade will be based on the final product (program, corpus, etc.) and the written report.

Project presentation (10%):

Each student will give a 15-minute presentation on his or her project in the last week of class.
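As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored out of 100 and that the components are combined as a simple weighted average; the syllabus does not spell out the exact computation, and the names WEIGHTS and final_grade are hypothetical.

```python
# Percentage weights taken from the Course Requirements section.
# The component names are hypothetical labels used only for this sketch.
WEIGHTS = {
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "assignment_4": 15,
    "proposal_draft": 5,
    "progress_report": 5,
    "final_report": 20,
    "presentation": 10,
}


def final_grade(scores):
    """Combine component scores (each out of 100) into a course score.

    Assumption: a plain weighted average; the actual grading scheme
    may differ.
    """
    assert sum(WEIGHTS.values()) == 100, "weights should total 100%"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example_scores = {
    "assignment_1": 90,
    "assignment_2": 90,
    "assignment_3": 90,
    "assignment_4": 90,
    "proposal_draft": 80,
    "progress_report": 80,
    "final_report": 85,
    "presentation": 100,
}
print(final_grade(example_scores))  # 89.0
```

Note that the four assignments together account for 60% of the grade and the project components for the remaining 40%.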

Extension Policy

If you turn in your assignment late, expect points to be deducted. Extensions will be considered on a case-by-case basis, but in most cases they will not be granted.

For other assignments, by default, 5 points (out of 100) will be deducted for lateness, plus 1 additional point for every 24-hour period beyond the first two that the assignment is late. For example, an assignment due at 2pm on Tuesday will have 5 points deducted if it is turned in late but before 2pm on Thursday. It will have 6 points deducted if it is turned in by 2pm Friday, and so on.
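The default deduction can be sketched in Python as follows. This is only an illustration of the rule as stated above, not an official grading tool: the function name late_penalty is hypothetical, the due date is an arbitrary Tuesday chosen for the example, and a submission that lands exactly on a 24-hour boundary is assumed to count toward the earlier period.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)


def late_penalty(due, submitted):
    """Points (out of 100) deducted under the default lateness rule.

    Assumption: a submission exactly on a 24-hour boundary counts
    toward the earlier period.
    """
    if submitted <= due:
        return 0
    # Count the 24-hour periods the submission has entered.
    full, remainder = divmod(submitted - due, DAY)
    periods = full + (1 if remainder else 0)
    # Flat 5 points, plus 1 point per period beyond the first two.
    return 5 + max(0, periods - 2)


due = datetime(2024, 1, 2, 14, 0)  # a Tuesday, 2pm (hypothetical date)
print(late_penalty(due, datetime(2024, 1, 4, 13, 0)))  # Thursday 1pm: 5
print(late_penalty(due, datetime(2024, 1, 5, 14, 0)))  # Friday 2pm: 6
```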

The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates, but all written work must be your own. Programming assignments must also be your own, except for 2-person team assignments. All work, ideas, quotes, and code fragments that originate from elsewhere must be cited according to standard academic practice. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, I will work with you to make appropriate arrangements.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

Page maintained by Jason Baldridge.
Questions? Send me mail: jbaldrid@mail.utexas.edu.