Topics in Computational Linguistics:
Grammar Engineering

Dan Flickinger
danf @ csli.stanford.edu

Stephan Oepen
oe @ csli.stanford.edu

Tuesday, 2:15 - 3:05 (Braun-221), Thursday, 2:15 - 4:05 (Meyer-183)

office hours: Tuesday, 4:15 - 5:45 (Cordura-227)

Contents

a brief summary of course contents and goals;

draft schedule of course and exercise topics;

background information on the LinGO Laboratory at CSLI Stanford;

obtaining the LKB package (source code and binaries for certain platforms);

instructions for course participants: using the LKB on Sweet Hall machines.


Course Summary

From machine translation to speech recognition and web-based search engines, a wide range of applications demand increasing accuracy and robustness from natural language processing. Meeting these demands will require better hand-built grammars of human languages combined with sophisticated statistical processing methods.

In this course we will focus on the implementation of linguistic grammars, drawing on a combination of sound grammatical theory and engineering skills, and provide a hands-on introduction to the necessary techniques. A combination of lectures and in-class exercises will enable students to investigate the implementation of constraints in morphology, syntax, and semantics, working within a unification-based lexicalist framework. While most of the course work will focus on developing small grammars for English, we will apply our jointly acquired grammar engineering expertise to at least one other language towards the end of the term.

A basic knowledge of syntactic theory (at about the level of Linguist 120) will be assumed, but no prior programming skills are required. There will be eight hands-on exercises assigned throughout the course (see the draft schedule below) that will form the basis for joint laboratory sessions; we will not try to complete each exercise during the laboratory hours, but instead expect students to continue implementation work individually outside of class hours. Each assignment is expected to take between two and ten hours to complete, and students will be asked to submit their solutions electronically.

Exercises will be graded and contribute substantially towards the final course assessment; exercise results will be complemented by a 90-minute written exam in March (exact time and date to be confirmed).

Expected Schedule

Date | Topic | Reading
Tue, January 4 | Lecture: Course Overview, Motivation, and Goals | SWB 1, 2.1 – 2.7
Thu, January 6 | Lecture: Typed Feature Structures for Linguistic Description | SWB 3.1 – 3.5
Tue, January 11 | Lecture: History of Unification-Based Grammar in Fifty Minutes | SWB Appendix B
Thu, January 13 | Laboratory: Assignment 1 (First Steps Using the LKB System) |
Tue, January 18 | Lecture: Basic Syntagmatic Relations; Some of the LKB Machinery | SWB 4.1 – 4.6
Thu, January 20 | Laboratory: Assignment 2 (Phrase Structure Recursion and Modification) |
Tue, January 25 | Lecture: Fine Points of our (Implemented) Analysis of Modification |
Thu, January 27 | Laboratory: Assignment 3 (Lexical Rules) |
Tue, February 1 | Lecture: Meaning Composition; Minimal Recursion Semantics |
Thu, February 3 | Laboratory: Assignment 4 (Semantic Composition and Generation) |
Tue, February 8 | Lecture: A Little More Semantics; Fine Points of the LKB Machinery | Copestake, et al. (1999)
Thu, February 10 | Laboratory: Assignment 5 (A Grammar of Esperanto) |
Tue, February 15 | Lecture: Construction Semantics |
Thu, February 17 | Laboratory: Assignment 6 (Semantics in Esperanto) |
Tue, February 22 | Lecture: Long Distance Dependencies in Unification-Based Grammar |
Thu, February 24 | Laboratory: Assignment 7 (Topicalization and Relative Clauses) |
Tue, March 1 | Lecture: Linguistic Grammars in Machine Translation | Oepen, et al. (2004)
Thu, March 3 | Laboratory: Assignment 8 (Esperanto – English Machine Translation) |
Tue, March 8 | Lecture: Summary; Exam Preparation |
Thu, March 10 | Lecture: Hybrid NLP: Combining Symbolic and Stochastic Approaches |
Wed, March 16 | Written Exam: 7:00 – 10:00 pm @ Braun-221 |

Materials

Slides Grammar Exercise Solution
Overview, Formalism no grammar no exercise no solution
Brief History Grammar 1 Exercise 1 Solution 1
Fundamentals of Grammar Grammar 2 Exercise 2 Solution 2
Modification Grammar 3 Exercise 3 Solution 3A & 3B
Semantics Grammar 4 Exercise 4 Solution 4
Types vs. Instances Grammar 5; Esperanto Data, Exercise no solution
Construction Semantics no grammar Exercise 6 Solution 6
Non-Local Dependencies Grammar 7 Exercise 7 Solution 7
Machine Translation MT Package Exercise 8  
Sample Exam, Redwoods      

Background Reading


last modified: 10-mar-05 (oe@csli.stanford.edu)