LSA.343 Precision Grammar Implementation for Linguistic Hypothesis Testing

Emily M. Bender
ebender @ u.washington.edu

Dan Flickinger
danf @ csli.stanford.edu

Stephan Oepen
oe @ csli.stanford.edu

Tuesday/Friday, 10:15am-12pm (Location: 460-021)

Office hours: 3:30pm-5:30pm (Location: 460-021)

Contents

a brief summary of course contents and goals;

draft schedule of course and exercise topics;

background information on the LinGO Laboratory at CSLI Stanford;

obtaining the LKB package (source code and binaries for certain platforms);

instructions for course participants: using the LKB on Sweet Hall machines.


Course Summary

Precision grammar implementation is the practice of encoding linguistic constraints in ever larger, machine-readable grammar fragments and testing those fragments against hand-constructed test suites as well as naturally occurring text. By using the machine to compare the grammar to the data, grammar engineers are able to test their hypotheses against thousands of sentences in mere minutes, test their analyses of different phenomena for consistency, and test their hypotheses against corpus data that goes beyond the carefully selected examples needed in analyses of particular linguistic phenomena.

This class combines lectures and hands-on laboratory sessions to explore the methodology and implications of precision grammar implementation, including basic grammar engineering techniques; treebank annotation, using the English Resource Grammar and other existing large resource grammars; test suite development in the context of multilingual grammar engineering; and machine translation as an application for precision grammars.

Within this general research area, we intend this course to introduce students to several fundamental questions and possibilities, and to give them hands-on experience with some of the relevant technology. Specifically, by the end of the course, students should:

Expected Schedule

Date          Topic                                                                      Background Readings
Fri, July 6   Lecture: motivation, mechanics of grammar engineering, formalism           Copestake 2002
              Lab: simple grammar of English: complementation; use of types
Tue, July 10  Lecture: bridging theory and implementation (HPSG fundamentals)            Sag et al. 2003 (Ch 3-5), Flickinger 2000
              Lab: modification; better use of types
Fri, July 13  Lecture: ambiguity, the role of linguistic data, regression testing        Oepen and Flickinger 1998
              Lab: the lexeme -- word distinction; lexical rules
Tue, July 17  Lecture: multilingual grammar implementation; semantics                    Bender et al. 2002, Copestake et al. 2005
              Lab: Matrix configuration; plan additional phenomenon to add
Fri, July 20  Lecture: scalability; interoperability of analyses across languages        Bender to appear
              Lab: Extend Matrix-derived grammar
Tue, July 24  Lecture: treebanking; application requirements: stochastic disambiguation  Oepen et al. 2004
              Lab: treebanking; observe impact on parse selection models
Fri, July 27  Lecture: machine translation                                               Oepen et al. 2004, Flickinger et al. 2005
              Lab: machine translation

Materials

Slides                        Grammar    Exercise    Solution
Overview, Formalism           Grammar 1  Exercise 1  Solution 1
HPSG/Modification             Grammar 2  Exercise 2  Solution 2
Lexical rules/types           Grammar 3  Exercise 3  Solution 3a or 3b
Grammar Matrix/MRS                       Exercise 4
                                         Appendix
Hypothesis Testing                       Exercise 5
Treebanks and Disambiguation
Corpus Analysis

To learn more...


last modified: 26-jan-07 (danf@csli.stanford.edu)