LSA.343 Precision Grammar Implementation for Linguistic Hypothesis Testing
Emily M. Bender
ebender @ u.washington.edu
Dan Flickinger
danf @ csli.stanford.edu
Stephan Oepen
oe @ csli.stanford.edu
Tuesday/Friday, 10:15am-12pm (Location: 460-021)
office hours: 3:30pm-5:30pm (Location: 460-021)
Contents
- a brief summary of course contents and goals;
- draft schedule of course and exercise topics;
- background information on the LinGO Laboratory at CSLI Stanford;
- obtaining the LKB package (source code and binaries for certain platforms);
- instructions for course participants: using the LKB on Sweet Hall machines.
Course Summary
Precision grammar implementation is the practice of encoding
linguistic constraints in ever larger, machine-readable grammar
fragments and testing those fragments against hand-constructed test
suites as well as naturally occurring text. By using the machine to
compare the grammar to the data, grammar engineers are able to test
their hypotheses against thousands of sentences in mere minutes, test
their analyses of different phenomena for consistency, and test their
hypotheses against corpus data that goes beyond the carefully selected
examples needed in analyses of particular linguistic phenomena. This
class combines lectures and hands-on laboratory sessions to explore
the methodology and implications of precision grammar implementation,
including basic grammar engineering techniques; treebank annotation,
using the English Resource Grammar and other existing large resource
grammars; test suite development in the context of multilingual
grammar engineering; and machine translation as an application for
precision grammars.
Within this general research area, we intend this course to introduce
students to several fundamental questions and possibilities, and to give
them hands-on experience with some of the relevant technology. Specifically,
by the end of the course, students should:
- Appreciate the potential for computerized validation of
linguistic hypotheses, both against constructed and naturally occurring data
and for consistency with other hypotheses about interacting phenomena.
- Gain experience in constructing linguistic
hypotheses of the sort which can be tested by this methodology.
- Have some exposure to the implications for linguistic theory:
What requirements does this kind of relationship to empirical
foundations put on the theory in terms of a stable formalism and
precision of analyses? How much and what kind of testing does it take
to validate a hypothesis?
- Be familiar with basic methodological and practical issues in test suite design.
- Be aware of existing software and knowledge base resources
provided by DELPH-IN (the LKB grammar development system, the
[incr tsdb()] platform for test suite management and competence
profiling, the Grammar Matrix starter kit for precision grammars)
and others.
- Have enough hands-on experience applying the techniques to
feel that they are approachable and relevant to their own future work.
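
The test-suite methodology described in the course summary can be illustrated with a minimal sketch. It is not the LKB or [incr tsdb()]: it is a plain Python stand-in in which a toy context-free grammar is run over a small hand-constructed test suite, and the items where the grammar disagrees with the recorded judgment are reported. The grammar, the lexicon, the test items, and the names `recognizes` and `profile` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not from the course materials).
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "barks": {"VP"},   # intransitive verb, treated directly as a VP
    "chases": {"V"},   # transitive verb, needs an NP object
}
RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def recognizes(sentence):
    """CKY recognition: True if the toy grammar licenses the sentence as an S."""
    words = sentence.split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left, right in product(chart[start][mid], chart[mid][end]):
                    chart[start][end] |= RULES.get((left, right), set())
    return "S" in chart[0][n]

# Hand-constructed test suite: (item, expected grammaticality judgment).
TEST_SUITE = [
    ("the dog barks", True),
    ("the dog chases the cat", True),
    ("the cat chases the dog", True),
    ("dog the barks", False),
    ("the dog chases", False),
    ("the dog barks the cat", False),
]

def profile(test_suite):
    """Return (grammatical items the grammar rejects, ungrammatical items it accepts)."""
    coverage_failures = [s for s, ok in test_suite if ok and not recognizes(s)]
    overgeneration = [s for s, ok in test_suite if not ok and recognizes(s)]
    return coverage_failures, overgeneration

if __name__ == "__main__":
    missed, overgenerated = profile(TEST_SUITE)
    print("coverage failures:", missed)
    print("overgeneration:", overgenerated)
```

Rerunning `profile` after every change to the grammar is the regression-testing idea in miniature: a change that fixes one phenomenon but breaks another shows up as a new entry in one of the two lists.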
Expected Schedule
| Date | Topic | Background Readings |
|---|---|---|
| Fri, July 6 | Lecture: motivation, mechanics of grammar engineering, formalism. Lab: simple grammar of English: complementation; use of types | Copestake 2002 |
| Tue, July 10 | Lecture: bridging theory and implementation (HPSG fundamentals). Lab: modification; better use of types | Sag et al 2003 (Ch 3-5), Flickinger 2000 |
| Fri, July 13 | Lecture: ambiguity, the role of linguistic data, regression testing. Lab: the lexeme-word distinction; lexical rules | Oepen and Flickinger 1998 |
| Tue, July 17 | Lecture: multilingual grammar implementation; semantics. Lab: Matrix configuration; plan additional phenomenon to add | Bender et al 2002, Copestake et al 2005 |
| Fri, July 20 | Lecture: scalability; interoperability of analyses across languages. Lab: extend Matrix-derived grammar | Bender to appear |
| Tue, July 24 | Lecture: treebanking; application requirements: stochastic disambiguation. Lab: treebanking; observe impact on parse selection models | Oepen et al 2004 |
| Fri, July 27 | Lecture: machine translation. Lab: machine translation | Oepen et al 2004, Flickinger et al 2005 |
Materials
To learn more...
- Baldwin, T., J. Beavers, E.M. Bender, D. Flickinger,
A. Kim and S. Oepen (2005) "Beauty and the Beast: What running
a broad-coverage precision grammar over the BNC taught us about the
grammar---and the corpus". In Kepser, S. and M. Reis (eds.) Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives.
Mouton de Gruyter, pp. 49-70.
- Bender, E.M. (to appear). Grammar Engineering for Linguistic Hypothesis Testing. In Proceedings of Texas Linguistics
Society X. Stanford: CSLI Publications.
- Bender, E.M. and D. Flickinger (2005) "Rapid Prototyping of
Scalable Grammars: Towards Modularity in Extensions to a
Language-Independent Core." Proceedings of IJCNLP-05
(Posters/Demos), Jeju Island, Korea.
- Bender, E.M., D. Flickinger, F. Fouvry, and M. Siegel, eds (2005)
Journal of Research on Language and Computation Special Issue on Shared
Representation in Multilingual Grammar Engineering.
- Bender, E.M., D. Flickinger, and S. Oepen (2002) "The
Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development
of Cross-Linguistically Consistent Broad-Coverage Precision Grammars."
In Proceedings of the Workshop on Grammar Engineering and
Evaluation at the 19th International Conference on
Computational Linguistics. Taipei, Taiwan.
- Butt, M. and T.H. King (1999) A Grammar Writer's Cookbook. Stanford: CSLI Publications.
- Copestake, A. (2002) Implementing Typed Feature Structure Grammars.
Stanford: CSLI Publications.
- Copestake, A., D. Flickinger, I.A. Sag, and C. Pollard. (2005) "Minimal
Recursion Semantics: An introduction".
Research on Language and Computation 3.4:281-332.
- Copestake, A., A. Lascarides and D. Flickinger (2001)
"An Algebra for Semantic Construction in Constraint-based Grammars".
In Proceedings of the 39th Annual Meeting of the Association for
Computational Linguistics, Toulouse, France.
- Flickinger, D. (2002) "On building a more efficient grammar by
exploiting types", in Stephan Oepen, Dan Flickinger, Jun'ichi
Tsujii and Hans Uszkoreit (eds.) Collaborative Language Engineering,
Stanford: CSLI Publications, pp. 1-17.
- Flickinger, D., J.T. Lønning, H. Dyvik, S. Oepen
and F. Bond (2005) "SEM-I Rational MT: Enriching Deep Grammars with a
Semantic Interface for Scalable Machine Translation", in Proceedings of
MT Summit X, Phuket, Thailand, pp. 165-172.
- Oepen, S., H. Dyvik, J.T. Lønning, E. Velldal, D.
Beermann, J. Carroll, D. Flickinger, L. Hellan, J.B.
Johannessen, P. Meurer, T. Nordgård, and V. Rosén (2004)
"Som å kapp-ete med trollet? Towards MRS-based Norwegian–English Machine
Translation". In Proceedings of the 10th International Conference on
Theoretical and Methodological Issues in Machine Translation,
Baltimore, MD.
- Oepen, S. and D. Flickinger (1998) "Towards
Systematic Grammar Profiling: Test Suite Technology Ten Years
After", in R. Gaizauskas, ed., Journal of Computer Speech and
Language, special issue on Evaluation in Speech and Language
Technology, 12:411-435.
- Oepen, S., D. Flickinger, K. Toutanova and C.D. Manning (2004) "LinGO
Redwoods: A Rich and Dynamic Treebank for HPSG", in Journal for Research
on Language and Computation 2.4, pp. 575-596.
- Pollard, C. and I.A. Sag (1994) Head-Driven Phrase Structure Grammar.
University of Chicago Press, Chicago, IL and London, UK.
- Sag, I.A. (1997) "English Relative Clause Constructions", Journal of Linguistics 33(2):431-483.
- Sag, I.A., T. Wasow, and E.M. Bender (2003) Syntactic Theory: A Formal Introduction, Second Edition. Stanford: CSLI Publications.
- Sells, P. (1985) Lectures on Contemporary Syntactic Theories. Stanford: CSLI Publications.
- Shieber, S. (1986) An Introduction to Unification-Based Approaches to
Grammar. Stanford: CSLI Publications.
last modified: 26-jan-07
(danf@csli.stanford.edu)