An Introduction to Computational Word Learning

Timothy Baldwin and Dominic Widdows

Fridays, 1:45 - 3:15 (Ventura 17)



This course provides an introduction to computational methods for learning the syntax and semantics of words and multiword expressions based on evidence from language resources (such as dictionaries and thesauri) and actual language usage in text corpora. Each class is designed to cover one aspect or sub-task of word learning in a relatively modular fashion, working through a spectrum of morphological, syntactic and finally semantic learning tasks.

This course has no specific prerequisites, although a basic knowledge of linguistics (syntax, lexical semantics and the syntax-semantics interface) and computational linguistics (data-driven methods and evaluation) would certainly be advantageous.


Week  Date    Topic
1     Sep 26  Course organisation and basic introduction to founding concepts and the topics we propose to cover
2     Oct 3   Word and multiword expression discovery
3     Oct 10  Morphology and tagging
4     Oct 17  Vector space methods
5     Oct 24  Conceptual ambiguity and disambiguation
6     Oct 31  Noun countability
7     Nov 7   Lexico-syntactic patterns in discovering word similarity
8     Nov 14  Subcategorisation frame acquisition, selectional preferences and diathesis alternations
9     Nov 21  Semantic compositionality/decomposability and idiomaticity
10    Dec 1   Nominalisations and compound nominals


Grading is based on attendance and a single report of about 1600-2000 words on some aspect of computational word learning. Suggestions for the report are: (a) a review of representative methods (not covered in the lectures) applied to a particular task, (b) a review of how a particular method has been applied to different word learning tasks, or (c) a more speculative report on potential methods for solving a given word learning task, or the applications of a given method to a range of tasks. Students are encouraged to discuss ideas for their report with the instructors ahead of time. Grading will be based on the student's grasp of the subject matter and originality of content (i.e. divergence from the lecture content).

Submission date: December 3, 2003

Preferred submission format: PDF file, submitted electronically to the course coordinators

Background Readings

Week 2
Recommended: Rie Kubota Ando and Lillian Lee (2003) Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences, Natural Language Engineering 9(2), pp. 127-149.

Timothy Baldwin and Aline Villavicencio (2002) Extracting the Unextractable: A Case Study on Verb-particles, In Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002), Taipei, Taiwan, pp. 98-104.

Frank Smadja (1993) Retrieving Collocations from Text: Xtract, Computational Linguistics 19(1), pp. 143-177.

Week 3
Recommended: Mathias Creutz and Krista Lagus (2002) Unsupervised Discovery of Morphemes, In Proceedings of the 6th Meeting of the ACL Special Interest Group in Computational Phonology, Philadelphia, USA. [On-line demo]

Eric Brill (1995) Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging, Computational Linguistics 21(4), pp. 543-565.

Marc Light (1996) Morphological Cues for Lexical Semantics, In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), Santa Cruz, USA, pp. 25-31.

Further reading: John Goldsmith (2001) Unsupervised Learning of the Morphology of a Natural Language, Computational Linguistics 27(2), pp. 153-198.

Week 4
Recommended: Dominic Widdows (to appear) Word Vectors and Search Engines, Chapter 5 of Geometry and Meaning, CSLI publications.
Hot off the press, this is a chapter explaining the ideas behind vector spaces, mainly for non-mathematicians. Glance at the first couple of pages before you print it out, and if you feel patronised, go on to the other papers below instead.

Dominic Widdows (2003) Unsupervised methods for developing taxonomies by combining syntactic and statistical information, In Proceedings of HLT/NAACL 2003, Edmonton, Canada, pages 276-283.

Further reading: Patrick Schone and Daniel Jurafsky (2000) Knowledge-Free Induction of Morphology Using Latent Semantic Analysis, In Proceedings of the Fourth Conference on Computational Natural Language Learning and of the Second Learning Language in Logic Workshop, Lisbon.

Dominic Widdows, Beate Dorow, and Chiu-Ki Chan. Using Parallel Corpora to enrich Multilingual Lexical Resources. Third International Conference on Language Resources and Evaluation, Las Palmas, May 2002, pages 240-245.

Landauer, T. K. and Dumais, S. T. (1997) A solution to Plato's problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211-240. This paper takes a somewhat broader cognitive science perspective on dimension reduction and inductive learning.

Week 5
Recommended: Hinrich Schütze (1998) Automatic Word Sense Discrimination. Computational Linguistics, 24(1), 97-123.

Yarowsky, D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, pp. 189-196, 1995.

Mark Stevenson and Yorick Wilks (2001) The Interaction of Knowledge Sources in Word Sense Disambiguation, Computational Linguistics 27(3).

Further reading: Ide, N., & Véronis, J. (1998). Introduction to the special issue on word sense disambiguation: the state of the art. Computational Linguistics, 24(1), 1-40.

Dominic Widdows. A Mathematical Model for Context and Word-Meaning. Fourth International and Interdisciplinary Conference on Modeling and Using Context, Stanford, California, June 23-25, 2003, pages 369-382.

Week 6

Baldwin, Timothy and Francis Bond (2003) Learning the Countability of English Nouns from Corpus Data, In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 463-70.

Bond, Francis and Caitlin Vatikiotis-Bateson (2002) Using an ontology to determine English countability, In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp. 99-105.

Baldwin, Timothy and Leonoor van der Beek (to appear) The Ins and Outs of Dutch Noun Countability Classification, In Proceedings of the 2003 Australasian Language Technology Workshop (ALTW2003), Melbourne, Australia.

Further reading:

Brendan S. Gillon (1996) The Lexical Semantics of English Count and Mass Nouns, In Proceedings of the ACL-SIGLEX Workshop on the Breadth and Depth of Semantic Lexicons Santa Cruz, USA, pp. 51-61.

Baldwin, Timothy and Francis Bond (2003) A Plethora of Methods for Learning English Countability, In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), Sapporo, Japan, pp. 73-80.

Week 7

Hearst, Marti (1992) Automatic Acquisition of Hyponyms from Large Text Corpora, In Proceedings of the Fourteenth International Conference on Computational Linguistics (COLING 1992), Nantes, France.

Dominic Widdows and Beate Dorow (2002) A Graph Model for Unsupervised Lexical Acquisition, In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.

Scott Cederberg and Dominic Widdows (2003) Using LSA and Noun Coordination Information to Improve the Precision and Recall of Automatic Hyponymy Extraction, In Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL-2003), Edmonton, Canada.

Further reading:

Sharon A. Caraballo (1999) Automatic Acquisition of a Hypernym-Labeled Noun Hierarchy from Text, In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 1999).

Week 8

Michael R. Brent (1993) From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax, Computational Linguistics, 19(2), pp. 243-62.

Hang Li and Naoki Abe (1998) Generalizing Case Frames Using a Thesaurus and the MDL Principle, Computational Linguistics 24(2), pp. 217-44.

Timothy Baldwin and Francis Bond (2002) Alternation-based Lexicon Reconstruction, In Proceedings of the 9th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2002), Keihanna, Japan, pp. 1-11.

Eric Joanis and Suzanne Stevenson (2003) A general feature space for automatic verb classification, In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-03), Budapest, Hungary.

Week 9

Sag, Ivan, Timothy Baldwin, Francis Bond, Ann Copestake and Dan Flickinger (2002) Multiword Expressions: A Pain in the Neck for NLP, In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2002), Mexico City, Mexico, pp. 1-15.

Dekang Lin (1999) Automatic Identification of Non-Compositional Phrases, In Proceedings of the 37th Annual Meeting of the ACL, College Park, USA, pp. 317-24.

Colin Bannard, Timothy Baldwin and Alex Lascarides (2003) A Statistical Approach to the Semantics of Verb-Particles, In Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp. 65-72.

Diana McCarthy, Bill Keller and John Carroll (2003) Detecting a Continuum of Compositionality in Phrasal Verbs, In Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan.

Week 10

Maria Lapata (2002) The Disambiguation of Nominalisations, Computational Linguistics, 28(3), pp. 357-388.

Barbara Rosario and Marti Hearst (2001) Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy, In Proceedings of EMNLP '01, Pittsburgh, USA.

Ann Copestake and Alex Lascarides (1997) Integrating symbolic and statistical representations: the lexicon-pragmatics interface, In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics (ACL-EACL 97), Madrid, Spain, pp. 136-143.

