LinGO Working Papers
van der Beek, Leonoor and Timothy Baldwin. 2003. Crosslingual Countability Classification: English meets Dutch. LinGO Working Paper No. 2003-03.
This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural-only, based on both Dutch and English data. The classification is based on the occurrence of countability-specific linguistic features that are extracted from unannotated corpora. We show that, in the absence of reliable Dutch gold-standard data, cross-linguistic classification can be achieved on the basis of a word-to-word or feature-to-feature mapping between English and Dutch.
Villavicencio, Aline and Ann Copestake. 2002. Phrasal Verbs and the LinGO-ERG. LinGO Working Paper No. 2002-01.
A description of phrasal verbs as implemented in the existing LinGO ERG.
Villavicencio, Aline and Ann Copestake. 2002. On the Nature of Idioms. LinGO Working Paper No. 2002-04.
A thorough syntactic and semantic account of a sample of 100 English idioms.
Bannard, Colin. 2002. Statistical Techniques for Automatically Inferring the Semantics of Verb-Particle Constructions. LinGO Working Paper No. 2002-06.
This paper describes an investigation of some potential features for a statistical approach to inferring the semantics of verb-particle constructions from corpus data. Verb-particles pose particular problems for the computational semantic analysis of language, because their meaning often cannot be derived through the usual compositional methods of analysis. Two novel techniques are presented which promise to provide information about the nature and extent of composition. The first measures the extent to which the verb or particle of a given verb-particle may be replaced with a verb or particle of a similar semantic class to form other verb-particles that are attested in the data. The intuition here is that if a verb-particle reflects systematic patterns in this way, then it is more likely that its verb or particle has its simplex meaning. The second technique measures the degree of semantic relatedness between the verb-particle and its component verb. The intuition here is that if a verb-particle is semantically similar to its verb, then it is more likely that the verb contributes its simplex meaning. These two features are then combined and, together with appropriately annotated data, used to train a classifier.
Beavers, John. 2002. Aspect and the Distribution of Prepositional Resultative Phrases in English. LinGO Working Paper No. 2002-07.
This paper examines the distribution of to and into prepositional resultative XPs in terms of three criteria: the lexical semantics of the verb, selectional restrictions imposed by the preposition, and the aspect of the event. With respect to aspect, it appears that into resultatives preserve the durativity or punctuality of their verbs, whereas to resultatives have durative readings. When to XPs modify punctual predicates, they force iterative readings (Smith shot Jones to death in ten seconds). With achievements, modification by to XPs is ungrammatical, since achievements cannot have iterative readings (*She stunned him to silence). Into XPs preserve punctuality and may occur with achievements (She stunned him into silence). I propose that while into XPs only entail crossing a threshold to the inside of the goal/result, to XPs entail movement/change up to and including the goal, thus entailing a non-trivial path. Traversing this path requires a non-trivial span of time, which explains the durative reading of to resultatives. Into resultatives carry no such entailment, which explains their compatibility with punctual readings.
Beavers, John. 2002. Documentation: A CCG Implementation for the LKB. LinGO Working Paper No. 2002-08.
This document outlines the Typed-inheritance Combinatory Categorial Grammar (TCCG), an implementation of a CCG grammar in the LKB for a fragment of English based on Sag and Wasow (1999). TCCG implements a CCG grammar based on work by Steedman (1996, 2000) but incorporates a typed-inheritance hierarchy that allows for extensive additional generalizations, particularly in terms of the structure of the lexicon and the relationships between different combinators. Additionally, TCCG employs a form of MRS semantics rather than a lambda-calculus-based semantics, allowing for efficient semantic composition as well as generation with the LKB. Further efficiency issues are addressed by employing the feature-structure-based normal-form parsing algorithm described in Eisner (1996), which significantly reduces the so-called spurious ambiguity problem of CCG while retaining its combinatory advantages.
An outline of the research proposal submitted to the NSF and NTT to obtain funding for the Multiword Expression Project.
Multiword expressions are a key problem for the development of large-scale, linguistically sound natural language processing technology. This paper surveys the problem and some currently available analytic techniques. The various kinds of multiword expressions should be analyzed in distinct ways, including listing words with spaces, hierarchically organized lexicons, restricted combinatoric rules, lexical selection, idiomatic constructions and simple statistical affinity. An adequate comprehensive analysis of multiword expressions must employ both symbolic and statistical techniques.