The Nature of Constraint-Based Grammar
Carl Pollard
Pacific Asia Conference on
Language, Information, and Computation
Kyung Hee University
Seoul, Korea
Dec. 20, 1996
0. Introduction
I want to start by thanking the organizers, and especially Prof.
Byung-Soo Park, for giving me this opportunity to return to Kyung Hee
University after seven long years. After all, it was at Kyung Hee, on
the Kwangneung campus, that Ivan Sag and I first publicly presented
the so-called standard version of head-driven phrase structure
grammar, in a set of forum lectures at the International Conference on
Linguistic Studies in August 1989. So it seems especially appropriate
that Prof. Park asked me to talk to you here about recent developments
in constraint-based grammar.
But I have to admit that when I started to think about what to say, I
felt a little overwhelmed. Work in this area has proceeded on so many
fronts in recent times -- say since 1989 -- that there is no way to
even provide a reasonable summary in just two hours. Looking over
the published version of our 1989 lectures, I was a little shocked to
realize how much has changed since then, even the basic terminology.
In fact, it dawned on me that back then, the term CONSTRAINT-BASED
grammar was not used yet -- instead one spoke of INFORMATION-BASED, or
UNIFICATION-BASED, grammar. I think the new term CONSTRAINT-BASED
grammar is actually a much more accurate name. After all, the use of
the term INFORMATION-BASED in a grammar context really reflects a very
special use of that term that was current in the San Francisco bay
area during the 1980s, a use which evokes on the one hand the
contentfulness of states of affairs in situation semantics compared
with the lack of content associated with truth-conditional or
possible-worlds semantics, and on the other hand a certain formal
analogy between situation-semantical states of affairs and feature
structures. But this terminology never really gained much currency
beyond the Center for the Study of Language and Information, and
people somehow connected with CSLI.
The other term I mentioned, UNIFICATION-BASED grammar, was much more
widely used. But it was never really quite appropriate for the trend
in theoretical and computational linguistics that it was intended to
denote. After all, UNIFICATION refers to a certain binary algebraic
operation on logical expressions or on feature structures conceived of
as bearers of partial information, an operation which merges their
information content; if we think in terms of feature logic instead of
feature structures, this operation essentially corresponds to logical
conjunction. Alternatively, the term UNIFICATION also denotes any
algorithm that computes this operation.
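To make the operation concrete, here is a minimal sketch of
unification in Python (a language chosen purely for familiarity). I
should stress that the encoding of feature structures as nested
dictionaries with atomic string values, and the example features AGR,
PER, and NUM, are simplifications of my own; real systems use typed
feature structures with structure sharing, which this sketch ignores.

    def unify(fs1, fs2):
        # Merge the information in two feature structures; return None
        # if they carry contradictory information.
        if fs1 == fs2:
            return fs1
        if not (isinstance(fs1, dict) and isinstance(fs2, dict)):
            return None  # distinct atomic values: a contradiction
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None
                result[feat] = sub
            else:
                result[feat] = val
        return result

    # Conjoining "3rd person" information with "singular" information:
    unify({"AGR": {"PER": "3rd"}}, {"AGR": {"NUM": "sg"}})
    #   -> {"AGR": {"PER": "3rd", "NUM": "sg"}}
    unify({"AGR": {"NUM": "sg"}}, {"AGR": {"NUM": "pl"}})
    #   -> None (inconsistent, like a contradictory conjunction)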
But it is now widely recognized that we must make a sharp distinction
between the formal objects actually licensed by a grammar --
structures (for example the feature structures employed in theories
like HPSG for modelling linguistic expressions) -- and feature
descriptions, which are used to impose constraints on these
structures. In this setting, the grammar is nothing but a set of
constraints that structures are required to satisfy in order to be
considered well-formed. Of course the unification operation is often
used in algorithms that solve the constraints, that is, which find
structures that satisfy the grammar; but it is the constraints
themselves that are really crucial, not the techniques used to solve
them.
If I may make an analogy with mathematical physics: typically a
physical theory about some dynamical system -- say a vibrating string,
or the solar system -- consists of a set of differential equations.
The predictions of the theory are the physical motions that satisfy
those equations; or to be more precise, the predictions are certain
abstract mathematical objects that satisfy the equations. These
abstract objects, called the SOLUTIONS of the equations, are formal
models of the actual predicted motions of the system. When we say that
the solutions satisfy the equations, the sense of the word SATISFY is
exactly the same as the use in logic when we say that a certain
structure satisfies a first-order theory. In fact, if we actually
formalized differential equations, say within the first-order language
of set theory, the solutions would actually be first-order models of
the physical theory in the technical logical sense.
The point of all this is that the physical theory consists of the
constraints themselves (the equations), and the predictions are the
things that satisfy the constraints. The techniques used for solving
the equations -- assuming this is possible -- may be of interest to an
engineer or a celestial navigator, but they are really only of
secondary interest to the theoretical physicist. In exactly the same
way, to a theoretical linguist, it is really the constraints
themselves -- the grammar -- that are important, because the solutions
of the grammar are the well-formed linguistic objects. Of course the
methods for solving the grammar, such as unification, are important to
linguistic software engineers, but they are only of secondary interest
for theoretical linguistics. This general way of looking at things is
summarized in table (1):
(1) Basic notions of Constraint-Based Grammar (CBG)
----------------------------------------------------------------------
                               CANDIDATE    ACTUAL           TYPICAL
DOMAIN      CONSTRAINTS        SOLUTIONS    SOLUTIONS        SOLUTION
                                                             METHOD
----------------------------------------------------------------------
constraint- the grammar (set   structures   well-formed      unification;
based       of constraints)                 structures       constraint
grammar                                                      solving;
                                                             stochastic
                                                             methods

physical    physical theory    certain      solutions of     numerical
analog      (the equations)    continuous   the equations    approximation
                               functions

logical     logical theory     (logical)    models of        model
analog      (set of wffs)      structures   the theory       constructions

Chomskyan   I-language         sequences    licensed         ?
analog                         of phrase    structural
                               markers      representations
----------------------------------------------------------------------
Notice that in addition to the physical and logical analogs, this
table also gives rough analogs from Chomskyan linguistic theory. I
will have more to say about this later.
1. The Nature of Constraint-Based Grammar
In the remainder of this talk, I want to flesh out a little bit the
skeletal view of constraint-based grammar that I just sketched. (In a
second talk, entitled "HPSG: An Overview and Some Work in Progress", I
will try to convey something of the flavor of current research in
constraint-based grammar by looking at some work within the framework
of head-driven phrase structure grammar. Of course a number of our
colleagues at this conference will also be presenting some of this
work in their own talks.)
One way to get a feel for constraint-based grammar is to look at some
of its exemplars. Of course the exemplar closest to home for me is
HPSG and other members of the PSG family of grammar frameworks, such
as GPSG, JPSG, and KPSG. Another exemplar familiar to many of you
is lexical-functional grammar (LFG). In fact, I believe that the
difference between LFG and so-called PSG is no greater than the
differences among various theoretical proposals within PSG, or even
within HPSG itself. As far as I am concerned, then, the separation
between PSG and LFG exists more at a sociological level than at the
level of scientific content -- but I am aware that not everyone agrees
about this. Yet another example is the so-called REPRESENTATIONAL
MODULARITY approach proposed by Ray Jackendoff in an important recent
critique of Chomsky's minimalist program. A great many other examples
come from computational linguistics, such as Martin Kay's FUG and the
PATR-II system of Shieber et al., as well as more recent avatars such
as TFS, ALE, TDL, Troll, and so forth.
(2) Some Exemplars of Constraint-Based Grammar
THEORETICAL: Arc-Pair Grammar, LFG, {G,H,J,K,...}PSG,
Jackendoff's Representational Modularity, ...
COMPUTATIONAL: FUG, PATR-II, TFS, ALE, TDL, Troll, ...
However, as David Johnson and Shalom Lappin make clear in another
very important new critique of Chomsky's MP (called simply "A Critique
of the Minimalist Program"), the historically first exemplar of
constraint-based grammar was neither LFG nor GPSG, but rather another
framework that originated in the mid-to-late 1970's, namely the
Arc-Pair Grammar (APG) of Johnson and Postal. Although APG never
gained many followers, it is true that most of the key innovations
that distinguish constraint-based grammar from its predecessors and
competitors were present in APG.
Let me now turn to a more detailed characterization of what
constraint-based grammar amounts to. To this end, I'll try to identify
some commonalities across the many frameworks, systems, and research
traditions that are generally considered to lie within the domain of
constraint-based grammars. Some of these common properties are
immediate consequences of the logical architecture I discussed above;
others are just methodological commitments or sociological tendencies.
Some of the properties I have in mind are listed in (3):
(3) Characteristics of Constraint-Based Grammar
A. Generativity
B. Expressivity
C. Empirical Adequacy
D. Psycholinguistic Responsibility
E. Nondestructiveness
F. Locality
G. Parallelism
H. Radical Nonautonomy
A. GENERATIVITY. This term has fallen out of fashion, but
practitioners of constraint-based grammar still think it is important
for a grammatical theory at minimum to tell us what the well-formed
structures are. Of course theories are going to differ on such
particulars as how many levels of representation there are, and what
sort of information each of the levels contains, but I think it is
generally agreed that a good theory must at least tell us which
representations, or n-tuples of representations, or derivations, or
whatever, are actually predicted. Otherwise the theory doesn't have
any empirical consequences. This criterion of generativity entails a
certain precision in formulating the theory. Minimally, this includes
at least the following three requirements, which are adapted slightly
from the three criteria proposed by Geoff Pullum in his celebrated
column in NATURAL LANGUAGE AND LINGUISTIC THEORY entitled "Formal
linguistics meets the boojum."
(4) Three Criteria of Generativity for Grammatical Theory
(i) it must be determinate whether a given mathematical object is
the kind of mathematical object that is used in the theory for
modelling linguistic entities.
(ii) it has to be determinate whether a given string of symbols (in
some formal logic, or in careful natural language) counts as
one of the assertions (constraints) of the grammar.
(iii) given a grammar G and a mathematical object O used as a
candidate model of a linguistic object (a structural
representation or a derivation), it has to be determinate
whether O satisfies the constraints imposed by G.
The first of these three criteria means that we have to make explicit
exactly what the candidate structures are whose well-formedness or
ill-formedness is at stake. For example, in HPSG they are feature
structures, a certain kind of labelled directed graph. In Chomsky's
Barriers theory as formalized by Ed Stabler, they are sequences of
phrase markers.
It is impossible to overestimate the importance of the second
criterion, because it means that one must actually be able to tell
what the theory is. For example, in LFG, the theory is precisely
formulated in a combination of context-free grammar, the
quantifier-free theory of equality, and a linear logic called "glue
language". HPSG is formulated mostly in feature logic, but there is a
scandalous exception to this -- namely lexical rules -- which (alas) I
will not have time to discuss here. Stabler's version of Barriers
theory is formulated in first-order logic. By the way, notice that
this criterion does not require that the theory be expressed in an
artificial formal language. It could just as well be in plain
English, or plain Korean, as long as it is clear what the theory is
asserting.
To get a feel for the significance of the second criterion, try to
imagine what Einstein's theory of special relativity would look like
reformulated by a linguist who rejected this criterion. This is shown
in (5):
(5) a. Einstein's equation:
          E = mc^2
b. Linguist's reformulation:
Energy must be in an appropriate licensing relationship with
the mass and the speed of light.
Try to imagine constructing a theory with real empirical consequences
based on (5)b!
Last, consider the third criterion. In a fully formalized theory, what
this means technically is that given a grammar and a potential
structure, it has to be decidable whether the structure satisfies the
grammar. This third criterion is not so obviously reasonable as the
first two. After all, the first two criteria are satisfied
automatically as long as the theory is adequately formalized. What
makes criterion (iii) less than straightforward is the fact that in
general, given a formal theory and a mathematical structure -- say a
first-order theory and a model-theoretic interpretation, it is
generally undecidable whether the structure satisfies the theory. But
if this is so, then why should we impose this third criterion?
The reason is this. In linguistic theories, the structures that we are
working with, no matter whether they are trees, graphs, or some
combination of trees and graphs, are always finite: that is, as formal
mathematical objects, they only have a finite number of points or
parts or nodes. This is crucially important because of a well-known
fact of logic given in (6):
(6) Decidability of Model-Checking
For arbitrary n, given a finite structure S (i.e. an
interpretation) for an n-order language, and a finite theory T
(i.e. a set of axioms), it is decidable whether S is a model of
T (i.e. whether S satisfies the constraints imposed by T).
It follows from this that as long as our linguistic theory (together
with any background theory which is presupposed by it, beyond logic
itself) contains only a finite number of constraints, and as long as
they are stated clearly enough for us to be able to figure out what
they really mean, we can always decide in a systematic way whether a
given candidate structure -- that is, a putative model of the
structure of some linguistic expression -- actually satisfies the
grammar or not. So it turns out that, even though the third criterion
of generativity would be an absurd demand to impose on science in
general, it makes eminent good sense to insist on it in the special
case of grammatical theory.
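To see concretely how (6) licenses a systematic decision procedure,
here is a toy sketch in Python, under assumptions of my own making:
the candidate structure is a three-node tree encoded as a node set, a
labelling function, and a daughter relation, and the grammar is a
single constraint whose quantifiers range only over the nodes of the
given structure, so that each quantifier becomes a finite loop.

    # A toy candidate structure: node 0 immediately dominates 1 and 2.
    nodes = {0, 1, 2}
    label = {0: "S", 1: "NP", 2: "VP"}
    daughters = {(0, 1), (0, 2)}

    def every_s_has_an_np_daughter(nodes, label, daughters):
        # "Every node labelled S has a daughter labelled NP" --
        # both quantifiers range over the finite node set.
        return all(
            any((n, d) in daughters and label[d] == "NP" for d in nodes)
            for n in nodes if label[n] == "S"
        )

    grammar = [every_s_has_an_np_daughter]  # a one-constraint grammar

    def satisfies(nodes, label, daughters, grammar):
        # Decide whether the finite structure is a model of the grammar.
        return all(c(nodes, label, daughters) for c in grammar)

    satisfies(nodes, label, daughters, grammar)  # -> True

Because the loops here are finite, the procedure always terminates,
which is just what criterion (iii) demands.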
I should mention here in passing that criterion (iii) is in no
way intended to imply that natural languages conceived as sets of
strings are decidable. This does not follow, and I don't know of any
good reason to believe this is true.
Unfortunately, some currently influential approaches to the study of
grammar fail to satisfy these criteria. One obvious example of this is
Chomsky's minimalist program, where nearly all the key concepts, such
as MERGE, FULL INTERPRETATION, and REFERENCE SET, are still in search
of a precise definition; I refer you to Johnson and Lappin's critique
for detailed discussion. Another example is syntactic optimality
theory, where such crucial notions as INPUT, OUTPUT, and the GEN
function exist only at an intuitive level. A third example, closer to
home for me, is the problematic status of lexical rules in HPSG,
which, conveniently enough, I do not have time to discuss.
Of course, the point of the generativity criteria is not to deny the
value, in appropriate contexts, of loose speculation based on
intuitive, imprecise notions. But we should not make the mistake of
dignifying such speculation with the term "theory", or mistake the
tentative conclusions we draw from such speculation for real
scientific results.
B. EXPRESSIVITY. Here I refer not to the expressivity of language
itself, but rather to the language in which the grammatical theory is
expressed. There used to be an influential point of view which held
that the formalism within which grammatical theory was formulated had
to be highly constrained. This in turn was supposed to constrain the
set of possible grammars, which in turn was supposed to make language
acquisition easier to explain. I'm not sure anybody actually believes
this argument anymore, but if anyone does, I recommend that he or she
stop believing it. For one thing, I don't know of any reason to
believe that a member of a relatively small set of languages should be
any easier to learn than a member of a relatively large set of
languages. This would only be true if the language acquisition device
knew in advance what the set of possible options was, but I know of no
reason to assume this. And even if the LAD did have this
foreknowledge, it is hard to see how cutting down the number of
options would help learning. This would only be true if the set of
options were finite. Of course, there was a time when it was claimed
by adherents of Chomsky's principles-and-parameters approach that the
number of "core" grammars was finite; but in the absence of any
definition of what actually distinguishes the core from the periphery,
this claim is devoid of empirical content.
Another argument sometimes given for constrained formalisms was that
this was one way to impose decidability on the languages generated.
But as I already mentioned, there is not really any good reason to
believe that human languages QUA stringsets actually are decidable, so
this kind of argument does not have much force. In fact, we need our
formalism to be able to express undecidable problems, since some
linguistic problems are undecidable. For example, consider the basic
generation problem of finding the syntactic structures corresponding
to a given logical form, which in general are only recursively
enumerable (SNOW IS WHITE, IT IS TRUE THAT SNOW IS WHITE, IT IS TRUE
THAT IT IS TRUE THAT SNOW IS WHITE, etc.).
Within constraint-based grammar, by contrast, the usual view is that
the language in which the theory is expressed should be highly
expressive, that is, unconstrained. Thus: HPSG is expressed in a
feature constraint logic with classical boolean connectives and
definite relations. Arc-pair grammar is expressed in first-order
logic. And typical constraint-based computational linguistic systems
are expressed in languages like PROLOG, LISP, or special-purpose
languages built on top of them. Instead of the formalism imposing
constraints on possible grammars, it is the grammars themselves that
impose the constraints. In fact, if we move from linguistics to any
other branch of science, the whole idea that the formalism should
constrain the theory appears quite bizarre. Imagine if physicists
believed in this! Then we might witness conversations like this:
(7) If physicists required the formalism to constrain the theory
Editor: Professor Einstein, I'm afraid we can't accept this
manuscript of yours on general relativity.
Einstein: Why? Are the equations wrong?
Editor: No, but we noticed that your differential equations are
expressed in the first-order language of set theory. This is
a totally unconstrained formalism! Why, you could have written
down ANY set of differential equations!
Of course, this could never happen, because physicists already know
that it is the theory that imposes the constraints, not the language
in which the theory is expressed.
(8) Expressivity (of the language in which the theory is formulated)
a. Use an expressively rich language (first-order logic, feature
constraint logic, LISP, PROLOG, English, Korean, ...)
b. The language does not impose constraints on the theory; it is
the theory that imposes the constraints.
C. EMPIRICAL ADEQUACY. This is just a fancy phrase for getting the
facts right. As constraint-based grammarians and other scientists
realize, we often write down a constraint that captures an empirical
generalization, without having any idea why the constraint is true.
Then we are pounced upon by some well-meaning colleague who complains
that our constraint is totally ad hoc and uninteresting because it
doesn't follow from any deep principle. Again, imagine what would
happen if physicists acted this way:
(9) If physicists required all constraints to follow from
"deep principles"
Editor: Professor Einstein, I'm afraid we can't accept this
manuscript of yours on general relativity.
Einstein: Why? Are the equations wrong?
Editor: No, but they are totally ad hoc!
Einstein: Ad hoc, ad schmoc! At least they explain otherwise
unexplained data about the advance of the perihelion of
Mercury.
Editor: But this is nonexplanatory and therefore uninteresting.
You need to show that your equations FOLLOW from deep and
independently motivated principles!
What is so ridiculous about this, of course, is that every theory has
to have some constraints -- axioms -- that don't follow from anything
else. And those axioms, of course, no matter whether they are the Head
Feature Principle or the Case Filter, can always be accused of being
ad hoc and uninteresting. Alas, there is no one deep principle of the
universe from which everything follows, at least not as far as we
know. Instead, things have to go in the other order: we have to try to
establish wide-coverage empirical generalizations first, and worry
later about whether they follow from something else. In any case,
logically speaking there is no such thing as a deep principle in a
theory, since it is always possible to produce a new set of axioms
with the same entailments. Unfortunately, among many linguists
nowadays, it is considered more important to propose sweeping
fundamental principles, often so vague as to lack any empirical
content, than to come up with a constraint that provably gets the
facts right over a fairly broad empirical domain. This tendency must
be firmly resisted. Resistance to this tendency can be expressed as
the methodological principle (10):
(10) The Methodological Principle of Empirical Adequacy
a. There are no "deep principles", since any theory can be
reaxiomatized. In any case, science can only tell how
things are, not why. Therefore:
b. first write constraints that get the facts right, and worry
later about which constraints are axioms and which are
theorems.
D. PSYCHOLINGUISTIC RESPONSIBILITY. Like their predecessors in the
field of generative grammar, constraint-based grammarians still
consider themselves to be engaged in an investigation of human
linguistic competence. In other words, we take our theories to be
about a form of knowledge that resides in the human mind. However, we
don't claim that our theories directly reflect anything about human
language processing. To use a computational analogy: a
constraint-based grammar is more like a data base or a knowledge
representation system than it is like a collection of algorithms. To
put it another way, the knowledge that our grammars depict is a
resource that the human processing mechanisms consult.
Nevertheless, as we come to understand more about human language
processing, it is important that our linguistic theories be capable of
interfacing with plausible processing models. We must never forget
that human processing tasks such as understanding, speech production,
and the making of grammaticality judgments are actually feasible. Thus
the system of linguistic knowledge that the grammar encodes must in
principle be capable of being consulted by human linguistic processes
that actually terminate. Thus, even though a grammar is only a
competence model, we do not want it to be based irreducibly on
computations that the language user cannot be expected to carry out.
This is what I call the methodological principle of psycholinguistic
responsibility:
(11) The Methodological Principle of Psycholinguistic Responsibility
Grammars, in spite of being only competence models, must not
be based irreducibly on computations that the language user
cannot be expected to carry out.
The remaining four characteristics of constraint-based grammars are
closely related to psycholinguistic responsibility.
E. NONDESTRUCTIVENESS. This is a generalization of the property that
used to be called monotonicity for unification-based grammars. What
it means is that the grammar should not irreducibly make reference to
operations that destroy existing linguistic structure.
(12) Nondestructiveness
Grammars should not irreducibly make reference to operations
(e.g. MOVE) that destroy existing linguistic structure.
Thus there is no raising, no wh-movement, no affix-hopping, no head
movement. Similarly, there are no null functional categories whose
sole purpose is to carry features that must be checked off by moving
something into its checking domain. The reason for this is that it is
too hard to build plausible processing models that operate
destructively on structures already built up. I think this point is
easily grasped by most people who have tried to build a parser based
upon a linguistic theory, such as transformational grammar, that uses
such operations: usually what happens is that one tries to reformulate
the theory in a way that eliminates such operations, for example by
using chains instead of movement and parsing s-structures directly
without ever building d-structures at all.
In fact in the past it was often argued by practitioners of
transformational grammar that transformations were not really at
issue, since one always had the option of reformulating the theory in
nontransformational terms. Perhaps this is true of GB theory, though
we can't say for sure in the absence of an explicit formalization.
However, it seems evident that Chomsky's minimalist program is
irreducibly destructive in this sense: there is no way to reformulate
it without the operation MOVE, since economy conditions like
Procrastinate and the Smallest Derivation Principle are stated in
terms of it. It is very hard to see how such an approach can be
reconciled with a reasonable processing model. This is because the
branch point in a minimalist derivation has to be reached before the
syntax interfaces with the articulatory-perceptual system and the
conceptual-intentional system. But psycholinguistic research tells us
that language is processed incrementally, with syntactic
information being continuously integrated with semantic knowledge,
encyclopedic knowledge, and even probabilistic knowledge of frequencies
of homophonous words. This point can be appreciated by comparing the
garden-path sentences in (13) with the structurally identical
non-garden-path sentences in (14) (these are from a recent paper by
Spivey-Knowlton et al.):
(13)a. The horse raced past the barn fell.
b. The woman warned the lawyer was misguided.
c. The bully pelted the boy with warts.
(14)a. The landmine buried in the sand exploded.
b. The woman thought the lawyer was misguided.
c. The woman searched for a priest with compassion.
F. LOCALITY. Here the term LOCALITY is used in contradistinction to
GLOBALITY. What this means is that given a candidate structure, the
question of whether or not that structure satisfies the grammatical
constraints must be determined locally, that is, solely on the basis
of the given structure, without reference to other "competing"
structures.
(15) Locality (or, the Prohibition on Transstructural Constraints)
Constraints are local in the sense that whether or not they
are satisfied by a candidate structure is determined solely
by that structure, without reference to other ("competing")
structures.
The effect of this is to rule out transderivational constraints or
their nontransformational analog, what we might call "transstructural
constraints". Again, the principle motivation for adopting this
characteristic is the lack of plausible processing models that
incorporate constraints which require comparing alternative
structures.
This characteristic places Chomsky's minimalist program outside the
realm of constraint-based grammar, since economy requires that any
convergent derivation be compared to all other convergent derivations
in its reference set. Since so far there is not even a clear
definition of reference set, it is unclear in the extreme how the
minimalist program could be interfaced with a processing model; but
even if it could be, the remaining obstacles are daunting. Here I
quote from Johnson & Lappin, who take as their point of departure
the following algorithm:
(16) Algorithm to test a string s for grammaticality within Chomsky's
minimalist program (after Johnson & Lappin)
1. Construct a numeration N from the lexical items in s.
2. Compute the reference set RS of convergent derivations from N
   to a well-formed <PF, LF> pair.
3. Use the economy metric to compute the subset OD of RS
   containing the optimal derivations in RS.
4. Check if there is at least one element of OD whose <PF, LF>
   pair is such that PF corresponds to s.
Johnson & Lappin go on to say this:
One could object to this analysis on the basis that more efficient
methods for implementing the MP are possible. So for example, one
might be able to design an algorithm for testing grammaticality
that identifies the set OD without computing the full RS of which OD
is a subset.... The burden of argument lies with the critic of our
analysis. It is not sufficient to simply assert the logical
possibility of an algorithm for implementing the MP that is more
efficient than the one we assume here.... Second, even if a more
efficient algorithm can be constructed, any implementation of an
economy-of-derivation model will still involve conceptual if not
computational complexity beyond that required by a local constraint
grammar.
Once again, I refer you to their paper for the detailed arguments.
Likewise, the locality criterion places syntactic optimality theory
outside the realm of constraint-based grammar. To see why, let me
first remind you of the overall architecture of optimality theory:
(17) Overview of Optimality Theory (Prince and Smolensky 1993)
a. There are two sets of structures, INPUTS and CANDIDATES.
b. The function GEN maps each input I to a subset GEN(I) of
candidates.
c. Language-particularity consists of a ranking of a universal
set of constraints.
d. Given a subset S of CANDIDATES and a member C of S, C is
   OPTIMAL in S provided that, if C violates any constraint, then
   every other member of S violates some higher-ranked constraint.
e. Given an input I, the OUTPUT associated with I, OUTPUT(I),
is the set of optimal members of GEN(I).
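Before turning to the problems, let me note that the optimality
computation in (17d-e) is itself perfectly well-defined once a
candidate set and a constraint ranking are given. Here is a sketch in
Python, under the standard assumption that a ranked constraint can be
modelled as a function counting a candidate's violations, so that
(17d) is equivalent to lexicographic minimality of the violation
profile. The candidate "strings" and the two toy constraints are
inventions of mine; as we will see in a moment, it is precisely the
function GEN supplying the candidate set that syntactic OT fails to
define.

    def profile(candidate, ranked_constraints):
        # Violation counts, ordered from highest- to lowest-ranked.
        return tuple(c(candidate) for c in ranked_constraints)

    def optimal(candidates, ranked_constraints):
        # (17d): a candidate is optimal iff its violation profile is
        # lexicographically minimal -- one violation of a higher-ranked
        # constraint outweighs any number of lower-ranked violations.
        best = min(profile(c, ranked_constraints) for c in candidates)
        return [c for c in candidates
                if profile(c, ranked_constraints) == best]

    ranked = [lambda c: c.count("*"),   # higher-ranked: no "*" marks
              lambda c: len(c)]         # lower-ranked: be short
    optimal(["ab*", "abcd", "abc"], ranked)  # -> ["abc"]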
Now consider the task of testing a string for optimality within
optimality theory. This is shown in (18).
(18) Algorithm to test a string s for grammaticality within
syntactic Optimality Theory
a. Find the set of inputs INPUTS(s) that correspond to s.
b. For each member I of INPUTS(s), compute GEN(I).
c. For each I in INPUTS(s), check whether the set OUTPUT(I)
of optimal members of GEN(I) is nonempty. If this is the
case for some I, the string s is grammatical.
This is a tall order. Even though the question of determining the set
of inputs corresponding to a string is one that has not been discussed
in syntactic Optimality Theory, let's give OT the benefit of the doubt
and suppose this can be done. The real problem is that steps b and c
depend crucially on the function GEN being well-defined.
Unfortunately, the function GEN in syntactic OT has become notorious
precisely because nobody ever defines it. Instead, in a typical OT
paper, we are presented with a tableau of candidates, and are supposed
to take it on faith that what we are shown actually is the output of
GEN for some input, or at least that any members of GEN of that input
which are missing from the tableau are obviously not optimal. To make
things worse, usually the sets of potential inputs and potential
candidates are not clearly defined either, so that syntactic OT does
not even satisfy the first criterion of generativity. Until these
undefined notions are made clear, there is no point even talking about
whether syntactic OT could be interfaced with a plausible processing
model. And even if these undefined notions were defined, syntactic OT
would still be subject to the same criticisms that Lappin and Johnson
level against the minimalist program.
G. PARALLELISM. It's widely recognized, both within and without
constraint-based grammar, that linguistic theory must make reference
to different levels of representation. Of course this idea is long
familiar from the T-model architecture of the Extended Standard Theory
and GB theory, and even the Minimalist Program retains the two
interface levels LF and PF. Constraint-based grammars also recognize
different levels of representation, although the identity and nature
of the levels varies from theory to theory. Some examples are shown
in (19):
(19) Some levels of representation in constraint-based grammar
-----------------------------------------------------------------
                      Logico-         Grammatical-   Phonetic-
        Syntactic     Semantic        Relational     Prosodic
Theory  Constituency  Representation  Structure      Structure
-----------------------------------------------------------------
HPSG    DAUGHTERS     CONTENT         ARG-STRUC,     PHONOLOGY
                                      VALENCE
LFG     c-structure   s-structure     f-structure,   prosodic
                                      a-structure    structure
RM      syntactic     conceptual                     phonological
        structure     structure                      structure
GB      s-structure   LF              d-structure    PF
analog
-----------------------------------------------------------------
The difference between the constraint-based grammars and the T-model,
of course, is that none of the levels is derived by transforming one
of the others. Instead, the different levels exist in parallel,
being mutually constrained by the grammar. Thus a linguistic
expression is represented by an n-tuple of structures, or
alternatively by n features of a feature structure.
(20) Mutually Constrained Parallelism
No level of representation is derived by transforming
(= destructively operating upon) another level. Instead
all levels are parallel and mutually constrained by the
grammar. Thus a linguistic expression is represented by
an n-tuple of structures, or alternatively by n features
of a feature structure.
This architecture of mutually constrained parallel levels is defended
at length in the Jackendoff paper I mentioned above. I can't summarize
his arguments here, but let me mention just one of Jackendoff's
points. It's been remarked by a number of people over the past 20
years that both the T-model and the MP model are problematic with
respect to the issue of lexical insertion. The problem is that if
lexical insertion is early, as is usually assumed, then the
phonological and semantic information borne by the lexical entries has
to be dragged around uselessly through the syntactic derivation, only
to be handed off to PF and LF at the branch point of the derivation.
As Ivan Sag has put it, it is as if the syntax has to lug around two
locked suitcases, one on each shoulder, only to turn them over to
other components to be opened. Of course this view of things is
completely at odds with the psycholinguistic evidence that language
processing consults the various levels of information in a flexible
and interleaved fashion. By contrast, in the parallel architecture of
constraint-based grammar, there is no lexical insertion. Instead, the
lexical entries are just small-scale constrained parallel structures.
If we think of the constraints as recursively generating all the
well-formed n-tuples of parallel structures, then we can think of the
lexical entries as forming the base of the recursion. This is
summarized in (21):
(21) The lexicon in constraint-based grammar
There is no lexical insertion. Instead a lexical entry is
just a small scale n-tuple of constrained parallel
structures (or a feature structure with n features). If
the constraints recursively generate the well-formed n-tuples,
then the lexical entries form the base of the recursion.
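As a toy illustration of this recursive picture, here is a sketch in
Python in which a lexical entry is a small phonology-category pair and
a single constraint, standing in for an ID rule, licenses larger pairs
from smaller ones. The entries and the rule are of course invented
for the occasion.

    # Lexical entries: the base of the recursion.
    lexicon = {("kim", "NP"), ("sleeps", "VP")}

    def combine(left, right):
        # A toy licensing constraint (think of an ID rule S -> NP VP):
        # an NP sign and a VP sign license an S sign.
        if left[1] == "NP" and right[1] == "VP":
            return (left[0] + " " + right[0], "S")
        return None

    def generate(depth):
        # Recursively close the lexicon under the licensing
        # constraint, up to a depth bound.
        licensed = set(lexicon)
        for _ in range(depth):
            new = {combine(a, b) for a in licensed for b in licensed}
            licensed |= {t for t in new if t is not None}
        return licensed

    generate(1)
    #   -> the lexical entries plus ("kim sleeps", "S")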
H. RADICAL NONAUTONOMY. The last characteristic of constraint-based
grammars that I want to mention is what might be called radical
nonautonomy, in contradistinction to traditional assumptions about the
autonomy of syntax. This is really just a corollary to parallelism. As
we've seen, the grammar consists of assertions that mutually constrain
several different levels of structure. Some of these constraints may
apply only to one level, say to a phonological level or to a level
dealing with grammatical relations. But typically, constraints in
constraint-based grammar are interface constraints, in the sense that
they mutually constrain two or more levels. Thus we have, e.g.,
syntax-phonology interface constraints, such as linear precedence
theory; or syntax-semantics interface constraints, such as binding
theory and constraints on scope of quantifiers and operators; or
phonology-pragmatics interface constraints, such as the relation
between pitch accent and contrastive focus. The last thing we want is
an autonomous theory of syntax. Instead what we need are theories that
deal simultaneously with all linguistically relevant factors, be they
phonetic, morphological, syntactic, semantic, or pragmatic. And once
we get serious about interfacing the theory of competence with
processing models, nonlinguistic factors such as world knowledge,
frequency considerations, and the beliefs and goals of speakers must
also be brought into the picture. It seems to me that, among the
existing options, constraint-based grammar has the highest potential
to rise to this challenge.
(22) Examples of interface constraints
Syntax-Phonology: linear precedence (LP) constraints
Syntax-Semantics: binding theory; quantifier and operator
scope
Phonology-Pragmatics: contrastive focus and pitch accent
Argument Structure-Syntax: immediate dominance (ID) rules
Argument Structure-Semantics: linking theory