The major question was and is how to integrate a structural theory of language with models of brain and behavior so that there would be mutual contributions. At the time, Miller and his students were taking the Syntactic Structures model of language structure very seriously as a model of language behavior, especially memory for sentences, but also acquisition, perception and production. When I started graduate school, this movement was in its prime, with great excitement about interpreting linguistic models as psychological models, and then subjecting them to experimental “test”. Linguists, including Noam, were publicly skeptical about such efforts, noting that a few linguistic intuitions provide more psychological data than a few years’ worth of experiments – if a given experiment seemed not to support a theoretical structure, so much the worse for the experiment. In the event, the Miller program collapsed as more experiments came in showing that the one-to-one correspondence between linguistic rules and psychological processes was not consistent (most of these were by me, Fodor, Garrett and Slobin). This was the background for our attempts to develop a new way of thinking about the relation between language structure and behavior, a relationship mediated by language acquisition processes and adult behavioral processes setting constraints on learnable and usable languages.
The strongest themes of the day relating to behavior were: nativism/empiricism, underlying (aka “deep”) structures, and rules vs. associative habits. Miller et al. attempted to show that sentences are organized by rule; Noam was arguing, as today, that the child’s data are too impoverished for an associative or pattern-learning process, so language must be innate. I became involved early on in experimental attempts to show that deep structures are actually computed as part of sentence processing.
A still small voice (well, small anyway) in all of this was the theme that language is a biological object. This had been most famously argued by Lenneberg and was clearly part of Noam’s background thinking. But it was not a major overt focus of the theoretical linguistics projects at hand, which were much more concerned with the architecture of daily syntax, phonology and semantics. When I started out on attempts to show in some detail how language is the result of a maturational and experiential process involving emerging structural abilities, language behavior patterns, experience, and cognitive constraints, it was a lonely adventure for quite some time. The idea that there is a large set of architecturally possible languages which is reduced by “performance” constraints was in the background, but not a prominent part of the research program. I got a lot of flak over it, even from Noam, or at least felt I did.
The intellectual tension remains between linguistic-theory imperialism and psychological functionalism: the issues still tend to co-occur with the controversies over nativism/empiricism, rules/associations, and surface/deep representations.
Learning. For example, “parameter setting” acquisition theories maximize the extent to which the infant is equipped with foreknowledge of the typological options for language – “learning” involves recognizing “triggers” that indicate the setting for each parameter. This view often seems compelling because the “alternative” learning theory has usually been limited to some form of associationism, which by itself is definitionally inadequate to account for what is learned: hence it is a straw opposition to parameter setting theory. What is now at issue is the construction of a more complex hypothesis-testing model of language learning, which can integrate statistical generalizations with structural adaptations. A number of features of language are now being invoked as supporting this approach. First, the last decade has witnessed an explosion of investigations of the extent to which the statistically supported patterns of language behavior can carry structural information to the infant – it turns out that the extent is much greater than was often thought: but the rub is that inductive models that extract the regularities often require the equivalent of millions of computations to converge on the statistical patterns. This heightens the importance of another series of current studies showing that infants are pretuned to focus on learning only certain kinds of serial patterns – not exactly language, but possible components of language universals. A third development is the resuscitation of the “laws of form” as constraining language to have certain kinds of structures, either because of categorical limitations or because of efficiency considerations. For example, it has long been argued that if language is hierarchical, then hierarchies cannot cross over into each other (see Barbara Partee, née Hall, for the original discussion): that is a categorical law of form. More recently, arguments have appeared about the kind of phrase structures and interlevel mappings that are most efficient computationally, as explanations of certain language universals.
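The non-crossing law of form can be stated quite concretely: if constituents are viewed as spans of word positions, any two of them must be either disjoint or nested, never partially overlapping. A minimal sketch of such a check (the representation and function name are my own, purely illustrative):

```python
def well_nested(spans):
    """True if no two (start, end) constituent spans partially overlap.

    Spans use inclusive word indices; a "crossing" occurs when one
    constituent starts inside another but ends outside it.
    """
    for a_start, a_end in spans:
        for b_start, b_end in spans:
            if a_start < b_start <= a_end < b_end:  # b crosses out of a
                return False
    return True

# Nested constituents respect the law of form; crossing ones violate it.
print(well_nested([(0, 3), (1, 2)]))  # -> True  (nested)
print(well_nested([(0, 2), (1, 3)]))  # -> False (crossing)
```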
Gradually, there may appear a union of certain kinds of inductive models interacting with laws of form and structural potential of the infant, to explain language learning and language structure.
All of this ferment is now subsumed under the popular re-evocation of “biolinguistics”, now trumpeted as the leading idea integrating today’s language sciences. The historical trend in ideas about the generative architecture of syntax has also led in this direction. In Syntactic Structures, virtually every “construction” type (e.g., passive, question, negative) corresponded to its own rule(s). Gradually, this has been whittled down: first by removing “generalized” transformations that integrate propositions; then by formulating constraints on transformations, ending up with GB theory, on which there was one “rule” (“relate/move alpha”) and numerous “theories” acting as filters on possible derivations (“case theory”, “theta-role theory”, “binding”) after an initial phrase structure is created by X-bar theory and a rehabilitated version of merge. Finally, today we see a further (ultimate?) simplification in which the surface hierarchical structure is itself built by successive iterations of the same structure-building rules – most of the “construction”-building work is now carried by the internal organization and constraints of individual lexical items. So over a long period of time, syntactic architectures have moved from a complex set of transformations and a simple lexicon, to a complex lexicon and a simple set of recursive tree-building processes. The goal now is to specify how syntax is the best possible interface between long-evolved conceptual structures and recently evolved efferent motor capacities, such as the vocal tract or the hands.
This latest development raises new issues for nativism because it is not immediately obvious how parameter setting can work in relation to the minimalist architecture, since many parameters assume a hierarchical organization and/or complete derivation.
Parameters could apply to the interface between syntax and the phonological component, but again this has them working as filters, without much rationale for their particular forms or evolution. My (recidivist) guess, in which I am no longer rare, is that parameters in large part comprise emergent simplicity constraints on learning and language use: perhaps not at all a part of the universal architecture of language, except insofar as that architecture creates decision points for variable parameters to be established.
Behavior and “Rules.” Starting in the 1980s there was a burst of interest in connectionist models, spurred by the discovery of various methods of enhancing perceptrons – basically the use of multiple layers – so they can asymptotically master problems that require the full range of logical operators. For several decades, connectionism dominated modeling efforts – it became difficult to get a behavioral finding published without a connectionist model that simulated it. The appeal of models that worked by varying associative activation strengths between conceptual “nodes” was often advertised as based on their similarity to how neurons interact in the brain. Describing language was a recognized goal, worthy of such modeling attempts. A number of toy problems were approached, including, famously, the modeling of the strong vs. weak past-tense verb forms in English. The originators of these models went so far as to echo Zellig Harris’s lament about his rules, originally written 40 years earlier: what linguists take to be rules are actually descriptive approximations that summarize regularities in the statistical patterns of the real linguistic data. In the end the models’ successes have also been the undoing of the enterprise of debunking linguistic nativism. Just as in the statistical modeling of Motherese, the thousands of trials and millions of individual computations involved in learning even the toy problems become an argument that this process cannot be the way the child learns language, nor the way adults process it. Kids must have some innate mechanisms that vastly reduce the hypothesis space.
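The “multiple layers” point above is the classic XOR case: XOR is not linearly separable, so no single perceptron can compute it, but a two-layer network can. A minimal hand-built sketch (the weights here are chosen by hand for illustration, not learned):

```python
def step(x):
    """Threshold activation: fire (1) if input is non-negative."""
    return 1 if x >= 0 else 0

def perceptron(inputs, weights, bias):
    """A single perceptron unit: weighted sum plus bias, then threshold."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor(a, b):
    # Hidden layer: an OR unit and a NAND unit (hand-set weights).
    h_or = perceptron([a, b], [1, 1], -0.5)     # fires if a or b
    h_nand = perceptron([a, b], [-1, -1], 1.5)  # fires unless both a and b
    # Output layer: AND of the two hidden units yields XOR.
    return perceptron([h_or, h_nand], [1, 1], -1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```

The same asymmetry the text describes shows up even here: stating the weights takes a few lines, but *learning* them from examples takes many passes of error-driven adjustment.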
But the minimalist program of sentence structure building in particular has set an even more abstract problem for modeling both acquisition and processing. For example, phrase structure trees are now composed by successive iterations of merge, starting with the most embedded phrase. In the case of a right-branching language, this means that the basic structures of sentences are formally constructed starting at their end, working backwards. Clearly this cannot be a viable model of actual serial language processing. We either must give up on a role for linguistic theory as a direct component of processing, or we must configure a model that allows both immediate comprehension and somewhat later assignment of full structure. This has made demonstrations of the “psychological reality” of derivations important. Since the derivations may not be assigned until well into a sentence, the best way to test them is to test the results of their application: this has motivated various studies of the salience of empty categories during processing, most importantly WH-trace and NP-trace: experimental evidence that these inaudible elements are nonetheless present during comprehension is an important motive to build in the theory that predicts their occurrence. Our attempt at this has involved a resuscitation of analysis by synthesis: on this model, we understand sentences initially based on surface patterns, canonical forms and so on; then we assign a more complete syntactic structure based on derivational processes. Various kinds of evidence have emerged in support of this model, including behavioral and neurological facts. The idea that we understand sentences twice does not require that we wait until a clause is complete to assign a derivation; rather, it requires that candidate derivations be accessed serially, immediately following the initial analyses based on surface patterns. My co-authored book on sentence comprehension spelled out a range of behavioral data supporting this idea: interestingly, more recent electrophysiological and imaging methods have adduced brain evidence for a dual processing model.
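The bottom-up order of merge can be made concrete with a toy sketch (the tuple representation and the example sentence are mine, purely illustrative): for a right-branching clause like “the frog read the book”, the derivation begins with the most deeply embedded phrase and only reaches the sentence-initial material at the final step.

```python
def merge(a, b):
    """Toy merge: combine two syntactic objects into one binary-branching pair."""
    return (a, b)

# Derivation of "the frog read the book" proceeds bottom-up:
obj = merge("the", "book")    # step 1: the most embedded phrase, heard LAST
vp = merge("read", obj)       # step 2: [read [the book]]
subj = merge("the", "frog")   # step 3: the subject, heard FIRST
clause = merge(subj, vp)      # final step: [[the frog] [read [the book]]]

print(clause)  # (('the', 'frog'), ('read', ('the', 'book')))
```

The mismatch is visible in the comments: the phrase a listener hears last is merged first, which is exactly the tension between derivational order and real-time serial processing described above.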
But such a model is also vexed by the fact that in today’s linguistic theory, virtually every additional phrase structure level involves some form of movement, and hence trace. So the number of traces in a full description can sometimes be larger than the number of words. This is astoundingly true in the case of some versions of today’s “distributed morphology” in which individual lexical items can have in effect a derivation utilizing “light verbs” (an uncanny recapitulation and refinement of the best technical aspects of the ontological misadventures with Generative Semantics in the 1970s). How do we go about motivating the choice of which traces are psychologically active during sentence processing and which are not?
Another approach has been to finesse the problem by building models, and evidence for them, in which the correct syntactic structure is in fact assigned by a processor that operates in real time and in effect “top-down”. On this view, speakers have several complete syntactic models with (one hopes) strong descriptive equivalence. But the top-down models are driven up front not only by structural patterns, but by statistical indices of the current likelihood of a particular pattern. So, under any circumstances, we are faced with the prospect of inelegant humans, who insist on relating meaning and form via several distinct systems, at least one of which is statistical and another structural.
Biology of language. Of course, “biolinguistics” should be informed by a combination of biologically based fields: aphasiology, neurolinguistics, cognitive neuroscience, genetics and evolutionary theory. The recent explosion of brain imaging methods has to some extent made brain images a replacement for connectionist modeling as an entrée to publication: not always to important or good effect.
There was a great deal of skepticism for many years about the relevance of brain studies among those in the central-dogma territory of Cambridge. For example, a good friend of us all made a telling remark when I told him about an early result involving the N400 (the “surprise” component of the ERP brain wave). The result was that the N400 is especially strong at the end of a sentence like “this is the book the frog read [t]”, suggesting that there is a real effect of the WH-“trace”. His remark was: “you mean the brain knows…[t]?” But the dogged persistence of a few international labs (e.g., Nijmegen, Jerusalem, Montreal, Leipzig, Seattle, and now New York) has begun to bear fruit on basic questions relating to language organization.
Of great interest now is a glimmer of emerging study of genetic factors in the emergence of language. I am (I hope) contributing to this effort by focusing on differences in language representation and processing as a function of familial handedness. We have documented with at least 20 behavioral paradigms that right-handers process language differently when they also have left-handers in their family pedigree: the main difference is that they focus on lexical knowledge more readily than syntactic pattern knowledge. Recently (giving up my prejudice against it), brain imaging studies have been showing corresponding neurological differences in how the lexicon is organized and when it is accessed during processing. This may give us a handle on what to look for in the maturing language-learning brain, as a function of specific polymorphisms associated with phenotypic left-handedness. We are beginning to collaborate with laboratories in Leipzig, San Sebastian, Genoa and Trieste on these possibilities.