Tom Bever | 50 years of Linguistics at MIT

What did I want to get an answer to from studying MIT linguistics?

What is the current status of answers to this question?

Personal answer

I wanted to see if I could be a linguist.
Not sure.

(More) Professional answer.

I was already interested in language acquisition and adult behavior: I had privately translated Jakobson’s Kindersprache, and was the chief RA on an early childhood language acquisition project; my undergraduate thesis was on the emergence of phonology in the first year of life, in relation to neurological development. Jakobson was my advisor for that work and an advisor on the research project, and through him I met Morris, or more felicitously for me, Morris met me. That first meeting made an indelible impression of Morris as a no-nonsense and insightful thinker about science who was prepared to treat even a brash kid as someone to argue with as an equal (he blew away a pompous proposal I had in mind about how to collect all possible data about early child vocalizations – interestingly, there is a project today at MIT with just such ambition). Eventually he invited me to be in the first MIT class. I was applying to psychology at Harvard, and interviewed with Smitty Stevens (noted psychophysicist), who was a bit stern, and instructed me that I should go learn something first and THEN I could be a psychologist. I took this to heart and decided to learn linguistics and be in a more student-friendly place.

The question I always had in mind was how the brain incorporates and uses language. I pursued a dual career as a grad student in linguistics and in psychology (MIT also had a new program in that), and I was lucky enough, with Morris and Noam’s nomination and support, to be awarded a Harvard Junior Fellowship: this gave me access to the generosity of George Miller at the Harvard Center for Cognitive Studies, where I had several research assistants of my own while I was still a graduate student. I ran many experiments on sentence processing, most of which failed, which was a wonderful learning experience.

Other accidents abound in this background: for example, Jerry Fodor kindly picked me up daily in his (very small and very cold) Austin-Healey and brought me to school during the first few years of my study: this led to many discussions about the psychology of language and ultimately some early experiments together (clicks and all that).

H.-L. Teuber, Chair of Psychology, was extraordinarily supportive. And so on. The only glitch was that Morris would not let me write a thesis on language processing – it wasn’t thought of as a part of linguistics at the time – so I duly did my sentence by analyzing Bloomfield’s analysis of Menomini Phonology, and some implications for how to unpack phonological rules – a burning question of the day. Eventually, I got a job at Rockefeller U and have mostly pursued psycholinguistics since then.
Not sure.

(Most) Professional answer.

The major question was and is how to integrate a structural theory of language with models of brain and behavior so that there would be mutual contributions. At the time, Miller and his students were taking the Syntactic Structures model of language structure very seriously as a model of language behavior, especially memory for sentences, but also acquisition, perception and production. When I started graduate school, this movement was in its prime, with great excitement about interpreting linguistic models as psychological models, and then subjecting them to experimental “test”. Linguists, including Noam, were publicly skeptical about such efforts, noting that a few linguistic intuitions provide more psychological data than a few years’ worth of experiments – if a given experiment seemed not to support a theoretical structure so much the worse for the experiment. In the event, the Miller program collapsed as more experiments came in that showed that the one-one correspondence between linguistic rules and psychological processes was not consistent (most of these were by me, Fodor, Garrett and Slobin). This was the background for our attempts to develop a new way of thinking about the relation between language structure and behavior, a relationship mediated by language acquisition processes and adult behavioral processes setting constraints on learnable and usable languages.

The strongest themes of the day relating to behavior were: nativism/empiricism, underlying (aka “deep”) structures, rules vs. associative habits. Miller et al. attempted to show that sentences are organized by rule; Noam was arguing, as today, that the child’s data are too impoverished for an associative or pattern learning process, so language must be innate. I became involved early on in experimental attempts to show that deep structures are actually computed as part of sentence processing. A still small voice (well, small anyway) in all of this was the theme that language is a biological object. This had been most famously argued by Lenneberg and clearly was part of Noam’s background thinking. But it was not a major overt theoretical linguistics focus of the projects at hand, which were much more concerned with the architecture of daily syntax, phonology and semantics. When I started out on attempts to show in some detail how language is the result of a maturational and experiential process involving emerging structural abilities, language behavior patterns, experience, and cognitive constraints, it was a lonely adventure for quite some time. The idea that there is a large set of architecturally possible languages which is reduced by “performance” constraints was in the background, but not a prominent part of the research program. I got a lot of gas over it, even from Noam, or at least felt I did.

The intellectual tension remains between linguistic theory imperialism and psychological functionalism: the issues tend still to co-occur with the controversies over nativism/empiricism, rules/associations, and surface/deep representations, respectively.

Learning. For example, “parameter setting” acquisition theories maximize the extent to which the infant is equipped with foreknowledge of the typological options for language – “learning” involves recognizing “triggers” that indicate the setting for each parameter. This view often seems compelling because the “alternative” learning theory has usually been limited to some form of associationism, which by itself is definitionally inadequate to account for what is learned: hence it is a straw opposition to parameter setting theory. What is now at issue is the construction of a more complex hypothesis testing model of language learning, which can integrate statistical generalizations with structural adaptations. A number of features of language are now being invoked as supporting this approach. First, the last decade has witnessed an explosion of investigations of the extent to which the statistically supported patterns of language behavior can carry structural information to the infant – it turns out that the extent is much greater than was often thought; but the rub is that inductive models that extract the regularities often require the equivalent of millions of computations to converge on the statistical patterns. This heightens the importance of a second series of current studies showing that infants are pretuned to focus on learning only certain kinds of serial patterns – not exactly language, but possible components of language universals. A third development is the resuscitation of the “laws of form” as constraining language to have certain kinds of structures, either because of categorical limitations or because of efficiency considerations. For example, it has long been argued that if language is hierarchical, then hierarchies cannot cross over into each other (see Barbara Partee, née Hall, for the original discussion): that is a categorical law of form. More recently, arguments have appeared about the kind of phrase structures and interlevel mappings that are most efficient computationally, as explanations of certain language universals.

Gradually, there may appear a union of certain kinds of inductive models interacting with laws of form and structural potential of the infant, to explain language learning and language structure.

All of this ferment is subsumed under the now popular re-evocation of “biolinguistics”, trumpeted as the leading idea integrating today’s language sciences. The historical trend in ideas about the generative architecture of syntax has also led in this direction. In Syntactic Structures, virtually every “construction” type (e.g., passive, question, negative) corresponded to its own rule(s). Gradually, this has been whittled down, first removing “generalized” transformations that integrate propositions; then formulating constraints on transformations, ending up with GB theory, on which there was one “rule” (“relate/move alpha”), and numerous “theories” acting as filters on possible derivations after an initial phrase structure is created by X-bar theory and a rehabilitated version of merge (“case theory”, “theta role theory”, “binding”). Finally, today we see a further (ultimate?) simplification in which the surface hierarchical structure is itself built by successive iterations of the same structure building rules – most of the “construction” building work is now carried by the internal organization and constraints of individual lexical items. So over a long period of time, Syntactic Architectures have moved from a complex set of transformations, and a simple lexicon, to a complex lexicon and a simple set of recursive tree building processes. The goal now is to specify how syntax is the best possible interface between long evolved conceptual structures and recently evolved efferent motor capacities, such as the vocal tract or the hands.

This latest development raises new issues for nativism because it is not immediately obvious how parameter setting can work in relation to the minimalist architecture, since many parameters assume a hierarchical organization and/or complete derivation.

Parameters could apply to the interface between syntax and the phonological component but again this has them working as filters, without much rationale for their particular forms or evolution. My (recidivist) guess, in which I am no longer rare, is that parameters in large part comprise emergent simplicity constraints on learning, and language use: perhaps not at all a part of the universal architecture of language except insofar as that architecture creates decision points for variable parameters to be established.

Behavior and “Rules.” Starting in the 1980s there was a burst of interest in connectionist models, spurred by the discovery of various methods of enhancing perceptrons – basically the use of multiple layers – so they can asymptotically master problems that require the full range of logical operators. For several decades, connectionism dominated modeling efforts – it became difficult to get a behavioral finding published without a connectionist model that simulated it. The appeal of models that worked by varying associative activation strengths between conceptual “nodes” was often advertised as based on their similarity to how neurons interact in the brain. Describing language was a recognized goal, worthy of such modeling attempts. A number of toy problems were approached, including, famously, the modeling of the strong vs. weak past verb forms in English. The originators of these models went so far as to echo Zellig Harris’s lament about his rules, originally written 40 years earlier: what linguists take to be rules are actually descriptive approximations that summarize regularities in the statistical patterns of the real linguistic data. In the end the models’ successes have also been the undoing of the enterprise of debunking linguistic nativism. Just as in the statistical modeling of Motherese, the thousands of trials and millions of individual computations involved in learning even the toy problems become an argument that this process cannot be the way the child learns language, nor the way adults process it. Kids must have some innate mechanisms that vastly reduce the hypothesis space.
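The point about multiple layers can be made concrete with a toy case. What follows is a minimal sketch, not a reconstruction of any published connectionist model: the weights are set by hand rather than learned, the names are hypothetical, and the search over single-unit weights covers only a small illustrative grid. It shows that exclusive-or, which no single threshold unit can compute because it is not linearly separable, becomes computable once one hidden layer is added.

```python
from itertools import product

# The four possible input pairs and the exclusive-or target for each.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]


def step(x):
    """Threshold activation: fire (1) only if the net input is positive."""
    return 1 if x > 0 else 0


def single_unit(w1, w2, bias):
    """Outputs of one threshold unit (a one-layer perceptron) on all inputs."""
    return [step(w1 * a + w2 * b + bias) for a, b in INPUTS]


def two_layer_xor(a, b):
    """Hand-wired network with one hidden layer (weights are assumptions)."""
    h_or = step(a + b - 0.5)    # hidden unit 1: fires if a OR b
    h_and = step(a + b - 1.5)   # hidden unit 2: fires if a AND b
    return step(h_or - h_and - 0.5)  # output: OR but not AND


# Try every weight/bias combination on a coarse grid from -2.0 to 2.0.
grid = [x / 2 for x in range(-4, 5)]
single_unit_solves_xor = any(
    single_unit(w1, w2, bias) == XOR
    for w1, w2, bias in product(grid, repeat=3)
)
two_layer_outputs = [two_layer_xor(a, b) for a, b in INPUTS]

print(single_unit_solves_xor)  # False
print(two_layer_outputs)       # [0, 1, 1, 0]
```

The hand-wired weights sidestep the learning question entirely; it is the learning of such weights from data that takes the many trials discussed above.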

But especially the minimalist program of sentence structure building has set an even more abstract problem for modeling both acquisition and processing. For example, phrase structure trees are now composed by successive iterations of merge, starting with the most embedded phrase. In the case of a right branching language, this means that the basic structures of sentences are formally constructed starting at their end, working backwards. Clearly this cannot be a viable model of actual serial language processing. We either must give up on a role for linguistic theory as a direct component of processing, or we must configure a model that allows both immediate comprehension and somewhat later assignment of full structure. This has made demonstrations of the “psychological reality” of derivations important. Since the derivations may not be assigned until well into a sentence, the best way to test them is to test the results of their application: this has motivated various studies of the salience of empty categories during processing, most importantly WH-trace and NP-trace: experimental evidence that these inaudible elements nonetheless are present during comprehension is an important motive to build in the theory that predicts their occurrence. Our attempt at this has involved a resuscitation of analysis by synthesis: on this model, we understand sentences initially based on surface patterns, canonical forms and so on; then we assign a more complete syntactic structure based on derivational processes. Various kinds of evidence have emerged in support of this model, including behavioral and neurological facts. The idea that we understand sentences twice does not require that we wait until a clause is complete to assign a derivation; rather it requires that candidate derivations be accessed serially, immediately following the initial analyses based on surface patterns. My coauthored book on sentence comprehension spelled out a range of behavioral data supporting this idea: interestingly, more recent electrophysiological and imaging methods have adduced brain evidence for a dual processing model.
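The observation that merge assembles a right branching sentence from its end backwards can be illustrated with a toy sketch. The nested-tuple representation, the function names, and the four-word example are assumptions made purely for illustration; nothing here is a claim about how any actual grammar formalism or parser is implemented.

```python
def merge(left, right):
    """Combine two syntactic objects into one binary constituent."""
    return (left, right)


def build_right_branching(words):
    """Build a strictly right branching tree by repeated merge.

    The most embedded phrase comes first, so construction starts with
    the LAST word and moves toward the first. Returns the finished tree
    and the constituents in the order they were built.
    """
    tree = words[-1]
    steps = []
    for word in reversed(words[:-1]):
        tree = merge(word, tree)
        steps.append(tree)
    return tree, steps


# Hypothetical toy sentence, treated as purely right branching.
words = ["she", "said", "he", "left"]
tree, steps = build_right_branching(words)

for number, constituent in enumerate(steps, start=1):
    print(number, constituent)
# 1 ('he', 'left')
# 2 ('said', ('he', 'left'))
# 3 ('she', ('said', ('he', 'left')))
```

The first constituent built contains the last two words heard, which is the mismatch with left-to-right comprehension that the text describes.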

But such a model is also vexed by the fact that in today’s linguistic theory, virtually every additional phrase structure level involves some form of movement, and hence trace. So the number of traces in a full description can sometimes be larger than the number of words. This is astoundingly true in the case of some versions of today’s “distributed morphology” in which individual lexical items can have in effect a derivation utilizing “light verbs” (an uncanny recapitulation and refinement of the best technical aspects of the ontological misadventures with Generative Semantics in the 1970s). How do we go about motivating the choice of which traces are psychologically active during sentence processing and which are not?

Another approach has been to finesse the problem by building models, and evidence for them, in which the correct syntactic structure is in fact assigned in real time and in effect “top-down”. On this view, speakers have several complete syntactic models with (one hopes) strong descriptive equivalence. But the top-down models are driven up front not only by structural patterns, but by statistical indices on the current likelihood of a particular pattern. So, under any circumstances we are faced with the prospect of inelegant humans, who insist on relating meaning and form via several distinct systems, at least one of which is statistical, and the other structural.

Biology of language. Of course, “biolinguistics” should be informed by a combination of biologically based fields: aphasiology, neurolinguistics, cognitive neuroscience, genetics and evolutionary theory. The recent explosion of brain imaging methods has to some extent made brain images a replacement for connectionist modeling as an entrée to publication: not always to important or good effect.

There was a great deal of skepticism for many years about the relevance of brain studies by those in the central dogma territory of Cambridge. For example, a good friend of us all made a telling remark when I told him about an early result involving the N400 (the “surprise” component of the ERP brain wave). The result was that the N400 is especially strong at the end of a sentence like “this is the book the frog read [t]”, suggesting that there is a real effect of the WH-“trace”. His remark was: “you mean the brain knows…[t]?” But the dogged persistence of a few international labs (e.g., Nijmegen, Jerusalem, Montreal, Leipzig, Seattle, and now New York) has begun to bear fruit on basic questions relating to language organization.

Of great interest now is a glimmer of emerging study of genetic factors in the emergence of language. I am (I hope) contributing to this effort by focusing on differences in language representation and processing as a function of familial handedness. We have documented with at least 20 behavioral paradigms that right handers process language differently when they also have left handers in their family pedigree: the main difference is that they focus on lexical knowledge more readily than syntactic pattern knowledge. Recently (giving up my prejudice against them), brain imaging studies are showing corresponding neurological differences in how the lexicon is organized and when it is accessed during processing. This may give us a handle on what to look for in the maturing language learning brain, as a function of specific polymorphisms associated with phenotypic left handedness. We are beginning to collaborate with laboratories in Leipzig, San Sebastian, Genoa and Trieste on these possibilities.

So where are we today in relation to the original question, interrelating linguistic structure with maturation, behavior and the brain? Many of the specific aspects have been clarified, but an overall theory remains elusive. A few experiments purport to show that component computational processes involved in processing or representing sentences involve uniquely located and/or timed brain processes. But we are a long way from understanding how such demonstrations will accumulate into a meaningful model. As is the usual case, careful serendipity will probably be our best bet.

Applied Coda.

Linguistics is more than a theoretical discipline. Quite a few (even MIT) linguistics graduates work in applied settings, on computational issues, or saving endangered languages, or on reading programs, just to name a few. In my own case, I have concentrated for years on using comprehension models to improve the readability and enjoyability of texts. A sample of the basic methods: varying the between-word spacing to coordinate with major comprehension units, and varying the clarity of individual letters as a function of their information value. For years, I tried to give away these ideas, but publishers are generally too skittish to accept them. So I patented implementations of these processes and others, and we are marketing them with some emerging success. Morris says this will be the first instance of anyone making money from Linguistics. I’m hoping it will eventually make possible an endowed chair of interdisciplinary studies as payback to the field. At the moment, the financial value is mostly theoretical.