Do “Ultraconserved Words” Reveal Linguistic Macro-Families?
Part 1: Linguistic Critique
Can words remain recognizable across more than a dozen millennia, their meanings understandable by people speaking languages in diverse linguistic families? Mark Pagel, Quentin Atkinson and their co‑authors answer in the affirmative. Journalists and bloggers, moreover, have tended to interpret their study as indicating little change in the core vocabulary of a massive assemblage of languages, those found in the supposed “Eurasiatic super-family” tying together Indo-European, Uralic, Altaic, Kartvelian (Georgian), Inuit-Yupik (Eskimo) and Chukchee-Kamchatkan.* Mark Frauenfelder at Boingboing.net reported that “a research team led by Mark Pagel at the University of Reading in England has identified 23 ‘ultraconserved words’ that have remained largely unchanged for 15,000 years”. Tia Ghose, LiveScience Staff Writer, echoed this with “The researchers could predict what 23 words, including ‘I,’ ‘ye,’ ‘mother,’ ‘male,’ ‘fire,’ ‘hand’ and ‘to hear’ might sound like in an ancestral language dating to 15,000 years ago”. Even venerated publications could not avoid sensationalism. In the Washington Post, David Brown claims that the following passage consists largely of such “ultraconserved words”: “You, hear me! Give this fire to that old man. Pull the black worm off the bark and give it to the mother. And no spitting in the ashes!” Brown further contends that:
“… if you went back 15,000 years and spoke these words to hunter-gatherers in Asia in any one of hundreds of modern languages, there is a chance they would understand at least some of what you were saying.”
Sorry, but there is no such chance. If you go back a mere two thousand years in any preserved language, changes have been enormous. If you go back 15,000 years, all comprehension would vanish. Within any one “Eurasiatic” family, which diversified much more recently, speakers of languages in one branch are seldom able to understand anything in Brown’s passage if expressed in a language of a different branch.
Consider, for example, a direct translation into Russian:
Vy, uslyshte menja! Dajte etot ogon’ tomu staromu muzhu. Potjanite chjornogo chervja s kory i dajte jego materi. I ne plevat’ v pepel!
Yet the authors claim that such ultraconserved words are found across all of the branches of the hypothesized “Eurasiatic” family. It goes without saying that the mutual comprehension of this passage would be nil between speakers of Georgian, Chukchi, Sakha, Tamil, and Udmurt, to name just a few languages in this hypothetical family.
Brown’s article in The Washington Post continues:
“That’s because all of the nouns, verbs, adjectives and adverbs in the four sentences are words that have descended largely unchanged from a language that died out as the glaciers retreated at the end of the last Ice Age. Those few words mean the same thing, and sound almost the same, as they did then.”
Just how wrong this claim is can be seen from the example of just one of the 23 “ultraconserved words”: man. First consider its history merely within English over the past millennium and a half. Today, it is pronounced /mæn/ and means one of two things: either ‘an adult male person’ or ‘a person of either gender’. The latter meaning, however, is considered sexist by many, and is thus falling out of use. Words such as chairman, fisherman, and policeman are thus being replaced by such gender-neutral forms as chairperson, fisher, and police officer, just as mankind is yielding to humankind. But the gender-neutral meaning of man is still evident in manslaughter and in the phrase no man’s land. As it turns out, the meaning of ‘an adult male’ is relatively new. In Old English (roughly, prior to the Norman invasion of 1066), this word, pronounced then with a vowel articulated further back in the mouth, did not mean a ‘male person’ but had only the gender-neutral sense of ‘a human being, person (male or female)’. The word acquired the sense of ‘adult male’ in Middle English. Prior to that time, an adult male was a wer, as distinguished from a wif, which then meant ‘woman (of any marital status)’, as it still does in idiomatic expressions like old wives’ tale and in the compound midwife, originally meaning ‘with woman (during labor)’. The word wer began to disappear in the late 13th century and was eventually replaced by man, which retained its old, more general meaning as it acquired the new, gender-specific one. (The term wer did survive, however, in such terms as “werewolf,” which makes one wonder whether a female lycanthrope should be referred to as “wifwolf”.) Note also that the Old English man had additional meanings besides ‘person’, including ‘servant, vassal’, as in all the king’s horses and all the king’s men (we retain this meaning to this day).
Thus, clearly the meanings of even “ultraconserved words” show considerable change over much shorter periods than 15,000 years.
Pronunciations of such core terms change too, as I indicated above with the shift in vowel articulation in man through the history of English. Within the Germanic branch of the Indo-European family, the reflexes of the reconstructed ancestral Proto-Germanic form *manwaz include Old Norse maðr, Danish mand, Gothic manna. In other Indo-European branches we find Sanskrit (Indic) manuh, Avestan (Iranian) manu-, Old Church Slavonic (Slavic) mozi. The latter is related to the Russian form muzh, found in the Russian version of the odd “Stone Age” passage above. This plethora of phonological forms in related languages is a result of sound changes, different in each branch.
The list of the “ultraconserved words” in the PNAS article itself contains quite a few surprises, even if we restrict ourselves to the 1,500-year-long history of English rather than the supposed 15,000 years of shared “Eurasiatic” history. Among those oddities are thou and ye, both of which changed their meaning (and form, in the case of ye), switching from informal to formal. Another surprise is not, a word of recent pedigree as a negative particle (Pagel et al. incorrectly call it an “adverb”). Not began its career in the mid-13th century as an unstressed variant of the emphatic noht/naht ‘in no way’, not unlike pas in the modern French two-part ne… pas negation. In fact, both English and French are undergoing the so-called “Jespersen’s Cycle” (named after the Danish linguist and Anglicist Otto Jespersen). In the first stage of this cycle negation is expressed by a single preverbal element; in the second stage, a postverbal emphatic element is added and made obligatory; and in the third stage, this postverbal emphatic element replaces the preverbal element, making the latter optional or eliminating it altogether. Thus, in Old English negation was expressed by a preverbal ne, as in ic ne seah (literally ‘I not saw’). In Middle English, the same sentence was expressed as I ne saugh noht (literally ‘I not saw nothing’). Finally, in Early Modern English (around the time of Shakespeare), this sentence became I saw not (eventually, lexical verbs stopped inverting around negation and the so-called do-support was introduced to give us the modern I did not see). In a parallel development, Old French had only the preverbal negation, as in jeo ne dis (literally ‘I not say’). In Modern Standard French both a preverbal and a postverbal element are obligatory, as in je ne dis pas (literally, ‘I not say nothing’), while in colloquial French, which represents Stage 3 of Jespersen’s Cycle, the preverbal ne is optional, so that je dis pas is perfectly acceptable.
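The three stages of Jespersen’s Cycle described above can be summarized in a toy model. The Python sketch below is only an illustration under stated assumptions, not a linguistic analysis: the names Stage and negate are hypothetical, and the word forms are simply the ones quoted in the text, supplied by hand for each period rather than derived by any sound-change rule.

```python
from enum import Enum


class Stage(Enum):
    """The three stages of Jespersen's Cycle as described in the text."""
    PREVERBAL = 1   # a single preverbal negator
    BIPARTITE = 2   # preverbal negator plus an obligatory postverbal element
    POSTVERBAL = 3  # the postverbal element alone carries negation


def negate(subject: str, verb: str, stage: Stage, pre: str, post: str) -> str:
    """Linearize a negated clause for the given stage (hypothetical helper)."""
    if stage is Stage.PREVERBAL:
        words = [subject, pre, verb]
    elif stage is Stage.BIPARTITE:
        words = [subject, pre, verb, post]
    else:
        words = [subject, verb, post]
    return " ".join(words)


# Word forms are taken from the examples in the text; spelling differs by period.
english = [
    negate("ic", "seah", Stage.PREVERBAL, "ne", "noht"),   # Old English
    negate("I", "saugh", Stage.BIPARTITE, "ne", "noht"),   # Middle English
    negate("I", "saw", Stage.POSTVERBAL, "ne", "not"),     # Early Modern English
]
french = [
    negate("jeo", "dis", Stage.PREVERBAL, "ne", "pas"),    # Old French
    negate("je", "dis", Stage.BIPARTITE, "ne", "pas"),     # Modern Standard French
    negate("je", "dis", Stage.POSTVERBAL, "ne", "pas"),    # colloquial French
]

for clause in english + french:
    print(clause)
```

The only thing the sketch captures is the position of the negator relative to the verb; the point is that not and pas enter the clause as postverbal reinforcers and become the main negators only at the last stage, which is why not cannot be an ancient, unchanged negative word.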
All of these subtleties escape the authors of the PNAS paper, who ignore grammatical patterns and changes as much as possible. Even their assignment of some of the 23 “ultraconserved words” to “parts of speech” is flawed. For instance, they call the demonstratives this and that “adjectives”, though these words exhibit neither adjectival morphology nor adjectival syntax. For example, demonstratives are in complementary distribution with articles, quantifiers, and possessors, resulting in the ungrammaticality of *the this book, *every this book, and *John’s this book; whereas adjectives are perfectly capable of co-occurring with these elements, as in the interesting book, every interesting book, and John’s interesting book. Note also that Pagel et al. give different labels to who and what: according to them, the former is a “pronoun”, while the latter is an “adverb”. Yet, these two words clearly share the same syntactic properties (except for the demonstrative-like use of what, as in What book?). Both who and what must appear at the beginning of a question (e.g. Who did you see? and What did you see?). But only one of them can occur at the beginning if both are present, as in Who brought what to the potluck party? and What was cooked by who? (ignoring the who/whom distinction).
While these issues may seem trivial or irrelevant to the larger considerations of the PNAS paper, they underscore the central issue, something repeatedly missed or consciously ignored by these authors and their collaborators (cf. Gray and Atkinson 2003, Bouckaert et al. 2012, and elsewhere): to wit, language is not merely words. The interchangeability of “words” and “language” is a neat conjuring trick, evident in the first sentence of the article’s abstract (highlighting mine):
“The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y.”
Pagel and Atkinson’s search for family relationships among languages is set off course at the outset by looking in the wrong place. It has been understood at least since Antoine Meillet’s work a hundred years ago that grammatical properties are more reliable than words as indicators of familial relationships. As Meillet (1908: 126) noted, “Les coïncidences de vocabulaire n’ont en général qu’une très petite valeur probante” (“Coincidences of vocabulary are in general of very little probative value”). In recent years, the searchlight has been focused (by bona fide linguists, not evolutionary biologists) on abstract syntactic properties, establishing formal grammar as a population science; see, for example, the work of Giuseppe Longobardi and Cristina Guardiano (e.g. Longobardi & Guardiano 2009). Just as the biological classification of species, originally based on externally accessible characteristics, underwent a revolution on the grounds of progress in theoretical biology, namely the rise of molecular genetics, so too progress in the phylogenetic classification of languages must be based on progress in theoretical linguistics. In order to push the research frontier, we linguists need to identify the basic building blocks of language, its “atoms”, in Mark Baker’s memorable metaphor, and examine carefully how they play out in linguistic evolution. Looking for “words that survived since the last Ice Age”, in contrast, is a seductive but ultimately futile enterprise.
Part 2: Geographical Critique
The map used by Mark Pagel, Quentin Atkinson, Andreea Calude, and Andrew Meade in “Ultraconserved Words Point to Deep Language Ancestry across Eurasia” is riddled with odd features and elementary errors. To begin with, the projection is inconsistent and distorted; a misplaced Alaska appears to be no larger than Kamchatka. Bizarre features include a massive white area in northwestern Russia. Is this supposed to be a “non-Eurasiatic” zone or a massive lake—Baikal substituted for Ladoga? Neither possibility makes any sense. Why is Kashmir mapped as a separate country; is this a deliberate political statement, designed to infuriate India and Pakistan in equal measure? The mapping of northern Borneo (Brunei and the Malaysian states of Sabah and Sarawak) as if it were an island in its own right is amusing, as is the mystery island off the coast of Yemen.
The map is also littered with minor errors in linguistic geography, such as the exclusion of Vasconic (Basque), the confinement of the Central European Uralic zone (Magyar) to Hungary, and the incorrect placement of the Middle Volga Altaic (Turkic) zone, which is located too far to the east. Such small mistakes are easily overlooked, however, especially as the authors have provided an honest disclaimer: “The color-shaded areas should be treated as suggestive only, as current language ranges will not necessarily correspond to original homelands, and language boundaries will often overlap.”
A number of cartographic errors, however, are far more serious, and hence demand recognition. Intriguingly, some of the basic mistakes that characterized the Bouckaert et al. Science article that we so harshly criticized appear yet again, such as the inexplicable exclusion of Moldova from the Indo-European realm. Considering the email exchanges that we had with Quentin Atkinson about this and related issues, we are surprised to see this oversight recurring. Note also that Macedonia is likewise excluded from the Indo-European zone, just as Estonia is left out of the Uralic zone. More problematic is the fact that Moldova, Estonia, and Macedonia are not even mapped within the supposed Eurasiatic macro-family. What kind of languages are we to imagine are spoken in these countries?
Errors outside Europe are equally serious. Turkic Azeri, with its 23 million speakers spread across Azerbaijan and northwestern Iran, is classified as an Indo-European language. The NE Caucasian, NW Caucasian, Indo-European, and Turkic languages of the north Caucasus are all misclassified in the Kartvelian family. Kurdish appears in southeastern Turkey, but not in Iran, Iraq, and Syria. Indo-European languages do not appear in either Kashmir or Tajikistan, yet they do in far northeastern India and northern Nepal. The extent of Dravidian (Brahui) in Afghanistan is grossly exaggerated, yet the family is absent in northern Sri Lanka. The mapping of Inuit-Yupik is laughable, showing it as limited to, yet entirely encompassing, Alaska. In actuality, this language family extends across the Arctic to Greenland, yet does not extend into central or southeastern Alaska, which are instead Na-Dene- and English-speaking. I could go on, but the exercise would quickly become tedious.
To be blunt, such slapdash cartography has no place in serious scholarship. I doubt that any authors who would approve of such mapping have an adequate knowledge of linguistic geography to carry out such a research program. We should be able to expect much better from both the authors and the journal.
In short, “Ultraconserved words point to deep language ancestry across Eurasia” is premised on the notion that cutting-edge research in historical linguistics requires little knowledge of linguistic geography, linguistic history, or even linguistics itself. It is hardly surprising that such a research program would yield inadequate results.
*Technically, the grouping proposed by Pagel et al. (2013) is different from the original extent of the Eurasiatic macro-family, as proposed by Joseph Greenberg, in that it does not include Nivkh, but does include the Kartvelian and Dravidian families. Nor is this grouping co-extensive with the Nostratic macro-family, as proposed by Vladislav Illyč-Svityč and Aaron Dolgopolsky: Pagel et al.’s grouping includes Chukchee-Kamchatkan but does not include Afroasiatic. It is also noteworthy that most linguists find the Altaic family to be deeply problematic.