Tracing Indo-European Languages Back to Their Source—Through the False Mirrors of the Popular Press
A recent article “Mapping the Origins and Expansion of the Indo-European Language Family” published in Science (vol. 337, pp. 957-960) by a team of evolutionary anthropologists and biologists headed by Dr. Quentin D. Atkinson has created an uproar both in the popular media and the blogosphere.* This article purports to supply novel quantitative evidence for the Anatolian hypothesis, which locates the Indo-European homeland in what is now the Asian part of Turkey, as opposed to the more commonly accepted Kurgan theory, which places it in the Pontic-Caspian steppes of southern Russia and eastern Ukraine. Before we at GeoCurrents continue with our detailed critique of the Science article itself, we must first examine the media reports on the supposed findings of Atkinson and his colleagues. It is one thing for an unconvincing, error-filled report to appear in an academic journal, and quite another for it to be immediately trumpeted in the major newspapers and magazines as constituting nothing less than a major scientific breakthrough.
The major media typically report on linguistic matters only when the research is “done by biologists, computer scientists, or neurologists, but rarely if ever [do they] provide any reportage on the activities of professional linguists”, as was pointedly noted by a discussant on the LanguageHat blog. Journalists writing for major news outlets, moreover, often both misrepresent the linguistic (or pseudo-linguistic) research they report on and exhibit ignorance of even the most basic linguistic concepts, terminology, and findings. This post highlights the media’s major blunders as well as their more subtle distortions of the research in the article of Bouckaert et al.
The story appears to have been broken by The New York Times, which published a piece by its former science editor Nicholas Wade. The online version of this article appeared on Thursday August 23rd 2012, a day before Bouckaert et al.’s article appeared in Science. Wade summarizes Bouckaert et al.’s research as follows:
“Biologists using tools developed for drawing evolutionary family trees say that they have solved a longstanding problem in archaeology: the origin of the Indo-European family of languages.”
In this first sentence, Wade brushes aside more than 200 years of work in philology and historical linguistics, fields that began to flourish in the late 18th and early 19th centuries owing to path-breaking work on Indo-European languages. Generally, Wade portrays linguistics as only marginally relevant to the issues at hand, and linguists as all believing in a rival theory (one associating speakers of Proto-Indo-European with chariot-driving, kurgan-building** pastoralists), which he presents as patently incorrect. Linguistic arguments for the Kurgan hypothesis, such as the “wheel” problem and the issue of prehistoric linguistic contacts between Proto-Indo-European and Proto-Uralic, are waved aside; GeoCurrents will address these issues in forthcoming posts. Overall, the debate is presented as a duel of name-calling and hand-waving between two camps of non-linguists: Quentin D. Atkinson and his colleagues on the one side and archeologist David W. Anthony on the other. No real arguments from either camp are addressed, making it appear that the two approaches are mutually exclusive, and that only one can ultimately be right. Such a caricature, however, is far from the truth, as several highly respected historical linguists, such as Don Ringe and April McMahon, take into account both the quantitative methods touted by Atkinson and the philological evidence explored by Anthony.
A more fundamental problem is Wade’s failure to address the core of Atkinson and company’s research: how one gets from considering cognates, that is, words of similar meaning in different languages that stem from a common origin in an ancestral language, to placing the homeland of the group that spoke the parent tongue on the map. Atkinson summarizes the import of his team’s study as “explicitly incorporating spatial information” by:
“combining phylogenetic inference with a relaxed random walk (RRW) model of continuous spatial diffusion along the branches of an unknown, yet estimable, phylogeny to jointly infer the Indo-European language phylogeny and the most probable geographic ranges at the root and internal nodes.”
This RRW model and its application to language study, which supposedly enables these researchers “to locate cultural histories in space and time”, is the original contribution of the article to the debate—and its most controversial aspect. Yet it is curiously missed in all the popular media reportage.
Nor is it clear from Wade’s piece why biologists—whose field of study is emphasized in the title—would have anything novel to say about language matters. According to the Science article itself, “a Bayesian phylogeographic inference framework” used to trace Indo-European languages was “developed to investigate the origin of virus outbreaks from molecular sequence data”. This point is highlighted in the title of another popular media report, by yet another British science journalist, Emma Woollacott of TG Daily, entitled “Virus-tracking technique pinpoints origin of languages”. Woollacott’s vaguely formulated title, referring to “languages” rather than “Indo-European languages”, prompted numerous readers to offer bizarre comments about the origin of languages in general, wondering whether Hebrew was the original language, and whether Greek was “the first language of Europe”. Others pointed to Biblical stories of Noah’s Ark and the Tower of Babel, which they noted as having taken place in or near the region where Atkinson places the origin of the Indo-European language family.
On the whole, Woollacott’s article is brief and lacking in detail. Like Wade, she does not explain in what way the methodology successfully used for tracing virus ancestries and origins would be applicable to human languages. As a reader named Joan pointed out, “genetics deals with four element AGCT patterns while word have far more than four elements.” A slew of additional comments note that Woollacott misspelled “Auckland” and “Caspian” in her original article, serious blunders for a piece discussing the geospatial distribution of languages!
But it gets much worse. Consider, for example, a piece produced for the BBC News by Jonathan Ball: “English language ‘originated in Turkey’”. The very notion is so absurd that it took me a while to figure out what he could be referring to and to connect it to the Science article. Even assuming that Bouckaert et al. are correct and the Indo-European language family did originate in Anatolia, the title makes no more sense than “English language originated in Tanzania”. Indeed, English is a descendant of Proto-Indo-European, which may have been spoken in present-day Turkey. But it is equally a descendant of Proto-Human, the putative ancestor of all human languages. Proto-Human, like our species itself, is believed by most scholars to have originated in East Africa. (Atkinson, in another Science article, placed the homeland of Proto-Human in West Africa; my objections to that work are summarized in a number of earlier posts, as well as in a peer-reviewed Technical Comment, co-authored with Rory van Tuyl and published in Science on February 10, 2012.)
Despite the poorly worded title, and the bad writing more generally (can one “interrogate language evolution”?), Ball at least attempted to spell out such basic linguistic concepts as “language family” and “cognates”. Unfortunately, he defined these essential terms incorrectly. For example, referring to the Ethnologue, Ball states that “more than 100 language families exist”. This number means nothing; many more than 100 language families can be identified simply because they are like Russian matryoshka dolls: smaller language families fit into larger ones, which in turn fit into yet larger ones, and so on. For example, as shown on the chart on the left, Russian is a member of the East Slavic family (together with Ukrainian and Belarusian). The East Slavic family is in turn a member of a larger Slavic family (together with West Slavic and South Slavic families, which include languages such as Polish and Bulgarian, respectively). The Slavic family is grouped together with the Baltic family (which includes Latvian and Lithuanian) into a larger Balto-Slavic family, which is in turn a member of the Indo-European family. Ball justifies his claim that “the Indo-European family is one of the largest families” by noting that it includes “more than 400 languages spoken in at least 60 countries”. Actually, when it comes to the number of languages, Indo-European is far from the world’s largest family, conceding that title to Niger-Congo. The Austronesian language family is also “larger” than Indo-European in this regard, with about three times as many individual languages. But the Indo-European family is indeed the world’s largest if measured by the number of speakers; approximately 40% of the world’s population speaks an Indo-European language as the mother tongue.
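The matryoshka point can be made concrete with a small sketch. The Python snippet below is only an illustration, not anything taken from the Ethnologue or the Science article: it encodes just the groupings named above (a tiny fragment of Indo-European, with all other branches omitted) and counts how many “families” one gets depending on which level of nesting is counted. The names `TREE`, `count_families`, `count_languages`, and `families_at_depth` are invented for this example.

```python
# Toy fragment of the Indo-European tree, limited to the groupings named in
# the text. A dict is a family (name -> members); a list holds languages.
# All other Indo-European branches are deliberately left out.
TREE = {
    "Indo-European": {
        "Balto-Slavic": {
            "Slavic": {
                "East Slavic": ["Russian", "Ukrainian", "Belarusian"],
                "West Slavic": ["Polish"],
                "South Slavic": ["Bulgarian"],
            },
            "Baltic": ["Latvian", "Lithuanian"],
        }
    }
}


def count_families(node):
    """Count every named grouping at any depth of nesting."""
    if isinstance(node, list):  # a list of languages contains no families
        return 0
    return sum(1 + count_families(child) for child in node.values())


def count_languages(node):
    """Count the individual languages at the leaves."""
    if isinstance(node, list):
        return len(node)
    return sum(count_languages(child) for child in node.values())


def families_at_depth(node, depth):
    """Names of the groupings exactly `depth` levels below the top (0 = top)."""
    if isinstance(node, list):
        return []
    if depth == 0:
        return list(node)
    names = []
    for child in node.values():
        names.extend(families_at_depth(child, depth - 1))
    return names


if __name__ == "__main__":
    print(count_languages(TREE), "languages")
    print(count_families(TREE), "nameable families")
    for depth in range(4):
        print(depth, families_at_depth(TREE, depth))
```

On this toy fragment, seven languages already yield seven nameable “families”, and anywhere from one to three of them at a given level, which is why a bare figure such as “more than 100” says little unless the level of grouping is specified.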
More serious is Ball’s deceptive definition of “cognates”, which he describes as “shared words” or “words of similar sound that often describe the same thing”. This explanation misses the essential fact that cognates are by definition words whose similarity of sound and meaning is due to common descent rather than lexical borrowing or sheer accident. And as it so happens, the “similarities” of cognates, both in meaning and sound, can be so obscured by language change as to be undetectable by non-specialists. However, in each case the differences between cognates can be accounted for by traceable semantic and phonological changes. The English word hound and the Italian cane are cognates, even though their meanings are somewhat different and they do not sound exactly the same. Their sound differences can be accounted for by regular phonological changes, such as the First Germanic Sound Shift (also known as Grimm’s Law), which changed the /k/ into /x/ and later into /h/. The meanings of the two words diverged as well, undergoing processes that are known from other cases. In particular, the meaning of the word hound narrowed from a more general ‘dog’ to a certain kind of dog (the same type of meaning change applied also to the word meat, whose older, more general meaning of ‘food’ is still evident from the compound sweetmeats).*** In many instances, cognates are not apparent at all to the naked eye: for example, the English loaf and the Russian xleb ‘bread’ fit the category, as their meanings and especially their pronunciations have diverged significantly. Crucially, not all words that resemble each other in form and meaning are cognates. Words borrowed from one language into another, like the terms for ‘tea’ discussed in an earlier GeoCurrents post, are not cognates in the technical sense. Nor are accidental look-alikes such as the Italian strano and the Russian strannyj, though both mean ‘strange’, and their stems stran- sound much the same.
Yet these two words derive from completely different sources: the Italian word comes from the Latin extraneus, meaning ‘external, foreign’, while the Russian word comes from strana, meaning ‘country’ (for longer discussions of these issues, see chapter 2 of my book Languages of the World: An Introduction).
Like most other reporters, Ball presents the issue of the Indo-European origins as a stark dichotomy, one in which two scholarly camps struggle for primacy. Such framing may add dramatic effect, but at the cost of accuracy. In fact, the Kurgan theory and the Anatolian theory, around which the Science article pivots, are but two out of no fewer than a dozen diverse proposals for the homeland of the Indo-European language family (many of which are also discussed in detail in my book). These varying hypotheses make the Proto-Indo-European language as old as 12,000 or as young as 5,000 years, and put its place of origin in locales ranging from Northern Europe to India. Some of these proposals, including Colin Renfrew’s Anatolian hypothesis and Marija Gimbutas’ Kurgan theory, were put forward by archeologists trying to square evidence from their discipline with linguistic data. Other proposals come from linguists, such as Tamaz V. Gamkrelidze and Vyacheslav V. Ivanov (the latter teaches at UCLA), who argue for a recent origin of Indo-European languages in the area of the Armenian Highlands and the Lake Urmia region in northwestern Iran, and Johanna Nichols of UC-Berkeley, who placed the Indo-European homeland in ancient Bactria-Sogdiana (what is today northern Afghanistan and adjacent areas in Uzbekistan and Tajikistan).
Finally, much like Nicholas Wade, Jonathan Ball refers to the learned opinions of two scholars, one who gives a “thumbs up” and the other a “thumbs down” verdict for Atkinson’s work. The “pro” vote comes from Professor Mark Pagel, the Head of the Bioinformatics Laboratory at the University of Reading, where Quentin D. Atkinson once worked (a fact unmentioned by Ball). Pagel lists “linguistic evolution” last on his list of research interests and has no linguistic credentials, as far as I can tell. The “con” vote is from Professor Petri Kallio of the University of Helsinki, a trained Indo-European linguist, whose doctoral thesis focused on the early relations between Indo-European and Uralic, a topic that provides one of the strongest arguments for the Kurgan hypothesis, as we shall see in a forthcoming GeoCurrents post.
While Woollacott’s headline is too vague and Ball’s is misleading, the worst of the worst is the title of the article penned by Alyssa Joyce for Scientific American: “Disease Maps Pinpoint Origin of Indo-European Languages”. Besides the headline being a so-called “garden-path sentence” (where “maps” can be initially understood as a verb), both the title and the next line, “Turkey might be the geographic origin of languages from English to Hindi, according to epidemiological tracking techniques”, make it sound like language is a disease!
English, of course, originated in England rather than in Turkey, and virus-tracking techniques are best left to virus tracking. Sensationalist quasi-scientific reporting in major media once again confirms that a little knowledge can be a dangerous thing!
*As is typical of scientific articles with multiple authors, the name of the project’s head appears last in the list. Confusingly, the article is then referred to by the first author’s last name et al. (in this case as “Bouckaert et al.”).
**“Kurgan” is a Turkic word denoting a tumulus, or burial mound; such mounds are widespread on the Pontic steppes.
***The meaning change in the case of hound was exploited nicely by the creators of the BBC TV series Sherlock, who seem to be more linguistically savvy than the BBC Science News journalists.