Tracing Indo-European Languages Back to Their Source—Through the False Mirrors of the Popular Press

Submitted by on September 6, 2012 – 10:19 pm  

A recent article “Mapping the Origins and Expansion of the Indo-European Language Family” published in Science (vol. 337, pp. 957-960) by a team of evolutionary anthropologists and biologists headed by Dr. Quentin D. Atkinson has created an uproar both in the popular media and the blogosphere.* This article purports to supply novel quantitative evidence for the Anatolian hypothesis, which locates the Indo-European homeland in what is now the Asian part of Turkey, as opposed to the more commonly accepted Kurgan theory, which places it in the Pontic-Caspian steppes of southern Russia and eastern Ukraine. Before we at GeoCurrents continue with our detailed critique of the Science article itself, we must first examine the media reports on the supposed findings of Atkinson and his colleagues. It is one thing for an unconvincing, error-filled report to appear in an academic journal, and quite another for it to be immediately trumpeted in the major newspapers and magazines as constituting nothing less than a major scientific breakthrough.

The major media typically report on linguistic matters only when the research is “done by biologists, computer scientists, or neurologists, but rarely if ever [do they] provide any reportage on the activities of professional linguists”, as was pointedly noted by a discussant on the LanguageHat blog. Journalists writing for major news outlets, moreover, often both misrepresent the linguistic (or pseudo-linguistic) research they report on and exhibit ignorance of even the most basic linguistic concepts, terminology, and findings. This post highlights the media’s major blunders as well as their more subtle distortions of the research in the article of Bouckaert et al.

The story appears to have been broken by The New York Times, which published a piece by its former science editor Nicholas Wade. The online version of this article appeared on Thursday, August 23, 2012, a day before Bouckaert et al.’s article appeared in Science. Wade summarizes Bouckaert et al.’s research as follows:

“Biologists using tools developed for drawing evolutionary family trees say that they have solved a longstanding problem in archaeology: the origin of the Indo-European family of languages.”

In this first sentence, Wade brushes aside more than 200 years of work in philology and historical linguistics, fields that began to flourish in the late 18th and early 19th centuries owing to path-breaking work on Indo-European languages. Generally, Wade portrays linguistics as only marginally relevant to the issues at hand, and linguists as all believing in a rival theory, one associating speakers of Proto-Indo-European with chariot-driving, kurgan-building** pastoralists, which he presents as patently incorrect. Linguistic arguments for the Kurgan hypothesis, such as the “wheel” problem and the issue of prehistoric linguistic contacts between Proto-Indo-European and Proto-Uralic, are waved aside; GeoCurrents will address these issues in forthcoming posts. Overall, the debate is presented as a duel of name-calling and hand-waving between two camps of non-linguists: Quentin D. Atkinson and his colleagues on the one side and archeologist David W. Anthony on the other. No real arguments from either camp are addressed, making it appear that the two approaches are mutually exclusive, and that only one can ultimately be right. Such a caricature, however, is far from the truth, as several highly respected historical linguists, such as Don Ringe and April McMahon, take into account both the quantitative methods touted by Atkinson and the philological evidence explored by Anthony.

A more fundamental problem is Wade’s failure to address the core of Atkinson and company’s research: how one gets from considering cognates, that is, words of similar meaning in different languages that stem from a common origin in an ancestral language, to placing the homeland of the group that spoke the parent tongue on the map. Atkinson summarizes the import of his team’s study as “explicitly incorporating spatial information” by:

“combining phylogenetic inference with a relaxed random walk (RRW) model of continuous spatial diffusion along the branches of an unknown, yet estimable, phylogeny to jointly infer the Indo-European language phylogeny and the most probable geographic ranges at the root and internal nodes.”

This RRW model and its application to language study, which supposedly enables these researchers “to locate cultural histories in space and time”, is the original contribution of the article to the debate—and its most controversial aspect. Yet it is curiously missed in all the popular media reportage.

Nor is it clear from Wade’s piece why biologists, whose field of study is emphasized in the title, would have anything novel to say about language matters. According to the Science article itself, “a Bayesian phylogeographic inference framework” used to trace Indo-European languages was “developed to investigate the origin of virus outbreaks from molecular sequence data”. This point is highlighted in the title of another popular media report, by yet another British science journalist, Emma Woollacott of TG Daily, entitled “Virus-tracking technique pinpoints origin of languages”. Woollacott’s vaguely formulated title, referring to “languages” rather than “Indo-European languages”, prompted numerous readers to offer bizarre comments about the origin of languages in general, wondering whether Hebrew was the original language, and whether Greek was “the first language of Europe”. Others pointed to Biblical stories of Noah’s Ark and the Tower of Babel, which they noted as having taken place in or near the region where Atkinson places the origin of the Indo-European language family.

On the whole, Woollacott’s article is brief and lacking in detail. Like Wade, she does not explain in what way the methodology successfully used for tracing virus ancestries and origins would be applicable to human languages. As a reader named Joan pointed out, “genetics deals with four element AGCT patterns while word[s] have far more than four elements.” A slew of additional comments note that Woollacott misspelled “Auckland” and “Caspian” in her original article, serious blunders for a piece discussing the geospatial distribution of languages!

But it gets much worse. Consider, for example, a piece produced for the BBC News by Jonathan Ball: “English language ‘originated in Turkey’”. The very notion is so absurd that it took me a while to figure out what he could be referring to and to connect it to the Science article. Even assuming that Bouckaert et al. are correct and the Indo-European language family did originate in Anatolia, the title makes no more sense than “English language originated in Tanzania”. Indeed, English is a descendant of Proto-Indo-European, which may have been spoken in present-day Turkey. But it is equally a descendant of Proto-Human, the putative ancestor of all human languages. Proto-Human, like our species itself, is believed by most scholars to have originated in East Africa. (Atkinson, however, in another Science article, placed the homeland of Proto-Human in West Africa; my objections to that work are summarized in a number of earlier posts, as well as in a peer-reviewed Technical Comment, co-authored with Rory van Tuyl and published in Science on February 10, 2012.)

Despite the poorly worded title and the bad writing more generally (can one “interrogate language evolution”?), Ball at least attempted to spell out such basic linguistic concepts as “language family” and “cognates”. Unfortunately, he also defined these essential terms incorrectly. For example, referring to the Ethnologue, Ball states that “more than 100 language families exist”. This number means nothing; many more than 100 language families can be identified simply because language families nest like Russian matryoshka dolls: smaller language families fit into larger ones, which in turn fit into yet larger ones, and so on. For example, as shown on the chart on the left, Russian is a member of the East Slavic family (together with Ukrainian and Belarusian). The East Slavic family is in turn a member of a larger Slavic family (together with the West Slavic and South Slavic families, which include languages such as Polish and Bulgarian, respectively). The Slavic family is grouped together with the Baltic family (which includes Latvian and Lithuanian) into a larger Balto-Slavic family, which is in turn a member of the Indo-European family. Ball justifies his claim that “the Indo-European family is one of the largest families” by noting that it includes “more than 400 languages spoken in at least 60 countries”. Actually, when it comes to the number of languages, Indo-European is far from the world’s largest family, conceding that title to Niger-Congo. The Austronesian language family is also “larger” than Indo-European in this regard, with about three times as many individual languages. But the Indo-European family is indeed the world’s largest if measured by the number of speakers; approximately 40% of the world’s population speaks an Indo-European language as the mother tongue.
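To make the matryoshka point concrete, here is a minimal Python sketch. It encodes only the groupings named in the paragraph above; the names `TREE`, `count_families`, and `count_languages` are illustrative choices, and the toy tree is of course nowhere near a complete classification. It shows that the answer to “how many families?” depends entirely on which level of nesting one chooses to count.

```python
# Toy subset of the Indo-European tree, limited to groupings named in the text.
# A dict is a family (name -> members); a list holds individual languages.
TREE = {
    "Indo-European": {
        "Balto-Slavic": {
            "Slavic": {
                "East Slavic": ["Russian", "Ukrainian", "Belarusian"],
                "West Slavic": ["Polish"],
                "South Slavic": ["Bulgarian"],
            },
            "Baltic": ["Latvian", "Lithuanian"],
        }
    }
}


def count_families(node):
    """Count every named grouping at any depth (each dict key is one family)."""
    if isinstance(node, list):
        return 0
    return sum(1 + count_families(child) for child in node.values())


def count_languages(node):
    """Count the individual languages at the leaves of the tree."""
    if isinstance(node, list):
        return len(node)
    return sum(count_languages(child) for child in node.values())


top_level = len(TREE)              # counting only the outermost doll
all_levels = count_families(TREE)  # counting every doll, large and small
languages = count_languages(TREE)

print(f"top-level families: {top_level}")
print(f"families at all levels: {all_levels}")
print(f"languages: {languages}")
```

Even in this tiny sample, the same seven languages yield one “family” or seven, depending on where one stops counting, which is why a bare figure such as “more than 100” says little unless the level of grouping is specified.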

More serious is Ball’s deceptive definition of “cognates”, which he describes as “shared words” or “words of similar sound that often describe the same thing”. This explanation misses the essential fact that cognates are by definition words whose similarity of sound and meaning is due to common descent rather than lexical borrowing or sheer accident. And as it so happens, the “similarities” of cognates, both in meaning and sound, can be so obscured by language change as to be undetectable by non-specialists. However, in each case the differences between cognates can be accounted for by traceable semantic and phonological changes. The English word hound and the Italian cane are cognates, even though their meanings are somewhat different and they do not exactly sound the same. Their sound differences can be accounted for by regular phonological changes, such as the First Germanic Sound Shift (also known as Grimm’s Law), which changed the /k/ into /x/ and later into /h/. The meanings of the two words diverged as well, undergoing processes that are known from other cases. In particular, the meaning of the word hound narrowed from a more general ‘dog’ to a certain kind of dog (the same type of meaning change applied also to the word meat, whose older, more general meaning of ‘food’ is still evident from the compound sweetmeats).*** In many instances, cognates are not apparent at all to the naked eye: for example, the English loaf and the Russian xleb ‘bread’ fit the category, as their meanings and especially their pronunciations have diverged significantly. Crucially, not all words that resemble each other in form and meaning are cognates. Words borrowed from one language into another, like the terms for ‘tea’ discussed in an earlier GeoCurrents post, are not cognates in the technical sense. Nor are accidental look-alikes such as the Italian strano and the Russian strannyj, though both mean ‘strange’ and their stems stran- sound much the same. These two words derive from completely different sources: the Italian word comes from the Latin extraneus, meaning ‘external, foreign’, while the Russian word comes from strana, meaning ‘country’ (for longer discussions of these issues, see chapter 2 of my book Languages of the World: An Introduction).

Like most other reporters, Ball presents the issue of Indo-European origins as a stark dichotomy, one in which two scholarly camps struggle for primacy. Such framing may add dramatic effect, but at the cost of accuracy. In fact, the Kurgan theory and the Anatolian theory, around which the Science article pivots, are but two out of no fewer than a dozen diverse proposals for the homeland of the Indo-European language family (many of which are also discussed in detail in my book). These varying hypotheses make the Proto-Indo-European language as old as 12,000 or as young as 5,000 years, and put its place of origin in locales ranging from Northern Europe to India. Some of these proposals, including Colin Renfrew’s Anatolian hypothesis and Marija Gimbutas’ Kurgan theory, were put forward by archeologists trying to square evidence from their discipline with linguistic data. Other proposals come from linguists, such as Tamaz V. Gamkrelidze and Vyacheslav V. Ivanov (the latter teaches at UCLA), who argue for a recent origin of Indo-European languages in the area of the Armenian Highlands and the Lake Urmia region in northwestern Iran, and Johanna Nichols of UC-Berkeley, who placed the Indo-European homeland in ancient Bactria-Sogdiana (what is today northern Afghanistan and adjacent areas in Uzbekistan and Tajikistan).

Finally, much like Nicholas Wade, Jonathan Ball refers to the learned opinions of two scholars, one who gives a “thumbs up” and the other a “thumbs down” verdict for Atkinson’s work. The “pro” vote comes from Professor Mark Pagel, the Head of the Bioinformatics Laboratory at the University of Reading, where Quentin D. Atkinson once worked (a fact unmentioned by Ball). Pagel lists “linguistic evolution” last on his list of research interests and has no linguistic credentials, as far as I can tell. The “con” vote is from Professor Petri Kallio of the University of Helsinki, a trained Indo-European linguist, whose doctoral thesis focused on the early relations between Indo-European and Uralic, a topic that provides one of the strongest arguments for the Kurgan hypothesis, as we shall see in a forthcoming GeoCurrents post.

While Woollacott’s headline is too vague and Ball’s is misleading, the worst of the worst is the title of the article penned by Alyssa Joyce for Scientific American: “Disease Maps Pinpoint Origin of Indo-European Languages”. Besides the headline being a so-called “garden-path sentence” (where “maps” can be initially understood as a verb), both the title and the next line, “Turkey might be the geographic origin of languages from English to Hindi, according to epidemiological tracking techniques”, make it sound like language is a disease!

English, of course, originated in England rather than in Turkey, and virus-tracking techniques are best left to virus tracking. Sensationalist quasi-scientific reporting in major media once again confirms that a little knowledge can be a dangerous thing!


*As is typical of scientific articles with multiple authors, the name of the project’s head appears last in the list. Confusingly, the article is then referred to by the first author’s last name et al. (in this case as “Bouckaert et al.”).

**“Kurgan” is a Turkic word denoting a tumulus, or burial mound; such mounds are widespread on the Pontic Steppes.

***The meaning change in the case of hound was exploited nicely by the creators of the BBC TV series Sherlock, who seem to be more linguistically savvy than the BBC Science News journalists.


  • Alfia Wallace

    As if obtuse statistical methodology weren’t bad enough. This looks like people invested in proving that the spread of agriculture into Europe had to do with an IE expansion. In skimming over the methodology it’s clear that a lot of assumptions were made. I’d be interested in what you think about it. You don’t have to be able to evaluate the statistics in order to evaluate what they chose as the bases for their “linguistic method” (positing rates of cognate creation and death, etc.)

    • Asya Pereltsvaig

      I am going to address those very questions in the next three posts, Friday, Monday and Tuesday. Let’s just say that my evaluation of what they chose as the basis for their linguistic method (pseudo-linguistic, if you ask me!) is not a very positive one. Why? Stay tuned!

      • Alfia Wallace

        Looking forward. :-)

        • Andrew Zolnai

          Great minds think alike. My later comment refers to work and theories on just that expansion (and ideas about patriarchy usurping matriarchy, but I digress).

          • Martin W. Lewis

            Thanks for bringing up this important issue, which I will address in a later post (or posts).

  • Andrew Zolnai

    Great post as usual, one simply couldn’t make this up if we tried! I once helped with a seminar in Calgary CDN to help journalists report facts straight… it seems that sort of thing is still needed! Nice to hear about Marija Gimbutas again, she’s my all-time hero, whose research led to Riane Eisler writing (note the distinction).

    • Asya Pereltsvaig

      Thanks, Andrew! Basic training for linguists is still very much needed, I am afraid: whenever I read on anything to do with language matters, I don’t know whether to laugh (it’s so ridiculous!) or to cry. And there will be more on Gimbutas’ theory in the following posts, so stay tuned!

      • James T. Wilson

        I have to admit that, back when I was a kid interested in linguistics, I read a number of articles by Marija Gimbutas and then I read one of those articles that seemed to argue that early peoples properly listened to their goddess-worshiping mommies until those bad, old Jews started up their patriarchal sky-father worship. After that, I avoided anything else that had her name on it.

        • Asya Pereltsvaig

          I was speaking strictly of the kurgan theory, not the feminist angle. But you must be confusing something here, as the “bad, old” guys with “patriarchal sky-father worship” were Indo-Europeans from the southern Russian steppes, not Jews. Jews are from the wrong time and place. Unless one believes that Jews are at fault for everything! :)

          • James T. Wilson

            You know, I was doing something really unforgivable here, confusing Ms. Gimbutas with later, far less sophisticated, critics of Judeo-Christian patriarchy, who referred to her works. A bit like dismissing Nietzsche or Wagner (two artists I just love) because of a later admirer. When I have time, I may have to give some of her more purely linguistic work another read.

          • Asya Pereltsvaig

            Good points, James! Though it’s not the later admirer that makes me dismiss Wagner, but his “Das Judenthum in der Musik”.

        • Martin W. Lewis

          This is an important point, although as Asya mentions below, the militant, patriarchal people in question were presumed to be “Indo-Europeans,” not Jews. I am currently working on a “why it all matters” post that will address such issues.

  • Lina

    I dislike it when I cannot read articles due to being unable to resize the text. I’ll have to assume that this article offers no useful information at all.

    • Kevin Morton

      Lina, for the moment try the Print/PDF button at the top of each article on this site. You can select whatever font size you want, with the added benefit of making more accurate assumptions.

      • Asya Pereltsvaig

        Thanks, Kevin!

        Asya Pereltsvaig
        Check out our response to the recent article in Science by Atkinson et al.!

      • Lina

        Print/PDF should not be the only method of providing readable text, especially on a site whose intent is to inform. This might as well be #150517 text on #000000 background.

        • Kevin Morton

          Alright Lina, it’s on the top of the list of updates. Print/PDF is an alternative for the interim if you like.

          I’m not sure if our other readers would like black-on-black text so much though..

          • Lina

            I love placating statements that offer no information. When exactly do you plan to change your site’s style sheet? Will you share it with us on reddit? You only post information there from, or on threads with links shared from, so it would be relevant to your limited post style.

          • Asya Pereltsvaig

            Lina, which browser are you using? I use Firefox and there’s no problem enlarging the font…

          • Dale H. (Day) Brown

            Oh for the FIDO BBS days. There, I could have Asya’s words in red, mine in green, Andrew’s in amber, James in blue…. and each poster had the same 80 columns rather than constantly narrower columns. Nobody ever got confused over who said what, because when you quoted, their colored words showed in your post.

        • James T. Wilson

          I have no clue about technical matters, but when I am using Google Chrome, I simply put two fingers on the touchpad of my laptop, spread them apart, and the characters can be made as large as I like. I often find myself doing that with maps, instead of opening them in another tab. Not to denigrate your point, which I am sure may be quite valid to those who are offended by breaches of computer style. Split infinitives drive me to tears, so I really have no standing to criticize.

          • Andrew Zolnai

            I think also that modern browsers will increase or decrease the size of screen content, both text and images, when you press “CTRL +” and “CTRL -”.

  • Sankar

    You forgot to mention the Indian popular media’s endorsement of the right-wing Hindu view that everything originated in India….

    • Asya Pereltsvaig

      I didn’t so much forget as I was focusing on major US and UK media. But you are absolutely right in that the out-of-India theory of IE is getting traction only in political and journalistic circles, while it doesn’t have any serious proponents in the academic world.

  • Dale H. (Day) Brown

    Many debunk Gimbutas as if that solves the problem; she notes all the lakes up there, having rejected the Black Sea because there are no terms for marine environment. Had she known of the 5600 BC flood of the then fresh water Euxine basin, she would’ve fit the timber frame construction and artefacts to otherwise obvious Cucuteni sites.

    Why does anyone waste time on all these other crackpot theories, or keep calling it “Indo-European” when it was never spoken on the Indus?

    • Asya Pereltsvaig

      The term “Indo-European” refers to the modern family, not the ancient tongue. Today, languages in the IE family are spoken from Europe to India, hence the name. The common ancestor is Proto-Indo-European (PIE), as proto-X is the typical terminology linguists use for shared ancestors of language families.