On Mathematical Modeling and Inter-Disciplinary Work in Historical Linguistics: A Reply to Alexei Drummond—and a Friendly Critique of the Field

We would like to thank everyone who has posted comments on our recent posts on Indo-European linguistics, whether favorable or critical. As we have been highly critical ourselves, we can only expect the same in return; such is the give-and-take of the scholarly endeavor. We will post detailed replies to critical comments next week, after Asya Pereltsvaig returns from her travels. The present post responds only to the first comment posted on GeoCurrents by one of the co-authors of the Science article that we have taken to task. In that response, Alexei Drummond takes on some significant epistemological and methodological issues that demand a considered answer. As Drummond argues:

Personally I would love to include more direct evidence-based information into the computational analysis to correct the details (and see if that changes the main inference of the location of the origin), but that would require the linguists and archaeologists to actually embrace the value of computer models to synthesize large amounts of data. How can a human mind, however elegantly expressed its written conclusions, correctly balance the thousands of items of evidence to provide a probabilistic statement about history in a way that others can verify (i.e. The Horse, the Wheel and Language)? What is good about our approach is that the simplifying assumptions are clearly stated and can be improved upon in subsequent analyses. I just wish that the historical linguistics crowd would try *constructive* rather than destructive criticism for a change. We want what you want: to determine what happened. So as we are all scientists, we should work towards common ground, shouldn’t we?

Try as we might, we find little to disagree with in this eloquent appeal for the use of computational techniques and interdisciplinary research. As Asya Pereltsvaig has emphasized, we respect the work of linguists who use such methods in their own research. We advance no objections to computational methods per se, but rather to this specific application. Successful modeling cannot rest on unsubstantiated and most likely false assumptions about language spread and diversification, cannot disdain verification efforts, cannot be inherently unfalsifiable, and cannot be consistently contradicted by the empirical record. Drummond is surely right that well-crafted mathematical models can be continually adjusted to better fit the reality that they seek to represent, but only if they rest on a solid foundation. Certainly the model under consideration could be sharpened, as has been suggested by another co-author, by incorporating elements of physical geography beyond the water/land dichotomy; such an improvement could weed out such blunders as having the Tocharians advance along 20,000-foot ridges while bypassing their eventual home in the Tarim basin. But as long as the model rests on the untenable assumption that languages spread through a contagion-like process and diverge in speciation-like events, the result will still be of little value. Subsequent posts will examine how languages do spread and change. As we shall see, such linguistic processes are vastly more complex than the scenarios posited by the Science team. That does not mean that they cannot be mathematically modeled, only that any such efforts will have to be much more involved than what we have seen thus far.

We therefore hope that Alexei Drummond will continue to apply his formidable skills to the problems of language spread and diversification. We also hope that in the future he can collaborate not merely with other modelers, scholars whose skill sets overlap to a great extent, but also with experts with complementary skills and frameworks of knowledge. In particular, such work must be done with a bona fide Indo-Europeanist; collaborators with proficiency in world history, geography, and linguistics more generally would also prove highly beneficial.

Although it is easy for us to dish out such advice, it would probably prove much more difficult for anyone to take it. As Drummond notes, it seems likely that many if not most historical linguists would rebuff any such invitations for collaboration. Here it becomes necessary for us to reverse our critical attention and apply it to historical linguistics itself. Although this series of posts seeks to vindicate the field, we are convinced that a successful defense of any beleaguered intellectual enterprise demands a self-critical* eye.

Historical linguistics is currently in crisis not only because of unsubstantiated attacks or the failure of others to appreciate its intellectual achievements; it is also languishing because its practitioners have failed to meet the challenges that they face. All told, they have remained too insular and too comfortable with their own research paradigms. Emphasizing, like good scientists, the narrow acquisition of knowledge along established research fronts, few members of the guild have been willing to stand back and address the larger implications of their own work for the study of human pre-history (and history), let alone offer edification for a general audience. By the same token, few historical linguists have collaborated extensively with scholars in other disciplines. It is no accident that the three best-known scholars in the debate on Indo-European origins are (or were) all archeologists: Marija Gimbutas, Colin Renfrew, and David Anthony.

Historical linguists might reply that progress in linguistic research demands tightly focused inquiry and highly specialized disciplinary techniques, and that the field would thereby gain little through interdisciplinary collaboration. Such arguments make sense when applied to specific issues, but collapse when it comes to broader matters, such as the origin of the Indo-European family, which is as much a matter of history and geography as it is of linguistics. And regardless of whatever intellectual arguments can be made for highly focused specialization, pragmatic considerations call for a different approach; it is a fact that historical linguistics is a diminishing field that has been unable to fend off mass-media celebrations of encroachments on its own terrain. If their field is to survive, historical linguists must realize that they can no longer be satisfied merely by communicating with each other. They not only must engage more with other scholars, but they must also reach out to the educated public.

Our charge is perhaps not as difficult as it might seem. The public is deeply interested in such issues, as attested by the articles in the popular press on the Bouckaert et al. paper. Asya and I have discovered the same interest while teaching on the intersection of linguistics, history, and geography in Stanford University’s Continuing Studies (adult education) Program, where our classes are consistently among the most popular offerings. Although we would like to think that our teaching skills have something to do with our enrollment numbers, we realize that they stem largely from demand for instruction on a topic that many people find intrinsically fascinating. Next winter, we will be teaching a class specifically on the geo-history of the world’s major language families. But in looking for a text that draws together the major issues within a single, comprehensible framework, we find ourselves frustrated. The best work that we have located thus far is a 1994 Scientific American article entitled, “World Linguistic Diversity,” by none other than archeologist Colin Renfrew. It is unfortunately short and somewhat dated, and it is almost certainly wrong on such major issues as the origin of Indo-European and the existence of Altaic. We do find it odd, and rather sad, that no comparable work has, to our knowledge, been produced by a historical linguist.

* “Self-criticism” is not the best term here, as neither of us is a historical linguist. I am a historical geographer and Asya Pereltsvaig is a linguist who specializes in syntax. What we thus offer is perhaps best described as “friendly criticism.”



Mismodeling Indo-European Origin and Expansion: Bouckaert, Atkinson, Wade and the Assault on Historical Linguistics

Dear Readers,

As GeoCurrents passed through its August slowdown, plans were made for a series on the Summer Olympics. Thanks to the efforts of Chris Kremer, we have gathered statistics, and made maps, relating Olympic medal count by country to population and GDP, both overall and in regard to specific categories of competition. The series, however, has been put on hold by the recent publication of two heralded articles on the history and geography of the Indo-European language family. On August 24, a short piece in Science, “Mapping the Origins and Expansion of the Indo-European Language Family,” made extravagant claims, purporting to overturn the most influential historical-linguistic account of the world’s most widespread language family. On the same day, Nicholas Wade, noted New York Times science reporter, wrote a half-page spread in the news section of the Times on the Science report, entitled “Family Tree of Languages Has Roots in Anatolia, Biologists Say.” Over the next few days, the story was picked up (and often twisted in the process) by assorted journalists. Before long, headlines as preposterous as “English Language Originated in Turkey” had appeared.

As Wade’s title indicates, the Science article, written by Remco Bouckaert and eight others (most notably Quentin D. Atkinson), seeks to overturn the thesis that the Indo-European (I-E) family originated north of the Black and Caspian seas. It instead locates the I-E heartland in what is now Turkey, supporting the “Anatolian” thesis advanced a generation ago by archeologist Colin Renfrew. The Science team bases its claims on mathematical grounds, using techniques derived from evolutionary biology and epidemiology to draw linguistic family trees and model the geographical spread of language groups. According to Wade, the authors claim that their study does nothing less than “solve” a “long-standing problem in archaeology: the origin of the Indo-European family of languages.” (Strictly speaking, however, the problem is not an archaeological one, as excavations by themselves tell us nothing about the languages of non-literate peoples; it is rather a linguistic problem with major bearing on prehistory more generally.)

As GeoCurrents is deeply interested in the intersection of language, geography, and history, the two articles immediately grabbed our attention. Our initial response was one of profound skepticism, as it hardly seemed likely that a single mathematical study could “solve” one of the most carefully examined conundrums of the distant human past. Recent work in both linguistics and archeology, moreover, has tended against the Anatolian hypothesis, placing Indo-European origins in the steppe and parkland zone of what is now Ukraine, southwest Russia, and environs. The massive literature on the subject was exhaustively weighed as recently as 2007 by David W. Anthony in his magisterial study, The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Could such a brief article as that of Bouckaert et al. really overturn Anthony’s profound syntheses so easily?

The more we examined the articles in question, the more our reservations deepened. In the Science piece, the painstaking work of generations of historical linguists who have rigorously examined Indo-European origins and expansion is shrugged off as if it were of no account, even though the study itself rests entirely on the taken-for-granted work of linguists in establishing relations among languages based on words of common descent (cognates). In Wade’s New York Times article, contending accounts and lines of evidence are mentioned, but in a casual and slipshod manner. More problematic are the graphics offered by Bouckaert and company. The linguistic family trees generated by their model are clearly wrong, as we shall see in forthcoming posts. And on the website that accompanies the article, an animated map (“movie,” according to its creators) of Indo-European expansion is so error-riddled as to be amusing, and the conventional map on the same site is almost as bad. Mathematically intricate though it may be, the model employed by the authors nonetheless churns out demonstrably false information.

Failing the most basic tests of verification, the Bouckaert article typifies the kind of undue reductionism that sometimes gives scientific excursions into human history and behavior a bad name, based on the belief that a few key concepts linked to clever techniques can allow one to side-step complexity, promising mathematically elegant short-cuts to knowledge. While purporting to offer a truly scientific* approach, Bouckaert et al. actually forward an example of scientism, or the inappropriate and overweening application of specific scientific techniques to problems that lie beyond their own purview.

The Science article stakes its claim to scientific standing in a straightforward but unconvincing manner. The authors claim that as two theories of Indo-European (I-E) origin vie for acceptance, a geo-mathematical analysis based on established linguistic and historical data can show which one is correct. Actually, many theories of I-E origin have been proposed over the years, most of which, including the Anatolian hypothesis, have been rejected by most specialists on empirical grounds. Establishing the firm numerical base necessary for an all-encompassing mathematical analysis of splitting and spreading languages is, moreover, all but impossible. The list of basic cognates found among Indo-European languages is not settled, nor is the actual enumeration of separate I-E languages, and the timing of the branching of the linguistic tree remains controversial as well. As a result of such uncertainties, errors can easily accumulate and compound, undermining the approach.

The scientific failings of the Bouckaert et al. article, however, go much deeper than mere data uncertainty. The study rests on unexamined postulates about language spread, assuming that the process works through simple spatial diffusion in much the same way as a virus spreads from organism to organism. Such a hypothesis is intriguing, but must be regarded as a proposition rather than a given, as it does not rest on a foundation of evidence. The scientific method calls for all such assumptions to be put to the test. One can easily do so in this instance. One could, for example, mathematically model the hypothesized diffusion of Indo-European languages for historical periods in which we have firm linguistic-geographical information to see if the predicted patterns conform to those of the real world. If they do not, one could only conclude that the approach fails. Such failure could stem either from the fact that the data used are too incomplete and compromised to be of value (garbage in/garbage out), or from a more general collapse of the diffusional model. Either possibility would invalidate the Science article.
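The kind of test described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure used by Bouckaert et al. or by anyone else: the function name `check_hindcast`, the `HindcastResult` type, and the region labels are all hypothetical placeholders, and the toy data stand in for real model output and real linguistic-geographical records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HindcastResult:
    missed: frozenset    # attested regions the model fails to predict
    spurious: frozenset  # predicted regions with no attested speakers
    recall: float        # share of attested regions the model recovers


def check_hindcast(predicted, attested):
    """Compare a model's predicted language range with the attested range
    for a period in which the real distribution is well documented."""
    predicted = frozenset(predicted)
    attested = frozenset(attested)
    if not attested:
        raise ValueError("need at least one attested region to test against")
    return HindcastResult(
        missed=attested - predicted,
        spurious=predicted - attested,
        recall=len(attested & predicted) / len(attested),
    )


# Hypothetical toy data: placeholder region labels, not real results.
attested_regions = {"region_A", "region_B", "region_C", "region_D", "region_E"}
predicted_regions = {"region_C", "region_D", "region_E", "region_F"}

result = check_hindcast(predicted_regions, attested_regions)
print(sorted(result.missed), sorted(result.spurious), result.recall)
```

A model that leaves large, well-documented regions in the `missed` set for a recent period gives little reason to trust its output for periods in which nothing can be checked.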

Such a study, it turns out, has been conducted, and by none other than Bouckaert et al. in the Science article in question. Their model not only looks back 8,500 years into the past, when the locations and relations of language families are only conjectured, but also comes up to the near present (1974), when such matters are well known. Here a single glance at their maps reveals the failure of their entire project, as they depict eastern Ukraine and almost all of Russia as never having been occupied by Indo-European speakers. Are we to believe that Russian and Ukrainian are not I-E languages? Or perhaps that Russian and Ukrainian speakers do not actually live in Russia and Ukraine? By the same token, are we to conclude that the Scythian languages of antiquity were not I-E? Or perhaps that the Scythians did not actually live in Scythia? And these are by no means the only instances of the study invalidating itself, as we shall soon demonstrate. An honest scientific report would have admitted as much, yet that of Bouckaert et al. instead trumpets its own success. How could that possibly be?

One can only speculate as to why the authors proved incapable of noting the failure of their model to mirror reality. Did they neglect to look at their own maps, trusting that the underlying equations were so powerful that they would automatically deliver? Could their faith in their model trump their concern for empirical evidence? Or could it be that their knowledge of linguistic geography is so scanty that they do not grasp the distribution of the Russian language, much less that of Scythian? If so, they are not operating at an acceptable undergraduate level of geo-historical knowledge. Alternatively, the authors might be aware that their model generates nonsense, but prefer to pretend otherwise, hoping to buffalo the broader scholarly community. They seem, after all, to conceal their approach as much as possible, couching their “findings” in jargon-ridden prose that proves a challenge not just for lay readers but also for specialists in neighboring subfields. (Translations of such passages as “Contours on the map represent the 95% highest posterior density distribution for the range of Indo-European” will be forthcoming.)

Regardless of whether the authors are intentionally trying to mislead the public or have simply succeeded in fooling themselves, their work approaches scientific malpractice. Science ultimately demands empirical verification, and here the project fails miserably. If generating scads of false information does not falsify the model, what possibly could? Non-falsifiable claims are, of course, non-scientific claims. The end result is a grotesquely rationalistic and hence ultimately irrational approach to the human past. As such, examining the claims made by the Science team becomes an example of what my colleagues Robert Proctor and Londa Schiebinger have aptly deemed “agnotology,” or “the study of culturally induced ignorance or doubt, particularly the publication of inaccurate or misleading scientific data.”

As the critique we offer is harsh and encompassing, GeoCurrents will devote a number of posts to examining in detail the claims made and techniques employed by Bouckaert, Atkinson, and their colleagues. But before delving into the nitty-gritty, a few words are in order about what ultimately lies at stake. We are exercised about the Science article not merely because of our passion for the seemingly esoteric issue of Indo-European origins, but also because we fear for the future of historical linguistics—and history more generally. The Bouckaert study, coupled with the mass-media celebration of the misinformation that it presents, constitutes an assault on a field that has generated an extraordinary body of rigorously derived information about the human past. Such an attack occurs at an unfortunate moment, as historical linguistics is already in crisis. Linguistics departments have been cutting positions in historical inquiry for some time, creating an environment in which even the best young scholars in the field are often unable to obtain academic positions.

The devaluation of historical linguistics is merely one aspect of a much larger shift away from the study of the past. Subdisciplines such as historical geography and historical sociology have been diminishing for decades, and even the discipline of history faces declining enrollments and reduced faculty slots. Academic history itself, moreover, has been progressively shying away from the deeper reaches of the human past to focus on modern if not recent historical processes. Such developments do not bode well for the maintenance of an educated public. At the risk of descending into hyperbole, we do worry about the emergence of something approaching institutionally produced societal dementia. The past matters, and we care deeply for the preservation of its study.

*Make no mistake: we at GeoCurrents are strong supporters of the scientific method. Linguistics is itself a logically constituted, rigorous endeavor that counts as a science in the larger sense of the word, and I have myself co-edited a work defending science and reason against eco-radical and other far-left attacks (The Flight from Science and Reason, edited by Paul R. Gross, Norman Levitt, and Martin W. Lewis. 1997. New York Academy of Sciences).

