Indo-European Origins

Can language spread be modeled using computational techniques designed to trace the diffusion of viruses? As recently announced in the New York Times, a team of biologists claims to have solved one of the major riddles of human prehistory, the origins of the Indo-European language family, by applying methodologies from epidemiology. In actuality, this research, published in Science, does nothing of the kind. As this series of articles shows, the assumptions on which it rests are demonstrably false, the data that it uses are woefully incomplete and biased, and the model that it employs generates error at every turn, undermining the knowledge generated by more than two centuries of research in historical linguistics and threatening our understanding of the human past.

Our video presentation of this series’ main points can be found at: http://youtu.be/4jHsy4xeuoQ

103 Errors in Mapping Indo-European Languages in Bouckaert et al., Part I

As our criticisms of Bouckaert et al. have been extremely harsh, we must justify them in some detail. I have accused the authors of erring “at every turn,” a charge that reeks of hyperbole. But even if that claim is exaggerated, it is still not too far from the mark. To demonstrate the extraordinary density of error in the Science article, the next few posts will dissect the authors’ base map of Indo-European languages (Figure S6 in their Supplementary Materials). This map, depicting the distribution of both modern and ancient Indo-European languages, forms a key input for their “explicit geographic model of language expansion” (Bouckaert et al., p. 957), as the locations of the sampled languages shown on this map are fed into the model in order to calculate the location of the PIE homeland. Many of the errors and inconsistencies found on their other maps stem from mistakes made in this initial figure.

The map in question shows the location of the 103 Indo-European languages analyzed. The brief caption notes that “colored polygons represent the geographic area assigned to each language based on Ethnologue.” This assertion is misleading at best. The Ethnologue does not consistently map modern languages, and it pays little attention to long-extinct ones such as Hittite. And where the Ethnologue does map, it typically does so in vastly greater detail than Bouckaert et al. Compare, for example, how the two sources depict the languages of what is now southern and central Pakistan in the paired figures to the left.

Regardless of the source (or sources) used, the map is highly inaccurate. To illustrate the cavalcade of error found in Bouckaert et al., I have isolated 103 miscues, some admittedly rather minor, but others highly significant. As recounting all of them would be tedious, I will simply note them in call-outs on expanded details from their “master map.” I have prepared twelve such enlarged maps, each focusing on a different part of the historically Indo-European-speaking world. I will post these maps sequentially over the next few days, discussing in the accompanying posts some of their more egregious errors. Today’s post will conclude with a consideration of South Asia; subsequent ones will move in a westward direction, terminating in the British Isles.

Before examining the portrayal of the Indian Subcontinent in Bouckaert et al., a few words are in order about their general approach to mapping. Analyzing their base-map is no easy matter, as they do not follow conventional cartographic procedures. Their all-important polygons are often impossible to trace, obscured by the large, numbered circles used to label the 103 languages. Another perceptual problem stems from their use of overlays, with multiple extinct languages (in red) layered upon extant languages (in blue). The resulting color blends yield confusing intermediate shades. Note on the detail posted to the left the depictions of Luvian, Hittite, Classical Armenian, Kurdish, and modern Armenian. Determining which language is indicated in which places takes some patience.

A more intractable problem concerns the map’s temporal framing. The short explanation provided in the caption makes the issue seem simple: “Red areas indicate ancient languages and blue areas indicate modern languages.” Left unanswered is the time frame of “linguistic modernity.” In some places, the term is defined broadly, extending back hundreds of years. Cornwall, for example, is shown as inhabited by speakers of modern Cornish. Such a view is anachronistic, as Cornish had disappeared from most of the peninsula by 1700, and was essentially extinct before the modern revival movement began in the 20th century. (Today Cornish is estimated to have only “a few” native speakers.) Elsewhere, the mapping of “modern languages” refers to the late 20th century. The German zone, for example, fits only the post-WWII period, after millions of German speakers had been expelled from Pomerania, Silesia, and Sudetenland. The map, to put it simply, plays fast and loose with time and space.

Even more problematic is the mapping of many languages on the basis of political rather than linguistic features. As was noted in an earlier post, all of the maps used in the study show signs of what I called “geopolitical contamination,” in which the boundaries of modern-day states incorrectly determine those of language groups, following Max Weinreich’s dictum that “a language is a dialect with an army and navy.” I was puzzled, for example, by the fact that Moldova was placed outside of the Indo-European realm in Figure S4, showcased on Quentin Atkinson’s website. The reason is readily apparent when one considers the map of the 103 language polygons (Figure S6). Here Romanian is depicted as almost exactly coincident with Romania. Moldova is fully excluded from this realm, even though the official “Moldovan Language” is differentiated from Romanian solely on political grounds. One can indeed identify a Moldovan subdialect of Romanian, but it spans the Romanian-Moldovan border. Moldova should thus have been placed within the Romanian polygon, yet it is instead depicted in the same manner as Hungary, giving the impression that it lies outside the Indo-European realm. The consequences of such a strategy are troubling for the contemporary world, but become positively pernicious when retroactively extended into the past, which is precisely what the Bouckaert model does. As a result, almost all of Moldova is ludicrously mapped as most likely never having been occupied by Indo-European speakers in Figure S4.

Such geopolitical contamination is clearly evident in the depiction of the languages of South Asia, posted here. Note that Bengali, often regarded as the world’s sixth most widely spoken language, is essentially limited to Bangladesh, its 80+ million speakers in the Indian state of West Bengal written out of the linguistic community. Even more unreasonably, Vedic Sanskrit is given the polygon of a modern political unit. The supposed territory of this ancient language is outlined and shaded in red in the map posted here. This area, it turns out, precisely fits the territorial extent of Punjab before it was partitioned by the British. That colonial-era Punjab would have no bearing on the distribution of Vedic Sanskrit, spoken some 3,000 years ago, should go without saying. It is also worth noting that the former Punjab included what is now the Indian Himalayan state of Himachal Pradesh, which features peaks 22,000 feet above sea level. It is safe to assume that such areas were never part of the Vedic Sanskrit realm.

Mapping Vedic Sanskrit is no easy task, but that is no excuse for using a modern geopolitical proxy. Careful studies show that the world of the Rig Veda was largely limited to what are now the Indian state and the Pakistani province of Punjab, along with the Vale of Peshawar and Swat Valley. “Vedic India” in the larger sense extended from this region down the Ganges Valley through Bihar and southward to encompass Gujarat, as can be seen in the second map posted here. Either of these two areas could easily have been used for the Vedic Sanskrit polygon.

I will not comment further on the remaining errors and infelicities in the Bouckaert et al. portrayal of South Asia, as a number of them are noted on the map itself. I have also posted a fine Wikipedia map of the current distribution of the Indo-European languages of South Asia for comparative purposes. (Note that this Wikipedia map lumps a number of disparate dialects into single languages, such as Bihari.)

As we shall see in forthcoming posts, similar errors litter all other portions of the original language map employed by Bouckaert et al. As a result, it is difficult to avoid the conclusion that the authors simply do not have the level of geo-linguistic comprehension necessary for carrying out their task. I have taught the geography of modern languages at leading universities for twenty-five years, and I can peg the level of understanding demonstrated by students fairly accurately. That of Bouckaert et al. would clearly fall into the “B” range. Given the unfortunate realities of grade inflation, that means that more than half of my undergraduate students finish their terms with a better understanding of the distribution of languages than the authors of a supposedly path-breaking article on the origin and spread of the world’s largest language family published in one of the world’s leading scientific journals.

The Misleading and Inconsistent Language Selection in Bouckaert et al.

To successfully model the spread and divergence of a language family, one must select languages for one’s data set in a comprehensive, balanced, and consistent manner. Results will be skewed if large numbers of languages are excluded from analysis, if some regions and linguistic branches are covered much more thoroughly than others, or if both dialects and languages are selected based on different criteria in different parts of the world. Bouckaert et al., unfortunately, do all of this and more. The authors favor certain areas and linguistic sub-families, minimizing others. Biases relating to preservation and examination seem to guide most such decisions. Most extinct Indo-European languages that are well documented, such as Old English and Old Norse, are included in the analysis, whereas those that are poorly known, such as all of the Scythian languages of the hypothesized proto-Indo-European homeland in the Pontic Steppes, are simply ignored. Likewise, living languages that have been intensively studied get preference over those that have not received similar scrutiny. Selecting and ignoring languages in such a manner may be convenient for formal modeling, but deep and systematic distortions result.

One of the more vexing issues in linguistics is the differentiation of languages from dialects. As in biological taxonomy, “lumpers” argue endlessly with “splitters.” Which position one accepts is immaterial for formal analysis, but one must maintain consistency. Bouckaert et al., however, shift wildly from fine splitting to gross lumping. Their treatment of Albanian exemplifies the former approach, as they divide it into four separate languages (listed as Albanian C, Albanian K, Albanian G, and Albanian Top). Albanian is indeed divided into Gheg and Tosk, which can easily count as separate languages, but no other dialects approach such status in most divisional schemes. The split-happy Ethnologue, however, does count two minor Albanian dialects in Italy and Greece (linguistically indistinct from Tosk in Albania) as separate languages, an approach that Bouckaert et al. chose to follow. In several other parts of Europe they adopt a similar method, classifying Breton as three separate languages, Sardinian as three, and the minor Slavic tongue of Lusatian (also known as Upper Sorbian) as two. But elsewhere in Europe they reject such fine divisions. They take Serbo-Croatian, for example, as a single language, yet oddly give it the ISO code for its Bosnian dialect [BOS]. They also regard German as one tongue; if they had remained consistent and followed the Ethnologue here, they would have included such languages as Bavarian, Mainfränkisch (East Franconian), Pfalzisch, Upper Saxon, and Swabian. In South Asia and the Iranian zone, the authors’ “lumping” tendency reaches an extreme. They count Hindi as a single language despite its pronounced dialectal variation (even Wikipedia discusses the “Hindi languages”). They do the same with Lahnda, a dialect continuum that encompasses, according to the Ethnologue, eight separate languages.

Bigger problems for Bouckaert et al. are encountered in their basic enumeration of the Indo-European languages of Asia. Whereas the comprehensive Wikipedia family tree for the Iranian branch of Indo-European includes more than fifty extant languages, the selective approach of Bouckaert et al. considers only nine. The authors are even more remiss when it comes to the Indo-Aryan languages of northern South Asia. Punjabi, widely regarded as the world’s tenth most widely spoken language with more than 100 million speakers*, is nowhere to be seen. Whereas the authors list only fifteen extant I-E languages in South Asia, the Ethnologue counts more than 200. A few of the major Indo-Aryan languages discounted by Bouckaert et al. include Rajasthani (20 million** speakers), Bhili (1.5 million), Sylheti (10 million), Garhwali (3 million), Kutchi (2 million), Awadhi (38 million), Kannauji (6 million), and Bhojpuri (38 million). Yet in one part of the region, they abruptly switch to an idiosyncratic splitting approach, differentiating the Waziri dialect from Pashto, which they oddly call “Afghan.” The major split in this language, the north/south divide between “Pashto” and “Pakhto,” however, remains invisible.
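The scale of the undersampling can be made concrete with a little arithmetic. The sketch below uses only the counts quoted above; because the catalogues list "more than" fifty and "more than" 200 languages, the resulting shares are upper bounds on coverage, not exact figures. The function name is purely illustrative.

```python
# Counts quoted in the text: (languages sampled, languages catalogued at least).
# The catalogue figures are lower bounds, so real coverage is lower still.
counts = {
    "Iranian, extant (Wikipedia tree)": (9, 50),
    "South Asian I-E, extant (Ethnologue)": (15, 200),
}


def coverage_upper_bound(sampled, catalogued_at_least):
    """Largest share of the catalogued languages that the sample could cover."""
    return sampled / catalogued_at_least


for branch, (sampled, total) in counts.items():
    share = coverage_upper_bound(sampled, total)
    print(f"{branch}: at most {share:.1%} of catalogued languages sampled")
```

On these figures the sample covers at most 18 percent of the catalogued Iranian languages and at most 7.5 percent of the Indo-European languages of South Asia.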

By including European I-E languages much more readily than non-European ones, the authors evince a form of Eurocentrism. The same tendency is encountered in their treatment of extinct languages. For western and central Europe, nine dead languages are listed, including Old Irish, Old High German, Old English, and Old Prussian. Fair enough. But for northern South Asia, an area of roughly similar territorial extent and historical population levels, only Vedic Sanskrit makes the list. The many extinct Prakrit languages are excluded without reason. Here preservation bias cannot be the culprit, as a number of these languages are relatively well known. Even Pali, a semi-living language owing to its liturgical position in the Theravada Buddhist community, is inexplicably left off the map.

The Bouckaert model stumbles even more sharply in regard to extinct Iranian languages. Only two are included: Old Persian and Avestan. Major Eastern Iranian languages that were once important literary vehicles, such as Sogdian, Bactrian, Khotanese and Khwarezmian, are simply disregarded. So too are the less well-known Scythian languages of the steppe zone.*** As noted in previous posts, had the Scythian languages been included in the model, the geographical patterns generated would likely have been quite different. Although one could argue that the Scythian languages are not known well enough to have been used, such an argument amounts to an admission that preservation bias compromises the approach. The failure to include well-known Sogdian, on the other hand, cannot be attributed to preservation bias, and is perhaps rooted instead in carelessness, ignorance, or the simple desire to mold the data in order to reach pre-established conclusions.

As the supplementary materials make clear, the authors of the study are fully aware that they have excluded a number of Indo-European languages, both living and dead. Yet in an interview with Isabelle Boni for the general public, co-author Quentin Atkinson maintains that “we compare these words across all Indo-European languages” (emphasis added). Such a statement is careless and misleading at best.

*Admittedly, Western Punjabi is sometimes counted as one of the Lahnda languages, but not Eastern Punjabi.

** The 20 million figure used here assumes that Marwari is counted as a separate language, as it is in Bouckaert et al.

***It is also notable that the Indo-European Thracian language(s), along with the other Paleo-Balkan languages, are likewise ignored.

On Pathological Rationalism and Other Epistemological Issues in the Indo-European Controversy

Epistemological issues lie near the core of the Indo-European controversy. How we acquire knowledge of the world is, in turn, one of the abiding issues of philosophy. At the risk of gross simplification, most claims to knowledge are based either on reason and evidence or on deference to accepted authorities. The latter foundation is generally religious (“it must be true because it is in the Bible/Quran etc.”), but can be based on secular idea systems (“it must be true because Aristotle/Marx etc. said so”). But such authority-based beliefs are seldom open to question, and hence have little bearing on actual intellectual debates. By the same token, appeals to intuition or mystical experiences carry no weight in scholarly discourse. In this realm, reason and evidence generally reign supreme.

In the history of philosophy, two epistemological tendencies long vied for supremacy: rationalism, which holds that reason alone gives us solid knowledge, and empiricism, which stresses the acquisition of knowledge through the senses.* To a significant degree, the “struggle” between these two approaches to knowledge was settled long ago; in the humanities as well as the sciences, most agree, both are necessary. Reason alone can generate extraordinarily complex certainties (pure mathematics), but by itself tells us little about the world. Data acquired through the senses alone can provide richly factual descriptions, but little in the way of coherent exposition let alone explanation. Science thus progresses through an intricate dance of reason and fact, the one always playing off against the other.

But if this central epistemological issue was resolved long ago, allowing the florescence of science, “rationalism” and “empiricism” remain rooted as psychological inclinations. Some scholars seek grand, unifying theories, ideally rooted in mathematical logic; others remain suspicious of such schemes and instead stress complexity, diversity, and exceptions. In the long run, such a difference in proclivities is mostly productive, as the two approaches can correct each other. For this to work, however, partisans on both sides must play by the same rules. “Empiricists,” for their part, need to accept the provisional truth** of abstract models that are shown to correspond closely to reality. “Rationalists,” in turn, must relinquish, or at least reformulate, their models when they are contradicted by solid evidence.

Such “good-faith rationalism” is encapsulated by one of my favorite quotations in intellectual history, uttered by T. H. Huxley (“Darwin’s Bulldog”) in reference to the social theorist Herbert Spencer: Spencer’s idea of a tragedy, Huxley quipped, “is a theory killed by a fact.” History and accumulated facts have not been kind to Spencer’s sociology, and his social Darwinism, based on the “survival of the fittest” (his coinage, not Darwin’s) is now generally rejected. Yet as the quotation shows, Spencer was open to contradictory evidence and hence remained well within the domain of genuine scholarship.

I will readily admit that my own inclination is one of pronounced empiricism. I love complexity and I revel in uncertainty. Driven by the quest for knowledge, I find theory useful and indeed necessary, but only in limited doses. (I can’t recall the source, but one wit once likened social theory to cheesecake: highly satisfying when consumed at the end of a balanced meal, but nauseating when eaten to the exclusion of all other foodstuffs.) I find “a theory killed by a fact” more comedic than tragic—and the more powerful the devastated theory, the greater my satisfaction. If new findings were announced tomorrow that cast doubt on the General Theory of Relativity, I would feel like dancing in the street. But for now, I fully accept the Einsteinian view of the universe (at least I can take some comfort in Einstein’s failure to reach a unified field theory of physics, just as I smile at the embarrassing lack of empirical validation for the mind-boggling assertions of string theory). If I were to reject relativity merely because I do not like it, then my empiricism would cease to be a mere tendency and would instead become an intellectual pathology.

As can perhaps be gathered from this discussion, I regard the rationalism of the Science article on Indo-European origins and expansion as exemplifying such intellectual pathology. Fact after fact after fact show that the model does not work, yet all such empirical evidence is merely brushed aside rather than being effectively countered or even argued against. As a result, the entire approach becomes anti-scientific, despite the intention of its authors and the name of the journal in which it was published.

I am also convinced that excessive rationalism, whether pathological or not, has had negative consequences for social, economic, and political inquiry more generally. Consider, for example, graduate programs in international relations in the United States, programs that train many future national and global leaders. I have some experience here, having been interim director of Stanford University’s MA program in International Policy Studies (IPS) from 2004 to 2006. Here I encountered a curriculum focused on such rationalistic endeavors as econometrics and game theory. A course on “efficient market theory” also occupied a prominent position; that theory was later blamed by such experts as Paul Volcker for contributing to the economic crisis of 2008. Substantive courses on history, geography, and politics were also on the books, but instruction here was haphazard rather than systematic.

As interim IPS director, my main initiative was to hold an annual workshop to bring back successful alumni of the program to talk about their experiences in the “real world” and muse on how their educations at Stanford had prepared them, or failed to prepare them, for the challenges that they later faced. I asked each participant whether game theory had been of any value in their post-graduate careers; every one of them said that it had never even come up.*** Many of them, however, stressed how hard they had had to struggle to master basic world geography, expressing regret that their educations had not afforded them adequate preparation in this crucial area.

After I was relieved of my duties at International Policy Studies, the university enhanced the program, increasing the number of courses needed to gain a degree and ensuring that the curriculum would become even more “rigorous”—in other words, more heavily geared toward abstract, mathematical models. I rarely see IPS students in my classes any more, much to my regret. The guiding belief seems to be that the necessary knowledge of the world can be picked up casually on one’s own time, without the need for systematic instruction or intellectual scaffolding.

From this experience I have come to suspect that the real purpose of such programs is not to teach students about international relations, although that does occur to some degree. The primary function is rather to certify that degree holders are intelligent and disciplined enough to master the involved mathematics of game theory, econometrics, and so on, and compliant enough to brook no objections to doing so. As such, this pedagogical system works much like that of Victorian Britain, when imperial administrators were trained through immersion in the Greek and Latin classics. Such a pedagogical approach is far from worthless, but neither is it ideal. We should thus not be surprised that our foreign policy experts and political leaders continue to make elementary blunders about the world. Such gaffes are not only embarrassing, but they also come at a high political cost.

*Admittedly, constructivist and pragmatist epistemological orientations do not fall into either the empiricist or rationalist camps, as they emphasize the making of knowledge. Only the most extreme forms of constructivism, however, jettison reason and evidence.

** Unassailable truth is a property of mathematics, not science. Science rests on verification, but it is accepted by almost all philosophers of science that such verification is never absolute.

*** This is not to argue that game theory has no practical applications, but only to insist that such applications are relatively limited. In several instances, moreover, game theory has failed empirical tests. As the authors of a 2006 article in Theoretical Population Biology blandly put it, “Economists and psychologists have been testing Nash equilibrium predictions of game theory models of human behavior. In many instances, humans do not conform to the predictions.”

The Hazards of Formal Geographical Modeling in Bouckaert et al.—and Elsewhere

The linguistic and historical failings of the Bouckaert et al. Science article have been examined in previous posts and will be revisited in subsequent ones. The model’s cartographic miscues have also been dissected. The present post takes on the more abstract geographical issues associated with the authors’ approach.

The Bouckaert et al. article is overtly geographical. “Mapping” is the first word in its title, and the second sentence focuses on “explicit geographical models.” But the geographical model employed is so stripped of substance as to become almost anti-geographical. No allowances are made for actual geographical features other than the basic differentiation of “land” and “water” (with the latter term apparently meaning “seas and oceans”). In one of several sub-models employed, the authors assume that “movement into water is less likely than movement into land by a factor of 100.” But the ease of movement over water depends on the technology at hand and the cultural proclivities of the people in question. Would one ever make such an assumption when modeling the spread of the Austronesian language family, which depended on the double-outrigger canoe? The language map of the Philippines posted here clearly shows that in this case it is water that links linguistic communities and land, specifically the mountainous interiors of the main islands, that separates them. It is also notable that those who model pre-modern transportation networks generally assume that movement over water is vastly more efficient than movement over land.*
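To see what a fixed 100-to-1 penalty amounts to, consider a deliberately simplified toy: a walker on a grid chooses among neighboring cells, with each water cell weighted at one hundredth of a land cell. This is only an illustration of the quoted assumption, not the authors' actual implementation, which is a far more elaborate statistical diffusion model. The function name, the four-neighbor grid, and the alternative penalty value are all hypothetical.

```python
LAND, WATER = "land", "water"


def step_probabilities(neighbors, water_penalty=100.0):
    """Probability of stepping into each neighboring cell.

    A land cell gets weight 1; a water cell gets weight 1 / water_penalty.
    The weights are then normalized so that they sum to 1.
    """
    weights = [1.0 / water_penalty if cell == WATER else 1.0 for cell in neighbors]
    total = sum(weights)
    return [w / total for w in weights]


# A coastal cell with three land neighbors and one water neighbor.
coastal = [LAND, LAND, LAND, WATER]

# The quoted assumption: water is 100 times less likely than land.
fixed = step_probabilities(coastal)

# A hypothetical seafaring society for whom water is twice as attractive
# as land (penalty 0.5); the value is invented purely for contrast.
seafaring = step_probabilities(coastal, water_penalty=0.5)

print(f"chance of moving onto water, fixed penalty: {fixed[3]:.4f}")
print(f"chance of moving onto water, seafarers:     {seafaring[3]:.4f}")
```

The point of the contrast is that the penalty is a single constant applied everywhere and at all times, whereas the objection raised above is that the appropriate value depends on technology and culture, and may even fall below one.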

In the Bouckaert et al. model, geography is essentially reduced to geometry, which in turn becomes merely a matter of distances and directions. Mountains, passes, rivers, badlands, dense forests, and so on account for nothing. Such a stripped-down view of geography is convenient for mathematical modeling, but only at the expense of truth. We know from numerous historical studies that the movement of peoples (which is not necessarily the same as the movement of languages) is often guided by variation in the physical landscape. Agricultural settlers typically sought out appropriate soils, such as loess in the case of Neolithic farmers venturing into central Europe; heavy clay soils were avoided for millennia. Pastoralists, by the same token, sought out good pastures; it is no accident that the equestrian Magyars, like the Huns and Eurasian Avars before them, settled on the grassy Alföld of the Danubian Basin. Agricultural settlers, like the supposed carriers of Indo-European languages in the Bouckaert model, do not simply “diffuse” over a landscape like pathogens jumping from host to host. The process is rather more intentional, and much more molded by the variegated features of actual physical landscapes.

Bouckaert and company’s modeling is by no means the first attempt to flatten geography into geometry. I am particularly concerned about this maneuver because an earlier attempt to do the same thing within geography greatly weakened the discipline. I am often asked why geography is such a weak field in the United States, absent from most leading universities. The issue is complicated, but a key event was geography’s own “quantitative revolution” in the early 1960s, an intellectually aggressive refashioning of the discipline into a positivistic, statistics-dominated “science” centered around the discovery of supposedly invariant spatial laws. To allow the statistical methods that the young revolutionaries favored, geography had to be reduced to distance and direction. Most of their studies began by assuming that the landscape being investigated (or merely hypothesized) was an “isotropic plain,” completely uniform and featureless in all directions. Such an assumption rules out everything that differentiates actual landscapes. The main result was mathematically elegant but empirically questionable and often worthless conceptual structures.

A prime example of geography losing its way was Central Place Theory, initially developed in Germany in the 1930s and then celebrated by Anglo-American geographers as a conceptual breakthrough in the 1960s. Central Place Theory postulates that the distribution of cities and towns of various sizes follows regular hexagonal patterns generated automatically by retail marketing behavior. The theory is almost entirely deductive, beginning with a set of assumptions and then working out their logical consequences. The assumptions**, however, do not hold, and as a result the theory did not work as promised. It is true that in some relatively flat areas urban patterns approximate the expected form, but in such cases administrative hierarchies generally played a much larger role than retail marketing. In regard to the United States, moreover, geographer James Vance showed in the early 1970s that wholesaling was far more important than retailing in determining the location and relative standing of major cities. Vance was attacked at the time not so much for being incorrect as for challenging the new theoretical underpinnings of a discipline in the desperate thrall of physics envy.
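The purely deductive character of the hexagonal pattern is easy to demonstrate. The sketch below lays out idealized towns on a featureless plain and confirms that each interior town has exactly six equidistant nearest neighbors, which is what gives the market areas their hexagonal shape. Nothing here comes from real settlement data; the spacing, the number of rings, and the function names are arbitrary assumptions chosen for illustration.

```python
import math


def hex_lattice(spacing, rings):
    """Town locations on an isotropic plain, arranged around the origin.

    Towns sit on a triangular lattice, so the market area around each
    one is a hexagon. Axial coordinates (q, r) index the lattice.
    """
    points = []
    for q in range(-rings, rings + 1):
        for r in range(-rings, rings + 1):
            if abs(q + r) <= rings:
                x = spacing * (q + r / 2)
                y = spacing * (math.sqrt(3) / 2) * r
                points.append((x, y))
    return points


def nearest_neighbors(center, points, spacing, tol=1e-9):
    """Towns lying exactly one lattice spacing away from the given center."""
    return [p for p in points if abs(math.dist(center, p) - spacing) < tol]


towns = hex_lattice(spacing=10.0, rings=2)
neighbors = nearest_neighbors((0.0, 0.0), towns, spacing=10.0)

print(f"{len(towns)} towns; the central town has {len(neighbors)} nearest neighbors")
```

The pattern follows entirely from the assumed uniform spacing. Introduce a mountain range, a navigable river, or an administrative capital, and the lattice has nothing to say.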

It is difficult to exaggerate the damage done to geography by the quantitative revolution. Suddenly, cultural and historical geography were deemed trivial, widely viewed as examining little more than noise that distracted attention from the underlying spatial laws. Exploring the complex interactions found in any given region now seemed quaint if not pathetic, a mere descriptive exercise deemed insignificant when contrasted with mathematically rigorous and supposedly scientific investigations. For the same reason, world geography—the core of the field, as constituted since antiquity—virtually vanished from the curriculum. Teaching “the world” came to be viewed as the mere cataloging of facts, failing to provide the conceptual purchase necessary for real understanding. Field study in distant lands was for the same reason actively discouraged by many; why go to Ghana and suffer the inconveniences and indignities of travel in a poor country when the same invariant spatial laws could be discovered in Iowa in the comfort of one’s own lab? Armed with such scientific-seeming techniques, geographers could now reach the height of their profession without knowing much of anything about the actual world.

Needless to say, the “laws” discovered by the quantitative revolutionaries of the early 1960s seldom proved very powerful, just as the explanations they offered seldom had much explanatory power. It is no accident that the doyen of the movement, David Harvey, abandoned the entire effort soon after publishing Explanation in Geography. In the early 1970s, Harvey—“the 18th most-cited intellectual of all time in the humanities and social sciences”— abruptly converted to Marxism, a transition followed by many other geographers at the time. Within a few years, radically leftist social theory had displaced positivism as the “cutting edge” of the discipline. Despite the huge intellectual shift that this entailed, including the general rejection of mathematical methods, the insistence on high theory and the corresponding denigration of empirical study remained firmly entrenched. Throughout this period, important geography departments continued to be shuttered by budget-conscious university administrations.

Admittedly, a number of scholars did attempt to link the abstract models of geography’s quantitative revolution to real landscapes, with mixed results. A key figure here was the preeminent historical anthropologist of China, G. William Skinner (1925-2008). Skinner had become enamored of Central Place Theory in the 1960s, which he used to “explain” the location of cities and towns in China’s Sichuan Basin. He later turned his attention to larger regions, brilliantly arguing that the structure of Chinese history had to be conceptualized around a handful of “physiographic macro-regions” loosely coincident with drainage basins. Skinner subsequently tried to integrate such regional analysis with Central Place Theory as well as several other abstract spatial schemas into what he called the “Hierarchical Regional Space Model.” He was convinced that this model applied to any preindustrial agrarian society, and he went to heroic efforts to show that it worked equally well in France and Japan as in China. In the Skinner model, geographic cores and peripheries of varying scales coincide with drainage basins to form all-encompassing spatial structures. Everything from the average age of marriage to the average wage rate was supposedly predicated on positioning within such highly structured spaces. Unfortunately for Skinner, empirical verification proved elusive, and his project essentially came to naught. All that his three decades of lavishly funded research produced was a few minor articles and a massive, idiosyncratic cartographic archive. As it turns out, human geography is an intrinsically complex affair that is not so easily reduced to clean conceptual structures.

More recently, genuine progress has been made in applying technical analysis to geographical issues. The key has been to use such techniques as tools rather than ends in themselves. Geographical Information Systems (GIS), for example, offer no “explanations” on their own, but rather allow scholars to uncover patterns more effectively and to visualize evidence and complexity, as noted by Andrew Zolnai in a comment on the previous GeoCurrents post.

Quentin Atkinson claims that he would like to refine his own model of Indo-European expansion to encompass actual geographical variation beyond the land/water dichotomy. Doing so would surely be advantageous, but as long as his underlying assumptions fail to withstand scrutiny, the end result will still be untenable. Again, this is not to argue that abstract models are of no use in geographical or historical analysis, but only to insist that they be applied with great care.

* For an impressive model of the transportation networks of the Roman Empire, see Orbis: The Stanford Geospatial Network Model of the Roman World.

** Walter Christaller, who originated Central Place Theory, made the following assumptions, as outlined in the Wikipedia article on the theory:

▪ an unbounded isotropic (all flat), homogeneous, limitless surface (abstract space)

▪ an evenly distributed population

▪ all settlements are equidistant and exist in a triangular lattice pattern

▪ evenly distributed resources

▪ distance decay mechanism

▪ perfect competition and all sellers are economic people maximizing their profits

▪ consumers are of the same income level and same shopping behaviour

▪ all consumers have a similar purchasing power and demand for goods and services

▪ Consumers visit the nearest central places that provide the function which they demand. They minimize the distance to be travelled. No provider of goods or services is able to earn excess profit (each supplier has a monopoly over a hinterland)

 

On Mathematical Modeling and Inter-Disciplinary Work in Historical Linguistics: A Reply to Alexei Drummond—and a Friendly Critique of the Field

We would like to thank everyone who has posted comments on our recent posts on Indo-European linguistics, whether favorable or critical. As we have been highly critical ourselves, we can only expect the same in return; such is the give-and-take of the scholarly endeavor. We will post detailed replies to critical comments next week, after Asya Pereltsvaig returns from her travels. The present post responds only to the first comment posted on GeoCurrents by one of the co-authors of the Science article that we have taken to task. In that response, Alexei Drummond takes on some significant epistemological and methodological issues that demand a considered answer. As Drummond argues:

Personally I would love to include more direct evidence-based information into the computational analysis to correct the details (and see if that changes the main inference of the location of the origin), but that would require the linguists and archaeologists to actually embrace the value of computer models to synthesize large amounts of data. How can a human mind, however elegantly expressed its written conclusions, correctly balance the thousands of items of evidence to provide a probabilistic statement about history in a way that others can verify (i.e. The Horse, the Wheel and Language)? What is good about our approach is that the simplifying assumptions are clearly stated and can be improved upon in subsequent analyses. I just wish that the historical linguistics crowd would try *constructive* rather than destructive criticism for a change. We want what you want: to determine what happened. So as we are all scientists, we should work towards common ground, shouldn’t we?

Try as we might, we find little to disagree with in this eloquent appeal for the use of computational techniques and interdisciplinary research. As Asya Pereltsvaig has emphasized, we respect the work of linguists who use such methods in their own research. We advance no objections to computational methods per se, but rather to this specific application. Successful modeling cannot rest on unsubstantiated and most likely false assumptions about language spread and diversification, cannot disdain verification efforts, cannot be inherently unfalsifiable, and cannot be consistently contradicted by the empirical record. Drummond is surely right that well-crafted mathematical models can be continually adjusted to better fit the reality that they seek to represent, but only if they rest on a solid foundation. Certainly the model under consideration could be sharpened, as has been suggested by another co-author, by incorporating elements of physical geography beyond the water/land dichotomy; such an improvement could weed out such blunders as having the Tocharians advance along 20,000-foot ridges while bypassing their eventual home in the Tarim basin. But as long as the model rests on the untenable assumption that languages spread through a contagion-like process and diverge in speciation-like events, the result will still be of little value. Subsequent posts will examine how languages do spread and change. As we shall see, such linguistic processes are vastly more complex than the scenarios posited by the Science team. That does not mean that they cannot be mathematically modeled, only that any such efforts will have to be much more involved than what we have seen thus far.

We therefore hope that Alexei Drummond will continue to apply his formidable skills to the problems of language spread and diversification. We also hope that in the future he can collaborate not merely with other modelers, scholars whose skill sets overlap to a great extent, but also with experts with complementary skills and frameworks of knowledge. In particular, such work must be done with a bona fide Indo-Europeanist; collaborators with proficiency in world history, geography, and linguistics more generally would also prove highly beneficial.

Although it is easy for us to dish out such advice, it would probably prove much more difficult for anyone to take it. As Drummond notes, it seems likely that many if not most historical linguists would rebuff any such invitations for collaboration. Here it becomes necessary for us to reverse our critical attention and apply it to historical linguistics itself. Although this series of posts seeks to vindicate the field, we are convinced that a successful defense of any beleaguered intellectual enterprise demands a self-critical* eye.

Historical linguistics is currently in crisis not only because of unsubstantiated attacks or the failure of others to appreciate its intellectual achievements; it is also languishing because its practitioners have failed to meet the challenges that they face. All told, they have remained too insular and too comfortable with their own research paradigms. Emphasizing, like good scientists, the narrow acquisition of knowledge along established research fronts, few members of the guild have been willing to stand back and address the larger implications of their own work for the study of human pre-history (and history), let alone offer edification for a general audience. By the same token, few historical linguists have collaborated extensively with scholars in other disciplines. It is no accident that the three best-known scholars in the debate on Indo-European origins are (or were) all archeologists: Marija Gimbutas, Colin Renfrew, and David Anthony.

Historical linguists might reply that progress in linguistic research demands tightly focused inquiry and highly specialized disciplinary techniques, and that they would therefore gain little through interdisciplinary collaboration. Such arguments make sense when applied to specific issues, but collapse when it comes to broader matters, such as the origin of the Indo-European family, which is as much a matter of history and geography as it is of linguistics. And regardless of whatever intellectual arguments can be made for highly focused specialization, pragmatic considerations call for a different approach; it is a fact that historical linguistics is a diminishing field that has been unable to fend off mass-media celebrations of encroachments on its own terrain. If their field is to survive, historical linguists must realize that they can no longer be satisfied merely by communicating with each other. They not only must engage more with other scholars, but they must also reach out to the educated public.

Our charge is perhaps not as difficult as it might seem. The public is deeply interested in such issues, as attested by the articles in the popular press on the Bouckaert et al. paper. Asya and I have discovered the same interest while teaching on the intersection of linguistics, history, and geography in Stanford University’s Continuing Studies (adult education) Program, where our classes are consistently among the most popular offerings. Although we would like to think that our teaching skills have something to do with our enrollment numbers, we realize that they stem largely from demand for instruction on a topic that many people find intrinsically fascinating. Next winter, we will be teaching a class specifically on the geo-history of the world’s major language families. But in looking for a text that draws together the major issues within a single, comprehensible framework, we find ourselves frustrated. The best work that we have located thus far is a 1994 Scientific American article entitled, “World Linguistic Diversity,” by none other than archeologist Colin Renfrew. It is unfortunately short and somewhat dated, and it is almost certainly wrong on such major issues as the origin of Indo-European and the existence of Altaic. We do find it odd, and rather sad, that no comparable work has, to our knowledge, been produced by a historical linguist.

* “Self-criticism” is not the best term here, as neither of us is a historical linguist. I am a historical geographer and Asya Pereltsvaig is a linguist who specializes in syntax. What we thus offer is perhaps best described as “friendly criticism.”

 

Why the Indo-European Debate Matters—And Matters Deeply

As expected, we have received a few complaints from friends, acquaintances, and Facebook-followers in regard to the current Indo-European series. “Why get so exercised over a single article?” some ask, reminding us that science is a self-correcting endeavor that will eventually winnow away the chaff. Others question the entire enterprise, wondering why we would care so much about such an obscure topic.

We agree that science is, in the long run, a self-correcting undertaking, which gives it vast power. But self-correction does not come automatically; it takes work, which we are happy to provide. And in the short term, counterfeit research can do great harm, as the Lysenko Affair in the Soviet Union so well demonstrated. We also find it deeply troubling that a nonsensical article would not only be accepted for publication in one of the world’s premier scientific journals, but would immediately be trumpeted in the mass media for “solving” one of the key mysteries of human pre-history. The episode uncovers a whiff of corruption in the scientific-journalistic establishment that needs a blast of fresh air.

In regard to the second set of complaints, we must reject them outright. The Indo-European issue is not obscure, trivial, or unrelated to pressing issues of our day. In fact, it is difficult to locate a single topic of historical debate that has been more ideologically fraught and politically laden over the past 150 years than that of Indo-European origin and expansion.

Indo-European studies took on a heavy ideological burden in the late 1800s, a development that would indirectly lead to the most hideous examples of genocide and mass-murder that the world has ever witnessed. The supposedly superior “Aryans” of Nazi mythology were none other than the speakers of Proto-Indo-European (PIE). Nazi propagandists conjured their own wildly off-base theories about I-E origins, but their fantasies had roots in the scholarly endeavors of German philologists. And while Nazism was militarily crushed and its ideological foundations pulverized, the movement refuses to die. Indeed, it seems to be experiencing something of a revival in eastern Germany, Hungary, and—of all places—Russia. On numerous occasions, I have found myself directed by Google to the odious “Stormfront” website while searching for images and ethnographic descriptions of various Eurasian ethnic groups. The Aryan myth also continues to feed racially troubling ideologies outside of Europe, particularly in Iran and northern India.

Even scholars who have sought to undermine the noxious notion of the Aryan Herrenvolk have occasionally generated their own benign but still fantasy-laden counter-narratives. The key figure here is the late Lithuanian-American archeologist Marija Gimbutas, noted for placing the I-E homeland in the Pontic Steppes. Gimbutas’s scientific research was solid, and we suspect that she was largely correct in locating the PIE homeland. But in seeking to turn the Nazi view on its head, she went too far—and some of her lay followers went much too far. In the feminist retelling of the tale that she inspired, the Aryans become the Kurgans, a uniquely violent, male-dominated people who destroyed the peaceful, gender-equitable if not matriarchal civilization of “Old Europe.” In Riane Eisler’s 1988 treatise, The Chalice and the Blade: Our History, Our Future, the Kurgan conquests are seen as ushering in a global age of male domination and mass violence. The work was a bestseller, blurbed by noted anthropologist Ashley Montagu as the “most important book since Darwin’s Origin of Species.”

Eisler’s global vision failed from the outset: as male domination characterized almost all historically known human societies, it cannot be attributed to a single ancient people located in one particular part of the Earth. Recent research has also tended to undermine many of her more specific claims. The Old Europeans were probably not as peaceful and female-centered as they had been portrayed, and the PIE speakers and their immediate descendants were probably not so insistently androcentric. Certainly the early Indo-European speakers were no strangers to violence and domination, but how do we account for the female Scythian skeletons from the Kurgan homeland tricked out in military gear? Perhaps Herodotus was on to something when he wrote of Amazon tribes in the area. More to the point, we now understand that the early Indo-European speakers could not have simply invaded Old Europe and subjugated its inhabitants, as they lacked the state-level forms of military organization necessary for wide conquests. As Anthony shows so well in The Horse, the Wheel and Language, the process was almost certainly one of gradual incursions, marked by both social predation and mutualism, that allowed the militarily advantaged, semi-pastoral, equestrian I-E speakers to slowly spread their forms of speech. And while their languages did indeed expand over vast areas, they did not simply replace pre-existing tongues. Almost everywhere, older linguistic elements survived. Major non-I-E substrates characterize such I-E subfamilies as Germanic and Greek. A huge problem for both Nazi ideology and the Gimbutas/Eisler thesis is the fact that most of the Germanic root words pertaining to war are non-Indo-European. The mysteries here remain deep.

Considering the misuses to which the issue of I-E origins has been put, it is understandable that some people would want to reject the idea that the original speakers were war-like horse-riders from some remote, northern homeland. All such troublesome interpretations would vanish if I-E expansion could instead be linked to the gradual movement of simple farmers from the Near Eastern agricultural heartland into the sparsely settled lands of Mesolithic Europe. But if the evidence indicates otherwise, as it most assuredly does, the result is merely another myth. Scientific responsibility demands the search for truth, even if the truth leads into uncomfortable areas.

Regardless of the complications introduced by ideological distortions, investigations of I-E origins and expansion have a huge bearing on the study of human prehistory. Indo-European, after all, is by far the world’s largest language family when counted by the number of speakers. Linguistic evidence about the family’s spread tells us much of significance about the historical development of a vast section of the Earth’s surface over many centuries, even millennia. Studies of human prehistory depend crucially on three lines of evidence: those derived from archeological digs; from genetic studies; and from linguistics. Over the past decade, much progress has been made in bridging linguistic and archeological evidence, as demonstrated by David Anthony’s The Horse, the Wheel, and Language. To the extent that the burgeoning genetic investigations of Y- and mitochondrial DNA lineages can be incorporated into this linguistic-archeological nexus, a much richer understanding of the prehistoric human past awaits. For a path-breaking interdisciplinary foray into this territory, see Andrew Shryock and Daniel Lord Smail, Deep History: The Architecture of Past and Present.

Such developments, however, risk being cut short if the field of historical linguistics continues to languish. Further progress will depend not only on linguists carrying out their own research, but also on their passing down their knowledge and techniques to future generations of students. Such lines of intellectual transmission, however, are threatened by cutbacks in linguistics departments, as well as by the assaults on the field mounted by interlopers who have somehow managed to convince many scientists that linguistic evidence is of little account when it comes to studying the history of languages. To the extent that the Anatolian hypothesis gains ground among archeologists and geneticists on the basis of the recent Science article, our collective knowledge of the past will take a sharp step backwards.

The most troubling aspect of the affair, however, is not the threats that it poses but rather the revelations that it makes about the integrity of the scientific and journalistic establishments. A scholarly journal such as Science is duty-bound to vet any potential contribution through established experts. Yet I have a difficult time imagining that the article in question was subjected to proper peer review by any qualified specialist in the field in which it sits: Indo-European historical linguistics. Either the article was never sent to a competent linguistics reviewer, or the resulting review was irresponsibly ignored. And yet this is not the first time that a preposterous article on historical linguistics has appeared in Science (and also in Nature), as we shall see in future posts. Have the editors of this august journal decided that the discipline of linguistics has somehow failed, and that its field of historical inquiry should therefore be handed over to epidemiologists and computational modelers? If so, on what possible grounds was this decision reached? Unless such questions can be answered, I have a difficult time avoiding the conclusion that the editors of Science have betrayed the basic canons of academic responsibility.

While contemplating these issues, I am continually reminded of the Sokal Hoax, an episode that revealed the vacuity of postmodernist literary theory and “science studies” in the mid-1990s. This affair came to my attention when I was participating in the conference on “The Flight from Science and Reason” organized by the New York Academy of Sciences. A rumor began to circulate among the attendees that a noted physicist and mathematician with solid leftist political credentials was perpetrating a prank that would debunk Social Text, perhaps the leading journal of poststructuralist theory, and in so doing deflate the pretension of those who sought to undermine science in the name of human liberation. Sokal’s article, entitled “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity,” argues that since science is merely a social construct, quantum gravity, especially as interpreted through the new-age lens of “morphogenetic fields,” can have progressive implications for political action. The paper was accepted and duly published, despite the fact that it was, as its author soon admitted, “a pastiche of Left-wing cant, fawning references, grandiose quotations, and outright nonsense . . . structured around the silliest quotations [by postmodernist academics] he could find about mathematics and physics.” Sokal designed the hoax as a kind of test of the allegations made by Paul Gross and Norman Levitt in their book Higher Superstition: The Academic Left and Its Quarrels With Science. As he discovered, even the most palpable nonsense imaginable could be published in Social Text so long as it sounded good and flattered the editors’ ideological preconceptions.

While the Sokal Affair was a purposive hoax, the members of the Bouckaert team evidently believe that their article constitutes a contribution to knowledge. But what the authors think about their own work is of no significance, as the arguments they make must stand on their own. Had Alan Sokal actually believed that the “construction” of quantum gravity could be a politically progressive act, would his article have been any less nonsensical? The current authors have thus perpetrated an unwitting hoax, but the end results should be no less embarrassing for the editors of Science than the Sokal Affair was for those of Social Text. Bouckaert et al. begin by improperly framing the problem, and then go on to err at every turn. It is not so much that the article’s conclusions are incorrect, but rather that every assumption it makes, every technique it employs, and virtually every “fact” that it marshals is either incorrect, inappropriate, or misleading. Yet this work was published in one of the world’s most prestigious scientific journals. Something here smells rather fishy.

But if the mere publication of the article in Science raises questions about intellectual integrity, its immediate celebration in the pages of the New York Times points to a deeper mire. Science publishes hundreds of articles each year, a tiny fraction of which are ever mentioned in the New York Times, let alone showcased in the newspaper’s main section. Yet the Times has gone out of its way on more than one occasion to trumpet “contributions” to linguistic history from members of the Bouckaert team, specifically Quentin Atkinson. Evidently, the editors of the supposed newspaper-of-record in the United States have concluded that the work of these scholars constitutes one of the most important scientific stories of the past decade. On what possible basis could such an assessment have been rationally made?

Journalists, like academics, are expected to adhere to certain standards of professional behavior. Unless they are writing for the editorial pages or are explicitly employed in “advocacy journalism,” reporters are expected to remain as objective as possible, not letting their own interests, political predilections, or friendship and kin networks direct their work. Such guidelines are impossible to follow to the letter, and as a result complete objectivity is a mere ideal. But such an ideal is still supposed to influence behavior in self-respecting media outlets, eliminating the excesses of partisanship. In the present case, however, all such ethical fetters seem to have been removed. Nicholas Wade’s reporting on this issue has been non-objective in the extreme. One can only speculate as to why Wade has been determined to act as Quentin Atkinson’s pocket journalist, ever ready to proclaim his latest clumsy foray into linguistics as a scientific breakthrough on par with plate tectonics.

To appreciate the level of corruption revealed by the Bouckaert Affair, imagine that a parallel series of events occurred in a different walk of life, such as business. Imagine, for example, that an established financial firm with a reasonably good reputation decided to apply its mathematical models to an unrelated business, one in which neither the leaders nor the employees of the company had any experience. Being ignorant of their new field, they made a number of naïve and ultimately untenable assumptions about how it operates, and thus when they applied their favored methods, unexpected breakdowns occurred. Soon the firm began to hemorrhage money. But rather than admit to their failure, the managers instead crowed about their success, hiding their mounting losses in misleading accounting sheets and obscurely written reports. But even as the company began to collapse, its reputation strengthened and its stock-market valuation rose. Such gains, it turns out, stemmed from glowing reports on its new venture in the business media, most notably the New York Times. The most substantive Times piece on the venture appeared not in the paper’s business pages, but in its main news section, gaining it a particularly wide readership. The fact that it was written by the former editor of its business section, a person widely regarded as one of the country’s leading economic journalists, helped propel the story. For a while, it appeared as if the firm could do no wrong. And then …

In the world of commerce, such a story would end with the quick death of the firm, as well as that of its business model. To the extent that any company making consistent losses will eventually fail, business, like science, is a self-correcting enterprise. Failure in business, however, is generally more pressing than it is in science, as rather more money and power are typically at stake. Intrinsic error can linger in science for decades, as demonstrated by the prolonged resistance of geologists to the ever-mounting evidence for continental drift. In a field as marginal as Indo-European studies, well-funded pseudo-scientific works could withstand invalidation by under-funded scholars for many years. In the popular imagination, moreover, erroneous ideas can escape correction altogether, lodging so firmly as to be all but irremovable by evidence. Examples include the widely known non-facts that the Eskimo languages have a multitude of words for snow, and that Europeans before Columbus thought that the world was flat. The Indo-European Affair, in short, matters, and matters deeply. I find it cause for deep concern, and as a result I will continue to write about it.

But after one more post, the current series on Indo-European origins will go on hiatus for a few weeks. Both Asya and I must travel for a short period, so blogging in general will be light for the next week or so.

 

Quentin Atkinson’s Nonsensical Maps of Indo-European Expansion

The website that accompanies “Mapping the Origins and Expansion of the Indo-European Language Family” (August 24 Science), maintained by co-author Quentin D. Atkinson, proudly features several maps that allow the easy visualization of the patterns generated by the model. One is a conventional map that purports to show “language expansion in time and space,” depicting and dating the spread of Indo-European languages through a red-to-blue color scheme. The other cartographic product is a sequence of numerous map-frames that ostensibly shows Indo-European (I-E) expansion from the seventh millennium BCE to 1974 CE. This Google-Earth-based animated map, or “movie,” as Atkinson calls it, is explained in terms that are at once simplistic and cryptic:

Watch the Indo-European expansion unfold. This movie shows how our model reconstructs the expansion of the Indo-European languages through time. Contours on the map represent the 95% highest posterior density distribution for the range of Indo-European.

The analysis that I provide below takes these maps on their own terms, as advertised: as if, in other words, they indicate what Atkinson and his colleagues believe to be the “unfolding” of the Indo-European language family in “time and space” as substantiated by their mathematical model. But if one reads the fine print found elsewhere, one discovers that the maps are not actually what they purport to be. The authors admit up front that these figures deliver incorrect information, owing to the fact that crucial pieces of data were excluded from the model:

This figure needs to be interpreted with the caveat that we can only represent the geographic extent corresponding to language divergence events, and only between those languages that are in our sample. The rapid expansion of a single language and nodes associated with branches not represented in our sample will not be reflected in this figure. For example, the lack of Continental Celtic variants in our sample means we miss the Celtic incursion into Iberia and instead infer a later arrival into the Iberian Peninsula associated with the break-up of the Romance languages (and not the initial rapid expansion of Latin). The timing represented here therefore offers a minimum age for expansion into a given area.

This admission is extraordinary, as it amounts to saying that “even though our data set is too incomplete to produce accurate results, our model should nonetheless be regarded as powerful enough to settle the most highly debated topic in historical linguistics,” and that “even though we make no claims as to the earliest dates in which Indo-European languages were established in any given area, our approach still shows that the language family originated in Anatolia.” I do not think that I have ever encountered a more flagrant example of “having one’s cake and eating it too” in an academic work. In fact, as is demonstrated in a previous discussion thread that is reproduced below, the “caveat” itself errs at virtually every turn.*

In a comment on the previous post, co-author Alexei Drummond framed the study’s limitations in more direct language:

Our geographical reconstructions are only for the language lineages that are direct ancestors of the particular sample of IE languages we analyzed. Our inferred geographic distributions don’t say anything about the full extent of IE languages at any time past or present.

If the geographic patterns depicted on the maps say nothing about the “full extent” of I-E languages “at any time,” why are viewers of the animation invited to “watch the Indo-European expansion unfold”? The claim is inherently misleading. But as we shall see below, the problems run much deeper, as in numerous instances the maps fail to show accurately even the partial extent of I-E languages. But before delving into such specifics, a few words about the mapping project in general are in order.

Many problems plague the authors’ cartographic depictions. The two maps, static and animated, fail to correspond in their details, often in a glaring manner. The animated map, moreover, lacks anything approaching a key, and hence is difficult to interpret. The temporal framing of the two maps is oddly displaced, as the “movie” purports to take the story up to 1974 CE, whereas the static map terminates at roughly 1800 CE. Potentially confusing is the fact that the static map gives dates in “BP,” or “before present” (which by convention means prior to 1950 CE), whereas the animated map uses the historically Christian calendar. Both maps, it is essential to note, show only the expansion and not the contraction of Indo-European, although this essential feature also goes unmentioned. Areas that ceased to be Indo-European speaking centuries ago, such as the supposed Anatolian heartland, continue to be shaded as I-E throughout the animation.
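To compare dates across the two maps, one has to put them on a common scale. The following is a minimal sketch of that conversion, assuming the usual convention that “present” means 1950 CE and that the BCE/CE calendar has no year zero. The function name `bp_to_calendar` is my own invention for illustration and appears nowhere in the Science article or its supplementary materials.

```python
def bp_to_calendar(years_bp: int, reference_year: int = 1950) -> str:
    """Convert a 'before present' (BP) date to a BCE/CE label.

    Assumes 'present' is 1950 CE and that 1 BCE is followed
    directly by 1 CE (no year zero).
    """
    # Signed year counted from the reference; zero or below falls in BCE.
    year = reference_year - years_bp
    if year > 0:
        return f"{year} CE"
    # Signed year 0 corresponds to 1 BCE, -1 to 2 BCE, and so on.
    return f"{1 - year} BCE"


if __name__ == "__main__":
    # The static map's rough end point of 1800 CE corresponds to 150 BP.
    print(bp_to_calendar(150))
    # A date of 8450 BP lands in the mid-seventh millennium BCE.
    print(bp_to_calendar(8450))
```

On this convention, a label of 150 BP on the static map and a frame dated 1800 CE in the animation refer to the same moment, which is the kind of cross-check that a shared key would have made unnecessary.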

Although the contours mentioned in the “explanation” of the animated map are visible in the greenish shading, the overall coloration scheme remains vague. As the animation unfolds, the hypothesized I-E homeland circa 6500 BCE—Anatolia, the Caucasus, the northern Middle East, and the greater Aegean—is washed in yellow, whereas later geographical additions to the realm appear in shades of green. Yet at approximately 2225 BCE, most of the heartland abruptly turns green as well, with the exception of a swath extending from Cyprus through what is now Lebanon to central Iraq and two areas on either side of the Black Sea. Another such abrupt color switch occurs later in the animation.

Also unspecified are the thick green lines, which begin as a several-pixel splotch at roughly 6200 BCE that gyrates in place for about 1,500 years before spreading across the map to form a web. An unwary reader might assume that such lines indicate pathways of migration, but he or she would be mistaken, as movement along specific corridors defies the underlying diffusional model, which postulates gradual expansion along broad fronts with scattered outliers pushing into new territories. The lines actually indicate supposed examples of family-level linguistic divergence. Such relational links often extend into areas that are not shaded as I-E; note, for example, the green lines pushing into unmarked western Russia and northern Sweden on the first map. A naïve reader might wrongly assume that such extensions signal relatively recent movement, with little actual settlement to date.

As mentioned above, the static map and its animated companion do not correspond well. Unlike the animated version, the conventional map shows Corsica, the Balearic Islands, Crete, and Cyprus, for example, as never having been occupied by Indo-European speakers. The animation, to the contrary, puts Cyprus in the initial I-E homeland in the seventh millennium BCE. (Both depictions of the island are incorrect; the first known language of Cyprus, non-I-E Eteocypriot, was supplanted by the Greek (I-E) dialect of Arcadocypriot in the late Bronze Age.) Also notable is the static map’s depiction of Indo-European occupation in areas unmarked on the animated map, including western Norway and western Russia. (Neither map manages to show northern Norway as ever having been occupied by Indo-European-speakers.)

Although the discrepancies between the two maps are never explained, a few of them might be deduced. Consider, for example, the different treatments of western Russia in the maps posted here. In the animated depiction of 1974, only a small portion of this region is shaded as ever having been I-E speaking, yet the static map shows a sizable area as having become largely Indo-European over the past 500 to 1,000 years. This map depicts the distribution of I-E languages in western Russia with discontinuous blotches, seemingly placed at random, which would apparently indicate that the language family spread into this area in a spatially sporadic manner and never managed to fill in the gaps. On the basis of this particular disparity, one might assume that only areas of (supposedly) continuous I-E occupation receive shading on the animated map frames. But if this is indeed the case, the guideline is apparently reversed elsewhere. Note that sizable portions of Central Asia are similarly splotched on the static map, yet are shaded on the animated map. The area that now constitutes Kyrgyzstan is fully shaded on one map, yet remains almost entirely blank on the other. A swath across what are now Syria and Iraq is blobbed red on the static map, apparently indicating partial I-E expansion in the Neolithic, yet is blanketed with yellow on the animated map from the earliest frames. Cartographic consistency is evidently not high on the authors’ agenda.

Far more troubling than disparities between the two maps, however, are inconsistencies between both of them and the historical record. Overall, the fit between the modeled spread of I-E languages and what we know of its actual expansion is poor. In pointing out some of the more flagrant errors, I will begin at the end of the “movie,” which shows the accumulated spread of I-E languages to 1974 CE, contrasting it with the depictions on the static map. I will subsequently work backward in time on the “historically unfolding” movie, pointing out crucial errors for several particular periods. To reiterate, I will consider what the maps literally show, ignoring for the most part their hidden meanings.

As mentioned in the previous post, the most obvious blunder in the 1974 depiction is the omission of Russia and Eastern Ukraine from the Indo-European-speaking realm. On the final map frame, the only parts of Russia that are shaded are the Pskov district, the far southern Crimea, and the largely non-I-E-speaking northern Caucasus. The same map also fails to mark other areas long characterized by I-E speech, such as southern Iberia, Balochistan, southern Sri Lanka, and Orissa in eastern India. The static map, however, does successfully mark most of these places as I-E speaking, yet conversely errs in placing several non- (and never-) I-E-speaking areas in the Indo-European zone, such as northeastern Sri Lanka as well as Manipur and environs in northeastern India. Unlike the animation, this map does show I-E in Western Russia, but only in the past 1,000 to 1,500 years, as discontinuous as late as 1800 CE, and as disappearing entirely in far western Siberia. Such depictions, needless to say, are erroneous; although pockets of Uralic languages persist to the present in eastern European Russia and Western Siberia, the bulk of the region was solidly Russian speaking well before the termination date of 1974. Compounding such errors is the sprinkling of bluish dots in southern Tibet, northern Nepal, and northwestern Burma. Some of the most inhospitable parts of the central Sahara are also vaguely marked with blue to show I-E expansion over the past millennium.

The static map is, in a word, preposterous. What possible Indo-European language could ever have been spoken in the Kachin uplands of Burma over the past 1,000 years, much less in essentially uninhabited areas of the Tibetan Plateau and the Sahara Desert? Note as well that northern Tunisia and northeastern Algeria are clearly marked as having been substantially I-E speaking in recent centuries. At first glance, I wondered whether the authors were trying to show the spread of Latin in this region under the Roman Empire; if so, the coloration is wrong, as blue indicates I-E expansion in the past 1,000 years. But as we have seen, Latin does not count in Atkinson’s scheme, as it supposedly spread too quickly as an individual language (it actually spread quite slowly here; non-I-E Punic continued to be spoken in the region as a minority language up to Augustine’s time). But as it so happens, the blue splotches around Tunis do not indicate anything nearly so specific. Rather, like the light red blobs in central Arabia, they merely show that the model occasionally spits out randomly (and incorrectly) placed outliers at some remove from main areas of Indo-European speech.

Other inaccuracies abound on the static map, including incomplete I-E occupation at the termination date (1974) in western France, Andalucía (but not in Spain’s Basque Country!), and northeastern Scotland, as well as a complete absence of the language family from Gotland in the Baltic along with the previously mentioned Mediterranean islands. The map seems to show that Indo-European languages have never quite yet reached the Atlantic, although of course the authors would likely counter that the map does not actually depict what it claims to depict. Or compare the model’s portrayal of non-I-E-speaking areas in Fennoscandia with that of an actual language map of the region, as can be seen to the left. The fit is poor.

The Fennoscandia map detail also presents evidence that contemporary geopolitical boundaries anachronistically mold the hypothesized language-family distribution in the Science model. As can be seen on the actual language map, linguistic and political boundaries do not correspond particularly well in this area; Estonia and Finland may be non-I-E-speaking countries, but not over their entire expanses. On Atkinson’s map, however, I-E coloring abruptly and transhistorically ends exactly at the modern Estonian border, a most suspicious situation. The general lack of I-E shading for Moldova also makes me wary—and is completely bizarre. A clear example of contemporary geopolitical contamination is found in the portrayal of Central Asia. Note the salient of solid I-E coloration extending northward into Tajikistan’s portion of the Fergana Valley, avoiding the core of the valley held by Uzbekistan. Such a portrayal would be understandable if the map depicted merely present-day conditions, as Tajikistan is mostly I-E-speaking whereas Uzbekistan is not. But the sorting of “Sarts” into Uzbeks and Tajiks, along with the forced “Uzbekization” of many previously Persian speakers, in this historically heavily bilingual area is largely the product of Soviet geo-ethnic machinations. If one delves back to the first millennium CE and earlier, the entire region was heavily I-E-speaking (Sogdian and other Iranian languages).


As one dials back the animated map to earlier periods, the mire only deepens. As it would be too tedious to recount all of the map’s many miscues, I will focus on a few particular time slices.

Consider, for example, the depiction of western Europe circa 1000 CE. At this time, western France, Sicily, and the entire Iberian Peninsula are shown as non-I-E-speaking, although a line of I-E linguistic relationship has been etched across southern France roughly to the Spanish border at the crest of the Pyrenees. The false implications conveyed here—which are fully admitted as erroneous by the authors—are that Roman Hispania and Aquitania were never Latinized, and that the preexisting Celtiberian and Gaulish tongues were not I-E. The same 1000 CE map frame also incorrectly excludes from the I-E realm the South Asian areas that now constitute southern Gujarat, southern Balochistan, most of Maharashtra, and southern Sri Lanka. Note as well that most Norse areas are not given an I-E shading, nor is northern Scotland. Yet at the same time, southern Tibet is placed within the I-E zone! Even the essentially uninhabited and uninhabitable region of Aksai Chin is depicted as Indo-European-speaking at this time; I can’t help but imagine proto-Dardic speaking yetis.

Turn back to the portrayal of the year 18 BCE, and the errors compound. The most conspicuous I-E omission here is the Scythian/Sarmatian realm, which by itself is enough to discredit the model; it almost seems as if the authors intentionally manipulated their data to exclude the linguistically hypothesized steppe homeland of the I-E family. The northeastern salient of I-E languages depicted for the time, which denotes the Tocharian languages, oddly excludes a significant portion of the Tocharian homeland in the Tarim Basin to focus instead on the lofty Tien Shan Mountains. Tellingly, the diffusional front hypothesized here has the ancestors of the Tocharians advancing along ridges well in excess of 20,000 feet in elevation.

Several nice examples of demonstrably false information are found on the depiction of the Mediterranean Basin circa 700 BCE. Here we see the greater Aegean along with the Italian Peninsula clearly colored as I-E, but with little else falling in the same category; Sicily, most of Sardinia, and most of the littoral zone of southern France and eastern Iberia are excluded. Yet we have incontrovertible knowledge that Greek-speaking colonies had been firmly planted in western Sicily, Cyrenaica in North Africa, and over a large expanse of the northwestern Mediterranean coastlands. The spread of the Greek language to Crete, moreover, occurred much earlier, as attested by the Bronze Age Linear B script. The model fails here in part because it does not count the “rapid” spread of individual languages; Greek colonization, however, took place over hundreds of years, and some of the dialects of ancient Greek were differentiated enough to be classifiable as separate languages.

While the 700 BCE map frame unduly restricts the spread of I-E over much of the Mediterranean, it also improperly extends it in other parts of the basin. Several relatively well-known non-I-E languages persisted in the map’s “green zone” well beyond 700 BCE. On the island of Lemnos, the non-I-E Lemnian language vanished only with the Athenian conquest in the fifth century BCE, while Etruscan and Raetic survived into the first millennium CE. Together, Lemnian, Etruscan, and Raetic seem to have constituted the extinct Tyrsenian language family, which might have included Minoan (Eteocretan) and Eteocypriot as well. The scattered distribution of this family in antiquity probably signals that Tyrsenian languages had blanketed a much broader area before the incursion of I-E speakers. In the Science model, however, the entire Aegean region is mapped as I-E speaking as early as 6500 BCE. Are we to imagine a post-I-E migration of Tyrsenian speakers into the Aegean from Etruscan- or Raetic-speaking areas further to the west? Yet historians who have viewed the Tyrsenian Etruscans as non-indigenous have instead tended to locate their homeland in Anatolia, the hearth of I-E in the Science model! Today, however, a near consensus has emerged that the Tyrsenian languages represent a pre-I-E substrate that likely extended across much of the northeastern Mediterranean in the fifth millennium BCE, if not significantly later as well.

Finally, consider the depiction of supposedly I-E-speaking “greater Anatolia”—including what is now Syria and northern Iraq as well as the Caucasus—in the Bronze Age, circa 1500 BCE. Yet we have unassailable historical evidence of widely spread non-I-E languages over much of the region at this time, including Hurrian, Hattic, and, for a somewhat later period, Urartian. Much evidence suggests, moreover, that the three (or perhaps four) extant Caucasian language families covered much broader swaths of land in ancient times than they do today; modern Azerbaijan, for example, was a largely NE-Caucasian-speaking area, as attested by both historical sources and the extant language of Udi. For the Science model to make sense, later migrations of several different non-I-E groups would have had to have pushed through long-inhabited I-E lowlands to settle in inhospitable areas of mountainous terrain. Such a scenario, to say the least, strains credulity.

* Let us consider here the various elements of the authors’ “caveat”:

1. “we can only represent the geographic extent corresponding to language divergence events.” Do languages really diverge in discrete events? Does not language divergence happen continually? Whenever one segment of a language community adopts a new word, a new sound, or a new grammatical feature, some degree of divergence has occurred. It is always an open question as to when diverging dialects become separate languages; in the modern world, the issue is more political than linguistic (cf. Serbo-Croatian, Serbian, Croatian, Bosnian, and Montenegrin).

2. “only between those languages that are in our sample.” That is interesting, seeing as Atkinson claims in an interview (to be cited later) that “all” I-E languages were included (an impossibility, as there are no hard and fast divisions between languages and dialects). But more to the point, if one can simply exclude languages at will from the sample, then one can then mold the results. Drop a few more languages, and the maps will differ. In such a manner, one can get the results that one wants.

3. “nodes associated with branches not represented in our sample will not be reflected in this figure.” Yes indeed, which is one reason why the figures are so spectacularly wrong.

4. “the lack of Continental Celtic variants in our sample means we miss the Celtic incursion into Iberia and instead infer a later arrival into the Iberian Peninsula…” I am glad that the authors begin to acknowledge their own errors here, but they still do not go far enough; they do make an inference, and that inference is simply incorrect. They also miss not just Celtiberian and Latin, but also Mozarabic, Ladino, and several other I-E languages of the Iberian Peninsula (the map frame for 1000 CE still shows only partial I-E coverage).

5. “associated with the break-up of the Romance languages.” The model assumes that Latin began to “break-up” with the fall of the Western Roman Empire. That is incorrect, as divergence began much earlier. The “vulgar” Latin of the distant provinces was not the language of Cicero.

6. “not the initial rapid expansion of Latin.” Latin did indeed expand rapidly as a language of administration, but not necessarily as a language of everyday use. Basque remained in use throughout, although the maps produced by the study indicate otherwise.

7. “The timing represented here therefore offers a minimum age for expansion into a given area.” This proviso is particularly rich, as it alone undermines the approach. In other words, I-E languages could have been found in any part of the study area at much earlier times than indicated? If so, how can one pinpoint Anatolia as the place of origin? If one claims to “find” a location of origin, then one is automatically making an argument for “maximum ages” in areas that fall outside that supposed birthplace.

Mismodeling Indo-European Origin and Expansion: Bouckaert, Atkinson, Wade and the Assault on Historical Linguistics

Dear Readers,

As GeoCurrents passed through its August slowdown, plans were made for a series on the Summer Olympics. Thanks to the efforts of Chris Kremer, we have gathered statistics—and made maps—relating Olympic medal count by country to population and GDP, both overall and in regard to specific categories of competition. The series, however, has been put on hold by the recent publication of two heralded articles on the history and geography of the Indo-European language family. On August 24, a short piece in Science—“Mapping the Origins and Expansion of the Indo-European Language Family”—made extravagant claims, purporting to overturn the most influential historical-linguistic account of the world’s most widespread language family. On the same day, Nicholas Wade, noted New York Times science reporter, wrote a half-page spread in the news section of the Times on the Science report, entitled “Family Tree of Languages Has Roots in Anatolia, Biologists Say.” Over the next few days, the story was picked up—and often twisted in the process—by assorted journalists. Within a few days, headlines appeared as preposterous as “English Language Originated in Turkey.”

As Wade’s title indicates, the Science article, written by Remco Bouckaert and eight others (most notably Quentin D. Atkinson), seeks to overturn the thesis that the Indo-European (I-E) family originated north of the Black and Caspian seas. It instead locates the I-E heartland in what is now Turkey, supporting the “Anatolian” thesis advanced a generation ago by archeologist Colin Renfrew. The Science team bases its claims on mathematical grounds, using techniques derived from evolutionary biology and epidemiology to draw linguistic family trees and model the geographical spread of language groups. According to Wade, the authors claim that their study does nothing less than “solve” a “long-standing problem in archaeology: the origin of the Indo-European family of languages.” (Strictly speaking, however, the problem is not an archaeological one, as excavations by themselves tell us nothing about the languages of non-literate peoples; it is rather a linguistic problem with major bearing on prehistory more generally.)

As GeoCurrents is deeply interested in the intersection of language, geography, and history, the two articles immediately grabbed our attention. Our initial response was one of profound skepticism, as it hardly seemed likely that a single mathematical study could “solve” one of the most carefully examined conundrums of the distant human past. Recent work in both linguistics and archeology, moreover, has tended against the Anatolian hypothesis, placing Indo-European origins in the steppe and parkland zone of what is now Ukraine, southwest Russia, and environs. The massive literature on the subject was exhaustively weighed as recently as 2007 by David W. Anthony in his magisterial study, The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Could such a brief article as that of Bouckaert et al. really overturn Anthony’s profound syntheses so easily?

The more we examined the articles in question, the more our reservations deepened. In the Science piece, the painstaking work of generations of historical linguists who have rigorously examined Indo-European origins and expansion is shrugged off as if it were of no account, even though the study itself rests entirely on the taken-for-granted work of linguists in establishing relations among languages based on words of common descent (cognates). In Wade’s New York Times article, contending accounts and lines of evidence are mentioned, but in a casual and slipshod manner. More problematic are the graphics offered by Bouckaert and company. The linguistic family trees generated by their model are clearly wrong, as we shall see in forthcoming posts. And on the website that accompanies the article, an animated map (“movie,” according to its creators) of Indo-European expansion is so error-riddled as to be amusing, and the conventional map on the same site is almost as bad. Mathematically intricate though it may be, the model employed by the authors nonetheless churns out demonstrably false information.

Failing the most basic tests of verification, the Bouckaert article typifies the kind of undue reductionism that sometimes gives scientific excursions into human history and behavior a bad name, based on the belief that a few key concepts linked to clever techniques can allow one to side-step complexity, promising mathematically elegant short-cuts to knowledge. While purporting to offer a truly scientific* approach, Bouckaert et al. actually forward an example of scientism, or the inappropriate and overweening application of specific scientific techniques to problems that lie beyond their own purview.

The Science article lays its stake to scientific standing in a straightforward but unconvincing manner. The authors claim that as two theories of Indo-European (I-E) origin vie for acceptance, a geo-mathematical analysis based on established linguistic and historical data can show which one is correct. Actually, many theories of I-E origin have been proposed over the years, most of which—including the Anatolian hypothesis—have been rejected by most specialists on empirical grounds. Establishing the firm numerical base necessary for an all-encompassing mathematical analysis of splitting and spreading languages is, moreover, all but impossible. The list of basic cognates found among Indo-European languages is not settled, nor is the actual enumeration of separate I-E languages, and the timing of the branching of the linguistic tree remains controversial as well. As a result of such uncertainties, errors can easily accumulate and compound, undermining the approach.

The scientific failings of the Bouckaert et al. article, however, go much deeper than that of mere data uncertainty. The study rests on unexamined postulates about language spread, assuming that the process works through simple spatial diffusion in much the same way as a virus spreads from organism to organism. Such a hypothesis is intriguing, but must be regarded as a proposition rather than a given, as it does not rest on a foundation of evidence. The scientific method calls for all such assumptions to be put to the test. One can easily do so in this instance. One could, for example, mathematically model the hypothesized diffusion of Indo-European languages for historical periods in which we have firm linguistic-geographical information to see if the predicted patterns conform to those of the real world. If they do not, one could only conclude that the approach fails. Such failure could stem either from the fact that the data used are too incomplete and compromised to be of value (garbage in/garbage out), or from a more general collapse of the diffusional model. Either possibility would invalidate the Science article.
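The kind of test described above can be sketched in a few lines. What follows is only an illustration of the logic, not the authors’ method and not a reconstruction of their model: it assumes that a model’s output for a well-documented date has already been reduced to a yes/no judgment per region, and it compares those judgments with the attested record. The function name `validate_against_record` and the region labels are hypothetical, and the toy values are invented purely to show the mechanics.

```python
from typing import Dict, List, Tuple


def validate_against_record(
    predicted: Dict[str, bool],
    attested: Dict[str, bool],
) -> Tuple[float, List[str]]:
    """Compare predicted and attested I-E presence, region by region.

    Returns the share of shared regions on which the two agree,
    plus a sorted list of the regions on which they disagree.
    """
    shared = sorted(set(predicted) & set(attested))
    if not shared:
        raise ValueError("no regions in common to compare")
    mismatches = [r for r in shared if predicted[r] != attested[r]]
    agreement = 1 - len(mismatches) / len(shared)
    return agreement, mismatches


# Toy inputs (hypothetical, for illustration only): True means the
# region is marked or known as Indo-European speaking at the chosen date.
predicted = {"region_a": True, "region_b": False, "region_c": True, "region_d": False}
attested = {"region_a": True, "region_b": True, "region_c": False, "region_d": False}

agreement, mismatches = validate_against_record(predicted, attested)
print(f"agreement: {agreement:.0%}; mismatched regions: {mismatches}")
```

A real check would of course need carefully sourced regional data for each period, but even so crude a tally makes the point: a model that cannot reproduce well-documented periods gives no grounds for trusting its account of undocumented ones.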

Such a study, it turns out, has been conducted—and by none other than Bouckaert et al. in the Science article in question. Their model not only looks back 8,500 years into the past, when the locations and relations of language families are only conjectured, but also comes up to the near present (1974), when such matters are well known. Here a single glance at their maps reveals the failure of their entire project, as they depict eastern Ukraine and almost all of Russia as never having been occupied by Indo-European speakers. Are we to believe that Russian and Ukrainian are not I-E languages? Or perhaps that Russians and Ukrainian speakers do not actually live in Russia and Ukraine? By the same token, are we to conclude that the Scythian languages of antiquity were not I-E? Or perhaps that the Scythians did not actually live in Scythia? And these are by no means the only instances of the study invalidating itself, as we shall soon demonstrate. An honest scientific report would have admitted as much, yet that of Bouckaert et al. instead trumpets its own success. How could that possibly be?

One can only speculate as to why the authors proved incapable of noting the failure of their model to mirror reality. Did they neglect to look at their own maps, trusting that the underlying equations were so powerful that they would automatically deliver? Could their faith in their model trump their concern for empirical evidence? Or could it be that their knowledge of linguistic geography is so scanty that they do not grasp the distribution of the Russian language, much less that of Scythian? If so, they are not operating at an acceptable undergraduate level of geo-historical knowledge. Alternatively, the authors might be aware that their model generates nonsense, but prefer to pretend otherwise, hoping to buffalo the broader scholarly community. They seem, after all, to conceal their approach as much as possible, couching their “findings” in jargon-ridden prose that proves a challenge not just for lay readers but also for specialists in neighboring subfields. (Translations of such passages as “Contours on the map represent the 95% highest posterior density distribution for the range of Indo-European” will be forthcoming.)

Regardless of whether the authors are intentionally trying to mislead the public or have simply succeeded in fooling themselves, their work approaches scientific malpractice. Science ultimately demands empirical verification, and here the project fails miserably. If generating scads of false information does not falsify the model, what possibly could? Non-falsifiable claims are, of course, non-scientific claims. The end result is a grotesquely rationalistic and hence ultimately irrational approach to the human past. As such, examining the claims made by the Science team becomes an example of what my colleagues Robert Proctor and Londa Schiebinger have aptly deemed “agnotology,” or “the study of culturally induced ignorance or doubt, particularly the publication of inaccurate or misleading scientific data.”

As the critique we offer is harsh and encompassing, GeoCurrents will devote a number of posts to examining in detail the claims made and techniques employed by Bouckaert, Atkinson, and their colleagues. But before delving into the nitty-gritty, a few words are in order about what ultimately lies at stake. We are exercised about the Science article not merely because of our passion for the seemingly esoteric issue of Indo-European origins, but also because we fear for the future of historical linguistics—and history more generally. The Bouckaert study, coupled with the mass-media celebration of the misinformation that it presents, constitutes an assault on a field that has generated an extraordinary body of rigorously derived information about the human past. Such an attack occurs at an unfortunate moment, as historical linguistics is already in crisis. Linguistics departments have been cutting positions in historical inquiry for some time, creating an environment in which even the best young scholars in the field are often unable to obtain academic positions.

The devaluation of historical linguistics is merely one aspect of a much larger shift away from the study of the past. Subdisciplines such as historical geography and historical sociology have been diminishing for decades, and even the discipline of history faces declining enrollments and reduced faculty slots. Academic history itself, moreover, has been progressively shying away from the deeper reaches of the human past to focus on modern if not recent historical processes. Such developments do not bode well for the maintenance of an educated public. At the risk of descending into hyperbole, we do worry about the emergence of something approaching institutionally produced societal dementia. The past matters, and we care deeply for the preservation of its study.

*Make no mistake: we at GeoCurrents are strong supporters of the scientific method. Linguistics is itself a logically constituted, rigorous endeavor that counts as a science in the larger sense of the word, and I have myself co-edited a work defending science and reason against eco-radical and other far-left attacks (The Flight from Science and Reason, edited by Paul R. Gross, Norman Levitt, and Martin W. Lewis. 1997. New York Academy of Sciences).