103 Errors in Mapping Indo-European Languages in Bouckaert et al., Part II: from Afghanistan to Anatolia

(Continued) Moving westward, the linguistic mapping of Iran and environs by Bouckaert et al. contains roughly the same density of error as that of South Asia. As most of these mistakes are noted in map call-outs, and others have been discussed in previous posts, I will focus here on the authors’ misperceptions about the Persian language.

The authors have divided Persian into two languages, labeled “Persian List” and “Tadzik” (a non-standard spelling of “Tajik”). Linguists, however, generally agree that Persian is a single language, albeit one with ten or so dialects, three of which serve as standard literary forms. These three official varieties are Western Persian (or Farsi), found primarily in Iran; Eastern Persian (or Dari), spoken mostly in central and northern Afghanistan; and Tajik Persian (or Tajiki), located in Tajikistan and Uzbekistan. One would have to take an extreme “splitting” position to regard Farsi and Tajik Persian as separate languages. As Wikipedia notes, “Persian-speaking peoples of Iran, Afghanistan, and Tajikistan can understand one another with a relatively high degree of mutual intelligibility, give or take minor differences in vocabulary, pronunciation, and grammar—much in the same relationship as shared between British and American English.” (It is also significant that the Tajiks have historically called their tongue Zabani Farsī.) And if separating Farsi and Tajiki is problematic, ignoring Dari Persian, spoken by 15 to 18 million people, is absurd. Doing so sunders the geographically contiguous Persian zone into two widely separated language zones.*

The most glaring blunder on the map of Anatolia and environs concerns the delineation of Kurdish. Here the main problem is the opposite of the one encountered in regard to Persian: several clearly separate languages are lumped together. By strictly linguistic criteria, Kurdish is a subfamily of related tongues. As Wikipedia puts it, “Kurdish is not a unified standard language but a discursive construct of languages spoken by ethnic Kurds, referring to a group of speech varieties that are not necessarily mutually intelligible …” Kurdish proper is itself divided into two (or three) languages: Kurmanji, Sorani, and, sometimes, Kermanshahi. Philip G. Kreyenbroek, cited in the Wikipedia article referred to above, claims that, “From a linguistic or at least a grammatical point of view … Kurmanji and Sorani differ as much from each other as English and German.” The idea of a single Kurdish language is once again a political construct, albeit one based not on an actual political unit, but rather on the aspirations of most Kurdish people for a state rooted in trans-linguistic ethnic solidarity.

Bouckaert et al. not only elide the distinction between these two Kurdish languages but also subsume another language into the same category. The language in question is Zazaki (1.5-2.5 million speakers), located in the central part of eastern Turkey. The Zaza people are usually considered by others, and often by themselves, as members of the wider Kurdish ethnic formation, but their language is quite distinctive. It is most closely related to Gorani, spoken in Iran to the south of the Kurdish zone, but it also bears affinity with Talysh, another Iranian language ignored by Bouckaert et al.

The Kurdish languages are not only misclassified but also inaccurately mapped. The Kurdish polygon of Bouckaert et al. is truly peculiar, as it excludes the southern part of the Kurdish region (most of the Sorani-speaking zone) while including a western extension into mostly non-Kurdish-speaking areas. Its longer eastern “panhandle” pushes far enough to take in the Kurdish areas in northeastern Iran, but in the process includes non-Kurdish areas along the Caspian Sea and in the Alborz (Elburz) Mountains. Such a fanciful depiction brings to mind the infamous “Gerry-Mander” of U.S. political history. If oriented conventionally, with north at the top, Bouckaert’s gerrymandered Kurdistan reminds me of a lounging rodent; if tilted on its side, it looks more like a galloping dinosaur.

I have also posted an excellent map of the ancient Anatolian languages, which makes a nice contrast to the simplistic depiction of these tongues in Bouckaert et al.

*As a final note on this map, western Afghanistan, a mixed Dari- and Pashto-speaking area, seems to contain an unlabeled polygon for a modern Indo-European language, which I have marked with a question mark.

103 Errors in Mapping Indo-European Languages in Bouckaert et al., Part I

As our criticisms of Bouckaert et al. have been extremely harsh, we must justify them in some detail. I have accused the authors of erring “at every turn,” a charge that reeks of hyperbole. But even if that claim is exaggerated, it is still not too far from the mark. To demonstrate the extraordinary density of error in the Science article, the next few posts will dissect the authors’ base map of Indo-European languages (Figure S6 in their Supplementary Materials). This map, depicting the distribution of both modern and ancient Indo-European languages, forms a key input for their “explicit geographic model of language expansion” (Bouckaert et al., p. 957), as the locations of the sampled languages shown on this map are fed into the model in order to calculate the location of the PIE homeland. Many of the errors and inconsistencies found on their other maps stem from mistakes made in this initial figure.

The map in question shows the location of the 103 Indo-European languages analyzed. The brief caption notes that “colored polygons represent the geographic area assigned to each language based on Ethnologue.” This assertion is misleading at best. The Ethnologue does not consistently map modern languages, and it pays little attention to long-extinct ones such as Hittite. And where the Ethnologue does map, it typically does so in vastly greater detail than Bouckaert et al. Compare, for example, how the two sources depict the languages of what is now southern and central Pakistan in the paired figures to the left.

Regardless of the source (or sources) used, the map is highly inaccurate. To illustrate the cavalcade of error found in Bouckaert et al., I have isolated 103 miscues, some admittedly rather minor, but others highly significant. As recounting all of them would be tedious, I will simply note them in call-outs on expanded details from their “master map.” I have prepared twelve such enlarged maps, each focusing on a different part of the historically Indo-European-speaking world. I will post these maps sequentially over the next few days, discussing in the accompanying posts some of their more egregious errors. Today’s post will conclude with a consideration of South Asia; subsequent ones will move in a westward direction, terminating in the British Isles.

Before examining the portrayal of the Indian Subcontinent in Bouckaert et al., a few words are in order about their general approach to mapping. Analyzing their base map is no easy matter, as they do not follow conventional cartographic procedures. Their all-important polygons are often impossible to trace, obscured by the large, numbered circles used to label the 103 languages. Another perceptual problem stems from their use of overlays, with multiple extinct languages (in red) layered upon extant languages (in blue). The resulting color blends yield confusing intermediate shades. Note the depictions of Luvian, Hittite, Classical Armenian, Kurdish, and modern Armenian on the detail posted to the left. Determining which language is indicated in which places takes some patience.

A more intractable problem concerns the map’s temporal framing. The short explanation provided in the caption makes the issue seem simple: “Red areas indicate ancient languages and blue areas indicate modern languages.” Left unanswered is the time frame of “linguistic modernity.” In some places, the term is defined broadly, extending back hundreds of years. Cornwall, for example, is shown as inhabited by speakers of modern Cornish. Such a view is anachronistic, as Cornish had disappeared from most of the peninsula by 1700, and was essentially extinct before the modern revival movement began in the 20th century. (Today Cornish is estimated to have only “a few” native speakers.) Elsewhere, the mapping of “modern languages” refers to the late 20th century. The German zone, for example, fits only the post-WWII period, after millions of German speakers had been expelled from Pomerania, Silesia, and Sudetenland. The map, to put it simply, plays fast and loose with time and space.

Even more problematic is the mapping of many languages on the basis of political rather than linguistic features. As was noted in an earlier post, all of the maps used in the study show signs of what I called “geopolitical contamination,” in which the boundaries of modern-day states incorrectly determine those of language groups, following Max Weinreich’s dictum that “a language is a dialect with an army and navy.” I was puzzled, for example, by the fact that Moldova was placed outside of the Indo-European realm in Figure S4, showcased on Quentin Atkinson’s website. The reason is readily apparent when one considers the map of the 103 language polygons (Figure S6). Here Romanian is depicted as almost exactly coincident with Romania. Moldova is fully excluded from this realm, even though the official “Moldovan Language” is differentiated from Romanian solely on political grounds. One can indeed identify a Moldovan subdialect of Romanian, but it spans the Romanian-Moldovan border. Moldova should thus have been placed within the Romanian polygon, yet it is instead depicted in the same manner as Hungary, giving the impression that it lies outside the Indo-European realm. The consequences of such a strategy are troubling for the contemporary world, but become positively pernicious when retroactively extended into the past, which is precisely what the Bouckaert model does. As a result, almost all of Moldova is ludicrously mapped as most likely never having been occupied by Indo-European speakers in Figure S4.


Such geopolitical contamination is clearly evident in the depiction of the languages of South Asia, posted here. Note that Bengali, often regarded as the world’s sixth most widely spoken language, is essentially limited to Bangladesh, its 80+ million speakers in the Indian state of West Bengal written out of the linguistic community. Even more unreasonably, Vedic Sanskrit is given the polygon of a modern political unit. The supposed territory of this ancient language is outlined and shaded in red in the map posted here. This area, it turns out, precisely fits the territorial extent of Punjab before it was partitioned by the British. That colonial-era Punjab would have no bearing on the distribution of Vedic Sanskrit, spoken some 3,000 years ago, should go without saying. It is also worth noting that the former Punjab included what is now the Indian Himalayan state of Himachal Pradesh, which features peaks 22,000 feet above sea level. It is safe to assume that such areas were never part of the Vedic Sanskrit realm.

Mapping Vedic Sanskrit is no easy task, but that is no excuse for using a modern geopolitical proxy. Careful studies show that the world of the Rig Veda was largely limited to what are now the Indian state and Pakistani province of Punjab, along with the Vale of Peshawar and the Swat Valley. “Vedic India” in the larger sense extended from this region down the Ganges Valley through Bihar and southward to encompass Gujarat, as can be seen in the second map posted here. Either of these two areas could easily have been used for the Vedic Sanskrit polygon.

I will not comment further on the remaining errors and infelicities in the Bouckaert et al. portrayal of South Asia, as a number of them are noted on the map itself. I have also posted a fine Wikipedia map of the current distribution of the Indo-European languages of South Asia for comparative purposes. (Note that this Wikipedia map lumps a number of disparate dialects into single languages, such as Bihari.)

As we shall see in forthcoming posts, similar errors litter all other portions of the original language map employed by Bouckaert et al. As a result, it is difficult to avoid the conclusion that the authors simply do not have the level of geo-linguistic comprehension necessary for carrying out their task. I have taught the geography of modern languages at leading universities for twenty-five years, and I can peg the level of understanding demonstrated by students fairly accurately. That of Bouckaert et al. would clearly fall into the “B” range. Given the unfortunate realities of grade inflation, that means that more than half of my undergraduate students finish their terms with a better understanding of the distribution of languages than the authors of a supposedly path-breaking article on the origin and spread of the world’s largest language family published in one of the world’s leading scientific journals.

The Hazards of Formal Geographical Modeling in Bouckaert et al.—and Elsewhere

The linguistic and historical failings of the Bouckaert et al. Science article have been examined in previous posts and will be revisited in subsequent ones. The model’s cartographic miscues have also been dissected. The present post takes on the more abstract geographical issues associated with the authors’ approach.

The Bouckaert et al. article is overtly geographical. “Mapping” is the first word in its title, and the second sentence focuses on “explicit geographical models.” But the geographical model employed is so stripped of substance as to become almost anti-geographical. No allowances are made for actual geographical features other than the basic differentiation of “land” and “water” (with the latter term apparently meaning “seas and oceans”). In one of several sub-models employed, the authors assume that “movement into water is less likely than movement into land by a factor of 100.” But the ease of movement over water depends on the technology at hand and the cultural proclivities of the people in question. Would one ever make such an assumption when modeling the spread of the Austronesian language family, which depended on the double-outrigger canoe? The language map of the Philippines posted here clearly shows that in this case it is water that links linguistic communities and land, specifically the mountainous interiors of the main islands, that separates them. It is also notable that those who model pre-modern transportation networks generally assume that movement over water is vastly more efficient than movement over land.*
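To make the land/water assumption concrete, here is a minimal Python sketch of what a single fixed “factor of 100” penalty does to one step of a random walk on a grid. It is a toy illustration under stated assumptions, not the Bouckaert et al. implementation, which is considerably more elaborate: the coastal cell, its four neighbors, the function name `step_probabilities`, and the `water_penalty` parameter are all hypothetical.

```python
# Toy illustration only, not the published model: from one grid cell, the chance
# of stepping into each neighboring cell is proportional to a weight, and water
# cells are down-weighted by a single fixed factor (100 in one Bouckaert et al.
# sub-model). Terrain is reduced to just two categories, "land" and "water".

def step_probabilities(neighbors, water_penalty=100.0):
    """Return a step probability for each neighbor, given its terrain ('land' or 'water')."""
    weights = {
        cell: 1.0 if terrain == "land" else 1.0 / water_penalty
        for cell, terrain in neighbors.items()
    }
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


# Hypothetical coastal cell: three land neighbors and one water neighbor.
coast = {"north": "land", "east": "land", "south": "water", "west": "land"}
probs = step_probabilities(coast)
for cell, p in probs.items():
    print(f"{cell}: {p:.4f}")
# north: 0.3322
# east: 0.3322
# south: 0.0033
# west: 0.3322
```

Because the penalty is one constant, nothing in the sketch responds to seafaring technology or to terrain. Calling `step_probabilities(coast, water_penalty=0.1)` would instead make the water cell by far the likeliest step, roughly the situation the Philippine map suggests for Austronesian speakers, while mountains, rivers, and soils never enter the calculation at all.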

In the Bouckaert et al. model, geography is essentially reduced to geometry, which in turn becomes merely a matter of distances and directions. Mountains, passes, rivers, badlands, dense forests, and so on account for nothing. Such a stripped-down view of geography is convenient for mathematical modeling, but only at the expense of truth. We know from numerous historical studies that the movement of peoples (which is not necessarily the same as the movement of languages) is often guided by variation in the physical landscape. Agricultural settlers typically sought out appropriate soils, such as loess in the case of Neolithic farmers venturing into central Europe; heavy clay soils were avoided for millennia. Pastoralists, by the same token, sought out good pastures; it is no accident that the equestrian Magyars, like the Huns and Eurasian Avars before them, settled on the grassy Alföld of the Danubian Basin. Agricultural settlers, like the supposed carriers of Indo-European languages in the Bouckaert model, do not simply “diffuse” over a landscape like pathogens jumping from host to host. The process is rather more intentional, and much more molded by the variegated features of actual physical landscapes.

Bouckaert and company’s modeling is by no means the first attempt to flatten geography into geometry. I am particularly concerned about this maneuver because an earlier attempt to do the same thing within geography greatly weakened the discipline. I am often asked why geography is such a weak field in the United States, absent from most leading universities. The issue is complicated, but a key event was geography’s own “quantitative revolution” in the early 1960s, an intellectually aggressive refashioning of the discipline into a positivistic, statistics-dominated “science” centered on the discovery of supposedly invariant spatial laws. To allow the statistical methods that the young revolutionaries favored, geography had to be reduced to distance and direction. Most of their studies began by assuming that the landscape being investigated (or merely hypothesized) was an “isotropic plain,” completely uniform and featureless in all directions. Such an assumption rules out everything that differentiates actual landscapes. The main results were mathematically elegant but empirically questionable and often worthless conceptual structures.

A prime example of geography losing its way was Central Place Theory, initially developed in Germany in the 1930s and then celebrated by Anglo-American geographers as a conceptual breakthrough in the 1960s. Central Place Theory postulates that the distribution of cities and towns of various sizes follows regular hexagonal patterns generated automatically by retail marketing behavior. The theory is almost entirely deductive, beginning with a set of assumptions and then working out their logical consequences. The assumptions**, however, do not hold, and as a result the theory did not work as promised. It is true that in some relatively flat areas urban patterns approximate the expected form, but in such cases administrative hierarchies generally played a much larger role than retail marketing. In regard to the United States, moreover, geographer James Vance showed in the early 1970s that wholesaling was far more important than retailing in determining the location and relative standing of major cities. Vance was attacked at the time not so much for being incorrect as for challenging the new theoretical underpinnings of a discipline in the desperate thrall of physics envy.

It is difficult to exaggerate the damage done to geography by the quantitative revolution. Suddenly, cultural and historical geography were deemed trivial, widely viewed as examining little more than noise that distracted attention from the underlying spatial laws. Exploring the complex interactions found in any given region now seemed quaint if not pathetic, a mere descriptive exercise deemed insignificant when contrasted with mathematically rigorous and supposedly scientific investigations. For the same reason, world geography—the core of the field, as constituted since antiquity—virtually vanished from the curriculum. Teaching “the world” came to be viewed as the mere cataloging of facts, failing to provide the conceptual purchase necessary for real understanding. Field study in distant lands was for the same reason actively discouraged by many; why go to Ghana and suffer the inconveniences and indignities of travel in a poor country when the same invariant spatial laws could be discovered in Iowa in the comfort of one’s own lab? Armed with such scientific-seeming techniques, geographers could now reach the height of their profession without knowing much of anything about the actual world.

Needless to say, the “laws” discovered by the quantitative revolutionaries of the early 1960s seldom proved very powerful, and the explanations they offered seldom explained much. It is no accident that the doyen of the movement, David Harvey, abandoned the entire effort soon after publishing Explanation in Geography. In the early 1970s, Harvey, “the 18th most-cited intellectual of all time in the humanities and social sciences,” abruptly converted to Marxism, a transition followed by many other geographers at the time. Within a few years, radically leftist social theory had displaced positivism as the “cutting edge” of the discipline. Despite the huge intellectual shift that this entailed, including the general rejection of mathematical methods, the insistence on high theory and the corresponding denigration of empirical study remained firmly entrenched. Throughout this period, important geography departments continued to be shuttered by budget-conscious university administrations.

Admittedly, a number of scholars did attempt to link the abstract models of geography’s quantitative revolution to real landscapes, with mixed results. A key figure here was the preeminent historical anthropologist of China, G. William Skinner (1925-2008). Skinner became enamored of Central Place Theory in the 1960s and used it to “explain” the location of cities and towns in China’s Sichuan Basin. He later turned his attention to larger regions, brilliantly arguing that the structure of Chinese history had to be conceptualized around a handful of “physiographic macro-regions” loosely coincident with drainage basins. Skinner subsequently tried to integrate such regional analysis with Central Place Theory as well as several other abstract spatial schemas into what he called the “Hierarchical Regional Space Model.” He was convinced that this model applied to any preindustrial agrarian society, and he went to heroic efforts to show that it worked as well in France and Japan as in China. In the Skinner model, geographic cores and peripheries of varying scales coincide with drainage basins to form all-encompassing spatial structures. Everything from the average age of marriage to the average wage rate was supposedly predicated on positioning within such highly structured spaces. Unfortunately for Skinner, empirical verification proved elusive, and his project essentially came to naught. All that his three decades of lavishly funded research produced was a few minor articles and a massive, idiosyncratic cartographic archive. As it turns out, human geography is an intrinsically complex affair that is not so easily reduced to clean conceptual structures.


More recently, genuine progress has been made in applying technical analysis to geographical issues. The key has been to use such techniques as tools rather than ends in themselves. Geographical Information Systems (GIS), for example, offer no “explanations” on their own, but rather allow scholars to uncover patterns and to visualize evidence and complexity more effectively, as noted by Andrew Zolnai in a comment on the previous GeoCurrents post.

Quentin Atkinson claims that he would like to refine his own model of Indo-European expansion to encompass actual geographical variation beyond the land/water dichotomy. Doing so would surely be advantageous, but as long as his underlying assumptions fail to withstand scrutiny, the end result will still be untenable. Again, this is not to argue that abstract models are of no use in geographical or historical analysis, but only to insist that they be applied with great care.

* For an impressive model of the transportation networks of the Roman Empire, see Orbis: The Stanford Geospatial Network Model of the Roman World.

** Walter Christaller, who originated Central Place Theory, made the following assumptions, as outlined in the Wikipedia article on the theory:

▪ an unbounded isotropic (all flat), homogeneous, limitless surface (abstract space)
▪ an evenly distributed population
▪ all settlements are equidistant and exist in a triangular lattice pattern
▪ evenly distributed resources
▪ distance decay mechanism
▪ perfect competition and all sellers are economic people maximizing their profits
▪ consumers are of the same income level and same shopping behaviour
▪ all consumers have a similar purchasing power and demand for goods and services
▪ Consumers visit the nearest central places that provide the function which they demand. They minimize the distance to be travelled. No provider of goods or services is able to earn excess profit (each supplier has a monopoly over a hinterland)
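Christaller’s geometric assumptions do most of the theory’s work. The short Python sketch below, a toy under stated assumptions rather than Christaller’s own procedure, places settlements on the equidistant triangular lattice from the list above and checks that each one has exactly six nearest neighbors at the same distance. Dividing territory between neighbors at the halfway point then yields the familiar hexagonal market areas, each with area (√3/2)·d² for settlement spacing d. The spacing value and the function names `triangular_lattice` and `nearest_neighbors` are hypothetical.

```python
import math


def triangular_lattice(spacing, rings):
    """Settlement coordinates on a triangular lattice (equidistant settlements)."""
    points = []
    for i in range(-rings, rings + 1):
        for j in range(-rings, rings + 1):
            # Lattice basis vectors: (spacing, 0) and (spacing/2, spacing*sqrt(3)/2).
            x = spacing * (i + j / 2)
            y = spacing * j * math.sqrt(3) / 2
            points.append((x, y))
    return points


def nearest_neighbors(points, origin=(0.0, 0.0), tol=1e-9):
    """Return the nearest-neighbor distance and all settlements at that distance."""
    others = [p for p in points if math.dist(p, origin) > tol]
    d_min = min(math.dist(p, origin) for p in others)
    ring = [p for p in others if abs(math.dist(p, origin) - d_min) < tol]
    return d_min, ring


spacing = 10.0  # hypothetical distance between neighboring settlements, e.g. in km
points = triangular_lattice(spacing, rings=3)
d, ring = nearest_neighbors(points)
# Each settlement's market area is the hexagon bounded by the perpendicular
# bisectors to its six neighbors; its area equals the lattice cell area.
market_area = math.sqrt(3) / 2 * spacing ** 2

print(len(ring), round(d, 6), round(market_area, 3))
# prints: 6 10.0 86.603
```

The hexagons follow from the lattice assumption alone; no information about actual terrain, actual towns, or actual trade enters the calculation.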