How Large Was the Area in Which Proto-Indo-European Was Spoken?

October 27, 2012  
As the current series on the origin and expansion of the Indo-European languages nears its completion, only a few remaining issues need to be discussed. Today’s post examines once again the mapping by Bouckaert et al. of the area likely occupied by the speakers of Proto-Indo-European (PIE). The focus here, however, is not on the location of this ancestral linguistic homeland, which they situate in southern Anatolia, but rather on the size of the area over which the language was supposedly spoken. The area so depicted on their maps, it turns out, is almost certainly much too large to be credible. By mapping a Neolithic language as covering almost one hundred thousand square kilometers, Bouckaert et al. demonstrate, yet again, a fundamental failure to understand the basic patterns of linguistic geography.   

Bouckaert et al. give a surprisingly precise figure for the area that their model indicates as the probable homeland of Proto-Indo-European: 92,000 km2, roughly equivalent to the extent of Hungary or of the American state of Indiana (see the yellow polygon in the map to the left). But given the characteristically opaque phrasing of the authors, it is not immediately clear if this zone is supposed to represent the actual (likely) spatial extent of the PIE-speaking community, or if it is merely supposed to show the broader area in which a much more spatially restricted language group was located. One can deduce, however, that the former argument is being advanced based on the authors’ framing of the spatial hypotheses supposedly advanced by two different proponents of the steppe theory:

The areas of the hypotheses are approximately 92,000 km2 for the Anatolian hypothesis, 421,000 km2 for the narrow Steppe hypothesis, and 1,760,000 km2 for the wider Steppe hypothesis. So, these areas show a bias toward the Steppe hypothesis; the area covered by the narrow Steppe hypothesis is more than four times larger than that of the Anatolian hypothesis. Likewise, the area covered by the wider Steppe hypothesis is more then (sic) 19 times larger than that of the Anatolian hypothesis.

As can be seen in the map posted here, the area outlined by the “narrow Steppe hypothesis” fits precisely within the area demarcated by the “wider steppe hypothesis.” Such a depiction would not be logical if Bouckaert et al. were proposing that these “areas” were merely the proposed zones in which a more spatially restricted language had been located, as opposed to the probable zone that such a language actually covered. If the latter meaning had been intended, the “narrow Steppe hypothesis” would merely be a more precise version of the “wider Steppe hypothesis” rather than a different “hypothesis” altogether. One can thus conclude that the authors intend the yellow polygon to indicate the area over which Proto-Indo-European had been spoken, as posited by their model with the given parameters of uncertainty.
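The ratios in the quoted passage can be checked with simple division. The following is a minimal sketch that uses only the three area figures quoted above; the dictionary and function names are illustrative and do not come from Bouckaert et al.

```python
# Areas quoted from Bouckaert et al., in square kilometers.
AREAS_KM2 = {
    "Anatolian": 92_000,
    "narrow Steppe": 421_000,
    "wider Steppe": 1_760_000,
}


def ratio_to_anatolian(name: str) -> float:
    """Return how many times larger the named area is than the Anatolian one."""
    return AREAS_KM2[name] / AREAS_KM2["Anatolian"]


if __name__ == "__main__":
    for name in ("narrow Steppe", "wider Steppe"):
        print(f"{name}: {ratio_to_anatolian(name):.2f} times the Anatolian area")
```

Both results agree with the authors’ “more than four times” and “more than 19 times.” The arithmetic is sound; the question raised here is what the areas are supposed to represent.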


In the modern era, and to a significant extent across the past several thousand years, there is nothing unusual in a single language being spoken over a 92,000 square kilometer block of territory. But for such a situation to obtain, expansive spatial connectivity is necessary, which in turn depends on the power of the state or of some other form of social integration. In the world of Neolithic farmers, such regionally integrative institutions were almost certainly lacking, and as a result linguistic communities would have been much more spatially restricted. Such spatial limitations would have been even more pronounced in areas characterized by rough topography and formidable mountain ranges, as such barriers impede communication and thus enhance social and linguistic fragmentation. Yet as can be seen in the map posted here, Bouckaert et al. place the PIE homeland precisely in such a location. A single language spoken by tribal farmers over such a vast expanse of broken topography is all but impossible.

The situation in regard to the homeland identified by the steppe hypothesis would have been different. Under conditions of equestrian-oriented pastoral nomadism, linguistic communities could have occupied much larger territories than those found among agriculturalists living at the same time. The relatively flat topography of the steppe zone, moreover, would have allowed relatively easy communication among scattered groups. Sizable seasonal aggregations, often of a ceremonial nature, are also common under such circumstances, enhancing social solidarity over a broad expanse of land. But even given all of these considerations, the 421,000 km2 and the 1,760,000 km2 figures noted by Bouckaert et al. for the PIE homeland in two versions of the “steppe hypothesis” are still improbable. Geographically aware theorists thus tend to argue only that the original PIE homeland was situated in the western steppe zone, not over its full extent.

We cannot, of course, determine the areal extent of any prehistoric language, as the needed documentary evidence is lacking. It is tempting to associate specific languages with archeologically attested “cultures” that can be mapped, but it must be recalled that language often fails to correspond to groups defined on the basis of shared material culture; consider, for example, the “Pueblo Indians” and the Northwestern cultures of indigenous North America, both of which were highly multilingual, even at the language family level, yet substantially shared the same material cultures. Material culture, after all, is much more dependent on—and serves in part as an adaptation to—the physical environment, whereas languages seldom co-vary with physical geography; there is no way in which a certain word order pattern, or morphological type, or sound system would be more appropriate for any given landscape. All that we can do, therefore, is argue on the basis of contemporary analogues. Here we find that the areas covered by linguistic communities in those parts of the world that maintained “Neolithic” agricultural systems and forms of socio-political organization into modern times were of a restricted spatial scale. The archetypical location here is New Guinea, which is to this day characterized by pronounced linguistic fragmentation, as can be seen in the map posted here. One might object, however, on the basis that New Guinea is an extreme case and as such should not be used for comparative purposes. But in historically stateless areas elsewhere in the world, even where Neolithic technologies were superseded millennia ago, highly restricted linguistic territories remained the rule, as can be appreciated from the language map of central Nigeria posted here.* Maintaining a single language over an area as large as Hungary in such a context is highly unlikely, to say the least.

Similar objections apply to the mapping of the proto-languages of the major IE branches in Bouckaert et al. One must again consider the authors’ intentions in regard to their portrayal of these languages. It is not exactly clear, for example, what they mean by “the inferred location at the root of each subfamily is shown on the map” (see the map caption posted to the left). The “inferred location” of what? Presumably, they mean the inferred location “of the root,” and presumably “the root” refers to the proto-language that later generated each IE branch. It is still not clear, however, whether the colored areas are supposed to indicate the likely locations over which these proto-languages were spoken, or whether they merely show the probable zones in which much more spatially restricted languages were spoken. If the former scenario is indeed the case, the areas depicted are again much too large.

Of the “root languages” mapped on this figure, that of the Indo-Iranian languages is most preposterous. The previous post specified most of the problems associated with this inferred location. The map posted here also shows the extraordinary disconnection between the existing archeological evidence and the spatial hypothesis advanced by Bouckaert et al. I would further note that the area they advance for the origin of the Indo-Iranian languages makes no sense from the standpoint of physical geography. Its western apex is located in the middle of the uninhabitable Dasht-e Kavir (Great Salt Desert), its central portion is situated in the heights of the Hindu Kush, and its eastern extremity lies in the fertile plains of Punjab. It is unthinkable that any sedentary Neolithic population would have occupied such a territory at any given point in time.

*One could, however, argue that New Guinea and central Nigeria are highly linguistically diverse in part as a function of time. Both areas have been inhabited by modern humans for a very long period. Most of Eurasia has been populated by Homo sapiens sapiens for considerably less time than West Africa, and to some extent even New Guinea (the presence of Neanderthals probably impeded the movement of modern humans into western Eurasia for millennia). As a result, one might expect somewhat greater linguistic differentiation in those places as compared to southern Anatolia. But it is also true that the Americas, which had been populated by modern humans for less time than western Eurasia, were also characterized by pronounced linguistic diversity. Significantly, agricultural areas in pre-Columbian North and South America that were not occupied by state-level societies were characterized by spatially restricted language groups.


  • “But it is also true that the Americas, which had been populated by
    modern humans for less time than western Eurasia, were also
    characterized by pronounced linguistic diversity.”

    The recent peopling of the Americas is largely a myth for which there’s little archaeological evidence. The most widespread and one of the earliest Paleoindian cultures, Clovis, has its roots in Texas, with the discernible signature of a migration northward to Alaska and Northeast Asia. In fact, since America harbors 2/3 of world linguistic diversity, it was likely peopled early, at least as early as Papua New Guinea. High linguistic diversity crops up precisely in ancient refugia such as Papua New Guinea and the Caucasus. High linguistic diversity is consistent with the kind of population structure that existed in the Mid- and Upper Pleistocene – small isolated demes. East Africa, rather than Nigeria, is the area of greatest linguistic diversity in Africa, with members of all of the major (or traditional) African language families (Afroasiatic, Nilo-Saharan, Niger-Congo and Khoisan) represented there.

    I develop this argument further in various posts at

    • How do you determine that the Americas contain two-thirds of world linguistic diversity?

      • Good question, Randy! I would think PNG is higher in linguistic diversity… and even so I wouldn’t say that it has 2/3 of the world’s diversity…

        • See above. America exceeds PNG in the number of isolates, which is a base level of diversity calculations. At the same time, PNG has more isolates than Africa or Asia, which again confirms that PNG is just a pocket of New World kind of diversity in the Old World. This in my opinion is what the world might have plausibly looked like linguistically in Mid-to-Late Pleistocene.

          • Counting diversity in language families or isolates is a bit misleading as it assumes that (1) the rates of change are universal, and (2) language families (or isolates, which are effectively one-language families) can be established based on some uncontroversial criteria. Whereas for practically every proposed language family, someone believes that it is either overreaching or underreaching. What if the rate of diversification is higher in the Americas than in the Old World (for whatever reason, such as Diamond’s arguments for example)? Not only would it lead to more languages/families but it would also make identification of larger families/phyla harder (especially since there’s no written records from 2,000-3,000 years ago!). Thus, the 140 families that you talk about (which is not by any means an uncontroversial number) may well be an overestimate that has more to do with our inability to conclusively prove higher-order relations among these families than with them being unrelated.

          • The 140 families is very uncontroversial. Instead, any attempt to reduce this number considerably faces stubborn resistance from American Indian linguists. I do agree with you, Asya, that any simplistic model is problematic, but Out of Africa has also turned out to be based on simplistic grounds. My goal is just to get people thinking about a radical alternative and develop methods to thoroughly test it without assuming that it’s impossible from the get-go. It’s also possible that the degree of language diversification in the Americas is overestimated due to the nature of American Indian languages (polysynthetic, head-marking, etc.), but it could just as easily be underestimated for the same reasons. We don’t know. We don’t even know whether the number of known American Indian language families got reduced as a result of the European colonization since 1492. No other region of the world lost as many languages since 1492 as the New World.

          • I agree that the events of the 16-17th centuries (and later) reduce our understanding of the pre-colonial linguistic situation in the Americas. However, I don’t think that polysynthesis or head-marking are necessarily barriers to correctly identifying family relations. After all, Vajda’s work on Ket and Na-Dene seems rather convincing (to specialists I’ve consulted). Nor do I have a problem with the majority of Native Americanists’ conservative approach to language classification. My point is that the apparent diversity of Native American languages may be an artifact of rapid pace of change plus shallow depth of direct evidence about these languages and their development. If we had 3,000 year old inscriptions in earlier forms of these languages, perhaps we’d be able to connect them into larger families?

          • Johanna Nichols offered an opinion that the head-marking structure of American Indian languages (and all other grammatical features that stem from it) may be conducive to their tendency to lose traces of genetic kinship faster than the languages of the Old World. The very fact that people are looking for such inventive ways to explain American Indian linguistic diversity testifies to the fact that it does pose a challenge. The rapid pace of change can also be used to explain African molecular divergence from the rest of modern humans. And it was invoked in some of the earlier works on mtDNA. The lack of written attestations could make American Indian languages look more differentiated than Indo-European, Sino-Tibetan or Semitic, but how does it make them special in comparison to Sub-Saharan African or PNG languages? Most language families in the world have no ancient attestations.

          • I agree that there may be different rates of change in different languages. Quite possibly, American languages change faster. I can even understand why it would happen for structural or social reasons. Why DNA would change at a different rate in different parts of the world, I am not sure.

          • Why would American Indian languages specifically be selected to change faster than other languages? I think it’s pretty systemic that PNG and America show similar levels of genetic diversity. It reflects a longer history of isolation compared to other regions. There’s pretty good evidence that different mtDNA lineages evolve at a different rate. There’s pretty good evidence that the molecular clock is time-dependent. Theoretically, effective population size or ecology may impact evolutionary rate. Currently, there are at least 3 different molecular clocks (evolutionary, germline and intermediate), which makes the whole issue too controversial to derive any firm conclusions.

          • Agreed on “too controversial to derive any firm conclusions”.

            Regarding higher levels of linguistic diversity in PNG and the Americas, I don’t think they are particularly high. I’d rather think that linguistic diversity is too low elsewhere. See my earlier post here:

      • There are some 140-150 language families and isolates (or single-language families) spoken in the Americas (see Campbell’s American Indian Languages). If you remove all the Papuan languages from an Old World sample (just because they simply replicate in a small geographic pocket area of the Old World the pattern of New World diversity, plus they are a “world of their own” located way outside of the broad geographic region encompassing West Eurasia, East Asia and Southeast Asia presumably involved in the colonization of the New World), you’re left with just 1/3. The point is that one can’t derive 140-150 language families in some 12-15,000 years from a source region that’s not nearly as diverse as the target region.

        • “one can’t derive 140-150 language families in some 12-15,000 years” — depends on one’s assumptions, I suppose. According to Foley’s fairly conservative model (starting with one language, splitting into two every 1,000 years), we’d get nearly 33,000 languages, and I don’t see why they couldn’t form 140 families.

    • Thank you for the link to your blog. However, I am not sure what to make of your comment. Besides the question that Randy has asked about “2/3 of world linguistic diversity”, isn’t your argument turned upside down? There is pretty good evidence, genetic and archeological, that the Americas were peopled relatively late, so the question is why there was so much diversity. Topography, social organization, north-to-south orientation (a la Diamond) may all be answers. And there may be more answers that I haven’t mentioned…

      As for the question of whether East or West Africa is more linguistically diverse, you are right in that (depending on the area selected) East Africa may be more diverse when it comes to families, but on the language level, West Africa seems to take the cake. According to the Ethnologue figures, Nigeria has 514 languages and Cameroon 278, to Tanzania’s 128. A similar trend appears if we correct by the number of people, with Gabon and Equatorial Guinea having the highest number of languages per capita.

      • There’s no positive archaeological evidence that America was peopled relatively late. This is the conclusion one must draw from the last 20 years of archaeological research. It’s a simple fact that Clovis is derived from modern southern U.S., rather than from Alaska or NE Asia. Then, there’s a tradition of interpreting the paucity of pre-Clovis finds as a sign of the absence of humans, but low population size, low density, environment and a certain pattern of adaptation (fewer lithics, more perishables in the toolkit) could generate the same semblance of human absence. Genetics fully confirms that American Indians have maintained low population numbers over a long period of time. All genetic metrics point to that. Again, following archaeology, haploid genetics supported a late entry into the Americas, but this is again only if one interprets low intragroup genetic diversity in the New World as a sign of recency and not as a sign of long-term low effective population size. Had geneticists geared their research to the pattern of linguistic diversity in the New World, instead of trying to dovetail it with archaeological consensus, I’m not sure they would have supported a late entry to the New World. Recent full-genome analyses have already identified an “Amerindian component” in West Eurasia.

        My research in the evolution of kinship terminologies, which is based on arguably the largest database of linguistic data (2500+ languages) and serves as a proxy for the evolution of human social organization, suggests that the relative recency of New World populations is likely a flawed, albeit sticky, idea.

        • Many thanks to all for the fascinating discussion. As soon as we finish the current series, I am going to carefully read German Dziebel’s website! I have no particular stake in the debate on the peopling of the Americas. Even if a relatively recent date holds true, 12,000 years is still enough time to generate substantial linguistic diversity even at the family level, although perhaps not enough time to generate the extremely deep diversity that one does find (linguistic research does seem to be trending toward a proliferation of “Amerindian” language families).

          One problem for an early settlement date (assuming that modern humans did originate in Africa and then spread across Eurasia before reaching the Americas) is the fact that the key archeological sites would be mostly offshore, as the dispersion routes would have been coastal, considering the continental glaciation, and sea levels were much lower during the late Pleistocene. Geographically determined preservation bias does seem to be a major archeological stumbling block here. I suspect that the first inhabitants of the New World lived much like the coastal Fuegians, relying heavily on marine resources. It has been suggested, moreover, that the Fuegians represented a “Melanesian/Australian” strand that arrived in the Americas before the later “Amerindian proper” influx from Siberia, but that seems to be a bit of a stretch. I do wonder what German thinks about this hypothesis. (see

          But all of this is secondary as far as the original post is concerned, which is about the maximum possible geographical extent of a Neolithic linguistic community, which I believe was much less than 100,000 km2.

          • Martin, your link is broken but I think I know what you mean. The two-migration hypothesis advanced on the basis of the pronounced morphological contrast between Paleoindian and Archaic through modern American Indian skulls – the former cluster with modern Australo-Melanesians, the latter with modern East Asians – runs into a couple of problems. First, genetically the earliest American Indian remains belong to the very same mtDNA haplogroups as modern American Indians. Second, similarities with East Asians can be explained as a back-migration from the Americas at the end of the Ice Age. The origin of a Mongoloid morphological complex has never been settled on the basis of Old World data, and it’s noteworthy that the earliest skulls with Mongoloid features turn up in the Americas and not in East Asia.

        • “There’s no positive archaeological evidence that America was peopled relatively late” — Well, there can be no positive evidence of late peopling of the Americas, but there is also no positive evidence of earlier inhabitants either (the earliest human remains, IIRC, are from about 13,000 years ago). And while “absence of evidence isn’t evidence of absence”, it is strongly suggestive of late settlement.

          Also, assuming your “out of America” hypothesis, I see that one can explain the low genetic diversity in the Americas by low population size, but how to explain the high genetic diversity in the Old World, especially in Africa?

          • Three factors here: one is progressive population growth outside of the New World under the pressures of colonization of vast swaths of land; the other is greater intermixing between human populations outside of the New World and PNG; and the third one is the absorption of local hominid populations through admixture. All of these processes would drive intragroup genetic diversity up. America stayed “purely human” as there were no hominids there, and maintained low population size, or at least didn’t grow as fast as the Old World. Denisovans, for example, had very low intragroup diversity, even lower than South American Indians and Papuans, so the pattern of variation in such a Mid-Pleistocene hominid population was more like among modern Amerindians than like among modern Africans. In Africa, Hadza shows low intragroup genetic diversity (similar to Amerindians and Papuans), and it’s also the most divergent language in the Khoisan family.

          • 1) The Americas are pretty vast swaths of land too, so why weren’t there comparable pressures of colonization?

            2) intermixing does not create hierarchical mutations that define haplogroups

            3) IIRC, the percentage of Neanderthal DNA in modern humans is very low… Also, how many different groups of Denisovans have they uncovered? Last I checked it was a finger bone…

          • 1) Amerindian populations expanded (South American Indians are more diverse than Denisovans) but just not as much as Old World populations. 2) Oh yes it does. The root of a tree shifts in the direction of admixed sequences. 3) Not sure I understand what you are trying to say here.

          • 1) agreed, but why? 2) ??? 3) You said yourself that all we have of the Denisovans is a finger bone. In general, it is my understanding that we don’t have much admixture from non-human hominids in modern human DNA…

          • 1) Because there was a speciation event happening in an area not habitually occupied by archaic hominins. America is an ideal place for such an event (although the ultimate source is an Asian hominin species). Colonization pressures were smaller in the areas closest to the speciation event and greater with increasing distance from it; 2) see here, but you need to read the article itself, not just the abstract; 3) we don’t know exactly how much archaic admixture African populations have because we don’t have archaic sequences from Africa. Also, the fact that archaic admixture is counted at 5% genome-wide doesn’t mean it wasn’t substantial. It just decayed over time through drift, selection, etc.

          • 1) I don’t really understand your argument here

            2) I will give it some thought later; thanks for the link

            3) Or perhaps it increased through drift, etc.?

          • I agree that the absence of archaeological finds in the Americas earlier than 13,000 years is interesting and noteworthy but it does fit the pattern whereby the archaeological signatures of modern humans east of the Movius line are generally much more sparse than west of it (in Africa and West Eurasia). It’s well known that in Australia the “modern human toolkit” doesn’t become fully visible until the Holocene. Again, it’s the pattern of social and demographic adaptation (lots of languages, myths, social structures, etc, not too many stone tools), rather than an indicator of presence or absence of humans. We have just grown biased in favor of archaeology as a discipline capable of telling us everything we need to know about the past. But in reality Pleistocene archaeology is no less fuzzy than long-range comparison between languages.

          • I agree that archeology cannot tell us the whole story. But it seems that linguistic and genetic stories correlate with the archeological one… By the genetic story I mean the bottlenecks as the cause of lower diversity in the New World.

            But here’s another question: I can understand why we are not finding any earlier stone tools in the Americas (culturally, they didn’t have them, or didn’t have many of them), but why no human skeletons?! It’s not like myths and social structures vaporized them?!

          • Human skeletons are rare everywhere and it takes time to discover them. Denisovans are known to have existed only because of a pinky bone and a tooth. And they were identified only in the past 5 years. And this is a whole new species! Up until 2005, we didn’t have any remains of chimpanzees. And we still don’t have any gorilla remains. By our current logic, chimpanzees and gorillas must have descended from anatomically modern humans because we have remains of the latter but not the former.

          • Very true. But again, what is lacking is positive evidence of early peopling of the Americas, not of late peopling.

          • We have a sheer lack of data. Archaeologists were so convinced for so many years that Clovis is the single earliest Paleoindian culture that they forgot to have a plan B. With Clovis originating from something like the Buttermilk Complex in Texas and apparently expanding north into Alaska (Mesa) and NE Asia (Uptar) we don’t have much in terms of accepted archaeological sites to illustrate the peopling of the Americas at a late or an early date. Hence, nothing in archaeology disproves out of America I, II or III. What seems to be the case is that the earliest Amerindian adaptations were different from Old World ones (and Monte Verde supports this), hence we can’t apply the same measuring stick to New World Pleistocene as we do Africa and Europe. Lack of lithic debitage doesn’t mean lack of humans.

          • So we agree on the “back to the drawing board” sentiment.

          • yes, we do

          • How does linguistics correlate with genetics? I thought you developed a whole lengthy argument against Atkinson’s belief that phonemic diversity decays out of Africa?

          • Linguistics and genetics don’t correlate very well:
            Most importantly, linguistics can only trace history for a shorter span of time, while genetics can look deeper…
            Moreover, the fact that phonemic diversity, as Atkinson understands it, doesn’t prove out-of-Africa doesn’t mean that it’s not true.

          • Hmm, you wrote: “But it seems that linguistic and genetic stories correlate with the archeological one… By the genetic story I mean the bottlenecks as the cause of lower diversity in the New World.” Atkinson was trying to prove that linguistic evidence is correlated with genetic evidence and both point to out of Africa as the homeland. Genetics and linguistics correlate pretty strongly if one looks out of America. And they don’t correlate if one looks out of Africa. In fact, if one looks out of Africa, genetics and linguistics cancel each other out. Genetic diversity dramatically drops twice in human history (at the exit from Africa and at the entry to the Americas), while linguistic diversity increases every time genetic diversity drops. This doesn’t make sense.

          • That Atkinson fails to prove X doesn’t mean that the opposite of X is true. It just means that he is out of his depth when it comes to even the basic linguistics. Also, I don’t think we’ve agreed on a measure of linguistic diversity, so we seem to argue about different things. And finally, I believe that the spatial distribution of linguistic diversity is much more a result of factors that diminish it than of factors that increase it, much more so than genetic diversity.

          • Agree.

    • “High linguistic diversity crops up precisely in ancient refugia such as Papua New Guinea and the Caucasus. High linguistic diversity is consistent with the kind of population structure that existed in the Mid- and Upper Pleistocene – small isolated demes.”

      Small isolated demes and subsequent diversification of languages surface in geographically difficult terrain anyway, regardless of timespan; I don’t think you can safely put it down to Pleistocene population structures. This is largely why PNG has more than 10% of the world’s distinct languages, in an area comprising less than 0.4% of the planet’s land surface. Same goes for the 200-odd languages of Brazil.

      In addition, claiming the Caucasus’s linguistic diversity as being the result of an ancient refugium is somewhat misleading. There are only about 40 autochthonous Caucasian languages in three distinct families; not that high, really, compared to PNG (~850), Nigeria (~500), India (~400), Mexico (~70), etc. And if you subscribe to the minority view that claims Nostratic and North Caucasian as unit families (I don’t think either is conclusively proven, but both are intriguing hypotheses), then it could be argued Kartvelian is a Nostratic intrusion into the area, leaving only a single autochthonous language family, North Caucasian, and the rest of the Caucasian diversity then coming from Holocene incursions.

      • Thank you for your comment, Rohan! You make an excellent point that both time and topography are responsible for linguistic diversity. Also, the high level of linguistic diversity in the Caucasus is probably due in large part to fairly recent developments, as discussed by Johanna Nichols. But when it comes to comparing the Caucasus to PNG, Nigeria, India, or Mexico, one should correct for area and population size, as the Caucasus is relatively small…

      • “Small isolated demes and subsequent diversification of languages surface in geographically difficult terrain anyway, regardless of timespan…”
        I’m not sure I understand why geographical constraints disprove the time argument. Language accretion occurs over time and it takes place on a certain terrain. Refugia maintain the level of linguistic diversity characteristic of a wider area longer. Geographic structuration and evolution occur in sync, not one at the “expense” of the other.

        Regarding the Caucasus, it’s true that in general West Eurasia (and to a lesser degree Africa) are less diverse than America, PNG and parts of East Asia, so we won’t find in the Caucasus the same cornucopia of languages as in PNG or Amazonia. But by West Eurasian standards the presence of small but deeply divergent North Caucasian (Johanna Nichols even questions the kinship between Abkhazo-Adygean and Nakh-Dagestanian) coupled with Kartvelian (let’s forget about Nostratic, as these hypotheses confuse the matter) is reminiscent of the situation one encounters in America, PNG and East Asia. The presence of Indo-European and Turkic in the Caucasus is a good illustration of how refugia accrue diversity over time. They absorb population flows representing the earliest stages in the evolution of a language family and then conserve them to create visible strata. A refugium doesn’t mean that only Pleistocene-deep language families live there but it does indicate systematic isolation and conservation.

        The number of languages within a family seems to be more reflective of sheer population numbers than of linguistic history. The 1200+ Austronesian languages are only some 6,000 years old. Same goes for Niger-Congo (or at least for parts of it).

  • I’ve enjoyed this series ― to the extent that I can keep up with it. Two points on the current post: (1) “languages seldom co-vary with physical geography; there is no way in which a certain word order pattern, or morphological type, or sound system would be more appropriate for any given landscape”: The second clause is true. I suppose the first depends on your understanding of ‘seldom’, but I can think of lots of examples without effort: Indic-Iranian, Indic-Tibetan, Iranian-Arabic, Tibetan-Sinitic, Hungarian-Romanian, Ethiopian Semitic-Afar, etc. It seems like a common pattern, and more so in the past. (2) Ethnologue is decently reliable on families, but has a pronounced bias (as I think we’ve discussed) towards splitting at the language-dialect level, and so is not, I think, a good (fair) illustration of linguistic diversity. This bias is partially a matter of judgement; even varieties that Ethnologue admits are mutually intelligible are separated based on politics or identity. It is also to be expected from the state of knowledge. Varieties apparently enter(ed) Ethnologue as tribal identifications. We are not likely to find that one of these small communities actually speaks multiple mutually-unintelligible dialects, whereas further study will often reveal that two Ethnologue communities can understand each other.

    • I’m not sure what you mean by your mention of Indic-Iranian, Indic-Tibetan, Iranian-Arabic, Tibetan-Sinitic, Hungarian-Romanian, Ethiopian Semitic-Afar, etc.

      And yes, the Ethnologue is known for its splitting at the language/dialect leve…

      • I meant that each of the hyphenated pairs does indeed vary with physical geography, particularly where the two members meet. The Indic continuum occupies the plain, the Iranian family occupies the highlands to the west. The Iranian family occupies the highlands, and the Arabic speakers live in the adjacent lowlands to the south and west. It’s not exact, of course, but the correlation is strong.

        Your Ethnologue reply was cut off, apparently.

        • Yes, the distribution of linguistic groups often correlates with physical geography, but the linguistic features of the languages themselves seldom if ever do. So the second part of the sentence quoted above essentially clarifies what is meant by the first.

          And the Ethnologue part wasn’t really cut off, I just left out an “l” in level (now fixed).

          • I’m a fan, Asya. I like Martin personally, I think you both know what you’re talking about, and I like what you’re doing here. I just disagree that the second clause clarifies the first. In fact, I think you’ve just repeated the mistake. Linguistic features often -do- correlate with physical geography, because features correlate with language and language often correlates with physical geography. The many features that distinguish Arabic from Kurdish and Persian correlate with the same terrain divide I mentioned. In fact, if you identify an Indian subcontinent isolated by the northern mountains and the southern sea, aspirated voiced stops correlate with this area even when -two- families are involved. Terrain doesn’t -cause- features, and the features aren’t adapted to a terrain. But Martin didn’t deny causation; he denied co-variance, and context can’t make ‘co-vary with’ into ‘is caused by’. But again, I don’t think that’s a knowledge error. It just came out wrong.

          • Thanks for the compliment, O.T. You may be right that the sentence came out wrong or unclear, and as this post was co-written (and I think I might have contributed that particular sentence), let me clarify. When we have a divide between two distinct languages, such as the Arabic-Persian boundary that you mentioned, we do by definition have a boundary for many linguistic features (i.e. isoglosses) aligned with that. And it is true that certain features have an areal, rather than phylogenetic, distribution (tonal systems are another good example, when it comes to East Asia). However, if you look globally, there is seldom (if ever) a correlation between language (as in “linguistic elements”, not just the label) and geographical features. If you just focus on the Arabic-Persian boundary, yes, you get SOV order in Persian but not in Arabic, and that correlates with the terrain in that region, but more globally, there is no correlation of SOV (or just OV) order with highlands, or temperate climates, or whatever. I hope this is clearer.

    • Thanks for bringing up an important matter. I think that the discussion thread between you and Asya adequately deals with the crucial issues. But I also need to mention that I agree with your main point, and hence I will no longer write “languages seldom co-vary with physical geography.” What I should have written is that “linguistic features are not themselves influenced by physical geography,” or something similar. Language groups often do co-vary with physical geography, and hence languages do as well, as you argue.

  • Well, if Proto-Indo-European was a dialect continuum rather than a sharply delimited language, the area could be rather wide.

    • Thanks for your comment, Knut. Actually, PIE was probably a dialect continuum; even so, its area could not have been very wide, as in that case the dialects would have quickly differentiated into mutually incomprehensible languages, as indeed they eventually did when PIE split into daughters.