
What is Phonemic Diversity? —And Does It Prove the Out-of-Africa Theory?

Submitted on October 2, 2012 – 6:48 pm

The article by Bouckaert et al. “Mapping the Origins and Expansion of the Indo-European Language Family” in Science is not the first foray of (some of) the authors into the realm of historical linguistics and language evolution. In an earlier article “Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa”, also published by Science and lavishly praised by Nicholas Wade of the New York Times, Quentin Atkinson—this time the sole author—claims that by applying mathematical methods used in genetics to linguistic data from 504 living languages around the world, one can trace the origin of human language to West Africa (see map on the left). This result is intriguing, especially in light of the fact that most researchers place the origin of human language in East Africa. Yet a number of responses published in Science by linguists, cognitive scientists, and statisticians, including a Technical Comment I co-authored with Rory van Tuyl, identify serious methodological and substantive flaws in Atkinson’s research. Here, I focus on the errors apparent merely in the abstract of the article, several of which show a lack of understanding of even the most basic linguistic concepts, taught in introductory classes.

Here is the abstract of Atkinson’s article:

“Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, underpinning support for an African origin of modern humans. Recent work suggests that a similar founder effect may operate on human culture and language. Here I show that the number of phonemes used in a global sample of 504 languages is also clinal and fits a serial founder–effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages.”

The decline in genetic diversity as one moves farther from the putative origin of modern humans in Africa is well-documented and easily explainable in terms of successive population bottlenecks: as only a subset of the original population survives such a bottleneck, the amount of genetic variation in the resulting population decreases. According to Atkinson, this pattern is observable—and for the same reasons—in regard to linguistic diversity. Here, the first conceptual issue arises: the term “linguistic diversity” is typically used in linguistics to signify the number of languages/dialects in a given area, country, continent, or population group. In other words, “linguistic diversity” refers to the number of languages, not to any properties internal to these languages. It is thus parallel to the concept of “biodiversity” in life sciences, not to “genetic diversity”. It has been observed that the spatial distribution of linguistic diversity and biodiversity largely overlap, perhaps indicating that similar mechanisms underpin both types of diversity. In an earlier GeoCurrents post, I concurred with Gorenflo et al. (2012) that “although different processes may have given rise to the diversification of languages, cultures, and species in different areas, similar forces currently appear to be driving biological extinctions and cultural/linguistic homogenization”.

Maps of linguistic diversity, such as the one reproduced on the left, mark each language by a dot; the more dots cluster in any given area, the higher the degree of its linguistic diversity. As can be seen from this and similar maps, linguistic diversity does not decline as one moves away from West Africa. While it is true that West Africa itself, especially the area around the border of Nigeria and Cameroon, is highly linguistically diverse, other areas of comparable if not higher linguistic diversity can be found in the Caucasus, Nepal, and especially in Papua New Guinea, arguably the most linguistically diverse place on Earth. Particularly damaging to Atkinson’s claim is the area of pronounced linguistic diversity in Mesoamerica, since this area is quite remote in terms of human migration out of Africa.

However, from the rest of the abstract and the body of the article, it becomes clear that Atkinson used the term “linguistic diversity” to mean something quite different, essentially “phonemic diversity”. Atkinson purports to “show that the number of phonemes used in … 504 languages is … clinal” (italics mine). However, Atkinson demonstrates nothing of the sort, as he does not actually count the phonemes used in any given language. Instead of adding like with like, he adds “apples and oranges”… and tomatoes! A proper calculation of the number of phonemes in any given language would add the number of consonant phonemes to the number of vowel phonemes. Instead, Atkinson adds together consonant phonemes, vowel qualities (typically defined by such features as height, backness, and roundedness), and tones (suprasegmental features). An equivalent of this calculation in physics could be the sum of certain molecules, atoms, and electrons, a meaningless mélange of disparate entities added together. The number of vowel qualities in a given language comprises only a subset of vowel phonemes in that language, as a language may employ, for example, a binary length as a meaningful (i.e. phonemic) distinction, which effectively doubles the number of vowel phonemes. For instance, Latin had five vowel qualities (i, e, a, o, and u) and a two-way length distinction, which gives a total of ten vowel phonemes rather than the five listed by Atkinson, who counts only the vowel qualities. Similarly, Finnish has eight vowel qualities (i, y, e, ø, æ, a, o, and u) and a two-way phonemic length distinction (e.g. il ‘day’ vs. i:l ‘work’, tuleen ‘into fire’ vs. tuuleen ‘into wind’); this adds up to a total of 16 vowel phonemes rather than eight. Tones, like length, are not themselves phonemes, but are rather superimposed over (vowel) phonemes, and hence are known as suprasegmental features. A two-way tone distinction in a language with five vowel qualities would result in ten vowel phonemes, rather than seven phonemes à la Atkinson (i.e. five vowel phonemes + two tones). Similarly, a three-way tone distinction in a language with the same five vowel qualities would result in 15 vowel phonemes, not eight; and a five-way tone system in a five-vowel-qualities language would produce 25 vowel phonemes rather than ten.
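
For readers who like to see the arithmetic spelled out, here is a minimal sketch of the two ways of counting contrasted above. The function names are hypothetical, and the sketch assumes, as the argument above does, that every vowel quality combines with every length and tone value; real languages may have gaps in such combinations.

```python
def vowel_phonemes(qualities: int, length_levels: int = 1, tone_levels: int = 1) -> int:
    """Vowel phonemes if every quality combines with every length and tone value.

    Assumption (following the post's argument): length and tone multiply
    the number of vowel qualities rather than adding to it.
    """
    return qualities * length_levels * tone_levels


def additive_tally(qualities: int, tone_levels: int = 0) -> int:
    """The qualities-plus-tones sum that the post attributes to Atkinson."""
    return qualities + tone_levels


# Figures taken from the post itself.
print("Latin:", vowel_phonemes(5, length_levels=2))    # 5 qualities x 2 lengths
print("Finnish:", vowel_phonemes(8, length_levels=2))  # 8 qualities x 2 lengths
for tones in (2, 3, 5):
    print(tones, "tones:", vowel_phonemes(5, tone_levels=tones),
          "vs additive", additive_tally(5, tones))
```

The gap between the two tallies widens as the number of tones grows, which is why the choice of counting method matters most for exactly the tone-rich languages.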

Would Atkinson’s clinal pattern hold if the true number of phonemes is calculated for each language, adding up consonant and vowel phonemes (the latter number taking into account meaningful length and tone distinctions, where applicable)? The languages that best fit the pattern that Atkinson supposedly found include the click languages of Africa (some of which have more than 100 phonemes) and Hawaiian, toward the far end of the human migration route out of Africa, which has only 13. (English is roughly in the middle, with about 45 phonemes, depending on the dialect).

However, a quick examination of the WALS map of consonant inventory size, reproduced on the left, reveals that languages with rich (consonant) phoneme inventories—marked by dark red dots—include not only African click languages like !Xóõ and Ju|’hoan, but also non-click languages of the Caucasus (e.g., Lezgin and Kabardian), some Papuan languages, and even some languages in South America, at the farthest end of the human migration route out of Africa (e.g., Jaqaru and Araona), in direct contradiction to Atkinson. Also, quite a few languages with very small (consonant) phoneme inventories are located in western Africa, as revealed by the dark blue dots on this map.

A study conducted by Keith Hunley, Claire Bowern, and Meghan Healy (HB&H) and published in February 2012 in the Proceedings of the Royal Society also contradicts Atkinson’s findings. Unlike Atkinson, they used full consonant and vowel inventory figures from 725 languages, which together contain 908 distinct phonemes. HB&H show that there is no negative correlation between the number of phonemes and the distance from the putative area of origin in West Africa. Moreover, they test three other predictions of Atkinson’s Serial Founder Effects (SFE) theory and show that they hold for genetic variation but not for phonemic variation. The first prediction—based on the idea that founder effects along the out-of-Africa migration routes reduce variation—is that Africans will possess more unique alleles and phonemes than the indigenous peoples of other regions. This prediction is borne out for alleles but not for phonemes: though Africa has more private phonemes (i.e. phonemes unique to this region) than any other region (as expected), Oceania has a relative deficit of private phonemes, compared with private alleles, and the Americas have a relative excess. The second prediction is that following each founder event, the new daughter group will carry only a subset of the variation of its parental group, so that a negative correlation must exist between within-group variation and geographical distance from the African origin. Again, while this pattern holds for genetic variation, it fails in the case of phonemic variation, as the number of phonemes is highest on average in Eurasia, not in Africa.

The third prediction is that the pattern of among-group variation will be tree-like, and the tree will be rooted in Africa. This prediction too is borne out for genetic variation but not for phonemic variation: a midpoint-rooted phoneme tree, produced using a Bayesian approach, exhibits some regional clustering, but it contains considerably less geographical structure than the microsatellite neighbor-joining (NJ) tree (see image reproduced on the left). Thus, HB&H effectively disprove Atkinson’s SFE hypothesis, showing instead that phoneme inventories provide information about recent contacts between languages, but fail to illustrate more ancient evolutionary processes, in direct contradiction to Atkinson’s claim (in the above-cited abstract) that this pattern “is not explained by more recent demographic history”.

A more general issue, however, is why the number of phonemes should be expected to exhibit a clinal pattern in the first place. Why not the number of basic color terms? Or the number of grammatical genders? Or any other feature quantitatively describing language? It appears from Atkinson’s writing that he takes phonemic variation to run parallel to genetic variation, thus revealing an egregious lack of understanding of what a phoneme is, and perhaps of what genetic variation is too. The term “genetic variation” is actually misleading, as the variation is not among genes as such, but among alleles, that is, alternative forms of those genes. Following a bottleneck, the surviving population will carry only a subset of the alleles (not of the genes) of its parental group. But phonemes (by definition, linguistic sounds that are used to discriminate meaning) are parallel to genes, not to alleles. Alternative forms of phonemes, which can be seen as parallel to alleles, are called allophones. For example, /d/ and /t/ are distinct phonemes in English, as they contrast in words like dent and tent, or write and ride. Both /d/ and /t/ have a range of allophones that are conditioned by location within the word as well as regional dialect and social class (see, for example, Labov 2001). For instance, /t/ in top is pronounced as an aspirated voiceless stop, whereas in stop it is unaspirated; in pot it is typically pronounced as an unreleased stop. Cockney speakers, as well as those of certain Scottish accents (e.g. in Edinburgh and Buckie), pronounce the intervocalic /t/ in better as a glottal stop, while most speakers of American English (as well as those from Belfast, New Zealand, Singapore, and the younger speakers from North Devon) pronounce it as an r‑like flap (this is also the way /t/ is pronounced in writer, which makes it sound exactly like rider for these speakers). A bottleneck in the population of English speakers may eliminate, say, the glottal stop from the range of allophones of the phoneme /t/, thus reducing allophonic variation; however, the phonemic inventory of English would remain the same. In fact, “variation in allophones is found in all languages and is a major driver of language change. In contrast, the level of phonemic variation within a language is small” (HB&H, 2012, p. 6).
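
The gene/allele analogy can be made concrete with a toy model. The sketch below is purely illustrative: the data structure, the function names, and the allophone lists (especially the entries for /d/) are assumptions made for the sake of the example, not a description of any real database.

```python
# Toy inventory: each phoneme (the "gene") maps to its set of allophones (the "alleles").
# The allophone lists are illustrative assumptions, loosely based on the examples above.
english = {
    "/t/": {"aspirated", "unaspirated", "unreleased", "glottal stop", "flap"},
    "/d/": {"voiced stop", "flap"},
}


def bottleneck(inventory, lost_allophones):
    """Return a copy of the inventory with the given allophones removed everywhere."""
    return {phoneme: allos - lost_allophones for phoneme, allos in inventory.items()}


def phoneme_count(inventory):
    """Count phonemes that still have at least one realization."""
    return sum(1 for allos in inventory.values() if allos)


def allophone_count(inventory):
    """Count phoneme-allophone pairings across the whole inventory."""
    return sum(len(allos) for allos in inventory.values())


# A hypothetical bottleneck in which no surviving speaker uses the glottal stop.
after = bottleneck(english, {"glottal stop"})
print(phoneme_count(english), phoneme_count(after))      # phoneme inventory size
print(allophone_count(english), allophone_count(after))  # allophonic variation
```

In this toy case the bottleneck lowers the allophone count while the phoneme count stays put, which is the distinction that a simple count of phonemes cannot capture.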

If one is to draw a parallel between sound change and genetic evolution, the SFE model may turn out to be applicable to allophonic variation: “a daughter population would contain a subset of the allophonic diversity found in the parent, and the daughter would then be subject to processes of allophonic change, drift and selection that lead to sound change. Crucially, such changes are largely neutral with respect to phoneme inventory size” (HB&H, 2012, p. 6). Unfortunately, this hypothesis cannot be tested currently, as no databases of allophonic variation exist so far.

Moreover, there is no bias towards decreasing the size of phonemic inventory over time as human populations moved out of Africa, as phonemes may be added as well as eliminated. An example of the former process is the addition of the phoneme /v/ to English as it passed from the Old English stage to Middle English. In Old English, [v] was an allophone of the phoneme /f/ used in intervocalic position (i.e. between vowels); there were no minimal pairs like few/view, where /f/ and /v/ would contrast. However, the borrowing of numerous words from Norman French where [v] appeared in non-intervocalic position, such as virgin, veil, and veal, led to a reanalysis of /v/ as a separate phoneme. Conversely, the voiceless labiovelar approximant /ʍ/, which contrasted phonemically with /w/, as in which/witch, has been lost in all but a few varieties of English; those that retain it include Hiberno-English, Scottish English, and some Southern American accents.

Both increases and decreases in phonemic inventory size may characterize populations that migrate and those that stay. The pattern observed for recent migrations in historical times, however, is exactly the opposite of what Atkinson hypothesizes for prehistoric migrations: “émigré” languages tend to be more conservative than their “home country” counterparts. This is true of their lexicons (e.g. Québécois French retains rue barrée for a street closed off from traffic, which has been replaced in France by rue fermée), as well as grammars (e.g. Judeo-Spanish preserved the feminine gender for nouns in -or such as calor ‘heat’, color ‘color’, and favor ‘favor’, which were feminine in the Middle Ages but are now commonly masculine in Standard Spanish). The same is true of phonemic inventories. For example, Judeo-Spanish preserves the phonemic distinction between /ʃ/, /ʒ/ and /dʒ/, as in deshar ‘to leave’ (Modern Spanish dejar), hijo ‘son’, and gente ‘people’, respectively. This distinction existed in Castilian Spanish in the 15th and 16th centuries, but later disappeared from Modern Spanish, with all three sounds being replaced by /x/. Similarly, Yiddish preserved a phonemic contrast between word-initial /s/ and /z/; witness such minimal pairs as sok ‘syrup, sap’ (from Slavic) and zok ‘sock’ (from Germanic). Examples like these cannot be ignored, as they show directly that languages that split off and relocate in new areas often maintain phonemic contrasts that are lost in the language of the population that stays in place.

More generally, analyzing phoneme inventory size rather than composition ignores the fact that languages may have identical inventory sizes yet very little overlap. For example, Arabic and Georgian both happen to have 28 consonant phonemes; however, only 13 of them are found in both languages. The consonants of Arabic not found in Georgian include pharyngeal and pharyngealized consonants, whereas those found in Georgian but not in Arabic are aspirated stops and affricates, as well as ejective sounds. According to HB&H (2012, p. 6), in the evolution from PIE to Proto-Balto-Slavic, the consonant phoneme inventory shrank from 25 to 19 members, though only 15 of those consonants were present in PIE. In fact, languages often simultaneously lose and gain phonemes, which obfuscates a direct relationship between inventory size and language change. An excellent example of this process from the history of English is the Great Vowel Shift, which took place between 1350 and 1700. This shift resulted in the loss of long vowel phonemes like /e:/ and /ɛ:/, as in see and speak (which merged into the same phoneme /i:/), as well as in the acquisition of diphthongs such as /ej/ in name and day. However, simply stating that the 15 vowel phonemes of Middle English were reduced to 11 in Early Modern English misses all the complexities involved in this sound change.
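
The difference between inventory size and inventory composition can be put in numbers. The sketch below uses only the counts quoted above; the function names are hypothetical, and the Jaccard index is simply one convenient overlap measure chosen for illustration, not a method used in any of the studies discussed.

```python
def jaccard_from_counts(size_a: int, size_b: int, shared: int) -> float:
    """Jaccard overlap (shared / union) of two inventories, given sizes and shared count."""
    union = size_a + size_b - shared
    return shared / union


def turnover(old_size: int, new_size: int, retained: int) -> tuple:
    """Return (lost, gained) phonemes between a parent and a daughter inventory."""
    lost = old_size - retained
    gained = new_size - retained
    return lost, gained


# Arabic vs. Georgian consonants: 28 each, 13 in common (figures from the post).
print(round(jaccard_from_counts(28, 28, 13), 2))

# PIE to Proto-Balto-Slavic: 25 consonants to 19, of which 15 are inherited (HB&H 2012).
lost, gained = turnover(25, 19, 15)
print("lost:", lost, "gained:", gained, "net change:", gained - lost)
```

Two inventories of identical size thus share less than a third of their combined consonants, and a net shrinkage of six consonants conceals ten losses and four gains.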

All in all, the Science article by Atkinson on phonemic diversity seems to be yet another example of shoddy work in which mathematical methods are applied in a simplistic fashion, without any understanding of the concepts and phenomena under consideration. Such works produce results that contradict well-known facts about the nature of human languages, as well as plain common sense.

 

Sources:

Gorenflo, L.J.; Suzanne Romaine, Russell A. Mittermeier, & Kristen Walker-Painemilla (2012) “Co-occurrence of linguistic and biological diversity in biodiversity hotspots and high biodiversity wilderness areas”. PNAS online.

Hunley, Keith; Claire Bowern; & Meghan Healy (2012) “Rejection of a serial founder effects model of genetic and linguistic coevolution”. Proceedings of the Royal Society. pp. 1-9.

Labov, W. (2001) Principles of Linguistic Change, Vol. 2: Social Factors (Language in Society). Oxford, UK: Blackwell.

 


  • http://www.facebook.com/people/James-T-Wilson/682045086 James T. Wilson

    I use the voiceless labiovelar approximant, as did my father and my grandfather. I can remember my father pointing out that the country of Wales was not pronounced at all like the plural whales. I suppose it could be a remnant of early nineteenth-century Scottish English, but that line of my family came over shortly before 1820. I was born in Washington State, as was my father. His father moved to Washington from Nebraska as a young boy in the 1890s. I had always thought this might be a northwestern regionalism, but I don’t remember noticing if other older Washingtonians made this distinction.

    • http://www.pereltsvaig.com Asya Pereltsvaig

      Ah, it’s good to keep those family traditions! Yes, I am not aware of the voiceless labiovelar approximant being a northwestern regionalism, so I am leaning towards it being a remnant of Scottish English… or some other local accent… After all Shakespearian English had it too!

    • Anthony_A

      One of my school teachers in Delaware tried to enforce the which/witch distinction, with almost no success. (I do not make the distinction at all.)

      However, it disappeared from upper-class English rather later than Shakespeare – in “The Pickwick Papers”, Charles Dickens has Sam Weller (of lower-class background) speaking without the distinction, and spelling some of his words differently as a result, like “wen” for “when”. (To confuse matters, Sam Weller and his father also blur v and w, which Dickens shows in such words as “wery”.)

      • http://www.pereltsvaig.com Asya Pereltsvaig

        Thank you for your comment, Anthony. Indeed some non-standard dialects of English preserve the which/witch distinction to this day…

  • Alex Jaker

    This actually *could* have been an interesting study if it had been done better. The more general point is, if you’re doing quantitative linguistics, you have to know *what* it is you want to measure. This seems so obvious, but I wonder if some people need to be reminded.

    • http://www.pereltsvaig.com Asya Pereltsvaig

      Well-said, Alex! I wholeheartedly agree: just counting stuff without understanding what it is they count may work for accountants but not for true scholars! I am generally dismayed to see too many researchers turn to quantitative methods ignoring the linguistic phenomena behind them.

      As for your suggestion that it could have been an interesting study if it were done right, I think the HB&H study that I refer to in the post is exactly that kind of better-designed study.

      By the way, Martin and I will be giving a lecture (open to the public) on these issues at Stanford in November, we hope. Stay tuned for more details on time and place!

  • http://www.facebook.com/people/German-Dziebel/535243148 German Dziebel

    Again, Asya, I think you’re generally spot on with your critique of Atkinson. A few potential disagreements are below:
    1. “The second prediction is that following each founder event, the new daughter group will carry only a subset of the variation of its parental group, so that a negative correlation must exist between within-group variation and geographical distance from the African origin. Again, while this pattern holds for genetic variation, it fails in the case of phonemic variation…”

    We don’t know what the observed global pattern of genetic diversity means. Geneticists assume that it reflects an African origin and a serial founder effect across the globe to the New World. But how do we know that this is what happened in history? As the colonization of the New World by Europeans after 1492 shows, American Basques are no less diverse than European Basques and Bahama Blacks are not less diverse than aboriginal West Africans. (Your example of emigrant conservatism – “The pattern observed for recent migrations in historical times, however, is exactly the opposite of what Atkinson hypothesizes for prehistoric migrations: “émigré” languages tend to be more conservative than their “home country” counterparts” – seems to be up the same alley.) A similar argument was made for highly structured foraging populations. See more at http://anthropogenesis.kinshipstudies.org/2012/07/how-to-interpret-patterns-of-genetic-variation-admixture-divergence-inbreeding/. Geneticists assume a panmictic population but under a structured population assumption the meaning of the global pattern of genetic diversity will be different. In a word, I wouldn’t hold genes as being “better” than phonemes in representing human prehistory. Both pictures are assumption-driven. It’s a matter of social consensus in different disciplines: if all linguists were like Greenberg or like Atkinson, they would have applauded Atkinson’s interpretation of phonemic diversity, just like geneticists have been supportive of SFE not because it was proven but because it was agreed upon by the majority.

    2. “phoneme inventories provide information about recent contacts between languages, but fail to illustrate more ancient evolutionary processes.”

    It would be interesting to shift from language to language families as the primary unit of analysis. It’s possible that, once proto-phonemes reconstructed via traditional comparativist method are compared, the recent “noise” will go away. A proto-phoneme may correspond to several different phonemes in an extant language (e.g., PIE *gwh yielded ph, th and kh in Greek depending on the phonetic environment). The opposition between a phoneme and an allophone is an abstract and synchronic one; it’s the contrast between a phoneme and a proto-phoneme that may be relevant to studies such as Atkinson’s. It’s noteworthy that historical linguists such as Jaako Hakkinen (from your other post) prefer to build linguistic phylogenies on the basis of phonology, which suggests that, if properly analyzed, phonology guarantees stability.

    3. “however, a quick examination of the WALS map of consonant inventory size, reproduced on the left, reveals that languages with rich (consonant) phoneme inventories, marked by dark red dots, include not only African click languages like !Xóõ and Ju|’hoan, but also non-click languages of the Caucasus (e.g., Lezgin and Kabardian), some Papuan languages, and even some languages in South America, at the farthest end of the human migration route out of Africa (e.g., Jaqaru and Araona), in direct contradiction to Atkinson. Also, quite a few languages with very small (consonant) phoneme inventories are located in western Africa, as revealed by the dark blue dots on this map.”

    It was shown (Cysouw et al. 2011, Suppl. Mat, 19) that, if corrected for population size, the greatest phonemic inventories are found in North America (http://anthropogenesis.kinshipstudies.org/2012/09/typological-linguistics-and-population-genetics-a-synthethis-or-a-controversy/).

    4. “Papua New Guinea, arguably the most linguistically diverse place on Earth.”

    The New World is most diverse linguistically if diversity is measured by the number of independent linguistic stocks (140-150). Amazonia has the greatest number of language-isolates, which is the most extreme form of diversity. I would count Papua New Guinea and Australia together as Sahul to make it a bit more comparable geographically to the New World.

    5. “Particularly damaging to Atkinson claim is the area of pronounced linguistic diversity in the Mesoamerica, since this area is quite remote in terms of human migration out of Africa.”

    This is consistent with my out-of-America theory. Overall, there’s a very strong correlation between genetic and linguistic variation: both INTER-group allelic and linguistic stock diversity decline with increasing distance from the New World and the Sahul. Johanna Nichols (1992, Linguistic Diversity in Space and Time) showed that the New World and the Sahul contrast with Africa and Europe in terms of typological features, with head-marking-driven grammatical forms found at high frequencies in the former zone likely constituting a more “ancient” form of language compared with dependent-marking structures found in the latter. This suggests that phonemic diversity, if properly analyzed, may yield the same pattern.

    • http://www.pereltsvaig.com Asya Pereltsvaig

      Thank you for your detailed comments, German!

      1) An interesting point about American Basques, I didn’t know. But the way I understand this (and I might be totally wrong) is that a bottleneck MAY lead to diminished diversity, and often does, but does not have to. It will likely depend on the initial diversity and other factors as well.

      As for applauding Atkinson’s interpretation of phonemic diversity, only people who don’t understand what a phoneme is could do that, I think. I might be wrong on the genetics side of things, so maybe genetic SFE theory is wrong too, but it is simply inapplicable to phonemic diversity, as I argue in the post.

      2) Interesting point about proto-phonemes. But aren’t those ultimately supposed to be (approximations of) the phonemes of some really spoken language? At least, theoretically? If so, I am not sure how reconstructing proto-phoneme inventory SIZE would help us any. Assuming that prehistoric languages were just like those we know from historical times, that is. I do, however, agree with Jaako Hakkinen that relying on phonological and morphological systems (and syntactic if we had enough evidence) is a better idea than simply looking at the lexical level. The word SYSTEMS being the operative word here: phonology/morphology/syntax are systematic, whereas lexicons are collections of idiosyncratic and potentially exceptional items, not patterns. Steven Pinker’s “Words and Rules” comes to mind…

      3) Thank you for bringing this up. Please remind me if Cysouw et al. looked at true phoneme counts or the sort of things that Atkinson looked at?

      4) Hence “arguably”! The reason I am reluctant to call South America the most linguistically diverse place is that I think we simply don’t know enough about these languages to connect them into families, and that’s why controversies abound, in both North and South America… To some extent this is also true of PNG, but I think a better picture has emerged there…

      5) As far as I can tell, “out of America” theory contradicts a lot of other evidence as well, in addition to the inter-group allelic variation, which I don’t think is all that high in the Americas, compared to the Old World. I might have missed the memo though… :)

      Johanna Nichols’s work is very interesting, but I don’t think one can argue about the antiquity of head- vs. dependent-marking. It does seem to correlate with her “refuge zones” vs. “spread zones” to some extent, although I don’t think it’s a perfect correlation. If the popularity of head-marking in the Americas is supposed to argue for the “out of America” theory, what about the Caucasus?

      Finally, based on results of HB&H’s study that I refer to in the post, I can’t say that there is any geographical pattern supporting either out-of-Africa, out-of-America or out-of-any-place-else. The only geographical pattern that does seem to be valid is that both phoneme inventory size and its composition (preference for consonants, vowels, or certain types of consonant, for example) are areal features (as well as phylogenetic features). Relevant WALS chapters discuss this to some extent…

      • http://www.facebook.com/people/German-Dziebel/535243148 German Dziebel

        1. A bottleneck always leads to diminished diversity but it is not a fact that all migrations are accompanied by a bottleneck. It truly depends. Just like higher genetic diversity doesn’t automatically mean greater age, as it can be caused by larger effective population size or by admixture (among the simplest causes). Since 1492, the New World has become the most genetically diverse continent, although this has nothing to do with the age of the population. The bottom line: genetic evidence is subject to different interpretations depending on which model one chooses to follow. Just like with linguistic data. It’s the nature of historical inquiry, which geneticists tend to downplay in favor of a belief in hard scientific facts. But if we take your and others’ critique of Atkinson’s paper to its logical conclusion, if his model doesn’t work for phonemes and other linguistic systems, it may not work for the genes either. So, it’s the problem of the out-of-Africa model, not whether or not genes and phonemes are in co-evolution. We simply don’t know because nobody has done quality, non-assumptive work to see if the two can be aligned and what the basic units of analysis are. (see below).
        2. The advantage of using proto-phonemes is that you instantly remove recent noise and reduce the risk of recent non-population driven effects on phoneme diversity to confuse the data. Dediu and Levinson recently did it for structural stability making the language family and not the language the focus of the study (http://anthropogenesis.kinshipstudies.org/2012/09/stability-vs-diversity-a-novel-method-for-analyzing-worldwide-linguistic-structures/). Stochastic effects then get contained by the sheer size of your primary object of analysis. In addition, once population niches have been filled up after agricultural expansions, one would expect a lot of phonemic diversity to go up and down without population migration. But in pre-agricultural times we may expect more mobility, hence potentially more pure effects of that population mobility on linguistic structures.

        I’m completely with you on your point about SYSTEMS. This is precisely what characterizes kinship terminologies: a kinship terminology is a structured set and as such has strong potential to be a valid source of information about human prehistory (see my Genius of Kinship, which is firmly rooted in the tradition of studying kin terminologies in anthropology). “Basic” vocabularies are composed of random pullouts from different structured sets, and hence are problematic.

        3. I’m sorry I can’t tell which count they used. I assume it’s a true phoneme count. They just say it was their preferred method of counting. I will e-mail you the Supplement.
        4. True, but North and South America were also hit by large-scale language and population extinctions post-1492, so that may even out the side effects of whatever excessive splitting exists on this continent. Although, as you know, all Americanists will stand firmly by 140-150 family count, which is nearly 2/3 of world linguistic diversity. This is consistent with the Sahul situation, as both areas are part of the broader Circumpacific zone, and both show a striking contrast with Africa and Europe, which is again not coincidental, I believe.
        5. Inter-group diversity (not to confuse with intragroup diversity) is the highest in America followed by Papua New Guinea. This is an exact parallel to the linguistic situation above, so both linguistics and genetics show the same picture. It is intragroup diversity that’s the highest in Africa and the lowest in the New World and Papua New Guinea. It is precisely this diversity that geneticists interpreted as a sign of the greater age of African populations. But, as Denisova DNA convincingly demonstrates, ancient Eurasian populations, with roots in the Mid-Pleistocene, were low on intergroup diversity (even lower than some South American tribes) but high on intragroup diversity.
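
The intragroup versus intergroup distinction in point 5 can be made concrete with a toy calculation. The following is only a minimal sketch: it uses Nei's G_ST, one common Fst-type statistic, on a single two-allele locus, and every allele frequency in it is hypothetical (none comes from any study mentioned in this thread). It shows how a set of populations can have low diversity within each group while showing high differentiation between groups, and vice versa.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(populations):
    """Return (H_S, H_T, G_ST) for equally weighted populations.

    H_S  : mean within-population heterozygosity (intragroup diversity)
    H_T  : heterozygosity of the pooled allele frequencies
    G_ST : (H_T - H_S) / H_T, the share of diversity lying between groups
    """
    n = len(populations)
    h_s = sum(heterozygosity(p) for p in populations) / n
    n_alleles = len(populations[0])
    pooled = [sum(p[i] for p in populations) / n for i in range(n_alleles)]
    h_t = heterozygosity(pooled)
    return h_s, h_t, (h_t - h_s) / h_t


# Hypothetical allele frequencies (illustration only, not real data).
# Set A: each population is internally diverse, and all look alike.
diverse_similar = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]]
# Set B: each population is internally uniform, but they differ from one another.
uniform_distinct = [[0.9, 0.1], [0.1, 0.9], [0.95, 0.05]]

hs_a, ht_a, gst_a = gst(diverse_similar)
hs_b, ht_b, gst_b = gst(uniform_distinct)

print(f"Set A: H_S={hs_a:.3f}  H_T={ht_a:.3f}  G_ST={gst_a:.3f}")
print(f"Set B: H_S={hs_b:.3f}  H_T={ht_b:.3f}  G_ST={gst_b:.3f}")
```

In this toy setup, set A has the higher within-group diversity and set B the higher between-group differentiation, which is the pattern the two kinds of diversity are meant to separate. Real estimates pool many loci and correct for sample size, which this sketch does not attempt.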

        “If the popularity of head-marking in the Americas is supposed to argue for the “out of America” theory, what about the Caucasus?”

        It’s perfectly natural: the Sahul and the Caucasus are refuge areas where linguistic diversity is high and the retention of ancient grammatical patterns (polysynthesis, head-marking, etc.) is high, too. America is just a refugium par excellence.

        I agree with you that there is a lot of uncertainty in the conclusions we can derive from global datasets (linguistic or genetic). I’m optimistic, though, about the possibility of taking another look at them without an out-of-Africa bias. On the linguistic end of things, it’s important to see if we need to study global phoneme variation in conjunction with word length (Nettle observed correlations between phoneme inventory size and word length in Africa) and morpheme-to-word ratio. Interestingly enough, simple phoneme inventories may correlate with a high morpheme-to-word ratio because, as Mithun showed, children who grow up learning polysynthetic languages extract phonetic information first, which leads to the hypothesis that a complex phoneme inventory combined with complex morphosyntax would be impossible to acquire.

        • http://www.pereltsvaig.com Asya Pereltsvaig

          1) I agree that not all migrations are accompanied by a bottleneck, and that different factors cause decreases and increases in genetic diversity. However, I cannot agree with you that “it’s the problem of the out-of-Africa model, not whether or not genes and phonemes are in co-evolution. We simply don’t know because nobody has done a quality, non-assumptive work to see if the two can be aligned and what the basic units of analysis are.” HB&H, as well as I and others, have shown conclusively that phonemes do not align with genes in the way Atkinson wants them to. So the problem is not with the out-of-Africa theory (which is supported by many types of evidence), but with the shoddy work that tries to align phonemes and genes without understanding what the former are (and maybe what the latter are, too).

          2) The problem with using proto-phonemes is of course the possible errors in reconstruction. But regardless of whether we use modern phonemes or proto-phonemes of reconstructed ancestral languages, the issue is the same: phonemes don’t disappear within a language just because fewer people speak it. There are lots of nice, small languages with lots and lots of phonemes. Something that should have struck Atkinson too: the Khoisan languages that serve as his model of a phoneme-rich language are spoken by small groups that underwent population bottlenecks.

          When it comes to kinship terminologies, even though they are systematic, there is no reason that any given kinship term in this or that language won’t undergo a non-systematic change, as happens to other types of vocabulary items.

          3) I will make no assumptions until I see what count they used. Besides, even if their finding is correct, wouldn’t it just go to show that bottlenecks (which native American languages underwent to an extreme degree, in recent times, well-documented, etc.) do not affect phoneme inventory size?

          4) Not sure your statement that “all Americanists will stand firmly by 140-150 family count” is correct: I’ve seen much controversy and arguments for both splitting (that would result in such a high number) and lumping. High linguistic diversity (if it is not an artifact of how little we actually know of these languages and their past, which I think it in part is) could well be explainable in terms of Diamond’s “Guns, Steel…” etc.

          5) “Inter-group diversity (not to confuse with intragroup diversity) is the highest in America followed by Papua New Guinea.” — With respect to genes? Really? Contradicts everything I’ve ever read on this… I would like to see some evidence…

          • http://www.facebook.com/people/German-Dziebel/535243148 German Dziebel

            5) Yes, with respect to genes. See, e.g., http://www.sciencemag.org/content/324/5930/1035.short
            There are 2 kinds of diversity: intergroup and intragroup. Intragroup diversity is highest in Africa (or was until 1492), intergroup diversity (as measured by Fst type of statistics) is highest in America followed by Papua New Guinea.
            4) There’s no controversy regarding the 140-150 language families in the Americas. Greenberg and Ruhlen tried to create one, but they were dismissed pretty much unanimously. Nichols raised an interesting point regarding the possible effect of Amerindian grammatical structures on obscuring genetic relationships between families, but it has remained purely speculative at this point.
            3) They used UPSID if this helps. You make a good point about recent depopulation in the Americas. No, phonemic inventory size didn’t get reduced, just whole languages died out.
            2) I agree with your critique of Atkinson’s approach. Khoisans may not be a good example, though, because the excess in phonemes among Khoisans comes largely from clicks, which Atkinson would say are “archaic” sounds lost from all other languages as a result of a bottleneck that happened not in the Khoisan lineage but in the lineage leading to the rest of humans.

            But the overall agreement between genetics and linguistics may be strong globally, hence we just need to figure out how to map it properly.

            “When it comes to kinship terminologies, even though they are systematic, there is no reason that any given kinship term in this or that language won’t undergo a non-systematic change, as happens to other types of vocabulary items.”

            In kin terminologies, it’s the categorical relations between terms that form historical sequences. Imagine you knew how different pronominal systems line up plausibly into an evolutionary sequence, with the actual forms being secondary.

            1) I think we are in agreement on Atkinson. But I’d be curious to see what happens with the phonemes-to-genes mapping exercise if we use proto-phonemes and correct for a) speaker community size and b) morpheme-to-word ratio and word length.

          • http://www.pereltsvaig.com Asya Pereltsvaig

            5) “Intragroup diversity is highest in Africa (or was until 1492), intergroup diversity (as measured by Fst type of statistics) is highest in America followed by Papua New Guinea.” I don’t think the article you linked to states that. In fact, the way I understand it, they reach exactly the opposite conclusion, at least for Africa, as they didn’t look at the Americas (beyond African Americans) or PNG.

            4) I am at Stanford, as was Greenberg and Ruhlen still is… :)
            More seriously, beyond Greenberg’s ultimate lumping there are lumping efforts on other levels as well. For example, the number of native American language families in California varies from source to source, from 7 to 20:
            http://geocurrents.info/place/north-america/northern-california/the-geographical-complexity-and-linguistic-peculiarities-of-the-indigenous-languages-of-northern-california

            3) UPSID doesn’t help, I am afraid.
            “No, phonemic inventory size didn’t get reduced, just whole languages died out.” — So linguistic diversity was decreased, but phonemic or allophonic diversity was not. Also, the sizes of each language population were decreased, and that didn’t affect the phonemic (or allophonic) diversity either.

            2) “the excess in phonemes among Khoisans comes largely from clicks, which Atkinson would say are “archaic” sounds lost from all other languages as a result of a bottleneck that happened not in the Khoisan lineage but in the lineage leading to the rest of humans.” Atkinson may say so, but there is no evidence that click sounds are “archaic” in the sense that they were present in the Khoisan lineage in prehistoric times, or in some other equally old languages. This issue is further discussed in my other blog here:

            http://languagesoftheworld.info/historical-linguistics/which-language-is-the-oldest.html

            Also, regardless of whether clicks were lost from non-Khoisan lineages or not, why weren’t at least some of them lost from Khoisan languages, since their speakers went through significant bottlenecks?

          • http://www.facebook.com/people/German-Dziebel/535243148 German Dziebel

            5) Here’s a direct quote from Tishkoff et al., 2009, 1035: “The proportion of genetic variation among all studied African populations was 1.71% (table S3). In comparison, Native American and Oceanic populations showed the greatest proportion of genetic variation among populations (8.36% and 4.59%, respectively)…” For Eurasia, if you look into Table S3 in SM, you’ll get the number 1.97, between Africa and the New World/PNG. For Europe, it’s 0.74, lower than in Africa. Again, we have a nice correspondence between declining linguistic and intergroup genetic diversity from the New World/Sahul to Africa and Europe. Does phonemic inventory size differences follow the same pattern? I don’t know, but judging by the fact that North and Central America have some of the largest inventories (depending on how you count, too, of course), while South America has some of the smallest (Pirahã), and Africa tends to have those huge inventories on tonal and clicking “steroids” pretty uniformly and consistently, I think we may be able to observe the same cline.
            4) Take any reference work on American Indian languages (e.g., Campbell’s American Indian Languages: the Historical Linguistics of Native America, 1997) and you will see how many stocks are observed in the New World. I personally knew Joe Greenberg and I’m still in touch with Ruhlen, but with all due sympathy for them, their Amerind classification was rejected pretty decisively. I’m not aware of your contributions to the field.
            3) My comment was meant ironically: depopulation does cause phonemic diversity reduction. Reduction to naught.
            2) Agree that the “clicks as archaic sounds” idea lacks proof. At the same time, as with African tones, we have no idea how clicks evolved (Greenberg once confessed to me that he had an explanation, but he took this secret to his grave), which suggests at least some antiquity. And it is true that one click-bearing language group, Hadza, may have gone through a severe bottleneck, as evidenced by its high linkage disequilibrium and low intragroup allelic diversity. All other Khoisan groups don’t have genetically observable signs of a past bottleneck. The click repertoire does regularly get reduced when clicks are borrowed into a Bantu language, and this borrowing tends to be accompanied by a subset of (maternal) genes from a Khoisan population, as among the Fwe, but this case is not considered by Atkinson.

          • http://www.pereltsvaig.com Asya Pereltsvaig

            5) “we have a nice correspondence between declining linguistic and intergroup genetic diversity from the New World/Sahul to Africa and Europe. Does phonemic inventory size differences follow the same pattern?” Not at all. There are languages with large phoneme inventories and languages with small ones in the Americas, PNG, even Africa. There is no global pattern. The map of consonant inventories (which typically account for the largest proportion of the overall phoneme count, cross-linguistically), reproduced in the post, shows as much.

            4) I wasn’t referring to any of my contributions. But in the link I gave there are references to sources that provide varying assessments of the validity of certain “families” such as Penutian.

            3) I got the irony part, but if phonemic diversity is defined as the number of phonemes *per language*, it is not reduced to naught.

            2) “Lack of evidence is not evidence of the lack”: the fact that we don’t know how clicks might have evolved does not mean that they didn’t.

            Whether or not Khoisan groups, besides Hadza and Sandawe, show high intragroup diversity, we do know that they went through severe bottlenecks in terms of population size reduction.

            That the click repertoire is reduced in Bantu languages is evidence merely of the fact that a system need not be borrowed in its entirety, and that the more easily discriminated members of the system are more likely to be borrowed. It does not prove that a whole system (or subsystem, say front rounded vowels) cannot be borrowed.

          • http://www.facebook.com/people/German-Dziebel/535243148 German Dziebel

            5) I wrote “linguistic diversity,” which is the diversity measured by the number of language stocks, not phonemic diversity, which is Atkinson’s misnomer.
            2) We don’t know that Khoisans went through bottlenecks. The only way to “know” this is through the genetic diversity statistics containing implications for effective population size (which is only a distant proxy for demographic population size) fluctuations and they haven’t shown any significant reductions for Sandawe or San. For Hadza, yes, this evidence exists.

          • http://www.pereltsvaig.com Asya Pereltsvaig

            5) But Atkinson is interested in phonemic diversity (as in “number of phonemes”) and we’ve been criticized for taking him “on his own terms” :)

            2) Let me rephrase: we don’t know if Khoisan went through bottlenecks, but they certainly went through population decline (at least many of the Khoisan languages did). So according to Atkinson’s assumption that population size decrease leads to phoneme inventory decrease, they should have fewer phonemes than they do…
