Linguistic Phylogenies Are Not the Same as Biological Phylogenies

Submitted on October 17, 2012, 7:29 pm | 24 Comments

(Note: This post is jointly written by Martin Lewis and Asya Pereltsvaig)

A key assumption of Bouckaert et al. is that the diversification and spread of languages operate so similarly to the diversification and spread of biological organisms that the two processes can successfully be modeled in the same manner. The parallels between organic and linguistic evolution are indeed pronounced. Both processes entail replicating codes that continually change, giving rise to novel varieties that increasingly differ from their progenitors over time. As a result, “phylogenetic trees,” showing descent from common ancestors, are a common feature of both evolutionary biology and linguistics.

But despite their similarities, organic evolution and linguistic evolution are in many ways highly dissimilar. Encoding information for communication is not the same as encoding information that generates life: language is vastly more fluid and complex than the genetic code; individual languages are much less clearly differentiated from each other than are species; and language is a social phenomenon, given to influences largely irrelevant for biological evolution. The key differences can be summarized as follows: biological evolution is unconstrained but governed by natural selection (any mutation can happen, but which mutations remain in the pool depends in large part on natural selection), whereas linguistic variation (seen in terms of deep grammatical properties) is constrained by a system of parameters but is not subject to natural selection. As a result, the branching trees of linguistic descent are merely analogous to the phylogenetic diagrams of biological evolution, and do not indicate the same kind of relationships.

Although organic evolution operates through a much more restricted set of message-carrying units than does human language, it nonetheless produces diversity at a much deeper level. Given the biological constraints of the human brain/mind (as yet less than fully understood), there are only so many ways in which any given language can be structured. To be sure, the number of possible human languages, both extant and extinct, as well as those that may arise in the future, is vast, but all human languages appear to be “variations on a theme,” guided by the same parameters. Some languages have as few as two vowels (Ubykh, Northwest Caucasian) and others as few as six consonants (Rotokas, North Bougainville); other languages may have as many as 20 vowels (the Taa language, spoken in Botswana and Namibia, is reported by some sources to have 20 or even 30 vowels, depending on the analysis) and as many as 84 consonants (as in Ubykh; the Taa language is reported to have 87 consonants under one analysis, 164 under another). But crucially, all languages differentiate vowels from consonants and use both. Some languages put verbs before subjects and objects, while others place them at the ends of sentences, but all languages have verbs, subjects and objects.* Some languages can build sentence-long words packed with numerous prefixes, infixes, or suffixes, while others use stand-alone, stripped-down words to do the grammatical work of expressing tense, number, etc., but all languages make words from morphemes, and all construct sentences. As a result of this limited space of possibilities, completely unrelated languages evolving on their own often come to share major grammatical traits.

Linguistic evolution, unlike that of the biological realm, moves at a rapid clip. In non-literate societies, words change so quickly that after some five to eight thousand years not enough cognates can be traced back to establish linguistic relatedness. In the same time span, grammatical structures can undergo wholesale transformations, and sound inventories can change drastically as well. As a result, even clearly related languages can have next to nothing in common with each other, and can only be linked through investigations into their ancestors. Hindi and English, two of the three most widely spoken Indo-European languages, are dissimilar in almost every respect.** On casual inspection, Hindi would seem to have more in common with the non-Indo-European languages of the Indian sub-continent than it does with English.

Thus, relatedness at the family level and overall linguistic similarity often fail to correspond. Maps showing major language patterns typically bear little if any resemblance to maps depicting linguistic families. Even something as seemingly basic as word order correlates poorly with lines of descent. For example, Indo-European languages can be SVO (subject-verb-object; marked by red dots on the map to the left), such as English, Romance, and most Slavic languages (but Sorbian, a Slavic language, is SOV); SOV (marked by blue dots), such as the Indo-Iranian languages (yet Kashmiri is SVO); or VSO (marked by yellow dots), such as the Insular Celtic languages (yet Cornish is SVO). Some other families, such as Austronesian, show even greater variability in basic word order: Niuean is VSO, Malagasy is VOS, Rotuman is SVO, and Tuvaluan is OVS.
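The mismatch between family and word order can be checked mechanically. The following is only a minimal sketch: it tabulates the handful of languages named in the paragraph above (no other data is assumed), and the function names orders_by_family and families_by_order are hypothetical, chosen for illustration. The unnamed Insular Celtic VSO languages are left out because the text does not name one.

```python
from collections import defaultdict

# Word orders exactly as stated in the paragraph above; nothing here is new data.
WORD_ORDER = [
    ("English", "Indo-European", "SVO"),
    ("Sorbian", "Indo-European", "SOV"),
    ("Kashmiri", "Indo-European", "SVO"),
    ("Cornish", "Indo-European", "SVO"),
    ("Niuean", "Austronesian", "VSO"),
    ("Malagasy", "Austronesian", "VOS"),
    ("Rotuman", "Austronesian", "SVO"),
    ("Tuvaluan", "Austronesian", "OVS"),
]


def orders_by_family(rows):
    """Map each family to the set of basic word orders attested in it."""
    result = defaultdict(set)
    for _language, family, order in rows:
        result[family].add(order)
    return dict(result)


def families_by_order(rows):
    """Map each basic word order to the set of families in which it occurs."""
    result = defaultdict(set)
    for _language, family, order in rows:
        result[order].add(family)
    return dict(result)


if __name__ == "__main__":
    # A family does not predict a single word order...
    for family, orders in orders_by_family(WORD_ORDER).items():
        print(family, sorted(orders))
    # ...and a word order does not pick out a single family.
    for order, families in families_by_order(WORD_ORDER).items():
        print(order, sorted(families))
```

Even with this tiny sample, each family maps to several orders and SVO turns up in both families, which is the sense in which word order "correlates poorly with lines of descent."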

Similarly, features of morphological typology (how words are formed from morphemes) often cross-cut connections established by common descent. Whereas Proto-Indo-European, like most of its daughters, was a synthetic language (building words from multiple non-root morphemes), English and Afrikaans are relatively analytical (with low ratios of morphemes to words), which gives them a certain affinity with Mandarin Chinese (a highly analytical language). As discussed in an earlier GeoCurrents post, isolating languages are found in Africa (Hausa, an Afroasiatic language), Asia (Vietnamese, Austroasiatic), Oceania (Rapanui, Austronesian), and the Americas (Kipea, Kiriri). In phonology as well, similar patterns obtain, as sound inventories often fail to show systematic correspondences with language families. The Indo-European languages of South Asia, for example, are in many respects more phonologically similar to the Dravidian languages of the same region than they are to most other IE languages. One of the characteristic phonological markers of the region, the rich inventory of retroflex consonants, is also scattered across the rest of the world, found in about 20 percent of all languages belonging to a wide variety of families.

One of the best ways to appreciate the relative insignificance of language families in regard to the global distribution of such features is to explore the maps that can be generated on the WALS website, such as the one reproduced above. Few if any of these maps bear much resemblance to the familiar depiction of the world’s major language families.

Again, the contrast with biological evolution is stark. The farther removed organisms are from each other on the tree of life, the fewer genes they necessarily share. Even when convergent evolution results in similarities between distantly related organisms, the parallels are relatively superficial. As a result, modern genetic inquiry can establish precise levels of biological relatedness, a process that has revolutionized taxonomy over the past few decades. In the biological realm, moreover, the farther one moves up different branches of evolutionary descent, the more distinctive the organisms found along it generally become. Chordates (the phylum that includes vertebrates) share a distant common ancestor with echinoderms (sea stars and their relatives), and some tunicates, primitive members of phylum Chordata, might be mistaken by unschooled observers for sea lilies in phylum Echinodermata. (Tunicates more generally look like unrelated jellyfish and other cnidarians; a few could be mistaken for rocks, but such rocks disconcertingly bleed when cut open.) But no one would ever mistake any mammal for a sand dollar, a sea cucumber, or any other echinoderm, animals characterized by radial rather than bilateral symmetry. The two phyla have simply evolved in strikingly different directions. If linguistic evolution worked in the same manner, it is questionable whether translation between distant languages would even be possible. Moreover, the disparate patterns of spatial distribution of deep grammatical properties, such as the ones illustrated by the WALS maps, would not be found.

In language, deep grammatical properties can radically change, often taking on the same forms as those encountered in wholly unrelated tongues. As a result, linguistic relationships are often anything but obvious, and can only be discerned through intensive study; significantly, such hidden connections can hold true even for relatively recently emerged languages. A fluent speaker of the major Germanic languages, for example, might be nonplused to learn that Frisian is more closely related to English than it is to Dutch. Yet according to some specialists, even Low German is “phylogenetically” closer to English than it is to (High) German, even though Low German is generally regarded as a mere dialect (or group of dialects) of German!

Linguistic evolution is only vaguely analogous to organic evolution for a variety of reasons, but a crucial factor is the fact that vastly less sharing occurs across biological lineages. We now know that genes can jump from one species to another, but the process is relatively rare; in this realm, change generally occurs as a result of random mutations acted upon by natural selection, not from the borrowing of elements from other species. When it comes to languages, however, sharing is ubiquitous. Languages are almost always borrowing words, and sometimes they adopt grammatical properties of other languages as well. At times, two completely unrelated languages essentially merge to create a hybrid tongue. To be sure, linguists are almost always able to determine which language contributed more elements and more basic structures, and hence should count as the parent tongue. (It should be noted that the use of the terms “parent” and “daughter” in relation to languages is misleading since, unlike in the biological realm, where individual organisms are discrete, the transition from “parent” to “daughter” language is always gradual.) When it comes to creole languages, however, such determinations are not always easy. In regard to grammar, different creoles of completely different parentage are often more similar to each other than they are to any of their source languages. In some instances of mixed languages, admixtures of vocabulary, grammar, and phonology run so deep that linguists abandon the quest for unambiguous classification. Cappadocian Greek, for example, is slotted by Wikipedia into the seemingly impossible “Greek-Turkish” language family. Does Indo-European therefore encompass this language? Other sources, such as the Ethnologue, place this language in the Greek branch of the Indo-European family, but Turkish influences on Cappadocian Greek are pronounced: it has certain sounds that have been borrowed from Turkish, as well as vowel harmony; it has developed agglutinative inflectional morphology and lost (some) grammatical gender distinctions; and its basic word order is SOV. And Cappadocian Greek is by no means the only example of such a thoroughly “mixed language.” In the biological realm, in contrast, such mixtures are so obviously impossible that they have generated their own nonsense genre, as exemplified by Sara Ball’s delightful flip-book, Crocguphant.

Linguistic family trees must therefore be taken as often showing lines of partial descent, unlike the phylogenetic diagrams of organic evolution. To gain a more complete understanding of linguistic relatedness, it is necessary to complement language families with other kinds of connections. The various languages of a Sprachbund, or a linguistic convergence area, for example, derive from different families, yet nonetheless come to share many features through long histories of mutual interaction. One must also consider linguistic strata, which take into account the influences imposed by one language on another. The role of a linguistic substratum, derived from a previously existing language that was later supplanted by another tongue, can be profound. In many cases, such linguistic substrates were instrumental in generating subfamilies; the Germanic languages, for example, are distinct from other Indo-European languages not merely because they drifted in their own particular direction, but also because they acquired a major substrate from another (unknown) language family. Sometimes, the ghostly presence of a long extinct language or language family can be detected through such substrates. Vedic Sanskrit, for example, was definitely an Indo-European language, but it was influenced not only by the preexisting Dravidian and Munda languages of the Indian subcontinent, but also by an unknown substrate dubbed “Language X” by Colin Masica.

A useful alternative to the linguistic tree is the so-called wave model, or Wellentheorie, originally devised to explain some of the characteristics of the Germanic languages that seemed to defy the phylogenetic approach. In wave theory, fluid dialect continua replace the stable, geographically bounded languages required by models predicated on direct descent from ancestral tongues. Here, innovations can occur at any point within a dialect continuum; such changes then spread outward in a circular manner, eventually dissipating as the distance from the innovation center increases.*** If a bundle of innovations substantially overlap and become entrenched, a new dialect, or even language, can be said to have emerged. But according to wave theory, such a “language” is still best viewed as an “impermanent collection of features at the intersections of multiple circles.”
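The intuition behind the wave model can be made concrete with a toy simulation. This is a sketch under strong simplifying assumptions, not a formal statement of Wellentheorie and not the model used by Bouckaert et al.: the dialect continuum is a single line of ten villages, each innovation spreads a fixed number of villages from its center and then stops, and the names Innovation, features, and isoglosses are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Innovation:
    name: str
    center: int  # index of the village where the change starts
    reach: int   # how many villages away it spreads before dying out


def features(village, innovations):
    """Return the set of innovations that have reached a given village."""
    return frozenset(
        i.name for i in innovations if abs(village - i.center) <= i.reach
    )


def isoglosses(n_villages, innovations):
    """For each pair of neighbouring villages, list the innovations dividing them."""
    lines = {}
    for v in range(n_villages - 1):
        diff = features(v, innovations) ^ features(v + 1, innovations)
        if diff:
            lines[(v, v + 1)] = sorted(diff)
    return lines


# Three made-up innovations on a continuum of ten villages (indices 0..9).
INNOVATIONS = [
    Innovation("A", center=2, reach=2),  # reaches villages 0..4
    Innovation("B", center=5, reach=2),  # reaches villages 3..7
    Innovation("C", center=8, reach=1),  # reaches villages 7..9
]

if __name__ == "__main__":
    for v in range(10):
        print(v, sorted(features(v, INNOVATIONS)))
    print(isoglosses(10, INNOVATIONS))
```

In this toy continuum, neighbouring villages never differ by more than one feature, yet the two ends share nothing, and the boundaries of the three innovations fall in different places. No single cut divides the villages into two clean "daughter languages," which is the situation a branching tree has trouble representing.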

Wave theory does recognize, however, that a single language or dialect can appropriate an entire dialect continuum, subordinating more localized speech forms and eventually driving them into extinction, as indeed was the case with Standard German over most of Germany. Such a process generally requires the power of the state or of some other overarching institution. Geographically expansive and culturally potent organizations of this kind are a feature of the relatively recent past; for most of humankind’s existence, the institutions necessary for producing linguistic standardization over broad areas were lacking. We are so used to the modern world of mass communication over vast distances and of language-standardizing governments and educational systems that we easily forget that in earlier times, and in many remote areas to this day, different linguistic environments prevailed. Overall, we suspect that for most of human history, the wave theory more accurately captures the process of language change than does the standard phylogenetic model. Yet in the most general terms, the two models complement each other relatively well.

*Debate does rage, however, about whether the so-called “non-configurational languages,” such as the Australian language Warlpiri, have subjects and objects in the same sense as the more familiar “configurational” languages like English or French. The reader is referred to Baker (2001) for evidence of subject-object asymmetries in such non-configurational languages.

**For example, Hindi makes a phonemic distinction between aspirated and unaspirated voiced stops; has fusional case/number morphology, subject-object-verb word order, and postpositions; and uses ergative-absolutive alignment in the preterite and perfect tenses. English, in contrast, has no aspirated voiced stops (and does not use aspiration phonemically at all), has largely abandoned fusional morphology, has lost the case system except with pronouns, employs a subject-verb-object word order, uses prepositions rather than postpositions, and is characterized by nominative-accusative alignment.

***Ironically, the diffusion analogy of Bouckaert et al. may be best suited to describing dialectal continua rather than divergence and expansion of languages and language families; we shall return to this point in a forthcoming post.



Baker, Mark C. (2001) The Natures of Nonconfigurationality. In Mark Baltin and Chris Collins (eds.) The Handbook of Contemporary Syntactic Theory. Oxford: Blackwell. Pp. 407-438.


  • Jaska

    “Thus, relatedness at the family level and overall linguistic similarity often fail to correspond.”

    Again great point!

  • TʀoᴘʏʟıuM

    While the evolution of languages is indeed in several aspects rather different from that of species, could not a better comparison be found by adjusting the scale somewhat?

    The tree of life does not end at the resolution of species; it continues as a deep thicket within them. As sexual reproduction requires two parents for a new individual, the “tree” is actually a web at its most basic level; and indications of this can be seen in e.g. the recent results suggesting interbreeding between some early human populations.

    Similarly, while individual language boundaries may be fuzzied by diffusion of waves of innovation (and at the most basic level of the idiolect, a speaker can absorb influences from several sources, again leading to a web structure, only denser), at the level of major language groups a clear structure emerges. Determining if someone’s idiolect is French or Dutch (perhaps rather: Western Romance or West Germanic) is a perfectly discrete question. If a child grows up with one parent speaking French and the other Dutch, they will not end up speaking a French-Dutch hybrid, but both languages independently.

    Under this view, the resolution level of an individual language variety might be seen as akin to a race or a clan within a species. And much like not all species cleanly separate into races (or at all), not all dialect continuums cleanly separate into individual languages (or at all).

    Also, the fact that both the web of idiolects and the web of individuals resolve into essentially independent lineages at a large scale does not yet imply anything about the structure of this emergent meta-tree. Both are more akin to a physical tree with a thickness to its branches than to a purely mathematical tree structure. This kind of tree could develop in binary splits, as we know is the case for species; it could develop as messy non-binary splits, as we know may be the case when a dialect continuum begins to fragment.

    To determine, in the most general case, to what extent results of biological phylogeny may be applied to linguistic phylogeny, the mechanisms responsible for maintaining the emergent tree structure would need to be worked out. This is quite trivial for the former (impossibility of interbreeding due to geographical isolation or biological incompatibility). I am less sure we have a clear picture of the latter, given the existence of phenomena such as Sprachbunds, loanwords, code-switching and creole languages.

    (PS. Hi Jaska! This is JPys.)

    • Thank you for your thought-provoking comment! You make an interesting suggestion, but since language is not parallel to species (perhaps a better parallel being a language family to species, as you suggest), why should it be modeled as such? You also bring up another interesting point: sexual reproduction requires two parents for a new individual, but language does not work the same way…

  • Nice post, thank you.

    “In non-literate societies, words change so quickly that after some five
    to eight thousand years not enough cognates can be traced back to
    establish linguistic relatedness. In the same time span, grammatical
    structures can undergo wholesale transformations, and sound inventories
    can change drastically as well. As a result, even clearly related
    languages can have next to nothing in common with each other, and can
    only be linked through investigations into their ancestors.”

    We don’t really have any evidence that in pre-literate societies words change so quickly that there are no cognates left after 8,000 years. I think Wichmann did a study on the rate of change in small speech communities and didn’t find much difference from larger ones. “Even clearly related languages can have next to nothing in common”? Not sure this rings right – many if not all language families must be related, but this is currently not formally demonstrable. Once regular correspondences are established, similarities due to genetic causes come to the fore and become visible to a non-specialist in these languages.

    • Thank you for your comment, German!

      We didn’t say that there are no cognates left, only that there are too few for any conclusive analysis. With respect to pre-literate societies, the reference is to studies of Australian aboriginal groups, whose languages seem to be changing so fast that several generations down the line they are nearly unrecognizable to people in the group itself.

      As for the statement “Even clearly related languages can have next to nothing in common”, the reference here is to languages within a family, like say English and Hindi, which share little by way of grammar and are not obviously similar to a lay person.

  • Kian

    I love this article because it puts so many misconceptions to rest.

    When I first began to learn about languages and language families, I always thought they were a simple “black and white” matter.

    Now I realize there IS NO black and white in linguistics. Everything’s grey, just like everything else in life.

    Just as people’s genetic compositions and cultures are mixed to varying degrees, so are languages. Language classifications tend to trick beginners into thinking that it’s a simple black and white issue, which is a trap that nationalists easily fall into.

    For example, I know an Iranian who once told me that Persian is Indo-European and therefore similar to languages such as English, German, etc. But after closer observation, I saw almost no similarity between Persian and Germanic, apart from the fact they’re somehow lumped together into the same family category. Not only are the two language branches different in their word order, but they’re also different in their descriptive natures (I don’t think that’s even a term lol). For example, “Los Angeles City Stadium” would be “Stadium City Los Angeles” in Farsi. Ironically, Farsi’s way of describing the LA city stadium is identical to the Arabic manner, which is unlike the Germanic branch of IE languages. Interestingly, Turkish and Azerbaijani (which are Turkic, not Indo-European) are closer to Germanic in their “descriptive natures” than Indo-Iranian languages are to Germanic. So in Turkic’s case, Los Angeles City Stadium would be “Los Angeles Shehir (city) Stadyumu (stadium)”. So ironically, Turkish and Azeri are probably in a stronger position to integrate, to an extent, with some of the western languages (especially the Uralic and possibly Germanic languages) than Persian ever will be. And over the course of time, the gap is bound to get narrower, in part due to the cultural integration of their societies with European civilization. But in Persian’s case, although it is Indo-European, it is so distant from its genealogical relatives and is continuing to move further away with each generation. In fact, it’s probably only a word order away from being intelligible to Arabic speakers. Not only is its vocabulary mostly derived from Arabic, but even its simplest grammatical rules, such as turning a word from singular to plural, are becoming more like the rules of Semitic languages than the languages of the Eurasian phylum. Maltese, on the other hand, is an example of a language that is classified as Semitic, but is slowly becoming more similar to European languages and less similar to its Semitic “brethren”.

    So really, the whole “we’re from the same family so we must be similar” argument is nonexistent when one goes into the details. I always think about it this way: I belong to the same family as my brothers and sisters. But I have more similar traits with some of my friends than I do with my own family members.

    Thanks for the article! 🙂

    • Thank you for your comment, Kian! Let me clarify a few things. First of all, “the whole “we’re from the same family so we must be similar” argument”, as you call it, is valid. Except the similarities are not always apparent to an untrained eye (same in biology, by the way), nor are all similarities evidence of phylogenetic (familial) relationship between languages. The word order phenomena (including what you call “descriptive natures”, which is more properly referred to as the order of possessor and possessee) rarely correlate with families. Within the Indo-European family, Germanic and Romance languages are SVO (subject-verb-object), Indic and Iranian languages are SOV, and Celtic languages are VSO. And yet there are exceptions to each of these three generalizations, so even smaller groupings are not necessarily uniform. The possessee-possessor order you illustrate for Farsi with “Stadium City Los Angeles” works the same in Farsi (Indo-Iranian, Indo-European), Irish (Celtic, Indo-European), Tatar (Turkic), Hebrew (Semitic), and scores of other unrelated languages. The finer details of such constructions, surely, differ, but the overall order is the same. But these languages are not in the same family.

      When it comes to grammatical changes due to contact (borrowing), they are notoriously slow. To use your own example, Maltese has borrowed a lot of words from Italian (and more recently from English), but it “dresses them up” in Arabic “dress”, if you’ll pardon the metaphor. For example, borrowed nouns form plurals the “traditional” way, via broken plurals, whereas borrowed verbs inflect according to the same root-and-pattern method as native verbs. There are specific examples of this in chapter 5 of my book (see link on the righthand panel).

  • Eskarpas

    Why is Lithuanian classified as SVO? Word order in Lithuanian depends on the exact meaning one wants to convey; SVO, SOV, VSO, VOS, OVS, OSV are all correct, although the meaning differs slightly.

    • Indeed. Lithuanian, as well as Russian, which is quite similar in this respect, is classified as SVO because that is the default word order, used for example in “out of the blue” contexts (e.g. as an answer to “What happened?” or as a conversation starter). The other orders are also possible but can be used only when a certain element is mentioned in prior discourse. For example, if we are talking about a girl who was walking down the street, in Russian or Lithuanian one would say “A car ran the girl over” with OVS order (literally, girl-ACC ran over car-NOM) because the girl has been mentioned in prior discourse and the car (or running over) has not. Similarly, for other non-SVO orders.

      There are also some really ingenious processing experiments with Russian (though I don’t know of similar work on Lithuanian) that showed that when speakers are presented with sentences in the form “noun verb noun” where morphological forms of both nouns do not indicate which is the subject and which is the object (e.g. “trollejbus obognal tramvai”), speakers always interpret them as SVO rather than OVS. In this case, the trolley overtook the tram, not the other way around. Would it work the same in Lithuanian? (Of course, my asking you this isn’t a clean experiment, but you get the idea.)

  • michel bostrom

    As (I can only assume) a native English speaker, you may think that English has no aspirated stops. To my ear, as a native speaker of French, all English stops are aspirated, as is the case in all Germanic languages. French, like other Latin languages, has only non-aspirated stops. In Hindi, which has both, English words are almost invariably transliterated with aspirated stops. The distinction is not an arbitrary one, as is the case with many phonemes. When I hold my hand before my mouth and say “cat” in English, I feel a puff of air. If I say “canard” in French, or any other stop in any romance language, there is no such “aspiration”.

    • Thanks but no, I am not a native speaker of English…

      Regardless, however, I did not claim that “English has no aspirated stops”. If you read the post carefully, you’ll notice that I said “English… has no aspirated voiced stops (and does not use aspiration phonemically at all)”, which is perfectly correct: English does not have voiced aspirated stops (bh, dh, gh), such as Hindi has, nor does it use aspiration phonemically, that is to distinguish words with different meanings. Thus, “cat” indeed has an aspirated /k/ but not an aspirated /t/—the aspiration only occurs syllable-initially. Moreover, there is no word “cat” (that means something else) with an unaspirated /k/ or aspirated /t/. And indeed, there is no aspiration of any kind in Romance languages.

      • I’m not sure if this was part of Michel’s point or not, but I think it’s interesting that he said “To my ear…all English stops are aspirated.” A linguistically-trained native Mandarin speaker has also told me that to her ear, there appears to be some aspiration even in phonemically unaspirated English (initial) stops. I wonder if English word-initial stops do have some sub-phonemic level of aspiration occurring (after all, voicing as well as aspiration are available for distinguishing them from their phonemically aspirated counterparts when they are word-initial). Or perhaps these listeners are actually hearing the English word-initial voiced stops’ positive voice onset time (a kind of sub-phonemic-level “devoicing”) and confusing that with aspiration. Though that seems less likely for the speaker of Mandarin, in which voiceless unaspirated stops are (speaking from my own experience) nearly identical to English word-initial voiced stops, but contrasted with other Mandarin stops by aspiration rather than voicelessness.

        • Indeed, other people’s perceptions can be quite different from those of native speakers—or from the actual articulatory/acoustic properties of sounds. A fascinating topic in its own right…

  • I just read this article now, but I really enjoyed it. It’s interesting to think about the similarities and differences between biological and linguistic phylogeny, and the ubiquitous lateral transfer between languages has always struck me as a major flaw in the analogy as well.

    I would point out though that biological phylogeny (as seen in a literal “family tree”) and biological speciation are not actually the same process. The latter is a macro-process driven on the micro-level by the former. Biological phylogeny per se is no more subject to natural selection than linguistic phylogeny is – it’s only in the macro-process of speciation where natural selection begins to come into play.

    In any case, linguistic diversification is, as you argue well here, not closely analogous to biological speciation. And there’s no correspondence on a more “micro” level either: actual biological phylogeny among individuals has few parallels to the differentiation of, say, idiolects, since the latter is so dominated by lateral exchange that “phylogeny” is probably not a useful concept for describing it at all. Just more indications of Bouckaert et al. having taken the metaphor too far.

    • Thank you for your comment, Evan. Good point about micro and macro levels in biological evolution. And quite true that the metaphor has been taken too far…

  • As a side note, though the quality of this article is overall high, I was a bit disturbed by your mostly uncritical assertion of a number of alleged language universals which I doubt would be assumed by all linguists. Do you have proof that all languages structurally distinguish vowels and consonants, or could those be categories imposed arbitrarily by the Euro-centric tradition of linguistic description? The same question applies even more so for the concepts of “words” and “sentences”, and I am quite certain there are various highly-respected linguists who either question or outright reject the universality of “verbs, subjects, and objects” as natural classes. These kinds of assumptions may be part of the Generative Grammar discourse, but are dealt with much more critically by other schools of linguistics.

    Personally, I think a better way to demonstrate the universal constraints of human language would be to point out that all spoken language varieties draw from a relatively small inventory of possible sounds that can be produced using the human vocal tract. Signed languages are not subject to this particular constraint, but are similarly limited by other constraints of human physiology. I imagine there must also be universal psychological or sociological features of the way language functions, though I’m not sure whether that’s a topic which has been explored scientifically with enough thoroughness to demonstrate what exactly they are.

    • Thank you for this comment, Evan! Oh I wish you could attend the Science Festival in Rome that Martin Lewis and I have just returned from as many of these issues were discussed there. Briefly, indeed it appears to be the case that all languages distinguish consonants and vowels, words and sentences, nouns and verbs, and subjects and objects. I say “appears to be the case” because of course it’s not something one can prove: if tomorrow a new language is discovered that does not, that would be the proof of the opposite. But as far as we can tell these categories are indeed universal although languages may vary as to their treatment of these categories. Some linguists have claimed that it is not the case that one or the other of these categories is universal—but they have not shown any proof.

      On a more general note, do you think it might be interesting/beneficial for our readers if we were to post a brief summary of the talks delivered at the Festival by some of the world’s best linguists?

      • Wish I could have been there too! I guess I can’t speak for other readers, but I would certainly be interested in reading your summaries of the talks.

        I’m very curious to know what evidence was presented to support the idea that said categories are universal natural categories within languages. Certainly an outside observer can carve up any language’s vocabulary into categories that he or she then names “noun,” “verb,” etc., but to claim the categories are reflected in the structure of the language itself is something else. I should think the burden of proof would be on those claiming they are universal, not on those cautioning that they may not be.

        • Thank you for your comment, Evan! I am going to put together a post with a summary of the talks and some links, but there are some issues re: permissions and copyright that have to be worked out first!

          As for the question of universality of “noun”, “verb” etc., that’s an empirical question. One has to start by defining what one means by these labels and then see if thusly defined categories are indeed found in all languages (particularly, in those for which the question has been raised). Fortunately, this has been done. See:

          Baker, Mark C. (2003) Lexical Categories. Verbs, Nouns, and Adjectives. Cambridge, UK: Cambridge University Press.

          (I couldn’t possibly do justice to a complex issue like this in a brief comment, so I won’t even try, but it might become a subject for a future post, if there’s enough interest.)

          So yes, if the burden of proof is on those who claim them to be universal, such proof has been provided. Now it’s the burden of the non-universalist camp to engage with these arguments and to show where they’ve gone wrong (if indeed they have gone wrong, which I very much doubt). That hasn’t been done.

          • Thanks for the reply to this comment too, by the way (I realize it’s been a long time). Hopefully I can find the opportunity to read up on this more sometime.

          • Ah yes, we are still working on a YouTube video of our talk in Rome. When that’s done, I will put up a brief post with all the links.

  • JPepple

    Very interesting research and study. Has anyone ever truly been able to connect all language families into one language family yet? Indo-European languages can somewhat easily be traced to a common ancestor, but what about the connection between Afro language groups to Indo-European? All the research I can find still puts a big gap there as well as between several other major language families. Not that we have to force all the language groups to come from one common ancestor, but it is interesting. Language has a relation to culture and genetics, but also has its own unique patterns and activities. Your articles are very interesting.