Linguistic Geography

Discrepancies in Mapping Persian/Farsi in Iran

GeoCurrents is deeply concerned with language mapping, as we find maps of language distribution to be highly useful and, if done properly, aesthetically appealing. But we also tend to be critical of linguistic cartography, as the spatial patterning of language is often too complex to be easily captured in maps. Dialect continua, zones of pervasive bilingualism, overlapping lingua francas, areas of linguistic interspersion, urban/rural language discrepancies, and mobile language communities all present major challenges for the mapmaker. Differences in population density are another tricky issue. Should one map a virtually unpopulated area in the same manner that one depicts a densely populated zone? And if one decides to leave uninhabited (or mostly uninhabited) areas unmarked, how large and how unpopulated do they have to be before they appear on the map?

As a result of these and other issues, linguistic maps, whether of a particular place, an individual language, or a language family, often vary greatly from one cartographer to another. Such differences were recently brought home to me as I examined various language maps of Iran, many of which are readily available on the internet. In particular, the area covered by Farsi/Persian, the national language, differs significantly. I therefore decided to overlay these different depictions of Persian/Farsi on a uniform base-map of Iranian provinces so that they can be easily compared. Eleven such maps are posted here, both in their original form and with the Persian/Farsi zone extracted and placed on the common base map. The overlays are not particularly precise, owing largely to differences in map projection; a large amount of tedious handwork was necessary to make them accord as closely as they do with the originals. It is also important to note that the original maps themselves vary in regard to the area depicted. Some show merely Iran, but others include neighboring countries as well. The overlay maps, however, show only Iran.

The maps are arranged in rough descending order, with the first map showing the largest expanse of Persian/Farsi, and the last map showing the smallest one.

Farsi Language Map1

Depiction 1. The first map is by far the simplest, as it shows Iran as uniformly Persian-speaking. Such a depiction is accurate in one sense, as Persian/Farsi is the national language, and hence is used for official purposes throughout Iran. It also serves as the lingua franca of those parts of the country in which it is not the dominant mother tongue. The source map for Depiction 1, however, is problematic, as it purports to show the overall distribution of Persian, yet it does so entirely on the basis of national boundaries. Depicting Afghanistan and especially Uzbekistan as uniformly Persian-speaking is far from accurate.

Map of Persian Speakers










Farsi Language Map 2

Depiction 2. The source map for Depiction 2, found in the Wikipedia Commons, is oddly titled “iranethnics,” implying that it is concerned with ethnic identity rather than language per se. All of the categories mapped, however, are rooted in language, although the term “Fars” (the name of a province and, more generally, a region) is used rather than “Farsi.” In purely linguistic terms, “Fars” refers to a series of Persian dialects that are quite distinct from standard Farsi. As one Wikipedia article puts it: “Northwestern Fars is one of the Central Iranian varieties of Iran. Its name is purely geographical: It is not particularly close to Farsi (Persian), but rather to Sivandi.” The Wikipedia’s family tree of Iranian languages treats Fars as a distinct minor language, with some 100,000 speakers. On the source map for Depiction 2, however, all languages in the Iranian family are subsumed under the “Fars” category except Kurdish and Baluchi. Linguistically, this maneuver makes little sense, as the Iranian languages of northern Iran, such as Gilaki and Talysh, are more closely related to Kurdish than they are to Persian/Farsi. But it is also true that Gilaki- and Talysh-speakers tend to be much less ethnically distinct from Persians than the Kurds are. Finally, this map restricts the extent of several minority languages, particularly Arabic, more than many other language maps of Iran.

IranEthnics Map


Farsi Language Map 3

Depiction 3. The base map used for Depiction 3, also found in the Wikipedia, depicts the various languages of the Iranian family, both in Iran and neighboring countries. As non-Iranian languages such as Arabic and Azeri are not depicted, areas in which they are spoken are generally mapped as Persian speaking (“Persan,” on the French map) or at least as partly Persian speaking* (as in the case of the Azeri-speaking area). The Caspian languages (Gilaki, Mazandarani, etc.) are depicted, but only in the Alborz (Elburz) Mountains; the Caspian coast is instead shown as Persian speaking, a somewhat unusual depiction. The base map is also distinctive in elevating the Mukri dialect to the status of a separate language (even the Ethnologue, which tends to split languages, treats it as a mere dialect), and in depicting a sizable “Sangesar” area in the mountains of northern Iran. Yet according to the Wikipedia, the Sangsari language has only 36,000 speakers and is largely limited to the town of Sang-e Sar** (Mehdishahr), located south of the Alborz Mountains in Semnan Province. Related tongues in the Semnani branch of Iranian languages have similarly restricted distributions.

Iranian Tongues Map

Farsi Language Map 4

Depiction 4. The base map used for Depiction 4, found on the website of a Farsi translation service, is crude and politically compromised, as it incorrectly depicts the distribution of several languages as coincident with provincial boundaries. It incorrectly labels Azeri as “Turkish” and Balochi as “Pashto.” (In contrast to Turkish and Azeri, which are closely related, Balochi and Pashto are only distantly related, as they are members of distinct branches within the Iranian family.) It also unconventionally classifies the dialects of Farsi spoken in Khorasan as Dari, a term generally limited to Persian as found in neighboring Afghanistan. But the boundary between Farsi proper and Dari (both forms of Persian) is difficult to draw. As the Wikipedia explains:

 The dialects of Dari spoken in Northern, Central and Eastern Afghanistan, for example in Kabul, Mazar, and Badakhshan, have distinct features compared to Iranian Persian. However, the dialect of Dari spoken in Western Afghanistan stands in between the Afghan and Iranian Persian. For instance, the Herati dialect shares vocabulary and phonology with both Dari and Iranian Persian. Likewise, the dialect of Persian in Eastern Iran, for instance in Mashhad, is quite similar to the Herati dialect of Afghanistan.

IranLanguage:Ethnic Map

Farsi Language Map 5

Depiction 5. The base map used for Depiction 5, found on a website devoted to Iranian languages, is similar to that of Depiction 3, although it shows a more limited distribution of Persian.

Iranian Languages Map2







Farsi Language Map6






Depiction 6. The base map used for Depiction 6, found in the Wikipedia, is labeled “Languages of Iran.” This map shows a relatively limited distribution of Persian, barely depicting it as reaching the sea. It also shows much larger than usual Arabic- and “Lorish”-speaking areas. It subsumes Mazanderani and the Semnani languages into the “Tabari” category, although according to most analyses Mazanderani is closer to Gilaki (mapped here as a separate language) than it is to the Semnani tongues. (Significantly, the people of Mazandaran call their own tongue “Gileki.”) Oddly, the Qashqai Turkic area in Fars Province is missing.

Iran Main Languages Map







Farsi Language Map 7


Depiction 7. The base map used for Depiction 7 is found on the “Maps of Net” website and is based on Ethnologue cartography. This map also restricts the distribution of Farsi; again it barely reaches the sea, but it does so in a different place than that indicated on Depiction 6. This map shows much larger than usual areas covered by Azeri (“Azerbaijani” here), Arabic, and “Balouchi.” It also incorrectly portrays the northeastern Kurdish area as Turkic, labeling it “Khorasani Turks” and coloring it as if it were “Azerbaijani.” The extent of the Qashqai Turkic area in Fars province seems surprisingly large. Perhaps the oddest feature of this map is its exaggeration of the area covered by the southernmost Luri dialect, a very minor tongue by most accounts, and its elevation of this dialect to the status of a separate language (designated here as “Lari” to distinguish it from the “Lori” language of the north). This map also shows one uninhabited area, the Dasht-e Kavir (salt desert), in north-central Iran.

Main Ethnic Languages in Iran Map

Farsi Language Map 8

Depiction 8. The base map used for Depiction 8 is found in a Wikipedia article on Iranian languages. It shows large areas in central Iran as non-Persian speaking; presumably most of these areas are excluded by virtue of being largely uninhabited rather than by speaking a different language, but the mapping conventions make it impossible to be certain. This map also shows a much larger than usual distribution of the Balochi language, in several discontinuous patches, in northeastern Iran. As in Depiction 3, the Caspian lowland is depicted as Persian speaking.

Iranian Language Map 3






Farsi Language Map 9

Depiction 9. The base map used for Depiction 9 is found on yet another Wikipedia page. It leaves large “sparsely populated” areas in eastern and central Iran blank, thus restricting the distribution of Farsi/Persian. It depicts Lur as a separate language, but divides it into two separate areas, mapping the central Luri zone as Persian speaking. It depicts a sizable area along the Afghan border as “other,” which would presumably refer to Pashto.

Iran Ethnoreligious Map







Farsi Language Map 10



Depiction 10. The base map used for Depiction 10 comes from a Wikipedia map of ethnicity in Iran, although its categories are again based largely on linguistic criteria. This map shows sizable uninhabited areas in east-central Iran, a not uncommon maneuver, but also does the same in southeastern Iran, an uncommon move (also found in the base map for Depiction 9). Again like Depiction 9, this map portrays the central Luri areas, but not the northern and southern ones, as Persian-speaking. It depicts a highly restricted Arabic zone in both Khuzestan Province and farther south along the coast.

Iran Ethnicity Map







Farsi Language Map 11


Depiction 11. The base map used for Depiction 11 comes from an older version of the language map of Iran posted on the Gulf 2000 site, which features the extraordinarily detailed cartography of Mike Izady. This map leaves large areas of sparse population unmarked, and hence restricts the distribution of Persian more than the other maps considered here. It makes several other unusual maneuvers. Luri is mapped as a dialect of Persian, yet the Raji dialect of central Iran is elevated to the status of a separate language. The Minabi dialect of the southeast, described by the Wikipedia as “a dialect which is something between Bandari and Balochi and Persian,” is also mapped as a separate language, and a small Cushitic-speaking zone (labeled “Somali, etc.”) is depicted in the same general area. The extent of Tati, closely related to Talysh, is much greater than in any other language map of Iran that I have investigated.

Iran Languages Izady Map









I am not qualified to assess which of these maps is the most accurate, and I hesitate to say whether such an assessment can even be made. I welcome feedback from readers on these and other issues pertaining to these maps.

*Note: for all depictions, areas shown as mixed between Farsi/Persian and some other language are left unmarked.


**This small city has an interesting recent history. According to the Wikipedia, “The primary religious belief in the area now is Shi‘ite Islam, but before the Islamic Revolution, there were many Bahá’ís in Sangsar, who had to migrate from the city after the revolution, due to a wide range of persecutions. As for other towns of Iran, the name has thus been changed by the Islamic authorities into Mahdishahr as if to signal its imposed pure Muslim identity. Mahdi is the Shia Muslim hidden Imam and Shahr means town in Persian, so Mahdishahr literally means town of Mahdi.”



How Large Was the Area in Which Proto-Indo-European Was Spoken?

As the current series on the origin and expansion of the Indo-European languages nears its completion, only a few remaining issues need to be discussed. Today’s post examines once again the mapping by Bouckaert et al. of the area likely occupied by the speakers of Proto-Indo-European (PIE). The focus here, however, is not on the location of this ancestral linguistic homeland, which they situate in southern Anatolia, but rather on the size of the area over which the language was supposedly spoken. The area so depicted on their maps, it turns out, is almost certainly much too large to be credible. By mapping a Neolithic language as covering almost one hundred thousand square kilometers, Bouckaert et al. demonstrate, yet again, a fundamental failure to understand the basic patterns of linguistic geography.   

Bouckaert et al. give a surprisingly precise figure for the area that their model indicates as the probable homeland of proto-Indo-European: 92,000 km², roughly equivalent to the extent of Hungary or of the American state of Indiana (see the yellow polygon in the map to the left). But given the characteristically opaque phrasing of the authors, it is not immediately clear if this zone is supposed to represent the actual (likely) spatial extent of the PIE-speaking community, or if it is merely supposed to show the broader area in which a much more spatially restricted language group was located. One can deduce, however, that the former argument is being advanced based on the authors’ framing of the spatial hypotheses supposedly advanced by two different proponents of the steppe theory:

The areas of the hypotheses are approximately 92,000 km² for the Anatolian hypothesis, 421,000 km² for the narrow Steppe hypothesis, and 1,760,000 km² for the wider Steppe hypothesis. So, these areas show a bias toward the Steppe hypothesis; the area covered by the narrow Steppe hypothesis is more than four times larger than that of the Anatolian hypothesis. Likewise, the area covered by the wider Steppe hypothesis is more then (sic) 19 times larger than that of the Anatolian hypothesis.
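The ratios quoted in this passage are easy to check from the three area figures. The following is only a minimal Python sketch of that arithmetic; the dictionary and function names are my own labels and do not come from Bouckaert et al.

```python
# Areas in square kilometers, exactly as quoted from Bouckaert et al.
# The names below are illustrative labels, not terms from their paper.
areas_km2 = {
    "Anatolian": 92_000,
    "narrow Steppe": 421_000,
    "wider Steppe": 1_760_000,
}


def area_ratio(name, baseline="Anatolian"):
    """Return how many times larger one hypothesis area is than the baseline."""
    return areas_km2[name] / areas_km2[baseline]


for name in ("narrow Steppe", "wider Steppe"):
    print(f"{name}: {area_ratio(name):.1f} times the Anatolian area")
# narrow Steppe: 4.6 times the Anatolian area
# wider Steppe: 19.1 times the Anatolian area
```

Both results agree with the "more than four times" and "more than 19 times" stated in the quotation.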

As can be seen in the map posted here, the area outlined by the “narrow Steppe hypothesis” fits precisely within the area demarcated by the “wider steppe hypothesis.” Such a depiction would not be logical if Bouckaert et al. were proposing that these “areas” were merely the proposed zones in which a more spatially restricted language had been located, as opposed to the probable zone that such a language actually covered. If the latter meaning had been intended, the “narrow Steppe hypothesis” would merely be a more precise version of the “wider Steppe hypothesis” rather than a different “hypothesis” altogether. One can thus conclude that the authors intend the yellow polygon to indicate the area over which Proto-Indo-European had been spoken, as posited by their model with the given parameters of uncertainty.


In the modern era, and to a significant extent across the past several thousand years, there is nothing unusual in a single language being spoken over a 92,000 square kilometer block of territory. But for such a situation to obtain, expansive spatial connectivity is necessary, which in turn depends on the power of the state or of some other form of social integration. In the world of Neolithic farmers, such regionally integrative institutions were almost certainly lacking, and as a result linguistic communities would have been much more spatially restricted. Such spatial limitations would have been even more pronounced in areas characterized by rough topography and formidable mountain ranges, as such barriers impede communication and thus enhance social and linguistic fragmentation. Yet as can be seen in the map posted here, Bouckaert et al. place the PIE homeland precisely in such a location. A single language spoken by tribal farmers over such a vast expanse of broken topography is all but impossible.

The situation in regard to the homeland identified by the steppe hypothesis would have been different. Under conditions of equestrian-oriented pastoral nomadism, linguistic communities could have occupied much larger territories than those found among agriculturalists living at the same time. The relatively flat topography of the steppe zone, moreover, would have allowed relatively easy communication among scattered groups. Sizable seasonal aggregations, often of a ceremonial nature, are also common under such circumstances, enhancing social solidarity over a broad expanse of land. But even given all of these considerations, the 421,000 km² and the 1,760,000 km² figures noted by Bouckaert et al. for the PIE homeland in two versions of the “steppe hypothesis” are still improbable. Geographically aware theorists thus tend to argue only that the original PIE homeland was situated in the western steppe zone, not over its full extent.

We cannot, of course, determine the areal extent of any prehistoric language, as the needed documentary evidence is lacking. It is tempting to associate specific languages with archeologically attested “cultures” that can be mapped, but it must be recalled that language often fails to correspond to groups defined on the basis of shared material culture; consider, for example, the “Pueblo Indians” and the Northwestern cultures of indigenous North America, both of which were highly multilingual, even at the language family level, yet substantially shared the same material cultures. Material culture, after all, is much more dependent on—and serves in part as an adaptation to—the physical environment, whereas languages seldom co-vary with physical geography; there is no way in which a certain word order pattern, or morphological type, or sound system would be more appropriate for any given landscape. All that we can do, therefore, is argue on the basis of contemporary analogues. Here we find that the areas covered by linguistic communities in those parts of the world that maintained “Neolithic” agricultural systems and forms of socio-political organization into modern times were of a restricted spatial scale. The archetypical location here is New Guinea, which is to this day characterized by pronounced linguistic fragmentation, as can be seen in the map posted here. One might object, however, on the basis that New Guinea is an extreme case and as such should not be used for comparative purposes. But in historically stateless areas elsewhere in the world, even where Neolithic technologies were superseded millennia ago, highly restricted linguistic territories remained the rule, as can be appreciated from the language map of central Nigeria posted here.* Maintaining a single language over an area as large as Hungary in such a context is highly unlikely, to say the least.

Similar objections apply to the mapping of the proto-languages of the major IE branches in Bouckaert et al. One must again consider the authors’ intentions in regard to their portrayal of these languages. It is not exactly clear, for example, what they mean by “the inferred location at the root of each subfamily is shown on the map” (see the map caption posted to the left). The “inferred location” of what? Presumably, they mean the inferred location “of the root,” and presumably “the root” refers to the proto-language that later generated each IE branch. It is still not clear, however, whether the colored areas are supposed to indicate the likely locations over which these proto-languages were spoken, or whether they merely show the probable zones in which much more spatially restricted languages were spoken. If the former scenario is indeed the case, the areas depicted are again much too large.

Of the “root languages” mapped on this figure, that of the Indo-Iranian languages is the most preposterous. The previous post specified most of the problems associated with this inferred location. The map posted here also shows the extraordinary disconnection between the existing archeological evidence and the spatial hypothesis advanced by Bouckaert et al. I would further note that the area they advance for the origin of the Indo-Iranian languages makes no sense from the standpoint of physical geography. Its western apex is located in the middle of the uninhabitable Dasht-e Kavir (Great Salt Desert), its central portion is situated in the heights of the Hindu Kush, and its eastern extremity lies in the fertile plains of Punjab. It is unthinkable that any sedentary Neolithic population would have occupied such a territory at any given point in time.

*One could, however, argue that New Guinea and central Nigeria are highly linguistically diverse in part as a function of time. Both areas have been inhabited by modern humans for a very long period. Most of Eurasia has been populated by Homo sapiens sapiens for considerably less time than West Africa, and to some extent even New Guinea (the presence of Neanderthals probably impeded the movement of modern humans into western Eurasia for millennia). As a result, one might expect somewhat greater linguistic differentiation in those places as compared to southern Anatolia. But it is also true that the Americas, which had been populated by modern humans for less time than western Eurasia, were also characterized by pronounced linguistic diversity. Significantly, agricultural areas in pre-Columbian North and South America that were not occupied by state-level societies were characterized by spatially restricted language groups.


The Linguistic Geography of the Wikipedia

One of the highlights of the Association of American Geographers meeting last week in Seattle was the annual Geography Bowl. Student teams competed to answer all manner of geographical questions, including a few that were devilishly difficult. The most impressive answer may have come in the final round, when the two remaining teams were asked to list the five top languages, after English, used in Wikipedia articles. The Middle Atlantic team buzzed in almost immediately, and one of its members confidently and correctly recited, “German, French, Polish, Italian, and Spanish.”

Both the lack of Chinese and the presence of Polish seemed extraordinary, prompting me to query the team after the contest. The response referenced the well-known cultural pride of the Poles, as well as the fact that Polish has roughly 40 million speakers, a considerable number.

An article in the “meta-wiki” provides detailed information on the use of the 281 languages in which Wikipedia articles have been written. The table posted lists the top fourteen of these languages, with their respective number of articles (in rounded figures). As one can see, Chinese is represented here, coming in twelfth place, between Swedish and Catalan. Such a showing is hardly impressive, however, considering the fact that more than a billion people speak Mandarin Chinese, whereas only around 10 million speak Swedish and 11.5 million Catalan. But neither is the showing of the top Wikipedia language, English. To demonstrate relative Wiki language standings, I calculated the number of articles per 1,000 total* speakers for each of the top fourteen Wikipedia languages. Here English is far surpassed by a number of other languages. Considering the fact that most Swedish, Dutch, and Norwegian Wikipedia users are fully fluent in English, the quantity of articles appearing in their native languages is impressive indeed. (Admittedly, articles in languages other than English are often translated from an English original.)

Overall, European languages dominate the Wikipedia list. A number of major non-European languages rank relatively high (Vietnamese coming in 17th place, Korean 21st, Indonesian 22nd, and Arabic 25th), but they are still surpassed by European languages with far fewer speakers. Several important Asian languages, moreover, rank very low: Bengali, for example, with more than 230 million speakers, is outranked by Luxembourgish, Welsh, and Icelandic, none of which even approaches one million speakers. Sub-Saharan African languages are least represented. Swahili ranks a respectable 75th, with more than 21,000 articles, but Hausa, a major language spoken by 43 million people, ranks 245th, with only 263 articles. By this metric, Hausa is bested by such obscure tongues as Norfolk and Nauruan, and even by long-deceased Gothic.

Another notable feature of the list is the relatively large number of articles written in non-national European languages, many of which are often regarded as mere dialects. In Spain alone, Asturian is used for more than 14,000 articles, Aragonese for more than 25,000, and Galician for more than 70,000. Local linguistic pride along with regionalism and sub-state nationalism are no doubt responsible for such elevated numbers. Such processes are largely but not entirely limited to Europe. In the Philippines, the obscure tongue of Waray-Waray (3.5 million speakers) has an amazingly large Wikipedia presence, its 102,000 articles far over-shadowing the 51,000 written in the national language Tagalog (Filipino).
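The articles-per-1,000-speakers measure described above can be illustrated with the few article and speaker counts that are quoted in the text itself (Hausa and Waray-Waray). This is only a rough Python sketch: the function name is my own, and the speaker counts are the rounded figures given in this post, which may not match the "total speakers" numbers behind the table.

```python
def articles_per_thousand_speakers(articles, speakers):
    """Number of Wikipedia articles per 1,000 speakers of a language."""
    return articles / (speakers / 1000)


# (article count, speaker count) as quoted in this post; rounded figures.
figures = {
    "Hausa": (263, 43_000_000),
    "Waray-Waray": (102_000, 3_500_000),
}

for language, (articles, speakers) in figures.items():
    rate = articles_per_thousand_speakers(articles, speakers)
    print(f"{language}: {rate:.3f} articles per 1,000 speakers")
# Hausa: 0.006 articles per 1,000 speakers
# Waray-Waray: 29.143 articles per 1,000 speakers
```

On these figures, Waray-Waray has several thousand times as many articles per speaker as Hausa, which is the kind of contrast the per-speaker measure is meant to expose.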

A final oddity is the relatively high rankings of artificial languages. Almost as many Wikipedia articles are written in Esperanto as Arabic, and the constructed language of Volapük bests Hebrew, Hindi, Thai, and Greek. Ido, with an estimated 100-1200 speakers, boasts more than 21,000 articles, while Interlingua has more than 5,000, Novial more than 2,500, Interlingue (“Occidental”) almost 2,000, and Lojban over 1,000. So-called dead languages are also reasonably well represented, with Latin being used for more than 52,000 articles, Old English (Anglo-Saxon) for 2,600, and Pali for 2,300. Artificial languages from fictional societies, however, do not make the list, even though such tongues as Na’vi and Klingon have plenty of aficionados. The explanation comes in a footnote: “The Klingon language edition of the Wikipedia is no longer hosted by Wikimedia and is now hosted by Wikia as Klingon Wiki.”

* As opposed to native speakers.