Intriguing Features on the Oxford Map of the English Wikipedia

Submitted on November 30, 2014 – 11:45 am
As a habitual Wikipedia reader, I am particularly intrigued by the map and article entitled “Mapping English Wikipedia” found at Information Geographies (at the Oxford Internet Institute). Here, almost 700,000 dots have been placed on a world map to show the locations of geotagged articles in the English-language Wikipedia. As the authors explain:

Not all articles are geotagged, but almost all articles about events and places tend to be. The data in this map were all taken from November 2011 Wikipedia data dumps. Our project team wrote a script to search for coordinate representations in every article (taking into account the varying ways in which geo-coordinates are expressed). We improved the quality of our coordinates by doing things like eliminating or fixing erroneous coordinates, grabbing coordinates (where sensible) from not just structured infoboxes, and making sure to remove irrelevant coordinates (Wikipedia actually contains a lot of coordinates for extra-terrestrial entities like lunar craters!).

The results are interesting. As the authors understatedly note, “there is clearly a lot of unevenness in the amount of content about places, and large parts of our planet are still invisible from these digital augmentations…” The unevenness of coverage is indeed conspicuous, but much of it is to be expected. It is hardly surprising, for example, that vast reaches of sparsely populated land in northern Siberia are largely bypassed by Wikipedia. I am more perplexed, however, that a few uninhabited and remote places, such as South Georgia Island, are fully covered by yellow dots, whereas some densely populated and easily accessible areas, such as China’s Shandong Peninsula, are mostly unmarked. (I suspect that the attention given to South Georgia stems in part from popular interest in the survival story of the Shackleton Expedition.) But regardless of this South Georgia oddity, the relative paucity of coverage of China is surely one of the map’s more striking features.

India is much more heavily covered in the English Wikipedia than China, as might be expected, considering the widespread use of English in India along with the British colonial legacy. But colonial legacies, like the geographies of language, are generally not easy to see on the map. Consider, for example, its portrayal of West Africa, visible in the first set of detailed maps. Here the Gambia can be made out, but otherwise political borders are not discernible, even though several of them separate Anglophone from Francophone countries. I have roughly outlined Ghana to emphasize this point. Notice as well the concentrated clusters of dots in Burkina Faso to the north of Ghana. As Burkina Faso is a poor, somewhat marginal, Francophone country, its prominence in the English Wikipedia is noteworthy.

Only in a few parts of the world are political boundaries visible on the map. The clearest example is Eastern Europe; here Poland stands out in sharp contrast to Ukraine and Belarus. The heavy English Wikipedia coverage of Poland is intriguing, as is that of Estonia and Moldova. Estonia is noted for its tech-savvy population, and hence its standing in the encyclopedia is not too surprising, but I am mystified by the blanket coverage of Moldova, Europe’s poorest country.

Equally mysterious to me are the patches of concentrated Wikipedia coverage in upper Burma. As the set of maps showing southern Asia indicates, Wikipedia reporting on India and across much of Southeast Asia matches population distribution relatively well. In southwestern China, however, this connection collapses; sparsely populated Tibet receives roughly the same coverage as densely populated Sichuan. I find it remarkable that one cannot even pick out the major metropolitan areas of Chengdu and Chongqing, both of which stand out very clearly on earth-at-night satellite images.

Political boundaries are evident in several other parts of the world. Armenia and Azerbaijan, for example, are easily discernible, although the “blob of yellow” that covers both countries also oddly extends into Iran, a pattern that is only partially explicable on the basis of population density. Second-order political boundaries are vaguely evident in the Midwest of the United States, where the western and southern boundaries of Minnesota can be distinguished, as can the state boundaries along the Mississippi and Ohio rivers. In the Great Plains of the United States and Canada, linear features on the map correspond to roads and railways. This feature is particularly evident in northern Ontario and Manitoba, where two rail lines appear as long lines of yellow dots.

If any readers have any ideas about the unusual features found on this map, I would be very interested to hear them.

  • nachasz

    I think that the Polish case is relatively easy to explain. These geotagged articles about locations have been created by several wikibots that used databases from the Central Statistical Office. For example, the article about the village of Pułankowice has been created and revised entirely by bots.

    • Thanks, that is very helpful. The existence of such bots does seem to detract from the map.

      • Jeronimo Constantina

        What about non-English language Wikipedias? Can their articles also be traced in terms of geotagging? And can the presence of wikibots also be demonstrated for these?

  • Fedor Manin

    Yes, as nachasz points out, many of the political boundaries are probably due to bot-created articles that come from databases of hamlets and such. Probably the borders between Midwestern and Southern states are related to differences in local government structure: in the Midwest all counties are further subdivided into “townships”, which are distinct from cities, towns, and villages. This extra level of local government means a lot of extra Wikipedia articles with fairly little content.

    A more interesting measure would be to pick out geotagged articles longer than some specified number of characters, such that it excludes most of these very short articles about administrative divisions.

    • Excellent points. It does seem that these problems should have been realized by the people who made the map.

  • The “blob of yellow” extending into Iran below the Azerbaijan Republic is fairly similar to the Azeri-speaking areas of Iran. Assuming these articles in the Azerbaijan Republic were produced by human users who speak Azeri, their interest would predictably extend to the southern areas of cultural Azerbaijan.

  • Poland is part of the European Union, and Britain is a lot richer than Poland. Therefore there’s quite a bit of migration (to the point that there are a number of running jokes about it). Ukraine isn’t in the EU, so there isn’t as much mileage in Ukrainians learning English to emigrate. I have no explanation for Moldova.

  • Novelty Nostalgia

    Could the northern Burma articles be about the Second World War? The Burma campaign was an important one for the Commonwealth forces in that war.

  • Alexander Richards

    I do wonder whether China’s low coverage is due to the Great Firewall. In general, Wikipedia relies on having local people who can access databases of existing geotagged information. I would suspect that there is, in general, something of a paucity of this due to the more recent technological and economic boom in China, and much of what is present may not be easily available online. In addition, Wikipedia is at least partially blocked in China, which probably means that the ability of those with such information to edit it is limited.
    I also would not be surprised if in the intervening 3 years (and about a million extra articles) China’s coverage has improved somewhat.

    • Tembo

      That’s a good point. And, the fact that Taiwan, Hong Kong, and Macau have quite a few of them, whereas Mainland China only has relatively few, would seem to support that point.

  • Tembo

    Looking at the map, it is interesting to me that Japan has a lot of geotagged articles, but South Korea doesn’t seem to have so many. (There actually seem to be more in the Philippines than in the Republic of Korea.)

    This appears to be especially noteworthy, given how fast the internet is in that country, as well as how many computers (and internet cafes) there are, how widespread smartphone usage is, and how many people are studying English here. In fact, it looks like the Philippines, which is fairly poor, has quite a few more of those articles than Korea (a wealthy and developed country) does.

    I wonder why that is?

    Another interesting thing I noticed, when I looked at that, was that Kenya (particularly the corridor between Mombasa and Nairobi) had a lot of geotagged articles, as did southern Uganda (around Kampala). And, both Rwanda and Burundi had a fair number of them. However, Tanzania appeared to have relatively few, even though it is in the East African Community, along with the other four states already mentioned.

    • D. Schwartz

      South Korea does, though, have a dense corridor running from the Seoul area to Taejon to Daegu, following the highways.

      Further, it does hit many of the major population centers in the country and has a dense patch in the SE corner reflecting the population center there.

      I would have to ask, though: is English teaching in South Korea as common as it is in Japan and the Philippines?

      • Evan Derickson

        Not yet. The market for English teaching jobs in Japan is very developed and fairly competitive. South Korea is less so, as fewer qualifications are required for English teaching jobs there. China is still trying to meet demand for native English speakers to teach, and it’s much easier to find a job there than in Japan and somewhat easier than in SK. The low proportion of English speakers in the Chinese population, combined with government censorship, explains China’s appearance on the map pretty well.

        • D. Schwartz

          Thank you for that.
