deepblue
I have been trying to get uniform naming (language-wise) for administrative divisions, so far with limited success. I suspect this challenge may be common to others, so I thought I'd share what I have so far.
As an example, Geonames' Administrative Division of Finland page has a nicely uniform list in not just one but two (local) languages. But, from exploring the exported data, I haven't found how that can be re-created.
Looking at the main table (fclass='A' AND fcode='ADM1' AND country='FI'),
* some are in Finnish:
Kainuu
Kanta-Häme
Keski-Pohjanmaa
Kymenlaakso
Paijat-Hame Region (maybe this is the English name?)
Pirkanmaa
Pohjanmaa
Pohjois-Karjala
Pohjois-Pohjanmaa
Pohjois-Savo
Satakunta
Uusimaa
Varsinais-Suomi
* some are in English:
Lapland
South Karelia Region
South Ostrobothnia
Southern Savonia
* and one in Swedish:
Mellersta Finland
About the last one: Swedish is not the main language of that region (Finnish is), so I have no idea why this one is in Swedish. That specific entry may need fixing; I tried to edit it, but I don't have enough access to edit the name, which is fair enough. The point, however, is that there's a mix of languages.
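For anyone who wants to reproduce the listing above from the dump file rather than a database, here is a minimal sketch. It assumes the tab-delimited, 19-column layout described in the GeoNames dump readme (name in column 2, feature class in 7, feature code in 8, country code in 9). The sample rows and their ids are made up for illustration; `adm1_names` and `sample_row` are hypothetical helper names.

```python
import csv

# Zero-based column positions, assuming the 19-column GeoNames dump layout.
NAME, FCLASS, FCODE, COUNTRY = 1, 6, 7, 8
NUM_COLUMNS = 19


def adm1_names(lines, country):
    """Return the 'name' of each first-order division (A/ADM1) of `country`."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row[NAME]
        for row in reader
        if len(row) >= NUM_COLUMNS
        and row[FCLASS] == "A"
        and row[FCODE] == "ADM1"
        and row[COUNTRY] == country
    ]


def sample_row(geonameid, name, fclass, fcode, country):
    """Build one made-up dump line; unused columns are left empty."""
    cols = [""] * NUM_COLUMNS
    cols[0] = str(geonameid)
    cols[NAME], cols[FCLASS], cols[FCODE], cols[COUNTRY] = name, fclass, fcode, country
    return "\t".join(cols)


# Illustrative rows only; the ids are placeholders, not real geonameids.
rows = [
    sample_row(1, "Kainuu", "A", "ADM1", "FI"),
    sample_row(2, "Mellersta Finland", "A", "ADM1", "FI"),
    sample_row(3, "Helsinki", "P", "PPLC", "FI"),
    sample_row(4, "Example Division", "A", "ADM1", "XX"),
]
print(adm1_names(rows, "FI"))
```

With a real dump, the same function should work on an open file handle (for example the country file opened with encoding="utf-8") instead of the in-memory list.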
Then I tried the data in admin1CodesASCII.txt. That improved the particular case of "Mellersta Finland" by using its English name (Central Finland), but it made things "worse" for some other countries. When using just 'name' from the geonames table, those countries had all names in the same local language (good); with admin1CodesASCII.txt, only some of their entries come out in English, so the list is again not uniform.
It's a fact that many administrative division names don't have English translations, so perhaps having them in the local language is what can make the listings uniform. So I turned to the alternateNames table to try to get just local-language names instead.
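To see exactly where the two sources disagree, something like the following can help. It is only a sketch, assuming admin1CodesASCII.txt is tab-delimited with the columns code, name, ASCII name, geonameid. The ids and the codes "FI.A" and "FI.B" are placeholders, and `load_admin1` and `name_differences` are hypothetical helper names.

```python
def load_admin1(lines):
    """Map geonameid -> name, assuming columns: code, name, asciiname, geonameid."""
    by_id = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        _code, name, _ascii_name, geonameid = parts[:4]
        by_id[int(geonameid)] = name
    return by_id


def name_differences(main_names, admin1_names):
    """List (geonameid, main-table name, admin1 name) where the two differ."""
    return [
        (gid, name, admin1_names[gid])
        for gid, name in sorted(main_names.items())
        if gid in admin1_names and admin1_names[gid] != name
    ]


# Illustrative data only: placeholder ids and placeholder admin1 codes.
main_names = {1: "Kainuu", 2: "Mellersta Finland"}
admin1_lines = [
    "FI.A\tKainuu\tKainuu\t1\n",
    "FI.B\tCentral Finland\tCentral Finland\t2\n",
]
for gid, main, admin1 in name_differences(main_names, load_admin1(admin1_lines)):
    print(gid, main, "->", admin1)
```

Running this per country gives a quick count of how many entries would switch language if admin1CodesASCII.txt were used instead of 'name'.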
Hence, I tried looking up an alternate name for each administrative division, going through the languages in the order given by the Languages field of countryInfo.txt and sorting the candidates by the isPreferred field. That brought some improvements, but I quickly realized that the local language of many countries comes in a different script (Chinese, Cyrillic), which I can't use for my purposes. I dabbled with automatic transliteration, which isn't great but worked up to a point: some entries have multiple alternate names in the main language of the country, some of them fairly uncommon, with no isPreferred flag to indicate the best/correct one, effectively making things worse for some countries.
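Roughly, the lookup described above can be sketched as follows. This is my own simplification, not necessarily how GeoNames builds its pages: it assumes the Languages field is a comma-separated list of tags such as "fi-FI,sv-FI,smn", and that each alternate name carries a language code and a preferred flag. `AltName`, `language_order` and `pick_local_name` are hypothetical names, and the preferred flags in the sample data are made up.

```python
from typing import NamedTuple


class AltName(NamedTuple):
    geonameid: int
    lang: str        # language code of the alternate name
    name: str
    preferred: bool  # the isPreferred flag


def language_order(languages_field):
    """Turn 'fi-FI,sv-FI,smn' into ['fi', 'sv', 'smn'], keeping the order."""
    order = []
    for tag in languages_field.split(","):
        lang = tag.strip().split("-")[0].lower()
        if lang and lang not in order:
            order.append(lang)
    return order


def pick_local_name(alt_names, languages_field):
    """Return (name, ambiguous) for the first language with a match, else None.

    `ambiguous` is True when several candidates tie, i.e. when no single
    preferred name singles out the best one.
    """
    for lang in language_order(languages_field):
        candidates = [a for a in alt_names if a.lang == lang]
        if not candidates:
            continue
        preferred = [a for a in candidates if a.preferred]
        pool = preferred or candidates
        return pool[0].name, len(pool) > 1
    return None


# Illustrative data: placeholder id, made-up preferred flags.
alts = [
    AltName(2, "sv", "Mellersta Finland", False),
    AltName(2, "fi", "Keski-Suomi", True),
    AltName(2, "en", "Central Finland", False),
]
print(pick_local_name(alts, "fi-FI,sv-FI,smn"))
```

The `ambiguous` flag is the part that matters for the problem above: it marks the entries where several names exist in the same language and nothing says which one is best, so they can at least be listed for manual review.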
Meanwhile, I also checked the web services (e.g. children for each country), but got the same mixed language results.
So, at the moment, I'm at a dead end, as I have run out of ideas of what else to try.
It would be great to know whether Geonames' Administrative Division pages are created solely from the exported data, or whether they use something else (FIPS? ISO?) to get their uniform naming in terms of language.