
GeoNames Forum
2 and 3 digit ISO language codes
crewbaby



Joined: 15/06/2006 16:10:07
Messages: 7
Offline

Hi,

I noticed that in the alternativenames dump, the col002 field for the ISO language code has entries with both 2 and 3 letter codes.

I know there are many languages for which there is no 2 letter code, so they must be expressed in 3 letters, but why not have consistency in the cases where both codes are available?

For example, there are Russian names tagged as 'ru' and other Russian names tagged as 'rus'.
marc



Joined: 08/12/2005 07:39:47
Messages: 4094
Offline

Hi Crewbaby

You are right, it would be better to have consistency. I just haven't had the time until now to write a small consistency enforcer routine.
Do you happen to have one written in Java and would you like to share the code with us? The function should take an ISO language code as a single argument and return the 2 letter code if one is available; otherwise it should return the input code.
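
For readers looking for a starting point, here is a minimal sketch of the routine described above. It is not the code GeoNames uses; the class name IsoLanguageCodes and the method name toShortestCode are made up for illustration. It assumes the language table that ships with the JDK (java.util.Locale) is good enough as the source of 2 letter codes and their 3 letter counterparts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical helper, not part of GeoNames.
class IsoLanguageCodes {

    // Three-letter code -> two-letter code, built once from the JDK's own table.
    private static final Map<String, String> THREE_TO_TWO = buildThreeToTwo();

    private static Map<String, String> buildThreeToTwo() {
        Map<String, String> map = new HashMap<>();
        for (String two : Locale.getISOLanguages()) {
            try {
                String three = Locale.forLanguageTag(two).getISO3Language();
                if (!three.isEmpty()) {
                    // Keep the first two-letter code seen for a given three-letter code.
                    map.putIfAbsent(three, two);
                }
            } catch (MissingResourceException e) {
                // The JDK knows no three-letter code for this entry; skip it.
            }
        }
        return map;
    }

    // Returns the two-letter code if the JDK knows one for the input,
    // otherwise returns the input unchanged.
    static String toShortestCode(String code) {
        if (code == null) {
            return null;
        }
        String key = code.trim().toLowerCase(Locale.ROOT);
        return THREE_TO_TWO.getOrDefault(key, code);
    }
}
```

With this sketch, IsoLanguageCodes.toShortestCode("rus") gives "ru", while a code such as "frp" comes back unchanged. One caveat: the JDK reports the terminology form of the 3 letter codes (for example "deu" rather than "ger"), so bibliographic variants would need an extra lookup table.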

Marc

crewbaby




Hi Marc,

I'm afraid I'm not much of a Java guy myself. I have a friend who might be able to help, though. I'll let you know by tomorrow if he can contribute.

I was just messing around with the codes in Excel and found:

1- 3-letter codes that map to a 2-letter equivalent
2- 3-letter codes that have no equivalent
3- 3-letter codes that don't seem to mean anything (can't find them in the ISO table)

What's up with #3? The offending codes are:

als, frp, ksh, lmo, nrm, pdc, pih, pms, rmy, vec, vls

Any thoughts on what these are, and how to deal with them?
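
The three-way split described above can also be done in code rather than in Excel. The following is only a sketch: the class LanguageCodeClassifier and its Category names are hypothetical, the 2 letter lookup relies on the JDK's built-in language table, and the set of valid ISO 639-3 codes is assumed to be supplied by the caller (for example, loaded from the SIL code table), since the JDK does not ship one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Set;

// Hypothetical helper, not part of GeoNames.
class LanguageCodeClassifier {

    enum Category {
        HAS_TWO_LETTER_EQUIVALENT, // case 1
        THREE_LETTER_ONLY,         // case 2
        UNKNOWN                    // case 3: not found in the supplied ISO table
    }

    // Three-letter code -> two-letter code, taken from the JDK's language table.
    private static final Map<String, String> THREE_TO_TWO = buildThreeToTwo();

    private static Map<String, String> buildThreeToTwo() {
        Map<String, String> map = new HashMap<>();
        for (String two : Locale.getISOLanguages()) {
            try {
                String three = Locale.forLanguageTag(two).getISO3Language();
                if (!three.isEmpty()) {
                    map.putIfAbsent(three, two);
                }
            } catch (MissingResourceException e) {
                // No three-letter code known to the JDK; skip this entry.
            }
        }
        return map;
    }

    // knownIso6393Codes is an assumption: the caller provides the valid
    // ISO 639-3 codes in lower case.
    static Category classify(String code, Set<String> knownIso6393Codes) {
        String key = code.trim().toLowerCase(Locale.ROOT);
        if (THREE_TO_TWO.containsKey(key)) {
            return Category.HAS_TWO_LETTER_EQUIVALENT;
        }
        if (knownIso6393Codes.contains(key)) {
            return Category.THREE_LETTER_ONLY;
        }
        return Category.UNKNOWN;
    }
}
```

Running every distinct code from the dump through classify and listing the UNKNOWN ones would give the candidates for manual review.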
marc




Most of them are local dialects, see here:

http://www.sil.org/iso639-3/

And I have just seen that "als" is wrong. The alternate names in this language are from Wikipedia (example: http://als.wikipedia.org/wiki/Basel ). Up to now I was under the impression that Wikipedia was using ISO language codes, but at least in this case it is not. In ISO, "als" is "Albanian, Tosk", but Wikipedia is using "als" for 'Alemannisch' (a German dialect).

Let me know if you find other errors of this type.

Marc

crewbaby




Well spotted.

I've been looking over these language codes and there is still one that doesn't make sense: 'nah'. I can't find any ISO representation for it, and there are three entries for it in the alternatenames dump.

Also, I copy below the list of ISO 639-3 codes for which there don't appear to be any 639-2 or 639-1 equivalents; there aren't very many, and perhaps by eyeballing them you can see if they are properly mapped.

als ---> Tosk Albanian (the one you found already)
frp ---> Franco-Provençal
ksh ---> Kölsch
lmo ---> Lombard
nrm ---> Narom
pih ---> Pitcairn-Norfolk
pms ---> Piemontese
rmy ---> Vlax Romani
rup ---> Macedo Romanian
vec ---> Venetian
vls ---> Vlaams

Finally, I think it might be helpful to people to include in the dump a txt file with all these ISO codes/languages. I will email this to you now, as I already have it done, and leave it to you to decide whether to include it or not.

PS: my Java friend is not available, so I can't help with the method.

marc




Thank you Crewbaby, your list of ISO languages is now included in the alternativenames zip file.

For the other readers:
"nah" is a language called "Nahuatl". It is included in the newest file Crewbaby sent me.

I have found this paragraph in the "als" Wikipedia:

We know. When the Alemannic wikipedia was set up, Alemannic had no
code and since it was the Alsatian edition back then, als was
chosen. Recently we got a code (gsw), but it is only for Swiss
German and we decided to wait until there is a code for all variants
until we move the domain.
 


Marc

Anonymous



AFAIK all codes in Wikipedia are ISO 639-3, apart from "als", which really is a mistake. A much wider list can be built at wiktionaryZ.org, where a much larger number of them are used. You can also check Ethnologue.
geotree


Joined: 23/07/2007 18:28:40
Messages: 138
Location: France
Offline

To add Alsatian (Alsacien, Elsaessisch) alternate names to French départements 67 & 68, I have used the 'gsw' language code.

http://www.geonames.org/3034720/departement-du-bas-rhin.html
http://www.geonames.org/3013663/departement-du-haut-rhin.html

Not sure this is the right one, but I found nothing better...

Christophe





geotree.geonames.org
geotree.geonames.org/geotree.html
 