Author |
Message |
15/06/2006 16:53:19
|
crewbaby
Joined: 15/06/2006 16:10:07
Messages: 7
Offline
|
Hi,
I noticed that in the alternativenames dump, the col002 field for the ISO language code has entries with both 2-letter and 3-letter codes.
I know there are many languages for which there is no 2-letter code, so they must be expressed in 3 letters, but why not be consistent in the cases where both codes are available?
For example, there are Russian names tagged as 'ru' and other Russian names tagged as 'rus'.
|
|
|
15/06/2006 17:12:01
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4406
Offline
|
Hi Crewbaby
You are right, it would be better to have consistency. I just haven't had the time so far to write a small consistency enforcer routine.
Do you happen to have one written in Java, and would you like to share the code with us? The function should take an ISO language code as a single argument and return the 2-letter code if one is available; otherwise it should return the input code.
Marc
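A minimal Java sketch of the routine described above, assuming the JDK's built-in `java.util.Locale` tables are an acceptable source for the mapping. The class and method names (`LanguageCodeNormalizer`, `toShortestCode`) are hypothetical and not part of any GeoNames code. Note that `Locale` reports ISO 639-2/T style 3-letter codes (for example `deu` rather than `ger`), so bibliographic variants would pass through unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

/** Hypothetical helper; the mapping comes from the JDK, not from GeoNames. */
final class LanguageCodeNormalizer {

    // Reverse lookup: 3-letter code -> 2-letter code, built once from the JDK's tables.
    private static final Map<String, String> THREE_TO_TWO = new HashMap<>();

    static {
        for (String two : Locale.getISOLanguages()) {
            try {
                String three = Locale.forLanguageTag(two).getISO3Language();
                if (three.length() == 3) {
                    // Keep the first 2-letter code seen if several share a 3-letter code.
                    THREE_TO_TWO.putIfAbsent(three, two);
                }
            } catch (MissingResourceException e) {
                // This JDK knows no 3-letter form for the code; skip it.
            }
        }
    }

    /** Returns the 2-letter code if one is known, otherwise the input code unchanged. */
    static String toShortestCode(String code) {
        if (code == null) {
            return null;
        }
        String key = code.trim().toLowerCase(Locale.ROOT);
        String two = THREE_TO_TWO.get(key);
        return two != null ? two : code;
    }
}
```

With this sketch, `toShortestCode("rus")` yields `ru`, while a code without a 2-letter equivalent such as `frp` is returned as is.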
|
|
|
|
15/06/2006 17:34:05
|
crewbaby
Joined: 15/06/2006 16:10:07
Messages: 7
Offline
|
Hi Marc,
I'm afraid I'm not much of a Java guy myself. I have a friend who might be able to help, though; I'll let you know by tomorrow if he can contribute.
I was just messing around with the codes in Excel and found:
1- 3-letter codes that map to a 2-letter equivalent
2- 3-letter codes that have no 2-letter equivalent
3- 3-letter codes that don't seem to mean anything (I can't find them in the ISO table)
What's up with #3? The offending codes are:
als, frp, ksh, lmo, nrm, pdc, pih, pms, rmy, vec, vls
Any thoughts on what these are, and how to deal with them?
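The three groups above could also be produced programmatically rather than in Excel. A rough Java sketch, assuming the JDK's `Locale` tables for the 2-letter mapping and a caller-supplied set of valid ISO 639-3 codes (the JDK does not ship one). The names `CodeClassifier`, `CodeBucket` and `classify` are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Set;

/** Hypothetical helper that sorts 3-letter codes into the three groups listed above. */
final class CodeClassifier {

    enum CodeBucket { HAS_TWO_LETTER_EQUIVALENT, NO_TWO_LETTER_EQUIVALENT, NOT_IN_ISO_TABLE }

    // 3-letter code -> 2-letter code, as known to the running JDK.
    private static final Map<String, String> THREE_TO_TWO = buildReverseTable();

    private static Map<String, String> buildReverseTable() {
        Map<String, String> table = new HashMap<>();
        for (String two : Locale.getISOLanguages()) {
            try {
                String three = Locale.forLanguageTag(two).getISO3Language();
                if (three.length() == 3) {
                    table.putIfAbsent(three, two);
                }
            } catch (MissingResourceException e) {
                // No 3-letter form available for this code; skip it.
            }
        }
        return table;
    }

    /**
     * knownIso6393Codes is an assumption: a lowercase set of valid ISO 639-3
     * codes loaded by the caller, for example from the SIL code table.
     */
    static CodeBucket classify(String threeLetterCode, Set<String> knownIso6393Codes) {
        String key = threeLetterCode.trim().toLowerCase(Locale.ROOT);
        if (THREE_TO_TWO.containsKey(key)) {
            return CodeBucket.HAS_TWO_LETTER_EQUIVALENT;
        }
        if (knownIso6393Codes.contains(key)) {
            return CodeBucket.NO_TWO_LETTER_EQUIVALENT;
        }
        return CodeBucket.NOT_IN_ISO_TABLE;
    }
}
```

Codes that land in the last bucket are the ones worth checking by hand, since they may be project-specific labels rather than ISO codes.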
|
|
|
15/06/2006 18:27:26
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4406
Offline
|
Most of them are local dialects, see here:
http://www.sil.org/iso639-3/
And I have just seen that "als" is wrong. The alternate names in this language are from Wikipedia (example: http://als.wikipedia.org/wiki/Basel ). Up to now I was under the impression that Wikipedia was using ISO language codes, but at least in this case it is not. In ISO, "als" is "Albanian, Tosk", but Wikipedia is using "als" for 'Alemannisch' (a German dialect).
Let me know if you find other errors of this type.
Marc
|
|
|
|
16/06/2006 13:28:09
|
crewbaby
Joined: 15/06/2006 16:10:07
Messages: 7
Offline
|
Well spotted.
I've been looking over these language codes and there is still one that doesn't make sense: 'nah'. I can't find any ISO representation for it, and there are three entries for it in the alternatenames dump...
Also, I copy below the list of ISO 639-3 codes for which there don't appear to be any 639-2 or 639-1 equivalents; there aren't very many, and perhaps by eyeballing them you can see whether they are properly mapped.
als ---> Tosk Albanian (the one you found already)
frp ---> Franco-Provençal
ksh ---> Kölsch
lmo ---> Lombard
nrm ---> Narom
pih ---> Pitcairn-Norfolk
pms ---> Piemontese
rmy ---> Vlax Romani
rup ---> Macedo Romanian
vec ---> Venetian
vls ---> Vlaams
Finally, I think it might be helpful to people to include in the dump a txt file with all these ISO codes/languages. I will email this to you now, as I already have it done, and leave it to you to decide whether to include it or not.
PS: my Java friend is not available, so he can't help with the method.
|
|
|
18/06/2006 12:49:23
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4406
Offline
|
Thank you Crewbaby, your list of ISO languages is now included in the alternativenames zip file.
For the other readers:
'nah' is the code for a language called "Nahuatl". It is included in the newest file Crewbaby sent me.
I have found this paragraph in the "als" Wikipedia:
We know. When the Alemannic wikipedia was set up, Alemannic had no
code and since it was the Alsatian edition back then, als was
chosen. Recently we got a code (gsw), but it is only for Swiss
German and we decided to wait until there is a code for all variants
until we move the domain.
Marc
|
|
|
|
14/07/2006 07:19:04
|
Anonymous
|
AFAIK all codes in Wikipedia are ISO 639-3, apart from ALS, which really is a mistake. A much wider list can be built at wiktionaryZ.org, where a much larger number of them is used. You can also check Ethnologue.
|
|
|
10/08/2007 18:09:37
|
geotree
Joined: 23/07/2007 18:28:40
Messages: 138
Location: France
Offline
|
To add Alsatian (Alsacien, Elsaessisch) alternate names to French départements 67 & 68, I have used the 'gsw' language code.
http://www.geonames.org/3034720/departement-du-bas-rhin.html
http://www.geonames.org/3013663/departement-du-haut-rhin.html
Not sure this is the right one, but I found nothing better...
Christophe
|
Christophe
geotree.geonames.org
geotree.geonames.org/geotree.html |
|
|
|