GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
AlternateNames Lang clean up  XML
Forum Index -> General
Author Message
nicolas.helleringer



Joined: 11/08/2009 13:48:39
Messages: 6
Offline

Hi all

I tried to do some clean up on the alternatenames table.

Starting with a simple reference check :

select a.geonameid, a.isolanguage, a.name
from alternatename a join (
select isolanguage
from
(
select a.*
from alternatename a left outer join isolang b on (a.isolanguage=b.iso2)
where b.iso2 is null
) c
left outer join isolang d on(c.isolanguage=d.iso1)
where d.iso1 is null
group by isolanguage
having count(*)=1
) b
on (a.isolanguage=b.isolanguage)
;

where isolang table comes from http://www.loc.gov/standards/iso639-2/ascii_8bits.html

geonameid | isolanguage | name
-----------+-------------+----------------------------------------------------
2658761 | FR | (1226526,2658761,FR,Schaffhouse,,)
2950091 | alm | (2728768,2950091,alm,Bernau,,)
3130761 | EU | (1565861,3130761,EU,Aizarotz,,)
735087 | gree | (448956,735087,gree,Miriófiton,t,t)
2746057 | NL | (1244681,2746057,NL,"Kanaal Zutphen-Enschede",,)
611671 | kat | (1579031,611671,kat,"Stantsiya Tezeri",,)
6619415 | us | (2296866,6619415,us,"Indian Habitat Center",,)
1279232 | iso | (1804319,1279232,iso,adi,,)
1610911 | e | (776994,1610911,e,"Amphoe Doembang Nangbuat",t,t)
363338 | sqi | (1579663,363338,sqi,"Lumi Calases",,)
606086 | fit | (2728596,606086,fit,Jellivaara,,)
2661604 | PL | (1566876,2661604,PL,Bazylea,,)
3188225 | dt | (2014096,3188225,dt,"Velika Kladuscha",,)
6930926 | cn | (2698290,6930926,cn,上海西站,t,)
1608273 | chok | (2189069,1608273,chok,sompong,t,t)
733060 | jalt | (445839,733060,jalt,"Borissowo jaltata ka6ta",t,t)
3173977 | ld | (1893689,3173977,ld,"Al Plan",,)
3513447 | kh | (1982111,3513447,kh,55,,)
6301152 | 66 | (2080119,6301152,66,satun,t,)
3994443 | 978 | (1969608,3994443,978,"maria martinez",,)
3167022 | lih | (1902951,3167022,lih,Sann-a,,)
1861060 | lbe | (1972334,1861060,lbe,Japan,,)
1164648 | hindko | (2254485,1164648,hindko,"sohlan utli",t,)
1185099 | syl | (2198832,1185099,syl,Silôţ,,)
2633095 | nk | (1999758,2633095,nk,Askja,,)
6930834 | un | (2698239,6930834,un,"CH MAR",t,t)
3518277 | 562 | (1438092,3518277,562,"San Miguel Coatlinchoan",t,)
5128638 | va | (1632295,5128638,va,"Nova York",,)
(28 rows)

I could not make all the required changes (level 1 user )

easy steps :
FR => fr
PL => pl
NL => nl
us => en (I konw it a pity iso 693 -1 -2 and -3 do not have Bristish/American codes)
cn => zh
un => NULL

I will try to go on with occurences >1 and <20
checking for relevance
nicolas.helleringer



Joined: 11/08/2009 13:48:39
Messages: 6
Offline

Two improvements for the GeoNames project thant COULD be done on that subject :

- Use of only one of the iso 693 norm inside the isolang column (2 digits could/should only be a shortcut for data input)

- Refrence check on data input on this subject (avoid miss tip dk for da or sp for spa/es)
marc



Joined: 08/12/2005 07:39:47
Messages: 4501
Offline

Thanks for your work. I will implement a check on the interface to make sure the codes users enter are valid.

Best

Marc

[WWW]
 
Forum Index -> General
Go to:   
Powered by JForum 2.1.5 © JForum Team