
GeoNames Forum
Some Corrections in the database
Anonymous



Since I don't really know where to give feedback about this, here is an (unstructured) post about some errors I have spotted:

- In the country list:
AX ALA 248 FI Aland Islands Mariehamn

FI FIN 246 FI Finland Helsinki 337030.0 5223442 EU fi-FI,smn,sv-FI

The FIPS codes of the Aland Islands and Finland seem to be the same. I checked the official FIPS codes, and FI is assigned to Finland. I may be wrong, but I think FIPS codes are supposed to be unique, aren't they?
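
A minimal sketch of how such a uniqueness check could be automated, assuming the country list has already been reduced to (ISO code, FIPS code) pairs. The function name and the way the rows are loaded are illustrative assumptions, not part of any GeoNames tool.

```python
from collections import defaultdict

def shared_fips_codes(rows):
    """Return {fips: [iso, ...]} for every FIPS code used by more than one ISO code."""
    by_fips = defaultdict(list)
    for iso, fips in rows:
        by_fips[fips].append(iso)
    return {fips: isos for fips, isos in by_fips.items() if len(isos) > 1}

# (ISO, FIPS) pairs taken from rows of the country list quoted in this post.
rows = [("AX", "FI"), ("FI", "FI"), ("CN", "CH")]
print(shared_fips_codes(rows))
```

Any code that shows up in the result is shared by several ISO entries and deserves a manual look.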

Another error I spotted:

CN CHN 156 CH China Beijing 9596960.0 1306313812 AS bo,i,ii,za,zh-CN

China seems to be linked to the "i" language, which is not an ISO 639-1 code...
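
This kind of entry could be caught with a simple shape check on the language column. The sketch below only tests the form of each tag (a 2- or 3-letter primary subtag, optionally followed by a 2-letter region); it is an assumption about what a reasonable tag looks like and does not consult the actual ISO 639 code tables.

```python
import re

# Shape check only: 2- or 3-letter language subtag, optional 2-letter region
# such as "zh-CN". Membership in the real ISO 639 tables is NOT verified.
TAG_SHAPE = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")

def suspicious_language_tags(field):
    """Return the tags of a comma-separated language field that fail the shape check."""
    return [tag for tag in field.split(",") if not TAG_SHAPE.match(tag)]

print(suspicious_language_tags("bo,i,ii,za,zh-CN"))
```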

Is there a better way to help correct things like this than reporting them on the forum?

Regards,
Sami Dalouche

marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

Hi Sami

The flat file with the country info is now in CVS:
http://geonames.cvs.sourceforge.net/geonames/data/


The Aland Islands belong to Finland, so it is not wrong if you look at it from the ISO angle. But it might be better to have a separate column for it when looking from the FIPS angle. The problem is that there is no 1:1 relation between ISO codes and FIPS codes.


I have also removed the wrong entries for China.

According to this source there are 235 living languages in China. I don't think it makes sense to list them all.

http://www.ethnologue.com/show_country.asp?name=China

Regards

Marc

samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Concerning FIPS, I'm not sure what to do exactly. One possibility is an additional column indicating whether the current entry is the main one. We would then have two FI entries, one with primary = true and the other with primary = false?

The reason is that GNS cities refer to FIPS codes. So if you want to import the data into a strongly typed database, you have to decide which Country entity each city should be linked to. In this case, it should be Finland, not the islands.
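
To make the idea concrete, here is a rough sketch of how an importer might resolve a FIPS code to a single country using such a flag. The `Country` class, the `primary` field, and the function name are hypothetical and only illustrate the proposal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Country:
    iso: str
    fips: str
    name: str
    primary: bool  # hypothetical flag: True for the main entry of a FIPS code

def country_for_fips(countries, fips):
    """Return the single primary country for a FIPS code, or raise if ambiguous."""
    matches = [c for c in countries if c.fips == fips and c.primary]
    if len(matches) != 1:
        raise LookupError(
            f"expected one primary entry for FIPS {fips!r}, found {len(matches)}"
        )
    return matches[0]

countries = [
    Country("FI", "FI", "Finland", primary=True),
    Country("AX", "FI", "Aland Islands", primary=False),
]
print(country_for_fips(countries, "FI").name)
```

Raising on zero or several primary entries keeps a broken country file from silently linking cities to the wrong entity.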

Concerning China, it's true that 235 is a lot of languages, but when you think about it, it's not that many. China accounts for about 1 billion people, which is roughly 1/6th of the planet's population.

Regards,
Sami Dalouche
samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Another problem...

In Adm1 :

RI.RI

But the RI ISO code does not exist in Countries.txt...

Actually, it does exist in ISO 3166:
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

RI Indonesia

and the country.txt info says:
ID IDN 360 ID Indonesia Jakarta 1919440.0 241973879 AS id,jv,su


...
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

The RI.RI admin code has been deleted.

If we add all 235 languages for China, we should order them by the % of the population who speak each language, and it would also be useful to give the % in parentheses.

The country file is in CVS; please feel free to add whatever you see fit, including a column for mainFipsCode.

Marc

samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Hi,

OK, I am thinking about the way to handle FIPS codes correctly, and will provide a corrected countryInfo.txt

Another thing concerning the languages: if we start adding more information about them (such as the % of the population who speak each language), I am more in favor of adding a new table/file, something like a many-to-many relationship between language codes and countries, with additional columns carrying the extra information. Adding values in parentheses and the like does not really help when parsing the files.
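
As a rough illustration of that many-to-many layout, here is a sketch using an in-memory SQLite database. The table and column names (including `percent_speakers`) are assumptions made up for this example; the sample rows reuse the Indonesia languages quoted earlier in the thread, with no statistics filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (iso TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE language (code TEXT PRIMARY KEY);
    -- Link table: one row per (country, language) pair, with room for extra data.
    CREATE TABLE country_language (
        iso TEXT NOT NULL REFERENCES country(iso),
        code TEXT NOT NULL REFERENCES language(code),
        percent_speakers REAL,  -- NULL when no statistic is available
        PRIMARY KEY (iso, code)
    );
""")
conn.execute("INSERT INTO country VALUES ('ID', 'Indonesia')")
conn.executemany("INSERT INTO language VALUES (?)", [("id",), ("jv",), ("su",)])
conn.executemany(
    "INSERT INTO country_language (iso, code) VALUES (?, ?)",
    [("ID", "id"), ("ID", "jv"), ("ID", "su")],
)

def languages_of(iso):
    """Return the language codes linked to a country, sorted alphabetically."""
    rows = conn.execute(
        "SELECT code FROM country_language WHERE iso = ? ORDER BY code", (iso,)
    )
    return [code for (code,) in rows]

print(languages_of("ID"))
```

Keeping the percentage in its own column means the importer never has to parse values embedded in parentheses.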

Anyway, I'll provide files in the next few days.
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

In the long run it would also be nice to have the languages not only per country, but also per province/state (ISO 3166-2).


Marc

samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Hi,

I haven't forgotten my promise. I am going to work on the files in the next few days; I'm just late, as usual, because I'm currently dealing with other technical problems.

1] For the FIPS code, I'm still searching for a better solution than mainFipsCode, since it is rather inelegant (though better than nothing).
2] I am thinking of moving the languages out of the country file into a separate table/file. There could then be a (type/id) pair, where type = country / adm and id = country code / adm code. We would have no data for adms right now, but at least the schema would allow it. This file could also contain any statistical data we have, such as the % of people speaking the language.

Will soon come back to you
Sami Dalouche



marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

The data for Switzerland is more or less this :

CH.AG=Aargau : de
CH.AR=Appenzell Ausserrhoden : de
CH.BL=Basel-Landschaft : de
CH.BS=Basel-Stadt : de
CH.BE=Bern : de
CH.FR=Fribourg : fr, de
CH.GE=Genève : fr
CH.GL=Glarus : de
CH.GR=Graubünden : de, rm
CH.AI=Appenzell Innerrhoden : de
CH.LU=Luzern : de
CH.NE=Neuchâtel : fr
CH.NW=Nidwalden : de
CH.OW=Obwalden : de
CH.SG=Sankt Gallen : de
CH.SH=Schaffhausen : de
CH.SZ=Schwyz : de
CH.SO=Solothurn : de
CH.TG=Thurgau : de
CH.TI=Ticino : it
CH.UR=Uri : de
CH.VS=Valais : fr, de
CH.VD=Vaud : fr
CH.ZG=Zug : de
CH.ZH=Zürich : de
CH.JU=Jura : fr
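
A list in this form could be turned into the (type, id, language) records discussed earlier in the thread. The sketch below assumes the `CODE=Name : lang, lang` layout shown above and uses a hypothetical type value of `"adm"`; it tolerates uneven spacing around the colon, which is common in hand-typed lists.

```python
def parse_adm_languages(line):
    """Split 'CH.FR=Fribourg : fr, de' into ('adm', 'CH.FR', 'fr') style records."""
    code_and_name, _, langs = line.partition(":")
    adm_code, _, _name = code_and_name.partition("=")
    return [
        ("adm", adm_code.strip(), lang.strip())
        for lang in langs.split(",")
        if lang.strip()
    ]

for line in ["CH.FR=Fribourg : fr, de", "CH.TI=Ticino : it", "CH.GR=Graubünden: de, rm"]:
    print(parse_adm_languages(line))
```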

samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Hi,

OK, so here is the first part of the modifications. The rest (languages, city codes integration, INSEE codes integration, etc.) still needs work.

My modified countries.txt file is available at http://www.sirika.com/data/geonames/geonamesCountries.20061015.txt

What has been done:
1] Added one column (EquivalentFipsCode): since ISO codes and FIPS codes do not match 1:1, an equivalent FIPS code is used when one FIPS country corresponds to several ISO countries. For instance, Finland and the Aland Islands both correspond to the FI FIPS entity. So the Finland entry has the FI FIPS code, and the Aland Islands entry has FI as its equivalent FIPS code.

2] Azerbaijan. Currency is Manat, code AZN (http://en.wikipedia.org/wiki/Azerbaijan)
So, the following 2 lines :

AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AMD Dram +374 9999 av,az,os 587116 GE,IR,AM,TR,RU
AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AZM Manat +994 av,az,os 587116 GE,IR,AM,TR,RU
 

have been replaced by

AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AZN Manat +994 av,az,os 587116 GE,IR,AM,TR,RU
 


The same goes for Moldova and Cyprus, which have duplicated entries, like AZ.

Moldova: http://en.wikipedia.org/wiki/Moldova


MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md Ruple +373-533 9999 mo,ro,tr,uk,yi 6290251 RO,UA
MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md MDL Leu +373 mo,ro,tr,uk,yi 617790 RO,UA
 

replaced by :

MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md MDL Leu +373 mo,ro,tr,uk,yi 617790 RO,UA
 


And Cyprus:


CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .cy CYP Pound +357 9999 el-CY,tr-CY 146669
CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .nc.tr TRY Lira +90-392 el-CY,tr-CY 146669
 

replaced by :


CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .cy CYP Pound +357 9999 el-CY,tr-CY 146669
 


Also, I am wondering about something: why have all numbers (area, population, ...) been replaced by numbers formatted as XXX,YYY (I mean the comma separators)? This wasn't previously the case, and when importing it is necessary to strip the commas, which is a little annoying. Is there a good reason, or can I also remove the commas from these numbers in the file?
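
For reference, the workaround on the import side is small. A minimal sketch, assuming the comma is only ever a thousands separator in these columns (the function name is made up for this example):

```python
def parse_grouped_number(text):
    """Parse '7,911,974' or '337030.0' into a number, ignoring thousands separators."""
    cleaned = text.replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)

print(parse_grouped_number("7,911,974"))
print(parse_grouped_number("86,600"))
```

This must only be applied to the numeric columns, since other columns (languages, neighbours) use the comma as a list separator.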


That's all for now. If anything is wrong with the file I modified that would prevent it from being incorporated as the official geonames countries.txt, do not hesitate to tell me. I hope I'll come up with the rest of the modifications soon.

Regards,
Sami Dalouche
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

Hi Sami

Thanks for your efforts.
Is it possible you wanted to give us this URI :
http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The one you have given seems to be another version. Is this correct?

Cheers

Marc

samokk



Joined: 13/10/2006 21:56:39
Messages: 82
Offline

Hi,

Oh yeah, sorry, the URI I wanted to give was indeed http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The previous one was actually your version ;-p

Oh, and concerning the zip code data: you mentioned on your blog that you updated the zip code data for some countries. Is it possible to get an up-to-date dump of that?

In order to link the zip codes to cities, here is what I plan to do :
- For each zip code lat/long, ask PostGIS what the 5 nearest cities are
- Look at this list of cities and match them somehow to the zip code place name. I initially wanted to fuzzy match the place names, but until I fix some Compass/Lucene performance problems, I am going to stick with matching the first few letters of the place name.
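
The two steps above can be sketched roughly as follows. This is not the PostGIS query itself: a plain-Python haversine distance stands in for the nearest-neighbour lookup, and the city dictionary layout, the `k` and `prefix_len` parameters, and the sample coordinates (approximate, for illustration only) are all assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def match_zip_to_city(zip_name, zip_lat, zip_lon, cities, k=5, prefix_len=4):
    """Among the k nearest cities, return the closest one whose name shares a prefix."""
    nearest = sorted(
        cities, key=lambda c: haversine_km(zip_lat, zip_lon, c["lat"], c["lon"])
    )[:k]
    prefix = zip_name.lower()[:prefix_len]
    for city in nearest:
        if city["name"].lower().startswith(prefix):
            return city
    return None  # no candidate matched; leave the zip code unlinked

# Approximate coordinates, for illustration only.
cities = [
    {"name": "Zürich", "lat": 47.37, "lon": 8.54},
    {"name": "Bern", "lat": 46.95, "lon": 7.45},
    {"name": "Genève", "lat": 46.20, "lon": 6.14},
]
match = match_zip_to_city("Bern", 46.95, 7.44, cities)
print(match["name"] if match else None)
```

Returning `None` for unmatched zip codes makes it easy to collect the leftovers for a later fuzzy-matching pass.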

What do you think about this approach? Do you have a better idea?

Sami Dalouche

 