GeoNames Forum
Messages posted by: samokk
Thanks Marc !

Sami
Re marc,

I just discovered that a few other features contain the same character:

2812037
400243
1664252
2563438
2565416
3221987
3243136
3243137
1339009
1340964
395392
2578103

Thanks for your quick reply !
Sami
Hi,

There is a weird character in feature #3478883. I don't think it is valid UTF-8, and it makes the Woodstox XMLWriter crash, so the string probably needs to be corrected.

3478883 Chácara Eurides Miqueliza Chacara Eurides Miqueliza -25.8905556 -49.1533333 S FRM BR 18 0 914 America/Sao_Paulo 1999-06-17


Regards,
Sami Dalouche
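
For anyone who wants to look for such records before feeding the dump to an XML writer, here is a minimal sketch. It assumes the dump is read as raw bytes, one tab-separated record per line with the feature ID in the first column; the function names are made up for illustration. It flags two things: bytes that do not decode as UTF-8, and characters that decode fine but are not allowed in XML 1.0 (control characters, for instance), since either could make a strict writer fail. I do not know which of the two applies to #3478883.

```python
def is_xml10_char(ch):
    """True if the character is in the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def suspicious_records(raw_lines):
    """Return (feature_id, reason) for every raw line that looks unsafe."""
    problems = []
    for raw in raw_lines:
        # The ID column is plain ASCII digits, so it can be read even from a bad line.
        feature_id = raw.split(b"\t", 1)[0].decode("ascii", errors="replace")
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((feature_id, "invalid UTF-8"))
            continue
        if not all(is_xml10_char(ch) for ch in text.rstrip("\r\n")):
            problems.append((feature_id, "character not allowed in XML 1.0"))
    return problems
```

Opening the dump with open(path, "rb") and passing the file object to suspicious_records should be enough to list the affected IDs.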

Hi,

Oh yeah, sorry, the URI I wanted to give was indeed http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The previous one was actually your version ;-p

Oh, and concerning the zip code data: you mentioned on your blog that you updated the zip code data for some countries. Is it possible to get an up-to-date dump of it?

In order to link the zip codes to cities, here is what I plan to do:
- For each zip code lat/long, ask PostGIS what the 5 nearest cities are.
- Look at this list of cities, and match them somehow to the zip code place name. I initially wanted to fuzzy match the place names, but until I fix some Compass/Lucene performance problems, I am going to stick with matching the first few letters of the place name.
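
The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: the nearest-city lookup is done in plain Python with a haversine distance instead of a PostGIS query, the tuple layouts are invented, and the prefix length of 4 is an arbitrary choice for "the first few letters".

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_city(zip_entry, cities, k=5, prefix_len=4):
    """Return the ID of the closest city whose name shares a prefix, or None.

    zip_entry: (place_name, lat, lon)        -- hypothetical layout
    cities:    list of (city_id, name, lat, lon) -- hypothetical layout
    """
    place_name, lat, lon = zip_entry
    # Step 1: the k nearest cities (stand-in for the PostGIS query).
    nearest = sorted(cities, key=lambda c: haversine_km(lat, lon, c[2], c[3]))[:k]
    # Step 2: compare the first few letters, ignoring case.
    prefix = place_name.casefold()[:prefix_len]
    for city_id, name, _, _ in nearest:
        if name.casefold().startswith(prefix):
            return city_id
    return None
```

Sorting every city for every zip code is obviously too slow for the full dump; the spatial index in PostGIS is what would make step 1 practical.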

What do you think about this approach? Do you have a better idea?

Sami Dalouche

marc wrote:
Hi Sami

Thanks for your efforts.
Is it possible you wanted to give us this URI :
http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The one you have given seems to be another version. Is this correct?

Cheers

Marc 
Hi,

OK, here is the first part of the modifications. The rest (languages, city codes integration, INSEE codes integration, etc.) still needs work.

My modified countries.txt file is available at http://www.sirika.com/data/geonames/geonamesCountries.20061015.txt

What has been done:
1] Added one column (EquivalentFipsCode): since ISO codes and FIPS codes do not match one-to-one, an equivalent FIPS code is used when one FIPS country corresponds to several ISO countries. For instance, Finland and the Aaland Islands both correspond to the FI FIPS entity. So the Finland entry has the FI FIPS code, and the Aaland Islands entry has FI as its equivalent FIPS code.

2] Azerbaijan. Currency is Manat, code AZN (http://en.wikipedia.org/wiki/Azerbaijan)
So, the following 2 lines :

AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AMD Dram +374 9999 av,az,os 587116 GE,IR,AM,TR,RU
AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AZM Manat +994 av,az,os 587116 GE,IR,AM,TR,RU
 

have been replaced by

AZ AZE 031 AJ Azerbaijan Baku 86,600 7,911,974 AS .az AZN Manat +994 av,az,os 587116 GE,IR,AM,TR,RU
 


The same goes for Moldova and Cyprus, which have duplicated entries, like Azerbaijan.

Moldova: http://en.wikipedia.org/wiki/Moldova


MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md Ruple +373-533 9999 mo,ro,tr,uk,yi 6290251 RO,UA
MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md MDL Leu +373 mo,ro,tr,uk,yi 617790 RO,UA
 

replaced by :

MD MDA 498 MD Moldova Chisinau 33,843 4,455,421 EU .md MDL Leu +373 mo,ro,tr,uk,yi 617790 RO,UA
 


And Cyprus:


CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .cy CYP Pound +357 9999 el-CY,tr-CY 146669
CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .nc.tr TRY Lira +90-392 el-CY,tr-CY 146669
 

replaced by :


CY CYP 196 CY Cyprus Nicosia 9,250 780,133 AS .cy CYP Pound +357 9999 el-CY,tr-CY 146669
 


Also, I am wondering about something. Why have all numbers (area, population, ...) been reformatted with commas as thousands separators (XXX,YYY)? This wasn't previously the case, and when importing it is necessary to strip the commas, which is a little annoying. Is there a good reason for it, or can I also remove all the commas from the file?
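
For reference, the workaround on the import side is small. Here is a sketch, with a made-up helper name, that accepts both the comma-formatted values and the plain ones:

```python
def parse_count(field):
    """Parse an area/population field, with or without thousands separators.

    Returns None for an empty field, a float if the value has a decimal
    point, and an int otherwise.
    """
    cleaned = field.strip().replace(",", "")
    if not cleaned:
        return None
    return float(cleaned) if "." in cleaned else int(cleaned)
```

For example, parse_count("7,911,974") gives the integer 7911974.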


That's all for now. If anything in my modified file would prevent it from being incorporated as the official GeoNames countries.txt, do not hesitate to tell me. I hope I'll come up with the rest of the modifications soon.

Regards,
Sami Dalouche
Hmm..

Oh yeah, but sadly, there are no railway features, etc. It's true they could be very useful, though.

Concerning railway and underground stations, it may be possible to ask the main companies for their databases. They might share them, who knows? I remember some friends who were able to get the French RATP (underground) station graph for a school project. I have no idea whether other companies would be willing to share their data for an open source project, but it's worth asking.

Concerning restaurants, it would be great to have features for them, which could then be coupled with a yellow-page-style application ....

If anyone has some idea of some information to integrate...

Sami Dalouche

Just a question, though -

Are you looking for address geocoding, or zip code to city translation? If you only care about zip codes, then the GeoNames zip code data is a starting dataset.

However, there isn't any mapping between GeoNames city IDs and GeoNames zip codes. I am planning to work on that (probably using a combination of GIS lat/long distance matching and fuzzy matching) whenever I have time (it might be this week, next week, or later). I'll be sure to share the resulting dataset with GeoNames users once I get something to work.

Regards,
Sami Dalouche
Hi,

The required information is available for some countries. For the US, for instance:
http://geocoder.us/

This Perl script uses the TIGER dataset (more information on this website). Too bad for those of us who live in countries that do not provide data for free.

Sami Dalouche
( http://en.wikipedia.org/wiki/UTF-8 )
Hi,

The GeoNames database contains many feature types, whose legend is given at:
http://www.geonames.org/export/codes.html

You can download the dumps in the Download section. Is that what you need ?

Sami Dalouche
Hi,

Hmm, this is a great database ! Not sure what I can use it for, but it's definitely very nice information to have ! Thanks for the dump !

Sami Dalouche
Hi,

Concerning IOC codes: this page contains useful information, for anyone interested.
http://www.statoids.com/wab.html

Sami Dalouche
Hi,

I haven't forgotten my promise. I am going to play with the files in the next few days; I'm just late, as usual, because I am currently dealing with other technical problems.

1] For the FIPS code, I'm still searching for a better solution than mainFipsCode, since it is rather inelegant (though better than nothing).
2] I am thinking of moving the languages out of the country file into a separate table/file. Then there could be a (type/id) pair, where type = country / adm, and id = country code / adm code. We would have no data for adms right now, but at least the schema would allow it. This file could also contain any kind of statistical data we have, concerning the % of people speaking the language, etc.
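
To make point 2] more concrete, here is one possible shape for such a table, sketched with Python's built-in sqlite3 module. The table name, column names, and the optional percentage column are all just suggestions, not an agreed format.

```python
import sqlite3


def create_language_table(conn):
    """Create a table keyed by a (type, id) pair plus the language code."""
    conn.execute("""
        CREATE TABLE spoken_language (
            entity_type      TEXT NOT NULL CHECK (entity_type IN ('country', 'adm')),
            entity_id        TEXT NOT NULL,   -- country code or adm code
            language_code    TEXT NOT NULL,
            percent_speakers REAL,            -- optional statistic, may be NULL
            PRIMARY KEY (entity_type, entity_id, language_code)
        )
    """)


def languages_for(conn, entity_type, entity_id):
    """Return the language codes recorded for one country or adm."""
    rows = conn.execute(
        "SELECT language_code FROM spoken_language "
        "WHERE entity_type = ? AND entity_id = ? ORDER BY language_code",
        (entity_type, entity_id),
    )
    return [row[0] for row in rows]
```

The same layout works as a flat file with four columns; the point is only that languages for adms can be added later without touching the country file.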

Will soon come back to you
Sami Dalouche

marc wrote:
In the long run it would also be nice to have the languages not only per country, but also per province/state (ISO 3166-2).


Marc 


Hi,

OK, I am thinking about the way to handle FIPS codes correctly, and will provide a corrected countryInfo.txt

Another thing, concerning the languages: if we start adding more information about the languages (such as the % of the population who speak each language), I am more in favor of adding a new table/file. Something like a ManyToMany relationship between language codes and countries, with additional columns carrying the extra information. Adding data between parentheses, etc., does not really help when parsing the files.

Anyways, I'll provide files in the next few days.
Another problem...

In Adm1 :

RI.RI

But the RI ISO code does not exist in Countries.txt.

Actually, it does exist in ISO 3166:
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

RI Indonesia

and country.txt info says
ID IDN 360 ID Indonesia Jakarta 1919440.0 241973879 AS id,jv,su
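
This kind of mismatch can be found mechanically. A small sketch, assuming the Adm1 codes have the form COUNTRY.ADM as in RI.RI above; the function name and the sample codes other than RI.RI are invented:

```python
def unknown_adm1_countries(adm1_codes, country_codes):
    """Return country prefixes used in Adm1 codes that the country file lacks."""
    known = set(country_codes)
    prefixes = {code.split(".", 1)[0] for code in adm1_codes}
    return sorted(prefixes - known)


# Invented sample: only RI.RI comes from the actual file.
missing = unknown_adm1_countries(["RI.RI", "ID.01", "FI.13"], ["ID", "FI"])
```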


...
 