05/02/2010 01:38:01
Kronuz
Joined: 05/02/2010 00:48:13
Messages: 3
I have been working on a Python script to find the corresponding geonameid for each zip code. Along the way, I found some issues in the dumps. For example, the country-specific files do not contain all the elements that allCountries has for that country (i.e. some locations that exist in allCountries are missing from the corresponding country file). Another issue is that GeoNames uses FIPS codes for first-level administrative division codes instead of ISO (it would be better to use ISO).
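A minimal way to check the first issue is to compare the geonameids listed for one country in allCountries.txt against those in that country's own file. This sketch assumes the standard tab-separated dump layout (geonameid in column 0, ISO country code in column 8); it is an illustration, not part of the original script.

```python
from typing import Iterable, Set

# Column positions in the tab-separated GeoNames dumps; allCountries.txt and
# the per-country XX.txt files share the same layout.
GEONAMEID_COL = 0
COUNTRY_CODE_COL = 8


def geonameids_for_country(lines: Iterable[str], country: str) -> Set[str]:
    """Collect the geonameids of all rows whose country code equals `country`."""
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > COUNTRY_CODE_COL and fields[COUNTRY_CODE_COL] == country:
            ids.add(fields[GEONAMEID_COL])
    return ids


def missing_from_country_file(all_lines: Iterable[str],
                              country_lines: Iterable[str],
                              country: str) -> Set[str]:
    """geonameids present in allCountries for `country` but absent from its own file."""
    return (geonameids_for_country(all_lines, country)
            - geonameids_for_country(country_lines, country))
```

Both arguments can be open file objects (e.g. `allCountries.txt` and `AD.txt` opened with `encoding="utf-8"`). Since this rescans allCountries.txt per country, checking every country would be faster with a single pass that groups ids by country code.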
The algorithm I'm working on processes all the 'A' and 'P' feature classes in GeoNames and tries to merge the zip code dump with them. First, I built a FIPS-to-ISO code map, since all or most of the codes in the zip dumps appear to be ISO, while GeoNames uses FIPS. I merged some available ISO and FIPS tables, then used GeoNames data to complement the map (it is still incomplete; I couldn't find any free, complete mapping). With the map in hand, I read the whole zip code file and then the GeoNames file, and for each GeoNames location I try to find its zip codes: first using the FIPS-to-ISO map I created, then by name similarity (Levenshtein distance and ratio on all normalized names, including the alternate names from alternateNames), and finally by geographical proximity using latitude and longitude (within 10, 100 or 1000 meters, depending on the accuracy). For each matched location I add an extra alternate name holding the postal code ('post' type) to the GeoNames database. For the zip codes with no match, I add a new GeoNames record with feature class 'P' and feature code 'PPL' (the few I checked did not seem to exist in GeoNames yet), plus an alternate name holding its postal code.
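The name-similarity step could look roughly like the sketch below. The original script used the Levenshtein module; to keep the example dependency-free, this version computes the edit distance in pure Python and derives its own ratio (1 - distance / length of the longer name), which is not the same formula as that module's `ratio`. The normalization rules and the 0.85 threshold are assumptions for illustration.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lower-case, strip accents, and turn punctuation into single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical normalized names, 0.0 for completely different ones."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_name_match(zip_name, candidates, threshold=0.85):
    """Return the geonameid whose name or alternate names best match zip_name.

    candidates: iterable of (geonameid, [name, alternate names...]) pairs,
    e.g. the GeoNames places sharing the zip record's country and admin code.
    """
    best_id, best_score = None, -1.0
    for geonameid, names in candidates:
        score = max((similarity(zip_name, n) for n in names), default=0.0)
        if score >= threshold and score > best_score:
            best_id, best_score = geonameid, score
    return best_id
```

Restricting `candidates` to places in the same admin division first keeps the quadratic edit-distance cost manageable.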
As a result, I have a final alpha-version "merged" database:
* New items added to GeoNames (from unmatched zip codes): 638957
* Items matching a geonameid to a zip code: 417615
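Each (geonameid, postal code) pair behind the matching count ends up as one extra alternate name. Below is a sketch of writing such rows in the column order of the first eight columns of GeoNames' alternateNames.txt (alternateNameId, geonameid, isolanguage, alternate name, then the isPreferredName, isShortName, isColloquial and isHistoric flags), with 'post' in the isolanguage column as GeoNames uses for postal codes. The starting alternateNameId is a hypothetical local range meant to stay clear of official ids.

```python
from typing import Iterable, Iterator, Tuple


def postal_alternate_names(matches: Iterable[Tuple[int, str]],
                           first_id: int) -> Iterator[str]:
    """Yield one alternateNames-style TSV line per distinct (geonameid, postal code).

    first_id is a hypothetical starting alternateNameId for locally added rows.
    """
    seen = set()
    next_id = first_id
    for geonameid, postal_code in matches:
        if (geonameid, postal_code) in seen:
            continue  # skip pairs already emitted
        seen.add((geonameid, postal_code))
        # alternateNameId, geonameid, isolanguage, alternate name,
        # isPreferredName, isShortName, isColloquial, isHistoric (flags left empty)
        yield "\t".join([str(next_id), str(geonameid), "post", postal_code,
                         "", "", "", ""])
        next_id += 1
```

A place with several postal codes simply gets several rows, so no change to the geoname table itself is needed for the matched items.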
The script is written in Python (albeit not a very clean script), uses the Levenshtein, GeoPy and Quadtree modules, and runs the whole thing in roughly 40 minutes here. If anyone is interested in tweaking it further, I can share it. Any suggestions are welcome.
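The proximity step needs two pieces: a distance function and a spatial index, so that each zip code is not compared against every GeoNames point. The original script used GeoPy and a Quadtree module; this sketch swaps in a standard-library haversine distance and a plain fixed-size grid. The post does not say how accuracy values map to the 10/100/1000 m radii, so `RADIUS_BY_ACCURACY` is a hypothetical mapping onto what is assumed to be the accuracy column of the postal code dump.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical mapping from the postal dump's accuracy value to a search radius
# in metres; anything not listed falls back to the widest radius.
RADIUS_BY_ACCURACY = {6: 10.0, 4: 100.0}
DEFAULT_RADIUS_M = 1000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, treating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


class GridIndex:
    """Buckets points into fixed lat/lon cells (a simple stand-in for a quadtree).

    Searching the 3x3 block around a query is only exhaustive while the radius is
    smaller than a cell: 0.1 degree cells cover 1000 m except very close to the
    poles. Longitude wrap-around at +/-180 is not handled.
    """

    def __init__(self, cell_deg=0.1):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def insert(self, geonameid, lat, lon):
        self.cells[self._cell(lat, lon)].append((geonameid, lat, lon))

    def nearest_within(self, lat, lon, radius_m):
        """Return the closest indexed geonameid within radius_m, or None."""
        ci, cj = self._cell(lat, lon)
        best = None  # (geonameid, distance in metres)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for gid, plat, plon in self.cells.get((ci + di, cj + dj), ()):
                    d = haversine_m(lat, lon, plat, plon)
                    if d <= radius_m and (best is None or d < best[1]):
                        best = (gid, d)
        return best[0] if best else None


def match_by_proximity(index, lat, lon, accuracy):
    """Pick a search radius from the accuracy value, then look up the nearest place."""
    radius = RADIUS_BY_ACCURACY.get(accuracy, DEFAULT_RADIUS_M)
    return index.nearest_within(lat, lon, radius)
```

A quadtree adapts better to very uneven point density, but for radii this small a fixed grid keeps each lookup to a handful of candidates.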
05/02/2010 05:48:56
artigas
Joined: 15/11/2009 17:09:42
Messages: 106
Greetings Kronuz -
A very serious amount of work. Thank you for your efforts.
I will look forward to your data submission when you have completed the match and merge.
Kindest Regards,
Robert Artigas
05/02/2010 21:05:45
Kronuz
Joined: 05/02/2010 00:48:13
Messages: 3
Is there any way we could meet on an IRC channel or some other chat at some point, so that I can explain what I did and share the code live? I have a few questions too.
13/02/2010 18:30:21
artigas
Joined: 15/11/2009 17:09:42
Messages: 106
Greetings -
I can tell you from experience that you are correct. Even in the USA there are USPS postal codes that are assigned to a city that is not defined in the USGS places database.
Since the purpose of any postal system is to deliver mail and packages, postal services will define additional rural places that have not yet made it into a national database of places.
I would be interested in the list of places you have added to the geoname table and the complete list of postal codes you have in your alternate names table.
Thanks in advance.
Kindest Regards,
Robert Artigas
10/04/2010 03:13:41
sgatz
Joined: 30/10/2009 23:11:32
Messages: 3
I too am interested in the output of this. Has Marc taken a look at this? I'd love to be able to get a geonameid by zip code.
05/07/2010 16:14:07
nrhummer
Joined: 01/07/2010 13:10:21
Messages: 1
Hello,
Does anyone have Kronuz's script and could share it? It sounds like a great piece of software that could save a lot of time and work.
Thanks in advance.
Kindest regards,
Nick
23/09/2010 15:57:27
hellboy
Joined: 18/05/2010 12:13:56
Messages: 9
I have the same question.
23/01/2011 00:50:02
tjagust
Joined: 22/01/2011 23:26:47
Messages: 1
Hi,
I'm also interested in matching postal codes to geonameids, and in linking my data with the GeoNames RDF model.
I was planning to write my own algorithm (text similarity + lat/lon proximity) for that, but now maybe I don't have to start from scratch, which would be great.
Is the algorithm available somewhere, or at least the matching results?
27/08/2011 08:45:42
vimtura
Joined: 26/08/2011 19:26:25
Messages: 1
Hey,
Has there been any progress on this? I am writing a GeoNames data staging area and judged the ZIP code merge to be too much hassle. If I, or a joint task force, were to handle this, how would the data be recorded in the GeoNames database? Would the database need to be modified so that a geonameid can have a postcode (or several postcodes), or would the postcode simply link to a geonameid?
Thoughts?
David