GeoNames Forum

Hi Bakito

WSDL is not yet supported and I don't know whether it ever will be. From my point of view it is overkill for this kind of application.

It shouldn't be too difficult to read the information from the plain XML output.

http://ws.geonames.org/countryInfo?
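
As a rough sketch of reading that output without WSDL, the Python below parses a small hand-written sample instead of a live response. The element names (geonames, country, countryCode, countryName, capital) and the helper parse_country_info are assumptions for illustration; check them against what the URL above actually returns.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the element names are assumed, not copied from a live response.
SAMPLE = """<geonames>
  <country>
    <countryCode>CH</countryCode>
    <countryName>Switzerland</countryName>
    <capital>Bern</capital>
  </country>
  <country>
    <countryCode>DE</countryCode>
    <countryName>Germany</countryName>
    <capital>Berlin</capital>
  </country>
</geonames>"""


def parse_country_info(xml_text):
    """Return one dict per <country> element, mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in country}
            for country in root.iter("country")]


countries = parse_country_info(SAMPLE)
for country in countries:
    print(country["countryCode"], country["countryName"], country["capital"])
```

Because the function copies whatever child elements it finds, it keeps working if the service adds fields later.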

Marc

Hi Per

I don't know of anything better than geonames. If you find a public domain POI database please let me know and I will integrate the dataset in geonames.

As of now geonames has a total of 885,331 spot features. Details are on the statistics page :
http://www.geonames.org/statistics/total.html

Wikipedia is rather unstructured and the number of geolocated entries is limited (some tens of thousands). The feature type encoded in the geolocation tag of Wikipedia is often plain wrong, as many users copy and paste the tag from another entry and are not aware of the encoded meaning.

Marc

I use simple psql to import the dump into postgres. First I create a file create.sql :

Code:

 create table geoname (
         geonameid       int,
         name            varchar(200),
         asciiname        varchar(200),
         alternatenames  varchar(6000),
         latitude        float,
         longitude       float,
         fclass  char(1),
         fcode   varchar(10),
         country varchar(2),
         cc2 varchar(60),
         admin1  varchar(20),
         admin2  varchar(80),
         admin3  varchar(20),
         admin4  varchar(20),
         population      bigint,
         elevation       int,
         gtopo30         int,
         timezone varchar(40),
         moddate         date
 );
 
 create table alternatename (
         alternatenameId  int,
         geonameid          int,
         isoLanguage        varchar(7),
         alternateName     varchar(200),
         isPreferredName      boolean,
         isShortName    boolean,
         isColloquial    boolean,
         isHistoric    boolean
 );
 
 -- Note: the countryInfo file has changed since this posting was written.
 
 
 CREATE TABLE "countryinfo" (
      iso_alpha2 char(2),
      iso_alpha3 char(3),
      iso_numeric integer,
      fips_code character varying(3),
      name character varying(200),
      capital character varying(200),
      areainsqkm double precision,
      population integer,
      continent char(2),
      languages character varying(200),
      currency char(3),
      geonameId int
 );

Then I create the tables with 'psql database_name < create.sql'

In the psql command line I enter (copy/paste) :

Code:

 copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from 'allCountries.txt' null as '';
 copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from 'alternateNames.txt' null as '';
 copy countryInfo  (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId) from 'countryInfo.txt' null as '';

Edit 2006.08.07 : added cc2
Edit 2006.08.08 : added timezone
Edit 2007.03.24 : added admin2
Edit 2007.04.01 : updated table alternatename, added isPreferredName
Edit 2007.05.26 : update table alternateName, added isShortName
Edit 2007.07.25 : add admin3 and admin4
Edit 2007.07.30 : update table alternateName change isoLanguage from char(4) to char(7) to allow the pseudo code fr_1793
Edit 2007.08.23 : update countryInfo add currency and geonameId
Edit 2007.09.19 : remove quotes from countryInfo create table
Edit 2009.01.10 : http://forum.geonames.org/gforum/posts/list/1208.page
Edit 2009.04.13 : alter column population to bigint (from int) to allow for the population of continents.
Edit 2012.05.09 : add table alternatename, added isColloquial and isHistoric
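
Since the column list has changed several times (see the edit log above), it can help to check a dump line against the expected columns before running copy. The following is only a sketch: the column names are taken from the copy statement above, while the helper parse_geoname_line and the sample values are made up for illustration.

```python
# Column order as listed in the copy statement for the geoname table.
GEONAME_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "fclass", "fcode", "country", "cc2", "admin1", "admin2",
    "admin3", "admin4", "population", "elevation", "gtopo30", "timezone",
    "moddate",
]


def parse_geoname_line(line):
    """Split one tab-separated dump line into a dict keyed by column name.

    Empty fields become None, mirroring the "null as ''" option of copy.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GEONAME_COLUMNS):
        raise ValueError(
            f"expected {len(GEONAME_COLUMNS)} fields, got {len(fields)}")
    return {col: (val if val != "" else None)
            for col, val in zip(GEONAME_COLUMNS, fields)}


# Hypothetical sample record, not a real geonames entry.
SAMPLE_LINE = "\t".join([
    "1", "Exampleville", "Exampleville", "", "46.5", "7.5", "P", "PPL",
    "CH", "", "BE", "", "", "", "1000", "", "550", "Europe/Zurich",
    "2007-01-01",
])

row = parse_geoname_line(SAMPLE_LINE)
print(row["name"], row["country"], row["timezone"])
```

A line with the wrong number of fields raises an error right away, which usually means the dump format has moved on from the table definition.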

The converter is based on the Java ROME library with a GeoRSS module.

The GeoRSS Module is open source and developed by geonames :

http://georss.geonames.org/

https://rome.dev.java.net/

The recent changes have GeoRSS tags in the "GeoRSS simple" format. The "ACME GeoRSS Viewer" only supports tags in the deprecated "W3C Geo" format. See http://www.georss.org/ for details.

I could add a parameter to the feed to overwrite the default output format, if you are interested in it.

Edit :
I fixed a bug with the rssToGeoRSS converter. You can now use it to convert the Geonames recent changes feed into a feed using the W3C format.

Here are the recent changes piped through the converter and displayed on ACME :
http://www.acme.com/GeoRSS/?xmlsrc=http%3A%2F%2Fws.geonames.org%2FrssToGeoRSS%3FgeoRSS%3Dw3cGeo%26type%3Drss_2.0%26feedUrl%3Dhttp%3A%2F%2Fwww.geonames.org%2Frecent-changes.xml

The converter can also be used to convert into GoogleEarth format. Here is the Geonames recent changes feed viewed with GoogleEarth :

http://ws.geonames.org/rssToGeoRSS?feedUrl=http://www.geonames.org/recent-changes.xml&type=kml
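
The links above can also be put together in code. Below is a small Python sketch: the parameter names feedUrl, type and geoRSS come from the URLs in this post, whereas the helper converter_url and the constants are just names picked for the example. The point to notice is that the converter URL has to be percent-encoded as a whole when it is handed to another viewer as a parameter.

```python
from urllib.parse import quote, urlencode

CONVERTER = "http://ws.geonames.org/rssToGeoRSS"
FEED = "http://www.geonames.org/recent-changes.xml"


def converter_url(feed_url, **params):
    """Build a converter URL; ':' and '/' stay readable as in the posted links."""
    query = urlencode({**params, "feedUrl": feed_url}, safe=":/")
    return f"{CONVERTER}?{query}"


# Feed converted to the W3C Geo format, as needed by the ACME viewer.
w3c_feed = converter_url(FEED, geoRSS="w3cGeo", type="rss_2.0")

# Passed on as a parameter, the whole converter URL must be encoded.
acme_url = "http://www.acme.com/GeoRSS/?xmlsrc=" + quote(w3c_feed, safe="")

# Same feed as KML for GoogleEarth.
kml_feed = converter_url(FEED, type="kml")

print(w3c_feed)
print(acme_url)
print(kml_feed)
```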

The international names are stored in a second database table with the geonameId referring to the main table. You can find them in the file alternateNames.zip in the dump directory :

http://download.geonames.org/export/dump/

The tab-separated file has the structure :
geonameId, language code and name
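
To illustrate how such a file can be used to pick a name in the language of the user, here is a Python sketch. It assumes exactly the three-column layout described above; a dump with additional leading columns (for example an alternatenameId) would need the indices shifted. The function names and the sample rows with id 1 are invented for the example.

```python
from collections import defaultdict


def load_alternate_names(lines):
    """Index tab-separated rows of (geonameId, language code, name).

    The first name seen for a (geonameId, language) pair is kept.
    """
    names = defaultdict(dict)
    for line in lines:
        geoname_id, lang, name = line.rstrip("\n").split("\t")[:3]
        names[int(geoname_id)].setdefault(lang, name)
    return names


def display_name(names, geoname_id, lang, default):
    """Return the name in the requested language, or the default name."""
    return names.get(geoname_id, {}).get(lang, default)


# Hypothetical rows; the id 1 is not a real geonameId.
SAMPLE_ROWS = ["1\tde\tGenf", "1\tfr\tGenève", "1\ten\tGeneva"]

names = load_alternate_names(SAMPLE_ROWS)
print(display_name(names, 1, "de", "Geneva"))
print(display_name(names, 1, "it", "Geneva"))
```

Falling back to the main name when no translation exists keeps the output usable for languages that have not been entered yet.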

The user interface of geonames is currently English only, but it is the goal of geonames to display the name in the language of the user.

Here are some examples (only place name and country name are translated):
Geneva for an English user :
http://www.geonames.org/search.html?q=genf&lang=en
Geneva for a German user :
http://www.geonames.org/search.html?q=genf&lang=de
Geneva for a French user :
http://www.geonames.org/search.html?q=genf&lang=fr

Or here for San Sebastian :
http://www.geonames.org/search.html?q=Donostia&lang=eu
http://www.geonames.org/search.html?q=Donostia&lang=en
http://www.geonames.org/search.html?q=Donostia&lang=es

The same holds for the web services: the place name search takes a lang parameter to specify the language.
http://www.geonames.org/export/geonames-search.html

If you edit a name on the user interface it is best practice to enter the international/English name in the main edit window and enter all other languages with the corresponding language code in the alternate name form. It is good to enter the English name also in the alternate name form with the 'en' language code.

Hi luftikus143

Some users prefer the forum, others the mailing list. Both have their pros and cons.
In the forum you can easily post anonymously and if you log in you can edit/delete the message afterwards (to correct spelling errors...).

Choose whatever you like. Discussions relevant to many users are better posted to the mailing list.

Marc

Thanks for letting me know. I have put the text of this slashdot piece on our test case list.

It is a problem with finding a correct relevance factor for every name. In this case the country 'Mexico' has got too high a relevance compared to the state (admin1) 'New Mexico'.

Hi luftikus143

Yes, it is an effect of UTF-8. You are getting the DIN-1 sorting and you want the DIN-2 sorting.
The MySQL doc says :

For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:

Ä = A
Ö = O
Ü = U

from http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

I am not using MySQL, but it seems you have to use the latin1 character set to get the sorting you want.

The latin1_german1_ci and latin1_german2_ci collations are based on the DIN-1 and DIN-2 standards, where DIN stands for Deutsches Institut für Normung (the German equivalent of ANSI). DIN-1 is called the “dictionary collation” and DIN-2 is called the “phone book collation.”

latin1_german1_ci (dictionary) rules:

Ä = A
Ö = O
Ü = U
ß = s

latin1_german2_ci (phone-book) rules:

Ä = AE
Ö = OE
Ü = UE
ß = ss
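
The effect of the two rule sets on sorting can be tried out outside the database. This Python sketch only mimics the equalities listed above with simple sort keys; it is not how MySQL implements its collations, and the names din1_key and din2_key are made up for the example.

```python
# Character equalities as listed above for the two collations.
DIN1 = str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U",
                      "ä": "a", "ö": "o", "ü": "u", "ß": "s"})
DIN2 = str.maketrans({"Ä": "AE", "Ö": "OE", "Ü": "UE",
                      "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def din1_key(name):
    """Dictionary-style key: umlauts sort like their base letter."""
    return name.translate(DIN1).lower()


def din2_key(name):
    """Phone-book-style key: umlauts sort like base letter plus 'e'."""
    return name.translate(DIN2).lower()


NAMES = ["Muzak", "Müller", "Mufasa"]

dictionary_order = sorted(NAMES, key=din1_key)
phone_book_order = sorted(NAMES, key=din2_key)
print(dictionary_order)
print(phone_book_order)
```

With the dictionary key 'Müller' is compared as 'muller' and lands after 'Mufasa'; with the phone book key it is compared as 'mueller' and moves to the front.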

Links :
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html
http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

Hope this helps

Marc

This is an interesting find. I have added it to the geonames test cases.
I post the text here as it will soon disappear from the original feed.

Title : "Bush nombra al polémico general Michael Hayden como nuevo director de la CIA"
Text : "El presidente de EEUU, George W. Bush, ha elegido al general de la Fuerza Aérea Michael Hayden como segundo responsable de la inteligencia estadounidense, después de John Negroponte. Hayden ucederá a Porter Goss al frente de la CIA. Leer. Escuchar",

I have had a look at how the geonames search engine is dealing with this text.
The words 'CIA', 'EEUU' and 'estadounidense' indicate that the text is about the United States. The word 'Hayden', however, is a placename in the US which makes the search engine think the text is about the place 'Hayden' in the US.

I will play a little bit with the relevance calculation for placenames in geonames, as the place 'Hayden' has only a population of 11,000 and has received too high a relevance.

http://www.geonames.org/search.html?q=hayden&country=US

Changing the relevance algorithm will certainly improve it but I am not sure whether this will already solve the problem. Another possible improvement would be to use a list of first names and have the search algorithm make use of it. 'Michael' happens to be the second most popular first name in English (after Jacob). The next word after 'Michael' is thus likely to be a family name and not a place name.
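
The first name idea could look roughly like the Python sketch below. This is not the geonames search code, just an illustration of the heuristic: a word that directly follows a known first name is treated as a family name and is dropped from the place name candidates everywhere in the text. The function place_candidates, the tiny gazetteer and the first name list are all made up for the example.

```python
import re


def place_candidates(text, gazetteer, first_names):
    """Return gazetteer words from the text that do not look like family names."""
    words = re.findall(r"\w+", text)
    # Pass 1: any word directly after a known first name is a likely family name.
    surnames = {word for prev, word in zip(words, words[1:])
                if prev in first_names}
    # Pass 2: keep gazetteer matches that were not flagged as family names.
    return [word for word in words
            if word in gazetteer and word not in surnames]


TEXT = ("Bush ha elegido al general Michael Hayden. "
        "Hayden sucederá a Porter Goss al frente de la CIA.")
GAZETTEER = {"Hayden", "Porter"}       # toy list of place names
FIRST_NAMES = {"Michael", "George"}    # toy list of first names

candidates = place_candidates(TEXT, GAZETTEER, FIRST_NAMES)
print(candidates)
```

'Hayden' is dropped in both positions, but 'Porter' survives because it is not preceded by a listed first name, so a name list alone does not catch every person.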

As you can see it is a pretty complex task to find an algorithm that works for all different kinds of text. If you spot other problems let me know.

http://www.ssa.gov/OACT/babynames/

Hi Joerg

The full text search now accepts one or more fcode parameters.

http://ws.geonames.org/search?q=berlin&country=DE&maxRows=100&fcode=PPLC

Wildcards are also possible : fcode=PPL? will return all fcodes beginning with PPL.
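
If you build the request in code, the repeated parameter is easiest to handle with a list of pairs. A minimal Python sketch follows: the parameters q, country, maxRows and fcode are the ones from the URL above, while the helper search_url is only a name chosen for this example.

```python
from urllib.parse import urlencode

SEARCH = "http://ws.geonames.org/search"


def search_url(q, country, max_rows, fcodes):
    """Build a search URL with one fcode parameter per requested feature code."""
    pairs = [("q", q), ("country", country), ("maxRows", max_rows)]
    pairs += [("fcode", code) for code in fcodes]
    return f"{SEARCH}?{urlencode(pairs)}"


single = search_url("berlin", "DE", 100, ["PPLC"])
several = search_url("berlin", "DE", 100, ["PPLC", "PPLA"])
wildcard = search_url("berlin", "DE", 100, ["PPL?"])

print(single)
print(several)
print(wildcard)
```

Note that urlencode turns the '?' of the wildcard into %3F, so it is not mistaken for the start of the query string.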

Best Regards,

Marc

Hi xavi

I don't have an ansi file ready, but you can easily generate one yourself.

SELECT countrycode,admincode1, ansiname FROM geonames WHERE feature_code='ADM1'

Hope this helps

Marc

Hi xavi

And with
SELECT * FROM geonames_places WHERE feature_code='ADM2' ;
you should get all provinces and with 'ADM3' all 8000+ municipalities ...
As you can see there are only 43 provinces and 1 municipality in the geonames database. Please feel free to add the missing provinces. (I don't think it makes sense to manually add the 8000 missing municipalities.)

The file I was referring to is now available in the download section : http://download.geonames.org/export/dump/admin1Codes.txt
It contains the admin1 codes and the name for each code.

For Spain the fips to iso code mapping is 1-1 for most regions and provinces, with the exception of Ceuta and Melilla. I have updated the page with the fips codes for the provinces, and fixed a lot of wrong iso codes on this page :
http://www.geonames.org/ES/administrative-division-spain.html

Marc

Now I understand. At first I thought you meant that place names were missing in these states.

The four states Massachusetts, Pennsylvania, Kentucky and Virginia are indeed missing from the GNIS dataset.
I have added them and also updated the feature code of the four capitals :

http://www.geonames.org/recent-changes.html

Hi Thomas

How did you check the administrative regions? All states are in the geonames database, so I wonder what makes you think some states are missing.

Marc

You are right, 'GB' is the official code. But 'UK' is definitely more widely used.

The English Wikipedia, for example, contains 'UK' 33,648 times but 'GB' only 625 times.

I was just about to change the code to 'GB' when someone suggested changing LI to FL as FL is more common.
http://forum.geonames.org/gforum/posts/list/60.page

Marc

Hi Nenad

Funny you raise the question just now. In another thread someone suggested changing the country code for the UK from 'UK' to 'GB' :
http://forum.geonames.org/gforum/posts/list/58.page

I was just about to change it as the main ISO code really is 'GB'. What do you think about UK/GB?
Should we always go with the main ISO code (GB,LI) or with the most common code (UK,FL) ?

Marc

You are right, the file really should be part of the download. If you give me your email I can send you the file I am using, though it is not ready to be officially released.

Regards,

Marc

Hi xavi

Both UK and GB are valid iso codes for the United Kingdom. I just don't know what is more common. The top level domain for example is UK.
Do you think GB is more common?

Marc