GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
alternatenames *column* versus alternateNames *table*  XML
Forum Index -> General
Author Message
Big Jon



Joined: 15/06/2010 21:57:05
Messages: 3
Offline

I think I've run into a data consistency issue.

From the README, the alternateNames.txt table should contain a (proper) superset of the values in the "allCountries.txt" alternatenames column. However, it does not.

A quick and dirty (very naive) check shows just shy of 900,000 values that are in the alternatenames column that do not show up in the alterNames.txt table.

An example:

Serbia and Montenegro

Thoughts or comments?
marc



Joined: 08/12/2005 07:39:47
Messages: 4501
Offline

I don't understand what you mean. In any case they are not meant to be exactly the same.

Marc

[WWW]
Big Jon



Joined: 15/06/2010 21:57:05
Messages: 3
Offline

I know they are not mean to be the same.
The readme.txt file says:


Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table. You probably don't need both.
If you don't need to know the language of a name variant, the field 'alternatenames' will be sufficient. If you need to know the language
of a name variant, then you will need to load the table 'alternatenames' and you can drop the column in the geoname table.
 


What that statement tells me is that the alternatenames field is a subset of the data contained in the alternatenames table.

However, that doesn't appear to be the case.
Using the postgresql functions string_to_array and unnest, I believe the following query returns the alternatenames that are present in an import of the 'allcountries' table, but which are not present in the 'alternatenames' table (using a case-insensitive comparison):

Code:
 SELECT FOO.* FROM 
   (SELECT DISTINCT 
     unnest(string_to_array(alternatenames, ',')) AS name 
     FROM allcountries) FOO 
   WHERE NOT EXISTS 
     (SELECT 1 
      FROM alternatenames 
      WHERE lower(alternatenames.name) = lower(FOO.name)); 
 


That query results in some 777315 values from the 'alternatenames' column that are not present in the 'alternatenames' table.

With respect to the example 'Serbia and Montenegro', that appears to have been an error on my part, however this example:

Residence du Ruanda

is present in allcountries.txt but is not present in alternatenames.txt (geonameid 4951.


marc



Joined: 08/12/2005 07:39:47
Messages: 4501
Offline

"Residence du Ruanda" is the ascii transliteration for "Résidence du Rwanda". I have updated the documentation.

Best

Marc

[WWW]
Big Jon



Joined: 15/06/2010 21:57:05
Messages: 3
Offline

The documentation says that the column (alternatenames) is an ASCII transliteration. OK, but, as an example:

781358 Tropojë Tropoje Tropoja,Tropoje-Fshat,Tropojë-Fshat,Тропоя 42.40417 20.16667 A ADM4 AL AL 47 781360 0 471 Europe/Tirane 2010-06-16

The alternatenames column clearly contain non-ASCII characters.

From the readme:

Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table without links and postal codes but with ascii transliterations. You probably don't need both.

Just to clarify -- the alternatenames column is intended to be an ASCII transliterated variation of all of the alternate names for a given entity, except for those which are links or postal codes, right?

How is the ascii transliteration being performed?
marc



Joined: 08/12/2005 07:39:47
Messages: 4501
Offline

It field is not the ascii transliteration. It includes ascii transliteration together with the non-ascii text.

Marc

[WWW]
 
Forum Index -> General
Go to:   
Powered by JForum 2.1.5 © JForum Team