GeoNames Forum
Consistency and homogeneous data
Daniel Vázquez



Joined: 12/06/2012 16:24:56
Messages: 8

Hi guys, great job!!

I'd like to share some thoughts that came up while working with a data dump, which we loaded to run queries directly against our local DB.
Looking at the GeoNames schema and at ways to implement common queries (filling a combobox with the provinces of a selected country, autocompleting the towns of a selected province, etc.), I'm running into the following problems when trying to get a consistent, normalized set of results:

For example, take a simple list of provinces of Spain from the geoname table:

 geo_name_id | name
-------------+-------------------------------------
     2509951 | Província de València <-- [OK, this is Catalan, but why does it need to say "Província de ..."? The name of the province is "València", not "Província de València"]
     2510407 | Province of Toledo    <-- [This is English]
     2510910 | Provincia de Sevilla  <-- [This is Spanish, but "Provincia de ..." is not needed]
     3120935 | Gipuzkoa              <-- [This is Basque; here "Provincia de ..." is not used]
     3130717 | Araba / Álava         <-- [This is Spanish; here a combined name with expressions in two languages]
     6355234 | Murcia                <-- [This is Spanish, without "Provincia de ..."]
     6424360 | Illes Balears         <-- [This is Spanish, for islands]

I think the main idea is to set the default name in each territory's own language, and I think that's correct when Spanish, Catalan, Basque, etc. are used for their respective territories. I think the English name in "2510407 | Province of Toledo" is simply an error to correct. But we need some consistency in the use of "Province of ..." as part of the name. I disagree with this form: "Province of ..." is not part of the name; these are classification words, not the name of the geoname record.
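
As a rough client-side workaround for the prefix issue, the classification words could be stripped after querying. This is only a sketch: the prefix list below is an assumption taken from the rows shown above (it is not a GeoNames rule), and `strip_classification` is a hypothetical helper name.

```python
import re

# Assumed prefix list, based only on the sample rows in this post;
# extend it for other languages or feature types as needed.
_CLASSIFICATION_PREFIX = re.compile(
    r"^(?:province of|provincia de|província de)\s+",
    flags=re.IGNORECASE,
)


def strip_classification(name: str) -> str:
    """Return the name without a leading 'Province of'-style prefix."""
    return _CLASSIFICATION_PREFIX.sub("", name.strip())


if __name__ == "__main__":
    for raw in [
        "Província de València",
        "Province of Toledo",
        "Provincia de Sevilla",
        "Gipuzkoa",
        "Araba / Álava",
    ]:
        print(f"{raw!r} -> {strip_classification(raw)!r}")
```

Names that carry no prefix, such as "Gipuzkoa" or "Araba / Álava", pass through unchanged, so the helper can be applied to the whole list.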

This list is not homogeneous data, so I tried joining with the alternateNames table. For example, here are the provinces of Catalonia in Spanish and Catalan:

 alternate_name_id | geo_name_id | iso_language | alternate_name         | is_preferred_name | is_short_name
-------------------+-------------+--------------+------------------------+-------------------+---------------
           2431383 |     3108287 | ca           | Província de Tarragona | t                 |
           2431135 |     3108287 | ca           | Tarragona              |                   |
           2080205 |     3108287 | es           | Provincia de Tarragona | t                 |
           2186762 |     3108287 | es           | Tarragona              |                   | t
           1325972 |     3128759 | ca           | Barcelona              |                   |
           2426476 |     3128759 | ca           | Província de Barcelona | t                 |
           2080217 |     3128759 | es           | Provincia de Barcelona | t                 |
           2186893 |     3128759 | es           | Barcelona              |                   | t
           2080229 |     6355230 | ca           | Província de Girona    | t                 |
           2080231 |     6355230 | ca           | Girona                 |                   |
           2080232 |     6355230 | es           | Gerona                 |                   | t
           2080230 |     6355230 | es           | Provincia de Gerona    | t                 |
           2186820 |     6355231 | ca           | Lleida                 |                   |
           2080220 |     6355231 | ca           | Província de Lleida    |                   |
           6990473 |     6355231 | es           | Lérida                 | t                 |


But as you can see, is_preferred_name is not always set, and sometimes (here in 6990473) the preferred name is the short form. On the other hand, the "Province of ..." form appears again. IMO an alternate name, like the default name, only needs the name of the geoname (the province in this case), not the classification words "Provincia de ...". In this case the "long name" would be "Barcelona" and the short name could be "Bcn", "Barna", etc.
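
To see how widespread the missing is_preferred_name flag is, a check like the following could be run. This is a minimal sketch using an in-memory SQLite database; the table name `alternate_names` is an assumption, the columns follow the listing above, and empty flag cells are assumed to mean false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE alternate_names (
        alternate_name_id INTEGER PRIMARY KEY,
        geo_name_id       INTEGER NOT NULL,
        iso_language      TEXT,
        alternate_name    TEXT NOT NULL,
        is_preferred_name INTEGER NOT NULL DEFAULT 0,
        is_short_name     INTEGER NOT NULL DEFAULT 0
    )
    """
)

# A subset of the rows from the listing above (Tarragona and Lleida).
rows = [
    (2431383, 3108287, "ca", "Província de Tarragona", 1, 0),
    (2431135, 3108287, "ca", "Tarragona", 0, 0),
    (2080205, 3108287, "es", "Provincia de Tarragona", 1, 0),
    (2186762, 3108287, "es", "Tarragona", 0, 1),
    (2186820, 6355231, "ca", "Lleida", 0, 0),
    (2080220, 6355231, "ca", "Província de Lleida", 0, 0),
    (6990473, 6355231, "es", "Lérida", 1, 0),
]
conn.executemany("INSERT INTO alternate_names VALUES (?, ?, ?, ?, ?, ?)", rows)

# Find every (geoname, language) pair that has names but no preferred one.
missing_preferred = conn.execute(
    """
    SELECT geo_name_id, iso_language
    FROM alternate_names
    GROUP BY geo_name_id, iso_language
    HAVING SUM(is_preferred_name) = 0
    ORDER BY geo_name_id, iso_language
    """
).fetchall()

print(missing_preferred)  # [(6355231, 'ca')]
```

With the sample rows, only the Catalan names of 6355231 (Lleida) lack a preferred entry.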

Joining with the alternateNames table doesn't yield a consistent, homogeneous set of data either.

But I'm not sure what is an error and what is an explicit functional/design rule of the GeoNames DB.

1. What rules are followed for setting the default name of a geoname record?
2. Do alternateNames records have is_preferred_name set for each language?
3. Is there a document that describes the rules for the tables, fields, etc.?

Best regards,
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

Hi Daniel

There is a recent similar thread here:
http://forum.geonames.org/gforum/posts/list/3886.page

- The default name should be an international name, mostly English.
- The isPreferred flag helps distinguish between several names. If no alternate names are available, the main name is used; if alternate names are available, the name in the specific language is used, otherwise the one with no language at all.
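
The lookup order described above could be sketched roughly as follows. This is an illustration, not GeoNames code: `AltName` and `display_name` are hypothetical names, and falling back to the main name when alternate names exist but none matches is an assumption.

```python
from typing import NamedTuple, Sequence


class AltName(NamedTuple):
    iso_language: str          # "" means no language is set
    name: str
    is_preferred: bool = False


def display_name(main_name: str, alt_names: Sequence[AltName], lang: str) -> str:
    """Pick a name for `lang` following the order described above."""
    if not alt_names:
        return main_name
    # First try the requested language, then names with no language at all.
    for wanted in (lang, ""):
        candidates = [a for a in alt_names if a.iso_language == wanted]
        if candidates:
            preferred = [a for a in candidates if a.is_preferred]
            return (preferred or candidates)[0].name
    # Assumption: nothing matched, so fall back to the main name.
    return main_name


if __name__ == "__main__":
    alts = [
        AltName("ca", "Província de Tarragona", True),
        AltName("ca", "Tarragona"),
        AltName("es", "Provincia de Tarragona", True),
        AltName("es", "Tarragona"),
    ]
    # "Tarragona (main)" is a placeholder, not the actual main name in the DB.
    print(display_name("Tarragona (main)", alts, "ca"))
    print(display_name("Tarragona (main)", alts, "en"))
```

Under this reading, a preferred name wins within a language, which is why the "Província de ..." form comes back for "ca" in the sample data.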

Best

Marc
