GeoNames Forum
question about 1st column (name of geographical point) in geoname table
JOleg



Joined: 22/01/2017 01:02:39
Messages: 6
Offline

Hello,

I am trying to figure out what rule decides which name goes into this 1st column. It's not plain ASCII, and it's not the native name either: it does contain accents, umlauts, etc., but it does not, for instance, contain Cyrillic letters. In some cases it is the English variant of the geographical point's name, for example Moscow (that's how it's written in the DB), while the native name is Москва and the slug is "Moskva".

So, I'm not suggesting anything. I am _asking_. What rules are taken into account when selecting the "main" version of the name that goes into the 1st column? What criteria?

From the way it looks, it uses Latin letters but allows ALL variations of Latin letters, for example with umlauts and accents. To support this theory: I do not see Cyrillic letters or Chinese characters, only "readable" Latin-like characters.

I would appreciate any help. And sorry if this has been asked before; I tried to find a similar question but did not find one.
JOleg



Joined: 22/01/2017 01:02:39
Messages: 6
Offline

As of today, I think the rule is: Latin script.

Latin script does allow digraphs to be used. The Latin alphabet, on the other hand, does not.
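
A rough way to test the "Latin script" theory against the 1st column is to check the Unicode character name of every letter. The sketch below is only an approximation under that assumption: `is_latin_script` is a hypothetical helper, not part of any GeoNames tooling, and it treats a letter as Latin when its Unicode name starts with "LATIN", so unusual cases such as modifier letters would be reported as non-Latin.

```python
import unicodedata


def is_latin_script(name: str) -> bool:
    """Return True if every alphabetic character in `name` is a Latin letter.

    Non-letters (spaces, hyphens, digits, combining marks) are ignored.
    """
    for ch in name:
        if not ch.isalpha():
            continue
        # Latin letters have Unicode names such as
        # "LATIN SMALL LETTER U WITH DIAERESIS" (ü).
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


for candidate in ["Moscow", "München", "Rūdkhāneh-ye Zākalī", "Москва"]:
    print(candidate, is_latin_script(candidate))
```

Running this over the name column of the dump and listing the rows where it returns False would show whether the theory holds everywhere or only for most rows.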

As great as this database is, I don't understand why there's no dedicated column for a native name (local endonym). Instead, I see endonyms among the alternate names. Otherwise, it's an amazing source of data...
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

The name in the geoname table is an internationally understood name, often in English, sometimes in Latin spelling.
There is no dedicated 'native name' column, as you are the first to ask for it. The data is available in the alternatenames file: where available, the native name can be derived from the alternatename table using the local language defined in the countryinfo file.
A dedicated column would be problematic for areas where several languages are spoken natively. What would be the local name for these features? In order to model it properly we would need a 1:n relation, which brings us back to the alternate name table.
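
The derivation described above could look roughly like the following sketch. It assumes the alternate names for one feature are already loaded as (isolanguage, name) pairs and that the country's languages come as codes that may carry a region suffix such as "nl-BE". The function name `native_names` and the sample rows are hypothetical; the rows are hand-written for illustration, not copied from the dump.

```python
from collections import defaultdict


def native_names(alt_rows, country_languages):
    """Group one feature's alternate names by the country's local languages.

    alt_rows: iterable of (isolanguage, alternate_name) pairs for ONE geonameid.
    country_languages: language codes for the feature's country,
        e.g. ["nl-BE", "fr-BE", "de-BE"] (assumed format).
    """
    # Strip an optional region suffix so "nl-BE" matches the bare code "nl".
    local = [code.split("-")[0] for code in country_languages]
    by_lang = defaultdict(list)
    for lang, name in alt_rows:
        if lang in local:
            by_lang[lang].append(name)
    # Keep the language order given for the country.
    return {lang: by_lang[lang] for lang in local if lang in by_lang}


# Hand-written sample rows for illustration only.
rows = [("nl", "Brussel"), ("fr", "Bruxelles"), ("de", "Brüssel"),
        ("en", "Brussels"), ("ru", "Брюссель")]
print(native_names(rows, ["nl-BE", "fr-BE", "de-BE"]))
```

The result is a mapping with one entry per local language, which is the 1:n relation in miniature: a single "native name" column could hold only one of those entries.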

Best Regards

Marc

JOleg



Joined: 22/01/2017 01:02:39
Messages: 6
Offline

Hi Marc, thanks for the response.

You brought up a valid point, but I'm not sure it's the case. Does the number of official languages affect the number of official toponyms?

That would mean road signs carry as many city names as there are official languages. For instance, Belgium has 3 official languages: Dutch, French and German. I don't know whether they have 3 official names for every city... gotta check that out of curiosity later...

On your other note, I did take a look at the alternateNames.txt file, which I thought was supposed to be a dump of the AlternateNames table, but the contents of the file do not match the format described in the docs. The docs give the following format:


The table 'alternate names' :
-----------------------------
alternateNameId : the id of this alternate name, int
geonameid : geonameId referring to id in table 'geoname', int
isolanguage : iso 639 language code 2- or 3-characters; 4-characters 'post' for postal codes and 'iata','icao' and faac for airport codes, fr_1793 for French Revolution names, abbr for abbreviation, link for a website, varchar(7)
alternate name : alternate name or name variant, varchar(400)
isPreferredName : '1', if this alternate name is an official/preferred name
isShortName : '1', if this is a short name like 'California' for 'State of California'
isColloquial : '1', if this alternate name is a colloquial or slang term
isHistoric : '1', if this alternate name is historic and was used in the past
 



and there's a handy property called "isPreferredName". But I do not see these "boolean flags" in the actual dump:


3234616 4 fa Rūdkhāneh-ye Zākalī
3556003 5 fa Yekāhī
3556005 8 fa Tappeh-ye Seh Nūr
 


Can you please comment on that?
JOleg



Joined: 22/01/2017 01:02:39
Messages: 6
Offline

It turns out the flags are there after all. The file is so big that I simply did not scroll far enough down to see them.
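
Instead of scrolling, the file can be streamed and filtered for rows that have a flag set. This is a minimal sketch, assuming the dump is tab-separated with the column order from the docs quoted earlier in this thread, and that rows without flags simply have empty trailing fields. `rows_with_flags` and the sample text are hypothetical, made up for illustration.

```python
import csv
import io

FIELDS = ["alternateNameId", "geonameid", "isolanguage", "alternate_name",
          "isPreferredName", "isShortName", "isColloquial", "isHistoric"]


def rows_with_flags(lines):
    """Yield rows (as dicts) where at least one of the four flags is '1'."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for raw in reader:
        # Pad short rows in case trailing empty fields are missing,
        # and ignore any extra columns beyond the eight documented ones.
        padded = (raw + [""] * len(FIELDS))[:len(FIELDS)]
        row = dict(zip(FIELDS, padded))
        if any(row[flag] == "1" for flag in FIELDS[4:]):
            yield row


# Made-up sample in the assumed format: only rows 1 and 3 carry a flag.
sample = ("1\t100\ten\tExample Name\t1\t\t\t\n"
          "2\t100\tfa\tOther Name\t\t\t\t\n"
          "3\t101\ten\tOld Name\t\t\t\t1\n")
flagged = list(rows_with_flags(io.StringIO(sample)))
print([row["alternateNameId"] for row in flagged])
```

For the real dump, the same function could be given a file object opened with `open("alternateNames.txt", encoding="utf-8", newline="")`, so the file is read line by line rather than loaded whole.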

However, my observations show that the isPreferredName flag should not be relied upon. It gives unexpected results. There are exonyms that are marked as isPreferredName, and probably non-exonyms in languages that are not local to that country as well, which is even worse. And for the capital of my country it said the preferred name is in French, while French is not one of our languages at all.

So still, one extra column in the geonames table (official/native name) would not hurt, in the official language of that country. If there are several languages, then just pick the one that is considered more popular. Still better than nothing. And it can even be nullable.
 