<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "alternatenames *column* versus alternateNames *table*"]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/4.page</link>
		<description><![CDATA[Latest messages posted in the topic "alternatenames *column* versus alternateNames *table*"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ I think I've run into a data consistency issue.

From the README, the alternateNames.txt table should contain a (proper) superset of the values in the "allCountries.txt" alternatenames column.  However, it does not.

A quick and dirty (very naive) check shows just shy of 900,000 values in the alternatenames column that do not show up in the alternateNames.txt table.

An example:

Serbia and Montenegro

Thoughts or comments?
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7918</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7918</link>
				<pubDate><![CDATA[Tue, 15 Jun 2010 22:30:30]]> GMT</pubDate>
				<author><![CDATA[ Big Jon]]></author>
			</item>
			<item>
				<title>Re:alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ I don't understand what you mean. In any case they are not meant to be exactly the same.

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7919</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7919</link>
				<pubDate><![CDATA[Wed, 16 Jun 2010 06:58:50]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ I know they are not meant to be the same.
The readme.txt file says:

<blockquote>
Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table. You probably don't need both. 
If you don't need to know the language of a name variant, the field 'alternatenames' will be sufficient. If you need to know the language
of a name variant, then you will need to load the table 'alternatenames' and you can drop the column in the geoname table.
		</blockquote>

What that statement tells me is that the alternatenames field is a subset of the data contained in the alternatenames table.

However, that doesn't appear to be the case.
Using the PostgreSQL functions string_to_array and unnest, I believe the following query returns the names that are present in the alternatenames column of an import of the 'allcountries' table but not in the 'alternatenames' table (using a case-insensitive comparison):

<span class="genmed"><b>Code:</b></span><br>
		<div style="overflow: auto; width: 100%;">
		<pre>
SELECT FOO.* FROM
  (SELECT DISTINCT
    unnest(string_to_array(alternatenames, ',')) AS name
    FROM allcountries) FOO
  WHERE NOT EXISTS
    (SELECT 1
     FROM alternatenames
     WHERE lower(alternatenames.name) = lower(FOO.name));
</pre>
		</div>

That query results in some 777315 values from the 'alternatenames' column that are not present in the 'alternatenames' table.

With respect to the example 'Serbia and Montenegro', that appears to have been an error on my part. However, this example:

Residence du Ruanda

is present in allcountries.txt but is not present in alternatenames.txt (geonameid 49518).
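
The query above compares names across the whole table, so a name counts as present if any geonameid carries it. A stricter, per-record version of the same check could be sketched in Python as follows. This is only a sketch under assumptions: the function name missing_alternate_names is hypothetical, and the sample rows are made up for illustration rather than copied from the dump.

```python
from collections import defaultdict


def missing_alternate_names(geoname_rows, alternate_rows):
    """Return {geonameid: [names]} for column names absent from the table.

    geoname_rows:   iterable of (geonameid, comma-separated alternatenames column)
    alternate_rows: iterable of (geonameid, alternate name) taken from the table
    The comparison is case-insensitive and scoped to each geonameid.
    """
    # Index the table: geonameid -> set of lower-cased names.
    table = defaultdict(set)
    for geonameid, name in alternate_rows:
        table[geonameid].add(name.lower())

    missing = {}
    for geonameid, column in geoname_rows:
        # Split the column on commas and ignore empty pieces.
        names = [n for n in column.split(",") if n]
        known = table.get(geonameid, set())
        absent = [n for n in names if n.lower() not in known]
        if absent:
            missing[geonameid] = absent
    return missing


# Illustrative rows only (assumption), not real dump contents.
geoname_rows = [(49518, "Residence du Ruanda,Résidence du Rwanda")]
alternate_rows = [(49518, "Résidence du Rwanda")]
print(missing_alternate_names(geoname_rows, alternate_rows))
```

Scoping the lookup to the geonameid avoids treating a name as present just because some unrelated place happens to share it.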


]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7920</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7920</link>
				<pubDate><![CDATA[Wed, 16 Jun 2010 19:17:01]]> GMT</pubDate>
				<author><![CDATA[ Big Jon]]></author>
			</item>
			<item>
				<title>Re:alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ "Residence du Ruanda" is the ascii transliteration for "Résidence du Rwanda". I have updated the documentation.

Best

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7923</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7923</link>
				<pubDate><![CDATA[Thu, 17 Jun 2010 07:53:34]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ The documentation says that the column (alternatenames) is an ASCII transliteration. OK, but, as an example:

781358	Tropojë	Tropoje	Tropoja,Tropoje-Fshat,Tropojë-Fshat,Тропоя	42.40417	20.16667	A	ADM4	AL	AL	47	781360			0		471	Europe/Tirane	2010-06-16

The alternatenames column clearly contains non-ASCII characters.

From the readme:

Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table without links and postal codes but with ascii transliterations. You probably don't need both. 

Just to clarify -- the alternatenames column is intended to be an ASCII transliterated variation of all of the alternate names for a given entity, except for those which are links or postal codes, right?

How is the ascii transliteration being performed? ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7931</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7931</link>
				<pubDate><![CDATA[Thu, 17 Jun 2010 23:29:32]]> GMT</pubDate>
				<author><![CDATA[ Big Jon]]></author>
			</item>
			<item>
				<title>Re:alternatenames *column* versus alternateNames *table*</title>
				<description><![CDATA[ The field is not just the ASCII transliteration. It includes the ASCII transliterations together with the non-ASCII text.
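
As a rough illustration of how one column can hold both forms, the sketch below places an ASCII-folded variant next to each original name. How the transliteration is actually produced is not stated in this thread. The sketch assumes plain accent stripping via Unicode NFKD decomposition, and the helper names ascii_fold and build_alternatenames_column are hypothetical.

```python
import unicodedata


def ascii_fold(name):
    """Strip combining marks after NFKD decomposition.

    Returns None when the result still contains non-ASCII characters
    (for example Cyrillic text), since accent stripping cannot fold it.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped if stripped.isascii() else None


def build_alternatenames_column(names):
    """Join original names with their ASCII-folded variants, without duplicates."""
    seen = []
    for name in names:
        folded = ascii_fold(name)
        for candidate in (name, folded):
            if candidate and candidate not in seen:
                seen.append(candidate)
    return ",".join(seen)


print(build_alternatenames_column(["Tropoja", "Tropojë-Fshat", "Тропоя"]))
# Tropoja,Tropojë-Fshat,Tropoje-Fshat,Тропоя
```

Names that are already ASCII appear once, accented names gain a folded twin, and text in other scripts is kept as-is.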

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1950.page#7936</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1950.page#7936</link>
				<pubDate><![CDATA[Fri, 18 Jun 2010 06:55:54]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
	</channel>
</rss>