<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "Some Corrections in the database"]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/4.page</link>
		<description><![CDATA[Latest messages posted in the topic "Some Corrections in the database"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>Some Corrections in the database</title>
				<description><![CDATA[ Since I don't really know where to give feedback about this, here is an (unstructured :-() post about some errors I have spotted:

- In the country list:
AX      ALA     248     FI      Aland Islands   Mariehamn

FI      FIN     246     FI      Finland Helsinki        337030.0        5223442 EU      fi-FI,smn,sv-FI

The FIPS codes of the Aland Islands and Finland seem to be the same. I checked the official FIPS codes, and FI is assigned to Finland. I may be wrong, but I think FIPS codes are supposed to be unique, aren't they?
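
A minimal sketch of how this uniqueness check could be scripted. It assumes the country file is tab-separated with the FIPS code in the fourth column, as in the rows quoted above; the function name and the sample list are illustrative only.

```python
from collections import defaultdict

# Sample rows taken from the post: ISO2, ISO3, ISO numeric, FIPS, name, capital.
# Assumption: the real file is tab-separated and has further columns.
ROWS = [
    "AX\tALA\t248\tFI\tAland Islands\tMariehamn",
    "FI\tFIN\t246\tFI\tFinland\tHelsinki",
    "CN\tCHN\t156\tCH\tChina\tBeijing",
]


def duplicate_fips(rows):
    """Return {fips_code: [iso2, ...]} for FIPS codes shared by several rows."""
    by_fips = defaultdict(list)
    for row in rows:
        fields = row.split("\t")
        iso2, fips = fields[0], fields[3]
        by_fips[fips].append(iso2)
    # Keep only the FIPS codes that occur on more than one row.
    return {fips: isos for fips, isos in by_fips.items() if len(isos) > 1}


print(duplicate_fips(ROWS))  # prints {'FI': ['AX', 'FI']}
```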

Another error I spotted:

CN      CHN     156     CH      China   Beijing 9596960.0       1306313812      AS      bo,i,ii,za,zh-CN

China seems to be linked to the "i" language, which is not an ISO 639-1 code...
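
Entries like this could be caught with a simple shape check. The sketch below is an assumption-laden heuristic, not a real ISO 639 lookup: it only verifies that the part of each tag before the hyphen is two or three lowercase letters, so it accepts three-letter codes such as "smn" and flags "i". The function name is hypothetical.

```python
import re

# Assumption: a plausible primary language subtag is 2 or 3 lowercase letters.
# This checks the shape only; it does not verify that the code is registered.
PRIMARY_SUBTAG = re.compile(r"[a-z]{2,3}")


def suspicious_languages(language_field):
    """Return the tags in a comma-separated field whose primary subtag looks wrong."""
    bad = []
    for tag in language_field.split(","):
        primary = tag.split("-")[0]
        if not PRIMARY_SUBTAG.fullmatch(primary):
            bad.append(tag)
    return bad


print(suspicious_languages("bo,i,ii,za,zh-CN"))  # prints ['i']
```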

Is there a better way to help correct things like that than reporting them on the forum?

Regards,
Sami Dalouche

]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#995</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#995</link>
				<pubDate><![CDATA[Sun, 15 Oct 2006 18:44:45]]> GMT</pubDate>
				<author><![CDATA[ Anonymous]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Hi Sami 

The flat file with the country info is now in CVS:
http://geonames.cvs.sourceforge.net/geonames/data/


The Aland Islands belong to Finland, so it is not wrong if you look at it from the ISO angle. But looking at it from the FIPS angle, it might be better to have a separate column for it. The problem is that there is no 1:1 relation between ISO codes and FIPS codes.


I have also removed the wrong entries for China.  

According to this source there are 235 living languages in China. I don't think it makes sense to list them all. 

http://www.ethnologue.com/show_country.asp?name=China

Regards

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#998</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#998</link>
				<pubDate><![CDATA[Sun, 15 Oct 2006 22:19:22]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Concerning FIPS, I'm not sure exactly what to do. One possibility would be an additional column indicating whether the current entry is the main one. So we would have two FI entries, one with primary = true and the other with primary = false?

The reason is that GNS cities refer to FIPS codes. So, if you want to import the data into a strongly typed database, you have to decide which Country entity each city is linked to. In this case, it should be Finland, not the islands.
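
A rough sketch of how an importer could use such a flag. The (iso2, fips, primary) tuple layout and the function name are hypothetical; the flag values are simply the ones proposed above.

```python
# Hypothetical rows: (ISO2 code, FIPS code, primary flag).
COUNTRIES = [
    ("AX", "FI", False),  # Aland Islands share Finland's FIPS code
    ("FI", "FI", True),
    ("CN", "CH", True),
]


def fips_to_country(countries):
    """Map each FIPS code to the ISO2 code of its single primary entry."""
    mapping = {}
    for iso2, fips, primary in countries:
        if not primary:
            continue
        if fips in mapping:
            # Two primary entries for one FIPS code would make the link ambiguous.
            raise ValueError(f"FIPS code {fips} has more than one primary entry")
        mapping[fips] = iso2
    return mapping


# A GNS city carrying the FIPS code "FI" is linked to Finland, not to AX.
print(fips_to_country(COUNTRIES)["FI"])  # prints FI
```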

Concerning China: it's true that 235 is a lot of languages, but when you think about it, it's not that many. China accounts for about 1.3 billion people, which is roughly 1/5th of the planet's population.

Regards,
Sami Dalouche]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#999</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#999</link>
				<pubDate><![CDATA[Sun, 15 Oct 2006 23:22:48]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Another problem...

In Adm1:

RI.RI

But the RI ISO code does not exist in Countries.txt.

Actually, RI does appear in ISO 3166-1 alpha-2, as a reserved code for Indonesia:
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

RI 	Indonesia 

while the country.txt info says:
ID      IDN     360     ID      Indonesia       Jakarta 1919440.0       241973879       AS      id,jv,su


...]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1000</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1000</link>
				<pubDate><![CDATA[Sun, 15 Oct 2006 23:33:42]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ The RI.RI admin code has been deleted.

If we add all 235 languages for China, we should order them by the percentage of the population that speaks each language, and it would also be useful to give the percentage in parentheses.

The country file is in CVS, so please feel free to add whatever you see fit, including a column for mainFipsCode.

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1005</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1005</link>
				<pubDate><![CDATA[Tue, 17 Oct 2006 19:22:26]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Some Corrections in the database</title>
				<description><![CDATA[ Hi,

OK, I am thinking about how to handle FIPS codes correctly, and will provide a corrected countryInfo.txt.

Another thing, concerning the languages: if we start adding more information about them (such as the percentage of the population that speaks each language), I would rather add a new table/file. Something like a many-to-many relationship between language codes and countries, with additional columns for the extra information. Putting things in parentheses and the like makes the files harder to parse.
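
A small sketch of what such a separate file could look like, one row per (country, language) pair. The class, field and function names are hypothetical, and the percentage is left empty because no such figures exist in the data yet.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountryLanguage:
    """One row of the hypothetical country/language link file."""
    country_code: str
    language_code: str
    percent_speakers: Optional[float]  # None while the share is unknown


def explode(country_code, language_field):
    """Turn a comma-separated language column into one link row per language."""
    return [CountryLanguage(country_code, tag, None)
            for tag in language_field.split(",")]


def to_tsv(links):
    """Serialise the link rows as tab-separated text, one pair per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for link in links:
        percent = "" if link.percent_speakers is None else link.percent_speakers
        writer.writerow([link.country_code, link.language_code, percent])
    return out.getvalue()


# Languages of the Indonesia row quoted earlier in this thread.
print(to_tsv(explode("ID", "id,jv,su")), end="")
```

Each extra attribute then becomes one more column in this file, and the country file itself no longer needs a multi-valued language field.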

Anyway, I'll provide files in the next few days.
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1012</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1012</link>
				<pubDate><![CDATA[Thu, 19 Oct 2006 01:34:51]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ In the long run it would also be nice to have the languages not only per country, but also per province/state (ISO 3166-2).


Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1015</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1015</link>
				<pubDate><![CDATA[Thu, 19 Oct 2006 07:45:53]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Hi,

I haven't forgotten my promise. I am going to play with the files in the next few days; it's just that I'm late, as usual ;) I am currently dealing with other technical problems :-)

1] For the FIPS code, I'm still searching for a better solution than mainFipsCode, since it is rather inelegant (though better than nothing).
2] I am thinking of moving the languages out of the country file into a separate table/file. Each row could then carry a (type, id) pair, where type = country / adm, and id = country code / adm code. We would have no data for adms right now, but at least the schema would allow it ;) This file could also contain any statistical data we have, such as the percentage of people speaking the language.

I will get back to you soon ;)
Sami Dalouche

]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1044</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1044</link>
				<pubDate><![CDATA[Wed, 25 Oct 2006 22:35:26]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ The data for Switzerland is more or less this:

CH.AG=Aargau : de
CH.AR=Appenzell Ausserrhoden : de
CH.BL=Basel-Landschaft : de
CH.BS=Basel-Stadt : de
CH.BE=Bern : de
CH.FR=Fribourg : fr, de
CH.GE=Genève : fr
CH.GL=Glarus : de
CH.GR=Graubünden : de, rm
CH.AI=Appenzell Innerrhoden : de
CH.LU=Luzern : de
CH.NE=Neuchâtel : fr
CH.NW=Nidwalden : de
CH.OW=Obwalden : de
CH.SG=Sankt Gallen : de
CH.SH=Schaffhausen : de
CH.SZ=Schwyz : de
CH.SO=Solothurn : de
CH.TG=Thurgau : de
CH.TI=Ticino : it
CH.UR=Uri : de
CH.VS=Valais : fr, de
CH.VD=Vaud : fr
CH.ZG=Zug : de
CH.ZH=Zürich : de
CH.JU=Jura : fr]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1054</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1054</link>
				<pubDate><![CDATA[Thu, 26 Oct 2006 19:16:14]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Hi,

OK, so here is the first part of the modifications. The rest (languages, city code integration, INSEE code integration, etc.) still needs work ;)

My modified countries.txt file is available at <a href='http://www.sirika.com/data/geonames/geonamesCountries.20061015.txt' target='_new' rel="nofollow">http://www.sirika.com/data/geonames/geonamesCountries.20061015.txt</a>

What has been done:
1] Added one column (EquivalentFipsCode): since ISO codes and FIPS codes do not match 1:1, an equivalent FIPS code is used when one FIPS country corresponds to several ISO countries. For instance, Finland and the Aland Islands both correspond to the FI FIPS entity. So the Finland entry has FI as its FIPS code, and the Aland Islands entry has FI as its equivalent FIPS code.

2] Azerbaijan: the currency is the Manat, code AZN (http://en.wikipedia.org/wiki/Azerbaijan)
So, the following 2 lines:
<blockquote>
AZ      AZE     031     AJ      Azerbaijan      Baku    86,600  7,911,974       AS      .az     AMD     Dram    +374    9999    av,az,os        587116  GE,IR,AM,TR,RU
AZ      AZE     031     AJ      Azerbaijan      Baku    86,600  7,911,974       AS      .az     AZM     Manat   +994            av,az,os        587116  GE,IR,AM,TR,RU
&nbsp;
		</blockquote>
have been replaced by
<blockquote>
AZ      AZE     031     AJ      Azerbaijan      Baku    86,600  7,911,974       AS      .az     AZN     Manat   +994            av,az,os        587116  GE,IR,AM,TR,RU
&nbsp;
		</blockquote>

The same goes for Moldova and Cyprus, which had duplicated entries, like AZ.

Moldova: http://en.wikipedia.org/wiki/Moldova

<blockquote>
MD      MDA     498     MD      Moldova Chisinau        33,843  4,455,421       EU      .md             Ruple   +373-533        9999    mo,ro,tr,uk,yi  6290251 RO,UA
MD      MDA     498     MD      Moldova Chisinau        33,843  4,455,421       EU      .md     MDL     Leu     +373            mo,ro,tr,uk,yi  617790  RO,UA
&nbsp;
		</blockquote>
replaced by:
<blockquote>
MD      MDA     498     MD      Moldova Chisinau        33,843  4,455,421       EU      .md     MDL     Leu     +373            mo,ro,tr,uk,yi  617790  RO,UA
&nbsp;
		</blockquote>

And Cyprus:

<blockquote>
CY      CYP     196     CY      Cyprus  Nicosia 9,250   780,133 AS      .cy     CYP     Pound   +357    9999    el-CY,tr-CY     146669
CY      CYP     196     CY      Cyprus  Nicosia 9,250   780,133 AS      .nc.tr  TRY     Lira    +90-392         el-CY,tr-CY     146669
&nbsp;
		</blockquote>
replaced by:

<blockquote>
CY      CYP     196     CY      Cyprus  Nicosia 9,250   780,133 AS      .cy     CYP     Pound   +357    9999    el-CY,tr-CY     146669
&nbsp;
		</blockquote>

Also, I am wondering about something: why have all numbers (area, population, ...) been reformatted with commas as thousands separators (XXX,YYY)? That wasn't the case previously, and while importing it is necessary to strip the commas, which is a little annoying. Is there a good reason, or can I remove all the commas from the file as well?
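
For reference, a sketch of the workaround needed at import time, assuming the commas are only ever thousands separators (as in 86,600 and 7,911,974) and never decimal marks. The helper name is hypothetical.

```python
def parse_number(text):
    """Parse an area or population field, tolerating thousands commas.

    Assumption: a comma is always a thousands separator, never a decimal mark.
    """
    cleaned = text.replace(",", "").strip()
    # Keep values with a decimal point (e.g. areas) as floats, the rest as ints.
    return float(cleaned) if "." in cleaned else int(cleaned)


print(parse_number("7,911,974"))  # prints 7911974
print(parse_number("337030.0"))   # prints 337030.0
```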


That's all for now. If anything in my modified file needs to change before it can be incorporated as the official geonames countries.txt, do not hesitate to tell me. I hope to come up with the rest of the modifications soon.

Regards,
Sami Dalouche]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1084</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1084</link>
				<pubDate><![CDATA[Wed, 1 Nov 2006 14:51:38]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Hi Sami

Thanks for your efforts.
Is it possible that you meant to give us this URI:
http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The one you have given seems to be another version. Is this correct?

Cheers

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1108</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1108</link>
				<pubDate><![CDATA[Sun, 5 Nov 2006 10:30:51]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:Some Corrections in the database</title>
				<description><![CDATA[ Hi,

Oh yeah, sorry, the URI I wanted to give was indeed http://www.sirika.com/data/geonames/geonamesCountries.20061031.txt

The previous one was actually your version ;-p

Oh, and concerning the zip code data: you mentioned on your blog that you updated the zip code data for some countries. Is it possible to get an up-to-date dump of that?

In order to link the zip codes to cities, here is what I plan to do:
- For each zip code lat/long, ask PostGIS for the 5 nearest cities.
- Look at this list of cities and match them somehow to the zip code place name. I initially wanted to fuzzy match the place names, but until I fix some Compass/Lucene performance problems, I am going to stick with matching the first few letters of the place name.
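
The two steps above can be sketched in plain Python. This is only an illustration: a brute-force haversine sort stands in for the PostGIS nearest-neighbour query, the prefix length of 4 is an arbitrary choice, and the city names and coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_city(place_name, lat, lon, cities, k=5, prefix_len=4):
    """Return the closest of the k nearest cities sharing the name prefix.

    cities is a list of (name, lat, lon) tuples; None means no candidate matched.
    """
    # Step 1: the k nearest cities (stand-in for the PostGIS query).
    nearest = sorted(cities,
                     key=lambda c: haversine_km(lat, lon, c[1], c[2]))[:k]
    # Step 2: compare the first few letters, ignoring case.
    prefix = place_name.casefold()[:prefix_len]
    for name, _, _ in nearest:
        if name.casefold().startswith(prefix):
            return name
    return None


# Made-up cities and coordinates, for illustration only.
CITIES = [
    ("Northville", 10.00, 10.0),
    ("Northfield", 10.10, 10.0),
    ("Southport", 10.02, 10.0),
]
print(match_city("Southport Centre", 10.0, 10.0, CITIES))  # prints Southport
```

Returning None when nothing matches leaves room for a later fuzzy-matching pass over the unmatched zip codes.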

What do you think about this approach? Do you have a better idea?

Sami Dalouche

]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/186.page#1110</guid>
				<link>http://forum.geonames.org/gforum/posts/list/186.page#1110</link>
				<pubDate><![CDATA[Sun, 5 Nov 2006 17:12:33]]> GMT</pubDate>
				<author><![CDATA[ samokk]]></author>
			</item>
	</channel>
</rss>