<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "Muddled - wrong geocode - duplicate rows - not same result Donwload vs webservice"]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/4.page</link>
		<description><![CDATA[Latest messages posted in the topic "Muddled - wrong geocode - duplicate rows - not same result Donwload vs webservice"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>Muddled - wrong geocode - duplicate rows - not same result Donwload vs webservice</title>
				<description><![CDATA[ Hello everybody

I have lot of question about geonames data but have to congratulate you before all, for your difficult works !

Indeed

<b>1</b> I take an example from the allCountries.txt file dl the March, 3, 2013.

Have a look for a query test :

<span class="genmed"><b>Code:</b></span><br>
		<div style="overflow: auto; width: 100%;">
		<pre>
SELECT n . * , vv.asciiname
FROM all_geonames_new n
LEFT JOIN ville_du_monde_region vv ON vv.geonameid
WHERE vv.geonameid = n.admin1id
AND n.asciiname LIKE 'Khanabad%'
</pre>
		</div>

Results : 

<blockquote>
geonameid 	country 	admin1id 	admin2id 	admin3id 	admin4id 	asciiname 	alternatenames 	latitude 	longitude 	fclass 	fcode 	cc2 	admin1 	admin2 	admin3 	admin4 	population 	elevation 	gtopo30 	timezone 	moddate 	namegeo 	asciiname
139 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3500000 	48.2666700 	P 	PPL 		15 				0 		97 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
140 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3166700 	48.2666700 	P 	PPL 		15 				0 		94 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
2607 	IR 	418862 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	32.7699000 	52.6537000 	P 	PPL 		28 				0 		2161 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Esfahan
23739 	IR 	126584 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	34.8701000 	46.8711000 	P 	PPL 		16 				0 		1631 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Kordestan
24892 	IR 	125605 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	34.0543000 	47.8347000 	P 	PPL 		23 				0 		2093 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Lorestan
127971 	IR 	443793 	NULL	NULL	NULL	Khanabad 	Khanabad,Khaniabad,Khāniābād,Khānābād,khan abad,خا... 	35.8385000 	50.1440000 	P 	PPL 		38 				0 		1167 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Qazvin
127972 	IR 	126584 	NULL	NULL	NULL	Khanabad 	Khanabad,Khargieh,Khargīeh,Kharineh,Kharīneh,Khānā... 	35.7726000 	47.8949000 	P 	PPLQ 		16 				0 		1632 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Kordestan
127973 	IR 	126584 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	35.4437000 	47.4936000 	P 	PPL 		16 				0 		1854 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Kordestan
127974 	IR 	124763 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	35.3428000 	50.5827000 	P 	PPLQ 		34 				0 		1235 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Markazi
127981 	IR 	125605 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	33.3037000 	48.9214000 	P 	PPL 		23 				0 		1456 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Lorestan
127982 	IR 	125605 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	33.2550200 	49.6589300 	P 	PPL 		23 				0 		2091 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Lorestan
406283 	IR 	134766 	NULL	NULL	NULL	Khanabad 	Khanabad,Khanabad-e Bashi,Khānābād,Khānābād-e Bāsh... 	29.7386000 	53.1820000 	P 	PPL 		07 				0 		1587 	Asia/Tehran 	2012-01-19 	Khānābād 	Ostan-e Fars
1137167 	AF 	1139049 	NULL	NULL	NULL	Khanabad 	Khairabad,Khairābād,Khanabad,Khānābād,khan abad,خا... 	36.8421700 	66.1085300 	P 	PPL 		31 				0 		294 	Asia/Kabul 	2012-01-16 	Khānābād 	Wilayat-e Jowzjan
1137168 	AF 	1135690 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,Ханабад,خان آباد 	36.6825000 	69.1155600 	P 	PPL 		24 				71531 		503 	Asia/Kabul 	2009-03-03 	Khanabad 	Kunduz
1174257 	PK 	1168873 	NULL	NULL	NULL	Khanabad 		34.9766900 	72.4742000 	P 	PPL 		03 				0 		1186 	Asia/Karachi 	2011-05-07 	Khānābād 	Khyber Pakhtunkhwa Province
1346072 	TJ 	1347488 	NULL	NULL	NULL	Khanabad 		38.2717200 	69.6436300 	P 	PPL 		02 				0 		931 	Asia/Dushanbe 	2012-02-28 	Khanabad 	Viloyati Khatlon
1450654 	AF 	1141268 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	33.1436200 	67.3089600 	P 	PPL 		08 				0 		2934 	Asia/Kabul 	2012-01-19 	Khānābād 	Wilayat-e Ghazni
1513641 	KG 	1529778 	NULL	NULL	NULL	Khanabad 	Khanabad,Khanabadskiy,Ханабад 	40.8151100 	72.9503400 	P 	PPL 		03 				0 		741 	Asia/Bishkek 	2012-01-17 	Khanabad 	Jalal-Abad Oblasty
1513647 	UZ 	1484845 	NULL	NULL	NULL	Khanabad 	Khanabad,Khanovat,Ханабад 	40.5949100 	72.0746000 	P 	PPL 		03 				0 		466 	Asia/Tashkent 	2012-01-17 	Khanabad 	Fergana
6747449 	IR 	124763 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	34.6120000 	49.2920000 	P 	PPL 		34 				0 		1723 	Asia/Tehran 	2012-01-20 	Khānābād 	Ostan-e Markazi
6747488 	IR 	132142 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	34.7770000 	49.0885000 	P 	PPL 		09 				0 		1623 	Asia/Tehran 	2012-01-20 	Khānābād 	Ostan-e Hamadan
7053316 	AF 	1135690 	7053316 	NULL	NULL	Khanabad 	Khan Abad,Khanabad,Khānābād,khan abad,خان آباد 	36.6063200 	69.1118500 	A 	ADM2 		24 	7053316 			0 		665 	Asia/Kabul 	2012-01-20 	Khānābād 	Kunduz
7321968 	IR 	6201375 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād,khan abad,خان آباد 	34.1222000 	58.7197400 	P 	PPL 		42 				0 		1598 	Asia/Tehran 	2012-01-21 	Khānābād 	Ostan-e Khorasan-e Razavi
7427838 	PK 	1168878 	NULL	NULL	NULL	Khanabad 		36.2465000 	74.4599000 	P 	PPL 		07 				0 		1952 	Asia/Karachi 	2010-08-07 	Khānābād 	Northern Areas
44435 	IR 	134766 	NULL	NULL	NULL	Khanabad-e Mehdiqoli 	Khanabad-e Mehdiqoli,Khānābād-e Mehdīqolī 	29.7333300 	53.1500000 	P 	PPL 		07 				0 		1576 	Asia/Tehran 	2012-01-16 	Khānābād-e Mehdīqolī 	Ostan-e Fars
44434 	IR 	134766 	NULL	NULL	NULL	Khanabad-e Mohammad Taqi 	Khanabad-e Mohammad Taqi,Khānābād-e Moḩammad Taqī 	29.7500000 	53.1666700 	P 	PPL 		07 				0 		1765 	Asia/Tehran 	2012-01-16 	Khānābād-e Moḩammad Taqī 	Ostan-e Fars
138 	IR 	127082 	NULL	NULL	NULL	Khanabad-e Qotb 	Khanabad-e Qotb,Khānābād-e Qoţb 	32.3500000 	48.2500000 	P 	PPL 		15 				0 		99 	Asia/Tehran 	2012-01-16 	Khānābād-e Qoţb 	Ostan-e Khuzestan
127980 	IR 	125605 	NULL	NULL	NULL	Khanabad-e Sar Asiab 	Emamabad,Emāmābād,Khanabad,Khanabad-e Sar Asiab,Kh... 	33.4476000 	49.1961000 	P 	PPL 		23 				0 		1632 	Asia/Tehran 	2012-01-16 	Khānābād-e Sar Āsīāb 	Ostan-e Lorestan
&nbsp;
		</blockquote>

What we can see ?

Duplicate city

<blockquote>
139 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3500000 	48.2666700 	P 	PPL 		15 				0 		97 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
140 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3166700 	48.2666700 	P 	PPL 		15 				0 		94 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
&nbsp;
		</blockquote>

A duplicate row are two rows who have same "asciiname, country, fcode, admin1, admin2, admin3, admin4" and different moddate.

-> I have only keep data where "fclass in ('A','P')".
-> I have delete At least 277 000 duplicate data (on 3 Million data)
sql :
<span class="genmed"><b>Code:</b></span><br>
		<div style="overflow: auto; width: 100%;">
		<pre>
Delete from all_geonames  where geonameid in 
&#40; select * from 
&#40; SELECT n.geonameid
FROM all_geonames_new n, all_geonames_new r
WHERE n.asciiname = r.asciiname
AND n.country = r.country
AND n.fcode = r.fcode
AND n.admin1 = r.admin1
AND n.admin2 = r.admin2
AND n.admin3 = r.admin3
AND n.admin4 = r.admin4
AND n.geonameid != r.geonameid
AND n.moddate != r.moddate
and n.moddate &lt; r.moddate&#41;
&#41; as p 
&#41;;
</pre>
		</div>

Lot of cities have only admin1 reference, how to check if cities are not duplicates in fact so... ? (several cities of the same name in the same admin). 

<b>2</b> How to know what is the correct row between, for example : 

<blockquote>
139 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3500000 	48.2666700 	P 	PPL 		15 				0 		97 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
140 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	32.3166700 	48.2666700 	P 	PPL 		15 				0 		94 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
&nbsp;
		</blockquote>

Moddate are the same ? How is possible to save a row a day and modified this row the same day and save it in an other id ? Are they two distinct city or ? I don't understand what i have to do with this two row for example.

How i can say what is the correct row ? "delete .... where a.geonameid < b.geonameid" ? Have to delete the oldest geonameid by number (geoname)id ?

<b>3</b> Lot of wrong latitude/longitude data

Same example...

 <blockquote>
139 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	<b>32.3500000 	48.2666700</b> 	P 	PPL 		15 				0 		97 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan
140 	IR 	127082 	NULL	NULL	NULL	Khanabad 	Khanabad,Khānābād 	<b>32.3166700 	48.2666700</b> 	P 	PPL 		15 				0 		94 	Asia/Tehran 	2012-01-16 	Khānābād 	Ostan-e Khuzestan&nbsp;
		</blockquote>

check on gmap/openstreetmap/yahoomap lat/long 32.3500000 48.2666700
No city in this geolocalisation

check for the lat/long "modified" : 32.3166700 	48.2666700
No place find too !

Why modificate geocode by an other wrong geocode (i think). Wrong source or i have missed something ?

Same for Khanabad or Markazi region in Iran lat/long are 35.3428000 	50.5827000, or the real are 100 Miles on South.

If i want to find nearby of this city i will find the wrong nearby. Nearby of 100 miles are not nearby.

<b>4</b> Openstreet map seems to use geonames data (?) and i can see the same error.

Example : Look for "Brealito" in Argentina

On an other api geocoding i have found the right geocoding. Focus on the brealito place
On OSM was wrong. Focus on the same lat/long specified by geonames (-25.2953400 	-66.3417300) (situate 2km nearby).

<b>5</b> Data on allCountries offered by geonames on free download and webservice are not the same, why ???

Example : use autocomple jquery powered by geonames

http://jqueryui.com/autocomplete/#remote-jsonp

Result for "Khanabad"

<blockquote>

Your city: 12 results are available, use up and down arrow keys to navigate. Powered by geonames.org
Result:

    Khanabad, Takhār, Afghanistan
    Khanabad, Kunduz, Afghanistan
    Khanabad, Khatlon, Tajikistan
    Khanabad, Xocalı, Azerbaijan
    Khanabad, Yevlax, Azerbaijan
    Khanabad, Balkh, Afghanistan
    Khanabad, Jalal-Abad, Kyrgyzstan
    Khanabad, Andijon, Uzbekistan
    Khanabad, Lorestān, Iran
    Khanabad, Hamadān, Iran
    Khanabad, Fārs, Iran
    Khanabad, Kashkadarya Province, Uzbekistan
&nbsp;
		</blockquote>

The result are really different.

- No city in Pakistan webservices but in data dl
- No city in Khuzestan region in Iran webservice but in data dl
- No city in Khorasan-e Razavi region in Iran in webservice but in data dl
- Fergana region in Uzbekistan in data dl and Andijon region in webservices ??
...
...

etc. etc. etc. ...

And what are the real data ?

I'm really muddled and my apalogize to this (non-exhaustive) list of error because who have done difficult works and i thank you but enormously correction appear necessary.

- specified how to delete duplicate rows (delete the oldest rows to keep only the latest modified row)
- Have the same data on webservices and data dl (how it's possible to be different ?)
- Have the good lat/long 
(you do great job, it's pity to modified wrong lat/long by another wrong lat/long) -> i can't use your geocoding webservice if geocoding are wrong because i don't know what is the percent of correct lat/long.
- It will be usefull to know what country are not 'finished' (I have COUNT 2 436 635 rows (fclass P and A) without ADMIN2 reference and i don't know if it's "normal" ).
- Several cities are admin4 reference but not admin2, admin3 reference, is normal too ?

I hope my topic we will help you and will improve the customers ask : ) .

Don't forget to reply me please.

Have a good day
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/4193.page#12387</guid>
				<link>http://forum.geonames.org/gforum/posts/list/4193.page#12387</link>
				<pubDate><![CDATA[Thu, 9 May 2013 06:39:33]]> GMT</pubDate>
				<author><![CDATA[ geoloc]]></author>
			</item>
	</channel>
</rss>