geoloc
Joined: 11/05/2012 11:57:09
Messages: 1
Offline
|
Hello everybody
I have lot of question about geonames data but have to congratulate you before all, for your difficult works !
Indeed
1 I take an example from the allCountries.txt file dl the March, 3, 2013.
Have a look for a query test :
Code:
SELECT n . * , vv.asciiname
FROM all_geonames_new n
LEFT JOIN ville_du_monde_region vv ON vv.geonameid
WHERE vv.geonameid = n.admin1id
AND n.asciiname LIKE 'Khanabad%'
Results :
geonameid country admin1id admin2id admin3id admin4id asciiname alternatenames latitude longitude fclass fcode cc2 admin1 admin2 admin3 admin4 population elevation gtopo30 timezone moddate namegeo asciiname
139 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3500000 48.2666700 P PPL 15 0 97 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
140 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3166700 48.2666700 P PPL 15 0 94 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
2607 IR 418862 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 32.7699000 52.6537000 P PPL 28 0 2161 Asia/Tehran 2012-01-16 Khānābād Ostan-e Esfahan
23739 IR 126584 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 34.8701000 46.8711000 P PPL 16 0 1631 Asia/Tehran 2012-01-16 Khānābād Ostan-e Kordestan
24892 IR 125605 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 34.0543000 47.8347000 P PPL 23 0 2093 Asia/Tehran 2012-01-16 Khānābād Ostan-e Lorestan
127971 IR 443793 NULL NULL NULL Khanabad Khanabad,Khaniabad,Khāniābād,Khānābād,khan abad,خا... 35.8385000 50.1440000 P PPL 38 0 1167 Asia/Tehran 2012-01-16 Khānābād Ostan-e Qazvin
127972 IR 126584 NULL NULL NULL Khanabad Khanabad,Khargieh,Khargīeh,Kharineh,Kharīneh,Khānā... 35.7726000 47.8949000 P PPLQ 16 0 1632 Asia/Tehran 2012-01-16 Khānābād Ostan-e Kordestan
127973 IR 126584 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 35.4437000 47.4936000 P PPL 16 0 1854 Asia/Tehran 2012-01-16 Khānābād Ostan-e Kordestan
127974 IR 124763 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 35.3428000 50.5827000 P PPLQ 34 0 1235 Asia/Tehran 2012-01-16 Khānābād Ostan-e Markazi
127981 IR 125605 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 33.3037000 48.9214000 P PPL 23 0 1456 Asia/Tehran 2012-01-16 Khānābād Ostan-e Lorestan
127982 IR 125605 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 33.2550200 49.6589300 P PPL 23 0 2091 Asia/Tehran 2012-01-16 Khānābād Ostan-e Lorestan
406283 IR 134766 NULL NULL NULL Khanabad Khanabad,Khanabad-e Bashi,Khānābād,Khānābād-e Bāsh... 29.7386000 53.1820000 P PPL 07 0 1587 Asia/Tehran 2012-01-19 Khānābād Ostan-e Fars
1137167 AF 1139049 NULL NULL NULL Khanabad Khairabad,Khairābād,Khanabad,Khānābād,khan abad,خا... 36.8421700 66.1085300 P PPL 31 0 294 Asia/Kabul 2012-01-16 Khānābād Wilayat-e Jowzjan
1137168 AF 1135690 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,Ханабад,خان آباد 36.6825000 69.1155600 P PPL 24 71531 503 Asia/Kabul 2009-03-03 Khanabad Kunduz
1174257 PK 1168873 NULL NULL NULL Khanabad 34.9766900 72.4742000 P PPL 03 0 1186 Asia/Karachi 2011-05-07 Khānābād Khyber Pakhtunkhwa Province
1346072 TJ 1347488 NULL NULL NULL Khanabad 38.2717200 69.6436300 P PPL 02 0 931 Asia/Dushanbe 2012-02-28 Khanabad Viloyati Khatlon
1450654 AF 1141268 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 33.1436200 67.3089600 P PPL 08 0 2934 Asia/Kabul 2012-01-19 Khānābād Wilayat-e Ghazni
1513641 KG 1529778 NULL NULL NULL Khanabad Khanabad,Khanabadskiy,Ханабад 40.8151100 72.9503400 P PPL 03 0 741 Asia/Bishkek 2012-01-17 Khanabad Jalal-Abad Oblasty
1513647 UZ 1484845 NULL NULL NULL Khanabad Khanabad,Khanovat,Ханабад 40.5949100 72.0746000 P PPL 03 0 466 Asia/Tashkent 2012-01-17 Khanabad Fergana
6747449 IR 124763 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 34.6120000 49.2920000 P PPL 34 0 1723 Asia/Tehran 2012-01-20 Khānābād Ostan-e Markazi
6747488 IR 132142 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 34.7770000 49.0885000 P PPL 09 0 1623 Asia/Tehran 2012-01-20 Khānābād Ostan-e Hamadan
7053316 AF 1135690 7053316 NULL NULL Khanabad Khan Abad,Khanabad,Khānābād,khan abad,خان آباد 36.6063200 69.1118500 A ADM2 24 7053316 0 665 Asia/Kabul 2012-01-20 Khānābād Kunduz
7321968 IR 6201375 NULL NULL NULL Khanabad Khanabad,Khānābād,khan abad,خان آباد 34.1222000 58.7197400 P PPL 42 0 1598 Asia/Tehran 2012-01-21 Khānābād Ostan-e Khorasan-e Razavi
7427838 PK 1168878 NULL NULL NULL Khanabad 36.2465000 74.4599000 P PPL 07 0 1952 Asia/Karachi 2010-08-07 Khānābād Northern Areas
44435 IR 134766 NULL NULL NULL Khanabad-e Mehdiqoli Khanabad-e Mehdiqoli,Khānābād-e Mehdīqolī 29.7333300 53.1500000 P PPL 07 0 1576 Asia/Tehran 2012-01-16 Khānābād-e Mehdīqolī Ostan-e Fars
44434 IR 134766 NULL NULL NULL Khanabad-e Mohammad Taqi Khanabad-e Mohammad Taqi,Khānābād-e Moḩammad Taqī 29.7500000 53.1666700 P PPL 07 0 1765 Asia/Tehran 2012-01-16 Khānābād-e Moḩammad Taqī Ostan-e Fars
138 IR 127082 NULL NULL NULL Khanabad-e Qotb Khanabad-e Qotb,Khānābād-e Qoţb 32.3500000 48.2500000 P PPL 15 0 99 Asia/Tehran 2012-01-16 Khānābād-e Qoţb Ostan-e Khuzestan
127980 IR 125605 NULL NULL NULL Khanabad-e Sar Asiab Emamabad,Emāmābād,Khanabad,Khanabad-e Sar Asiab,Kh... 33.4476000 49.1961000 P PPL 23 0 1632 Asia/Tehran 2012-01-16 Khānābād-e Sar Āsīāb Ostan-e Lorestan
What we can see ?
Duplicate city
139 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3500000 48.2666700 P PPL 15 0 97 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
140 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3166700 48.2666700 P PPL 15 0 94 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
A duplicate row are two rows who have same "asciiname, country, fcode, admin1, admin2, admin3, admin4" and different moddate.
-> I have only keep data where "fclass in ('A','P')".
-> I have delete At least 277 000 duplicate data (on 3 Million data)
sql :
Code:
Delete from all_geonames where geonameid in
( select * from
( SELECT n.geonameid
FROM all_geonames_new n, all_geonames_new r
WHERE n.asciiname = r.asciiname
AND n.country = r.country
AND n.fcode = r.fcode
AND n.admin1 = r.admin1
AND n.admin2 = r.admin2
AND n.admin3 = r.admin3
AND n.admin4 = r.admin4
AND n.geonameid != r.geonameid
AND n.moddate != r.moddate
and n.moddate < r.moddate)
) as p
);
Lot of cities have only admin1 reference, how to check if cities are not duplicates in fact so... ? (several cities of the same name in the same admin).
2 How to know what is the correct row between, for example :
139 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3500000 48.2666700 P PPL 15 0 97 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
140 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3166700 48.2666700 P PPL 15 0 94 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
Moddate are the same ? How is possible to save a row a day and modified this row the same day and save it in an other id ? Are they two distinct city or ? I don't understand what i have to do with this two row for example.
How i can say what is the correct row ? "delete .... where a.geonameid < b.geonameid" ? Have to delete the oldest geonameid by number (geoname)id ?
3 Lot of wrong latitude/longitude data
Same example...
139 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3500000 48.2666700 P PPL 15 0 97 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
140 IR 127082 NULL NULL NULL Khanabad Khanabad,Khānābād 32.3166700 48.2666700 P PPL 15 0 94 Asia/Tehran 2012-01-16 Khānābād Ostan-e Khuzestan
check on gmap/openstreetmap/yahoomap lat/long 32.3500000 48.2666700
No city in this geolocalisation
check for the lat/long "modified" : 32.3166700 48.2666700
No place find too !
Why modificate geocode by an other wrong geocode (i think). Wrong source or i have missed something ?
Same for Khanabad or Markazi region in Iran lat/long are 35.3428000 50.5827000, or the real are 100 Miles on South.
If i want to find nearby of this city i will find the wrong nearby. Nearby of 100 miles are not nearby.
4 Openstreet map seems to use geonames data (?) and i can see the same error.
Example : Look for "Brealito" in Argentina
On an other api geocoding i have found the right geocoding. Focus on the brealito place
On OSM was wrong. Focus on the same lat/long specified by geonames (-25.2953400 -66.3417300) (situate 2km nearby).
5 Data on allCountries offered by geonames on free download and webservice are not the same, why ???
Example : use autocomple jquery powered by geonames
http://jqueryui.com/autocomplete/#remote-jsonp
Result for "Khanabad"
Your city: 12 results are available, use up and down arrow keys to navigate. Powered by geonames.org
Result:
Khanabad, Takhār, Afghanistan
Khanabad, Kunduz, Afghanistan
Khanabad, Khatlon, Tajikistan
Khanabad, Xocalı, Azerbaijan
Khanabad, Yevlax, Azerbaijan
Khanabad, Balkh, Afghanistan
Khanabad, Jalal-Abad, Kyrgyzstan
Khanabad, Andijon, Uzbekistan
Khanabad, Lorestān, Iran
Khanabad, Hamadān, Iran
Khanabad, Fārs, Iran
Khanabad, Kashkadarya Province, Uzbekistan
The result are really different.
- No city in Pakistan webservices but in data dl
- No city in Khuzestan region in Iran webservice but in data dl
- No city in Khorasan-e Razavi region in Iran in webservice but in data dl
- Fergana region in Uzbekistan in data dl and Andijon region in webservices ??
...
...
etc. etc. etc. ...
And what are the real data ?
I'm really muddled and my apalogize to this (non-exhaustive) list of error because who have done difficult works and i thank you but enormously correction appear necessary.
- specified how to delete duplicate rows (delete the oldest rows to keep only the latest modified row)
- Have the same data on webservices and data dl (how it's possible to be different ?)
- Have the good lat/long
(you do great job, it's pity to modified wrong lat/long by another wrong lat/long) -> i can't use your geocoding webservice if geocoding are wrong because i don't know what is the percent of correct lat/long.
- It will be usefull to know what country are not 'finished' (I have COUNT 2 436 635 rows (fclass P and A) without ADMIN2 reference and i don't know if it's "normal" ).
- Several cities are admin4 reference but not admin2, admin3 reference, is normal too ?
I hope my topic we will help you and will improve the customers ask : ) .
Don't forget to reply me please.
Have a good day
|