Author |
Message |
21/10/2020 23:09:12
|
jay5r
Joined: 30/09/2020 21:57:56
Messages: 3
Offline
|
When you run a children query you see there are many, many cases of duplicates. Here's just one example…
http://api.geonames.org/children?geonameId=1275680&style=full&maxRows=5000&username=demo
Alīpura (P/PPL 10572271)
Alīpura (P/PPL 10556056)
Alīpura (P/PPL 1055638
Bahādurpur (P/PPL 10180684)
Bahādurpur (P/PPL 10180646)
Bahādurpur (P/PPL 10327144)
Bahādurpur (P/PPL 10569705)
Bahādurpur (P/PPL 10572090)
Bahāwalpur (P/PPL 10569485)
Bamnaula (P/PPL 10569613)
Bamnaula (P/PPL 10556092)
Bamnauli (P/PPL 10556093)
Bamnauli (P/PPL 10556981)
Bamnauli (P/PPL 1055703
Barhāpura (P/PPL 10556754)
Barhāpura (P/PPL 1276963)
Barhāpura (P/PPL 10556732)
Bhagwānpur (P/PPL 1055648
Bhagwānpur (P/PPL 1018067
Bhagwānpur (P/PPL 10327146)
Bhogpur (P/PPL 10571819)
Bhogpur (P/PPL 10572102)
Bhogpur (P/PPL 10556101)
Bhogpur (P/PPL 10556296)
And the examples go on and on (just for that one child pull).
It's so bad I've had to write a routine to find dupes and pick a winner in each case. That routine has identified tens of thousands of dupes.
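A minimal sketch of the kind of check described above, not the actual routine. It assumes each child record has been parsed into a dict with the keys `geonameId`, `name`, `fcl` and `fcode` (these key names and the helper `find_same_key_groups` are assumptions for illustration), and that all records share the same parent because they come from one children query. How a "winner" is picked is not shown, since the criteria aren't stated here.

```python
from collections import defaultdict


def find_same_key_groups(records):
    """Group records by (name, feature class, feature code).

    Returns only the groups that contain more than one geonameId,
    i.e. the candidates that look like duplicates.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["name"], rec["fcl"], rec["fcode"])
        groups[key].append(rec["geonameId"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Three of the records listed above, typed in by hand.
children = [
    {"geonameId": 10572271, "name": "Alīpura", "fcl": "P", "fcode": "PPL"},
    {"geonameId": 10556056, "name": "Alīpura", "fcl": "P", "fcode": "PPL"},
    {"geonameId": 10569485, "name": "Bahāwalpur", "fcl": "P", "fcode": "PPL"},
]

suspects = find_same_key_groups(children)
for key, ids in suspects.items():
    print(key, ids)
```

A group flagged this way is only a candidate. Whether it is a true duplicate still depends on other fields such as the coordinates.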
|
|
|
26/10/2020 22:53:09
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4412
Offline
|
I have looked at the first two and they don't look like duplicates. You cannot expect placenames to be unique.
Best Regards
Marc
|
|
|
|
27/10/2020 22:25:59
|
jay5r
Joined: 30/09/2020 21:57:56
Messages: 3
Offline
|
Marc,
Just looking at the first two… Here are the links…
https://www.geonames.org/10572271/alipur.html
https://www.geonames.org/10556056/alipur.html
They have the same hierarchy, and the same name, and the same class, and the same code. That combination of factors should be impossible. Think about if you addressed a letter – there can't be two places 58km apart that both match the address on your letter.
The only thing that's different is their lat/long. (In this case those two places are ~58km apart).
https://www.google.com/maps/dir/29.157500,+78.159722/29.150833,78.611389/@29.1719633,78.1034362,10z/data=!3m1!4b1!4m7!4m6!1m3!2m2!1d78.159722!2d29.1575!1m0!3e0
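For anyone who wants to check the separation without a map, here is a rough sketch that computes the great-circle (haversine) distance between the two coordinate pairs in the link above. The function name `haversine_km` and the 6371 km mean Earth radius are my own choices for illustration. Note that the link is a directions link, so the ~58km figure may be a route distance, and a straight-line distance can never be longer than a route distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0  # assumed mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Coordinates taken from the Google Maps link above.
distance_km = haversine_km(29.157500, 78.159722, 29.150833, 78.611389)
print(f"{distance_km:.1f} km")
```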
So I dug deeper and Google Maps says 10556056 isn't Alipur, but rather Begumpur, Uttar Pradesh 246725, India.
What this means is that the duplicates aren't exactly duplicates as much as bad data. Someone has entered places in the wrong location. Looking like a duplicate is the indicator of bad data.
My point is that two records should never have the same hierarchy, and the same name, and the same class, and the same code. Something is wrong when that's the case. And there are at least 10s of thousands of cases like that.
|
|
|
27/10/2020 22:33:29
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4412
Offline
|
You cite Google Maps, and when you zoom in you see the label Alipura at both locations. Google also seems to think that both locations exist.
"two records should never have the same hierarchy, and the same name, and the same class, and the same code"
There is no such rule. On the contrary, place names tend to cluster.
Best Regards
Marc
|
|
|
|
|