GeoNames Forum
LOTS of apparent duplicates
jay5r



Joined: 30/09/2020 21:57:56
Messages: 3
Offline

When you run a children query, you see many, many cases of duplicates. Here's just one example…

http://api.geonames.org/children?geonameId=1275680&style=full&maxRows=5000&username=demo

Alīpura (P/PPL 10572271)
Alīpura (P/PPL 10556056)
Alīpura (P/PPL 1055638

Bahādurpur (P/PPL 10180684)
Bahādurpur (P/PPL 10180646)
Bahādurpur (P/PPL 10327144)
Bahādurpur (P/PPL 10569705)
Bahādurpur (P/PPL 10572090)
Bahāwalpur (P/PPL 10569485)

Bamnaula (P/PPL 10569613)
Bamnaula (P/PPL 10556092)

Bamnauli (P/PPL 10556093)
Bamnauli (P/PPL 10556981)
Bamnauli (P/PPL 1055703

Barhāpura (P/PPL 10556754)
Barhāpura (P/PPL 1276963)
Barhāpura (P/PPL 10556732)

Bhagwānpur (P/PPL 1055648
Bhagwānpur (P/PPL 1018067
Bhagwānpur (P/PPL 10327146)

Bhogpur (P/PPL 10571819)
Bhogpur (P/PPL 10572102)
Bhogpur (P/PPL 10556101)
Bhogpur (P/PPL 10556296)

And the examples go on and on (just for that one child pull).

It's so bad I've had to write a routine to find dupes and pick a winner in each case. That routine has identified tens of thousands of dupes.
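
The routine itself isn't posted, so the following is only a minimal sketch of the idea, assuming that records sharing the same parent, name, feature class and feature code count as apparent duplicates. The sample rows are copied from the listing above; `find_apparent_duplicates`, `pick_winner` and the "lowest geonameId wins" rule are hypothetical and not anything GeoNames provides.

```python
from collections import defaultdict

# Sample rows from the listing above: (name, feature class, feature code, geonameId).
# All of them are children of geonameId 1275680, so the parent is the same for each.
ROWS = [
    ("Alīpura", "P", "PPL", 10572271),
    ("Alīpura", "P", "PPL", 10556056),
    ("Bamnaula", "P", "PPL", 10569613),
    ("Bamnaula", "P", "PPL", 10556092),
    ("Bahāwalpur", "P", "PPL", 10569485),
]


def find_apparent_duplicates(rows):
    """Group ids by (name, class, code) and keep only groups with more than one id."""
    groups = defaultdict(list)
    for name, fcl, fcode, geoname_id in rows:
        groups[(name, fcl, fcode)].append(geoname_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def pick_winner(ids):
    """Hypothetical tie-break: keep the lowest geonameId. Any other rule could be swapped in."""
    return min(ids)


dupes = find_apparent_duplicates(ROWS)
winners = {key: pick_winner(ids) for key, ids in dupes.items()}

for key, ids in dupes.items():
    print(key, ids, "-> keep", winners[key])
```

A grouping like this only flags candidates. It cannot tell two real villages with the same name apart from one village entered twice, so the coordinates still need checking.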
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

I have looked at the first two and they don't look like duplicates. You cannot expect placenames to be unique.

Best Regards

Marc

jay5r



Joined: 30/09/2020 21:57:56
Messages: 3
Offline

Marc,

Just looking at the first two… Here are the links…

https://www.geonames.org/10572271/alipur.html

https://www.geonames.org/10556056/alipur.html

They have the same hierarchy, the same name, the same class, and the same code. That combination of factors should be impossible. Think about addressing a letter: there can't be two places 58km apart that both match the address on your letter.

The only thing that's different is their lat/long. (In this case those two places are ~58km apart).

https://www.google.com/maps/dir/29.157500,+78.159722/29.150833,78.611389/@29.1719633,78.1034362,10z/data=!3m1!4b1!4m7!4m6!1m3!2m2!1d78.159722!2d29.1575!1m0!3e0
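
For anyone who wants to check the separation without a map, here is a rough sketch that computes the straight-line (great-circle) distance between the two coordinate pairs in the link above using the haversine formula. The 6371 km mean Earth radius and the function name `haversine_km` are assumptions of this sketch. A straight-line figure will come out smaller than a driving distance reported by a directions link.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Coordinates taken from the Google Maps link above.
d = haversine_km(29.157500, 78.159722, 29.150833, 78.611389)
print(f"{d:.1f} km")
```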

So I dug deeper and Google Maps says 10556056 isn't Alipur, but rather Begumpur, Uttar Pradesh 246725, India.

What this means is that the duplicates aren't so much duplicates as bad data. Someone has entered places in the wrong location. Looking like a duplicate is the indicator of bad data.

My point is that two records should never have the same hierarchy, the same name, the same class, and the same code. Something is wrong when that's the case. And there are at least tens of thousands of cases like that.
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

You cite Google Maps, and when you zoom in you see the label Alipura at both locations. Google also seems to think that both locations exist.


"two records should never have the same hierarchy, and the same name, and the same class, and the same code"
 

There is no such rule. On the contrary, place names tend to cluster.


Best Regards

Marc
