| Author |
Message |
05/11/2009 00:01:53
|
rlevering
Joined: 20/07/2009 20:00:19
Messages: 16
Offline
|
Try searching for San Francisco, Cuba and tell me what in the world went on there. Did some input script mess up or was the original data that dirty?
|
05/11/2009 00:30:24
|
rlevering
Joined: 20/07/2009 20:00:19
Messages: 16
Offline
|
On this note, I just ran a duplicate detector on the DB, where I defined a duplicate to be anything with the same name, the same feature code, and the same hierarchical breakdown (country, adm1, adm2). There are a very large number of duplicates in the database. I'm sure some of these may actually be different places, but the large majority that I saw were definitely import errors. Is there a strategy to handle these?
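
For anyone who wants to reproduce this kind of check, here is a minimal sketch of the duplicate definition above. It is not the actual detector that was run. It assumes the records have already been loaded as dicts, and the field names (`id`, `name`, `feature_code`, `country`, `adm1`, `adm2`) and the sample rows are made up for illustration. It only reports candidate groups and deletes nothing, since records that match on all these fields may still be different places.

```python
from collections import defaultdict

# Fields that together define a "duplicate" in the sense described above.
# The field names are hypothetical; adapt them to however the data is loaded.
KEY_FIELDS = ("name", "feature_code", "country", "adm1", "adm2")


def find_duplicate_groups(records):
    """Return {key: [ids]} for every key shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in KEY_FIELDS)
        groups[key].append(rec["id"])
    # Keep only the keys that occur more than once: these are candidates
    # for manual review, not confirmed duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample rows, for illustration only.
sample = [
    {"id": 1, "name": "San Francisco", "feature_code": "PPL",
     "country": "CU", "adm1": "05", "adm2": "01"},
    {"id": 2, "name": "San Francisco", "feature_code": "PPL",
     "country": "CU", "adm1": "05", "adm2": "01"},
    {"id": 3, "name": "San Francisco", "feature_code": "PPL",
     "country": "CU", "adm1": "07", "adm2": "02"},
]

candidates = find_duplicate_groups(sample)
for key, ids in candidates.items():
    print(key, ids)
```

Adding a distance check on the coordinates within each group would be one way to narrow the candidates further, but that goes beyond the definition used here.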
|
05/11/2009 20:35:35
|
marc
Joined: 08/12/2005 07:39:47
Messages: 2185
Offline
|
This one, for a change, really does look like the same toponym. The toponym refers to an area, and there are a lot of markers covering the entire area. I could imagine that one of the input sources was aggregating small subsets (like maps), and the toponym appeared once on each map and so ended up in the database n times. Please feel free to clean it up, but be careful with automated scripts. There are a couple of threads where users complain about duplicates, but usually it is not at all clear whether they really are duplicates, and often they clearly are not. There is just no law that makes place names unique, even though it would make life easier for application developers.
Best
Marc
|