<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "ADM Duplicates "]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/9.page</link>
		<description><![CDATA[Latest messages posted in the topic "ADM Duplicates "]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>ADM Duplicates </title>
				<description><![CDATA[ I have been manipulating geoname administrative data for a few days/weeks.

I did found some duplicates : 

select country, admin1, count(*) from geoname where fclass='A' and fcode='ADM1' and admin1 is not null  group by country, admin1 having count(*) > 1 ;
 country | admin1 | count
---------+--------+-------
 GE      | 00     |     2
 IQ      | 00     |     4
 KH      | 00     |     3
 KI      | 00     |     2
 KZ      | 00     |     5
 LV      | 00     |     4
 MD      | 00     |    17
 ME      | 00     |    19
 MN      | 00     |     2
 NG      | 00     |     7
 NP      | 00     |    14
 PR      | 00     |    13
 PW      | 00     |    11
 SB      | 00     |     7
 ST      | 00     |     2
 TJ      | 00     |     2
 TL      | 00     |    13
 UG      | 00     |     8
(18 rows)

select country, admin1, admin2, count(*) from geoname where fclass='A' and fcode='ADM2' and admin2 is not null group by country, admin1, admin2  having count(*) > 1 ;
 country | admin1 | admin2 | count
---------+--------+--------+-------
 GB      | ENG    | 00     |     2
 GB      | ENG    | F2     |     2
(2 rows)

If I understand well '00' code is a non valid one : should it not be NULL as we do not have the information ?

Here is a true duplicate : 

select * from geoname where fclass='A' and fcode='ADM2' and country='GB' and admin1='ENG' and admin2='F2';

 geonameid |        name         |      asciiname      |                                                alternatenames                                                |  latitude  | longitude  | fclass | fcode | country | cc2 | admin1 | a
dmin2 | admin3 | admin4 | population | elevation | gtopo30 |   timezone    |  moddate   |                        geom
-----------+---------------------+---------------------+--------------------------------------------------------------------------------------------------------------+------------+------------+--------+-------+---------+-----+--------+--
------+--------+--------+------------+-----------+---------+---------------+------------+----------------------------------------------------
   2647554 | Hampshire           | Hampshire           | Hampshire,Hampshire County,Hants,Khehmpshir,Suthhamtunscir,Sūþhāmtūnscīr,Хэмпшир,ハンプシャー,ハンプシャー州 |         51 |      -1.25 | A      | ADM2  | GB      | GB  | ENG
   | F2     |        |        |    1691000 |           |      77 | Europe/London | 2008-07-13 | 0101000020E6100000000000000000F4BF0000000000804940
   2647555 | County of Hampshire | County of Hampshire | Hampshire                                                                                                    | 51.0833333 | -1.1666667 | A      | ADM2  | GB      | GB  | ENG    | F
2     |        |        |          0 |           |      77 | Europe/London | 2008-01-16 | 0101000020E6100000DE509DB3AAAAF2BF791563AAAA8A4940
(2 rows)

Some more on 3rd level :

select country, admin1, admin2, admin3, count(*) from geoname
 where fclass='A' and fcode='ADM3' and admin3 is not null group by country, admin1, admin2, admin3  having count(*) > 1 ;
 country | admin1 | admin2 | admin3 | count
---------+--------+--------+--------+-------
 DE      | 14     |        | 15171  |     2
 ES      | 31     | MU     | 30030  |     2
 ES      | 34     | O      | 33008  |     2
 ES      | 34     | O      | 33032  |     2
 ES      | 34     | O      | 33049  |     2
 ES      | 58     | OR     | 32068  |     2
 ES      | 59     | BI     | 48086  |     2
 NO      | 12     | 0301   | 13     |     2
(8 rows)

On level 4 I did not fin any.
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1485.page#6190</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1485.page#6190</link>
				<pubDate><![CDATA[Tue, 11 Aug 2009 13:57:23]]> GMT</pubDate>
				<author><![CDATA[ nicolas.helleringer]]></author>
			</item>
			<item>
				<title>Re:ADM Duplicates </title>
				<description><![CDATA[ Hi Nicolas

Do you also have a solution for the duplicates?

Best

Marc]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1485.page#6198</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1485.page#6198</link>
				<pubDate><![CDATA[Thu, 13 Aug 2009 07:48:45]]> GMT</pubDate>
				<author><![CDATA[ marc]]></author>
			</item>
			<item>
				<title>Re:ADM Duplicates </title>
				<description><![CDATA[ Uppon import in a SGBD i did update all the records with '00' to revert them to NULL

For true duplicates i dropped the oldest record or the lowest geonameid if there were updated on the same date.

As the duplicates are from regions I do not know I do not have better criterias alas  :cry: ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/1485.page#6216</guid>
				<link>http://forum.geonames.org/gforum/posts/list/1485.page#6216</link>
				<pubDate><![CDATA[Tue, 18 Aug 2009 12:45:22]]> GMT</pubDate>
				<author><![CDATA[ nicolas.helleringer]]></author>
			</item>
	</channel>
</rss>