mrkaikev
Joined: 05/02/2020 13:01:42
The admin_code1 field seems to be a mix of ISO and FIPS codes, but I also found two countries where neither is used: Bulgaria and Malaysia.
The codes specified in http://www.geonames.org/BG/administrative-division-.html or https://download.geonames.org/export/dump/admin1CodesASCII.txt are completely different from what is in http://download.geonames.org/export/zip/BG.zip
For example, the postal code file uses codes like BGS for Burgas, whose ISO code is 02.
Mapping them by admin_name1 does work, but it takes some manual effort, since several names appear in more than one transliteration.
Code:
{'Vratsa': 'bg.06',
'Sofia-Grad': 'bg.22',
'Gabrovo': 'bg.07',
'Sofia': 'bg.23',
'Dobrich': 'bg.08',
'Stara Zagora': 'bg.24',
'Kardzhali': 'bg.09',
'Targovishte': 'bg.25',
'Kyustendil': 'bg.10',
'Haskovo': 'bg.26',
'Lovech': 'bg.11',
'Shumen': 'bg.27',
'Montana': 'bg.12',
'Yambol': 'bg.28',
'Pazardzhik': 'bg.13',
'Pernik': 'bg.14',
'Pleven': 'bg.15',
'Plovdiv': 'bg.16',
'Blagoevgrad': 'bg.01',
'Razgrad': 'bg.17',
'Burgas': 'bg.02',
'Ruse': 'bg.18',
'Varna': 'bg.03',
'Silistra': 'bg.19',
'Veliko Tarnovo': 'bg.04',
'Sliven': 'bg.20',
'Vidin': 'bg.05',
'Smolyan': 'bg.21',
'Veliko Turnovo': 'bg.04',
'Vraca': 'bg.06',
'Kjustendil': 'bg.10',
'Smoljan': 'bg.21',
'Sofija (stolica)': 'bg.22',
'Sofija': 'bg.23',
'Turgovishhe': 'bg.25',
'Khaskovo': 'bg.26',
'Jambol': 'bg.28'}
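A minimal sketch of how a name-based mapping like the one above could be applied to the rows of a postal code file. The column order follows the readme of the GeoNames postal code dump; the function name `normalize_admin_codes`, the shortened `NAME_TO_CODE` excerpt, and the sample rows (placeholder postal codes, place names, and coordinates) are assumptions made for illustration only.

```python
import csv
import io

# Column order of the GeoNames postal code dump (tab-separated, no header row).
POSTAL_COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1", "admin_name2", "admin_code2",
    "admin_name3", "admin_code3", "latitude", "longitude", "accuracy",
]

# Excerpt of the mapping from this post; both transliterations map to one code.
NAME_TO_CODE = {
    "Burgas": "bg.02",
    "Veliko Tarnovo": "bg.04",
    "Veliko Turnovo": "bg.04",
}


def normalize_admin_codes(lines, name_to_code):
    """Yield (postal_code, original admin_code1, mapped code or None) per row."""
    reader = csv.DictReader(
        lines, fieldnames=POSTAL_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        # Names missing from the mapping yield None so they can be reviewed by hand.
        mapped = name_to_code.get(row["admin_name1"])
        yield row["postal_code"], row["admin_code1"], mapped


# Illustrative rows only: postal codes, place names and coordinates are placeholders.
sample = io.StringIO(
    "BG\t0001\tExample A\tBurgas\tBGS\t\t\t\t\t0.0\t0.0\t1\n"
    "BG\t0002\tExample B\tVeliko Turnovo\tVTR\t\t\t\t\t0.0\t0.0\t1\n"
    "BG\t0003\tExample C\tNot In Mapping\tXXX\t\t\t\t\t0.0\t0.0\t1\n"
)

results = list(normalize_admin_codes(sample, NAME_TO_CODE))
for postal_code, original, mapped in results:
    print(postal_code, original, mapped)
```

Returning None for unmapped names, instead of raising, keeps the remaining manual work visible as a list of names still to be added.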
My question is whether you are aware of this issue and plan to normalize admin_code1, or whether there is an alternative mapping other than ISO or what is in admin1CodesASCII.txt. Here are all the undocumented admin codes I found (they are all lowercased and prefixed with the country code):
Code:
{'bg.bgs',
'bg.blg',
'bg.dob',
'bg.gab',
'bg.hkv',
'bg.jam',
'bg.knl',
'bg.krz',
'bg.lov',
'bg.mon',
'bg.paz',
'bg.pdv',
'bg.per',
'bg.pvn',
'bg.raz',
'bg.rse',
'bg.sfo',
'bg.shu',
'bg.sls',
'bg.slv',
'bg.sml',
'bg.sof',
'bg.szr',
'bg.tgv',
'bg.var',
'bg.vid',
'bg.vrc',
'bg.vtr',
'gb.l93000001',
'gb.m83000003',
'gu.66',
'li.7001',
'li.7002',
'li.7003',
'li.7004',
'li.7005',
'li.7006',
'li.7007',
'li.7008',
'li.7009',
'li.7010',
'li.7011',
'mc.01',
'mh.68',
'my.jhr',
'my.kdh',
'my.ktn',
'my.kul',
'my.lbn',
'my.mlk',
'my.nsn',
'my.phg',
'my.pjy',
'my.pls',
'my.png',
'my.prk',
'my.sbh',
'my.sgr',
'my.srw',
'my.trg',
'us.mh',
'vi.78'}
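A set like the one above can be produced roughly as follows. This is only a sketch: the function name `undocumented_codes` is hypothetical, and the inputs are small in-memory samples taken from codes mentioned in this post, not the real contents of either file.

```python
def undocumented_codes(postal_pairs, documented):
    """Return lowercased 'cc.code' keys seen in the postal data but not documented.

    postal_pairs: iterable of (country_code, admin_code1) tuples.
    documented:   iterable of known 'CC.code' keys (ISO, FIPS, or both).
    """
    # Skip rows with an empty admin_code1; they carry no code to check.
    seen = {f"{cc}.{code}".lower() for cc, code in postal_pairs if code}
    known = {key.lower() for key in documented}
    return seen - known


# Sample inputs for illustration only.
postal_pairs = [("BG", "BGS"), ("BG", "02"), ("MY", "JHR"), ("BG", "")]
documented = {"BG.02"}

missing = undocumented_codes(postal_pairs, documented)
print(sorted(missing))
```

Lowercasing both sides before taking the set difference avoids false positives caused purely by case differences between the files.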
The same doesn't happen in the cities500.zip dataset, for example. It sometimes contains deleted codes, but it doesn't mix schemes or contain unknown codes.