canadianveggie
Joined: 05/06/2015 01:59:53
Messages: 2
There are a number of entries in GB.zip and GB_full.zip (all with short postal codes) where the country (adminCode1 and adminName1) is set incorrectly.
Examples:
A number of entries for Greater London, Devon, Shropshire, Gloucestershire, and a few others are listed in Wales but should be in England.
Some entries in Northumberland, Bristol, and Cumbria are listed in Scotland but should be in England.
Western Isles and Highland are in Scotland, not Northern Ireland.
I've made some corrections to the file. Take a look at the diff.
I identified the bad entries by running this R script:
Code:
library(data.table)
# Read the tab-separated dump: nine character columns, then latitude, longitude, accuracy as numeric
gb <- data.table(read.csv("~/Desktop/GB.txt", colClasses = c(rep("character", 9), rep("numeric", 3)), sep = "\t", header = FALSE))
setnames(gb, c("countryCode", "postalCode", "placeName", "adminName1", "adminCode1", "adminName2", "adminCode2", "adminName3", "adminCode3", "latitude", "longitude", "accuracy"))
# Distinct (adminName1, adminName2) pairs, then the adminName2 values that appear under more than one adminName1
pairs <- unique(gb[adminName2 != "", list(adminName1, adminName2)])
ambiguous <- pairs[, list(size = .N), by = "adminName2"][size > 1]$adminName2
unique(gb[adminName2 %in% ambiguous, list(adminName1, adminName2)])
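To make the idea behind the script above easier to follow, here is a minimal sketch of the same check on a tiny made-up table. The rows in `toy` are hypothetical and are not taken from GB.txt, and the per-pair row count (`rows`) is my own addition, not part of the original script. The assumption is that when a county shows up under two countries, the pairing with far fewer rows is probably the mislabelled one; that is only a heuristic and each case still needs checking by hand.

```r
library(data.table)

# Hypothetical toy rows (not real GeoNames data)
toy <- data.table(
  adminName1 = c("England", "England", "England", "Wales", "England", "Scotland"),
  adminName2 = c("Devon", "Devon", "Devon", "Devon", "Cumbria", "Highland")
)

# Count rows for each (adminName1, adminName2) pair, skipping rows with no adminName2
tally <- toy[adminName2 != "", list(rows = .N), by = list(adminName1, adminName2)]

# adminName2 values that appear under more than one adminName1
suspect <- tally[, list(size = .N), by = "adminName2"][size > 1]$adminName2

# Show the conflicting pairs, largest group first within each adminName2
result <- tally[adminName2 %in% suspect][order(adminName2, -rows)]
print(result)
```

In this toy table only Devon is flagged, because it is the only adminName2 listed under two different adminName1 values. A name that legitimately exists in more than one country would be flagged as well, so the output is a list of candidates to review rather than a list of confirmed errors.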
Attachment: Updated adminCode1/adminName1 (1759 Kbytes, downloaded 3016 times)