GeoNames Forum
Updating GB Postal Regex in countryinfo.txt
inchcrest



Joined: 04/01/2018 22:12:19
Messages: 2
Offline

Hi,

I just discovered this project, and it is awesome. I downloaded the countryinfo.txt file in order to get regexes for postal codes across the world. I noticed the GB postal code regex was not working for some of my test postal codes.
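For anyone who wants to pull the per-country regexes out of the file programmatically, here is a minimal Python sketch. It assumes countryinfo.txt is tab-separated, that comment lines start with `#`, and that the postal code regex sits in the 15th column (index 14); those positions are assumptions, so check the header comment in your copy first. The function name `load_postal_regexes` is made up for illustration.

```python
import re

# Assumed column positions (0-based); verify against the commented header
# in your copy of countryinfo.txt before relying on them.
ISO_COL = 0
POSTAL_REGEX_COL = 14


def load_postal_regexes(text):
    """Return {ISO code: compiled regex} for rows that define a postal regex."""
    patterns = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split("\t")
        if len(fields) <= POSTAL_REGEX_COL:
            continue  # row too short to contain the regex column
        regex = fields[POSTAL_REGEX_COL].strip()
        if regex:
            patterns[fields[ISO_COL]] = re.compile(regex)
    return patterns
```

Calling it on the file contents and looking up `patterns["GB"]` would then give the regex discussed in this thread, provided the column assumption holds.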

I was wondering if we could update the regex to be the same as the one recommended here: https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation

Code:
 ^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
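One thing to watch with the pattern as posted: the top-level `|` splits it into two alternatives, so `^` only guards the GIR branch and `$` only guards the general branch. With search-style matching (which is what JavaScript's `test()` does) that lets extra text slip through. A small Python sketch to show the effect; the helper names `loose` and `strict` are only for illustration, and `re.fullmatch` is used here as one possible way to require the whole string to match.

```python
import re

# Pattern exactly as posted above, split over two literals for readability.
PROPOSED = (
    r"^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
    r"|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$"
)


def loose(code):
    """Search-style check: a match anywhere in the string is accepted."""
    return re.search(PROPOSED, code) is not None


def strict(code):
    """Whole-string check: the entire input must be one postal code."""
    return re.fullmatch(PROPOSED, code) is not None


print(loose("M1 1AA"), strict("M1 1AA"))                # True True
print(loose("GIR 0AA extra"), strict("GIR 0AA extra"))  # True False
print(loose("junk M1 1AA"), strict("junk M1 1AA"))      # True False
```

Wrapping the whole alternation in one group between `^` and `$` should have the same effect as `fullmatch` in engines that lack it.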
 



Or if someone could point me in the right direction on how to do it myself? I couldn't find any clear documentation on how to update the countryinfo.txt file.

Thanks
inchcrest



Joined: 04/01/2018 22:12:19
Messages: 2
Offline

I thought I'd update this with a JavaScript test of the current regex and the proposed regex. I hope this makes my argument a little more concrete.

https://jsfiddle.net/inchcrest/om49bcwd/

Thanks
raydog



Joined: 16/05/2018 04:21:40
Messages: 2
Offline

Yeah, I was having an issue with that regex too. When you download the full list of postal codes for Great Britain from GeoNames, the countryinfo.txt regex accepts only 967,863 of the 1,696,228 total postal codes (about 57%).
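A count like this can be reproduced with a few lines of Python. This is only a sketch: it assumes the postal code dump is tab-separated with the postal code in the second field, and `match_rate` is a made-up helper name.

```python
import re


def match_rate(lines, pattern, code_col=1):
    """Return (matched, total) for the postal codes in tab-separated rows.

    code_col=1 assumes the postal code is the second field of each row,
    which is an assumption about the dump layout.
    """
    regex = re.compile(pattern)
    total = matched = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= code_col:
            continue  # skip rows that have no postal code column
        total += 1
        if regex.fullmatch(fields[code_col]):
            matched += 1
    return matched, total


# Hypothetical usage against a local copy of the dump:
# with open("GB_full.txt", encoding="utf-8") as fh:
#     matched, total = match_rate(fh, some_pattern)
#     print(f"{matched}/{total} = {matched / total:.2%}")
```

For the figures quoted above, 967,863 / 1,696,228 works out to roughly 57.06%.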

According to http://webarchive.nationalarchives.gov.uk/20101126012154/http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx, British postal codes are formatted as one of:

  • AN NAA
  • ANN NAA
  • AAN NAA
  • AANN NAA
  • ANA NAA
  • AANA NAA

And then there are also a bunch of rules around what letters are allowed when they appear at different indexes.
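The six shapes above can be turned into a pattern mechanically by reading A as "any letter" and N as "any digit". The Python sketch below does exactly that and deliberately ignores the per-position letter rules, so it is looser than a real validator; the names `shape_to_regex` and `has_valid_shape` are made up for illustration.

```python
import re

# The six shapes listed above: A = letter, N = digit.
SHAPES = ["AN NAA", "ANN NAA", "AAN NAA", "AANN NAA", "ANA NAA", "AANA NAA"]


def shape_to_regex(shape):
    """Translate a shape such as 'AN NAA' into a regex fragment."""
    parts = {"A": "[A-Z]", "N": "[0-9]", " ": " "}
    return "".join(parts[ch] for ch in shape)


# One alternation covering all six shapes.
SHAPE_PATTERN = re.compile("|".join(f"(?:{shape_to_regex(s)})" for s in SHAPES))


def has_valid_shape(code):
    """True if the whole string has one of the six shapes (letter rules ignored)."""
    return SHAPE_PATTERN.fullmatch(code) is not None


print(has_valid_shape("EC1A 1BB"))  # True  (AANA NAA)
print(has_valid_shape("EC1A"))      # False (outward part only)
print(has_valid_shape("Q1 1AA"))    # True  (shape is fine, letter rules not checked)
```

The last call is the reason the letter restrictions matter: a shape-only check accepts strings that the stricter rules would reject.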

inchcrest's regex follows those rules, and it matches 99.82% of all postal codes. (And the other ones seem to be only partial postal codes, so they should be rejected...)

Alternatively, I wrote this regex from those rules, simply because I found that rule page before finding inchcrest's corrected regex. It should accept the same set of postal codes:

Code:
^(?:GIR\s0AA|[A-PR-UWYZ](?:[A-HK-Y]?\d{1,2}\s|\d[A-HJKSTUW]\s|[A-HK-Y]\d[ABEHMNPRV-Y]\s)\d[ABD-HJLNP-UW-Z]{2})$
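Whether the two patterns agree on a given data set can be checked directly. The Python sketch below lists inputs on which exactly one of them matches; `disagreements` is a made-up helper, and both patterns are copied from the posts above. On arbitrary strings they are not identical: the second pattern is uppercase-only and restricts which letters may appear in each position, so any difference on real data would show up in the returned list.

```python
import re

# First pattern as posted by inchcrest, second as posted by raydog.
INCHCREST = re.compile(
    r"^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
    r"|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$"
)
RAYDOG = re.compile(
    r"^(?:GIR\s0AA|[A-PR-UWYZ](?:[A-HK-Y]?\d{1,2}\s|\d[A-HJKSTUW]\s"
    r"|[A-HK-Y]\d[ABEHMNPRV-Y]\s)\d[ABD-HJLNP-UW-Z]{2})$"
)


def disagreements(codes):
    """Return the codes on which exactly one of the two patterns matches."""
    return [
        code for code in codes
        if (INCHCREST.fullmatch(code) is None) != (RAYDOG.fullmatch(code) is None)
    ]


sample = ["M1 1AA", "EC1A 1BB", "GIR 0AA", "m1 1aa", "Q1 1AA", "M1 1CC", "EC1A"]
print(disagreements(sample))  # ['m1 1aa', 'Q1 1AA', 'M1 1CC']
```

Running the same function over the postal codes in the GeoNames dump would show whether the two patterns really accept the same set there.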
marc



Joined: 08/12/2005 07:39:47
Messages: 4412
Offline

Thanks for the regex.
I thought I had already updated it, but the modified file was not yet deployed. The newest file is now available, with some more changes:
- some more countries with postal code regex
- Kyiv as the capital of Ukraine (instead of Kiev)
- Timor Leste instead of East Timor

Cheers

Marc
