GeoNames Forum
Importing *all* geonames tables to postgresql

Joined: 07/06/2008 13:58:13
Messages: 6

I wrote a shell script that downloads all available files and imports them into PostgreSQL.
I am new to PostgreSQL and think some data types could be matched better. I have also not added any constraints.
Feel free to tell me how I can optimize it.

The script also contains the lines for handling the postal codes. Please do not ask me where to get them. If you are interested in using the postal codes, please read here:
No postal codes in database dump

Thanks to marc and dportabella for their postings:
Import GeoNames dump into Postgres
Importing *all* geonames tables to mysql

Please read it and use it carefully. This script overwrites files and drops tables. It is straightforward and fits my needs best.

 # download all needed files and if needed unzip them
 TXTFILES="admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes_en.txt iso-languagecodes.txt timeZones.txt"
 for i in $ZIPFILES $TXTFILES
 do
 	wget "$i"
 	echo "Done download $i"
 done
 for i in $ZIPFILES
 do
 	unzip -o -qq "$i"
 done
 # rename files because of name conflict
 mv allCountries.txt allGeoCountries.txt
 # download the postal codes. You must know the URL yourself
 #wget http://xxx
 #echo "Done download postal codes (xxx)"
 #unzip -o -qq xxx
 # rename files because of name conflict
 #mv xxx xxx
 #mv xxx xxx
 # alter files for import
 tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp
 grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp
 tail -n +2 timeZones.txt > timeZones.txt.tmp
 psql test <<EOT
 DROP TABLE geoname;
 CREATE TABLE geoname (
 	geonameid int,
 	name varchar(200),
 	asciiname varchar(200),
 	alternatenames varchar(5000),
 	latitude float,
 	longitude float,
 	fclass char(1),
 	fcode varchar(10),
 	country varchar(2),
 	cc2 varchar(60),
 	admin1 varchar(20),
 	admin2 varchar(80),
 	admin3 varchar(20),
 	admin4 varchar(20),
 	population bigint,
 	elevation int,
 	gtopo30 int,
 	timezone varchar(40),
 	moddate date
 );
 copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/allGeoCountries.txt' null as '';
 DROP TABLE alternatename;
 CREATE TABLE alternatename (
 	alternatenameId int,
 	geonameid int,
 	isoLanguage varchar(7),
 	alternateName varchar(200),
 	isPreferredName boolean,
 	isShortName boolean,
 	isColloquial boolean,
 	isHistoric boolean
 );
 copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/alternateNames.txt' null as '';
 DROP TABLE countryinfo;
 CREATE TABLE countryinfo (
 	iso_alpha2 char(2),
 	iso_alpha3 char(3),
 	iso_numeric integer,
 	fips_code character varying(3),
 	name character varying(200),
 	capital character varying(200),
 	areainsqkm double precision,
 	population integer,
 	continent char(2),
 	languages character varying(200),
 	currency char(3),
 	geonameId int
 );
 copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId) from '${WORKPATH}/countryInfo.txt.tmp' null as '';
 DROP TABLE iso_languagecodes;
 CREATE TABLE iso_languagecodes(
 	iso_639_3 CHAR(4),
 	iso_639_2 VARCHAR(50),
 	iso_639_1 VARCHAR(50),
 	language_name VARCHAR(200)
 );
 copy iso_languagecodes (iso_639_3, iso_639_2, iso_639_1, language_name) from '${WORKPATH}/iso-languagecodes.txt.tmp' null as '';
 DROP TABLE admin1Codes;
 CREATE TABLE admin1Codes (
 	code CHAR(20),
 	name TEXT
 );
 copy admin1Codes (code, name) from '${WORKPATH}/admin1Codes.txt' null as '';
 DROP TABLE admin1CodesAscii;
 CREATE TABLE admin1CodesAscii (
 	code CHAR(20),
 	name TEXT,
 	nameAscii TEXT,
 	geonameid int
 );
 copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/admin1CodesASCII.txt' null as '';
 DROP TABLE featureCodes;
 CREATE TABLE featureCodes (
 	code CHAR(7),
 	name VARCHAR(200),
 	description TEXT
 );
 copy featureCodes (code,name,description) from '${WORKPATH}/featureCodes_en.txt' null as '';
 DROP TABLE timeZones;
 CREATE TABLE timeZones (
 	timeZoneId VARCHAR(200),
 	GMT_offset numeric(3,1),
 	DST_offset numeric(3,1)
 );
 copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/timeZones.txt.tmp' null as '';
 DROP TABLE continentCodes;
 CREATE TABLE continentCodes (
 	code CHAR(2),
 	name VARCHAR(20),
 	geonameid INT
 );
 INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
 INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
 INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
 INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
 INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
 INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
 INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
 DROP TABLE postalcodes;
 CREATE TABLE postalcodes (
 	countrycode char(2),
 	postalcode varchar(10),
 	placename varchar(180),
 	admin1name varchar(100),
 	admin1code varchar(20),
 	admin2name varchar(100),
 	admin2code varchar(20),
 	admin3name varchar(100),
 	latitude float,
 	longitude float,
 	accuracy smallint
 );
 -- copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/xxx' null as '';
 EOT


2012-05-09 add isColloquial and isHistoric flags (Marc)
2008-06-07 replace COPY continentCodes with INSERT statements.

Joined: 19/09/2008 12:00:26
Messages: 1

Great job.

I have made some changes in my personal script:

And taken from
 ALTER TABLE ONLY alternatename
      ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
 ALTER TABLE ONLY geoname
      ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
  ALTER TABLE ONLY countryinfo
      ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
 ALTER TABLE ONLY countryinfo
      ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
 ALTER TABLE ONLY alternatename
      ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
 CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
 CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);


Joined: 25/11/2008 12:49:52
Messages: 7


I loved the script so I started working on it. This is the working end result.
  • It will create the working path if it does not exist
  • It will create the db
  • It will check whether you already have the latest files from the net and, if so, will not download them again
  • Added CASCADE to DROP TABLE geoname because of the constraints
  • I do not think you need to change DROP TABLE to DROP TABLE IF EXISTS, it is sort of pointless
  • Perhaps in the future only apply the changes (update/insert/delete) by removing duplicate lines from the txt files?
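
The "latest files" check from the list above can be tried out in isolation. What follows is only a minimal sketch, not the script's actual code: the helper name needs_refresh and the demo file names are hypothetical, and it assumes a shell whose test builtin supports -nt (bash, dash and ksh do).

```shell
#!/bin/sh
# needs_refresh SRC STAMP  (hypothetical helper)
# Succeeds when STAMP does not exist yet or SRC is newer than STAMP.
needs_refresh() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

demo_dir=$(mktemp -d)
src="$demo_dir/timeZones.txt"
stamp="$demo_dir/_timeZones.txt"

printf 'header\nEurope/Zurich\t1.0\t2.0\n' > "$src"
touch -t 200811251249 "$src"      # fixed mtime so the demo is deterministic

# First run: there is no stamp copy yet, so the file gets processed.
if needs_refresh "$src" "$stamp"; then
    cp -p "$src" "$stamp"         # -p keeps the mtime of the source
    first_run="processed"
else
    first_run="skipped"
fi

# Second run: the stamp has the same mtime, so nothing is done.
if needs_refresh "$src" "$stamp"; then
    second_run="processed"
else
    second_run="skipped"
fi

echo "first run: $first_run, second run: $second_run"
rm -rf "$demo_dir"
```

Keeping a copy with the same modification time is what makes the second run a no-op; wget -N only replaces the download when the server copy is newer, so the comparison stays false until then.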
     #          FILE:
     #         USAGE:  ./ 
     #   DESCRIPTION:  run the script so that the geodata will be downloaded and inserted into your
     #		  database
     #       OPTIONS:  ---
     #  REQUIREMENTS:  ---
     #          BUGS:  ---
     #         NOTES:  ---
     #        AUTHOR:  Andreas (aka Harpagophyt )
     #       COMPANY:
     #       VERSION:  1.3
     #       CREATED:  07/06/2008
     #      REVISION:  1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
     #                 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnecessarily
     #                                download.
     #                 1.3 2009-02-18 Fixed countryInfo bug in Aland Islands as there was one tab too many at the end and some field lengths needed to be larger
     FILES=" admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
     createdb -O $DBUSER geonames
     psql -U $DBUSER geonames <<EOT
     DROP TABLE geoname CASCADE;
     CREATE TABLE geoname (
     	geonameid int,
     	name varchar(200),
     	asciiname varchar(200),
     	alternatenames varchar(4000),
     	latitude float,
     	longitude float,
     	fclass char(1),
     	fcode varchar(10),
     	country varchar(2),
     	cc2 varchar(60),
     	admin1 varchar(20),
     	admin2 varchar(80),
     	admin3 varchar(20),
     	admin4 varchar(20),
     	population bigint,
     	elevation int,
     	gtopo30 int,
     	timezone varchar(40),
     	moddate date
     );
     DROP TABLE alternatename;
     CREATE TABLE alternatename (
     	alternatenameId int,
     	geonameid int,
     	isoLanguage varchar(7),
     	alternateName varchar(200),
     	isPreferredName boolean,
     	isShortName boolean,
     	isColloquial boolean,
     	isHistoric boolean
     );
     DROP TABLE countryinfo;
      CREATE TABLE countryinfo (
      	iso_alpha2 char(2),
      	iso_alpha3 char(3),
      	iso_numeric integer,
      	fips_code character varying(3),
      	name character varying(200),
      	capital character varying(200),
      	areainsqkm double precision,
      	population integer,
      	continent char(2),
      	tld character varying(200),
      	currency char(3),
      	curranceName char(15),
      	phone char(20),
      	postal char(60),
      	postalRegex char(250),
      	languages char(130),
      	geonameid integer,
      	neighbours char(50)
      );
     DROP TABLE iso_languagecodes;
     CREATE TABLE iso_languagecodes(
     	iso_639_3 CHAR(4),
     	iso_639_2 VARCHAR(50),
     	iso_639_1 VARCHAR(50),
     	language_name VARCHAR(200)
     );
     DROP TABLE admin1Codes;
     CREATE TABLE admin1Codes (
     	code CHAR(6),
     	name TEXT
     );
     DROP TABLE admin1CodesAscii;
     CREATE TABLE admin1CodesAscii (
     	code CHAR(6),
     	name TEXT,
     	nameAscii TEXT,
     	geonameid int
     );
     DROP TABLE featureCodes;
     CREATE TABLE featureCodes (
     	code CHAR(7),
     	name VARCHAR(200),
     	description TEXT
     );
     DROP TABLE timeZones;
     CREATE TABLE timeZones (
     	timeZoneId VARCHAR(200),
     	GMT_offset numeric(3,1),
     	DST_offset numeric(3,1)
     );
     DROP TABLE continentCodes;
     CREATE TABLE continentCodes (
     	code CHAR(2),
     	name VARCHAR(20),
     	geonameid INT
     );
     DROP TABLE postalcodes;
     CREATE TABLE postalcodes (
     	countrycode char(2),
     	postalcode varchar(10),
     	placename varchar(180),
     	admin1name varchar(100),
     	admin1code varchar(20),
     	admin2name varchar(100),
     	admin2code varchar(20),
     	admin3name varchar(100),
     	latitude float,
     	longitude float,
     	accuracy smallint
     );
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
     ALTER TABLE ONLY geoname
           ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     EOT
     # check if needed directories do already exist
     if [ -d "$WORKPATH" ]; then
     	sleep 0
     else
     	mkdir $WORKPATH
     	mkdir $WORKPATH/$TMPPATH
     	mkdir $WORKPATH/$PCPATH
     	echo "created $WORKPATH"
     fi
     echo ",---- STARTING (downloading, unpacking and preparing)"
     for i in $FILES
     do
     	wget -N -q "$i" # get newer files
     	if  [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
     		cp -p $i $PREFIX$i
     		unzip -u -q $i
     		case "$i" in
     			iso-languagecodes.txt)
     				tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
     				;;
     			countryInfo.txt)
     				grep -v '^#' countryInfo.txt | tail -n +2 | sed 's/                FI/     FI/g'  > countryInfo.txt.tmp;
     				;;
     			timeZones.txt)
     				tail -n +2 timeZones.txt > timeZones.txt.tmp;
     				;;
     		esac
     		echo "| $i has been downloaded";
     	else
     		echo "| $i is already the latest version"
     	fi
     done
     # download the postalcodes. You must know yourself the url
     wget -q -N "http://.../"
     if [ $WORKPATH/$PCPATH/ -nt $WORKPATH/$PCPATH/allCountries$ ] || [ ! -e $WORKPATH/$PCPATH/ ]; then
     	unzip -u -q $WORKPATH/$PCPATH/
     	cp -p $WORKPATH/$PCPATH/ $WORKPATH/$PCPATH/allCountries$
     	echo "| has been downloaded"
     else
     	echo "| is already the latest version"
     fi
     echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
     psql -e -U $DBUSER geonames <<EOT
     copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
     copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
     copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
     copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
     copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
     copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
     copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameId,neighbours) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
     copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
     INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
     INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
     INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
     INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
     INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
     INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
     INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
     CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
     CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
     EOT
     echo "'----- DONE ( have fun... )"
  • bwakkie

    Joined: 25/11/2008 12:49:52
    Messages: 7

    The countryInfo.txt has wrong table field names:
    #ISO	ISO3	ISO-Numeric	fips	Country	Capital	Area(in sq km)	Population	Continent	tld	CurrencyCode	CurrencyName	Phone	Postal Code Format	Postal Code Regex	Languages	geonameid	neighbours	EquivalentFipsCode
     AD	AND	20	AN	Andorra	Andorra la Vella	468	72000	EU	.ad	EUR	Euro	376	AD###	^(?:AD)*(\d{3})$	ca,fr-AD,pt	3041565	ES,FR

    neighbours and EquivalentFipsCode are switched!
    Also, the geonameid does not exist anymore.
    So the new PostgreSQL SQL will be something like this:
     I adjusted the code above, as the lengths of the phone and currency name fields were changed too. There was also a stray extra tab in one record; this is handled now too.
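
To compare the file's header with a COPY column list, it can help to print the header fields with their positions. The following is a rough sketch under two assumptions: that the last comment line of countryInfo.txt is the column header (as in the excerpt above), and that the fields are tab-separated. The helper name list_columns is hypothetical.

```shell
#!/bin/sh
# list_columns FILE  (hypothetical helper)
# Prints "position name" for every tab-separated field of the last
# comment line, which is assumed to be the column header.
list_columns() {
    grep '^#' "$1" | tail -n 1 | sed 's/^#//' | tr '\t' '\n' | awk '{ print NR, $0 }'
}

sample=$(mktemp)
{
    printf '# some leading comment\n'
    printf '#ISO\tISO3\tISO-Numeric\tfips\tCountry\tCapital\tArea(in sq km)\tPopulation\tContinent\ttld\tCurrencyCode\tCurrencyName\tPhone\tPostal Code Format\tPostal Code Regex\tLanguages\tgeonameid\tneighbours\tEquivalentFipsCode\n'
} > "$sample"

columns=$(list_columns "$sample")
printf '%s\n' "$columns"
rm -f "$sample"
```

Run against the real file, the numbered list can be checked line by line against the column list of the copy statement.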

    Joined: 08/12/2005 07:39:47
    Messages: 4416

    geonameId is still there, isn't it?



    Joined: 25/11/2008 12:49:52
    Messages: 7

    bwakkie wrote:

    I loved the script so I started working on it. This is the working end result.
  • It will create the working path if it does not exist
  • It will create the db
  • It will check whether you already have the latest files from the net and, if so, will not download them again
  • Added CASCADE to DROP TABLE geoname because of the constraints
  • I do not think you need to change DROP TABLE to DROP TABLE IF EXISTS, it is sort of pointless
  • Perhaps in the future only apply the changes (update/insert/delete) by removing duplicate lines from the txt files?
     #          FILE:
     #         USAGE:  ./ 
     #   DESCRIPTION:  run the script so that the geodata will be downloaded and inserted into your
     #		  database
     #       OPTIONS:  ---
     #  REQUIREMENTS:  ---
     #          BUGS:  ---
     #         NOTES:  ---
     #        AUTHOR:  Andreas (aka Harpagophyt )
     #       COMPANY:
     #       VERSION:  1.3
     #       CREATED:  07/06/2008
     #      REVISION:  1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
     #                 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnecessarily
     #                                download.
     FILES=" admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
     createdb -O $DBUSER geonames
     psql -U $DBUSER geonames <<EOT
     DROP TABLE geoname CASCADE;
     CREATE TABLE geoname (
     	geonameid int,
     	name varchar(200),
     	asciiname varchar(200),
     	alternatenames varchar(4000),
     	latitude float,
     	longitude float,
     	fclass char(1),
     	fcode varchar(10),
     	country varchar(2),
     	cc2 varchar(60),
     	admin1 varchar(20),
     	admin2 varchar(80),
     	admin3 varchar(20),
     	admin4 varchar(20),
     	population bigint,
     	elevation int,
     	gtopo30 int,
     	timezone varchar(40),
     	moddate date
     );
     DROP TABLE alternatename;
     CREATE TABLE alternatename (
     	alternatenameId int,
     	geonameid int,
     	isoLanguage varchar(7),
     	alternateName varchar(200),
     	isPreferredName boolean,
     	isShortName boolean,
     	isColloquial boolean,
     	isHistoric boolean
     );
     DROP TABLE countryinfo;
     CREATE TABLE "countryinfo" (
            iso_alpha2 char(2),
            iso_alpha3 char(3),
            iso_numeric integer,
            fips_code character varying(3),
            country character varying(200),
            capital character varying(200),
            areainsqkm double precision,
            population integer,
            continent char(2),
            tld CHAR(10),
            currency_code char(3),
            currency_name CHAR(15),
            phone character varying(20),
            postal character varying(60),
            postalRegex character varying(200),
            languages character varying(200),
            geonameId int,
            neighbours character varying(50),
            equivalent_fips_code character varying(3)
     );
     DROP TABLE iso_languagecodes;
     CREATE TABLE iso_languagecodes(
     	iso_639_3 CHAR(4),
     	iso_639_2 VARCHAR(50),
     	iso_639_1 VARCHAR(50),
     	language_name VARCHAR(200)
     );
     DROP TABLE admin1Codes;
     CREATE TABLE admin1Codes (
     	code CHAR(6),
     	name TEXT
     );
     DROP TABLE admin1CodesAscii;
     CREATE TABLE admin1CodesAscii (
     	code CHAR(6),
     	name TEXT,
     	nameAscii TEXT,
     	geonameid int
     );
     DROP TABLE featureCodes;
     CREATE TABLE featureCodes (
     	code CHAR(7),
     	name VARCHAR(200),
     	description TEXT
     );
     DROP TABLE timeZones;
     CREATE TABLE timeZones (
     	timeZoneId VARCHAR(200),
     	GMT_offset numeric(3,1),
     	DST_offset numeric(3,1)
     );
     DROP TABLE continentCodes;
     CREATE TABLE continentCodes (
     	code CHAR(2),
     	name VARCHAR(20),
     	geonameid INT
     );
     DROP TABLE postalcodes;
     CREATE TABLE postalcodes (
     	countrycode char(2),
     	postalcode varchar(10),
     	placename varchar(180),
     	admin1name varchar(100),
     	admin1code varchar(20),
     	admin2name varchar(100),
     	admin2code varchar(20),
     	admin3name varchar(100),
     	latitude float,
     	longitude float,
     	accuracy smallint
     );
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
     ALTER TABLE ONLY geoname
           ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     EOT
     # check if needed directories do already exist
     if [ -d "$WORKPATH" ]; then
     	sleep 0
     else
     	mkdir $WORKPATH
     	mkdir $WORKPATH/$TMPPATH
     	mkdir $WORKPATH/$PCPATH
     	echo "created $WORKPATH"
     fi
     echo ",---- STARTING (downloading, unpacking and preparing)"
     for i in $FILES
     do
     	wget -N -q "$i" # get newer files
     	if  [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
     		cp -p $i $PREFIX$i
     		unzip -u -q $i
     		case "$i" in
     			iso-languagecodes.txt)
     				tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
     				;;
     			countryInfo.txt)
     				grep -v '^#' countryInfo.txt | tail -n +2  > countryInfo.txt.tmp;
     				;;
     			timeZones.txt)
     				tail -n +2 timeZones.txt > timeZones.txt.tmp;
     				;;
     		esac
     		echo "| $i has been downloaded";
     	else
     		echo "| $i is already the latest version"
     	fi
     done
     # download the postalcodes. You must know yourself the url
     wget -q -N "http://.../"
     if [ $WORKPATH/$PCPATH/ -nt $WORKPATH/$PCPATH/allCountries$ ] || [ ! -e $WORKPATH/$PCPATH/ ]; then
     	unzip -u -q $WORKPATH/$PCPATH/
     	cp -p $WORKPATH/$PCPATH/ $WORKPATH/$PCPATH/allCountries$
     	echo "| has been downloaded"
     else
     	echo "| is already the latest version"
     fi
     echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
     psql -e -U $DBUSER geonames <<EOT
     copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
     copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
     copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
     copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
     copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
     copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
     copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
     copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
     INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
     INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
     INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
     INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
     INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
     INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
     INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
     CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
     CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
     EOT
     echo "'----- DONE ( have fun... )"
  • hellboy

    Joined: 18/05/2010 12:13:56
    Messages: 9

     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '/home/..../allCountries.txt' WITH DELIMITER e'\t' null as '';

    ERROR: extra data after last expected column
    CONTEXT: COPY postalcodes, line 1: "AD AD100 Canillo 42.5833 1.6667 6"

    What's wrong?
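
"extra data after last expected column" usually means a line holds more tab-separated fields than the copy statement names columns (11 here). A quick way to check is to count the fields per line. The sketch below uses a hypothetical helper count_fields, and the sample line is an assumption: the line from the error message with six empty fields written out between the place name and the latitude. A later reply in this thread points to a missing admin3code column.

```shell
#!/bin/sh
# count_fields FILE  (hypothetical helper)
# Prints "<fields-per-line> <number-of-lines>" for each distinct field count.
count_fields() {
    awk -F '\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

sample=$(mktemp)
# Assumed layout of the failing line, empty fields included.
printf 'AD\tAD100\tCanillo\t\t\t\t\t\t\t42.5833\t1.6667\t6\n' > "$sample"

field_report=$(count_fields "$sample")
echo "$field_report"
rm -f "$sample"
```

If the real file reports 12 fields per line, the table and the copy column list need 12 columns as well.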

    Joined: 30/12/2010 07:18:11
    Messages: 2

    Note that int is no longer a good choice for population, as Asia (6255147) has a population of 3,812,366,000 (as of the date of this post). The maximum value of an integer in PostgreSQL is 2,147,483,647. This causes an error during an allCountries.txt import.

    I'm sure this has been mentioned elsewhere, but thought it was worth noting here, where many PostgreSQL folks will land when searching for this solution.
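
Whether a numeric column still fits into a 4-byte integer can be checked before the import. This is only a sketch: the helper rows_over_int_max is hypothetical, and the sample rows are shortened to three fields (geonameid, name, population) instead of the real file layout.

```shell
#!/bin/sh
PG_INT_MAX=2147483647   # largest value of a PostgreSQL 4-byte integer

# rows_over_int_max FILE COLUMN  (hypothetical helper)
# Prints the lines whose tab-separated COLUMN exceeds PG_INT_MAX.
rows_over_int_max() {
    awk -F '\t' -v col="$2" -v max="$PG_INT_MAX" '($col + 0) > max' "$1"
}

sample=$(mktemp)
# Two shortened rows: geonameid, name, population (not the real file layout).
printf '6255147\tAsia\t3812366000\n'  > "$sample"
printf '3041565\tAndorra\t72000\n'   >> "$sample"

too_big=$(rows_over_int_max "$sample" 3 | cut -f 2)
echo "needs bigint: $too_big"
rm -f "$sample"
```

For the real allCountries.txt the population is the 15th column, going by the column list of the copy statements in this thread.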

    BTW, thanks, Marc, for all your hard work.

    Joined: 08/12/2005 07:39:47
    Messages: 4416

    Thanks, I have updated the postings in this thread to use bigint for the geonames population column.


    Joined: 19/01/2009 21:58:06
    Messages: 9

    Hi everybody!

    I've just tried the script posted at "26/11/2008 10:31:01". I'm afraid it still has a lot of "issues":

    - The subdirectories "tmp" and "pc" won't be created if the working base directory already exists.

    - First copy line (geonames) yields an error
     ERROR:  value too long for type character varying(4000)
     CONTEXT:  COPY geoname, line 3185, column alternatenames:...

    I got rid of this by changing the corresponding field size from 4000 to 6000.

    - There's no file "featureCodes.txt" on the server, so the corresponding download, unzipping and copying into the db yield errors. (This might have to do with this, though.)

    - The copy line for admin1CodesASCII yields an error
     ERROR:  value too long for type character(6)
     CONTEXT:  COPY admin1codesascii, line 502, column code: "CG.7280295"

    I solved this by increasing the size of the field to 10.

    - It's the same with "admin1Code".

    - The copy line for countryinfo yields an error
     ERROR:  column "country" of relation "countryinfo" does not exist

    I solved this by removing the "country" item from the copy command.

    - The copy line for countryinfo also yields errors for wrong column names. Solution: rename "currency" to "currency_code" and "currencyname" to "currency_name".

    - The copy line for countryinfo also yields an error for wrong column counts:
     ERROR:  extra data after last expected column
     CONTEXT:  COPY countryinfo, line 1:...

    I've no idea on how to fix this so far.

    - The copy line for alternateNames yields an error:
     ERROR:  value too long for type character varying(200)
     CONTEXT:  COPY alternatename, line 1986888, column alternatename:

    Solution: set the length of this field to 300

    - It might be a good idea to also include the generation of spatial indexes in the script.
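
Instead of raising the field sizes one error at a time, the longest value of a column can be measured up front. The following is a sketch with a hypothetical helper max_field_length; the sample rows are placeholders. Note that some awk implementations count bytes rather than characters, while varchar(n) counts characters, so for non-ASCII columns the result is an upper bound at best.

```shell
#!/bin/sh
# max_field_length FILE COLUMN  (hypothetical helper)
# Prints the length of the longest value in the tab-separated COLUMN.
max_field_length() {
    awk -F '\t' -v col="$2" '
        { len = length($col); if (len > max) max = len }
        END { print max + 0 }
    ' "$1"
}

sample=$(mktemp)
# Shortened admin1CodesASCII-style rows; the names are placeholders.
printf 'AD.02\tCanillo\n'             > "$sample"
printf 'CG.7280295\tplaceholder\n'   >> "$sample"

longest=$(max_field_length "$sample" 1)
echo "longest code: $longest characters"
rm -f "$sample"
```

Using TEXT for such columns would avoid the length limit altogether, at the cost of losing the sanity check.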

    Kind regards,


    Joined: 17/02/2010 20:06:53
    Messages: 7

    In the postalcodes area, you forgot "admin1code varchar(20)" ...

    Once you add that, it works like a charm.

    Joined: 21/02/2011 17:13:13
    Messages: 3

    There are 2 errors in "countryInfo.txt": in the last two records the field "geonameid" equals "0":
    CS SCG 891 YI Serbia and Montenegro Belgrade 102350 10829175 EU .cs RSD Dinar 381 ##### ^(\d{5})$ cu,hu,sq,sr 0 AL,HU,MK,RO,HR,BA,BG

    AN ANT 530 NT Netherlands Antilles Willemstad 960 136197 NA .an ANG Guilder 599 nl-AN,en,es 0 GP


    Joined: 08/12/2005 07:39:47
    Messages: 4416

    These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.



    Joined: 21/02/2011 17:13:13
    Messages: 3

    marc wrote:
    These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.


    That's correct, but there is no "geoname" record with "geonameid" = 0. Because of the constraint "ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);" we get an error when the "countryinfo" table is filled.
    There are 2 solutions:
    1. Add an artificial record with "geonameid" = 0 to the "geoname" table.
    2. Remove the 2 obsolete records from the "countryinfo" dump.
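
The second solution can be automated with a small filter. This is a sketch, assuming the geonameid is the 17th tab-separated field (as in the header quoted earlier in this thread); the helpers tsv and drop_obsolete_countries are hypothetical, and the two sample rows are rebuilt from the lines quoted in this thread with the empty fields written out.

```shell
#!/bin/sh
# tsv FIELD...  (hypothetical helper): prints its arguments as one tab-separated line.
tsv() {
    printf '%s' "$1"
    shift
    for field in "$@"; do
        printf '\t%s' "$field"
    done
    printf '\n'
}

# drop_obsolete_countries FILE  (hypothetical helper)
# Keeps only the lines whose 17th field (geonameid) is not 0.
drop_obsolete_countries() {
    awk -F '\t' '$17 != "0"' "$1"
}

sample=$(mktemp)
tsv AD AND 20 AN Andorra 'Andorra la Vella' 468 72000 EU .ad EUR Euro 376 \
    'AD###' '^(?:AD)*(\d{3})$' 'ca,fr-AD,pt' 3041565 'ES,FR' > "$sample"
tsv AN ANT 530 NT 'Netherlands Antilles' Willemstad 960 136197 NA .an ANG Guilder 599 \
    '' '' 'nl-AN,en,es' 0 GP >> "$sample"

kept=$(drop_obsolete_countries "$sample" | cut -f 1)
echo "kept: $kept"
rm -f "$sample"
```

In the script, such a filter would sit in the pipeline that writes countryInfo.txt.tmp, so the obsolete rows never reach the copy statement.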


    Joined: 21/02/2011 17:13:13
    Messages: 3

    Here is the corrected script. Please adjust the script variables before running it.
    1. WORKPATH="${HOME}/tmp/GIS/gisnames/geodata" - working directory
    2. TMPPATH="tmp" - subdirectory for the downloaded data files
    3. PCPATH="pc" - subdirectory for the downloaded postal codes data file
    4. PREFIX="_" - do not change
    5. DBHOST="" - DB server host
    6. DBPORT="5433" - DB server port
    7. DBUSER="postgres" - DB user name (in our case postgres)

    Run the script. It will ask for the postgres user's password 3 times:

    1. Creating the DB
    2. Creating the geonames DB tables
    3. Filling the geonames DB tables
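
Assuming the three prompts are password prompts, they can be avoided with a PostgreSQL password file. The sketch below only builds such a file; the helper pgpass_line is hypothetical, and localhost and secret are placeholder values, not taken from the script.

```shell
#!/bin/sh
# pgpass_line HOST PORT DATABASE USER PASSWORD  (hypothetical helper)
# Prints one entry in the host:port:database:user:password format.
# Values containing ":" or "\" would need escaping, which this sketch skips.
pgpass_line() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

pgpass_file=$(mktemp)
chmod 600 "$pgpass_file"     # a password file readable by group or others is ignored
pgpass_line localhost 5433 '*' postgres 'secret' > "$pgpass_file"   # placeholder values

entry=$(cat "$pgpass_file")
echo "$entry"
rm -f "$pgpass_file"         # demo cleanup only
```

In real use the file would be kept (for example as ~/.pgpass, or at the path named in the PGPASSFILE environment variable) so that every psql call in the script finds it. The * in the database field matches both the initial connection and the geonames database.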

    After creating the DB tables, the script fetches the data files from the GeoNames server. There is one important step: check the content of the countryInfo.txt.tmp file in the $WORKPATH/$TMPPATH directory. Currently the last 2 records in the file are the obsolete countries Serbia and Montenegro and Netherlands Antilles, with geonameid=0. They need to be removed.

    The script runs for an hour or a little longer.

    # FILE:
    # USAGE: ./
    # DESCRIPTION: run the script so that the geodata will be downloaded and inserted into your
    # database
    # OPTIONS: ---
    # BUGS: ---
    # NOTES: ---
    # AUTHOR: Andreas (aka Harpagophyt )
    # COMPANY:
    # VERSION: 1.3
    # CREATED: 07/06/2008
    # REVISION: 1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
    # 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnecessarily
    # download.

    FILES=" admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes_en.txt iso-languagecodes.txt timeZones.txt"
    psql -U $DBUSER -h $DBHOST -p $DBPORT -c "CREATE DATABASE geonames WITH TEMPLATE = template0 ENCODING = 'UTF8';"
    psql -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
    CREATE TABLE geoname (
    geonameid int,
    name varchar(200),
    asciiname varchar(200),
    alternatenames varchar(6000),
    latitude float,
    longitude float,
    fclass char(1),
    fcode varchar(10),
    country varchar(2),
    cc2 varchar(60),
    admin1 varchar(20),
    admin2 varchar(80),
    admin3 varchar(20),
    admin4 varchar(20),
    population bigint,
    elevation int,
    gtopo30 int,
    timezone varchar(40),
    moddate date
    );

    DROP TABLE alternatename;
    CREATE TABLE alternatename (
    alternatenameId int,
    geonameid int,
    isoLanguage varchar(7),
    alternateName varchar(300),
    isPreferredName boolean,
    isShortName boolean,
    isColloquial boolean,
    isHistoric boolean
    );

    DROP TABLE countryinfo;
    CREATE TABLE "countryinfo" (
    iso_alpha2 char(2),
    iso_alpha3 char(3),
    iso_numeric integer,
    fips_code character varying(3),
    country character varying(200),
    capital character varying(200),
    areainsqkm double precision,
    population integer,
    continent char(2),
    tld CHAR(10),
    currency_code char(3),
    currency_name CHAR(15),
    phone character varying(20),
    postal character varying(60),
    postalRegex character varying(200),
    languages character varying(200),
    geonameId int,
    neighbours character varying(50),
    equivalent_fips_code character varying(3)
    );

    DROP TABLE iso_languagecodes;
    CREATE TABLE iso_languagecodes(
    iso_639_3 CHAR(4),
    iso_639_2 VARCHAR(50),
    iso_639_1 VARCHAR(50),
    language_name VARCHAR(200)
    );

    DROP TABLE admin1Codes;
    CREATE TABLE admin1Codes (
    code CHAR(20),
    name TEXT
    );

    DROP TABLE admin1CodesAscii;
    CREATE TABLE admin1CodesAscii (
    code CHAR(20),
    name TEXT,
    nameAscii TEXT,
    geonameid int
    );

    DROP TABLE admin2CodesAscii;
    CREATE TABLE admin2CodesAscii (
    code CHAR(80),
    name TEXT,
    nameAscii TEXT,
    geonameid int
    );

    DROP TABLE featureCodes;
    CREATE TABLE featureCodes (
    code CHAR(7),
    name VARCHAR(200),
    description TEXT
    );

    DROP TABLE timeZones;
    CREATE TABLE timeZones (
    timeZoneId VARCHAR(200),
    GMT_offset numeric(3,1),
    DST_offset numeric(3,1)
    );

    DROP TABLE continentCodes;
    CREATE TABLE continentCodes (
    code CHAR(2),
    name VARCHAR(20),
    geonameid INT
    );

    DROP TABLE postalcodes;
    CREATE TABLE postalcodes (
    countrycode char(2),
    postalcode varchar(10),
    placename varchar(180),
    admin1name varchar(100),
    admin1code varchar(20),
    admin2name varchar(100),
    admin2code varchar(20),
    admin3name varchar(100),
    admin3code varchar(20),
    latitude float,
    longitude float,
    accuracy smallint
    );

    ALTER TABLE ONLY alternatename
    ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
    ALTER TABLE ONLY geoname
    ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
    ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
    ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
    ALTER TABLE ONLY alternatename
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
    EOT

    # check whether the needed directories already exist
    if [ -d "$WORKPATH" ]; then
        echo "$WORKPATH exists..."
        sleep 0
    else
        echo "$WORKPATH and subdirectories will be created..."
        mkdir -p $WORKPATH
        mkdir -p $WORKPATH/$TMPPATH
        mkdir -p $WORKPATH/$PCPATH
        echo "created $WORKPATH"
    fi
    echo ",---- STARTING (downloading, unpacking and preparing)"
    for i in $FILES
    do
        wget -N -q "$i" # get newer files
        if [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
            cp -p $i $PREFIX$i
            unzip -u -q $i
            # strip header lines from the files that have them
            case "$i" in
                iso-languagecodes.txt)
                    tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
                    ;;
                countryInfo.txt)
                    grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;
                    ;;
                timeZones.txt)
                    tail -n +2 timeZones.txt > timeZones.txt.tmp;
                    ;;
            esac
            echo "| $i has been downloaded";
        else
            echo "| $i is already the latest version"
        fi
    done

    # download the postal codes. You have to know the URL yourself
    wget -q -N ""
    if [ $WORKPATH/$PCPATH/ -nt $WORKPATH/$PCPATH/allCountries$ ] || [ ! -e $WORKPATH/$PCPATH/ ]; then
        echo "Attempt to unzip $WORKPATH/$PCPATH/ file..."
        unzip -u -q $WORKPATH/$PCPATH/
        cp -p $WORKPATH/$PCPATH/ $WORKPATH/$PCPATH/allCountries$
        echo "| has been downloaded"
    else
        echo "| is already the latest version"
    fi

    echo "+---- FILL DATABASE ( this takes 2 days on my machine )"

    psql -e -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
    copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
    copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,admin3code,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
    copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
    copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes_en.txt' null as '';
    copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
    copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
    copy admin2CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin2Codes.txt' null as '';
    copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
    copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
    copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
    INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
    INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
    INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
    INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
    INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255151);
    INSERT INTO continentCodes VALUES ('SA', 'South America', 6255150);
    INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
    CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
    CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
    EOT
    echo "'----- DONE ( have fun... )"

    Joined: 26/03/2011 15:24:41
    Messages: 6

    Thanks for this great work.
    How and where do I run this script?
    Any other information is welcome.

    Joined: 26/03/2011 15:24:41
    Messages: 6

    Hi everybody,
    can I run this script on Windows, and if so, how?

    Please help.

    Joined: 31/03/2011 16:24:03
    Messages: 3


    It seems that there is an error in your script:
    > grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;

    should be changed to:

    > grep -v '^#' countryInfo.txt | head -n -2 > countryInfo.txt.tmp;

    "tail -n +2" removes the first line of the file. That is fine for timeZones.txt and iso-languagecodes.txt, because their first line is the table header, which has to be removed before importing into the database.

    For countryInfo.txt, however, the header is already removed by "grep -v '^#' countryInfo.txt", which skips every line starting with "#", and the table header also starts with "#". So "tail -n +2" removes the first data line, which should be kept. At the same time, the last 2 lines of countryInfo.txt should be removed (they have geonameId=0, as explained above); this can be done with "head -n -2".
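The difference between the two pipelines can be checked on a small stand-in file. This is only a sketch: the six-line sample and its file names are made up and do not reproduce the real countryInfo.txt layout, and a negative count for "head -n" is assumed to be available (it is a GNU coreutils extension).

```shell
#!/bin/sh
# Hypothetical miniature of countryInfo.txt: a comment line, a header that
# also starts with '#', two real rows and two trailing rows to be dropped.
SAMPLE=countryInfo.sample.txt
{
    printf '# comment line\n'
    printf '#ISO\tCountry\n'
    printf 'AD\tAndorra\n'
    printf 'AE\tUnited Arab Emirates\n'
    printf 'XX\ttrailing row 1\n'
    printf 'YY\ttrailing row 2\n'
} > "$SAMPLE"

# Pipeline from the script: grep already drops the header, so tail drops AD.
grep -v '^#' "$SAMPLE" | tail -n +2 > "$SAMPLE.tail"

# Suggested pipeline: keep the first data row, drop the last two rows.
grep -v '^#' "$SAMPLE" | head -n -2 > "$SAMPLE.head"

echo "tail -n +2 keeps: $(cut -f1 "$SAMPLE.tail" | paste -sd, -)"   # AE,XX,YY
echo "head -n -2 keeps: $(cut -f1 "$SAMPLE.head" | paste -sd, -)"   # AD,AE
```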

    I would also recommend merging the database creation and the data import into a single psql session (called after the data files have been downloaded); this reduces the number of password prompts.
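One way to get a single session is to write the schema and the COPY statements into one SQL file and hand it to psql with -f. The sketch below shows this for the timeZones table only. DBUSER, DBHOST, DBPORT, WORKPATH and TMPPATH mirror the script's variables; their default values and the file name geonames_import.sql are placeholders, and the geonames database is assumed to exist already.

```shell
#!/bin/sh
# Sketch: put schema and COPY statements into one file so psql starts once.
# All defaults below are placeholders, not values from the original script.
DBUSER=${DBUSER:-postgres}
DBHOST=${DBHOST:-localhost}
DBPORT=${DBPORT:-5432}
WORKPATH=${WORKPATH:-/tmp/geonames}
TMPPATH=${TMPPATH:-tmp}
SQLFILE=geonames_import.sql

# Unquoted heredoc: the shell expands ${WORKPATH} and ${TMPPATH} while writing.
cat > "$SQLFILE" <<EOT
CREATE TABLE timeZones (
    timeZoneId VARCHAR(200),
    GMT_offset numeric(3,1),
    DST_offset numeric(3,1)
);
copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
EOT

# One connection for everything; run this once the data files are in place.
echo "wrote $SQLFILE"
echo "next: psql -U $DBUSER -h $DBHOST -p $DBPORT -f $SQLFILE geonames"
```

The remaining prompt can also be avoided with a ~/.pgpass file or the PGPASSWORD environment variable, both of which psql reads.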

    Joined: 31/03/2011 16:24:03
    Messages: 3

    Some figures for reference:
    - the downloaded files take 1.7 GB on disk
    - the database is about 2 GB after importing the downloaded data

    Joined: 31/03/2011 16:24:03
    Messages: 3

    I also had to increase the alternatenames column size:
    alternatenames varchar(12000),

    "alternatenames varchar(6000)" seemed to work on one of my postgres instances, but for some reason produced "ERROR: value too long for type character varying(6000)" on another.
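Before picking a new size, the longest value actually present in the dump can be measured. The sketch below does that with awk on the fourth tab-separated column. The file geoname.sample.txt is a made-up three-line stand-in; point awk at the real allCountries.txt instead. Depending on the awk implementation and locale, length() may count bytes rather than characters, while the varchar limit is in characters, so treat the result as an upper bound.

```shell
#!/bin/sh
# Hypothetical sample with the first four geoname columns only.
SAMPLE=geoname.sample.txt
{
    printf '1\tA\tA\talt1,alt2\n'
    printf '2\tB\tB\t\n'
    printf '3\tC\tC\tone,two,three,four\n'
} > "$SAMPLE"

# Track the longest 4th column (alternatenames) and the row it belongs to.
awk -F '\t' '
    length($4) > max { max = length($4); id = $1 }
    END { printf "longest alternatenames: %d (geonameid %s)\n", max, id }
' "$SAMPLE"
```

In PostgreSQL, declaring the column as text instead of varchar(n) removes the length limit altogether.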