GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Importing *all* geonames tables to postgresql  XML
Forum Index -> FAQ - frequently asked questions Go to Page: 1, 2 Next 
Author Message
Harpagophyt



Joined: 07/06/2008 13:58:13
Messages: 6
Offline

I wrote a shell script which would download all available files and import it into postgresql.
I am a newbie to postgresql and think some datatypes could be match better. Also I have not add any constraints.
Feel free to tell me how i can optimize it.

Also there are the lines for handle the postalcodes. Please do not ask me, where they are. If you interested to use the postalcodes please read here:
No postal codes in database dump

Thanks to marc and dportabella for there postings:
Import GeoNames dump into Postgres
Importing *all* geonames tables to mysql

Please read it and use it carefully. This script overwrite files and drop tables. It works straight and fits best for my needs.

Code:
 #!/bin/bash
 
 WORKPATH="/data/geodata"
 
 cd $WORKPATH
 
 # download all needed files and if needed unzip them
 ZIPFILES="allCountries.zip alternateNames.zip userTags.zip"
 TXTFILES="admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
 for i in $ZIPFILES $TXTFILES
 do
 	wget <a href="http://download.geonames.org/export/dump/$i" target="_blank" rel="nofollow">http://download.geonames.org/export/dump/$i</a>
 	echo "Done download $i"
 done
 for i in $ZIPFILES
 do
 	unzip -o -qq $i
 done
 
 # rename files because of name conflict
 mv allCountries.zip allGeoCountries.zip
 mv allCountries.txt allGeoCountries.txt
 
 # download the postalcodes. You must know yourself the url
 #wget <a href="http://xxx" target="_blank" rel="nofollow">http://xxx</a>
 #echo "Done download postal codes (xxx)"
 #unzip -o -qq xxx
 
 # rename files because of name conflict
 mv xxx xxx
 mv xxx xxx
 
 # alter files for import
 tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp
 grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp
 tail -n +2 timeZones.txt > timeZones.txt.tmp
 
 psql test <<EOT
 DROP TABLE geoname;
 CREATE TABLE geoname (
 	geonameid int,
 	name varchar(200),
 	asciiname varchar(200),
 	alternatenames varchar(5000),
 	latitude float,
 	longitude float,
 	fclass char(1),
 	fcode varchar(10),
 	country varchar(2),
 	cc2 varchar(60),
 	admin1 varchar(20),
 	admin2 varchar(80),
 	admin3 varchar(20),
 	admin4 varchar(20),
 	population bigint,
 	elevation int,
 	gtopo30 int,
 	timezone varchar(40),
 	moddate date
 );
 copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/allGeoCountries.txt' null as '';
 
 DROP TABLE alternatename;
 CREATE TABLE alternatename (
 	alternatenameId int,
 	geonameid int,
 	isoLanguage varchar(7),
 	alternateName varchar(200),
 	isPreferredName boolean,
 	isShortName boolean,
 	isColloquial boolean,
 	isHistoric boolean
 );
 copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/alternateNames.txt' null as '';
 
 DROP TABLE countryinfo;
 CREATE TABLE countryinfo (
 	iso_alpha2 char(2),
 	iso_alpha3 char(3),
 	iso_numeric integer,
 	fips_code character varying(3),
 	name character varying(200),
 	capital character varying(200),
 	areainsqkm double precision,
 	population integer,
 	continent char(2),
 	languages character varying(200),
 	currency char(3),
 	geonameId int
 );
 copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId) from '${WORKPATH}/countryInfo.txt.tmp' null as '';
 
 DROP TABLE iso_languagecodes;
 CREATE TABLE iso_languagecodes(
 	iso_639_3 CHAR(4),
 	iso_639_2 VARCHAR(50),
 	iso_639_1 VARCHAR(50),
 	language_name VARCHAR(200)
 );
 copy iso_languagecodes (iso_639_3, iso_639_2, iso_639_1, language_name) from '${WORKPATH}/iso-languagecodes.txt.tmp' null as '';
 
 DROP TABLE admin1Codes;
 CREATE TABLE admin1Codes (
 	code CHAR(20),
 	name TEXT
 );
 copy admin1Codes (code, name) from '${WORKPATH}/admin1Codes.txt' null as '';
 
 DROP TABLE admin1CodesAscii;
 CREATE TABLE admin1CodesAscii (
 	code CHAR(20),
 	name TEXT,
 	nameAscii TEXT,
 	geonameid int
 );
 copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/admin1CodesASCII.txt' null as '';
 
 DROP TABLE featureCodes;
 CREATE TABLE featureCodes (
 	code CHAR(7),
 	name VARCHAR(200),
 	description TEXT
 );
 copy featureCodes (code,name,description) from '${WORKPATH}/featureCodes_en.txt' null as '';
 
 DROP TABLE timeZones;
 CREATE TABLE timeZones (
 	timeZoneId VARCHAR(200),
 	GMT_offset numeric(3,1),
 	DST_offset numeric(3,1)
 );
 copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/timeZones.txt.tmp' null as '';
 
 DROP TABLE continentCodes;
 CREATE TABLE continentCodes (
 	code CHAR(2),
 	name VARCHAR(20),
 	geonameid INT
 );
 INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
 INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
 INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
 INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
 INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
 INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
 INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
 
 
 DROP TABLE postalcodes;
 CREATE TABLE postalcodes (
 	countrycode char(2),
 	postalcode varchar(10),
 	placename varchar(180),
 	admin1name varchar(100),
 	admin1code varchar(20),
 	admin2name varchar(100),
 	admin2code varchar(20),
 	admin3name varchar(100),
 	latitude float,
 	longitude float,
 	accuracy smallint
 );
 #copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/xxx' null as '';
 EOT
 


Greetings
Andreas

Changes:
2012-05-09 add isCollloquial and isHistoric flag (Marc)
2008-06-07 replace COPY continentCodes through INSERT statements.
lasizoillo



Joined: 19/09/2008 12:00:26
Messages: 1
Offline

Great job.

I have made some changes in my personal script:
Code:
 :%s/DROP TABLE/DROP TABLE IF EXIST/
 


And taken from http://forum.geonames.org/gforum/posts/list/67.page
Code:
 ALTER TABLE ONLY alternatename
      ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
  
  ALTER TABLE ONLY geoname
      ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
  
  ALTER TABLE ONLY countryinfo
      ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
 
 ALTER TABLE ONLY countryinfo
      ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
 
 ALTER TABLE ONLY alternatename
      ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
 
 CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
  
 CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
 


bwakkie



Joined: 25/11/2008 12:49:52
Messages: 7
Offline

Hi,

I loved the script so I started working on it. This is the working end result.
done:
  • It will create a working path if it is not there
  • It will create the db
  • It wil check if you are already using the latest files from the net and therefore not downloading it again
  • Added the command CASCADE to DROP TABLE geoname due to the constrains
  • I do not thing you need to change DROP TABLE to DROP TABLE IF EXIST, it is sort of pointless
    todo:
  • perhaps only update/insert/delete changes in the future by removing duplicate lines from the txt files?
    Code:
     #!/bin/bash
     #===============================================================================
     #
     #          FILE:  getgeo.sh
     # 
     #         USAGE:  ./getgeo.sh 
     # 
     #   DESCRIPTION:  run the script so that the geodata will be downloaded and inserted into your
     #		  database
     # 
     #       OPTIONS:  ---
     #  REQUIREMENTS:  ---
     #          BUGS:  ---
     #         NOTES:  ---
     #        AUTHOR:  Andreas (aka Harpagophyt )
     #       COMPANY:  <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
     #       VERSION:  1.3
     #       CREATED:  07/06/2008
     #      REVISION:  1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
     #                 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily 
     #                 1.3 2009-02-18 Fixed countryInfo bug in Aland Islands as there was one tab too many at te end and sone field lengths needed to be larger
     #                                download.
     #===============================================================================
     #!/bin/bash
     
     WORKPATH="~/geodata"
     TMPPATH="tmp"
     PCPATH="pc"
     PREFIX="_"
     DBUSER="$USER"
     FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
     createdb -O $DBUSER geonames
     psql -U $DBUSER geonames <<EOT
     DROP TABLE geoname CASCADE;
     CREATE TABLE geoname (
     	geonameid int,
     	name varchar(200),
     	asciiname varchar(200),
     	alternatenames varchar(4000),
     	latitude float,
     	longitude float,
     	fclass char(1),
     	fcode varchar(10),
     	country varchar(2),
     	cc2 varchar(60),
     	admin1 varchar(20),
     	admin2 varchar(80),
     	admin3 varchar(20),
     	admin4 varchar(20),
     	population bigint,
     	elevation int,
     	gtopo30 int,
     	timezone varchar(40),
     	moddate date
     );
     
     DROP TABLE alternatename;
     CREATE TABLE alternatename (
     	alternatenameId int,
     	geonameid int,
     	isoLanguage varchar(7),
     	alternateName varchar(200),
     	isPreferredName boolean,
     	isShortName boolean
     	isColloquial boolean,
      	isHistoric boolean
     );
     
     DROP TABLE countryinfo;
      CREATE TABLE countryinfo (
      	iso_alpha2 char(2),
      	iso_alpha3 char(3),
      	iso_numeric integer,
      	fips_code character varying(3),
      	name character varying(200),
      	capital character varying(200),
      	areainsqkm double precision,
      	population integer,
      	continent char(2),
      	tld character varying(200),
      	currency char(3),
      	curranceName char(15),
      	phone char(20),
      	postal char(60),
      	postalRegex char(250),
      	languages char(130),
      	geonameid integer,
      	neighbours char(50)
      );
     
     
     DROP TABLE iso_languagecodes;
     CREATE TABLE iso_languagecodes(
     	iso_639_3 CHAR(4),
     	iso_639_2 VARCHAR(50),
     	iso_639_1 VARCHAR(50),
     	language_name VARCHAR(200)
     );
     
     DROP TABLE admin1Codes;
     CREATE TABLE admin1Codes (
     	code CHAR(6),
     	name TEXT
     );
     
     DROP TABLE admin1CodesAscii;
     CREATE TABLE admin1CodesAscii (
     	code CHAR(6),
     	name TEXT,
     	nameAscii TEXT,
     	geonameid int
     );
     
     DROP TABLE featureCodes;
     CREATE TABLE featureCodes (
     	code CHAR(7),
     	name VARCHAR(200),
     	description TEXT
     );
     
     DROP TABLE timeZones;
     CREATE TABLE timeZones (
     	timeZoneId VARCHAR(200),
     	GMT_offset numeric(3,1),
     	DST_offset numeric(3,1)
     );
     
     DROP TABLE continentCodes;
     CREATE TABLE continentCodes (
     	code CHAR(2),
     	name VARCHAR(20),
     	geonameid INT
     );
     
     DROP TABLE postalcodes;
     CREATE TABLE postalcodes (
     	countrycode char(2),
     	postalcode varchar(10),
     	placename varchar(180),
     	admin1name varchar(100),
     	admin1code varchar(20),
     	admin2name varchar(100),
     	admin2code varchar(20),
     	admin3name varchar(100),
     	latitude float,
     	longitude float,
     	accuracy smallint
     );
     
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
     ALTER TABLE ONLY geoname
           ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
     
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     EOT
     
     # check if needed directories do already exsist
     if [ -d "$WORKPATH" ]; then
     	sleep 0
     else
     	mkdir $WORKPATH
     	mkdir $WORKPATH/$TMPPATH
     	mkdir $WORKPATH/$PCPATH
     	echo "created $WORKPATH"
     fi
     echo
     echo ",---- STARTING (downloading, unpacking and preparing)"
     cd $WORKPATH/$TMPPATH
     for i in $FILES
     do
     	wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
     	if  [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
     		cp -p $i $PREFIX$i
     		unzip -u -q $i
     		case "$i" in
     		iso-languagecodes.txt)
     			tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
     		;;
     		countryInfo.txt)
     			grep -v '^#' countryInfo.txt | tail -n +2 | sed 's/                FI/     FI/g'  > countryInfo.txt.tmp;
     		;;
     		timeZones.txt)
     			tail -n +2 timeZones.txt > timeZones.txt.tmp;
     		;;
     		esac
     		echo "| $1 has been downloaded";
     	else
     		echo "| $i is already the latest version" 
     
     	fi
     done
     # download the postalcodes. You must know yourself the url
     cd $WORKPATH/$PCPATH
     wget -q -N "http://.../....zip"
     if [ $WORKPATH/$PCPATH/....zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/....zip ]; then
     	unzip -u -q $WORKPATH/$PCPATH/....zip
     	cp -p $WORKPATH/$PCPATH/....zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
     	echo "| ....zip has been downloaded"
     else
     	echo "| ....zip is already the latest version" 
     fi
     
     echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
     
     psql -e -U $DBUSER geonames <<EOT
     copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
     copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
     copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
     copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
     copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
     copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
     copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameId,neighbours) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
     copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
     INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
     INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
     INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
     INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
     INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
     INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
     INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
     CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
     CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
     EOT
     echo "'----- DONE ( have fun... )"
     
  • bwakkie



    Joined: 25/11/2008 12:49:52
    Messages: 7
    Offline

    The countryInfo.txt has wrong table field names:
    example:
    Code:
    #ISO	ISO3	ISO-Numeric	fips	Country	Capital	Area(in sq km)	Population	Continent	tld	CurrencyCode	CurrencyName	Phone	Postal Code Format	Postal Code Regex	Languages	geonameid	neighbours	EquivalentFipsCode
     AD	AND	20	AN	Andorra	Andorra la Vella	468	72000	EU	.ad	EUR	Euro	376	AD###	^(?:AD)*(\d{3})$	ca,fr-AD,pt	3041565	ES,FR

    neighbours EquivalentFipsCode are switched!
    also the geonameid does not exist anymore
    so new postgresql sql will be something like this:
    Code:
     I adjusted the code above as the length values of phone and currance name where changed too. There was a weird typo with one tab too much .. this is handled now too.
     
    marc



    Joined: 08/12/2005 07:39:47
    Messages: 4412
    Offline

    geonameId is still there, isn't it?

    Marc

    [WWW]
    bwakkie



    Joined: 25/11/2008 12:49:52
    Messages: 7
    Offline

    bwakkie wrote:
    Hi,

    I loved the script so I started working on it. This is the working end result.
    done:
  • It will create a working path if it is not there
  • It will create the db
  • It wil check if you are already using the latest files from the net and therefore not downloading it again
  • Added the command CASCADE to DROP TABLE geoname due to the constrains
  • I do not thing you need to change DROP TABLE to DROP TABLE IF EXIST, it is sort of pointless
    todo:
  • perhaps only update/insert/delete changes in the future by removing duplicate lines from the txt files?
    Code:
     #!/bin/bash
     #===============================================================================
     #
     #          FILE:  getgeo.sh
     # 
     #         USAGE:  ./getgeo.sh 
     # 
     #   DESCRIPTION:  run the script so that the geodata will be downloaded and inserted into your
     #		  database
     # 
     #       OPTIONS:  ---
     #  REQUIREMENTS:  ---
     #          BUGS:  ---
     #         NOTES:  ---
     #        AUTHOR:  Andreas (aka Harpagophyt )
     #       COMPANY:  <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
     #       VERSION:  1.3
     #       CREATED:  07/06/2008
     #      REVISION:  1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
     #                 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily 
     #                                download.
     #===============================================================================
     #!/bin/bash
     
     WORKPATH="~/geodata"
     TMPPATH="tmp"
     PCPATH="pc"
     PREFIX="_"
     DBUSER="$USER"
     FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
     createdb -O $DBUSER geonames
     psql -U $DBUSER geonames <<EOT
     DROP TABLE geoname CASCADE;
     CREATE TABLE geoname (
     	geonameid int,
     	name varchar(200),
     	asciiname varchar(200),
     	alternatenames varchar(4000),
     	latitude float,
     	longitude float,
     	fclass char(1),
     	fcode varchar(10),
     	country varchar(2),
     	cc2 varchar(60),
     	admin1 varchar(20),
     	admin2 varchar(80),
     	admin3 varchar(20),
     	admin4 varchar(20),
     	population bigint,
     	elevation int,
     	gtopo30 int,
     	timezone varchar(40),
     	moddate date
     );
     
     DROP TABLE alternatename;
     CREATE TABLE alternatename (
     	alternatenameId int,
     	geonameid int,
     	isoLanguage varchar(7),
     	alternateName varchar(200),
     	isPreferredName boolean,
     	isShortName boolean,
     	isColloquial boolean,
     	isHistoric boolean
     );
     
     DROP TABLE countryinfo;
     CREATE TABLE "countryinfo" (
            iso_alpha2 char(2),
            iso_alpha3 char(3),
            iso_numeric integer,
            fips_code character varying(3),
            country character varying(200),
            capital character varying(200),
            areainsqkm double precision,
            population integer,
            continent char(2),
            tld CHAR(10),
            currency_code char(3),
            currency_name CHAR(15),
            phone character varying(20),
            languages character varying(200),
            geonameId int,
            neighbours character varying(50),
            equivalent_fips_code character varying(3)
     );
     
     
     
     DROP TABLE iso_languagecodes;
     CREATE TABLE iso_languagecodes(
     	iso_639_3 CHAR(4),
     	iso_639_2 VARCHAR(50),
     	iso_639_1 VARCHAR(50),
     	language_name VARCHAR(200)
     );
     
     DROP TABLE admin1Codes;
     CREATE TABLE admin1Codes (
     	code CHAR(6),
     	name TEXT
     );
     
     DROP TABLE admin1CodesAscii;
     CREATE TABLE admin1CodesAscii (
     	code CHAR(6),
     	name TEXT,
     	nameAscii TEXT,
     	geonameid int
     );
     
     DROP TABLE featureCodes;
     CREATE TABLE featureCodes (
     	code CHAR(7),
     	name VARCHAR(200),
     	description TEXT
     );
     
     DROP TABLE timeZones;
     CREATE TABLE timeZones (
     	timeZoneId VARCHAR(200),
     	GMT_offset numeric(3,1),
     	DST_offset numeric(3,1)
     );
     
     DROP TABLE continentCodes;
     CREATE TABLE continentCodes (
     	code CHAR(2),
     	name VARCHAR(20),
     	geonameid INT
     );
     
     DROP TABLE postalcodes;
     CREATE TABLE postalcodes (
     	countrycode char(2),
     	postalcode varchar(10),
     	placename varchar(180),
     	admin1name varchar(100),
     	admin1code varchar(20),
     	admin2name varchar(100),
     	admin2code varchar(20),
     	admin3name varchar(100),
     	latitude float,
     	longitude float,
     	accuracy smallint
     );
     
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
     ALTER TABLE ONLY geoname
           ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
     ALTER TABLE ONLY countryinfo
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     ALTER TABLE ONLY alternatename
           ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
     EOT
     
     # check if needed directories do already exsist
     if [ -d "$WORKPATH" ]; then
     	sleep 0
     else
     	mkdir $WORKPATH
     	mkdir $WORKPATH/$TMPPATH
     	mkdir $WORKPATH/$PCPATH
     	echo "created $WORKPATH"
     fi
     echo
     echo ",---- STARTING (downloading, unpacking and preparing)"
     cd $WORKPATH/$TMPPATH
     for i in $FILES
     do
     	wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
     	if  [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
     		cp -p $i $PREFIX$i
     		unzip -u -q $i
     		case "$i" in
     		iso-languagecodes.txt)
     			tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
     		;;
     		countryInfo.txt)
     			grep -v '^#' countryInfo.txt | tail -n +2  > countryInfo.txt.tmp;
     		;;
     		timeZones.txt)
     			tail -n +2 timeZones.txt > timeZones.txt.tmp;
     		;;
     		esac
     		echo "| $1 has been downloaded";
     	else
     		echo "| $i is already the latest version" 
     
     	fi
     done
     # download the postalcodes. You must know yourself the url
     cd $WORKPATH/$PCPATH
     wget -q -N "http://.../....zip"
     if [ $WORKPATH/$PCPATH/....zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/....zip ]; then
     	unzip -u -q $WORKPATH/$PCPATH/....zip
     	cp -p $WORKPATH/$PCPATH/....zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
     	echo "| ....zip has been downloaded"
     else
     	echo "| ....zip is already the latest version" 
     fi
     
     echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
     
     psql -e -U $DBUSER geonames <<EOT
     copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
     copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
     copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
     copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
     copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
     copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
     copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
     copy alternatename  (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
     INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
     INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
     INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
     INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
     INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
     INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
     INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
     CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
     CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
     EOT
     echo "'----- DONE ( have fun... )"
     
     
  • hellboy



    Joined: 18/05/2010 12:13:56
    Messages: 9
    Offline

    Code:
     copy postalcodes (countrycode,postalcode,placename,admin1name,admin1cod
     e,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '/home/..../allCountries.txt' WITH DELIMITER e'\t' null as '';
     



    ERROR: extra data after last expected column
    CONTEXT: COPY postalcodes, line 1: "AD AD100 Canillo 4
    2.5833 1.6667 6"
     


    What's wrong?
    standerson



    Joined: 30/12/2010 07:18:11
    Messages: 2
    Offline

    Note that int is not a good choice anymore for population as Asia (6255147) has a population of 3,812,366,000 (as of the date of this post). The max size of an integer in PostgreSQL is 2,147,483,647. This causes an error during an allCountries.txt import.

    I'm sure this has been mentioned elsewhere, but thought it was worth noting here, where many PGSql folks will land when searching for this solution.

    BTW, thanks, Marc, for all your hard work.
    marc



    Joined: 08/12/2005 07:39:47
    Messages: 4412
    Offline

    Thanks, I have updated the postings in this thread to use bigint for the geonames population column.

    [WWW]
    sjs



    Joined: 19/01/2009 21:58:06
    Messages: 9
    Offline

    Hi everybody!

    I've just tried the script posted at "26/11/2008 10:31:01". I'm afraid, it still has a lot of "issues":

    - The subdirectories "tmp" and "pc" won't be created if the working base directories already exists.

    - First copy line (geonames) yields an error
    Code:
     ERROR:  value too long for type character varying(4000)
     CONTEXT:  COPY geoname, line 3185, column alternatenames:...
     

    I got rid of this by changing the corresponding field size from 4000 to 6000.

    - There's no file "featureCodes.txt" on the Geonames.org-Server, so the corresponding download, unzipping and copying into the db yields errors. (This might have to do with this, though.)

    - The copy line for admin1CodesASCII yields an error
    Code:
     ERROR:  value too long for type character(6)
     CONTEXT:  COPY admin1codesascii, line 502, column code: "CG.7280295"
     

    I solved this by increasing the size of the field to 10.

    - It's the same with "admin1Code".

    - The copy line for countryinfo yields an error
    Code:
     ERROR:  column "country" of relation "countryinfo" does not exist
     

    I solved this by removing the "country" item from the copy command.

    - The copy line for countryinfo also yields errors for wrong column names. Solution: rename "currency" to "currency_code" and "currencyname" to "currency_name".

    - The copy line for countryinfo also yields an error for wrong column counts:
    Code:
     ERROR:  extra data after last expected column
     CONTEXT:  COPY countryinfo, line 1:...
     

    I've no idea on how to fix this so far.

    - The copy line for alternateNames yields an error:
    Code:
     ERROR:  value too long for type character varying(200)
     CONTEXT:  COPY alternatename, line 1986888, column alternatename:
     

    Solution: set the length of this field to 300

    - It might be a good idea to also include the generation of spatial indexes (like described in http://forum.geonames.org/gforum/posts/list/727.page) in the script.

    Kind regards,

    Sebastian
    rshelby



    Joined: 17/02/2010 20:06:53
    Messages: 7
    Offline

    In the postalcodes area, you forgot "admin1code varchar(20)" ...

    Once you add that, it works like a charm.
    Evgenich



    Joined: 21/02/2011 17:13:13
    Messages: 3
    Offline

    There are 2 errors in the "countryInfo.txt": In two last records field "geonameid" equal "0":
    CS SCG 891 YI Serbia and Montenegro Belgrade 102350 10829175 EU .cs RSD Dinar 381 ##### ^(\d{5})$ cu,hu,sq,sr 0 AL,HU,MK,RO,HR,BA,BG

    AN ANT 530 NT Netherlands Antilles Willemstad 960 136197 NA .an ANG Guilder 599 nl-AN,en,es 0 GP
     

    marc



    Joined: 08/12/2005 07:39:47
    Messages: 4412
    Offline

    These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.

    Marc

    [WWW]
    Evgenich



    Joined: 21/02/2011 17:13:13
    Messages: 3
    Offline

    marc wrote:
    These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.

    Marc 


    It's correct, but there is no "geoname" DB record with "geonameid" = 0. There is CONSTRAINT " ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);" In the time of "countryinfo" DB table filling we receive error.
    There is 2 solutions:
    1. Add artifitial recors with "geonameid" = 0 into "geoname" DB table.
    2. Remove 2 obsolete records from "countryinfo" dump image.


    Evgenich



    Joined: 21/02/2011 17:13:13
    Messages: 3
    Offline

    There corrected script. Please modify script variables before script execution.
    1. WORKPATH="${HOME}/tmp/GIS/gisnames/geodata" - work directory
    2. TMPPATH="tmp" - subdirectory for data files download
    3. PCPATH="pc" - subdirectory for Post Codes data file download
    4. PREFIX="_" - not change
    5. DBHOST="127.0.0.1" - DB server host
    6. DBPORT="5433" - DB server port
    7. DBUSER="postgres" - DB user name (in our case postgres)

    Execute script. Script will asks for postgres user 3 times:

    1. DB creating
    2. geonames DB tables creating
    3. geonames DB table filling

    After DB table creating the script will get data files from Geoname server. There is one important step: we need check content of countryInfo.txt.tmp file in $WORKPATH/$TMPPATH directory. Now the file consists of 2 last records with obsolete countries: Serbia and Montenegro and Netherlands Antilles with field geonameid=0. We need remove them.

    Script will works a hour or a little more time.


    #!/bin/bash
    #===============================================================================
    #
    # FILE: getgeo.sh
    #
    # USAGE: ./getgeo.sh
    #
    # DESCRIPTION: run the script so that the geodata will be downloaded and inserted into your
    # database
    #
    # OPTIONS: ---
    # REQUIREMENTS: ---
    # BUGS: ---
    # NOTES: ---
    # AUTHOR: Andreas (aka Harpagophyt )
    # COMPANY: <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
    # VERSION: 1.3
    # CREATED: 07/06/2008
    # REVISION: 1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
    # 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily
    # download.
    #===============================================================================
    #!/bin/bash

    WORKPATH="${HOME}/tmp/GIS/gisnames/geodata"
    TMPPATH="tmp"
    PCPATH="pc"
    PREFIX="_"
    DBHOST="127.0.0.1"
    DBPORT="5432"
    DBUSER="postgres"
    FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes_en.txt iso-languagecodes.txt timeZones.txt"
    psql -U $DBUSER -h $DBHOST -p $DBPORT -c "CREATE DATABASE geonames WITH TEMPLATE = template0 ENCODING = 'UTF8';"
    psql -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
    DROP TABLE geoname CASCADE;
    CREATE TABLE geoname (
    geonameid int,
    name varchar(200),
    asciiname varchar(200),
    alternatenames varchar(6000),
    latitude float,
    longitude float,
    fclass char(1),
    fcode varchar(10),
    country varchar(2),
    cc2 varchar(60),
    admin1 varchar(20),
    admin2 varchar(80),
    admin3 varchar(20),
    admin4 varchar(20),
    population bigint,
    elevation int,
    gtopo30 int,
    timezone varchar(40),
    moddate date
    );

    DROP TABLE alternatename;
    CREATE TABLE alternatename (
    alternatenameId int,
    geonameid int,
    isoLanguage varchar(7),
    alternateName varchar(300),
    isPreferredName boolean,
    isShortName boolean,
    isColloquial boolean,
    isHistoric boolean
    );

    DROP TABLE countryinfo;
    CREATE TABLE "countryinfo" (
    iso_alpha2 char(2),
    iso_alpha3 char(3),
    iso_numeric integer,
    fips_code character varying(3),
    country character varying(200),
    capital character varying(200),
    areainsqkm double precision,
    population integer,
    continent char(2),
    tld CHAR(10),
    currency_code char(3),
    currency_name CHAR(15),
    phone character varying(20),
    postal character varying(60),
    postalRegex character varying(200),
    languages character varying(200),
    geonameId int,
    neighbours character varying(50),
    equivalent_fips_code character varying(3)
    );



    DROP TABLE iso_languagecodes;
    CREATE TABLE iso_languagecodes(
    iso_639_3 CHAR(4),
    iso_639_2 VARCHAR(50),
    iso_639_1 VARCHAR(50),
    language_name VARCHAR(200)
    );

    DROP TABLE admin1Codes;
    CREATE TABLE admin1Codes (
    code CHAR(20),
    name TEXT
    );

    DROP TABLE admin1CodesAscii;
    CREATE TABLE admin1CodesAscii (
    code CHAR(20),
    name TEXT,
    nameAscii TEXT,
    geonameid int
    );

    DROP TABLE admin2CodesAscii;
    CREATE TABLE admin2CodesAscii (
    code CHAR(80),
    name TEXT,
    nameAscii TEXT,
    geonameid int
    );

    DROP TABLE featureCodes;
    CREATE TABLE featureCodes (
    code CHAR(7),
    name VARCHAR(200),
    description TEXT
    );

    DROP TABLE timeZones;
    CREATE TABLE timeZones (
    timeZoneId VARCHAR(200),
    GMT_offset numeric(3,1),
    DST_offset numeric(3,1)
    );

    DROP TABLE continentCodes;
    CREATE TABLE continentCodes (
    code CHAR(2),
    name VARCHAR(20),
    geonameid INT
    );

    DROP TABLE postalcodes;
    CREATE TABLE postalcodes (
    countrycode char(2),
    postalcode varchar(10),
    placename varchar(180),
    admin1name varchar(100),
    admin1code varchar(20),
    admin2name varchar(100),
    admin2code varchar(20),
    admin3name varchar(100),
    admin3code varchar(20),
    latitude float,
    longitude float,
    accuracy smallint
    );

    ALTER TABLE ONLY alternatename
    ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
    ALTER TABLE ONLY geoname
    ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
    ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
    ALTER TABLE ONLY countryinfo
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
    ALTER TABLE ONLY alternatename
    ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
    EOT

    # check if needed directories do already exsist
    if [ -d "$WORKPATH" ]; then
    echo "$WORKPATH exists..."
    sleep 0
    else
    echo "$WORKPATH and subdirectories will be created..."
    mkdir -p $WORKPATH
    mkdir -p $WORKPATH/$TMPPATH
    mkdir -p $WORKPATH/$PCPATH
    echo "created $WORKPATH"
    fi
    echo
    echo ",---- STARTING (downloading, unpacking and preparing)"
    cd $WORKPATH/$TMPPATH
    for i in $FILES
    do
    wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
    if [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
    cp -p $i $PREFIX$i
    unzip -u -q $i
    case "$i" in
    iso-languagecodes.txt)
    tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
    ;;
    countryInfo.txt)
    grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;
    ;;
    timeZones.txt)
    tail -n +2 timeZones.txt > timeZones.txt.tmp;
    ;;
    esac
    echo "| $1 has been downloaded";
    else
    echo "| $i is already the latest version"

    fi
    done
    # download the postalcodes. You must know yourself the url
    cd $WORKPATH/$PCPATH
    wget -q -N "http://download.geonames.org/export/zip/allCountries.zip"
    if [ $WORKPATH/$PCPATH/allCountries.zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/allCountries.zip ]; then
    echo "Attempt to unzip $WORKPATH/$PCPATH/allCountries.zip file..."
    unzip -u -q $WORKPATH/$PCPATH/allCountries.zip
    cp -p $WORKPATH/$PCPATH/allCountries.zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
    echo "| ....zip has been downloaded"
    else
    echo "| ....zip is already the latest version"
    fi

    echo "+---- FILL DATABASE ( this takes 2 days on my machine "

    psql -e -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
    copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
    copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,admin3code,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
    copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
    copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes_en.txt' null as '';
    copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
    copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
    copy admin2CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin2Codes.txt' null as '';
    copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
    copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
    copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
    INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
    INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
    INSERT INTO continentCodes VALUES ('EU', 'Europe', 625514;
    INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
    INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
    INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
    INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
    CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
    CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
    EOT
    echo "'----- DONE ( have fun... )"
     
    snrt



    Joined: 26/03/2011 15:24:41
    Messages: 6
    Offline

    thanks for this best work
    how and where to run this script ?
    others information are welcome [b]
    snrt



    Joined: 26/03/2011 15:24:41
    Messages: 6
    Offline

    Hi every boby
    can i run this script in windows and how?

    help Plz
    benderamp



    Joined: 31/03/2011 16:24:03
    Messages: 3
    Offline

    2Evgenich

    It seems that there is an error in your script:
    > grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;

    should be changed to:

    >grep -v '^#' countryInfo.txt | head -n -2 > countryInfo.txt.tmp;


    tail -n +2 removes the 1st line from the file - this is ok for timeZones.txt and iso-languagecodes.txt cause their 1st line is table header, so it should be removed before importing to the database.

    But for countryInfo.txt header is already removed by "grep -v '^#' countryInfo.txt" which skips all lines starting with "#" and table header also starts with "#", so "tail -n +2" would remove the 1st data line which should not be removed. And at the same time last 2 lines should be removed from countryInfo.txt (they have geonameId=0 - explained above) - this can be done with "head -n -2".


    I would also recommend to join db creating and filling with data to single psql session (called after the data is loaded) - this would reduce the number of password prompts.
    benderamp



    Joined: 31/03/2011 16:24:03
    Messages: 3
    Offline

    and also info for reference:
    - downloaded files take 1.7 Gb on disk
    - created db size after importing downloaded data is about 2Gb
    benderamp



    Joined: 31/03/2011 16:24:03
    Messages: 3
    Offline

    and also had to increase alternatenames column size:
    alternatenames varchar(12000),

    "alternatenames varchar(6000)" seemed to work on one of my postgres instances, but by some reason showed "ERROR: value too long for type character varying(6000)" on another.
     
    Forum Index -> FAQ - frequently asked questions Go to Page: 1, 2 Next 
    Go to:   
    Powered by JForum 2.1.5 © JForum Team