Author |
Message |
07/06/2008 14:41:39
|
Harpagophyt
Joined: 07/06/2008 13:58:13
Messages: 6
Offline
|
I wrote a shell script which would download all available files and import it into postgresql.
I am a newbie to postgresql and think some datatypes could be match better. Also I have not add any constraints.
Feel free to tell me how i can optimize it.
Also there are the lines for handle the postalcodes. Please do not ask me, where they are. If you interested to use the postalcodes please read here:
No postal codes in database dump
Thanks to marc and dportabella for there postings:
Import GeoNames dump into Postgres
Importing *all* geonames tables to mysql
Please read it and use it carefully. This script overwrite files and drop tables. It works straight and fits best for my needs.
Code:
#!/bin/bash
WORKPATH="/data/geodata"
cd $WORKPATH
# download all needed files and if needed unzip them
ZIPFILES="allCountries.zip alternateNames.zip userTags.zip"
TXTFILES="admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
for i in $ZIPFILES $TXTFILES
do
wget <a href="http://download.geonames.org/export/dump/$i" target="_blank" rel="nofollow">http://download.geonames.org/export/dump/$i</a>
echo "Done download $i"
done
for i in $ZIPFILES
do
unzip -o -qq $i
done
# rename files because of name conflict
mv allCountries.zip allGeoCountries.zip
mv allCountries.txt allGeoCountries.txt
# download the postalcodes. You must know yourself the url
#wget <a href="http://xxx" target="_blank" rel="nofollow">http://xxx</a>
#echo "Done download postal codes (xxx)"
#unzip -o -qq xxx
# rename files because of name conflict
mv xxx xxx
mv xxx xxx
# alter files for import
tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp
grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp
tail -n +2 timeZones.txt > timeZones.txt.tmp
psql test <<EOT
DROP TABLE geoname;
CREATE TABLE geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(5000),
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/allGeoCountries.txt' null as '';
DROP TABLE alternatename;
CREATE TABLE alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
);
copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/alternateNames.txt' null as '';
DROP TABLE countryinfo;
CREATE TABLE countryinfo (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
name character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
languages character varying(200),
currency char(3),
geonameId int
);
copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId) from '${WORKPATH}/countryInfo.txt.tmp' null as '';
DROP TABLE iso_languagecodes;
CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
);
copy iso_languagecodes (iso_639_3, iso_639_2, iso_639_1, language_name) from '${WORKPATH}/iso-languagecodes.txt.tmp' null as '';
DROP TABLE admin1Codes;
CREATE TABLE admin1Codes (
code CHAR(20),
name TEXT
);
copy admin1Codes (code, name) from '${WORKPATH}/admin1Codes.txt' null as '';
DROP TABLE admin1CodesAscii;
CREATE TABLE admin1CodesAscii (
code CHAR(20),
name TEXT,
nameAscii TEXT,
geonameid int
);
copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/admin1CodesASCII.txt' null as '';
DROP TABLE featureCodes;
CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
);
copy featureCodes (code,name,description) from '${WORKPATH}/featureCodes_en.txt' null as '';
DROP TABLE timeZones;
CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset numeric(3,1),
DST_offset numeric(3,1)
);
copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/timeZones.txt.tmp' null as '';
DROP TABLE continentCodes;
CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
);
INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
DROP TABLE postalcodes;
CREATE TABLE postalcodes (
countrycode char(2),
postalcode varchar(10),
placename varchar(180),
admin1name varchar(100),
admin1code varchar(20),
admin2name varchar(100),
admin2code varchar(20),
admin3name varchar(100),
latitude float,
longitude float,
accuracy smallint
);
#copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/xxx' null as '';
EOT
Greetings
Andreas
Changes:
2012-05-09 add isCollloquial and isHistoric flag (Marc)
2008-06-07 replace COPY continentCodes through INSERT statements.
|
|
|
19/09/2008 12:16:41
|
lasizoillo
Joined: 19/09/2008 12:00:26
Messages: 1
Offline
|
Great job.
I have made some changes in my personal script:
Code:
:%s/DROP TABLE/DROP TABLE IF EXIST/
And taken from http://forum.geonames.org/gforum/posts/list/67.page
Code:
ALTER TABLE ONLY alternatename
ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
ALTER TABLE ONLY geoname
ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
|
|
|
26/11/2008 10:31:01
|
bwakkie
Joined: 25/11/2008 12:49:52
Messages: 7
Offline
|
Hi,
I loved the script so I started working on it. This is the working end result.
done:
It will create a working path if it is not there
It will create the db
It wil check if you are already using the latest files from the net and therefore not downloading it again
Added the command CASCADE to DROP TABLE geoname due to the constrains
I do not thing you need to change DROP TABLE to DROP TABLE IF EXIST, it is sort of pointless
todo:
perhaps only update/insert/delete changes in the future by removing duplicate lines from the txt files?
Code:
#!/bin/bash
#===============================================================================
#
# FILE: getgeo.sh
#
# USAGE: ./getgeo.sh
#
# DESCRIPTION: run the script so that the geodata will be downloaded and inserted into your
# database
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Andreas (aka Harpagophyt )
# COMPANY: <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
# VERSION: 1.3
# CREATED: 07/06/2008
# REVISION: 1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
# 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily
# 1.3 2009-02-18 Fixed countryInfo bug in Aland Islands as there was one tab too many at te end and sone field lengths needed to be larger
# download.
#===============================================================================
#!/bin/bash
WORKPATH="~/geodata"
TMPPATH="tmp"
PCPATH="pc"
PREFIX="_"
DBUSER="$USER"
FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
createdb -O $DBUSER geonames
psql -U $DBUSER geonames <<EOT
DROP TABLE geoname CASCADE;
CREATE TABLE geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
DROP TABLE alternatename;
CREATE TABLE alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean
isColloquial boolean,
isHistoric boolean
);
DROP TABLE countryinfo;
CREATE TABLE countryinfo (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
name character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
tld character varying(200),
currency char(3),
curranceName char(15),
phone char(20),
postal char(60),
postalRegex char(250),
languages char(130),
geonameid integer,
neighbours char(50)
);
DROP TABLE iso_languagecodes;
CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
);
DROP TABLE admin1Codes;
CREATE TABLE admin1Codes (
code CHAR(6),
name TEXT
);
DROP TABLE admin1CodesAscii;
CREATE TABLE admin1CodesAscii (
code CHAR(6),
name TEXT,
nameAscii TEXT,
geonameid int
);
DROP TABLE featureCodes;
CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
);
DROP TABLE timeZones;
CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset numeric(3,1),
DST_offset numeric(3,1)
);
DROP TABLE continentCodes;
CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
);
DROP TABLE postalcodes;
CREATE TABLE postalcodes (
countrycode char(2),
postalcode varchar(10),
placename varchar(180),
admin1name varchar(100),
admin1code varchar(20),
admin2name varchar(100),
admin2code varchar(20),
admin3name varchar(100),
latitude float,
longitude float,
accuracy smallint
);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
ALTER TABLE ONLY geoname
ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
EOT
# check if needed directories do already exsist
if [ -d "$WORKPATH" ]; then
sleep 0
else
mkdir $WORKPATH
mkdir $WORKPATH/$TMPPATH
mkdir $WORKPATH/$PCPATH
echo "created $WORKPATH"
fi
echo
echo ",---- STARTING (downloading, unpacking and preparing)"
cd $WORKPATH/$TMPPATH
for i in $FILES
do
wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
if [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
cp -p $i $PREFIX$i
unzip -u -q $i
case "$i" in
iso-languagecodes.txt)
tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
;;
countryInfo.txt)
grep -v '^#' countryInfo.txt | tail -n +2 | sed 's/ FI/ FI/g' > countryInfo.txt.tmp;
;;
timeZones.txt)
tail -n +2 timeZones.txt > timeZones.txt.tmp;
;;
esac
echo "| $1 has been downloaded";
else
echo "| $i is already the latest version"
fi
done
# download the postalcodes. You must know yourself the url
cd $WORKPATH/$PCPATH
wget -q -N "http://.../....zip"
if [ $WORKPATH/$PCPATH/....zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/....zip ]; then
unzip -u -q $WORKPATH/$PCPATH/....zip
cp -p $WORKPATH/$PCPATH/....zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
echo "| ....zip has been downloaded"
else
echo "| ....zip is already the latest version"
fi
echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
psql -e -U $DBUSER geonames <<EOT
copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameId,neighbours) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
EOT
echo "'----- DONE ( have fun... )"
|
|
|
26/11/2008 12:50:58
|
bwakkie
Joined: 25/11/2008 12:49:52
Messages: 7
Offline
|
The countryInfo.txt has wrong table field names:
example:
Code:
#ISO ISO3 ISO-Numeric fips Country Capital Area(in sq km) Population Continent tld CurrencyCode CurrencyName Phone Postal Code Format Postal Code Regex Languages geonameid neighbours EquivalentFipsCode
AD AND 20 AN Andorra Andorra la Vella 468 72000 EU .ad EUR Euro 376 AD### ^(?:AD)*(\d{3})$ ca,fr-AD,pt 3041565 ES,FR
neighbours EquivalentFipsCode are switched!
also the geonameid does not exist anymore
so new postgresql sql will be something like this:
Code:
I adjusted the code above as the length values of phone and currance name where changed too. There was a weird typo with one tab too much .. this is handled now too.
|
|
|
26/11/2008 21:09:32
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4416
Offline
|
geonameId is still there, isn't it?
Marc
|
|
|
|
18/02/2009 22:20:42
|
bwakkie
Joined: 25/11/2008 12:49:52
Messages: 7
Offline
|
bwakkie wrote:
Hi,
I loved the script so I started working on it. This is the working end result.
done:
It will create a working path if it is not there
It will create the db
It wil check if you are already using the latest files from the net and therefore not downloading it again
Added the command CASCADE to DROP TABLE geoname due to the constrains
I do not thing you need to change DROP TABLE to DROP TABLE IF EXIST, it is sort of pointless
todo:
perhaps only update/insert/delete changes in the future by removing duplicate lines from the txt files?
Code:
#!/bin/bash
#===============================================================================
#
# FILE: getgeo.sh
#
# USAGE: ./getgeo.sh
#
# DESCRIPTION: run the script so that the geodata will be downloaded and inserted into your
# database
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Andreas (aka Harpagophyt )
# COMPANY: <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
# VERSION: 1.3
# CREATED: 07/06/2008
# REVISION: 1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
# 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily
# download.
#===============================================================================
#!/bin/bash
WORKPATH="~/geodata"
TMPPATH="tmp"
PCPATH="pc"
PREFIX="_"
DBUSER="$USER"
FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes.txt iso-languagecodes.txt timeZones.txt"
createdb -O $DBUSER geonames
psql -U $DBUSER geonames <<EOT
DROP TABLE geoname CASCADE;
CREATE TABLE geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
DROP TABLE alternatename;
CREATE TABLE alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
);
DROP TABLE countryinfo;
CREATE TABLE "countryinfo" (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
country character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
tld CHAR(10),
currency_code char(3),
currency_name CHAR(15),
phone character varying(20),
languages character varying(200),
geonameId int,
neighbours character varying(50),
equivalent_fips_code character varying(3)
);
DROP TABLE iso_languagecodes;
CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
);
DROP TABLE admin1Codes;
CREATE TABLE admin1Codes (
code CHAR(6),
name TEXT
);
DROP TABLE admin1CodesAscii;
CREATE TABLE admin1CodesAscii (
code CHAR(6),
name TEXT,
nameAscii TEXT,
geonameid int
);
DROP TABLE featureCodes;
CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
);
DROP TABLE timeZones;
CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset numeric(3,1),
DST_offset numeric(3,1)
);
DROP TABLE continentCodes;
CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
);
DROP TABLE postalcodes;
CREATE TABLE postalcodes (
countrycode char(2),
postalcode varchar(10),
placename varchar(180),
admin1name varchar(100),
admin1code varchar(20),
admin2name varchar(100),
admin2code varchar(20),
admin3name varchar(100),
latitude float,
longitude float,
accuracy smallint
);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
ALTER TABLE ONLY geoname
ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
EOT
# check if needed directories do already exsist
if [ -d "$WORKPATH" ]; then
sleep 0
else
mkdir $WORKPATH
mkdir $WORKPATH/$TMPPATH
mkdir $WORKPATH/$PCPATH
echo "created $WORKPATH"
fi
echo
echo ",---- STARTING (downloading, unpacking and preparing)"
cd $WORKPATH/$TMPPATH
for i in $FILES
do
wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
if [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
cp -p $i $PREFIX$i
unzip -u -q $i
case "$i" in
iso-languagecodes.txt)
tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
;;
countryInfo.txt)
grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;
;;
timeZones.txt)
tail -n +2 timeZones.txt > timeZones.txt.tmp;
;;
esac
echo "| $1 has been downloaded";
else
echo "| $i is already the latest version"
fi
done
# download the postalcodes. You must know yourself the url
cd $WORKPATH/$PCPATH
wget -q -N "http://.../....zip"
if [ $WORKPATH/$PCPATH/....zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/....zip ]; then
unzip -u -q $WORKPATH/$PCPATH/....zip
cp -p $WORKPATH/$PCPATH/....zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
echo "| ....zip has been downloaded"
else
echo "| ....zip is already the latest version"
fi
echo "+---- FILL DATABASE ( this takes 2 days on my machine :)"
psql -e -U $DBUSER geonames <<EOT
copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes.txt' null as '';
copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
INSERT INTO continentCodes VALUES ('EU', 'Europe', 6255148);
INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
EOT
echo "'----- DONE ( have fun... )"
|
|
|
30/09/2010 11:57:57
|
hellboy
Joined: 18/05/2010 12:13:56
Messages: 9
Offline
|
Code:
copy postalcodes (countrycode,postalcode,placename,admin1name,admin1cod
e,admin2name,admin2code,admin3name,latitude,longitude,accuracy) from '/home/..../allCountries.txt' WITH DELIMITER e'\t' null as '';
ERROR: extra data after last expected column
CONTEXT: COPY postalcodes, line 1: "AD AD100 Canillo 4
2.5833 1.6667 6"
What's wrong?
|
|
|
30/12/2010 07:24:40
|
standerson
Joined: 30/12/2010 07:18:11
Messages: 2
Offline
|
Note that int is not a good choice anymore for population as Asia (6255147) has a population of 3,812,366,000 (as of the date of this post). The max size of an integer in PostgreSQL is 2,147,483,647. This causes an error during an allCountries.txt import.
I'm sure this has been mentioned elsewhere, but thought it was worth noting here, where many PGSql folks will land when searching for this solution.
BTW, thanks, Marc, for all your hard work.
|
|
|
01/01/2011 11:10:19
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4416
Offline
|
Thanks, I have updated the postings in this thread to use bigint for the geonames population column.
|
|
|
|
14/01/2011 07:43:40
|
sjs
Joined: 19/01/2009 21:58:06
Messages: 9
Offline
|
Hi everybody!
I've just tried the script posted at "26/11/2008 10:31:01". I'm afraid, it still has a lot of "issues":
- The subdirectories "tmp" and "pc" won't be created if the working base directories already exists.
- First copy line (geonames) yields an error
Code:
ERROR: value too long for type character varying(4000)
CONTEXT: COPY geoname, line 3185, column alternatenames:...
I got rid of this by changing the corresponding field size from 4000 to 6000.
- There's no file "featureCodes.txt" on the Geonames.org-Server, so the corresponding download, unzipping and copying into the db yields errors. (This might have to do with this, though.)
- The copy line for admin1CodesASCII yields an error
Code:
ERROR: value too long for type character(6)
CONTEXT: COPY admin1codesascii, line 502, column code: "CG.7280295"
I solved this by increasing the size of the field to 10.
- It's the same with "admin1Code".
- The copy line for countryinfo yields an error
Code:
ERROR: column "country" of relation "countryinfo" does not exist
I solved this by removing the "country" item from the copy command.
- The copy line for countryinfo also yields errors for wrong column names. Solution: rename "currency" to "currency_code" and "currencyname" to "currency_name".
- The copy line for countryinfo also yields an error for wrong column counts:
Code:
ERROR: extra data after last expected column
CONTEXT: COPY countryinfo, line 1:...
I've no idea on how to fix this so far.
- The copy line for alternateNames yields an error:
Code:
ERROR: value too long for type character varying(200)
CONTEXT: COPY alternatename, line 1986888, column alternatename:
Solution: set the length of this field to 300
- It might be a good idea to also include the generation of spatial indexes (like described in http://forum.geonames.org/gforum/posts/list/727.page) in the script.
Kind regards,
Sebastian
|
|
|
15/02/2011 17:42:54
|
rshelby
Joined: 17/02/2010 20:06:53
Messages: 7
Offline
|
In the postalcodes area, you forgot "admin1code varchar(20)" ...
Once you add that, it works like a charm.
|
|
|
21/02/2011 17:25:50
|
Evgenich
Joined: 21/02/2011 17:13:13
Messages: 3
Offline
|
There are 2 errors in the "countryInfo.txt": In two last records field "geonameid" equal "0":
CS SCG 891 YI Serbia and Montenegro Belgrade 102350 10829175 EU .cs RSD Dinar 381 ##### ^(\d{5})$ cu,hu,sq,sr 0 AL,HU,MK,RO,HR,BA,BG
AN ANT 530 NT Netherlands Antilles Willemstad 960 136197 NA .an ANG Guilder 599 nl-AN,en,es 0 GP
|
|
|
22/02/2011 06:40:54
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4416
Offline
|
These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.
Marc
|
|
|
|
22/02/2011 11:53:25
|
Evgenich
Joined: 21/02/2011 17:13:13
Messages: 3
Offline
|
marc wrote:
These two countries are deliberately set to '0' because they no longer exist. It is mentioned in the file comment header. The lines are kept in the file for reference.
Marc
It's correct, but there is no "geoname" DB record with "geonameid" = 0. There is CONSTRAINT " ALTER TABLE ONLY countryinfo
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);" In the time of "countryinfo" DB table filling we receive error.
There is 2 solutions:
1. Add artifitial recors with "geonameid" = 0 into "geoname" DB table.
2. Remove 2 obsolete records from "countryinfo" dump image.
|
|
|
22/02/2011 19:03:55
|
Evgenich
Joined: 21/02/2011 17:13:13
Messages: 3
Offline
|
There corrected script. Please modify script variables before script execution.
1. WORKPATH="${HOME}/tmp/GIS/gisnames/geodata" - work directory
2. TMPPATH="tmp" - subdirectory for data files download
3. PCPATH="pc" - subdirectory for Post Codes data file download
4. PREFIX="_" - not change
5. DBHOST="127.0.0.1" - DB server host
6. DBPORT="5433" - DB server port
7. DBUSER="postgres" - DB user name (in our case postgres)
Execute script. Script will asks for postgres user 3 times:
1. DB creating
2. geonames DB tables creating
3. geonames DB table filling
After DB table creating the script will get data files from Geoname server. There is one important step: we need check content of countryInfo.txt.tmp file in $WORKPATH/$TMPPATH directory. Now the file consists of 2 last records with obsolete countries: Serbia and Montenegro and Netherlands Antilles with field geonameid=0. We need remove them.
Script will works a hour or a little more time.
#!/bin/bash
#===============================================================================
#
# FILE: getgeo.sh
#
# USAGE: ./getgeo.sh
#
# DESCRIPTION: run the script so that the geodata will be downloaded and inserted into your
# database
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Andreas (aka Harpagophyt )
# COMPANY: <a href="http://forum.geonames.org/gforum/posts/list/926.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/926.page</a>
# VERSION: 1.3
# CREATED: 07/06/2008
# REVISION: 1.1 2008-06-07 replace COPY continentCodes through INSERT statements.
# 1.2 2008-11-25 Adjusted by Bastiaan Wakkie in order to not unnessisarily
# download.
#===============================================================================
#!/bin/bash
WORKPATH="${HOME}/tmp/GIS/gisnames/geodata"
TMPPATH="tmp"
PCPATH="pc"
PREFIX="_"
DBHOST="127.0.0.1"
DBPORT="5432"
DBUSER="postgres"
FILES="allCountries.zip alternateNames.zip userTags.zip admin1Codes.txt admin1CodesASCII.txt admin2Codes.txt countryInfo.txt featureCodes_en.txt iso-languagecodes.txt timeZones.txt"
psql -U $DBUSER -h $DBHOST -p $DBPORT -c "CREATE DATABASE geonames WITH TEMPLATE = template0 ENCODING = 'UTF8';"
psql -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
DROP TABLE geoname CASCADE;
CREATE TABLE geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(6000),
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
);
DROP TABLE alternatename;
CREATE TABLE alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(300),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
);
DROP TABLE countryinfo;
CREATE TABLE "countryinfo" (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
country character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
tld CHAR(10),
currency_code char(3),
currency_name CHAR(15),
phone character varying(20),
postal character varying(60),
postalRegex character varying(200),
languages character varying(200),
geonameId int,
neighbours character varying(50),
equivalent_fips_code character varying(3)
);
DROP TABLE iso_languagecodes;
CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
);
DROP TABLE admin1Codes;
CREATE TABLE admin1Codes (
code CHAR(20),
name TEXT
);
DROP TABLE admin1CodesAscii;
CREATE TABLE admin1CodesAscii (
code CHAR(20),
name TEXT,
nameAscii TEXT,
geonameid int
);
DROP TABLE admin2CodesAscii;
CREATE TABLE admin2CodesAscii (
code CHAR(80),
name TEXT,
nameAscii TEXT,
geonameid int
);
DROP TABLE featureCodes;
CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
);
DROP TABLE timeZones;
CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset numeric(3,1),
DST_offset numeric(3,1)
);
DROP TABLE continentCodes;
CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
);
DROP TABLE postalcodes;
CREATE TABLE postalcodes (
countrycode char(2),
postalcode varchar(10),
placename varchar(180),
admin1name varchar(100),
admin1code varchar(20),
admin2name varchar(100),
admin2code varchar(20),
admin3name varchar(100),
admin3code varchar(20),
latitude float,
longitude float,
accuracy smallint
);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT pk_alternatenameid PRIMARY KEY (alternatenameid);
ALTER TABLE ONLY geoname
ADD CONSTRAINT pk_geonameid PRIMARY KEY (geonameid);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT pk_iso_alpha2 PRIMARY KEY (iso_alpha2);
ALTER TABLE ONLY countryinfo
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
ALTER TABLE ONLY alternatename
ADD CONSTRAINT fk_geonameid FOREIGN KEY (geonameid) REFERENCES geoname(geonameid);
EOT
# check if needed directories do already exsist
if [ -d "$WORKPATH" ]; then
echo "$WORKPATH exists..."
sleep 0
else
echo "$WORKPATH and subdirectories will be created..."
mkdir -p $WORKPATH
mkdir -p $WORKPATH/$TMPPATH
mkdir -p $WORKPATH/$PCPATH
echo "created $WORKPATH"
fi
echo
echo ",---- STARTING (downloading, unpacking and preparing)"
cd $WORKPATH/$TMPPATH
for i in $FILES
do
wget -N -q "http://download.geonames.org/export/dump/$i" # get newer files
if [ $i -nt $PREFIX$i ] || [ ! -e $PREFIX$i ] ; then
cp -p $i $PREFIX$i
unzip -u -q $i
case "$i" in
iso-languagecodes.txt)
tail -n +2 iso-languagecodes.txt > iso-languagecodes.txt.tmp;
;;
countryInfo.txt)
grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;
;;
timeZones.txt)
tail -n +2 timeZones.txt > timeZones.txt.tmp;
;;
esac
echo "| $1 has been downloaded";
else
echo "| $i is already the latest version"
fi
done
# download the postalcodes. You must know yourself the url
cd $WORKPATH/$PCPATH
wget -q -N "http://download.geonames.org/export/zip/allCountries.zip"
if [ $WORKPATH/$PCPATH/allCountries.zip -nt $WORKPATH/$PCPATH/allCountries$PREFIX.zip ] || [ ! -e $WORKPATH/$PCPATH/allCountries.zip ]; then
echo "Attempt to unzip $WORKPATH/$PCPATH/allCountries.zip file..."
unzip -u -q $WORKPATH/$PCPATH/allCountries.zip
cp -p $WORKPATH/$PCPATH/allCountries.zip $WORKPATH/$PCPATH/allCountries$PREFIX.zip
echo "| ....zip has been downloaded"
else
echo "| ....zip is already the latest version"
fi
echo "+---- FILL DATABASE ( this takes 2 days on my machine "
psql -e -U $DBUSER -h $DBHOST -p $DBPORT geonames <<EOT
copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from '${WORKPATH}/${TMPPATH}/allCountries.txt' null as '';
copy postalcodes (countrycode,postalcode,placename,admin1name,admin1code,admin2name,admin2code,admin3name,admin3code,latitude,longitude,accuracy) from '${WORKPATH}/${PCPATH}/allCountries.txt' null as '';
copy timeZones (timeZoneId,GMT_offset,DST_offset) from '${WORKPATH}/${TMPPATH}/timeZones.txt.tmp' null as '';
copy featureCodes (code,name,description) from '${WORKPATH}/${TMPPATH}/featureCodes_en.txt' null as '';
copy admin1CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin1CodesASCII.txt' null as '';
copy admin1Codes (code,name) from '${WORKPATH}/${TMPPATH}/admin1Codes.txt' null as '';
copy admin2CodesAscii (code,name,nameAscii,geonameid) from '${WORKPATH}/${TMPPATH}/admin2Codes.txt' null as '';
copy iso_languagecodes (iso_639_3,iso_639_2,iso_639_1,language_name) from '${WORKPATH}/${TMPPATH}/iso-languagecodes.txt.tmp' null as '';
copy countryInfo (iso_alpha2,iso_alpha3,iso_numeric,fips_code,country,capital,areainsqkm,population,continent,tld,currency_code,currency_name,phone,postal,postalRegex,languages,geonameid,neighbours,equivalent_fips_code) from '${WORKPATH}/${TMPPATH}/countryInfo.txt.tmp' null as '';
copy alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric) from '${WORKPATH}/${TMPPATH}/alternateNames.txt' null as '';
INSERT INTO continentCodes VALUES ('AF', 'Africa', 6255146);
INSERT INTO continentCodes VALUES ('AS', 'Asia', 6255147);
INSERT INTO continentCodes VALUES ('EU', 'Europe', 625514;
INSERT INTO continentCodes VALUES ('NA', 'North America', 6255149);
INSERT INTO continentCodes VALUES ('OC', 'Oceania', 6255150);
INSERT INTO continentCodes VALUES ('SA', 'South America', 6255151);
INSERT INTO continentCodes VALUES ('AN', 'Antarctica', 6255152);
CREATE INDEX index_countryinfo_geonameid ON countryinfo USING hash (geonameid);
CREATE INDEX index_alternatename_geonameid ON alternatename USING hash (geonameid);
EOT
echo "'----- DONE ( have fun... )"
|
|
|
27/03/2011 01:00:53
|
snrt
Joined: 26/03/2011 15:24:41
Messages: 6
Offline
|
thanks for this best work
how and where to run this script ?
others information are welcome [b]
|
|
|
27/03/2011 03:47:51
|
snrt
Joined: 26/03/2011 15:24:41
Messages: 6
Offline
|
Hi every boby
can i run this script in windows and how?
help Plz
|
|
|
31/03/2011 16:33:37
|
benderamp
Joined: 31/03/2011 16:24:03
Messages: 3
Offline
|
2Evgenich
It seems that there is an error in your script:
> grep -v '^#' countryInfo.txt | tail -n +2 > countryInfo.txt.tmp;
should be changed to:
>grep -v '^#' countryInfo.txt | head -n -2 > countryInfo.txt.tmp;
tail -n +2 removes the 1st line from the file - this is ok for timeZones.txt and iso-languagecodes.txt cause their 1st line is table header, so it should be removed before importing to the database.
But for countryInfo.txt header is already removed by "grep -v '^#' countryInfo.txt" which skips all lines starting with "#" and table header also starts with "#", so "tail -n +2" would remove the 1st data line which should not be removed. And at the same time last 2 lines should be removed from countryInfo.txt (they have geonameId=0 - explained above) - this can be done with "head -n -2".
I would also recommend to join db creating and filling with data to single psql session (called after the data is loaded) - this would reduce the number of password prompts.
|
|
|
31/03/2011 22:57:19
|
benderamp
Joined: 31/03/2011 16:24:03
Messages: 3
Offline
|
and also info for reference:
- downloaded files take 1.7 Gb on disk
- created db size after importing downloaded data is about 2Gb
|
|
|
04/04/2011 14:21:17
|
benderamp
Joined: 31/03/2011 16:24:03
Messages: 3
Offline
|
and also had to increase alternatenames column size:
alternatenames varchar(12000),
"alternatenames varchar(6000)" seemed to work on one of my postgres instances, but by some reason showed "ERROR: value too long for type character varying(6000)" on another.
|
|
|
|
|
|