GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Importing *all* geonames tables to mysql  XML
Forum Index -> FAQ - frequently asked questions Go to Page: 1, 2, 3, 4 Next 
Author Message
dportabella



Joined: 11/01/2008 00:45:40
Messages: 3
Offline

Hello,

First, congratulations for providing this great gazetteer database!!

How to import the geonames.org data into mysql:

1. Download and prepare data
++++++++++++++++++++++++++++

mkdir /data
cd /data

wget http://download.geonames.org/export/dump/allCountries.zip
wget http://download.geonames.org/export/dump/alternateNames.zip
wget http://download.geonames.org/export/dump/admin1Codes.txt
wget http://download.geonames.org/export/dump/admin1CodesASCII.txt
wget http://download.geonames.org/export/dump/featureCodes_en.txt
wget http://download.geonames.org/export/dump/timeZones.txt
wget http://download.geonames.org/export/dump/countryInfo.txt

unzip allCountries.zip
unzip alternateNames.zip
cat countryInfo.txt | grep -v "^#" >countryInfo-n.txt



2. Create tables and import data
++++++++++++++++
mysql -u root <import.sql


DONE.
Remember to add the indexes you need...

TODO:
>geoname.cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
Move this information to a separate table.





Files:
+++++ continentCodes.txt START +++++++++++++++++++++++++++++++++++++
AF,Africa,6255146
AS,Asia,6255147
EU,Europe,6255148
NA,North America,6255149
OC,Oceania,6255151
SA,South America,6255150
AN,Antarctica,6255152
+++++ continentCodes.txt END +++++++++++++++++++++++++++++++++++++++


+++++ import.sql START +++++++++++++++++++++++++++++++++++++++
CREATE DATABASE geonames;
USE geonames;

CREATE TABLE geoname (
geonameid int PRIMARY KEY,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude decimal(10,7),
longitude decimal(10,7),
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population int,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
) CHARACTER SET utf8;


CREATE TABLE alternatename (
alternatenameId int PRIMARY KEY,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
) CHARACTER SET utf8;


CREATE TABLE countryinfo (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
name character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
tld char(3),
currency char(3),
currencyName char(20),
Phone char(10),
postalCodeFormat char(20),
postalCodeRegex char(20),
geonameId int
languages character varying(200),
neighbours char(20),
equivalentFipsCode char(10)
) CHARACTER SET utf8;


CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
) CHARACTER SET utf8;


CREATE TABLE admin1Codes (
code CHAR(6),
name TEXT
) CHARACTER SET utf8;


CREATE TABLE admin1CodesAscii (
code CHAR(6),
name TEXT,
nameAscii TEXT,
geonameid int
) CHARACTER SET utf8;


CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
) CHARACTER SET utf8;


CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset DECIMAL(3,1),
DST_offset DECIMAL(3,1)
) CHARACTER SET utf8;


CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
) CHARACTER SET utf8;


LOAD DATA INFILE '/data/allCountries.txt' INTO TABLE geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate);

LOAD DATA INFILE '/data/alternateNames.txt' INTO TABLE alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric);

LOAD DATA INFILE '/data/iso-languagecodes.txt' INTO TABLE iso_languagecodes (iso_639_3, iso_639_2, iso_639_1, language_name);

LOAD DATA INFILE '/data/admin1Codes.txt' INTO TABLE admin1Codes (code, name);

LOAD DATA INFILE '/data/admin1CodesAscii.txt' INTO TABLE admin1CodesAscii (code, name, nameAscii, geonameid);

LOAD DATA INFILE '/data/featureCodes_en.txt' INTO TABLE featureCodes (code, name, description);

LOAD DATA INFILE '/data/timeZones.txt' INTO TABLE timeZones IGNORE 1 LINES (timeZoneId, GMT_offset, DST_offset);

LOAD DATA INFILE '/data/countryInfo-n.txt' INTO TABLE countryInfo IGNORE 1 LINES (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId);

LOAD DATA INFILE '/data/continentCodes.txt' INTO TABLE continentCodes FIELDS TERMINATED BY ',' (code, name, geonameId);

+++++ import.sql END +++++++++++++++++++++++++++++++++++++++++


[Email]
marc



Joined: 08/12/2005 07:39:47
Messages: 4406
Offline

thanks for sharing this.

Marc

[WWW]
dportabella



Joined: 11/01/2008 00:45:40
Messages: 3
Offline

Thanks

You can also download the geonames database in MsAccess format:
http://david.portabella.name/tools/download.php?file=geonames.zip

Attention: 1.6 Gb.


DAvid
[Email]
noiv



Joined: 10/06/2006 16:36:57
Messages: 19
Location: ExploreOurPla.net
Offline

dportabella wrote:
Attention: 1.6 Gb 
Great Work! Any zip available?

noiv - ExploreOurPla.net
[WWW]
dportabella



Joined: 11/01/2008 00:45:40
Messages: 3
Offline

You are right

Here there is a zip version, 270 Mb instead of 1.6 Gb
http://david.portabella.name/files2/geonames.mdb.zip
[Email]
Shahzad



Joined: 04/03/2008 18:32:29
Messages: 9
Offline

when im importing data, its giving me the following error
ERROR 1366 : Incorrect integer value: '' for column 'elevation' at row 1

im using the following command
LOAD DATA INFILE 'worldlocations/db/allCountries.txt' REPLACE INTO TABLE geoname(geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate);

My geoname table is set to utf-8 character set
Shahzad



Joined: 04/03/2008 18:32:29
Messages: 9
Offline

found teh answer on the 3rd thread in this section

requires changing table type to MyISam
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28
Offline

It seems the ADM2 codes are not included here in the mysql imports:

http://download.geonames.org/export/dump/admin2Codes.txt

Also, there was an interesting mention here
http://forum.geonames.org/gforum/posts/list/80.page
that geonameId can be stored as medint in mysql instead of int to save space. Anyone has experience with this?
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28
Offline

Here is the structure for admin2codes

Code:
 CREATE TABLE `geonames_admin2codes` (
   `code` varchar(20) default NULL,
   `name_local` text,
   `name` text NOT NULL,
   `geonameid` int(11) NOT NULL
 ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
 
yascha



Joined: 04/06/2008 18:36:58
Messages: 3
Offline

Hi everyone.

First I'd like to thank dportabella for creating and sharing this script with everyone.

I've run an adapted version of this script for my purposes and I've come across an interesting error that I feel obliged to point out.

Setting latitude and longitude to type decimal(9,7) means that the database value will have 9 digit precision; 7 of those digits being after the decimal point (a 7 digit mantissa).

Upon loading my data into an SQL Server, all of my longitudes which were > 99.99 and < -99.99 were rounded off to 99.9999 and -99.9999 respectively.

An easy fix is to set the latitude and longitude to type decimal(10,7).

With this fix in place, the altered create statement for the geoname table will look as follows:

CREATE TABLE geoname (
geonameid int PRIMARY KEY,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude decimal(10,7),
longitude decimal(10,7),

fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population int,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
) CHARACTER SET utf8;

Cheers,
Yascha
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28
Offline

I noticed the iso_languagecodes language_name column has some inconsistencies as some entries are enclosed have double quotes around them.

I removed these and here is a corrected mysql dump of the table
 Description [Disk] Download
 Filesize 226 Kbytes
 Downloaded:  38651 time(s)

mhuggins



Joined: 20/06/2008 00:12:34
Messages: 18
Location: Downingtown, PA
Offline

Am I the only one having problems with encoding in following the steps in this thread?

Just a simple example, but check out the adminName1 for this geoname entry (adminCode 'CL.06'). It should say "Bío-Bío".

Now I do a search in my own database for adminCode1 = 'CL.06', and I get "Bío-Bío". This happens both in the SQL SELECT statement via SSH, as well as when being transferred via HTTP. (The latter was tested to ensure it wasn't a display limitation of Putty SSH client.)

I'm not sure if it's a problem with wget, the table creation (is utf8 the correct character set?), the insertion of the data itself, or the selection/retrieval of the data. When I first download data with wget, it doesn't appear correctly in my SSH client (similar to the above issue, I see í characters). However, if I FTP the file to my windows box and open the file, I see the correct characters -- albeit I receive a popup warning that characters are being converted to a displayable format when that happens.

Anyway, I obviously want to have the data stored in my database correctly, and I don't seem to have that right now. Does anyone have a clue where my problem could lie? Is there a parameter I need to pass with "wget" or "unzip"? Is there a different character set I should use in my CREATE TABLE formats? Something else I might not know about?

Thanks!

Edited to add: I also tried including SQL statement "SET NAMES 'utf8'" as per MySQL docs prior to calling the above SELECT statement, but that still resulted in the following output: "B�­o-B�­o".

And lastly, I tried forcing the content-type when requesting the file via wget, which also didn't work:
wget --header="Content-Type:text/plain; charset=utf-8" http://download.geonames.org/export/dump/admin1Codes.txt 
[WWW]
mhuggins



Joined: 20/06/2008 00:12:34
Messages: 18
Location: Downingtown, PA
Offline

Well, I think I'm on my way to figuring out some of my problem posted above. Here are the steps I've taken so far to try to resolve it. (Note that this is on Ubuntu, so it may differ for other distros.)

1. Check if you have the en_US.UTF-8 locale.
> locale -a 

2. If you don't see it, then you need to install it.
> sudo apt-get install language-pack-en 

3. Verify that the install worked, change the system locale, and ensure that the change was accepted. (Note: I had to log out of my box and log back in to see the locale change, you may have to do the same.)
> locale -a
> sudo update-locale LANG=en_US.utf-8
> locale 

4. Update your Apache settings to specify UTF-8 response type. Add the line "AddDefaultCharset utf-8" in your httpd.conf or specific site configuration. (Note: There may be a better way of doing this so that only specific files sent by Apache are sent in UTF-8 encoding; however, this worked for what I needed.)
> sudo nano /etc/apache2/sites-available/example.com 

So as of now, when I use wget to save the raw data dumps onto my system, Apache is correctly sending them as UTF-8 encoded if I request the files on my web server directly.

However, I'm still not able to retrieve (or maybe insert?) the data from MySQL correctly. The data is either being stored without being properly encoded, or it's being retrieved without being properly encoded...I'm not sure which. If anyone has any insight here, I'd greatly appreciate it!
[WWW]
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28
Offline

Is the default locale utf8 in your mysql cnf? Meaning all your tables and cols are utf8 right?

Is your html page utf8 encoded?

Dont load the dumps like this

mysql -u whatever table < mytable.sql

I have done that and it messes utf8 up as the > in linux uses ascii

use load data or source mytable.sql

good luck!

In the end, everything everywhere has to be utf8 for it to work
John Small



Joined: 24/04/2008 10:48:17
Messages: 12
Location: United Kingdom
Offline

I had to change "load data infile" to "load data local infile" as described in the MySQL docs. Using the first form the server loads the data so it sees everything using the account that the MySQL server process starts in. Using the second form the data is loaded via the client process so it uses the account and directory you're in.

Since this is a feature of MySQL rather than any version of nix you're in I'm puzzled why everything worked OK for the other respondents.


John Small
[WWW]
Phliplip



Joined: 31/07/2008 14:35:29
Messages: 2
Offline

Do someone have a method of updating one's local geonames database?

Or are you just supposed to download new copy, truncate existing db and load new values?
Yves



Joined: 13/08/2008 11:45:01
Messages: 4
Offline

There is a problem with the tables:

CREATE TABLE admin1Codes (
code CHAR(5),
name TEXT
) CHARACTER SET utf8;


CREATE TABLE admin1CodesAscii (
code CHAR(5),
name TEXT,
nameAscii TEXT,
geonameid int
) CHARACTER SET utf8;


char(5) for code is too short as for example BE.WAL, BE.BRU, BE.VLG will be truncated. Be aware of this.

Currently building a vacation rentals website http://www.chaletino.com/
SEO



Joined: 14/10/2008 18:58:20
Messages: 18
Offline

The country import code is outdated, this is the latest code:

LOAD DATA INFILE '/data/countryInfo-n.txt' INTO TABLE countryinfo IGNORE 1 LINES (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,tld,currencycode,currencyname,phone,postal_code_format,postal_code_regex,languages,geonameid,neighbours,equivalentfipscode);
gumush



Joined: 15/10/2008 16:07:26
Messages: 1
Offline

Is there any link for download mysqldatabase ready sql file compressed ?
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28
Offline

The main geonames and alternate tables are huge once I load them.

geonames is almost 700 MB s and alternate names is sg like 80 MBs

Given I am only doing SELECTs on the tables, I compressed them with mysql tools to half their original size. Now geonames is 300 MB and alternatenames is 30

Here is the artcile how
http://articles.techrepublic.com.com/5100-10878_11-5852557.html
 
Forum Index -> FAQ - frequently asked questions Go to Page: 1, 2, 3, 4 Next 
Go to:   
Powered by JForum 2.1.5 © JForum Team