Author |
Message |
11/01/2008 00:50:34
|
dportabella
Joined: 11/01/2008 00:45:40
Messages: 3
Offline
|
Hello,
First, congratulations for providing this great gazetteer database!!
How to import the geonames.org data into mysql:
1. Download and prepare data
++++++++++++++++++++++++++++
mkdir /data
cd /data
wget http://download.geonames.org/export/dump/allCountries.zip
wget http://download.geonames.org/export/dump/alternateNames.zip
wget http://download.geonames.org/export/dump/admin1Codes.txt
wget http://download.geonames.org/export/dump/admin1CodesASCII.txt
wget http://download.geonames.org/export/dump/featureCodes_en.txt
wget http://download.geonames.org/export/dump/timeZones.txt
wget http://download.geonames.org/export/dump/countryInfo.txt
unzip allCountries.zip
unzip alternateNames.zip
cat countryInfo.txt | grep -v "^#" >countryInfo-n.txt
2. Create tables and import data
++++++++++++++++
mysql -u root <import.sql
DONE.
Remember to add the indexes you need...
TODO:
>geoname.cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
Move this information to a separate table.
Files:
+++++ continentCodes.txt START +++++++++++++++++++++++++++++++++++++
AF,Africa,6255146
AS,Asia,6255147
EU,Europe,6255148
NA,North America,6255149
OC,Oceania,6255151
SA,South America,6255150
AN,Antarctica,6255152
+++++ continentCodes.txt END +++++++++++++++++++++++++++++++++++++++
+++++ import.sql START +++++++++++++++++++++++++++++++++++++++
CREATE DATABASE geonames;
USE geonames;
CREATE TABLE geoname (
geonameid int PRIMARY KEY,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude decimal(10,7),
longitude decimal(10,7),
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population int,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
) CHARACTER SET utf8;
CREATE TABLE alternatename (
alternatenameId int PRIMARY KEY,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
) CHARACTER SET utf8;
CREATE TABLE countryinfo (
iso_alpha2 char(2),
iso_alpha3 char(3),
iso_numeric integer,
fips_code character varying(3),
name character varying(200),
capital character varying(200),
areainsqkm double precision,
population integer,
continent char(2),
tld char(3),
currency char(3),
currencyName char(20),
Phone char(10),
postalCodeFormat char(20),
postalCodeRegex char(20),
geonameId int
languages character varying(200),
neighbours char(20),
equivalentFipsCode char(10)
) CHARACTER SET utf8;
CREATE TABLE iso_languagecodes(
iso_639_3 CHAR(4),
iso_639_2 VARCHAR(50),
iso_639_1 VARCHAR(50),
language_name VARCHAR(200)
) CHARACTER SET utf8;
CREATE TABLE admin1Codes (
code CHAR(6),
name TEXT
) CHARACTER SET utf8;
CREATE TABLE admin1CodesAscii (
code CHAR(6),
name TEXT,
nameAscii TEXT,
geonameid int
) CHARACTER SET utf8;
CREATE TABLE featureCodes (
code CHAR(7),
name VARCHAR(200),
description TEXT
) CHARACTER SET utf8;
CREATE TABLE timeZones (
timeZoneId VARCHAR(200),
GMT_offset DECIMAL(3,1),
DST_offset DECIMAL(3,1)
) CHARACTER SET utf8;
CREATE TABLE continentCodes (
code CHAR(2),
name VARCHAR(20),
geonameid INT
) CHARACTER SET utf8;
LOAD DATA INFILE '/data/allCountries.txt' INTO TABLE geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate);
LOAD DATA INFILE '/data/alternateNames.txt' INTO TABLE alternatename (alternatenameid,geonameid,isoLanguage,alternateName,isPreferredName,isShortName,isColloquial,isHistoric);
LOAD DATA INFILE '/data/iso-languagecodes.txt' INTO TABLE iso_languagecodes (iso_639_3, iso_639_2, iso_639_1, language_name);
LOAD DATA INFILE '/data/admin1Codes.txt' INTO TABLE admin1Codes (code, name);
LOAD DATA INFILE '/data/admin1CodesAscii.txt' INTO TABLE admin1CodesAscii (code, name, nameAscii, geonameid);
LOAD DATA INFILE '/data/featureCodes_en.txt' INTO TABLE featureCodes (code, name, description);
LOAD DATA INFILE '/data/timeZones.txt' INTO TABLE timeZones IGNORE 1 LINES (timeZoneId, GMT_offset, DST_offset);
LOAD DATA INFILE '/data/countryInfo-n.txt' INTO TABLE countryInfo IGNORE 1 LINES (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,languages,currency,geonameId);
LOAD DATA INFILE '/data/continentCodes.txt' INTO TABLE continentCodes FIELDS TERMINATED BY ',' (code, name, geonameId);
+++++ import.sql END +++++++++++++++++++++++++++++++++++++++++
|
|
|
11/01/2008 19:27:04
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4441
Offline
|
thanks for sharing this.
Marc
|
|
|
|
11/01/2008 23:02:20
|
dportabella
Joined: 11/01/2008 00:45:40
Messages: 3
Offline
|
Thanks
You can also download the geonames database in MsAccess format:
http://david.portabella.name/tools/download.php?file=geonames.zip
Attention: 1.6 Gb.
DAvid
|
|
|
12/01/2008 17:25:57
|
noiv
Joined: 10/06/2006 16:36:57
Messages: 19
Location: ExploreOurPla.net
Offline
|
dportabella wrote:
Attention: 1.6 Gb
Great Work! Any zip available?
|
noiv - ExploreOurPla.net |
|
|
12/01/2008 23:05:22
|
dportabella
Joined: 11/01/2008 00:45:40
Messages: 3
Offline
|
You are right
Here there is a zip version, 270 Mb instead of 1.6 Gb
http://david.portabella.name/files2/geonames.mdb.zip
|
|
|
14/03/2008 03:49:14
|
Shahzad
Joined: 04/03/2008 18:32:29
Messages: 9
Offline
|
when im importing data, its giving me the following error
ERROR 1366 : Incorrect integer value: '' for column 'elevation' at row 1
im using the following command
LOAD DATA INFILE 'worldlocations/db/allCountries.txt' REPLACE INTO TABLE geoname(geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate);
My geoname table is set to utf-8 character set
|
|
|
17/03/2008 03:28:28
|
Shahzad
Joined: 04/03/2008 18:32:29
Messages: 9
Offline
|
found teh answer on the 3rd thread in this section
requires changing table type to MyISam
|
|
|
09/05/2008 16:12:47
|
giorgio79
Joined: 21/04/2008 17:05:33
Messages: 28
Offline
|
It seems the ADM2 codes are not included here in the mysql imports:
http://download.geonames.org/export/dump/admin2Codes.txt
Also, there was an interesting mention here
http://forum.geonames.org/gforum/posts/list/80.page
that geonameId can be stored as medint in mysql instead of int to save space. Anyone has experience with this?
|
|
|
21/05/2008 08:01:11
|
giorgio79
Joined: 21/04/2008 17:05:33
Messages: 28
Offline
|
Here is the structure for admin2codes
Code:
CREATE TABLE `geonames_admin2codes` (
`code` varchar(20) default NULL,
`name_local` text,
`name` text NOT NULL,
`geonameid` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
|
|
|
05/06/2008 00:19:48
|
yascha
Joined: 04/06/2008 18:36:58
Messages: 3
Offline
|
Hi everyone.
First I'd like to thank dportabella for creating and sharing this script with everyone.
I've run an adapted version of this script for my purposes and I've come across an interesting error that I feel obliged to point out.
Setting latitude and longitude to type decimal(9,7) means that the database value will have 9 digit precision; 7 of those digits being after the decimal point (a 7 digit mantissa).
Upon loading my data into an SQL Server, all of my longitudes which were > 99.99 and < -99.99 were rounded off to 99.9999 and -99.9999 respectively.
An easy fix is to set the latitude and longitude to type decimal(10,7).
With this fix in place, the altered create statement for the geoname table will look as follows:
CREATE TABLE geoname (
geonameid int PRIMARY KEY,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(4000),
latitude decimal(10,7),
longitude decimal(10,7),
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(60),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population int,
elevation int,
gtopo30 int,
timezone varchar(40),
moddate date
) CHARACTER SET utf8;
Cheers,
Yascha
|
|
|
05/07/2008 04:58:23
|
giorgio79
Joined: 21/04/2008 17:05:33
Messages: 28
Offline
|
I noticed the iso_languagecodes language_name column has some inconsistencies as some entries are enclosed have double quotes around them.
I removed these and here is a corrected mysql dump of the table
Description |
|
Download
|
Filesize |
226 Kbytes
|
Downloaded: |
39255 time(s) |
|
|
|
05/07/2008 20:57:58
|
mhuggins
Joined: 20/06/2008 00:12:34
Messages: 18
Location: Downingtown, PA
Offline
|
Am I the only one having problems with encoding in following the steps in this thread?
Just a simple example, but check out the adminName1 for this geoname entry (adminCode 'CL.06'). It should say "Bío-Bío".
Now I do a search in my own database for adminCode1 = 'CL.06', and I get "BÃo-BÃo". This happens both in the SQL SELECT statement via SSH, as well as when being transferred via HTTP. (The latter was tested to ensure it wasn't a display limitation of Putty SSH client.)
I'm not sure if it's a problem with wget, the table creation (is utf8 the correct character set?), the insertion of the data itself, or the selection/retrieval of the data. When I first download data with wget, it doesn't appear correctly in my SSH client (similar to the above issue, I see à characters). However, if I FTP the file to my windows box and open the file, I see the correct characters -- albeit I receive a popup warning that characters are being converted to a displayable format when that happens.
Anyway, I obviously want to have the data stored in my database correctly, and I don't seem to have that right now. Does anyone have a clue where my problem could lie? Is there a parameter I need to pass with "wget" or "unzip"? Is there a different character set I should use in my CREATE TABLE formats? Something else I might not know about?
Thanks!
Edited to add: I also tried including SQL statement "SET NAMES 'utf8'" as per MySQL docs prior to calling the above SELECT statement, but that still resulted in the following output: "BÃ�Âo-BÃ�Âo".
And lastly, I tried forcing the content-type when requesting the file via wget, which also didn't work:
wget --header="Content-Type:text/plain; charset=utf-8" http://download.geonames.org/export/dump/admin1Codes.txt
|
|
|
05/07/2008 23:29:44
|
mhuggins
Joined: 20/06/2008 00:12:34
Messages: 18
Location: Downingtown, PA
Offline
|
Well, I think I'm on my way to figuring out some of my problem posted above. Here are the steps I've taken so far to try to resolve it. (Note that this is on Ubuntu, so it may differ for other distros.)
1. Check if you have the en_US.UTF-8 locale.
> locale -a
2. If you don't see it, then you need to install it.
> sudo apt-get install language-pack-en
3. Verify that the install worked, change the system locale, and ensure that the change was accepted. (Note: I had to log out of my box and log back in to see the locale change, you may have to do the same.)
> locale -a
> sudo update-locale LANG=en_US.utf-8
> locale
4. Update your Apache settings to specify UTF-8 response type. Add the line "AddDefaultCharset utf-8" in your httpd.conf or specific site configuration. (Note: There may be a better way of doing this so that only specific files sent by Apache are sent in UTF-8 encoding; however, this worked for what I needed.)
> sudo nano /etc/apache2/sites-available/example.com
So as of now, when I use wget to save the raw data dumps onto my system, Apache is correctly sending them as UTF-8 encoded if I request the files on my web server directly.
However, I'm still not able to retrieve (or maybe insert?) the data from MySQL correctly. The data is either being stored without being properly encoded, or it's being retrieved without being properly encoded...I'm not sure which. If anyone has any insight here, I'd greatly appreciate it!
|
|
|
07/07/2008 14:48:12
|
giorgio79
Joined: 21/04/2008 17:05:33
Messages: 28
Offline
|
Is the default locale utf8 in your mysql cnf? Meaning all your tables and cols are utf8 right?
Is your html page utf8 encoded?
Dont load the dumps like this
mysql -u whatever table < mytable.sql
I have done that and it messes utf8 up as the > in linux uses ascii
use load data or source mytable.sql
good luck!
In the end, everything everywhere has to be utf8 for it to work
|
|
|
23/07/2008 11:56:16
|
John Small
Joined: 24/04/2008 10:48:17
Messages: 12
Location: United Kingdom
Offline
|
I had to change "load data infile" to "load data local infile" as described in the MySQL docs. Using the first form the server loads the data so it sees everything using the account that the MySQL server process starts in. Using the second form the data is loaded via the client process so it uses the account and directory you're in.
Since this is a feature of MySQL rather than any version of nix you're in I'm puzzled why everything worked OK for the other respondents.
|
John Small |
|
|
04/08/2008 09:24:39
|
Phliplip
Joined: 31/07/2008 14:35:29
Messages: 2
Offline
|
Do someone have a method of updating one's local geonames database?
Or are you just supposed to download new copy, truncate existing db and load new values?
|
|
|
13/08/2008 11:57:37
|
Yves
Joined: 13/08/2008 11:45:01
Messages: 4
Offline
|
There is a problem with the tables:
CREATE TABLE admin1Codes (
code CHAR(5),
name TEXT
) CHARACTER SET utf8;
CREATE TABLE admin1CodesAscii (
code CHAR(5),
name TEXT,
nameAscii TEXT,
geonameid int
) CHARACTER SET utf8;
char(5) for code is too short as for example BE.WAL, BE.BRU, BE.VLG will be truncated. Be aware of this.
|
Currently building a vacation rentals website http://www.chaletino.com/ |
|
|
14/10/2008 21:23:44
|
SEO
Joined: 14/10/2008 18:58:20
Messages: 18
Offline
|
The country import code is outdated, this is the latest code:
LOAD DATA INFILE '/data/countryInfo-n.txt' INTO TABLE countryinfo IGNORE 1 LINES (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areaInSqKm,population,continent,tld,currencycode,currencyname,phone,postal_code_format,postal_code_regex,languages,geonameid,neighbours,equivalentfipscode);
|
|
|
15/10/2008 16:08:16
|
gumush
Joined: 15/10/2008 16:07:26
Messages: 1
Offline
|
Is there any link for download mysqldatabase ready sql file compressed ?
|
|
|
26/10/2008 06:48:37
|
giorgio79
Joined: 21/04/2008 17:05:33
Messages: 28
Offline
|
The main geonames and alternate tables are huge once I load them.
geonames is almost 700 MB s and alternate names is sg like 80 MBs
Given I am only doing SELECTs on the tables, I compressed them with mysql tools to half their original size. Now geonames is 300 MB and alternatenames is 30
Here is the artcile how
http://articles.techrepublic.com.com/5100-10878_11-5852557.html
|
|
|
|
|
|