GeoNames Forum
Can't import anything after 518 000 records in MySQL 8
flusteredCoder
Joined: 21/08/2018 21:01:04
Messages: 3

For the longest time, I've been trying to import the allcountries.txt file into my database. The thing is, after 518 000 records, the database hangs.

I have managed to insert records 518 001 - 518 020, but doing it that way would be extremely tedious considering there are over 11 million lines (records). NOTE that I am using import wizards to do this (MySQL Workbench and Navicat Premium).

I have created the tables according to the F.A.Q., tried widening the varchars by adding more 0's (got an error from that too), split the file into a hundred-plus files of 50 000 lines each, and tried importing just the files where the expected errors are - and received the errors. I also tried skipping the lines with the errors, but there were many more errors after 518 000 lines. I know I'm saying 'errors' a lot; maybe 'error' is not exactly what is happening, but it hangs regardless. I also downloaded just CA.txt (Canada), tried importing that on its own, and got errors too after 2 800 lines or so.

I did manage to get the postal codes imported without errors the same way I tried allcountries.txt.

So here I am asking for much-needed help. I have recently installed MySQL 8. I downloaded the new dump (2018-08-21), and it now stalls at 518 600 records. This is for a personal learning project, so any help is much appreciated. Thank you.

gisconfusion
Joined: 21/08/2018 22:48:28
Messages: 3

As I just went through this process using MySQL 5.7, I'll tell you what I did and maybe that can help you too.

First issue: uploads through PHP will time out, so you can't do it that way. I tried phpMyAdmin at first and it just won't work.

Upload the entire allCountries file to your server, log in to the server over SSH with a program like PuTTY (what I use), and then log in to your MySQL instance - there are lots of online resources that will tell you how to do this for your specific server, but on Unix-like systems it's like this:

mysql -h hostname -u username -pPassword database_name

If you are running MySQL on your own machine, hostname will be localhost.

Once you do this, you need to use LOAD DATA LOCAL INFILE to get the file into your database table.
Here is a sample command:

LOAD DATA LOCAL INFILE '/usr/home/username/directory/allCountries.txt'
INTO TABLE allCountries
CHARACTER SET UTF8
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
LINES TERMINATED BY '\n'

This is the exact command that I used to import it. The \t means the fields are separated by tabs, and \n means each row is on a new line.
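One caveat for MySQL 8 (I was on 5.7, so I haven't hit this myself, but it's worth checking): newer versions ship with LOCAL loading switched off on both the server and the client, and the statement above may be rejected until it's enabled. On the server:

SET GLOBAL local_infile = 1;

and start the client with the local-infile option turned on:

mysql --local-infile=1 -h hostname -u username -p database_name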

Also, make sure that your database table has the correct columns for the data. I just mirrored the schema of the allCountries file to reduce possible issues.
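For reference, a table along those lines looks roughly like this (a sketch based on the column list in the GeoNames readme - double-check it against the current readme rather than copying it blindly):

CREATE TABLE allCountries (
  geonameid         INT PRIMARY KEY,
  name              VARCHAR(200),
  asciiname         VARCHAR(200),
  alternatenames    VARCHAR(10000),
  latitude          DECIMAL(10,7),
  longitude         DECIMAL(10,7),
  feature_class     CHAR(1),
  feature_code      VARCHAR(10),
  country_code      CHAR(2),
  cc2               VARCHAR(200),
  admin1_code       VARCHAR(20),
  admin2_code       VARCHAR(80),
  admin3_code       VARCHAR(20),
  admin4_code       VARCHAR(20),
  population        BIGINT,
  elevation         INT,
  dem               INT,
  timezone          VARCHAR(40),
  modification_date DATE
);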

Oh, and if you have a problem with the elevation field defaulting to zero when there is no value for it, you will need to add this to the end of the above LOAD DATA command:


(geonameid, name, asciiname, alternatenames, latitude, longitude, feature_class, feature_code, country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, @elevation, dem, timezone, modification_date)
SET elevation = IF(@elevation = '', NULL, @elevation)


Of course, the elevation column in your table must accept NULL values. All fields in the table are listed in this command, so make sure the spelling matches your table.
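Putting the two pieces together, the whole statement reads:

LOAD DATA LOCAL INFILE '/usr/home/username/directory/allCountries.txt'
INTO TABLE allCountries
CHARACTER SET UTF8
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
LINES TERMINATED BY '\n'
(geonameid, name, asciiname, alternatenames, latitude, longitude, feature_class, feature_code, country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, @elevation, dem, timezone, modification_date)
SET elevation = IF(@elevation = '', NULL, @elevation);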

I hope this can help you! This is everything I did to import the data, and it ended up at about 2.3 gigs in the database.

flusteredCoder
Joined: 21/08/2018 21:01:04
Messages: 3

Thanks for the reply. I have done that at the MySQL prompt, and also attempted it through the import wizards in MySQL Workbench and Navicat Premium.

I'm also using utf8mb4 instead of utf8, with collation utf8mb4_unicode_520_ci - as advised by many sources on Stack Overflow (it allows additional characters that utf8 doesn't have).
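For anyone following along, the conversion looks something like this (the table name here is just a placeholder for whatever you created):

ALTER TABLE allCountries CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;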

Also, I got an error stating that an alternatenames record (I think it was around line 2700) was too long for VARCHAR(10000). So I omitted alternatenames, and then tried importing only geonameid, geoname, asciiname, longitude, and latitude. That worked until record 518 600, where it hung. I tried again from record 518 601 to 600 000 - thinking maybe it was a RAM issue - but it added only 600 records before hanging again.

Here is my EXPLAINed table:

+-------------------+----------------+------+-----+---------+-------+
| Field             | Type           | Null | Key | Default | Extra |
+-------------------+----------------+------+-----+---------+-------+
| geonameid         | int(255)       | YES  |     | NULL    |       |
| geoname           | varchar(255)   | YES  |     | NULL    |       |
| asciiname         | varchar(255)   | YES  |     | NULL    |       |
| alternatenames    | varchar(10000) | YES  |     | NULL    |       |
| latitude          | decimal(10,7)  | YES  |     | NULL    |       |
| longitude         | decimal(10,7)  | YES  |     | NULL    |       |
| feature_class     | char(1)        | YES  |     | NULL    |       |
| feature_code      | varchar(10)    | YES  |     | NULL    |       |
| country_code      | varchar(2)     | YES  |     | NULL    |       |
| cc2               | varchar(200)   | YES  |     | NULL    |       |
| admin1            | varchar(20)    | YES  |     | NULL    |       |
| admin2            | varchar(80)    | YES  |     | NULL    |       |
| admin3            | varchar(20)    | YES  |     | NULL    |       |
| admin4            | varchar(20)    | YES  |     | NULL    |       |
| population        | bigint(20)     | YES  |     | NULL    |       |
| elevation         | int(11)        | YES  |     | NULL    |       |
| gtopo30           | bigint(20)     | YES  |     | NULL    |       |
| timezone          | varchar(40)    | YES  |     | NULL    |       |
| modification_date | date           | YES  |     | NULL    |       |
+-------------------+----------------+------+-----+---------+-------+


In the meantime, I'll see if I can convert this huge file into a JSON-formatted file.
gisconfusion
Joined: 21/08/2018 22:48:28
Messages: 3

I'm no expert, but you shouldn't limit your fields so much. For testing purposes, change all fields to TEXT or MEDIUMTEXT and see if the import succeeds. If it does, then trim the fields down or change them back to INT, etc.

OH, almost missed this: geonameid MUST be BIGINT or it can fail! That might actually be the issue right there.
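In concrete terms, something like this (the table name is a placeholder - use whatever yours is called):

ALTER TABLE allCountries
  MODIFY geonameid BIGINT,
  MODIFY alternatenames MEDIUMTEXT,
  MODIFY cc2 TEXT,
  MODIFY admin1 TEXT,
  MODIFY admin2 TEXT;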
flusteredCoder
Joined: 21/08/2018 21:01:04
Messages: 3

So I have some success!

I went back to setting my table and collation to regular utf8, after making sure that allcountries.txt was UTF-8 with the terminal command:
Code:
file -I allcountries.txt


Then, still in the terminal, I started creating files with fewer columns. In the end I settled on [id, asciiname, latitude, longitude] using:

Code:
awk 'BEGIN{FS="\t"} $7 == "A" {print $1 "\t" $3 "\t" $5 "\t" $6}' allcountries.txt > newwithnames.txt


* I actually first tried to make a new file with only the ids - and it worked! All 11-million-plus records. Then I kept adding more columns, and the imports conked out on me again, until I eventually ended up with the column set in the previous paragraph.

Now I have a 16 MB file with 406 234 records: all countries, regions, and cities. I'll work with that for now, but I'll see later if I can improve on it.
Yalmar
Joined: 25/03/2021 21:06:39
Messages: 1

This is quite an old post, but I thought I'd chip in with what I did.

I analysed one column at a time from the original allCountries.txt to find out each column's maximum length.
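For anyone who wants to repeat that measurement, a single awk pass over the file will produce the numbers (a sketch, just one way to do it):

awk -F'\t' '{ for (i = 1; i <= NF; i++) if (length($i) > max[i]) max[i] = length($i) }
END { for (i = 1; i <= NF; i++) print i, max[i] }' allCountries.txt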

I first imported into a table where all columns are varchar, with the exception of the 1st column, geonameid, which is an int.

Here's a quick overview of the lengths I found (GeoNames column : type : my column name):
01 geonameid         : INT           : id
02 name              : VARCHAR(151)  : name
03 asciiname         : VARCHAR(151)  : ascii_name
04 alternatenames    : VARCHAR(9772) : alt_name
05 latitude          : VARCHAR(9)    : latitude
06 longitude         : VARCHAR(10)   : longitude
07 feature class     : VARCHAR(1)    : feature_class
08 feature code      : VARCHAR(5)    : feature_cd
09 country code      : VARCHAR(2)    : country_cd
10 cc2               : VARCHAR(170)  : alt_country_cd
11 admin1 code       : VARCHAR(20)   : admin1_cd
12 admin2 code       : VARCHAR(63)   : admin2_cd
13 admin3 code       : VARCHAR(10)   : admin3_cd
14 admin4 code       : VARCHAR(20)   : admin4_cd
15 population        : VARCHAR(10)   : population
16 elevation         : VARCHAR(6)    : elevation
17 dem               : VARCHAR(5)    : dem
18 timezone          : VARCHAR(30)   : timezone
19 modification date : VARCHAR(10)   : mod_dt_var

I then created a second table where some of the columns are INT and DECIMAL, and inserted into the second table via a SELECT from the first table with the necessary datatype conversions.
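The conversion step then looks roughly like this (a sketch - the typed table's name, the varchar table's name, and the exact target types here are illustrative):

INSERT INTO geoname_typed
  (id, name, ascii_name, latitude, longitude, population, elevation, mod_dt)
SELECT
  id,
  name,
  ascii_name,
  CAST(latitude AS DECIMAL(10,7)),
  CAST(longitude AS DECIMAL(10,7)),
  CAST(population AS UNSIGNED),
  IF(elevation = '', NULL, CAST(elevation AS SIGNED)),
  STR_TO_DATE(mod_dt_var, '%Y-%m-%d')
FROM geoname_varchar;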

 