GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Geoname Sql loader for Oracle  XML
Forum Index -> FAQ - frequently asked questions
Author Message
Sam



Joined: 05/03/2008 06:58:28
Messages: 28
Offline

I want to load geoname data dump (allCountries.txt) record into oracle table.Can somebody help me doing the same?
isa



Joined: 31/01/2007 12:37:49
Messages: 77
Offline

Hi, Sam:
Have you created already the tabla? Have you done some test?

File allcountries.txt is a huge file. It has more than 6.000.000 rows.

While you do the tests, it's better use a smaller file: the files separated by countries, have the same structure than allCountries.txt and much less rows.
You must use one of them. When the process works fine, you can import the full file.

Tell me if you has done some test, and I will try to help you.

Regards,
Isa
Sam



Joined: 05/03/2008 06:58:28
Messages: 28
Offline


SQLLDR log=test.log bad = bad.bad data = GB.txt
Control=contorl1.ctl
user:
Password:

my control file is
LOAD DATA
CHARACTERSET UTF8
INTO TABLE geoname_tmp
FIELDS TERMINATED BY X'09'
TRAILING NULLCOLS
(geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate)


my oracle table structure is
Name Null? Type
----------------------------------------- -------- -------------

GEONAMEID NOT NULL NUMBER(3
NAME VARCHAR2(200)
ASCIINAME VARCHAR2(200)
ALTERNATENAMES VARCHAR2(4000
LATITUDE NUMBER(9,7)
LONGITUDE NUMBER(9,7)
FCLASS CHAR(5)
FCODE VARCHAR2(10)
COUNTRY VARCHAR2(2)
CC2 VARCHAR2(60)
ADMIN1 VARCHAR2(10)
ADMIN2 VARCHAR2(10)
ADMIN3 VARCHAR2(10)
ADMIN4 VARCHAR2(10)
POPULATION VARCHAR2(20)
ELEVATION VARCHAR2(20)
GTOPO30 VARCHAR2(20)
TIMEZONE VARCHAR2(40)
MODDATE VARCHAR2(20)

When i try to insert all the values to table.
It inserts few and gives error for Alternate name field
"Field in data file exceeds maximum length"

this error is because of few special character in alternate name fiels
e.g Sariqöl in this example "O" is a special character and is not in english

what is the solution for this?
i m unable to get all the data inserted into my geoname.
isa



Joined: 31/01/2007 12:37:49
Messages: 77
Offline

Hi:

Can you try this?:
In the ctl file, add "char(4000)" to the alternatenames column:

LOAD DATA
CHARACTERSET UTF8
INTO TABLE geoname_tmp
FIELDS TERMINATED BY X'09'
TRAILING NULLCOLS
(geonameid,name,asciiname,alternatenames char(4000),
latitude,longitude,fclass,fcode,country,cc2, admin1,admin2,admin3,admin4,
population,elevation,gtopo30,timezone,moddate)

What is your Character Set?

Regards
Sam



Joined: 05/03/2008 06:58:28
Messages: 28
Offline

i have tried this.
the above one also does not work.
actually the problem here is there are few characters which editors does not understand
e.g
Bab-e Hormoz,Bab-e-Hormuz,Bab-i-Hormuz,Bughaz Hormuz,Bāb-e Hormoz,Bāb-e-Hormūz,Bāb-i-Hormuz,Būghāz Hormuz,Madiq Hurmuz,Maḑīq Hurmūz,Strait of Hormoz,Strait of Ormuz,Tangeh Hormoz,Tangeh-e-Hormoz,Tangeh-ye Hormoz,Tangeh-ye Ḩormoz,Tangi Hormuz,Tangī Hormuz,بابِ هُرمُز,تَنگِه هُرمُز,تَنگِۀ حُرمُز,تَنگِۀ هُرمُ 1 ۀ 溝 3
In the above example it does not insert the complete data because all the text editors can not understand what is (* 1)character is also (*3) character is so it skips the complete record saying that it exceeds maximum length.
isa



Joined: 31/01/2007 12:37:49
Messages: 77
Offline

Hi.
Can you do another quick test? For discard another things.

Put the column alternatenames with type LONG, not with varchar2.

Regards,
Isa
Sam



Joined: 05/03/2008 06:58:28
Messages: 28
Offline

I have tried with everything from char ,varchar ,long,nchar till blob but it does not work actaully the oracle is unable to uncerstand some language which internet browsres understand and henece it gives error saying that specific field exceeds the maximum length .

is there any way by which i can make the oracle to understnad the specific language ?
hakan



Joined: 13/03/2008 19:21:16
Messages: 11
Offline

Sam wrote:

actually the problem here is there are few characters which editors does not understand
 


Are you actually loading the dump file into a editor? You should not need to do that, at least with postgresql, you use a command like

"load into <tablename> from allcountries.txt"

That way, you never load allcountries.txt into an editor...


Regards,
Hakan
isa



Joined: 31/01/2007 12:37:49
Messages: 77
Offline

Hi, Sam:
more tests.
What character sets has your database?
You can see it like that:

select * from nls_database_parameters
where parameter like '%CHARACTER%'

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1
NLS_NCHAR_CHARACTERSET AL16UTF16

If your NLS_CHARACTERSET is not UTF-8, you must put this column as NVARCHAR2(2000).

More things: your database is in windows or linux/unix?

If you want, send me the log file of the "load data".

Regards,
Isa
bs_pk



Joined: 27/03/2009 22:03:14
Messages: 2
Offline

I'm load allCountries.txt in Oracle database without error.
Structure geoname
create table GEONAME
(
GEONAME_ID NUMBER,
NAME VARCHAR2(200),
ASCIINAME VARCHAR2(200),
ALTERNATENAMES NCLOB,
LATITUDE NUMBER(19,16),
LONGITUDE NUMBER(19,16),
FEATURE_CLASS VARCHAR2(1),
FEATURE_CODE VARCHAR2(10),
COUNTRY_CODE VARCHAR2(2),
CC2 VARCHAR2(60),
ADMIN1_CODE VARCHAR2(20),
ADMIN2_CODE VARCHAR2(80),
ADMIN3_CODE VARCHAR2(20),
ADMIN4_CODE VARCHAR2(20),
POPULATION NUMBER,
ELEVATION NUMBER,
GTOPO30 NUMBER,
TIMEZONE_ID VARCHAR2(50),
MODIFICATION_DATE DATE
)
Control file
OPTIONS (ROWS=8192, BINDSIZE=5000000, READSIZE=5000000, ERRORS=999999999)
LOAD DATA CHARACTERSET UTF8 BYTEORDERMARK CHECK
INTO TABLE geoname FIELDS TERMINATED BY X'09' TRAILING NULLCOLS
(
GEONAME_ID,
NAME,
ASCIINAME,
ALTERNATENAMES char(8000),
LATITUDE "replace(:latitude,'.',',')",
LONGITUDE "replace(:longitude,'.',',')",
FEATURE_CLASS,
FEATURE_CODE,
COUNTRY_CODE,
CC2,
ADMIN1_CODE,
ADMIN2_CODE,
ADMIN3_CODE,
ADMIN4_CODE,
POPULATION,
ELEVATION,
GTOPO30,
TIMEZONE_ID,
MODIFICATION_DATE DATE "YYYY-MM-DD"
)
Before load. need run command
copy/b utf8_mark.+allCountries.txt allCountries8.txt
File ut8_mark contact 3 byte: 0xEFBBBF.

Result load
SQL*Loader: Release 10.2.0.1.0 - Production on Пт. Бер 27 22:40:19 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Control File: geoname.ctl
Character Set UTF8 specified for all input.
BYTEORDERMARK CHECK specified and a unicode character set specified.
Checking for a byte order mark.

Data File: allCountries8.txt
Bad File: geoname.bad
Discard File: geoname.dis
(Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 999999999
Bind array: 8192 rows, maximum of 5000000 bytes
Continuation: none specified
Path used: Conventional

Table GEONAME, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
GEONAME_ID FIRST * WHT CHARACTER
NAME NEXT * WHT CHARACTER
ASCIINAME NEXT * WHT CHARACTER
ALTERNATENAMES NEXT 8000 WHT CHARACTER
LATITUDE NEXT * WHT CHARACTER
SQL string for column : "replace(:latitude,'.',',')"
LONGITUDE NEXT * WHT CHARACTER
SQL string for column : "replace(:longitude,'.',',')"
FEATURE_CLASS NEXT * WHT CHARACTER
FEATURE_CODE NEXT * WHT CHARACTER
COUNTRY_CODE NEXT * WHT CHARACTER
CC2 NEXT * WHT CHARACTER
ADMIN1_CODE NEXT * WHT CHARACTER
ADMIN2_CODE NEXT * WHT CHARACTER
ADMIN3_CODE NEXT * WHT CHARACTER
ADMIN4_CODE NEXT * WHT CHARACTER
POPULATION NEXT * WHT CHARACTER
ELEVATION NEXT * WHT CHARACTER
GTOPO30 NEXT * WHT CHARACTER
TIMEZONE_ID NEXT * WHT CHARACTER
MODIFICATION_DATE NEXT * WHT DATE YYYY-MM-DD

value used for ROWS parameter changed from 8192 to 395

Table GEONAME:
6903975 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.


Space allocated for bind array: 4995170 bytes(395 rows)
Read buffer bytes: 5000000

Total logical records skipped: 0
Total logical records read: 6903975
Total logical records rejected: 0
Total logical records discarded: 0

P.S.
select * from nls_database_parameters
where parameter like '%CHARACTER%'

PARAMETER VALUE
------------------------------------------
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16

 Description [Disk] Download
 Filesize 3 bytes
 Downloaded:  0 time(s)

bs_pk



Joined: 27/03/2009 22:03:14
Messages: 2
Offline

.
 
Forum Index -> FAQ - frequently asked questions
Go to:   
Powered by JForum 2.1.5 © JForum Team