Author |
Message |
02/07/2019 22:11:24
|
pablolbap
Joined: 02/07/2019 16:48:53
Messages: 1
Offline
|
Hi,
My file reader detects a wrong number of tokens when scanning the entry for the Istanbul International Airport (from alternateNamesV2).
I believe this could be because the entry has one separator too many.
The entry in question:
"13635172;11838481;iata;ISL;;;;1;2018-10-31;2019-04-06 ;"
Best regards
|
|
|
26/07/2019 15:16:17
|
SvenAtWork
Joined: 26/07/2019 12:03:22
Messages: 2
Location: Germany
Offline
|
Same here.
There is an additional tab at the end of the line.
Other lines may be affected as well, but I have not been able to analyse that yet.
It would be much appreciated if this could be fixed soon.
Otherwise we will have to build a custom solution.
However, because this is my first post in this forum...
GREAT WORK in general!
I have used geonames.org data (Gazetteer + PostalCodes) for years.
|
|
|
26/07/2019 15:21:51
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4412
Offline
|
You are right, there are some control chars in the from and to fields.
This will be fixed with the next extract.
And the frontend will be improved to eliminate these chars when saving.
Marc
|
|
|
|
27/07/2019 17:06:15
|
SvenAtWork
Joined: 26/07/2019 12:03:22
Messages: 2
Location: Germany
Offline
|
Thanks a lot!
Just for info:
I parsed the whole file today, and it seems that the line with alternateNameId "13635172" is the only problematic row in the file.
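A minimal sketch of such a whole-file check, assuming the file is tab-separated and that a well-formed alternateNamesV2 row has 10 fields (the count implied by the sample row above without its trailing separator). The function name `find_bad_rows` and the sample data are made up for illustration:

```python
import io

def find_bad_rows(lines, expected_fields=10, sep="\t"):
    """Yield (line_number, field_count, first_field) for rows with an unexpected field count."""
    for lineno, line in enumerate(lines, start=1):
        # Strip only the line terminator so a trailing separator is still counted.
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) != expected_fields:
            yield lineno, len(fields), fields[0]

# Illustrative data: one well-formed row and the row reported in this thread.
good_row = "\t".join(["1", "2", "en", "Good", "", "", "", "", "", ""])
bad_row = "\t".join(["13635172", "11838481", "iata", "ISL", "", "", "", "1",
                     "2018-10-31", "2019-04-06 ", ""])
sample = io.StringIO(good_row + "\n" + bad_row + "\n")

bad_rows = list(find_bad_rows(sample))
print(bad_rows)
```

For the real file, the same function can be fed an open file handle, for example `open("alternateNamesV2.txt", encoding="utf-8", newline="")`, so the file is read line by line rather than loaded at once.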
|
|
|
05/08/2019 17:33:31
|
willi99
Joined: 05/08/2019 12:00:14
Messages: 4
Offline
|
I downloaded allCountries.zip and parsed it with Python's csv.DictReader. In names2 (the big field with all the different UTF encodings) I encountered control chars too, as parsing always stopped at one of the Afghanistan entries.
|
|
|
|
12/08/2019 09:47:32
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4412
Offline
|
The alternateNameId "13635172" should be fixed. Is there still an issue?
What is the problem with Afghanistan? Which feature and which control char?
Marc
|
|
|
|
08/09/2019 22:43:10
|
willi99
Joined: 05/08/2019 12:00:14
Messages: 4
Offline
|
Hi, the error message (Python) for ID 1149361
is:
_mysql_exceptions.OperationalError: (1366, "Incorrect string value: '\\xF0\\x90\\x8C\\xB0\\xF0\\x90...' for column 'alternatenames' at row 1")
Is it possible that the field contains a control code that switches the writing direction to right-to-left? nano behaves strangely with this field too: when the cursor passes this character, the nano display gets garbled.
|
|
|
11/09/2019 10:59:46
|
willi99
Joined: 05/08/2019 12:00:14
Messages: 4
Offline
|
I suspect it is 4-byte UTF-8 strings that trigger the error, but it is strange that nano had display problems too.
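A quick way to test this suspicion is to look for code points above U+FFFF, since those are exactly the ones that need 4 bytes in UTF-8. The sketch below decodes the first four bytes quoted in the error message; the helper name `four_byte_chars` is made up for illustration:

```python
def four_byte_chars(text):
    """Return (index, character) pairs for characters that take 4 bytes in UTF-8."""
    # Code points above U+FFFF lie outside the Basic Multilingual Plane.
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]

# The byte sequence starts with the bytes from the MySQL error message.
raw = b"abc \xf0\x90\x8c\xb0"
text = raw.decode("utf-8")

hits = four_byte_chars(text)
for index, ch in hits:
    print(index, f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "bytes")
```

The quoted bytes decode to U+10330, which is an ordinary character outside the Basic Multilingual Plane and not a control or direction code.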
https://stackoverflow.com/questions/10957238/incorrect-string-value-when-trying-to-insert-utf-8-into-mysql-via-jdbc#
|
|
|
11/09/2019 10:59:46
|
willi99
Joined: 05/08/2019 12:00:14
Messages: 4
Offline
|
It was my error: I had to use the utf8mb4 encoding, also for the Python MySQLdb connector. It works now. The reason is that 4-byte UTF-8 characters are used, and MySQL does not handle them in utf8 but only in utf8mb4.
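A minimal sketch of the connector side, assuming the MySQLdb module (mysqlclient) and its `charset` and `use_unicode` connection arguments. The host, user, password and database values are placeholders, and the helper names are made up for illustration:

```python
def utf8mb4_connect_kwargs(host, user, passwd, db):
    """Build connection arguments that request utf8mb4 on the client connection."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8mb4",   # instead of utf8, which is limited to 3 bytes per character
        "use_unicode": True,
    }

def open_connection(**kwargs):
    """Open the connection; MySQLdb is imported lazily so the sketch loads without it."""
    import MySQLdb
    return MySQLdb.connect(**kwargs)

# Placeholder credentials, not taken from this thread.
kwargs = utf8mb4_connect_kwargs("localhost", "geo", "secret", "geonames")
print(kwargs["charset"])
```

Calling `open_connection(**kwargs)` would then open the connection. The target table and column also need to use utf8mb4 for the insert to succeed.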
|
|
|
30/09/2019 23:19:28
|
mariakatosvich
Joined: 24/09/2016 13:38:52
Messages: 1
Offline
|
If you have MySQL 5.5 or later you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.
You may also have to set the server property character_set_server to utf8mb4 in the MySQL configuration file.
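As a rough sketch, the conversion statement could be generated like this. The table name `geoname` and the collation `utf8mb4_unicode_ci` are assumptions for illustration, and the statement should be tried on a copy of the data first:

```python
def convert_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build an ALTER TABLE statement that converts a table and its text columns."""
    # Keep identifier quoting simple: refuse names that contain a backtick.
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )

# Hypothetical table name; adjust to the local schema.
statement = convert_table_sql("geoname")
print(statement)
```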
|
|
|
|