
GeoNames Forum
Encoding issues in admin1Codes  XML
bernard



Joined: 18/08/2006 11:54:14
Messages: 30

Follow-up to a remark I made about my region, Provence-Alpes-Côte d'Azur, which shows in the map interface as "Provence-Alpes-Côte dʼAzur", with a weird ʼ instead of ', in Internet Explorer. In Firefox the display is correct.
Same through the web service
http://ws.geonames.org/countrySubdivision?lat=44.5&lng=6.5

So I downloaded the admin1Codes.txt file and found that it is not only a browser issue, since the file does indeed contain:
FR.B8 Provence-Alpes-Côte dʼAzur
among many other occurrences of badly encoded characters in various languages. I checked in various text editors and see the same issue, although the file seems to be recognized as UTF-8 encoded.

Do others have the same issue? Any clue on how to fix that? Is it something wrong in the files, or in my machine (could be, it's a new one and maybe some settings are to be fixed).
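
One way to tell a genuinely unexpected character from a browser or font problem is to print the code point and Unicode name of every non-ASCII character. The sketch below is only an illustration: the helper name `describe_non_ascii` is made up, and the sample string assumes the odd apostrophe is U+02BC, which is how it appears in this thread.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]

# Assumption: the apostrophe-like character in the file is U+02BC.
sample = "Provence-Alpes-C\u00f4te d\u02bcAzur"
for ch, code_point, name in describe_non_ascii(sample):
    print(code_point, name)
# U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
# U+02BC MODIFIER LETTER APOSTROPHE
```

If the second character is reported as something other than U+0027 APOSTROPHE, the difference is in the data itself rather than in the browser.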
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

Thanks Bernard

It is an Arabic character in the original dataset. I will correct it.

The character is also used for the Valle d'Aosta.

The reverse geocoding webservice, the map interface and the admin1Codes.txt file are all the same thing.

Cheers

Marc

bernard



Joined: 18/08/2006 11:54:14
Messages: 30

Côte d'Azur is OK now, but there are still a lot of other names to clean up, e.g. at http://www.geonames.org/maps/showOnMap?q=Abū. There again, everything is OK in Firefox, but there are a lot of things like Abū Z̧aby in IE (well, if you look at this thread in Firefox, you don't see the problem at all ... )
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

Do you speak these languages, so that you can tell it is not an IE bug or shortcoming? I am reluctant to change it just because IE is having problems, and I would not know which character to replace it with. Any ideas?

Marc

bernard



Joined: 18/08/2006 11:54:14
Messages: 30

Unfortunately I don't speak those languages

And were it only for IE, I would gladly forget it

But, as written above, I also find the issue when downloading admin1Codes.txt and opening it with any text or XML editor at hand (UltraEdit, XML Spy ...), even when selecting the UTF-8 options. So... I don't know. We would need native speakers of a variety of languages around, I'm afraid. What is the source of all those names, BTW?
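
Without native speakers, one editor-independent check is to decode the raw bytes as strict UTF-8 and count which non-ASCII characters actually occur. This is a minimal sketch under assumptions: `non_ascii_histogram` is a hypothetical helper, and the file path in the commented usage is just a local copy of the download.

```python
from collections import Counter

def non_ascii_histogram(raw_bytes):
    """Decode bytes as strict UTF-8 and count each non-ASCII character.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = raw_bytes.decode("utf-8", errors="strict")
    return Counter(ch for ch in text if ord(ch) > 127)

# In-memory demo; the string is an illustration, not file content.
demo = "C\u00f4te d\u02bcAzur".encode("utf-8")
for ch, count in non_ascii_histogram(demo).most_common():
    print(f"U+{ord(ch):04X}", count)

# Hypothetical usage on a local copy of the download:
# with open("admin1Codes.txt", "rb") as f:
#     for ch, count in non_ascii_histogram(f.read()).most_common(20):
#         print(f"U+{ord(ch):04X}", count)
```

If the decode succeeds, the file is valid UTF-8, and any remaining oddities are either unusual but legitimate characters or glyphs the editor's font lacks.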
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

The source for most of these codes is the National Geospatial-Intelligence Agency.

It is a Department of Defense (DoD) combat support agency that has been assigned an important, additional statutory mission of supporting national-level policymakers and Government Agencies. NGA is a member of the Intelligence Community and the single entity upon which the U.S. Government now relies to coherently manage the previously separate disciplines of imagery and mapping. By providing customers with ready access to the world's best imagery and geospatial intelligence, NGA provides critical support for the national decision making process and contributes to the high state of operational readiness of America's military forces. 


Marc
giorgio79



Joined: 21/04/2008 17:05:33
Messages: 28

I think my observation might be related:

http://forum.geonames.org/gforum/posts/list/928.page
zukanta



Joined: 18/11/2009 21:14:04
Messages: 4

We are still seeing lots of 'square' characters in the source files (AllCountries.txt), mainly in the alternate names column. They show up even in UltraEdit, EditPad Pro and the web interface, e.g. for

Hup’o Bank
Nishi Kaitoku Seamount
Usan Trough
...

So can we conclude that this is due to bugs in the National Geospatial-Intelligence Agency data, and that we can't really expect it to get fixed?

Thanks
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

I don't see any 'squares' for the places you list. I would rather say you haven't installed the right fonts on your machine to render them properly.

Best

Marc

zukanta



Joined: 18/11/2009 21:14:04
Messages: 4

Hi Marc,

Thanks for your quick reply.

Well, I'm using a Unicode font and I can see lots of double-byte text (Arabic, for instance) without problems. Most of the file is OK, except for some of the last entries in the alternateNames column on several records. I suspect these entries are in Chinese or Japanese.

Which editor did you use to look at AllCountries.txt that shows data without 'squares' for "Hup’o Bank"? I'm curious,

Thanks!
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

I was looking at the webpage with Firefox.

Marc

zukanta



Joined: 18/11/2009 21:14:04
Messages: 4

Marc,

I tried all available encodings and Unicode fonts in EditPad Pro and UltraEdit, to no avail. SQL Server 2005 also has issues with those specific entries/characters. Sometimes it's a single character amongst several Arabic characters.

I tried with Firefox and it displays fine, even in the Firefox editor when I look at the source code. I can't see anything special in the Firefox setup that makes it work (font or encoding: simple Times New Roman/Helvetica/Arial/Verdana and UTF).

Do you know of any file editor in a Windows environment that can display for instance the Alternate Names for

Hup’o Bank
Nishi Kaitoku Seamount
Usan Trough
...

and if so which one and using which font and encoding?

Thanks!
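
Independently of any editor or font, the alternate names of a record can be inspected as code points. The sketch below assumes tab-separated rows with the name in the second column and a comma-separated alternate-names list in the fourth, which is my understanding of the dump layout. The function name `alternate_names` and the demo row are made up for illustration and are not real GeoNames data.

```python
import io

def alternate_names(lines, place_name):
    """Yield (alt_name, code points) for rows whose name column equals place_name.

    Assumes tab-separated rows with the name in column 2 and a
    comma-separated alternate-names list in column 4.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[1] != place_name:
            continue
        for alt in filter(None, fields[3].split(",")):
            yield alt, " ".join(f"U+{ord(ch):04X}" for ch in alt)

# Made-up row for illustration only; not real GeoNames data.
demo = io.StringIO("1\tExample Place\tExample Place\tExample Place,\u4f8b\n")
for alt, code_points in alternate_names(demo, "Example Place"):
    print(code_points, alt)

# Hypothetical usage on a local copy of the dump:
# with open("allCountries.txt", encoding="utf-8") as f:
#     for alt, code_points in alternate_names(f, "Usan Trough"):
#         print(code_points, alt)
```

Code points in the CJK ranges that print as squares in an editor would point to a missing font rather than to broken data.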

marc



Joined: 08/12/2005 07:39:47
Messages: 4501

If I open it with Notepad then it looks OK.

Marc

zukanta



Joined: 18/11/2009 21:14:04
Messages: 4

Thanks Marc,

I was using fonts that had 'unicode' in their name but happen not to be FULLY Unicode compatible. Arial Unicode MS seems to be the most Unicode-compatible so far. Eyeballing the file, the problems seem to be reduced to a few Arabic alternate name entries in AllCountries.txt:

e.g.
Rūd-e Takhtarī
Kūh-e Takhtarī
Dasht-e Takhtarī
Sulni Khwaṟ

But since they show fine in Firefox, I guess these issues are due to bugs in the Arial Unicode MS font. So don't worry about this.

Cheers and thanks for your help!
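
For anyone hitting the same thing: names like these carry diacritics that may be stored as separate combining marks, which some fonts and renderers handle poorly. A rough way to list them is to decompose the string (NFD) and name each combining character. The helper name `combining_marks` is hypothetical, and the sample strings are typed with escapes on the assumption that they match the names quoted in this thread.

```python
import unicodedata

def combining_marks(text):
    """Return names of combining marks found after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]

print(combining_marks("R\u016bd-e Takhtar\u012b"))
# ['COMBINING MACRON', 'COMBINING MACRON']
print(combining_marks("Z\u0327aby"))
# ['COMBINING CEDILLA']
```

A letter such as Z with a cedilla has no single precomposed code point, so it always depends on the renderer positioning the combining mark, which would fit the pattern of one application displaying it correctly and another not.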
 