GeoNames Forum
News from Washington
Anonymous



While checking the new automatic GeoRSS feature, I noticed that in El País, news from Washington, DC is assigned to the state of Washington.
marc



Joined: 08/12/2005 07:39:47
Messages: 4271
Offline

This is an interesting find. I have added it to the GeoNames test cases.
I am posting the text here because it will soon disappear from the original feed.

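Title: "Bush nombra al polémico general Michael Hayden como nuevo director de la CIA"
(English: "Bush names the controversial General Michael Hayden as the new director of the CIA")
Text: "El presidente de EEUU, George W. Bush, ha elegido al general de la Fuerza Aérea Michael Hayden como segundo responsable de la inteligencia estadounidense, después de John Negroponte. Hayden sucederá a Porter Goss al frente de la CIA. Leer. Escuchar"
(English: "The President of the US, George W. Bush, has chosen Air Force General Michael Hayden as the second-in-command of US intelligence, after John Negroponte. Hayden will succeed Porter Goss as head of the CIA. Read. Listen")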

I looked at how the GeoNames search engine handles this text.
The words 'CIA', 'EEUU' (the Spanish abbreviation for the United States) and 'estadounidense' ('American') indicate that the text is about the United States. However, 'Hayden' is also a place name in the US, which makes the search engine think the text is about the place Hayden in the US.
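The country detection described above can be pictured as a simple vote over cue words. This is a minimal sketch, not how GeoNames actually does it; the cue table `COUNTRY_CUES` and the function `guess_country` are hypothetical. Note that the cues only say which country the text is about, not which place, so a name like Hayden still has to pass a relevance check.

```python
from collections import Counter
from typing import Optional

# Hypothetical cue table: words that point to a country without being
# place names themselves. A real system would need a large curated list.
COUNTRY_CUES = {
    "eeuu": "US",            # Spanish abbreviation for the United States
    "estadounidense": "US",  # Spanish for 'American'
    "cia": "US",
}


def guess_country(text: str) -> Optional[str]:
    """Return the country code with the most cue words in text, or None."""
    words = [w.strip('.,;:"()').lower() for w in text.split()]
    votes = Counter(COUNTRY_CUES[w] for w in words if w in COUNTRY_CUES)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(guess_country("El presidente de EEUU, George W. Bush"))  # US
```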

I will experiment a little with the relevance calculation for place names in GeoNames, as the place 'Hayden' has a population of only 11,000 and has been given too high a relevance.
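One way to make a small town like Hayden count for less is to scale its score with the logarithm of its population. The sketch below is a hypothetical scoring function, not the GeoNames relevance formula; `relevance`, `accept` and the cut-off `MIN_RELEVANCE` are assumptions for illustration.

```python
import math

# Hypothetical cut-off; in practice it would be tuned on the test cases.
MIN_RELEVANCE = 5.0


def relevance(population: int, mentions: int) -> float:
    """Hypothetical relevance score for a candidate place name.

    The score grows with log10(population), so a town of 11,000 scores
    about 4 per mention, and with how often the name occurs in the text.
    """
    return math.log10(max(population, 1)) * mentions


def accept(population: int, mentions: int) -> bool:
    """Keep the candidate only if its score reaches the cut-off."""
    return relevance(population, mentions) >= MIN_RELEVANCE


print(accept(11_000, 1))     # False: a single mention of a small town
print(accept(1_000_000, 1))  # True
```

With a single mention, a place of 11,000 inhabitants falls below this cut-off, while a second mention would lift it above (about 8.1). Whether log-population is the right weighting is exactly the kind of question the test cases would have to settle.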

http://www.geonames.org/search.html?q=hayden&country=US

Changing the relevance algorithm will certainly help, but I am not sure it alone will solve the problem. Another possible improvement would be to give the search algorithm a list of first names to work with. 'Michael' happens to be the second most popular first name in English (after Jacob), so the word following 'Michael' is likely to be a family name rather than a place name.
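The first-name idea could work as a filter that runs before the gazetteer lookup. A minimal sketch, assuming a hypothetical hard-coded list `FIRST_NAMES` in place of a full list such as the SSA baby-name data; `candidate_place_tokens` is likewise a made-up name. It also remembers the family names it has skipped, so a later bare 'Hayden' (as in "Hayden sucederá...") is not taken as a place either.

```python
# Hypothetical, tiny first-name list; a real one would be much larger.
FIRST_NAMES = {"jacob", "michael", "george", "john", "porter"}


def candidate_place_tokens(text: str) -> list[str]:
    """Return capitalised tokens that may be looked up as place names.

    A token directly after a known first name is treated as a family
    name and skipped, as is any later repetition of that family name.
    """
    tokens = [t.strip('.,;:"()') for t in text.split()]
    family_names = set()
    candidates = []
    for i, tok in enumerate(tokens):
        if not tok or not tok[0].isupper():
            continue
        if tok.lower() in FIRST_NAMES:
            continue  # the first name itself is not a place
        if i > 0 and tokens[i - 1].lower() in FIRST_NAMES:
            family_names.add(tok)  # e.g. 'Hayden' after 'Michael'
            continue
        if tok in family_names:
            continue  # the same family name mentioned again later
        candidates.append(tok)
    return candidates


print(candidate_place_tokens("Michael Hayden visitó Madrid. Hayden volvió"))
# ['Madrid']
```

This is only one signal: a name list cannot tell "Michael Hayden" from a genuine place that happens to follow a first name, so it would complement the relevance weighting rather than replace it.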

As you can see, finding an algorithm that works for all kinds of texts is a pretty complex task. If you spot other problems, let me know.


http://www.ssa.gov/OACT/babynames/

Anonymous



There was also a piece on Slashdot that mentioned New Mexico but was geocoded as Mexico.
marc




Thanks for letting me know. I have added the text of this Slashdot piece to our list of test cases.

It is a problem of finding the correct relevance factor for every name. In this case the country 'Mexico' was given too high a relevance compared to the state (admin1 division) 'New Mexico'.
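Tuning the relevance factors is the fix described here. A complementary technique, not mentioned in the post, is to prefer the longest gazetteer match, so that 'New Mexico' consumes the word 'Mexico' before the shorter name can match. A minimal sketch with a hypothetical two-entry gazetteer; the feature codes ADM1 and PCLI follow GeoNames conventions, but `GAZETTEER` and `match_places` are made up for illustration.

```python
# Hypothetical two-entry gazetteer:
# lower-cased name tokens -> (GeoNames feature code, country code).
GAZETTEER = {
    ("new", "mexico"): ("ADM1", "US"),  # first-order admin division (state)
    ("mexico",): ("PCLI", "MX"),        # independent country
}
MAX_LEN = max(len(name) for name in GAZETTEER)


def match_places(text: str) -> list[tuple[str, str, str]]:
    """Greedy left-to-right matching that tries the longest name first."""
    words = [w.strip('.,;:"()').lower() for w in text.split()]
    matches = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in GAZETTEER:
                matches.append((" ".join(key), *GAZETTEER[key]))
                i += n  # skip the words consumed by this match
                break
        else:
            i += 1  # no gazetteer name starts at this word
    return matches


print(match_places("A lab in New Mexico"))  # [('new mexico', 'ADM1', 'US')]
print(match_places("Trade with Mexico"))    # [('mexico', 'PCLI', 'MX')]
```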

Anonymous



Another one I found: in the French newspaper Le Monde, an article about New Orleans is placed in Orléans, France.
marc




It was a problem with the hyphen in 'La Nouvelle-Orléans'.
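Assuming the bug was that the tokenizer split words at the hyphen, so that 'Orléans' was looked up on its own, the difference can be shown with two regular expressions. Both patterns and the function `tokens` are hypothetical; they are not the GeoNames tokenizer.

```python
import re

# Naive tokenizer: splits at every non-word character, including '-'.
NAIVE = re.compile(r"\w+")
# Hyphen-aware tokenizer: a word may contain internal hyphens.
HYPHEN_AWARE = re.compile(r"\w+(?:-\w+)*")


def tokens(text: str, pattern: re.Pattern = HYPHEN_AWARE) -> list[str]:
    """Split text into word tokens (accented letters count as word characters)."""
    return pattern.findall(text)


print(tokens("La Nouvelle-Orléans", NAIVE))  # ['La', 'Nouvelle', 'Orléans']
print(tokens("La Nouvelle-Orléans"))         # ['La', 'Nouvelle-Orléans']
```

With the hyphen-aware pattern, 'Nouvelle-Orléans' reaches the lookup as one unit, so the French name for New Orleans no longer produces a bare 'Orléans' that matches the city in France.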

Thanks for spotting the bug and letting me know.

Marc
