
GeoNames Forum
Not all local Wikipedia entries showing up
Forum Index -> General
egilchri



Joined: 01/11/2010 12:43:05
Messages: 1

I check my local Wikipedia entries as follows:

http://ws.geonames.org/findNearbyWikipediaJSON?lat=43.06757&lng=-70.76137&radius=20&maxRows=30

But there are lots of entries that don't show up. In particular, we have lots of historic houses here in Portsmouth NH that have Wikipedia entries, and they appear to have lat and lng as well. They show up fine in the Wikipedia layer of Google Maps.

Do you know what is going on, and if there's any way for me to get all my local geocoded Wikipedia entries?
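For anyone scripting the same lookup, here is a minimal Java sketch that rebuilds the request URL above from its parameters. The endpoint and parameter names are copied verbatim from the URL in this post; reading `radius` as kilometres is an assumption, and the class and method names are hypothetical. Building the URL from numbers (rather than pasting a string) makes it easy to try other locations or a larger radius when checking which articles come back.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class NearbyWikipediaQuery {
    // Endpoint exactly as used in the post above.
    static final String ENDPOINT = "http://ws.geonames.org/findNearbyWikipediaJSON";

    // Hypothetical helper: builds the query string in the same parameter order
    // as the original URL. radiusKm is assumed to be in kilometres.
    static String buildUrl(double lat, double lng, int radiusKm, int maxRows) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("lat", Double.toString(lat));            // locale-independent formatting
        params.put("lng", Double.toString(lng));
        params.put("radius", Integer.toString(radiusKm));
        params.put("maxRows", Integer.toString(maxRows));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return ENDPOINT + "?" + query;
    }

    public static void main(String[] args) {
        // Same coordinates, radius and row limit as in the post.
        System.out.println(buildUrl(43.06757, -70.76137, 20, 30));
    }
}
```

Running `main` prints the same URL that appears at the top of this post.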
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

The GeoNames Wikipedia layer has not been updated for quite some time. I assume those entries are either new or did not yet have coordinates in a form understood by the parser when it was last run (many months ago).

A new extract is planned, but I cannot make any promises on when it will be available.

Marc

aw



Joined: 02/11/2010 17:06:15
Messages: 3

Can it be a related problem that many of the descriptions (within the 'summary' tag) start in the middle of a sentence and don't match the start of the wikipedia article? Some of them can be quite confusing.

If what you say is true and the extracts are only done once every several months (really??), then maybe these descriptions will be fixed automatically on the next export? Or are there still known problems with the page parsing? Are you looking for help with the parsing?

I would have thought that such an extract would be trivial to do just by pushing a button once a week, or even once a month. Is it so difficult or time-consuming that it can only be done two or three times per year?

marc



Joined: 08/12/2005 07:39:47
Messages: 4501

Parsing Wikipedia is definitely not trivial. In fact, it is nearly impossible. Wikipedia is not a structured data source: there is an enormous number of different templates for how coordinates are written, and the templates are changing constantly. This means that after you have invested a lot of work in writing a parser, when you run it again a few weeks later you miss tons of entries, because people (using bots) have changed the templates in existing articles, and you have to start messing around with the parser all over again.
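To illustrate why this is fragile, here is a minimal, hypothetical Java sketch that extracts coordinates from article wikitext, assuming the article uses Wikipedia's `{{coord}}` template in either its signed-decimal form (`{{coord|lat|lng}}`) or its degrees/minutes/seconds form with hemisphere letters (`{{coord|43|4|3|N|70|45|41|W}}`). It is not the GeoNames implementation, which this thread does not describe. Any article whose coordinates live in a different or renamed template, an infobox field, or a template with nested braces simply returns empty and is silently skipped, which is exactly the failure mode described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordTemplateParser {
    /** Decimal degrees; south and west are negative. */
    record LatLng(double lat, double lng) {}

    // Assumption: only the first {{coord|...}} / {{Coord|...}} without nested braces is read.
    private static final Pattern COORD =
            Pattern.compile("\\{\\{\\s*[Cc]oord\\s*\\|([^{}]*)\\}\\}");

    static Optional<LatLng> parse(String wikitext) {
        Matcher m = COORD.matcher(wikitext);
        if (!m.find()) {
            return Optional.empty(); // unrecognised template: the article is missed
        }
        // Keep leading numbers and hemisphere letters; stop at the first other
        // parameter such as "type:landmark" or "display=title".
        List<String> tokens = new ArrayList<>();
        for (String raw : m.group(1).split("\\|")) {
            String t = raw.trim();
            if (t.matches("-?\\d+(\\.\\d+)?") || t.matches("[NSEW]")) {
                tokens.add(t);
            } else {
                break;
            }
        }
        int latHemi = indexOfAny(tokens, "N", "S");
        if (latHemi < 0) {
            // Signed decimal form: {{coord|lat|lng}}
            if (tokens.size() != 2 || tokens.contains("E") || tokens.contains("W")) {
                return Optional.empty();
            }
            return Optional.of(new LatLng(
                    Double.parseDouble(tokens.get(0)), Double.parseDouble(tokens.get(1))));
        }
        // Hemisphere form: d[|m[|s]]|N or S|d[|m[|s]]|E or W
        int lngHemi = indexOfAny(tokens, "E", "W");
        if (lngHemi <= latHemi) {
            return Optional.empty();
        }
        List<String> latParts = tokens.subList(0, latHemi);
        List<String> lngParts = tokens.subList(latHemi + 1, lngHemi);
        if (!validDms(latParts) || !validDms(lngParts)) {
            return Optional.empty();
        }
        double lat = toDegrees(latParts) * (tokens.get(latHemi).equals("S") ? -1 : 1);
        double lng = toDegrees(lngParts) * (tokens.get(lngHemi).equals("W") ? -1 : 1);
        return Optional.of(new LatLng(lat, lng));
    }

    private static int indexOfAny(List<String> tokens, String a, String b) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(a) || tokens.get(i).equals(b)) {
                return i;
            }
        }
        return -1;
    }

    // One to three numeric parts (degrees, minutes, seconds), no stray hemisphere letters.
    private static boolean validDms(List<String> parts) {
        return !parts.isEmpty() && parts.size() <= 3
                && parts.stream().noneMatch(p -> p.matches("[NSEW]"));
    }

    // degrees + minutes/60 + seconds/3600
    private static double toDegrees(List<String> parts) {
        double value = 0;
        double divisor = 1;
        for (String p : parts) {
            value += Double.parseDouble(p) / divisor;
            divisor *= 60;
        }
        return value;
    }
}
```

For example, `parse("{{Coord|43|4|3|N|70|45|41|W|display=title}}")` yields roughly (43.0675, -70.7614), while the same location written in any template this sketch does not know about yields `Optional.empty()`.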


Marc

aw



Joined: 02/11/2010 17:06:15
Messages: 3

So what does that mean, that it's a hopeless task?

Are you looking for help with the parsing? Is the parsing technique or code available or documented anywhere? Do you work on the wiki source or on the generated HTML? And is it really as exhausting to run as you make it sound?

marc



Joined: 08/12/2005 07:39:47
Messages: 4501

If you want to write a Java-based parser, that would be great.


Marc

aw



Joined: 02/11/2010 17:06:15
Messages: 3

I'd be delighted to help!
But, let me guess, you're not going to give me any clues about how it currently works as a starting point?

marc



Joined: 08/12/2005 07:39:47
Messages: 4501

The current implementation is a mess, and you're better off not knowing about it.

Marc
