
GeoNames Forum
Coordinate Patterns in Wikipedia
johahoff



Joined: 12/08/2010 15:07:40
Messages: 4

Hi,

I was wondering how you actually extract the coordinates from Wikipedia, as there are so many different and ever-changing templates for specifying them. Do you use regular expressions? If so, are they available somewhere?

This would be great, because we are also extracting coordinates from Wikipedia, but you seem to have a higher recall: according to your data, you have around 90 000 entities from the English Wikipedia alone, while we only have 60 000.

This is actually part of a project to merge Wikipedia and GeoNames into a structured knowledge base, which will also be freely available under Creative Commons, so we won't steal anything.

Thanks in advance!
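
For readers wondering what a regular-expression approach to the question above might look like, here is a minimal sketch in Python. It is not the GeoNames code, which is not shown in this thread. It assumes the coordinates appear in a {{coord}} template in one of its common forms (decimal, or degrees with optional minutes and seconds plus N/S/E/W). The names COORD_RE, NUMBER_RE, _to_decimal and parse_coord are made up for illustration, and the many other coordinate templates and infobox fields on Wikipedia are not handled.

```python
import re

# Body of a {{coord|...}} template; nested templates are deliberately not matched.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def _to_decimal(parts, hemisphere):
    """Convert [deg, min, sec] (min and sec optional) to signed decimal degrees."""
    value = sum(p / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(wikitext):
    """Return (lat, lon) from the first {{coord}} template found, or None."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    groups = []   # completed (numbers, hemisphere) pairs
    current = []  # numeric fields seen since the last hemisphere letter
    for field in (f.strip() for f in match.group(1).split("|")):
        if NUMBER_RE.fullmatch(field):
            current.append(float(field))
        elif field in ("N", "S", "E", "W") and current:
            groups.append((current, field))
            current = []
        else:
            break  # e.g. display=title or type:city ends the coordinate fields
    if len(groups) == 2:
        (lat_parts, lat_h), (lon_parts, lon_h) = groups
        if lat_h in "NS" and lon_h in "EW":
            return _to_decimal(lat_parts, lat_h), _to_decimal(lon_parts, lon_h)
        return None
    if not groups and len(current) == 2:
        return current[0], current[1]  # plain decimal latitude and longitude
    return None
```

Calling parse_coord on a page's wikitext returns a (latitude, longitude) pair in decimal degrees, or None when no supported pattern is found, so pages that fall through can be counted to estimate how much recall other patterns would add.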
marc



Joined: 08/12/2005 07:39:47
Messages: 4501

Parsing Wikipedia is a pain. Sometimes it is better, sometimes it is worse; it depends on what kind of robots are messing around at the time you want to parse it.

Well, Creative Commons has many licenses, and not all of them can be considered free. As you are working with Wikipedia, you are likely forced to use the ShareAlike type, which cannot be considered free.

Marc

johahoff




Hi Marc,

Thanks for the quick reply. Regarding licenses: you are working with Wikipedia data but still use the Attribution license, right? Same as us.

Cheers
marc




No, the GeoNames dataset does not include Wikipedia data. We are just linking to it, not merging Wikipedia with our content.

Marc

johahoff




Ah, I see. So you have your own coordinates and just parse Wikipedia to match them by geographical vicinity?

Sorry for the misunderstanding. Is there any way to find out how exactly you parse the coordinates?
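
As an illustration of what "matching by geographical vicinity" could mean in practice, here is a small Python sketch. It is only a guess at the general idea and is not confirmed anywhere in this thread as the GeoNames approach. It assumes a great-circle (haversine) distance and a fixed distance threshold; the function names, the 5 km default and the candidate tuple layout (name, lat, lon) are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_within(point, candidates, max_km=5.0):
    """Return (name, distance_km) of the closest candidate within max_km, else None.

    point is (lat, lon); candidates is an iterable of (name, lat, lon) tuples.
    """
    best = None
    for name, lat, lon in candidates:
        distance = haversine_km(point[0], point[1], lat, lon)
        if distance <= max_km and (best is None or distance < best[1]):
            best = (name, distance)
    return best
```

A linear scan like this is fine for small candidate lists; for a full gazetteer a spatial index would be the more realistic choice, and a name comparison would usually be combined with the distance check to avoid matching unrelated neighbours.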
marc




There is no secret trick I could post here that would make everything fine. It is just a big mess for everybody trying to parse it. I could spend a day looking at the source code and post a summary, but I would prefer to use the time to refactor the code instead.


Marc
