| Author |
Message |
12/08/2010 15:12:40
|
johahoff
Joined: 12/08/2010 15:07:40
Messages: 4
Offline
|
Hi,
I was wondering how you actually extract the coordinates from Wikipedia, as there are so many different and ever-changing templates for specifying them. Do you use regular expressions? If so, are they available somewhere?
This would be great, because we are also extracting coordinates from Wikipedia, but you seem to have a higher recall (according to your data, you have around 90 000 entities from the English Wikipedia alone, while we only have 60 000).
This is actually part of a project to merge Wikipedia and GeoNames into a structured knowledge base, which will also be freely available under Creative Commons, so we won't steal anything.
Thanks in advance!
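To make the regular-expression question concrete, here is a minimal sketch of that approach in Python. It is not the GeoNames parser (which is not shown in this thread); it assumes only the common `{{coord|...}}` template in its decimal and degrees/minutes/seconds forms, and the names `COORD_RE` and `parse_coord` are made up for illustration. Other coordinate templates and infobox fields would need their own handling.

```python
import re

# Hypothetical pattern: captures the parameter list of a {{coord|...}} template.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)

HEMISPHERES = {"N": 1, "S": -1, "E": 1, "W": -1}


def parse_coord(wikitext):
    """Return (lat, lon) in signed decimal degrees, or None if nothing usable is found."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    fields = [f.strip() for f in match.group(1).split("|")]
    # Drop empty fields and extra parameters such as display=title or type:city.
    fields = [f for f in fields if f and "=" not in f and ":" not in f]

    pending, result = [], []
    for field in fields:
        sign = HEMISPHERES.get(field.upper())
        if sign is not None:
            if not pending:
                return None
            # pending holds degrees[, minutes[, seconds]] for one axis.
            degrees = sum(n / 60 ** i for i, n in enumerate(pending))
            result.append(sign * degrees)
            pending = []
        else:
            try:
                pending.append(float(field))
            except ValueError:
                return None  # unknown token: give up rather than guess

    if len(result) == 2 and not pending:
        return result[0], result[1]
    if not result and len(pending) == 2:
        return pending[0], pending[1]  # plain signed decimals, no hemisphere letters
    return None


print(parse_coord("{{coord|57|18|22|N|4|27|32|W|display=title}}"))
print(parse_coord("{{Coord|44.112|-87.913}}"))
```

A sketch like this deliberately returns `None` for anything it does not recognise, which trades recall for precision; closing the gap in recall would mean adding further patterns for the other templates in use.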
|
|
|
 |
12/08/2010 16:06:53
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4501
Offline
|
Parsing Wikipedia is a pain. Sometimes it is better, sometimes it is worse; it depends on what kind of robots are messing around with it at the moment you want to parse it.
Well, Creative Commons has many licenses, and not all of them can be considered free. As you are working with Wikipedia, you are likely forced to use the ShareAlike type, which cannot be considered free.
Marc
|
 |
|
|
 |
12/08/2010 16:18:13
|
johahoff
Joined: 12/08/2010 15:07:40
Messages: 4
Offline
|
Hi Marc,
Thanks for the quick reply. Regarding licenses - you are working with Wikipedia data but still use the Attribution license, right? Same as us.
Cheers
|
|
|
 |
12/08/2010 18:07:02
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4501
Offline
|
No, the GeoNames dataset does not include Wikipedia data. We are just linking to it, not merging Wikipedia with our content.
Marc
|
 |
|
|
 |
12/08/2010 19:38:14
|
johahoff
Joined: 12/08/2010 15:07:40
Messages: 4
Offline
|
Ah, I see, so you have your own coordinates and just parse Wikipedia to match them by geographical vicinity?
Sorry for the misunderstanding - so is there any way to find out how exactly you parse the coordinates?
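For illustration, matching by geographical vicinity could look roughly like the Python sketch below. Whether GeoNames actually links articles this way is not confirmed anywhere in this thread; the function names, the 5 km threshold, and the sample records are all assumptions made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_within(point, candidates, max_km=5.0):
    """Return the name of the closest candidate within max_km of point, else None.

    point is (lat, lon); candidates is an iterable of (name, lat, lon).
    The 5 km default is an arbitrary illustrative threshold.
    """
    best_name, best_dist = None, max_km
    for name, lat, lon in candidates:
        dist = haversine_km(point[0], point[1], lat, lon)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical records: one very close to the query point, one far away.
records = [("place_a", 10.001, 20.001), ("place_b", 11.0, 21.0)]
print(nearest_within((10.0, 20.0), records))
```

In practice a distance check alone would probably be combined with a name comparison, since unrelated places can sit close together; a linear scan like this would also need a spatial index for data sets of this size.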
|
|
|
 |
13/08/2010 22:05:58
|
marc
Joined: 08/12/2005 07:39:47
Messages: 4501
Offline
|
There is no secret trick I could post here that would make everything fine. It is just a big mess for everybody trying to parse it. I could spend a day looking at the source code and post a summary, but I would prefer to use the time to refactor the code instead.
Marc
|
 |
|
|
 |
|
|