I think it would be great to make the service more generic by letting it geolocate any document (HTML, for example), and not only RSS feeds...
If this is not in your plans... do you plan to open up the RSS-to-GeoRSS inference backend?
I'd be glad to collaborate, but I don't know what I could do.
Could you please elaborate on the tasks I can help with?
Thank you very much,
guido
PS. Does the natural language code use a third-party Java library? Could you please tell me a bit more about the algorithm you are using?
Geocoding web pages raises a lot of new problems compared to geocoding natural language text. Location information is often encoded in tabular address formats, so you could first look at how addresses can be identified on a web page (microformats, for example). The next problem is the visual proximity of address elements. In natural language geocoding you have one long string of text, whereas on a web page there are often many other elements between the address elements. The geocoder will have to detect this proximity.
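A minimal sketch of the microformat idea, assuming the page marks up addresses with the hCard `adr` class names (`street-address`, `locality`, `region`, `postal-code`, `country-name`) and that the HTML is well nested. The `AdrExtractor` class and `extract_addresses` function are hypothetical names used only for illustration; this uses Python's standard `html.parser` and is not the code of the service discussed here.

```python
from html.parser import HTMLParser

# Property class names defined by the hCard "adr" microformat.
ADR_FIELDS = {"street-address", "locality", "region", "postal-code", "country-name"}
# Tags without a closing tag; they are skipped so the open-tag stack stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class AdrExtractor(HTMLParser):
    """Collects one dict per element marked with class="adr" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.addresses = []   # finished addresses
        self._stack = []      # one marker per open tag: "adr", a field name, or None
        self._current = None  # address currently being built

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        marker = None
        if "adr" in classes and self._current is None:
            self._current = {}
            marker = "adr"
        elif self._current is not None:
            # Remember which address field, if any, this element carries.
            marker = next(iter(classes & ADR_FIELDS), None)
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        if self._stack.pop() == "adr":
            self.addresses.append(self._current)
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or not text:
            return
        # Attach the text to the innermost open address field, if there is one.
        for marker in reversed(self._stack):
            if marker in ADR_FIELDS:
                joined = (self._current.get(marker, "") + " " + text).strip()
                self._current[marker] = joined
                break
            if marker == "adr":
                break


def extract_addresses(html):
    parser = AdrExtractor()
    parser.feed(html)
    return parser.addresses
```

Text that sits between the marked-up fields (commas, labels, unrelated elements) is ignored, which is one simple way to cope with address parts being spread over several elements. Pages without microformats would need a different detection strategy.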
We are not using any third-party library for natural language geocoding. In the first step we identify the part of speech of all elements, in the second step we search for matching place names, and in the third step we pick the best places from the candidates, depending on the proximity of other places in the same text and some grammatical rules.
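The three steps could be sketched roughly as follows. This is only an illustration under loud assumptions: a capitalisation check stands in for real part-of-speech tagging, `GAZETTEER` is a tiny hypothetical place list with approximate coordinates, the proximity rule simply prefers the candidate closest to the other places mentioned, and the grammatical rules are left out entirely. It is not the service's actual algorithm.

```python
import math
import re

# Toy gazetteer with approximate coordinates (illustrative data only).
GAZETTEER = {
    "Paris": [("Paris, France", 48.86, 2.35), ("Paris, Texas", 33.66, -95.56)],
    "Lyon": [("Lyon, France", 45.76, 4.84)],
    "Dallas": [("Dallas, Texas", 32.78, -96.80)],
}


def proper_noun_candidates(text):
    """Step 1 stand-in: treat capitalised words as proper nouns."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def distance_km(a, b):
    """Great-circle (haversine) distance between two (name, lat, lon) entries."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[1], a[2], b[1], b[2]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def geocode(text):
    # Step 2: keep only the candidates that match a gazetteer entry.
    words = dict.fromkeys(proper_noun_candidates(text))  # de-duplicate, keep order
    names = [w for w in words if w in GAZETTEER]

    resolved = {}
    for name in names:
        others = [GAZETTEER[o] for o in names if o != name]

        # Step 3: cost of a candidate is its total distance to the nearest
        # candidate of every other place name found in the same text.
        def cost(candidate):
            return sum(min(distance_km(candidate, c) for c in group)
                       for group in others)

        resolved[name] = min(GAZETTEER[name], key=cost)[0]
    return resolved
```

With this toy data, "Paris" resolves to the Texas entry when "Dallas" appears in the same text and to the French entry when "Lyon" does. When no other place is mentioned, every candidate has the same cost and the first gazetteer entry wins, which is where additional rules would have to take over.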