We are happy to announce our project at the University of Alberta on improving and enriching a gazetteer. Here is a summary of the project:
1. We have developed strategies for accurately detecting the geographical scope of places. The geographical scope of each place is derived as a bounding box, based on information in a gazetteer (e.g. Geonames) and public data such as the areas of places listed in Wikipedia. In our evaluation, the accuracy of our bounding boxes is close to that of the boxes from Geonames and exceeds that of the boxes available from OpenStreetMap. So far we have constructed bounding boxes for a little over 93,000 locations in Geonames, many of which have no bounding box in Geonames itself. We think this data can help the many geotagging services that rely on Geonames, and we are happy to share it with the public (please see the Bitbucket page). Please drop us a line if you use this data in your project.
2. We have noticed that Geonames often maintains a shallow directory hierarchy, with states or provinces placed under countries and everything else placed directly under states or provinces. This often has to do with the variations in administrative divisions that are available to Geonames. For example, the University of Alberta is placed under Alberta instead of Edmonton. Based on the bounding boxes obtained in (1), we have found that over 2 million places in Geonames can be moved deeper in the hierarchy, and those moves are correct over 90% of the time.
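To give a rough idea of how bounding boxes can drive the moves described in (2), here is a minimal Python sketch. It is not the method from the paper. It assumes a square box derived from a place's centre point and area, ignores the poles and the antimeridian, and simply picks the smallest box that contains a place as its new parent. The names `BBox`, `bbox_from_area` and `deepest_parent`, as well as all coordinates and areas, are illustrative assumptions rather than values from our data.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Optional

# Approximate length of one degree of latitude, in kilometres.
KM_PER_DEG_LAT = 111.32


@dataclass(frozen=True)
class BBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def area_deg2(self) -> float:
        # Area in squared degrees; only used to rank boxes by size.
        return (self.north - self.south) * (self.east - self.west)


def bbox_from_area(lat: float, lon: float, area_km2: float) -> BBox:
    """Square box of the given area centred on (lat, lon).

    Assumption: the place is roughly square and far from the poles
    and the antimeridian.
    """
    half_side_km = math.sqrt(area_km2) / 2
    dlat = half_side_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return BBox(lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def deepest_parent(lat: float, lon: float,
                   candidates: Mapping[str, BBox]) -> Optional[str]:
    """Name of the smallest candidate box containing the point, if any."""
    containing = [(box.area_deg2(), name)
                  for name, box in candidates.items()
                  if box.contains(lat, lon)]
    return min(containing)[1] if containing else None


# Rough, illustrative values only.
boxes = {
    "Alberta": BBox(south=49.0, west=-120.0, north=60.0, east=-110.0),
    "Edmonton": bbox_from_area(53.5461, -113.4938, area_km2=700.0),
}

# A point near the University of Alberta campus.
parent = deepest_parent(53.5232, -113.5263, boxes)
print(parent)
```

With these illustrative values, the point falls inside both boxes, and the smaller Edmonton box is chosen over the Alberta box, which mirrors the University of Alberta example above.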
For more information about the project (including the strategies and evaluation results), please consult the following paper. This is also the paper to cite for our data, algorithms and software.
Sanket Kumar Singh, Davood Rafiei, "Strategies for Geographical Scoping and Improving a Gazetteer", Proc. of the Web (former WWW) Conference, 2018 (to appear).
Sanket Kumar Singh (firstname.lastname@example.org)
Davood Rafiei (email@example.com)