Messages posted by "barryhunter"

Re:country = 'UK' not returning anything.

barryhunter — GMT

The 'country code' for the UK is actually GB:

        select * from geoname where country = 'GB' order by random() LIMIT 1

Possibly there was some confusion when the standard was devised; GB is short for Great Britain, which is often confused with the United Kingdom. When the country top-level domains were defined, this mistake was undone and they switched to the more understandable .uk (.gb domains are not used).
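If users are likely to type 'UK', one option is to map it to 'GB' before it reaches the query. A minimal sketch, assuming a table shaped like the one in the query above; the helper names (ISO_ALIASES, normalise_country_code, random_place) are made up for illustration, and sqlite3 is only a stand-in for whatever database you actually use:

```python
import sqlite3

# Hypothetical alias table: commonly typed codes -> the code stored in the data.
ISO_ALIASES = {"UK": "GB"}

def normalise_country_code(code):
    """Upper-case the code and replace known aliases such as UK -> GB."""
    code = code.strip().upper()
    return ISO_ALIASES.get(code, code)

def random_place(conn, country):
    """Return the name of one random place in the given country, or None."""
    row = conn.execute(
        "SELECT name FROM geoname WHERE country = ? ORDER BY random() LIMIT 1",
        (normalise_country_code(country),),
    ).fetchone()
    return row[0] if row else None

# Tiny in-memory table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geoname (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO geoname VALUES (?, ?)",
    [("Tenterden", "GB"), ("Brussels", "BE")],
)
print(random_place(conn, "UK"))  # Tenterden (the only GB row)
```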

Re:Trying to lookup cities by country, but country is among results

barryhunter — GMT

You can post-filter by fcode (see http://www.geonames.org/export/codes.html) to remove the ones you are not interested in. Or you could perhaps use the cities parameter, to limit to top cities: http://www.geonames.org/export/geonames-search.html Or even the featureCode parameter, and then list the types of feature codes you do want. Probably the PPL* ones.
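The post-filter option could look something like this. It is only a sketch: it assumes each result is a dict with an "fcode" key (as I believe the JSON search output provides), and the function name and sample rows are invented for the example:

```python
def keep_by_fcode(results, prefixes=("PPL",)):
    """Keep only results whose feature code starts with one of the prefixes."""
    wanted = tuple(prefixes)
    return [r for r in results if r.get("fcode", "").startswith(wanted)]

# Illustrative rows only, not real service output.
results = [
    {"name": "Belgium", "fcode": "PCLI"},
    {"name": "Brussels", "fcode": "PPLC"},
    {"name": "Antwerp", "fcode": "PPLA"},
]
print([r["name"] for r in keep_by_fcode(results)])  # ['Brussels', 'Antwerp']
```

The country record (PCLI) drops out because its code does not start with PPL.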

Re:NGA GEOnet Names ID's in GeoNames?

barryhunter — GMT

Another bump for this. None of the above links work any more. :( I have an old gazetteer imported from NGA GEOnet. I would like to convert it to geonames.org. It still has the UFI & UNI ids stored, so it would be great to be able to use those for the conversion (rather than having to do some sort of imperfect fuzzy matching).

Re:Extract "geonames" from text

barryhunter — GMT

http://developer.yahoo.com/geo/placemaker/
http://developers.metacarta.com/product/metacarta-geotagger/
http://www.geonames.org/rss-to-georss-converter.html

Otherwise grow your own. You can get a fairly basic, but working, implementation by loading geonames placenames into a full-text search index (e.g. sphinxsearch.com) and querying against it.
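To give an idea of the "grow your own" route without setting up a search engine, here is a very rough sketch that uses a plain dictionary in place of the full-text index. The place list, ids and function names are all invented for the example; a real version would load the geonames dump and deal with ambiguity:

```python
import re

def build_index(places):
    """Map lower-cased place name -> id (ids here are made up)."""
    return {name.lower(): gid for gid, name in places}

def find_places(text, index, max_words=3):
    """Scan the text and return (name, id) for every known place name found."""
    words = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at this word first.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                found.append((phrase, index[phrase]))
                i += n
                break
        else:
            i += 1
    return found

index = build_index([(1, "Tenterden"), (2, "Great Britain"), (3, "Brussels")])
text = "We drove from Tenterden to Brussels, leaving Great Britain behind."
print(find_places(text, index))
```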

Re:Sorting geocoded place name results in own server?

barryhunter — GMT

At the most basic level, I think it just sorts by population descending. I.e. in the absence of any other distinguishing features, it returns the biggest one first. But you can also combine that with a sort by the feature code.
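On your own server the basic version is a one-liner. A sketch only, assuming each candidate row carries a "population" value (possibly missing); the numbers below are placeholders, not real figures, and I am not claiming this is exactly what geonames does:

```python
def rank_candidates(candidates):
    """Sort candidates so the most populous comes first; missing counts as 0."""
    return sorted(candidates, key=lambda c: c.get("population") or 0, reverse=True)

# Placeholder data for illustration.
candidates = [
    {"name": "London", "country": "CA", "population": 300000},
    {"name": "London", "country": "GB", "population": 7000000},
    {"name": "London", "country": "US", "population": None},
]
print([c["country"] for c in rank_candidates(candidates)])  # ['GB', 'CA', 'US']
```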

Re:Search with LIKE --> very very slow

barryhunter — GMT

You should look at full-text search indexes. I have no idea if Oracle offers such a thing, but I would be very surprised if they don't. If not, look at something like sphinxsearch.com, which works great with geonames data. It shouldn't be too difficult to hook up to an Oracle database, either using ODBC (http://sphinxsearch.com/docs/current.html#conf-odbc-dsn) or xmlpipe.

Re:Is geonames webservice is currently broken

barryhunter — GMT

Btw see these threads:
http://groups.google.com/group/geonames/browse_thread/thread/b8e0917b13240264?hl=en
http://groups.google.com/group/geonames/browse_thread/thread/775e8b6c518b8bcb?hl=en

Re:Is geonames webservice is currently broken

barryhunter — GMT

Because of a problem with an iPhone application apparently causing issues, the webservices have been moved temporarily to ws5.geonames.org, e.g. http://ws5.geonames.org/neighbourhood?lat=40.78343&lng=-73.96625 BUT it's very suspicious that ws.geonames.org redirects to the iPhone app's webpage! The only reason I can think of is that it's a retaliatory effort, to send all the traffic that is currently hitting the geonames servers to overwhelm imob's webservers instead.

How to build a "Suggest" search

barryhunter — GMT

Re the note on using Sphinx: it was probably my misinformation, but you don't actually need to create that column; Sphinx can do it transparently at the indexing stage.

What to include... I wondered about this. In the end I'm not sure you can leave anything out; you just want them at very low priority so they come last :)

Order by... I've done basically that in a prototype I am building, but didn't think about the population (although now I think about it, I think geonames does that!) ... what I did was assign a sort order (manually) to each feature code, and sort on that. (You can either add it as another column, or just join with a table of feature codes.)
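Roughly what I mean by a manual sort order per feature code, as a sketch. The rank numbers, the default for unlisted codes and the sample rows are all made up; population is only used here to break ties within the same rank:

```python
# Hand-assigned priorities (hypothetical values): lower sorts first.
FEATURE_RANK = {"PPLC": 0, "PPLA": 1, "PPL": 2, "ADM1": 3}
DEFAULT_RANK = 99  # anything unlisted is kept, but comes last

def sort_by_feature(rows):
    """Order rows by feature-code rank, then by population descending."""
    return sorted(
        rows,
        key=lambda r: (FEATURE_RANK.get(r["fcode"], DEFAULT_RANK),
                       -(r.get("population") or 0)),
    )

rows = [
    {"name": "Some Stream", "fcode": "STM", "population": 0},
    {"name": "Small Town", "fcode": "PPL", "population": 8000},
    {"name": "Big Town", "fcode": "PPL", "population": 90000},
    {"name": "Capital", "fcode": "PPLC", "population": 50000},
]
print([r["name"] for r in sort_by_feature(rows)])
# ['Capital', 'Big Town', 'Small Town', 'Some Stream']
```

In SQL the same thing would be a join against a small feature-code table with a rank column, then ORDER BY on that.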

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

Despite what I said before :oops: , Sphinx does have inbuilt functionality to do indexing of part words. It's not documented, but in the config file there is min_prefix_len:

"if prefix length is positive, indexer will not only index all words, but all the possible prefixes (ie. word beginnings) as well"

and the minimum infix length (min_infix_len):

"if infix length is positive, indexer will not only index all words, but all the possible infixes (ie. characters subsequences starting anywhere inside the word) as well"

OK, so they do exactly the same thing, but you don't have to maintain a separate column :)

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

> Is there no way in Sphinx to search for partial words?

Nope, it doesn't actually store the words but rather CRC number hashes for each word, which is why it can be so quick.

> found it it's hard to install on mac osx for a command line novice like me

Ditto - unfortunately it doesn't seem to have many packages available, so it has to be compiled from source.

Btw I should make clear that Sphinx is just the one I use (and love), but there are other full-text engines like Lucene which will probably install more easily (not sure if they are easier to use), or you might find the inbuilt MySQL one fast enough for you.
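On the point about word hashes: a toy illustration of why an index keyed on a hash of each whole word cannot answer a partial-word query. zlib.crc32 is just a convenient stand-in here; I am not claiming it is the exact hash Sphinx uses, and the two documents are invented:

```python
import zlib

def word_key(word):
    """Hash a whole word; the index stores this number, not the word."""
    return zlib.crc32(word.lower().encode("utf-8"))

# Build a tiny hash -> document ids index.
index = {}
for doc_id, name in [(1, "Brussels"), (2, "Tenterden")]:
    for word in name.split():
        index.setdefault(word_key(word), set()).add(doc_id)

def lookup(word):
    """Return the ids of documents containing exactly this word."""
    return index.get(word_key(word), set())

print(lookup("brussels"))  # {1}
print(lookup("bruss"))     # set(): the hash of a fragment matches nothing
```

A lookup is one hash plus one dictory-style probe, which is where the speed comes from, but a fragment of a word hashes to an unrelated number.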

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

Yes, that is almost certainly possible (and is almost what I do). I was going to post this here: http://www.nearby.org.uk/geonames/text-service.php?query=Brussel+BE as I now have the sphinx index on an online server, so you can see how quick it is. As it happens I was putting together the script I use for autocomplete, via sphinx. The trick I use is to build an 'autocomplete' secondary index column, so Code:

        name=Brusselse Voorstad
        autocomplete=brusselse brussels brussel brusse bruss brus bru br b voorstad voorsta voorst voors voor voo vo v

What this does is give the sphinx index words to find (as it can only match whole words). (Note however that "Bruxe" won't work on the above service yet, as I have only built the autocomplete index for PPLs; it's far from a usable service yet.)

Also note that by also indexing the country_code column you can just add that as a search term to restrict to that country. (This can be done transparently in your code.)

"If my user enters "Bruxe..." he shouldn't see "Brussels" in the list below, and he also shouldn't see the full comma-separated list of alternames but only the altername he's probably going to be entering: "Bruxelles"."

This is a case I hadn't considered, but it can be easily rectified with a bit of post-processing (in PHP etc.), where you detect that the query doesn't match the name field, and just select the appropriate one from the alternames and show that instead (as though it was the main name).
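Both pieces are simple enough to sketch. This is an illustration in Python rather than my actual script; the function names are invented, and it assumes alternames is a comma-separated string as in the quote above:

```python
def autocomplete_terms(name):
    """Build the 'autocomplete' column: every prefix of every word, longest first."""
    terms = []
    for word in name.lower().split():
        terms.extend(word[:n] for n in range(len(word), 0, -1))
    return " ".join(terms)

def display_name(query, name, alternames):
    """Pick the name to show: the main name if it matches what was typed,
    otherwise the first altername that does, falling back to the main name."""
    q = query.strip().lower()
    if name.lower().startswith(q):
        return name
    for alt in alternames.split(","):
        alt = alt.strip()
        if alt.lower().startswith(q):
            return alt
    return name

print(autocomplete_terms("Brusselse Voorstad"))
print(display_name("Bruxe", "Brussels", "Brussel,Bruxelles"))  # Bruxelles
```

The first call reproduces the autocomplete value shown in the example above; the second covers the "Bruxe..." case from the quote.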

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

I managed to create two separate indexes:

        PRIMARY KEY (`geonameid`),
        KEY `country_code` (`country_code`,`feature_code`),
        KEY `name_2` (`name`),
        KEY `alternames` (`alternames`),
        FULLTEXT KEY `name` (`name`,`alternames`)
        ) ENGINE=MyISAM DEFAULT CHARSET=utf8

(The one on alternames took over 1 hour and 30 minutes; even the full-text index only took 15 minutes.) But Code:

mysql> explain SELECT * FROM `geonames` use key (`alternames`) WHERE `alternames` LIKE '%tenterden%' ;
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | geonames | ALL  | NULL          | NULL | NULL    | NULL | 6605140 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> explain SELECT * FROM `geonames` use key (`alternames`) WHERE `alternames` LIKE 'tenterden%' ;
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | geonames | range | alternames    | alternames | 602     | NULL |    1 | Using where |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.05 sec)

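So it seems the alternames index is useless for a LIKE with a leading wildcard: the first query falls back to a full table scan (type ALL, 6605140 rows), and only the prefix form 'tenterden%' can use the index. So a union still takes 19.07 sec, as the alternames half still takes forever.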
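The difference between the two EXPLAIN results can be shown with a sorted list standing in for the index. This is an analogy only, not how MySQL is implemented, and the names are sample data: a prefix search can jump straight to the right place in sorted order, whereas a substring search has to look at every entry.

```python
import bisect

# A sorted list plays the role of the index on the column.
names = sorted(["ashford", "new tenterden", "tenterden",
                "tenterden station", "tonbridge"])

def prefix_matches(sorted_names, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(sorted_names, prefix)
    out = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        out.append(name)
    return out

def substring_matches(all_names, needle):
    """LIKE '%needle%': sort order is no help, so every entry is checked."""
    return [n for n in all_names if needle in n]

print(prefix_matches(names, "tenterden"))     # ['tenterden', 'tenterden station']
print(substring_matches(names, "tenterden"))  # also finds 'new tenterden'
```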

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

Some benchmarks (for some reason I couldn't get a standard index to work, so missed that test out). These tests are all on the same machine. Only one time is shown for each, but I ran each test a number of times and the numbers were in the same ballpark. This is on a table with the global geonames database loaded.

================================================
No Index
--------
SELECT * FROM `geonames` WHERE `name` = 'tenterden' OR `alternames` LIKE '%tenterden%';

Showing rows 0 - 2 (3 total, Query took 17.9760 sec)

================================================
ADD INDEX `name` ( `name` , `alternames` );

MySQL said: #1071 - Specified key was too long; max key length is 1000 bytes

No idea why that won't work... so no test with index...

================================================
Full Text Index
--------
SQL query: SELECT * FROM `geonames` WHERE MATCH ( name, alternames ) AGAINST ( 'tenterden' );

Showing rows 0 - 3 (4 total, Query took 0.1028 sec)

================================================
Sphinx
--------
time search --index geonames tenterden

Sphinx 0.9.7
index 'geonames': query 'tenterden ': returned 4 matches of 4 total in 0.000 sec
words:
1. 'tenterden': 4 documents, 7 hits

real 0m0.076s
user 0m0.000s
sys 0m0.056s

(The difference between the time reported by the sphinx search and the overall time is because the overall time includes having to look up the details in mysql, as sphinx only returns the ID. Doing it as 4 separate mysql queries takes 0.0102 sec, but in a real application you would do IN (id1,id2) etc., which took 0.0003 sec.)

So overall in a real-world application you could expect 0.001 seconds :)
================================================
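The "IN (id1,id2)" lookup mentioned in the Sphinx result could be sketched like this. sqlite3 is used only so the example runs on its own (with a MySQL driver the placeholder would typically be %s rather than ?), and fetch_by_ids plus the sample rows are invented for the example:

```python
import sqlite3

def fetch_by_ids(conn, ids):
    """Fetch all rows for the ids returned by the search engine in one query,
    then put them back into the engine's ranking order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = ("SELECT geonameid, name FROM geonames "
           f"WHERE geonameid IN ({placeholders})")
    rows = {row[0]: row for row in conn.execute(sql, list(ids))}
    return [rows[i] for i in ids if i in rows]

# Stand-in table with made-up ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geonames (geonameid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO geonames VALUES (?, ?)",
                 [(1, "Tenterden"), (2, "Brussels"), (3, "Ashford")])
print(fetch_by_ids(conn, [3, 1]))  # [(3, 'Ashford'), (1, 'Tenterden')]
```

The reordering step matters because IN does not preserve the order the ids were given in.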

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

> But isn't a full text search much slower than using LIKE?

Not usually; mostly full text is quicker, especially when checking multiple columns (MySQL doesn't optimise OR queries very well in my experience). A single column with a normal index should be quicker with LIKE, esp. if it is LIKE 'bla%', as it can optimise the use of the index. But using a dedicated engine like Sphinx will be MUCH MUCH MUCH quicker than either LIKE or even plain =, even on single columns.

Edited to add: the disadvantage of full text is that updates are slower (esp. with a separate engine :)), but with something like a gazetteer that rarely changes that is not such an issue.

Re:Best/fastest way to search geonames table (mysql query)

barryhunter — GMT

Personally I would recommend using a full-text search. MySQL has full-text functionality built in, which should work: http://dev.mysql.com/doc/en/Fulltext_Search.html (you can create an index on multiple columns, so the name and alternate name can be in one index). However I would also recommend http://www.sphinxsearch.com/ which is a very nice full-text engine; in other projects I have found it _much_ quicker, as well as returning more relevant results than the built-in MySQL one. (I've never tried it on the geonames gazetteer, but have on others.) It's not trivial to set up, and comes with some overhead, but I would say the effort is worth it. (I believe that geonames uses the Lucene full-text engine, which evidently works too!)

Re:rss to georss recognizes coordinates?

barryhunter — GMT

Do you have the option to add tags? If so, you could use the following tags:

        geo:lat=51.4989
        geo:lon=-0.1786

These are fairly standard, and there is no reason geonames can't easily parse them.
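To show how little parsing those tags need, here is a sketch. It is not anything geonames actually runs; the function name and the range check are my own additions:

```python
import re

# Matches exactly one tag of the form geo:lat=<number> or geo:lon=<number>.
GEO_TAG = re.compile(r"geo:(lat|lon)=(-?\d+(?:\.\d+)?)")

def parse_geo_tags(tags):
    """Return (lat, lon) from a list of tags, or None if either is missing
    or out of range."""
    coords = {}
    for tag in tags:
        m = GEO_TAG.fullmatch(tag.strip())
        if m:
            coords[m.group(1)] = float(m.group(2))
    if "lat" not in coords or "lon" not in coords:
        return None
    if not (-90 <= coords["lat"] <= 90 and -180 <= coords["lon"] <= 180):
        return None
    return coords["lat"], coords["lon"]

print(parse_geo_tags(["london", "geo:lat=51.4989", "geo:lon=-0.1786"]))
# (51.4989, -0.1786)
```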

Re:rss to georss recognizes coordinates?

barryhunter — GMT

One thing that strikes me here is that this is what Microformats should be for: http://microformats.org/wiki/geo ... so encourage the blog creator to use microformats to tag the locations (if they are not already using GeoRSS, which this tool does maintain :) and then geonames only has to support parsing out a microformat.

More: http://highearthorbit.com/a-proposal-georss-kml/ (unifying the various standards)

... alas I don't have any feeds to test this on :(

... although the geonames rss parser does have a 'place in the market' for taking arbitrary text and marking it up to standards (e.g. surrounding identified coordinates with the microformat for other services to consume).

(Marc, I'll forward you some regexps I have for identifying arbitrary coordinates in free text, which I have used in a prototype geographic search engine - alas before it was recently patented :)

Re:Database Dump Postals

barryhunter — GMT

Have you seen the postcodes forum? http://forum.geonames.org/gforum/forums/show/7.page

Map and Hybrid layers in Google Map?

barryhunter — GMT

Great information! It's true for geonames, which is primarily a gazetteer, that it's not really a direct copy of the map data (which is a special case of a database, hence under Database Right); as you say, the data will not exactly match and is subject to artistic interpretation.

The two angles that I am taking from this, I guess, would be cases of taking a more direct 'copy' of the map. The openstreetmap project would be taking a copy of the map data itself. And in the case of geograph, we are picking features at a high resolution that, for example, would exactly match the underlying data, to eventually be able to recreate a sizeable portion of the map data (i.e. capture road junctions, roads, canals, churches, basically all map features).