<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "Best/fastest way to search geonames table (mysql query)"]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/4.page</link>
		<description><![CDATA[Latest messages posted in the topic "Best/fastest way to search geonames table (mysql query)"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ I used the GeoNames information to fill my mysql database with the LAT/LONG information of all European countries. Now I need a php function that finds the LAT/LONG information of a city based on whatever a user inputs in a search field on my website.

What's the best query to do this? I could of course use:

SELECT latitude, longitude
FROM geonames
WHERE name = '$SEARCH_STRING'

But I want to search in the "alternames" fields as well. Should I use

SELECT latitude, longitude
FROM geonames
WHERE (name = '$SEARCH_STRING') OR (alternames LIKE '%$SEARCH_STRING%')

Or is there a faster way? ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3253</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3253</link>
				<pubDate><![CDATA[Mon, 7 Jan 2008 18:26:35]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Personally I would recommend using a Full Text Search. 

Mysql has full text functionality built in, which should work. 
http://dev.mysql.com/doc/en/Fulltext_Search.html
(you can create a index on multiple columns, so the name and alternate name can be in one index) 

However I would also recommend, 
http://www.sphinxsearch.com/
which is a very nice full text engine, in other projects found it _much_ quicker as well as returning more relevent results thatn the built in mysql one. (I've never tried it on the geonames gazetteer, but have on others) 

Its not trivial to setup, and comes with some overhead, but I would say the effort is worth it. 

(I beleive that geonames use the Lucene fulltext engine, which evidently works too!) 
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3255</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3255</link>
				<pubDate><![CDATA[Mon, 7 Jan 2008 19:38:26]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Thanks for the info. I'll check that out.
But isn't a full text search much slower than using LIKE?]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3256</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3256</link>
				<pubDate><![CDATA[Mon, 7 Jan 2008 20:51:21]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ > But isn't a full text search much slower than using LIKE?

Not usually, mostly Full Text is quicker, espically when checking multiple columns. (mysql doesnt optimise OR queries very well in my experience) 

 A single column, with a normal index, should be quicker with LIKE, esp if LIKE 'bla%' as it can optimise the use of the index. 

But using a dedicated engine like sphinx will be MUCH MUCH MUCH quicker then either Like, or even plain =, even on single columns. 

Edited to add: the disanvantage of full text is updates are slower, (esp with a seperate engine :)), but with something like a gazetteer that rarely changes not such an issue.
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3257</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3257</link>
				<pubDate><![CDATA[Mon, 7 Jan 2008 21:04:57]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Hey that's really interesting information. Thanks!]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3259</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3259</link>
				<pubDate><![CDATA[Mon, 7 Jan 2008 22:15:57]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Some benchmarks, (for some reason couldnt get a standard index to work so missed that test out) 

These tests are all on the same machine, its only one time shown, but did each test a number of times and the numbers in same ballpark. This is on a table with the global geonames database loaded. 

================================================

No Index
--------

SELECT *
FROM `geonames`
WHERE `name` = 'tenterden'
OR `alternames` LIKE '%tenterden%';

Showing rows 0 - 2 (3 total, Query took 17.9760 sec)

================================================

ADD INDEX `name` ( `name` , `alternames` );

MySQL said: #1071 - Specified key was too long; max key length is 1000 bytes

No idea why that wont work... so no test with index...

================================================

Full Text Index
--------

SQL query: SELECT *
FROM `geonames`
WHERE MATCH (
name, alternames
)
AGAINST (
'tenterden'
);

Showing rows 0 - 3 (4 total, Query took 0.1028 sec)

================================================

Sphinx
--------

 time search --index geonames tenterden
Sphinx 0.9.7
index 'geonames': query 'tenterden ': returned 4 matches of 4 total in 0.000 sec

<SNIP RESULTS>

words:
1. 'tenterden': 4 documents, 7 hits


real    0m0.076s
user    0m0.000s
sys     0m0.056s

(the difference in time between that reported by sphinx search and the overall, is because the overall includes having to lookup the details in the mysql - as sphinx only returns ID. 

(doing it as 4 seperate mysql queries takes 0.0102 sec, but in a real application would do in (id1,id2) etc which took 0.0003 sec)

So overall in a realworld application coould expect 0.001 seconds :)

================================================
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3265</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3265</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 00:30:21]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Have you tested a "UNION ALL" statement to see if it is any quicker than the "OR" statement? It may or may not be, but it is worth checking...



]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3268</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3268</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 09:32:23]]> GMT</pubDate>
				<author><![CDATA[ rtaylor]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ crazy benchmarks! sphinx seems the way to go!]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3269</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3269</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 10:37:55]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ I managed to create two seperate indexes

  PRIMARY KEY  (`geonameid`),
  KEY `country_code` (`country_code`,`feature_code`),
  KEY `name_2` (`name`),
  KEY `alternames` (`alternames`),
  FULLTEXT KEY `name` (`name`,`alternames`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |


(the one on alternames took over 1 hour and 30 minutes - even the Full text index only took 15 minutes) 

But 
<span class="genmed"><b>Code:</b></span><br>
		<div style="overflow: auto; width: 100%;">
		<pre>
mysql&gt; explain SELECT * FROM `geonames` use key &#40;`alternames`&#41; WHERE `alternames` LIKE '%tenterden%' ;
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | geonames | ALL  | NULL          | NULL | NULL    | NULL | 6605140 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
1 row in set &#40;0.00 sec&#41;

mysql&gt; explain SELECT * FROM `geonames` use key &#40;`alternames`&#41; WHERE `alternames` LIKE 'tenterden%' ;
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | geonames | range | alternames    | alternames | 602     | NULL |    1 | Using where |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
1 row in set &#40;0.05 sec&#41;
</pre>
		</div>
so it seems the altername index is useless. So a union still takes (19.07 sec) as the alternames half still takes forever.  ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3270</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3270</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 12:43:50]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ On my website I want the user to select a country e.g. Belgium. Then he needs to enter a city name in a search field. I plan to use an ajax script to search the geonames database while the user enters text so it can "suggest" a list of possible city names the user wants to enter.

I'm sure you've seen this before, kind of like what happens on Google Suggest at http://www.google.com/webhp?complete=1&hl=en

I guess that's perfectly possible using the full-text search. However, I'm not sure about the alternames field. As it has comma separated values I want to display only the perfect match of that list. Consider the database entry for Brussels. It has many alternames such as Bruxelles, Brussel, Bruselas and so on. If my user enters "Bruxe..." he shouldn't see "Brussels" in the list below, and he also shouldn't see the full comma-separated list of alternames but only the altername he's probably going to be entering: "Bruxelles".

Is there a way to do this using sphinx?]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3271</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3271</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 13:01:09]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Yes that is almost certainly possible (and is almost what I do) 

I was going to post this here:
http://www.nearby.org.uk/geonames/text-service.php?query=Brussel+BE
to as I now have the sphinx index on a online server, so you can see how quick it is. 

As it happens I was putting the script I use for autocomplete, via sphinx. 

The trick I use, is to build a 'autocomplete' secondary index column, 

so <span class="genmed"><b>Code:</b></span><br>
		<div style="overflow: auto; width: 100%;">
		<pre>        name=Brusselse Voorstad
        autocomplete=brusselse brussels brussel brusse bruss brus bru br b voorstad voorsta voorst voors voor voo vo v</pre>
		</div>

what this does is gives the sphinx index words to find (as it can only match whole words) 

(note however that "Bruxe" wont work on the above service yet as only built the autocompelte index for PPLs)
- its far from useable service yet

Also note by also indexing the country_code column can also just add that as a search term to restrict to that country. (this can be done transparently in your code) 

"<i>If my user enters "Bruxe..." he shouldn't see "Brussels" in the list below, and he also shouldn't see the full comma-separated list of alternames but only the altername he's probably going to be entering: "Bruxelles".</i>"

This is case I hadnt considered, but it can be easily rectified with a bit of post processing (in php etc), where you detect the query doesnt match the name field, and just select the apprirate one from ther alternames and show that instead (as though it was the main name) ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3272</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3272</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 14:07:03]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Hmmm... I don't really like the idea of this extra column. It feels like double information and looks harder to maintain. Is there no way in Sphinx to search for partial words? Similar to what LIKE does with the "%" syntax?

(I was about to try sphinx but found it it's hard to install on mac osx for a command line novice like me)]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3273</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3273</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 15:23:42]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ > Is there no way in Sphinx to search for partial words?
Nope, it doesnt actully store the words but rather CRC number hashes for each word - why it can be so quick. 

> found it it's hard to install on mac osx for a command line novice like me
dito - unfortuntly it doesnt seem to have many packages available, so has to be compiled from source. 

btw I should make clear that sphinx just one I use (and love) but there are other full text engines like Lucene which will probably install easier (but not sure if easier to use) , or you might find the inbuilt MySQL one fast enough for you. ]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3274</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3274</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 15:32:58]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Ok thanks. If it wasn't for the partial words, it would of course be perfect. Lucene doesn't seem that fast compared to Sphinx as far as I read on the web. Might try to re-think my application so it fits Sphinx.]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3276</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3276</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 15:53:46]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ I was thinking of creating a new table for every country. I could then possible use the sql LIKE "searchstring%" method to find the possible matching cities. The advantage is that mysql only needs to look through the country database.

Do you think that'll be quick enough as there's no Sphinx involved?]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3282</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3282</link>
				<pubDate><![CDATA[Tue, 8 Jan 2008 21:03:52]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ I am using <a href="http://wiki.apache.org/jakarta-lucene/SpellChecker" target="_new" rel="nofollow">Lucene spellchecker</a>    for giving suggestions.

I created 2 indices, 
a) The asciiname index
b) spell checker index

Given any place, I look up the first index ( example "london*" ) and if no result is found, I fall back checking the second index.

The only drawback of falling to the second index is that, lucene spellchecker, would just return an array of strings representing the possible corrections and hence if you need any additional data, you would need to query the first index again to get additional field data

-Rakesh]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3326</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3326</link>
				<pubDate><![CDATA[Fri, 11 Jan 2008 17:34:57]]> GMT</pubDate>
				<author><![CDATA[ rakeshxp]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Despite what I said before  :oops: , Sphinx does have inbuilt functionality to to do indexing of part words. 

Its not documented, but in the config file there is min_prefix_len
"if prefix length is positive, indexer will not only index all words,
but all the possible prefixes (ie. word beginnings) as well"

and minimum infix length
"if infix length is positive, indexer will not only index all words,
but all the possible infixes (ie. characters subsequences starting
anywhere inside the word) as well"

ok so they do exactly the same but you dont have to maintain a seperate column :)


]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3388</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3388</link>
				<pubDate><![CDATA[Mon, 21 Jan 2008 21:10:37]]> GMT</pubDate>
				<author><![CDATA[ barryhunter]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ This is awesome!]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#3389</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#3389</link>
				<pubDate><![CDATA[Mon, 21 Jan 2008 22:05:49]]> GMT</pubDate>
				<author><![CDATA[ bytte]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Can someone explain in a sentence or two why something like Sphinx or Lucene is needed?  How come fulltext searches can't be performed quickly with MySQL's MATCH...AGAINST or something similar?  Are these two options quicker generally, and if so, what kind of SQL do they use that I can't just implement it directly into my own query?]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#4278</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#4278</link>
				<pubDate><![CDATA[Sun, 6 Jul 2008 02:06:31]]> GMT</pubDate>
				<author><![CDATA[ mhuggins]]></author>
			</item>
			<item>
				<title>Re:Best/fastest way to search geonames table (mysql query)</title>
				<description><![CDATA[ Hello All

I want to take the city name from my site user and want to give the suggestions to them that there are these many cities of this name. Site user will select on the base of the suggestions.

To achieve this, what can be the proper way?

Should I call the geonames webservices with full text search? I want to search the city for the entire world?

Or should I use the DB? If the DB is proper then from where I can get the dump for same?

Kindly help me out.

Thanks]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/722.page#4811</guid>
				<link>http://forum.geonames.org/gforum/posts/list/722.page#4811</link>
				<pubDate><![CDATA[Thu, 16 Oct 2008 08:29:16]]> GMT</pubDate>
				<author><![CDATA[ Hiren]]></author>
			</item>
	</channel>
</rss>