GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Best/fastest way to search geonames table (mysql query)  XML
Forum Index -> General
Author Message
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

I used the GeoNames information to fill my mysql database with the LAT/LONG information of all European countries. Now I need a php function that finds the LAT/LONG information of a city based on whatever a user inputs in a search field on my website.

What's the best query to do this? I could of course use:

SELECT latitude, longitude
FROM geonames
WHERE name = '$SEARCH_STRING'

But I want to search in the "alternames" fields as well. Should I use

SELECT latitude, longitude
FROM geonames
WHERE (name = '$SEARCH_STRING') OR (alternames LIKE '%$SEARCH_STRING%')

Or is there a faster way?
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

Personally I would recommend using a Full Text Search.

Mysql has full text functionality built in, which should work.
http://dev.mysql.com/doc/en/Fulltext_Search.html
(you can create a index on multiple columns, so the name and alternate name can be in one index)

However I would also recommend,
http://www.sphinxsearch.com/
which is a very nice full text engine, in other projects found it _much_ quicker as well as returning more relevent results thatn the built in mysql one. (I've never tried it on the geonames gazetteer, but have on others)

Its not trivial to setup, and comes with some overhead, but I would say the effort is worth it.

(I beleive that geonames use the Lucene fulltext engine, which evidently works too!)

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

Thanks for the info. I'll check that out.
But isn't a full text search much slower than using LIKE?
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

> But isn't a full text search much slower than using LIKE?

Not usually, mostly Full Text is quicker, espically when checking multiple columns. (mysql doesnt optimise OR queries very well in my experience)

A single column, with a normal index, should be quicker with LIKE, esp if LIKE 'bla%' as it can optimise the use of the index.

But using a dedicated engine like sphinx will be MUCH MUCH MUCH quicker then either Like, or even plain =, even on single columns.

Edited to add: the disanvantage of full text is updates are slower, (esp with a seperate engine ), but with something like a gazetteer that rarely changes not such an issue.

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

Hey that's really interesting information. Thanks!
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

Some benchmarks, (for some reason couldnt get a standard index to work so missed that test out)

These tests are all on the same machine, its only one time shown, but did each test a number of times and the numbers in same ballpark. This is on a table with the global geonames database loaded.

================================================

No Index
--------

SELECT *
FROM `geonames`
WHERE `name` = 'tenterden'
OR `alternames` LIKE '%tenterden%';

Showing rows 0 - 2 (3 total, Query took 17.9760 sec)

================================================

ADD INDEX `name` ( `name` , `alternames` );

MySQL said: #1071 - Specified key was too long; max key length is 1000 bytes

No idea why that wont work... so no test with index...

================================================

Full Text Index
--------

SQL query: SELECT *
FROM `geonames`
WHERE MATCH (
name, alternames
)
AGAINST (
'tenterden'
);

Showing rows 0 - 3 (4 total, Query took 0.1028 sec)

================================================

Sphinx
--------

time search --index geonames tenterden
Sphinx 0.9.7
index 'geonames': query 'tenterden ': returned 4 matches of 4 total in 0.000 sec

<SNIP RESULTS>

words:
1. 'tenterden': 4 documents, 7 hits


real 0m0.076s
user 0m0.000s
sys 0m0.056s

(the difference in time between that reported by sphinx search and the overall, is because the overall includes having to lookup the details in the mysql - as sphinx only returns ID.

(doing it as 4 seperate mysql queries takes 0.0102 sec, but in a real application would do in (id1,id2) etc which took 0.0003 sec)

So overall in a realworld application coould expect 0.001 seconds

================================================

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
rtaylor



Joined: 07/01/2008 16:15:19
Messages: 3
Offline

Have you tested a "UNION ALL" statement to see if it is any quicker than the "OR" statement? It may or may not be, but it is worth checking...



[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

crazy benchmarks! sphinx seems the way to go!
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

I managed to create two seperate indexes

PRIMARY KEY (`geonameid`),
KEY `country_code` (`country_code`,`feature_code`),
KEY `name_2` (`name`),
KEY `alternames` (`alternames`),
FULLTEXT KEY `name` (`name`,`alternames`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |


(the one on alternames took over 1 hour and 30 minutes - even the Full text index only took 15 minutes)

But
Code:
 mysql> explain SELECT * FROM `geonames` use key (`alternames`) WHERE `alternames` LIKE '%tenterden%' ;
 +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
 | id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
 +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
 |  1 | SIMPLE      | geonames | ALL  | NULL          | NULL | NULL    | NULL | 6605140 | Using where |
 +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
 1 row in set (0.00 sec)
 
 mysql> explain SELECT * FROM `geonames` use key (`alternames`) WHERE `alternames` LIKE 'tenterden%' ;
 +----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
 | id | select_type | table    | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
 +----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
 |  1 | SIMPLE      | geonames | range | alternames    | alternames | 602     | NULL |    1 | Using where |
 +----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
 1 row in set (0.05 sec)
 

so it seems the altername index is useless. So a union still takes (19.07 sec) as the alternames half still takes forever.

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

On my website I want the user to select a country e.g. Belgium. Then he needs to enter a city name in a search field. I plan to use an ajax script to search the geonames database while the user enters text so it can "suggest" a list of possible city names the user wants to enter.

I'm sure you've seen this before, kind of like what happens on Google Suggest at http://www.google.com/webhp?complete=1&hl=en

I guess that's perfectly possible using the full-text search. However, I'm not sure about the alternames field. As it has comma separated values I want to display only the perfect match of that list. Consider the database entry for Brussels. It has many alternames such as Bruxelles, Brussel, Bruselas and so on. If my user enters "Bruxe..." he shouldn't see "Brussels" in the list below, and he also shouldn't see the full comma-separated list of alternames but only the altername he's probably going to be entering: "Bruxelles".

Is there a way to do this using sphinx?
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

Yes that is almost certainly possible (and is almost what I do)

I was going to post this here:
http://www.nearby.org.uk/geonames/text-service.php?query=Brussel+BE
to as I now have the sphinx index on a online server, so you can see how quick it is.

As it happens I was putting the script I use for autocomplete, via sphinx.

The trick I use, is to build a 'autocomplete' secondary index column,

so Code:
        name=Brusselse Voorstad
         autocomplete=brusselse brussels brussel brusse bruss brus bru br b voorstad voorsta voorst voors voor voo vo v


what this does is gives the sphinx index words to find (as it can only match whole words)

(note however that "Bruxe" wont work on the above service yet as only built the autocompelte index for PPLs)
- its far from useable service yet

Also note by also indexing the country_code column can also just add that as a search term to restrict to that country. (this can be done transparently in your code)

"If my user enters "Bruxe..." he shouldn't see "Brussels" in the list below, and he also shouldn't see the full comma-separated list of alternames but only the altername he's probably going to be entering: "Bruxelles"."

This is case I hadnt considered, but it can be easily rectified with a bit of post processing (in php etc), where you detect the query doesnt match the name field, and just select the apprirate one from ther alternames and show that instead (as though it was the main name)

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

Hmmm... I don't really like the idea of this extra column. It feels like double information and looks harder to maintain. Is there no way in Sphinx to search for partial words? Similar to what LIKE does with the "%" syntax?

(I was about to try sphinx but found it it's hard to install on mac osx for a command line novice like me)
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

> Is there no way in Sphinx to search for partial words?
Nope, it doesnt actully store the words but rather CRC number hashes for each word - why it can be so quick.

> found it it's hard to install on mac osx for a command line novice like me
dito - unfortuntly it doesnt seem to have many packages available, so has to be compiled from source.

btw I should make clear that sphinx just one I use (and love) but there are other full text engines like Lucene which will probably install easier (but not sure if easier to use) , or you might find the inbuilt MySQL one fast enough for you.

--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

Ok thanks. If it wasn't for the partial words, it would of course be perfect. Lucene doesn't seem that fast compared to Sphinx as far as I read on the web. Might try to re-think my application so it fits Sphinx.
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

I was thinking of creating a new table for every country. I could then possible use the sql LIKE "searchstring%" method to find the possible matching cities. The advantage is that mysql only needs to look through the country database.

Do you think that'll be quick enough as there's no Sphinx involved?
rakeshxp



Joined: 10/01/2008 10:57:34
Messages: 7
Offline

I am using Lucene spellchecker for giving suggestions.

I created 2 indices,
a) The asciiname index
b) spell checker index

Given any place, I look up the first index ( example "london*" ) and if no result is found, I fall back checking the second index.

The only drawback of falling to the second index is that, lucene spellchecker, would just return an array of strings representing the possible corrections and hence if you need any additional data, you would need to query the first index again to get additional field data

-Rakesh
barryhunter


[Avatar]

Joined: 13/09/2006 21:25:40
Messages: 27
Offline

Despite what I said before , Sphinx does have inbuilt functionality to to do indexing of part words.

Its not documented, but in the config file there is min_prefix_len
"if prefix length is positive, indexer will not only index all words,
but all the possible prefixes (ie. word beginnings) as well"

and minimum infix length
"if infix length is positive, indexer will not only index all words,
but all the possible infixes (ie. characters subsequences starting
anywhere inside the word) as well"

ok so they do exactly the same but you dont have to maintain a seperate column



--
- www.nearby.org.uk - www.geograph.org.uk -
[WWW]
bytte



Joined: 07/01/2008 17:56:11
Messages: 15
Offline

This is awesome!
mhuggins



Joined: 20/06/2008 00:12:34
Messages: 18
Location: Downingtown, PA
Offline

Can someone explain in a sentence or two why something like Sphinx or Lucene is needed? How come fulltext searches can't be performed quickly with MySQL's MATCH...AGAINST or something similar? Are these two options quicker generally, and if so, what kind of SQL do they use that I can't just implement it directly into my own query?
[WWW]
Hiren



Joined: 06/10/2008 08:22:36
Messages: 6
Offline

Hello All

I want to take the city name from my site user and want to give the suggestions to them that there are these many cities of this name. Site user will select on the base of the suggestions.

To achieve this, what can be the proper way?

Should I call the geonames webservices with full text search? I want to search the city for the entire world?

Or should I use the DB? If the DB is proper then from where I can get the dump for same?

Kindly help me out.

Thanks
 
Forum Index -> General
Go to:   
Powered by JForum 2.1.5 © JForum Team