GeoNames Home | Postal Codes | Download / Webservice | About 

GeoNames Forum
  [Search] Search   [Recent Topics] Recent Topics   [Members]  Member Listing   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Speeding up lon lat Proximity / Nearest Places Search  XML
Forum Index -> General
Author Message
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Hi, firstly - cheers for such an excellent source of geo data!

I'm wondering if any of you geo data gurus can help me out?

I've downloaded the data and am doing proximity searches on it, finding all places within 100 kms for example of a lon lat. The only trouble is that this data takes about 2 mins to return results against a single lon lat. I'm wondering if anyone has any techniques to speed up the search results? Maybe some SQL WHERE clause I can apply to the lon lats in order to arrive at a subset of data to perform calculations on?

My simplified SQL (MS SQL Server 2005) is below:

Code:
 DECLARE @GeoNameID int
 SET @GeoNameID = 2656031
 
 DECLARE @EarthRadius decimal(10,3)
 SET @EarthRadius = 6378.137 -- km
 
 DECLARE @MaxDistance int
 SET @MaxDistance= 100
 
 DECLARE @Latitude decimal(23,20), @Longitude Decimal(23,20)
 SELECT @Latitude = Latitude, @Longitude = Longitude FROM GeoName WHERE GeoNameID = @GeoNameID
 
 SELECT geo.GeoNameID, 
 	ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 		(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 		 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) AS Distance
 FROM GeoName AS geo
 WHERE ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 	(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 	 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) <= @MaxDistance
 


I'm thinking the best way to increase query speed would be to reduce the number of calculations I need to perform by first defining a search box from which I can do a WHERE clause using lon & lat. If I'm looking for all matches within 100 kms I'd first calculate the latitude 100 kms north and south, and the longitude 100 kms east and west. Then the more complex calculation shown above would then be applied to the subset of geo records found within these boundaries. With an index applied to the longitude and latitude columns I'm guessing significant speed increases would be experienced as this should prevent me needing to perform calculations on every single location listed.

However I'm not sure how to calculate the differences in longitude and latitude from a given point for a certain distance. Can anyone help me out here and perhaps confirm my theory is correct?

Any pointers / suggestions would be greatly appreciated.

Cheers,
Gavin.

Gavin Harriss
Portfolio: www.gavinharriss.com
marc



Joined: 08/12/2005 07:39:47
Messages: 2183
Offline

Hi Gavin

I fear SQL Server does not have inbuilt spatial support.

There is a spatial extension for mssql 2005 :
http://www.codeplex.com/MsSqlSpatial

And a microsoft paper :
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/TblValFuncSQL.asp

The next version of mssql server will have geospatial support :
http://industry.slashgeo.org/article.pl?sid=07/05/11/1329253

In any case you probably can do without geospatial functions and use
simple indices on the lat and lng columns. Use a rectangular bounding
box for your search large enough to include all your records and filter
out the rest based on exact distance. As a rule of thumb you can assume
one degree is around 111km (=40.000/360).

Hope this helps

Marc

[WWW]
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Hi Marc, thanks for your advice - very helpful and very much appreciated.

Gavin Harriss
Portfolio: www.gavinharriss.com
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Hi Marc, my geo data queries are now super fast! However I do notice that significantly less records are found when I apply my rough lan long box criteria to the search. Just wondering if you have any idea why this might be? Am I making some obvious mistake?

My SQL with the lan long box applied:

Code:
 DECLARE @GeoNameID int
 SET @GeoNameID = 2656031
 DECLARE @MaxDistance int
 SET @MaxDistance = 50 -- km
 
 DECLARE @EarthRadius decimal(10,3)
 SET @EarthRadius = 6378.137 -- km
 
 DECLARE @EarthCircumference int
 SET @EarthCircumference = 40076 -- km
 
 -- Degrees difference for distance is only an approximisation to speed up search via WHERE clause!
 DECLARE @DegDiff float
 SET @DegDiff = CAST(@MaxDistance AS float) / CAST((@EarthCircumference / 360) AS float)
 
 -- Get latitude and longitude
 DECLARE @Latitude decimal(23,20), @Longitude decimal(23,20)
 SELECT @Latitude = Latitude, @Longitude = Longitude FROM yougodo_GeoName WHERE GeoNameID = @GeoNameID
 
 DECLARE @MinLatitude decimal(23,20), @MaxLatitude decimal(23,20), @MinLongitude decimal(23,20), @MaxLongitude decimal(23,20)
 SET @MinLatitude = @Latitude - @DegDiff
 SET @MaxLatitude = @Latitude + @DegDiff
 SET @MinLongitude = @Longitude - @DegDiff
 SET @MaxLongitude = @Longitude + @DegDiff
 
 -- Search
 SELECT geo.GeoNameID,
 	ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 		(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 		 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) AS Distance
 FROM yougodo_GeoName AS geo
 	INNER JOIN yougodo_GeoFeature AS ftr ON ISNULL(geo.FeatureClass, '') = ftr.FeatureClass
 		AND ISNULL(geo.FeatureCode, '') = ftr.FeatureCode
 WHERE geo.GeoNameID <> @GeoNameID
 	AND (geo.Latitude >= @MinLatitude AND geo.Latitude <= @MaxLatitude AND geo.Longitude >= @MinLongitude AND geo.Longitude <= @MaxLongitude)
 	AND ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 		(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 		 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) <= @MaxDistance
 


my SQL without the lan lon box:

Code:
 DECLARE @GeoNameID int
 SET @GeoNameID = 2656031
 DECLARE @MaxDistance int
 SET @MaxDistance = 50 -- km
 
 DECLARE @EarthRadius decimal(10,3)
 SET @EarthRadius = 6378.137 -- km
 
 DECLARE @EarthCircumference int
 SET @EarthCircumference = 40076 -- km
 
 -- Degrees difference for distance is only an approximisation to speed up search via WHERE clause!
 DECLARE @DegDiff float
 SET @DegDiff = CAST(@MaxDistance AS float) / CAST((@EarthCircumference / 360) AS float)
 
 -- Get latitude and longitude
 DECLARE @Latitude decimal(23,20), @Longitude decimal(23,20)
 SELECT @Latitude = Latitude, @Longitude = Longitude FROM yougodo_GeoName WHERE GeoNameID = @GeoNameID
 
 DECLARE @MinLatitude decimal(23,20), @MaxLatitude decimal(23,20), @MinLongitude decimal(23,20), @MaxLongitude decimal(23,20)
 SET @MinLatitude = @Latitude - @DegDiff
 SET @MaxLatitude = @Latitude + @DegDiff
 SET @MinLongitude = @Longitude - @DegDiff
 SET @MaxLongitude = @Longitude + @DegDiff
 
 -- Search
 SELECT geo.GeoNameID,
 	ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 		(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 		 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) AS Distance
 FROM yougodo_GeoName AS geo
 	INNER JOIN yougodo_GeoFeature AS ftr ON ISNULL(geo.FeatureClass, '') = ftr.FeatureClass
 		AND ISNULL(geo.FeatureCode, '') = ftr.FeatureCode
 WHERE geo.GeoNameID <> @GeoNameID
 	AND ROUND(@EarthRadius * ACOS((SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 		(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 		 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude)))),0) <= @MaxDistance
 


If anything, due to the way lan lons work (from my limited understanding) I should always be defining a box that's probably a little too big which is fine as the second more thorough search criteria comes in to play afterwards. Maybe I've applied the lon lat box calculations slightly wrong?

Cheers, Gavin.

Gavin Harriss
Portfolio: www.gavinharriss.com
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Just adding a few observations and tips for anyone who stumbles upon this thread with similar requirements.

1) I got around the problem of my rough search box supplying less results by simply slightly increasing the dimensions. Not the most elegant solution, but seems to do the job.

2) For MS SQL Server users out there. SQL intermittently encounters a "A domain error occurred." error when performing the distance calculations. After some investigation I found that it seems to have trouble handling calculations that result in more than 15 digits after the decimal place. So you'll see my calculation rounds part of the calculation before applying the rest of the calculation to it in order to get around this problem.

My code for anyone who might find it useful for adaptation is below. For re-usability I've implemented it as a function that returns a table of resulting IDs and their distances.

Code:
 CREATE FUNCTION [Geo_GetNearbyLocations] 
 (	
 	@GeoNameID int,
 	@Latitude decimal(23,20),
 	@Longitude decimal(23,20),
 	@MaxDistance int
 )
 RETURNS @GeoNameIDs TABLE
 (
 	GeoNameID int NOT NULL,
 	Distance int NOT NULL
 ) 
 AS
 BEGIN
 /*
 Inspiration taken from:
 <a href="http://forum.geonames.org/gforum/posts/list/0/522.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/0/522.page</a>
 */
 	IF @GeoNameID IS NOT NULL OR (@Latitude IS NOT NULL AND @Longitude IS NOT NULL)
 	BEGIN
 		/*
 		Earth's radius:
 		6378137 meters
 		6378.137 km
 		3963.191 miles
 		3441.596 nautical miles
 		*/
 		DECLARE @EarthRadius decimal(10,3)
 		SET @EarthRadius = 6378.137 -- km
 
 		DECLARE @EarthCircumference int
 		SET @EarthCircumference = 40076 -- km
 
 		-- Degrees difference for distance is only an approximisation to speed up search via WHERE clause!
 		DECLARE @DegDiff float
 		SET @DegDiff = CAST(@MaxDistance AS float) / CAST((@EarthCircumference / 360) AS float)
 		-- NOTE: Add some error tolerance as too few results are returned else for some reason?!?
 		SET @DegDiff = @DegDiff + CAST(@DegDiff * 0.75 AS float) -- Error tolerance
 
 		-- Get latitude and longitude
 		IF NOT @GeoNameID IS NULL
 			SELECT @Latitude = Latitude, @Longitude = Longitude FROM GeoName WHERE GeoNameID = @GeoNameID
 
 		DECLARE @MinLatitude decimal(23,20), @MaxLatitude decimal(23,20), @MinLongitude decimal(23,20), @MaxLongitude decimal(23,20)
 		SET @MinLatitude = @Latitude - @DegDiff
 		SET @MaxLatitude = @Latitude + @DegDiff
 		SET @MinLongitude = @Longitude - @DegDiff
 		SET @MaxLongitude = @Longitude + @DegDiff
 
 		-- NOTE: There is a Geo_CalculateDistance function but it's far faster to perform calculations inline!
 
 		-- Search
 		INSERT INTO @GeoNameIDs (GeoNameID, Distance)
 		SELECT geo.GeoNameID,
 			ROUND(@EarthRadius * ACOS(ROUND(
 				(SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 				(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 				 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude))),15)),0) AS Distance
 		FROM GeoName AS geo
 		WHERE
 			-- Below is rough proximity search to super speed the query up
 			(geo.Latitude >= @MinLatitude AND geo.Latitude <= @MaxLatitude AND geo.Longitude >= @MinLongitude AND geo.Longitude <= @MaxLongitude)
 			-- Below should result in accurate proximity search
 			AND ROUND(@EarthRadius * ACOS(ROUND(
 				(SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 				(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 				 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude))),15)),0) <= @MaxDistance
 		ORDER BY @EarthRadius * ACOS(ROUND(
 				(SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
 				(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
 				 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude))),15)) ASC
 	END
 	RETURN
 END
 

Gavin Harriss
Portfolio: www.gavinharriss.com
snives



Joined: 03/11/2007 22:46:15
Messages: 1
Offline

Your idea to use a bounding box to filter records is valid. It is used in all commercial spacial search functions. In fact they all rely heavily on indexes to perform covermap filtering based on various techniques, but the bounding box is a good start.

You are missing zip codes because your bounding box calculation is incorrect. The number of miles in a degree of longitude varies by the cosine of the latitude. Think of the equator vs the 89th parallel, there is alot more earth between the lines at the equator, right?

Use this instead...
Code:
DECLARE @LatRange float
 DECLARE @LongRange float
 SET @LatRange = @Miles / 69.0499 
 SET @LongRange = @Miles / (69.0499 * cos(Radians(@Latitude)) )
 

You are also implementing the ACOS trig calculation which is susceptable to error over small distances. Use the Haversine method for more accurate small distance calculations. If most of your distance calculations are small distances, you can optimize your distance calculation even further by using simple euclidean geometry (insanely fast). I've found it to be accurate to within 10 ft per Mile, this is plenty accurate if your search is just to filter records.

I bet your code is paralizingly slow too. Here is a tip, create a table valued function to return a subset of GeoNameId's that match your boundary condition. Then create a function to calculate the distance between lat/longs. Then index on your GeoNameId (which I assume you already do).

Here is your final product FTW. It searches 40,000 records and returns 56 zips (on my 3yr old laptop) in 10ms! I suspect further speed improvements may be possible using spacial search in Sql Server 2008.

Code:
 select City, ZipCode, dbo.[GetApproxDistanceBetweenLatLong](41.882582, -87.637601, Latitude, Longitude) as Actual
 from Zip
 join dbo.fZipsNearCoverMap(41.882582, -87.637601, 10) covermap on Zip.ZipId = covermap.ZipId
 and dbo.[GetApproxDistanceBetweenLatLong](41.882582, -87.637601, Latitude, Longitude) <= 10
 order by Actual desc
 


More info at: http://technet.microsoft.com/en-us/library/aa964138.aspx

yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Hi snives,

Thanks for sharing your optimisation tips with me - very much appreciated! I'll be trying them out in the next day or two to see how much an improvement I can get out of them.

The location search code is very much the bottleneck being experienced in my code at the moment. It's being used by http://www.yougodo.com - still a work in progress at present, but nearly there.

Cheers again,
Gavin.

Gavin Harriss
Portfolio: www.gavinharriss.com
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

Hi snives,

I've only just started to get around to revisiting my geo proximity search code and was wondering if you would mind assisting me a little further?

In your post you reference 2 functions - GetApproxDistanceBetweenLatLong and fZipsNearCoverMap. Would you mind posting the complete SQL for these so I can t a clearer grasp what you are doing and how to apply the Haversine method myself?

I believe I understand basically what you're suggesting by separating out the "box" search part of the SQL in to a separate function and am just about to experiment with this to determine differences in query execution performance gains.

Thanks again for you assistance,
Gavin.

Gavin Harriss
Portfolio: www.gavinharriss.com
yougodo



Joined: 20/07/2007 00:25:28
Messages: 11
Offline

A friendly chappie named Colin Jackman picked up on snives suggestions and had a bit of a email dialogue going with me as he applied snives suggestions to my original solution. So for anyone out there interested in a better / faster proximity search solution here you go...

Code:
CREATE FUNCTION [dbo].[GetNearbyLocations] 
 (	
 	@GeoNameID int,
 	@Latitude decimal(23,20),
 	@Longitude decimal(23,20),
 	@MaxDistance int
 )
 RETURNS @GeoNameIDs TABLE
 (
 	GeoNameID int NOT NULL,
 	Distance int NOT NULL
 ) 
 AS
 BEGIN
 /*
 Inspiration taken from:
 <a href="http://forum.geonames.org/gforum/posts/list/0/522.page" target="_blank" rel="nofollow">http://forum.geonames.org/gforum/posts/list/0/522.page</a>
 */
 	IF @GeoNameID IS NOT NULL OR (@Latitude IS NOT NULL AND @Longitude IS NOT NULL)
 	BEGIN
 		-- Convert to decimal to avoid rounding errors
 		DECLARE @MaxDistanceDec decimal
 		SET @MaxDistanceDec = @MaxDistance
 
 		/*
 		Earth's radius:
 		6378137 meters
 		6378.137 km
 		3963.191 miles
 		3441.596 nautical miles
 		*/
 		DECLARE @EarthRadius decimal(10,3)
 		SET @EarthRadius = 6378.137 -- km
 
 		-- Get latitude and longitude
 		IF NOT @GeoNameID IS NULL
 			SELECT @Latitude = Latitude, @Longitude = Longitude FROM yougodo_GeoName WHERE GeoNameID = @GeoNameID
 
 		DECLARE @LatRange decimal(23,20)
 		DECLARE @LongRange decimal(23,20)
 		SET @LatRange = (@MaxDistanceDec / 111) -- 111 is number of km in 1 degree lat
 		SET @LongRange = ((@MaxDistanceDec / 111) / cos(Radians(@Latitude)))
 
  		DECLARE @MinLatitude decimal(23,20), @MaxLatitude decimal(23,20), @MinLongitude decimal(23,20), @MaxLongitude decimal(23,20)
 		SET @MinLatitude = @Latitude - @LatRange
  		SET @MaxLatitude = @Latitude + @LatRange
  		SET @MinLongitude = @Longitude - @LongRange
  		SET @MaxLongitude = @Longitude + @LongRange
 
 		-- Search
 		-- NOTE: I did create a CalculateDistance function but it's far faster to perform calculations inline!
 		INSERT INTO @GeoNameIDs (GeoNameID, Distance)
  		SELECT geo.GeoNameID,
  			ROUND(@EarthRadius * ACOS(ROUND(
  				(SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
  				(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
  				 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude))),15)),0) AS Distance
  		FROM GeoName AS geo
  		WHERE -- Below is rough proximity search to super speed the query up
  			(geo.Latitude >= @MinLatitude AND geo.Latitude <= @MaxLatitude AND geo.Longitude >= @MinLongitude AND geo.Longitude <= @MaxLongitude)
  			-- Below should result in accurate proximity search
  			AND ROUND(@EarthRadius * ACOS(ROUND(
  				(SIN(RADIANS(@Latitude)) * SIN(RADIANS(geo.Latitude))) +
  				(COS(RADIANS(@Latitude)) * COS(RADIANS(geo.Latitude)) *
  				 COS(RADIANS(geo.Longitude) - RADIANS(@Longitude))),15)),0) <= @MaxDistanceDec
  		ORDER BY 2 ASC
 	END
 	RETURN
 END

Gavin Harriss
Portfolio: www.gavinharriss.com
 
Forum Index -> General
Go to:   
Powered by JForum 2.1.5 © JForum Team