<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "Design Review: Find NearbyPlaces of a whole KML-Track (Performance)"]]></title>
		<link>http://forum.geonames.org/gforum/posts/list/4.page</link>
		<description><![CDATA[Latest messages posted in the topic "Design Review: Find NearbyPlaces of a whole KML-Track (Performance)"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>Design Review: Find NearbyPlaces of a whole KML-Track (Performance)</title>
				<description><![CDATA[ The task seems simple: I want to find populated NearbyLocations of a whole KML-Track (a bike-tour, for example).

<b>The basics: </b>
<li> I have installed a local database-copy of geonames for germany with about 180.000 entries with a MyISAM Database-Engine (which is faster than InnoDB)
<li> I have a KML-Track of about 130 km with about 3.800 coordinates
<li> I work on a xampp-installation on an eeepc (yeah!)
<li> I work with php, pdo and mysql

<b>The problem: </b>
If you use a query with a simple cosine-formula, a database-call requires about 1,7 seconds to find any nearby-places with a radius of 1 km. To analyze the whole track (which is quite short with 130km) would require nearly 2 hours. 

<b>The partly solution: </b>
I think there are two possible optimations: 
<li> Reduce the amount of calls to the database
<li> Narrow the relevant data in the database bevor you do the heavy math-query

<b>Reduce Calls </b>
To reduce the amount of calls I use a simple trick: You need a radius to find nearby-locations of a given longitude and latitude, let's say 1 km. So I take the first coordinate and do a nearby-search of 1 km, then I jump 2 km forward in the Kml-Data, then I do another nearby-search of 1 km and so on. I can do so easily, because I calculated the distance between each coordinate and stored this information in an array before. This way I can reduce the amount of calls from 3.800 to about 60 (130 km / 2km) with probably the same result-quality.

<b>Database Query</b>
My competence in math is limited to simple 1*1-multiplication, so I made use of the tutorial on this site: <a href='http://www.movable-type.co.uk/scripts/latlong-db.html' target='_new' rel="nofollow">http://www.movable-type.co.uk/scripts/latlong-db.html</a>

If you use the simple cosine law-query, a call takes about 1,7 seconds on my eeepc. If you use the FirstCut-Query shown in that example, the time reduces to about 1 second. With another pre-query, that filters cities with a population of > 500, the amount of data reduces to less than 10.000 for the heavy math-stuff. This reduces the execution-time to about 0,6 seconds. 

So I started at 3.800 x 1.7 seconds (two hours), now I am at 60 x 0,6 seconds. It's quite good and the results are fantastic (pretty cool to see your city-stops without manually edit them), but the whole analyze will still take more than half a minute for only 130 km. 

So my question: Am I totally off the track and is there an easy standard-solution for this task, or is the only solution to do it in the background with ajax and stuff like that, so that the user can do something else in the meantime? 

Any hints are highly welcome. I'm not a pro, so sorry for stupid questions...]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/6537.page#15051</guid>
				<link>http://forum.geonames.org/gforum/posts/list/6537.page#15051</link>
				<pubDate><![CDATA[Fri, 7 Mar 2014 22:05:29]]> GMT</pubDate>
				<author><![CDATA[ Trendschau]]></author>
			</item>
			<item>
				<title>Re:Design Review: Find NearbyPlaces of a whole KML-Track (Performance)</title>
				<description><![CDATA[ Hi Trendschau

- The coordinates are really every 35m (130km / 3800 points)? Four bike-seconds…

- Your simple trick to reduce the calls will not always return what it should >>>
A bounding circle with 1km radius every 2km means that the two circles are tangent only on track. A village, 1km after the start and 500m off-track, will not show up in any bounding circle.
Only joining rectangular bounding boxes could do that as required.

I think you have two independent problems: looking for possible positions in the database and do the maths. The maths should not consume too much of time, but the database…

I had a similar problem >>> from a given position, look for every nearest position in a DB, in steps of one degree bearing, worldwide.
My “database” is an Excel sheet with 4000 positions, the code is in VBA, I know nothing about other languages or about real databases. This is a very slow combination!

After beginning like you and extreme calculation times, I put the relevant positions into a variable, for my sample my whole world, for you maybe the bounding box of the whole trip. Only then, the distances are calculated (a 20’000km bounding box for each degree of bearing). VBA calculates this in about 0.15 seconds.

This Link >>> <a href='http://www.movable-type.co.uk/scripts/latlong.html' target='_new' rel="nofollow">http://www.movable-type.co.uk/scripts/latlong.html</a> shows formulae for the cross-track distance, maybe this would be better for your purposes. It returns the distance from the start point to the point on track where the previously selected point is exactly on the left or the right side off-track.

But remember when you work with bearings (for distances it does not matter):
For spherical earth formulae the east longitudes are minus, the west longitudes are plus; normally (e.g. GoogleEarth, GeoNames) it is the other way round.

Maybe this gives you some ideas… I don’t know about a standard solution…

Cheers
Urs
]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/6537.page#15181</guid>
				<link>http://forum.geonames.org/gforum/posts/list/6537.page#15181</link>
				<pubDate><![CDATA[Sat, 8 Mar 2014 07:51:31]]> GMT</pubDate>
				<author><![CDATA[ saphir]]></author>
			</item>
			<item>
				<title>Re:Design Review: Find NearbyPlaces of a whole KML-Track (Performance)</title>
				<description><![CDATA[ Thank you Urs!

The tipp with the bounding box is highly welcome, I will change this to get a more accurate result.

Yes, 3800 points in 130 km, I grabbed this kml as test-data from a bike-track website, and the biker was probably a bit too accurate (30 seconds would be exact enough for bikes, I think, but I don't have much experience with that). But not bad to test an application with extreem data, I think...  

I already found a solution for my performance-problem: You can create two indices on the latitude and longitude-rows in MySQL, which will speed up the database-calls dramatically. The whole track (60 points) is now calculated in less than one second, which is totally ok for my little project. If that is not enough, there are even some more optimation-possibilities like mysql-spatial extensions described in this post on stackoverflow, where someone had a similar problem like me: <a href='http://stackoverflow.com/questions/18202545/optimization-techniques-for-select-query-on-single-table-with-2-25m-rows' target='_new' rel="nofollow">http://stackoverflow.com/questions/18202545/optimization-techniques-for-select-query-on-single-table-with-2-25m-rows</a>

So with the help of you and stackoverflow I will get a pretty good solution, I think, so thank you again!

Sebastian]]></description>
				<guid isPermaLink="true">http://forum.geonames.org/gforum/posts/list/6537.page#15183</guid>
				<link>http://forum.geonames.org/gforum/posts/list/6537.page#15183</link>
				<pubDate><![CDATA[Sun, 9 Mar 2014 13:03:50]]> GMT</pubDate>
				<author><![CDATA[ Trendschau]]></author>
			</item>
	</channel>
</rss>