Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise R-tree queries in the case of map matching #6881

Merged
merged 16 commits into from
May 20, 2024

Conversation

SiarheiFedartsou
Copy link
Member

@SiarheiFedartsou SiarheiFedartsou commented May 12, 2024

Issue

Since in map matching we don't need to find the N nearest projections, but simply all projections within a certain radius, instead of using a priority queue-based algorithm, we can simply find all edges intersecting the bounding box defined by this radius and then post-process the results to remove edges that are not within the required radius. According to our current simple map matching benchmark with a default search radius of 5m (which is effectively 15m in practice), this makes map matching up to 50% faster! However, with an increase in radius, this optimization becomes less noticeable since a larger radius increases the number of candidates, impacting the number of routes we need to compute in the map matching algorithm. But I have some ideas on how to optimize this as well in my future PRs.

Actual improvement can be seen below in the match_ch and match_mld benchmarks(the result in master corresponds to default/5m in this PR).

Tasklist

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

Benchmark Results

Benchmark Base PR
alias aliased u32: 1101.99
plain u32: 1093.37
aliased double: 954.491
plain double: 966.534
aliased u32: 1098.31
plain u32: 1101.72
aliased double: 966.005
plain double: 964.487
json-render String: 8.62683ms
Stringstream: 11.6428ms
Vector: 7.58827ms
String: 8.60096ms
Stringstream: 11.7067ms
Vector: 7.58147ms
match_ch Default radius:
8.23085ms/req at 82 coordinate
0.100376ms/coordinate
Radius 5m:
8.19639ms/req at 82 coordinate
0.099956ms/coordinate
Radius 10m:
19.3981ms/req at 82 coordinate
0.236562ms/coordinate
Radius 15m:
41.5492ms/req at 82 coordinate
0.506697ms/coordinate
Radius 30m:
320.866ms/req at 82 coordinate
3.913ms/coordinate
Default radius:
4.50758ms/req at 82 coordinate
0.0549705ms/coordinate
Radius 5m:
4.48594ms/req at 82 coordinate
0.0547065ms/coordinate
Radius 10m:
15.3305ms/req at 82 coordinate
0.186957ms/coordinate
Radius 15m:
37.3413ms/req at 82 coordinate
0.455381ms/coordinate
Radius 30m:
318.02ms/req at 82 coordinate
3.87829ms/coordinate
match_mld Default radius:
7.43256ms/req at 82 coordinate
0.090641ms/coordinate
Radius 5m:
7.12935ms/req at 82 coordinate
0.0869433ms/coordinate
Radius 10m:
16.5298ms/req at 82 coordinate
0.201583ms/coordinate
Radius 15m:
36.6781ms/req at 82 coordinate
0.447294ms/coordinate
Radius 30m:
361.997ms/req at 82 coordinate
4.4146ms/coordinate
Default radius:
3.39512ms/req at 82 coordinate
0.0414039ms/coordinate
Radius 5m:
3.37432ms/req at 82 coordinate
0.0411503ms/coordinate
Radius 10m:
12.3558ms/req at 82 coordinate
0.15068ms/coordinate
Radius 15m:
32.0192ms/req at 82 coordinate
0.390477ms/coordinate
Radius 30m:
351.998ms/req at 82 coordinate
4.29266ms/coordinate
packedvector random write:
std::vector 9792.03 ms
util::packed_vector 82197.7 ms
slowdown: 8.39435
random read:
std::vector 8512.14 ms
util::packed_vector 33173.5 ms
slowdown: 3.8972
random write:
std::vector 9835.57 ms
util::packed_vector 74173.3 ms
slowdown: 7.54133
random read:
std::vector 8557.96 ms
util::packed_vector 29830.8 ms
slowdown: 3.48573
rtree 1 result:
208.575ms -> 0.0208575 ms/query
10 results:
243.318ms -> 0.0243318 ms/query
1 result:
220.757ms -> 0.0220757 ms/query
10 results:
242.402ms -> 0.0242402 ms/query

@@ -631,6 +611,73 @@ class StaticRTree
}

private:
template <typename Callback>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added callback parameter instead of returning vector and made this implementation private(for public we have another SearchInBox now which calls this method)

@SiarheiFedartsou SiarheiFedartsou changed the title Optimise R-tree queries in the case if there is no need to limit maximum number of results Optimise R-tree queries in the case of map matching May 14, 2024
@@ -168,6 +167,18 @@ struct RectangleInt2D
min_lat != FixedLatitude{std::numeric_limits<std::int32_t>::max()} &&
max_lat != FixedLatitude{std::numeric_limits<std::int32_t>::min()};
}

static RectangleInt2D ExpandMeters(const Coordinate &coordinate, const double meters)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SiarheiFedartsou SiarheiFedartsou marked this pull request as ready for review May 14, 2024 18:23
auto phantom_nodes = MakePhantomNodes(input_coordinate, results);
// there is no guarantee that `SearchInRange` will return results sorted by distance,
// but we may rely on it somewhere
std::sort(phantom_nodes.begin(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to leave this out, and just see what happens. We already have some "construction order dependent" results (when there are two paths to a destination of equal weight, the one picked depends on the order the graph was built, not some other stable heuristic), so stability is already not guaranteed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be an interesting insight, indeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually after I added it I realised that this method is only used in map matching. For map matching order shouldn't be important I believe. So I think it will be safe to remove it 👍

@@ -47,6 +47,42 @@ template <typename RTreeT, typename DataFacadeT> class GeospatialQuery
return rtree.SearchInBox(bbox);
}

std::vector<PhantomNodeWithDistance>
NearestPhantomNodes(const util::Coordinate input_coordinate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's an internal API, but might be worth changing NearestPhantomNodes with optional max_results parameter to be a required parameter, to clarify the distinction between the two functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, did you evaluate this approach for when max_results is set also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's an internal API, but might be worth changing NearestPhantomNodes with optional max_results parameter to be a required parameter, to clarify the distinction between the two functions.

Great idea, updated 👍

Out of interest, did you evaluate this approach for when max_results is set also?

Nope, but I thought about it: we probably could make another optimisation when both max_results and max_distance are set. Or may be you see how it can be applied when only max_results is set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular insights from me, was just interested 🙂

CheckSegmentDistance(input_coordinate, segment, max_distance);
if (invalidDistance)
{
return std::pair<bool, bool>{false, false};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably only a matter of style, but how about doing make_pair(false, false)?

auto phantom_nodes = MakePhantomNodes(input_coordinate, results);
// there is no guarantee that `SearchInRange` will return results sorted by distance,
// but we may rely on it somewhere
std::sort(phantom_nodes.begin(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be an interesting insight, indeed.

Copy link
Collaborator

@DennisOSRM DennisOSRM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice optimization!

@SiarheiFedartsou SiarheiFedartsou merged commit d259848 into master May 20, 2024
21 checks passed
@SiarheiFedartsou SiarheiFedartsou deleted the sf-optimise-geospatial-query branch May 20, 2024 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants