Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can I use rstar for geographic points & use a great circle distance? #135

Open
amandasaurus opened this issue Sep 26, 2023 · 8 comments
Open

Comments

@amandasaurus
Copy link
Member

I have a millions of points (in latitude & longitude). I want to calculate nearest neighbour queries using great circle distance, i.e. distance in metres along the surface of the earth. Can I use rstar for this?

rstar::PointDistance::distance_2 refers to “squared euclidean distance”, which implies not. I'm unsure if #70 answers this question... I'm OK with adding points twice, but I fear I don't fully understand how to do this right...

@urschrei
Copy link
Member

I've wondered about that wording myself, but I think it's misleadingly restrictive based on the trait definition (https://docs.rs/rstar/0.11.0/rstar/trait.PointDistance.html). @adamreichold I know you've got some experience with distances other than Euclidean – can you help to clarify this?

@urschrei
Copy link
Member

@amandasaurus Here's what I was thinking of: #48 (comment)

You should be able to plug in https://docs.rs/geo/0.26.0/geo/algorithm/geodesic_distance/trait.GeodesicDistance.html squared and test it fairly easily??

@adamreichold
Copy link
Member

adamreichold commented Sep 26, 2023

I think the reference to "Euclidean" in the docs is misleading. Any distance metric fulfilling the usual axioms should do. (Out of the top of my head, I also don't see that we need anything beyond that, e.g. the metric does not need to be implied by a vector norm or a scalar product.)

(Note though that in #48 (comment), I was indeed using the normal Euclidean distance in two-dimensional real space, I just changed the coordinates to polar coordinates and implemented the trait directly in terms of that. But the values were the same as if I would have converted my polar coordinates to Cartesian coordinates first and then used the provided implementation.)

Concerning adding points twice: This is one relatively simple way to handle the periodic "boundary conditions" of the angular coordinates, but there are others if this feels contrived to you. The paper https://arxiv.org/pdf/1712.02977.pdf discusses common approaches and proposes one especially efficient way of doing it.

@amandasaurus
Copy link
Member Author

amandasaurus commented Sep 29, 2023 via email

@adamreichold
Copy link
Member

if I add point P with ($latitude1, $longitude1), then I must also add ($longitude1 + 180°, $latitude).

How do I do it with other geo types, like LineStrings or MultiLineStrings?

I would say that this is just a translation, i.e. you add a copy of the geometry with all its constituent points translated in the same manner.

The trick from the paper basically is re-parametrize you geometry to have fewer absolute and more relative and hence translation-invariant parameters so there is less work to do. But if all of your geometry is basically a bag of points, then just translating all of them should do the trick. (Of course, this then means negative angles and multiple overlap from a single piece of geometry and its mirrors, i.e. it can be inefficient, but it should still work.)

@feelingsonice
Copy link

I'm running into a similar use case. My "hypothetical" approach is to use mercator projections to translate between planar and longitude/latitude points. Is this the recommended approach?

@urschrei
Copy link
Member

urschrei commented May 17, 2024

This paper is the clearest explanation of how to implement Lon / Lat indexing in an R{*}-tree that I've been able to find. I'm sure it's available on Sci-Hub in breach of copyright if you were to check the DOI (10.1007/978-3-642-40235-7_9)

If you're not using geo-types you have the option of using either the "transform to 3D ECEF" approach, or if you are / would prefer it, the more elegant "compute MBR for geodetic coordinates" approach. Pseudocode for the distance function for option 2 is given in Algorithm 1 (p156) and an optimised version in Algorithm 2 (p159). I'm reproducing them below:

Algorithm 1

Data: c circumference of earth (spherical model)
Data: (φq,λq) query point
Data: (φl, λl, φh, λh) index MBR

if φl ≤ φq ≤φl then
    if λq ≤ λl then return c·(λl − λq) / 360; // South of MBR
    if λq ≥ λt then return c·(λq − λt)/ 360; // North of MBR
    return 0; // Inside MBR
else if mod360(φl − φq) ≤ mod360(φq − φh) then // West of MBR
    θh ← Azimuth(φl, λh, φq, λq);
    if θh ≥ 270 then // North-West
        return Great-Circle-Distance(φl, λh, φq, λq)
    end
    θl ← Azimuth(φl, λl, φq, λq);
    if θl ≤ 270 then // South-West
        return Great-Circle-Distance(φl, λl, φq, λq)
    end
    return |Cross-Track-Distance(φl, λl, φl, λh, φq, λq)|; // West
else // East of MBR
    θh ← Azimuth(φh, λh, φq, λq);
    if θh ≤ 90 then // North-East
        return Great-Circle-Distance(φh, λh, φq, λq)
    end
    θl ← Azimuth(φh, λl, φq, λq);
    if θl ≥ 90 then // South-East
        return Great-Circle-Distance(φh, λl, φq, λq)
    end
    return |Cross-Track-Distance(φh, λh, φh, λl, φq, λq)|; // East
end

Algorithm 2:

Data: c circumference of earth (spherical model)
Data: (φq, λq) query point
Data: (φl, λl, φh, λh) index MBR

if φl ≤ φq ≤ φl then
    if λq ≤ λl then return c·(λl − λq) / 360; // South of MBR
    if λq ≥ λt then return c·(λq − λt) / 360; // North of MBR
    return 0; // Inside MBR
else if mod360(φl − φq) ≤ mod360(φq − φh) then // West of MBR
    τ ← tan360(λq);
    if mod360(φl − φq) ≥ 90 then // Large Δφ
        if τ ≤ tan360((λl + λh) / 2) cos360(φl − φq) then
            return Great-Circle-Distance(φl, λh, φq, λq); // North-West
        else
            return Great-Circle-Distance(φl, λl, φq, λq); // South-West
        end
    end
    if τ ≥ tan360 (λh ) cos360 (φl − φq ) then // North-West
        return Great-Circle-Distance(φl, λh, φq, λq); // South-West
    if τ ≤ tan360 (λl) cos360 (φl − φq ) then
        return Great-Circle-Distance(φl, λl, φq, λq);
    return |Cross-Track-Distance(φl, λl, φl, λh, φq, λq)|; // West
else // East of MBR
    τ ← tan360(λq);
    if mod360(φq − φh) ≥ 90 then // Large Δφ
        if τ ≤ tan360((λl + λh) / 2) cos360(φh − φq) then
            return Great-Circle-Distance(φh, λh, φq, λq); // North-East
        else
            return Great-Circle-Distance(φh, λl, φq, λq); // South-East
        end
    end
    if τ ≥ tan360 (λh) cos360 (φh − φq ) then
        return Great-Circle-Distance(φh, λh, φq, λq); // North-East
    if τ ≤ tan360 (λl ) cos360 (φh − φq ) then
        return Great-Circle-Distance(φh, λl, φq, λq); // South-East
    return |Cross-Track-Distance(φh, λh, φh, λl, φq, λq)|; // East
end

Some things worth noting:

  1. If you aren't using geo, you'll have to implement the Great-Circle distance and Cross-Track distance algorithms yourself. Feel free to use our implementations as a starting point – they tend to come with some baggage in terms of supporting functions – good luck!
  2. Cross-Track Distance in the above algorithms accepts six parameters. I assume they relate to point, line_point_a, line_point_b, whereas our implementation is implemented on Point directly, so only requires the line_point_a and line_point_b parameters.
  3. I think you'll want to use Haversine for Great-Circle-Distance`.
  4. Performance will be good (rstar is fast) but trig functions are orders of magnitude slower than simple euclidean functions. In practical terms this shouldn't be a problem unless you have an expectation of sub-5ms queries on tens of millions of points or something.

Edit

To expand on this a little: you would use Algorithm{1, 2} as the basis of an rstar::Envelope impl: the algorithms above return the minimum distance between a bounding box and a query point, so the trait methods would leverage the envelope method of {Coord, Line, LineString, Polygon} to calculate distance_2, contains_point (distance_2 <= 0), and contains_envelope. The query point's distance_2 impl would have to match, so it would also have to be Haversine.

@grevgeny
Copy link

Hello @urschrei! I've been working on a solution for indexing on a spherical surface, inspired by the paper mentioned above. Would you be interested in extending the rstar crate to support spherical envelopes, alongside the existing AABB? If this aligns with the project's direction, I'd be happy to prepare a draft PR for further discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants