Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why must point scalars be signed? #129

Open
jasonwhite opened this issue Jul 26, 2023 · 9 comments
Open

Why must point scalars be signed? #129

jasonwhite opened this issue Jul 26, 2023 · 9 comments

Comments

@jasonwhite
Copy link

While evaluating this crate to see if I can use it for a project, I bumped into the issue that RTreeNum requires Signed. I was hoping to use this data structure with unsigned integers (not floats). I'm not aware of any fundamental reason why an R-tree can't be used with unsigned numbers. I was able to remove the requirement locally and all the tests still pass. Is there something I'm missing or is it just a historical artifact? Thanks!

@adamreichold
Copy link
Member

I think that using unsigned numbers would mean that we would have to audit all algorithms to not produce underflow at zero and thereby produce nonsensical results (or panic in debug builds). And even this check would most likely imply the introduction of new failure modes into the API or at least document new possibilities of panics.

Personally, I would recommend to use a signed type in the tree and convert your coordinates when interface with the rest of your code. If you are confident, you can also effectively circumvent the requirement without changes to the rstar source by doing something like:

impl Signed for MyFancyNumberType {
    fn abs(&self) -> Self { unreachable!() }

    fn abs_sub(&self, other: &Self) -> Self { unreachable!() }

    fn signum(&self) -> Self { unreachable!() }

    fn is_positive(&self) -> bool { unreachable!() }

    fn is_negative(&self) -> bool { unreachable!() }
}

@jasonwhite
Copy link
Author

I think that using unsigned numbers would mean that we would have to audit all algorithms to not produce underflow at zero and thereby produce nonsensical results (or panic in debug builds).

I went down this rabbit hole yesterday. So far, I've found that the PointExt::distance_2 calculation can underflow in its sub calculation. I was able to partially work around that by implementing it this way instead:

pub trait PointExt: Point { 
    fn abs_diff(&self, other: &Self) -> Self {
        self.component_wise(other, |l, r| {
            if l < r {
                r - l
            } else {
                l - r
            }
        })
    }
    
    fn distance_2(&self, other: &Self) -> Self::Scalar {
        self.abs_diff(other).length_2()
    }
}

In an optimized build, abs_diff boils down to about 4 x86-64 instructions by using a conditional move. Not great nor necessary for signed numbers. Stabilization of specialization in Rust would be nice to avoid this overhead.

However, PointExt::length_2() can now overflow when used with large integers, because it is implemented like this:

    fn length_2(&self) -> Self::Scalar {
        self.fold(Zero::zero(), |acc, cur| cur * cur + acc)
    }

I think it would be correct-ish to implement this in terms of saturating_mul and saturating_add, but unfortunately num_traits::{SaturatingMul, SaturatingAdd} are not implemented for f32 or f64 as I believe floats are already saturating at "infinity".

It might be good to use more saturating/wrapping operations in these calculations, as it would extend the range of signed numbers that can be used with this library.

Personally, I would recommend to use a signed type in the tree and convert your coordinates when interface with the rest of your code.

I think this is what I'll end up doing (or implement my own specialized version of R-tree). For my project, I will likely have a rather huge spatial index that would benefit from being memory mapped, something that this library cannot provide yet, so I might go down the latter path.

@jasonwhite
Copy link
Author

I tried converting my u64s to i64s using:

fn u64_to_i64(x: u64) -> i64 {
    // Simply flip the sign bit. This is equivalent to subtracting 2^63.
    (x ^ (i64::MIN as u64)) as i64
}

But I still run into overflow problems. My rectangles can be very large, or even take up the entire i64 / u64 range.

First, I ran into an overflow in <AABB as Envelope>::center(), which is calculated like this:

|x, y| (x + y) / two

Overflows can be avoided by calculating it like this instead:

|x, y| x / two + y / two

(It's just one additional shr instruction.)

Next, I am still running into the length_2 overflow mentioned above. Not sure how to solve that one without doing saturating arithmetic operations. Even if I calculated the square root, it'd still overflow before getting there.

@rmanoka
Copy link
Contributor

rmanoka commented Jul 29, 2023

Thanks @jasonwhite for the in-depth changeset required for this; reg. the distance_2 changes: could you tell us how much the benchmarks regress after this change? About length_2, I think it's fair if the data-structure works correctly on geoms with points whose pairwise distance^2 are within the scalar range.

I am for merging a PR to support unsigned types as long it's a not a huge regression.

@jasonwhite
Copy link
Author

I'll likely end up going with a quadtree instead. My use-case doesn't require the need to find the nearest neighbors. I only care about querying for which rectangles overlap with a given input.

Feel free to close this issue, but I suspect others may be interested in having support for unsigned numbers (provided that they are small enough to avoid overflow).


FYI, I'm not able to run the benchmarks because of a problem compiling the geo crate.

error[E0275]: overflow evaluating the requirement `[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]: Fn<(geo_types::Coord<T>,)>`
  |
  = help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`geo`)
  = note: required for `&[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]` to implement `Fn<(geo_types::Coord<T>,)>`
  = note: 128 redundant requirements hidden
  = note: required for `&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]` to implement `Fn<(geo_types::Coord<T>,)>`

@michaelkirk
Copy link
Member

I'm not able to run the benchmarks

I've just now opened #131 which should address that error.

@adamreichold
Copy link
Member

reg. the distance_2 changes: could you tell us how much the benchmarks regress after this change?

If fear the change is unpalatable in the general case. let d = x - y; d * d is two vectorizable instructions whereas let d = if x > y { x - y } else { y - x}; d * d is 11 instruction containing a branch if the compiler decides against conditional moves.

I think that generally, this operation significantly profits from using signed types. As an alternative, we could have a helper trait like SquaredDiff: Num + PartialOrd which uses the unsigned-safe implementation as a default for is overridden for concrete types like i64 and f64 which do not need that? However, it does seem strange for that to live inside of rstar instead of e.g. num-traits.

@w14
Copy link

w14 commented Feb 10, 2024

I'm a bit confused why overflow should occur at all. I took a long time to painfully implement unsigned ints. Now the unsigned ints are overflowing. Why does RStar subtract from unsigned ints at all?

Error:

attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:144:5
   3: <usize as core::ops::arith::Sub>::sub
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/ops/arith.rs:212:45
   4: <xl::unsigned_rstar::RStarInt as core::ops::arith::Sub>::sub
             at ./src/unsigned_rstar.rs:10:97
   5: rstar::point::PointExt::sub::{{closure}}
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:259:43
   6: rstar::point::PointExt::component_wise::{{closure}}
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:197:28
   7: <[S; N] as rstar::point::Point>::generate::{{closure}}
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:320:23
   8: core::ops::try_trait::NeverShortCircuit<T>::wrap_mut_1::{{closure}}
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/ops/try_trait.rs:385:36
   9: <core::iter::adapters::map::Map<I,F> as core::iter::traits::unchecked_iterator::UncheckedIterator>::next_unchecked
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/iter/adapters/map.rs:203:9
  10: core::array::try_from_trusted_iterator::next::{{closure}}
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:791:22
  11: core::array::try_from_fn_erased
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:822:20
  12: core::array::try_from_fn
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:104:11
  13: core::array::try_from_trusted_iterator
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:795:5
  14: core::array::<impl [T; N]>::try_map::{{closure}}
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:538:39
  15: core::array::drain::drain_array_with
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/drain.rs:26:5
  16: core::array::<impl [T; N]>::try_map
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:538:9
  17: core::array::<impl [T; N]>::map
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:501:14
  18: <[S; N] as rstar::point::Point>::generate
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:319:9
  19: rstar::point::PointExt::component_wise
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:197:9
  20: rstar::point::PointExt::sub
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:259:9
  21: rstar::algorithm::rstar::get_nodes_for_reinsertion::{{closure}}
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:341:9
  22: alloc::slice::<impl [T]>::sort_by::{{closure}}
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:267:34
  23: core::slice::sort::insert_tail
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:52:12
  24: core::slice::sort::insertion_sort_shift_left
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:163:13
  25: core::slice::sort::merge_sort
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:1059:13
  26: alloc::slice::stable_sort
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:887:5
  27: alloc::slice::<impl [T]>::sort_by
             at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:267:9
  28: rstar::algorithm::rstar::get_nodes_for_reinsertion
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:338:5
  29: rstar::algorithm::rstar::resolve_overflow
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:244:37
  30: rstar::algorithm::rstar::recursive_insert
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:130:16
  31: <rstar::algorithm::rstar::RStarInsertionStrategy as rstar::params::InsertionStrategy>::insert
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:43:21
  32: rstar::rtree::RTree<T,Params>::insert
             at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/rtree.rs:821:9
  33: xl::helper::RTreeFastValue<_,V>::insert
             at ./src/helper.rs:297:9

Code:

use std::ops::Deref;

#[derive(Clone, Copy, PartialEq, PartialOrd, Eq, Ord, Debug, num_derive::Zero, num_derive::One, num_derive::NumOps, num_derive::Num)]
pub struct RStarInt(pub usize);

impl Deref for RStarInt {
    type Target = usize;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl std::ops::Neg for RStarInt {
	type Output = RStarInt;

	fn neg(self) -> Self {
		unreachable!()
	}
}

impl num_traits::Signed for RStarInt {
    fn abs(&self) -> Self {
		*self
	}

    fn abs_sub(&self, other: &Self) -> Self { unreachable!() }

    fn signum(&self) -> Self {
		if self.0 == 0 {
			RStarInt(0)
		} else {
			RStarInt(1)
		}
	}

    fn is_positive(&self) -> bool {
		true
	}

    fn is_negative(&self) -> bool {
		false
	}
}

impl num_traits::Bounded for RStarInt {
    fn min_value() -> Self {
        RStarInt(0)
    }

    fn max_value() -> Self {
        RStarInt(usize::MAX)
    }
}

impl std::hash::Hash for RStarInt {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}

@adamreichold
Copy link
Member

Usually for computing distances. Which basically applies here as well, i.e. your bracktrace comes from

node.children.sort_by(|l, r| {
    let l_center = l.envelope().center();
    let r_center = r.envelope().center();
    l_center
        .sub(&center)
        .length_2()
        .partial_cmp(&(r_center.sub(&center)).length_2())
        .unwrap()
});

which implements the R* insertion strategy by splitting a node along its center by sorting its children based on lexicographical distance to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants