Numba Neighbors

Approximate port of scikit-learn neighbors using Numba.

Installation

If you want to install/modify (recommented at this point):

git clone https://github.com/jackd/numba-neighbors.git
pip install -e numba-neighbors

Quick-start:

pip install git+git://github.com/jackd/numba-neighbors.git

You may see performance benefits from fastmath by installing Intel's short vector math library (SVML).

conda install -c numba icc_rt

Benchmarks

Requires scikit-learn >= 0.24.2

python benchmarks/kd_tree/query_radius.py

sklearn: sklearn.neighbors._kd_tree.KDTree implementation
numba: numba_neighbors.kd_tree.KDTree implementation
numba3: numba_neighbors.kd_tree.KDTree3 implementation
bu: backup implementations (i.e. takes advantage of known leaf indices of query points)
pre: pre-allocated memory

Burning in sklearn...
Benchmarking sklearn...
Burning in numba_pre...
Benchmarking numba_pre...
Burning in numba...
Benchmarking numba...
Burning in numba_bu...
Benchmarking numba_bu...
Burning in numba_bu_pre...
Benchmarking numba_bu_pre...
Burning in numba3...
Benchmarking numba3...
Burning in numba3_bu...
Benchmarking numba3_bu...
Burning in numba3_bu_pre...
Benchmarking numba3_bu_pre...
numba3_bu           : 0.0001566232 (1.00)
numba3_bu_pre       : 0.0001664740 (1.06)
numba3              : 0.0001693816 (1.08)
numba_bu            : 0.0001695879 (1.08)
numba_bu_pre        : 0.0001740832 (1.11)
numba_pre           : 0.0001744671 (1.11)
numba               : 0.0001828313 (1.17)
sklearn             : 0.0008870633 (5.66)

Debugging

Debugging is often simpler without jitting. To disable numba,

export NUMBA_DISABLE_JIT=1

and re-enable with

export NUMBA_DISABLE_JIT=0

Be wary of using os.environ["NUMBA_DISABLE_JIT"] = "1" from python code - this must be set above imports.

Differences compared to Scikit-learn

All operations are done using reduced distances. E.g. provided KDTree implementations use squared distances rather than actual distances both for inputs and outputs.
query_radius-like functions must specify a maximum number of neighbors. Over-estimating this is fairly cheap - it just means we allocate more data than necessary - but if the limit is reached the first max_count neighbors that are found are returned. These aren't necessarily the closest max_count neighbors.
Query outputs aren't sorted, though can be using binary_tree.simultaneous_sort_partial.
Use of Intel's short vector math library (SVML) if instaled. This makes computation faster, but may result in very small errors.

TODO

Note: this package has served it's purpose for what I wrote it for. I'm unlikely to get to these unless there's serious interest and I get the spare time.

better benchmarks with google-benchmark
n_nodes is inconsistent with scikit-learn implementation... - scikit bug?
Port NeighborsHeap from scikit learn.
query implementations.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
benchmarks		benchmarks
example		example
numba_neighbors		numba_neighbors
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.pre-commit-config.yaml		.pre-commit-config.yaml
.pylintrc		.pylintrc
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmarks

benchmarks

example

example

numba_neighbors

numba_neighbors

.gitignore

.gitignore

.isort.cfg

.isort.cfg

.pre-commit-config.yaml

.pre-commit-config.yaml

.pylintrc

.pylintrc

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Numba Neighbors

Installation

Benchmarks

Debugging

Differences compared to Scikit-learn

TODO

About

Releases

Packages

Contributors 2

Languages

License

jackd/numba-neighbors

Folders and files

Latest commit

History

Repository files navigation

Numba Neighbors

Installation

Benchmarks

Debugging

Differences compared to Scikit-learn

TODO

About

Resources

License

Stars

Watchers

Forks

Languages