RFC: unique* functions in lazy libraries: add size=None, fill_value=None (#883)
Please note that while ONNX models may benefit from statically known dimension lengths, these are not a necessity. From ndonnx's perspective, the only requirement is that operations produce arrays with a statically known rank. That said, we could support the proposed API in ndonnx, too.
@crusaderky Thank you for opening this RFC. Question: while … If we were to standardize, we'd want to ensure broad applicability across array libraries, JIT'd or not.
It would be useful to speed up "fail early" and fast-path tests, or in general whenever you don't care about the unique count beyond a certain threshold. For example, you could change

```python
if xp.unique_values(arr).size < 10:
    fast_path(arr)
else:
    slow_path(arr)
```

to

```python
_, counts = xp.unique_counts(arr, size=10)
if counts[-1] == 0:  # last slot is padding: fewer than 10 unique elements
    fast_path(arr)
else:
    slow_path(arr)
```

The runtime changes from O(n), where n = arr.size, to anywhere between O(1) and O(n) depending on the entropy of arr; it is O(1) when the order of the contents is uniformly randomized.
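The early-exit argument above can be illustrated with a hypothetical pure-Python helper, `has_fewer_unique_than`. It is not part of any library or of this proposal; it only shows why a bounded unique count need not scan the whole input, since the scan can stop as soon as the threshold is reached.

```python
from typing import Hashable, Iterable


def has_fewer_unique_than(values: Iterable[Hashable], k: int) -> bool:
    """Return True if `values` contains fewer than `k` distinct elements.

    Hypothetical helper: stops consuming `values` as soon as the k-th
    distinct element is seen, so the worst case is O(n) but the scan can
    end much earlier when many distinct values appear near the start.
    """
    if k <= 0:
        return False  # nothing has fewer than zero distinct elements
    seen = set()
    for v in values:
        seen.add(v)
        if len(seen) >= k:
            return False  # early exit: threshold reached
    return True
```

For example, `has_fewer_unique_than(iter(range(10**6)), 10)` returns `False` after reading only the first 10 items, whereas an input with few distinct values must be read to the end.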
Another important use case: on Dask, this is a reduction equivalent to … So having a size= parameter on Dask could make the difference between an algorithm that can ingest any data in a predictable amount of time and one that can cause …
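A hypothetical pure-Python sketch of why a static `size` helps a chunked reduction such as Dask's: each chunk reduces to a bounded partial result, and partials are merged pairwise. The names `chunk_partial`, `combine`, and `bounded_unique_counts` are illustrative only; this is not Dask's implementation or API, and it assumes (JAX-style) that `size` keeps the smallest unique values.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List


def chunk_partial(chunk: List[int], size: int) -> Dict[int, int]:
    """Per-chunk step: counts of the `size` smallest distinct values in the chunk."""
    counts = Counter(chunk)
    return {v: counts[v] for v in sorted(counts)[:size]}


def combine(a: Dict[int, int], b: Dict[int, int], size: int) -> Dict[int, int]:
    """Merge two partial results, again keeping only the `size` smallest values."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds counts for shared keys
    return {v: merged[v] for v in sorted(merged)[:size]}


def bounded_unique_counts(chunks: List[List[int]], size: int) -> Dict[int, int]:
    """Reduce all chunks; no intermediate result ever exceeds `size` entries."""
    partials = [chunk_partial(c, size) for c in chunks]
    return reduce(lambda a, b: combine(a, b, size), partials, {})
```

Every partial result holds at most `size` entries, so each task's memory is bounded regardless of the chunk contents. The counts remain exact for the retained values: any value among the global `size` smallest has fewer than `size` smaller distinct values in every chunk and every merge, so it is never truncated away.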
The unique* functions currently come with the disclaimer … I propose to add size=None, fill_value=None, lifted from JAX. This not only helps with JAX and ndonnx interoperability, but also with Dask, as it would allow avoiding NaN-sized arrays.
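To make the proposed `size=None, fill_value=None` semantics concrete, here is a pure-Python sketch of a hypothetical `unique_counts_padded` that works on lists rather than arrays. It is modelled loosely on the `size`/`fill_value` arguments of JAX's `jnp.unique`, but it is not JAX's implementation or the standard's API; the default used when `fill_value` is omitted is an assumption made only for this sketch, and NaN semantics are ignored.

```python
from collections import Counter
from typing import List, Optional, Tuple


def unique_counts_padded(
    x: List[float], size: Optional[int] = None, fill_value: Optional[float] = None
) -> Tuple[List[float], List[int]]:
    """Hypothetical unique_counts with an optional static output length.

    - size=None: data-dependent output length (the current behaviour).
    - size=k: exactly k entries; the k smallest unique values are kept,
      and any missing slots are padded with fill_value and a count of 0.
    """
    counts = Counter(x)
    values = sorted(counts)
    if size is None:
        return values, [counts[v] for v in values]
    values = values[:size]
    out_counts = [counts[v] for v in values]
    pad = size - len(values)
    if pad:
        if fill_value is None:
            # Assumption for this sketch: default to the smallest unique value.
            if not values:
                raise ValueError("fill_value is required when x is empty")
            fill_value = values[0]
        values += [fill_value] * pad
        out_counts += [0] * pad  # padded slots never count as occurrences
    return values, out_counts
```

Because padded slots always have a count of 0, a zero in the last position of the counts means the input had fewer than `size` unique values, which is exactly what the fast-path check relies on.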