Summary
This PR partially addresses #862
Profiling showed that iterating over the array was the slowest part. I have replaced most of that iteration with numpy operations so that we no longer loop over each element. In addition, I have replaced the try/except block with an if statement.
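To illustrate the kind of change described above, here is a minimal sketch and not the actual code from this PR. The function names, the per-class mean computation, and the input arrays are all hypothetical; the only things taken from the description are the ideas of looping over unique labels with numpy masked arrays instead of over every element, and of using a plain `if` check.

```python
import numpy as np


def per_class_mean_loop(labels, scores, num_classes):
    """Hypothetical baseline: visit every element in a Python loop."""
    totals = np.zeros(num_classes)
    counts = np.zeros(num_classes)
    for label, score in zip(labels, scores):
        totals[label] += score
        counts[label] += 1
    means = np.full(num_classes, np.nan)
    for k in range(num_classes):
        if counts[k] > 0:  # plain if check guards against empty classes
            means[k] = totals[k] / counts[k]
    return means


def per_class_mean_masked(labels, scores, num_classes):
    """Hypothetical rewrite: one masked numpy reduction per unique label."""
    means = np.full(num_classes, np.nan)
    for k in np.unique(labels):
        # Mask out every element that does NOT belong to class k.
        masked = np.ma.masked_array(scores, mask=(labels != k))
        means[k] = masked.mean()
    return means


labels = np.array([0, 2, 2, 0, 2])
scores = np.array([0.2, 0.4, 0.6, 0.8, 0.2])
slow = per_class_mean_loop(labels, scores, num_classes=3)
fast = per_class_mean_masked(labels, scores, num_classes=3)
```

The Python-level loop now runs once per unique label rather than once per element. Each masked array carries a boolean mask the size of the input, which is one plausible source of extra memory in an approach like this.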
To measure memory usage I used the memory-profiler library. The code I used for benchmarking is copied below. I also sorted the imports in the modified files. Note: when benchmarking the current version, I removed the tqdm call from the loop.
Code Setup
Current version
This PR
Testing
References
Reviewer Notes
I had to change the tqdm component because we no longer loop over all the labels. I tried to follow the approach used in other files of displaying progress only when verbose is True. However, the bar is now updated only once per unique_label.
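A rough sketch of the progress-bar behaviour described above, assuming tqdm is treated as an optional dependency. The function `count_per_label` and its body are hypothetical and only stand in for the real per-label work; the point is that the bar wraps the unique labels and is shown only when `verbose` is True.

```python
import numpy as np

try:
    from tqdm.auto import tqdm
except ImportError:  # assumption: tqdm is optional in this sketch
    tqdm = None


def count_per_label(labels, verbose=False):
    """Hypothetical per-label routine with an optional progress bar."""
    unique_labels = np.unique(labels)
    iterator = unique_labels
    if verbose and tqdm is not None:
        # The bar advances once per unique label, not once per example.
        iterator = tqdm(unique_labels, desc="labels")
    counts = {}
    for k in iterator:
        counts[int(k)] = int(np.sum(labels == k))
    return counts


quiet_counts = count_per_label(np.array([1, 1, 3]))
verbose_counts = count_per_label(np.array([1, 1, 3]), verbose=True)
```

With few unique labels and many examples, the bar moves in large, uneven steps, which is the trade-off noted above.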
The new version consumes more memory than the previous version because of the numpy masked arrays. However, this function does not appear to be the memory bottleneck, judging by the result I get when calling the find_label_issues function with the same input data:
That said, I am open to trying other approaches that reduce memory consumption at the cost of longer execution time.