[python] Add hypothesis tests #3560

Open · wants to merge 33 commits into main
Conversation

@bkmartinjr (Member) commented on Jan 14, 2025

Add an initial set of tests based on the Hypothesis property-based testing framework. Included are:

  • API tests for IntIndexer and fastercsx
  • State machines covering basic operations of DataFrame, SparseNDArray, and DenseNDArray
  • Integration into pytest and the testing CI

See the README included in this PR for other important information about these new tests; a generic sketch of both test styles appears below.
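For readers new to Hypothesis, here is a minimal, hypothetical sketch of the two test styles this PR introduces: a `@given` property test and a rule-based state machine. The toy key-value store and all names below are illustrative assumptions, not the PR's actual tests:

```python
from hypothesis import given, strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


# Property-based API test: Hypothesis generates many random inputs and
# shrinks any failing case to a minimal counterexample.
@given(st.lists(st.integers()))
def test_sorted_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)


# Rule-based state machine: Hypothesis drives random sequences of rules
# against the system under test and checks invariants after every step.
class KVStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.store = {}  # stand-in for the system under test
        self.model = {}  # reference model of expected state

    @rule(key=st.text(), value=st.integers())
    def put(self, key, value):
        self.store[key] = value
        self.model[key] = value

    @invariant()
    def store_matches_model(self):
        assert self.store == self.model


# Exposing .TestCase lets pytest collect and run the state machine.
TestKVStateMachine = KVStateMachine.TestCase
```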

codecov bot commented on Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.31%. Comparing base (60cd908) to head (5ad9e59).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #3560      +/-   ##
==========================================
+ Coverage   86.25%   86.31%   +0.06%
==========================================
  Files          55       55
  Lines        6381     6381
==========================================
+ Hits         5504     5508       +4
+ Misses        877      873       -4
```
| Flag | Coverage Δ |
|---|---|
| python | 86.31% <ø> (+0.06%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| python_api | 86.31% <ø> (+0.06%) ⬆️ |
| libtiledbsoma | ∅ <ø> (∅) |

@bkmartinjr bkmartinjr marked this pull request as ready for review February 1, 2025 02:03
@ryan-williams ryan-williams mentioned this pull request Feb 4, 2025
@ryan-williams (Member) commented:
This is very cool, thanks @bkmartinjr. I've proposed #3663 on top of this, with some suggested changes, all pretty minor.

The only thing I want to flag is the added nondeterminism in CI runs (which I know is a common convention with frameworks like this). The current setup is, roughly:

  1. Run "a million" cases locally/once (to be sure the code is correct; --hypothesis-profile=expensive).
  2. Future CI jobs each run a random "thousand" cases (to balance [catching regressions] vs. [wasting CI time/compute]).

Would it be better to seed part 2, so that CI runs the same "thousand" cases every time (which are known to work)? I'm guessing the answer is "no," but wanted to mention this tradeoff and make sure we're on the same page. If [ongoing fuzzing of future, unrelated PRs] might turn up novel bugs, then we should just 10x or 100x part 1 above, to catch them now (is the argument).
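For context, Hypothesis selects per-run budgets via settings profiles, chosen at the command line with `--hypothesis-profile=<name>`. A rough sketch of what the two modes above (plus the seeded variant being discussed) could look like; the profile names and the `ci` counts are assumptions, not the PR's actual configuration:

```python
from hypothesis import settings

# Thorough one-off local run, selected with --hypothesis-profile=expensive
settings.register_profile("expensive", max_examples=1_000_000)

# Default CI run: a fresh random "thousand" cases on each job
settings.register_profile("ci", max_examples=1_000)

# The seeded alternative: derandomize=True makes Hypothesis choose the
# same examples on every run, trading novelty for reproducibility.
settings.register_profile("ci-seeded", max_examples=1_000, derandomize=True)
```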

@bkmartinjr (Member, Author) commented on Feb 4, 2025

> Would it be better to seed part 2, so that CI runs the same "thousand" cases every time (which are known to work)? I'm guessing the answer is "no," but wanted to mention this tradeoff and make sure we're on the same page. If [ongoing fuzzing of future, unrelated PRs] might turn up novel bugs, then we should just 10x or 100x part 1 above, to catch them now (is the argument).

We are on the same page: the whole point of this type of testing is to mix it up and test new inputs. Because you can't afford exhaustive testing in CI, you test a random sample each run and rely on the failure logging to tell you how to recreate any issue (which works most, though not all, of the time). I advocate keeping it this way, as over the long haul you end up running far more unique tests.

Info on how to reproduce failures is here.
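As a hedged illustration of that mechanism: when a property fails, Hypothesis's output includes a decorator you can paste onto the test to replay the exact failing case, either `@seed(...)` or `@reproduce_failure(...)` with an encoded blob. The seed value and test below are placeholders, not real failure output:

```python
from hypothesis import given, seed, strategies as st

# Pasting the seed from a CI failure's log makes the run deterministic
# (placeholder seed; the real value comes from Hypothesis's output).
@seed(123456789)
@given(st.integers())
def test_some_property(x):
    assert isinstance(x, int)
```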

The other alternative we could consider is manually running these tests on a regular basis. But history has shown that this approach tends to fail, because it puts a human in the loop.

Long term, I think the best approach is CI, coupled with self-hosted GHA runners. With this strategy, you can run an "expensive" test on a schedule (say every Sunday night), and do a much, much more exhaustive search.


late edit: this may be worth raising with the whole team, as everybody is impacted by stochastic tests

@bkmartinjr (Member, Author) commented on Feb 4, 2025

@ryan-williams - also, a commit to main in the past few days now causes these tests to fail. I'll need to debug that before landing this PR (it may be a test bug or a real bug; I won't know until I dig into it).

Update: root cause is a regression on main, filed as sc-62887.

@johnkerl (Member) left a comment

🚢

Thanks @bkmartinjr !!

ryan-williams and others added 2 commits February 4, 2025 08:37
* factor README link

* use `mode: OpenMode` instead of `mode: str`

* add missing `self` params

* OpenMode

* rm unused `SOMAArrayStateMachine._reopen`

* `s/work-arounds/workarounds/`

* add/tweak type annotations

* `get_entries`: return `set[str]`

* parameterize `Ledger` with a `LedgerEntryType`

this allows the type system to understand that e.g. the return value of a `.read()` can be a `PyDictLedgerEntry`, which can then have `to_dict()` invoked (see the sketch after this list)

* rm unused `concurrency` fixture

* rm unused imports

* avoid `st` shadowing
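To illustrate the `Ledger` typing pattern described above, here is a minimal hypothetical sketch; only the names `Ledger`, `LedgerEntryType`, `PyDictLedgerEntry`, `read`, and `to_dict` come from the commit message, and all bodies are invented for illustration:

```python
from typing import Generic, TypeVar


class LedgerEntry:
    """Base class for ledger entries (hypothetical)."""


class PyDictLedgerEntry(LedgerEntry):
    def to_dict(self) -> dict:
        return {}  # placeholder body


LedgerEntryType = TypeVar("LedgerEntryType", bound=LedgerEntry)


class Ledger(Generic[LedgerEntryType]):
    """A ledger parameterized by the concrete type of its entries."""

    def __init__(self, entries: list[LedgerEntryType]) -> None:
        self.entries = entries

    def read(self) -> LedgerEntryType:
        return self.entries[-1]


# The type checker now infers read() -> PyDictLedgerEntry here, so
# calling .to_dict() needs no cast:
ledger = Ledger([PyDictLedgerEntry()])
ledger.read().to_dict()
```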