Releases: aai-institute/pyDVL

v0.9.2

07 May 13:36

0.9.2 - 🏗 Bug fixes, logging improvement

Added

  • Add progress bars to the computation of LazyChunkSequence and
    NestedLazyChunkSequence
    PR #567
  • Add a device fixture for pytest which, depending on availability and
    user input (pytest --with-cuda), resolves to the cuda device
    PR #574

Fixed

  • Fixed logging issue in decorator log_duration
    PR #567
  • Fixed missing move of tensors to model device in EkfacInfluence
    implementation
    PR #570
  • Fixed missing move of preconditioner to device in CgInfluence
    implementation
    PR #572
  • Raise a more specific error message when a RuntimeError occurs in
    torch.linalg.eigh, so the user can check whether it is related to a
    known issue
    PR #578
  • Fix an edge case (empty train data) in the test
    test_classwise_scorer_accuracies_manual_derivation, which resulted
    in undefined behavior (np.nan to int conversion with different results
    depending on OS)
    PR #579
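
The log_duration fix above concerns a decorator that logs how long a call takes. A minimal sketch of such a decorator follows. It is not pyDVL's actual implementation: the log_level parameter and the message format are assumptions made for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_duration(_func=None, *, log_level=logging.DEBUG):
    """Log the wall-clock duration of each call to the decorated function.

    Hypothetical sketch: usable both as @log_duration and as
    @log_duration(log_level=...).
    """

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                logger.log(log_level, "%s took %.4f s", func.__qualname__, elapsed)

        return wrapper

    # Support both the bare and the parameterized decorator forms.
    return decorator if _func is None else decorator(_func)
```

Passing the level as a keyword-only argument keeps the bare @log_duration form working, which is one common way to avoid surprises when a decorator gains options.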

Changed

  • Iterative methods LissaInfluence and CgInfluence now log a warning when
    the desired tolerance is not reached within maxiter. Added parameter
    warn_on_max_iteration, which can lower the level of this message to
    logging.DEBUG
    PR #567
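
The change above can be pictured with a toy iterative solver. This is a minimal sketch, assuming a boolean warn_on_max_iteration flag that selects between logging.WARNING and logging.DEBUG. The solver (a plain fixed-point iteration), its name and its signature are hypothetical and unrelated to the actual LissaInfluence and CgInfluence code.

```python
import logging

logger = logging.getLogger(__name__)


def fixed_point_solve(a, b, maxiter=100, rtol=1e-8, warn_on_max_iteration=True):
    """Solve a * x = b for a scalar 0 < a < 2 via x <- x + (b - a * x).

    Hypothetical toy solver; only the logging pattern is the point.
    """
    x = 0.0
    for _ in range(maxiter):
        residual = b - a * x
        if abs(residual) <= rtol * abs(b):
            return x  # converged within the iteration budget
        x += residual
    # Tolerance not reached within maxiter: report at WARNING or DEBUG.
    level = logging.WARNING if warn_on_max_iteration else logging.DEBUG
    logger.log(
        level,
        "Reached maxiter=%d without achieving rtol=%g (residual %g)",
        maxiter,
        rtol,
        abs(b - a * x),
    )
    return x
```

The last iterate is returned either way; the flag only decides how loudly the missed tolerance is reported.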

v0.9.1

22 Apr 09:33

0.9.1

Fixed

  • FutureWarning for ParallelConfig was raised constantly, even without
    instantiating the object
    PR #562
  • Modify log level for implementations of TorchInfluenceFunctionModel
  • Add duration logging to output of SequentialCalculator
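
As an illustration of the first fix, a deprecation warning can be tied to instantiation by emitting it from the constructor rather than at import or class-definition time. The sketch below uses a stand-in class, LegacyConfig, which is hypothetical. It does not reproduce ParallelConfig or the actual cause of the bug.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyConfig:
    """Hypothetical stand-in for a deprecated configuration object."""

    backend: str = "joblib"
    n_workers: int = 1

    def __post_init__(self):
        # Emitted only when an instance is created, never on import.
        # stacklevel=3 skips __post_init__ and the generated __init__,
        # so the warning points at the caller's line.
        warnings.warn(
            "LegacyConfig is deprecated and will be removed in a future release.",
            FutureWarning,
            stacklevel=3,
        )
```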

v0.9.0

12 Apr 18:11

🆕 New methods, better docs and bugfixes 📚🐞

Added

  • New method MSR Banzhaf with accompanying notebook, and new stopping
    criterion RankCorrelation PR #520
  • New method: NystroemSketchInfluence PR #504
  • New preconditioned block variant of conjugate gradient PR #507
  • Improvements to documentation: fixes, links, text, example gallery, LFS and more PR #532, PR #543
  • Glossary of data valuation and influence terms in the documentation PR #537
  • Documentation about writing notes for new features, changes or deprecations PR #557
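
For orientation, the maximum sample reuse (MSR) idea behind MSR Banzhaf estimates the value of point i as the mean utility of the sampled subsets that contain i minus the mean utility of those that do not, so every utility evaluation is reused for all points. A minimal pure-Python sketch of that estimator follows. The function names, their signatures and the toy utility mentioned afterwards are illustrative assumptions, not pyDVL's API.

```python
import random
from statistics import fmean


def msr_banzhaf(n, utility, subsets):
    """Estimate Banzhaf values of n points from an iterable of index subsets.

    value[i] = mean(u(S) for S containing i) - mean(u(S) for S without i)
    """
    with_i = [[] for _ in range(n)]
    without_i = [[] for _ in range(n)]
    for subset in subsets:
        members = frozenset(subset)
        u = utility(members)  # evaluated once, reused for every index
        for i in range(n):
            if i in members:
                with_i[i].append(u)
            else:
                without_i[i].append(u)
    # An index that never (or always) appears has no estimate yet.
    return [
        fmean(w) - fmean(wo) if w and wo else float("nan")
        for w, wo in zip(with_i, without_i)
    ]


def uniform_subsets(n, n_samples, seed=0):
    """Sample subsets by including each index independently with probability 1/2."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield [i for i in range(n) if rng.random() < 0.5]
```

With an additive utility, for example the sum of fixed per-point weights over the subset, the estimates approach those weights as the number of sampled subsets grows.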

Fixed

  • Bug in LissaInfluence, when not using CPU device PR #495
  • Memory issue with CgInfluence and ArnoldiInfluence PR #498
  • Raise a specific error message with install instructions when trying to load pydvl.utils.cache.memcached without pymemcache installed. If pymemcache is available, all symbols from pydvl.utils.cache.memcached are available through pydvl.utils.cache PR #509
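
The pattern behind the last fix, failing with an actionable message when an optional dependency is missing, can be sketched as follows. The helper name require_optional and the hint text are hypothetical and not taken from pyDVL.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        # Chain the original error so the traceback still shows the root cause.
        raise ModuleNotFoundError(
            f"Cannot import '{module_name}', which is an optional dependency. "
            f"{install_hint}"
        ) from e
```

A caller could then write require_optional("pymemcache", "Install it with: pip install pymemcache") at the top of the module that needs it.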

Changed

  • Add property model_dtype to instances of type TorchInfluenceFunctionModel
  • Bump versions of CI actions to avoid warnings PR #502
  • Add Python Version 3.11 to supported versions PR #510
  • Documentation improvements and cleanup PR #521, PR #522
  • Simplified parallel backend configuration PR #549

Full Changelog: v0.8.1...v0.9.0

v0.8.1

26 Jan 09:46

🆕 New method and notebook, games with exact Shapley values, bug fixes and cleanup 🏗

Added

  • Implement new method: EkfacInfluence #451
  • New notebook to showcase EKFAC for LLMs #483
  • Implemented exact games from Castro et al. 2009 and 2017 #341

Fixed

  • Bug in using DaskInfluenceCalculator with TorchNumpyConverter for single dimensional arrays #485
  • Fix the to() methods of TorchInfluenceFunctionModel implementations #487
  • Fixed bug with checking for converged values in semivalues #341

Docs

  • Add applications of data valuation section, display examples more prominently, make all sections visible in table of contents, use mkdocs material cards in the home page #492

Full Changelog: v0.8.0...v0.8.1

v0.8.0

21 Dec 11:35

🆕 New interfaces, scaling computation, bug fixes and improvements 🎁

Added

  • New cache backends: InMemoryCacheBackend and DiskCacheBackend PR #458
  • New influence function interface InfluenceFunctionModel
  • Data parallel computation with DaskInfluenceCalculator PR #26
  • Sequential batch-wise computation and write to disk with SequentialInfluenceCalculator PR #377
  • Adapt notebooks to new influence abstractions PR #430

Changed

  • Refactor and simplify caching implementation PR #458
  • Simplify display of computation progress PR #466
  • Improve README and explain the examples better PR #465
  • Simplify and improve tests, add CodeCov code coverage PR #429
  • Breaking Changes
    • Removed compute_influences and all related code.
      Replaced by new InfluenceFunctionModel interface. Removed modules:
      • influence.general
      • influence.inversion
      • influence.twice_differentiable
      • influence.torch.torch_differentiable

Fixed

Full Changelog: v0.7.1...v0.8.0

v0.7.1

14 Oct 15:15

🆕 New methods, bug fixes and improvements for local tests 🐞🧪

Added

  • New method: Class-wise Shapley values PR #338
  • New method: Data-OOB by @BastienZim PR #426, PR #431
  • Added AntitheticPermutationSampler PR #439
  • Faster semi-value computation with per-index check of stopping criteria (optional) PR #437
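
Antithetic permutation sampling pairs each random permutation with its reverse, a variance-reduction trick. Below is a minimal sketch of the idea, assuming this pairing scheme. The generator name and signature are illustrative and do not mirror the AntitheticPermutationSampler class.

```python
import random
from itertools import islice


def antithetic_permutations(indices, seed=None):
    """Yield random permutations of indices, each followed by its reverse."""
    rng = random.Random(seed)
    pool = list(indices)
    while True:
        perm = rng.sample(pool, len(pool))  # a fresh uniformly random permutation
        yield tuple(perm)
        yield tuple(reversed(perm))  # the antithetic partner


# The generator is infinite, so take a finite prefix when consuming it.
first_four = list(islice(antithetic_permutations(range(4), seed=0), 4))
```

Each index that appears early in one permutation appears late in its partner, which tends to balance the marginal contributions seen by permutation-based estimators.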

Changed

  • No longer using docker within tests to start a memcached server PR #444
  • Using pytest-xdist for faster local tests PR #440
  • Improvements and fixes to notebooks PR #436
  • Refactoring of parallel module. Old imports will stop working in v0.9.0 PR #421

Fixed

  • Fix initialization of data_names in ValuationResult.zeros() PR #443

v0.7.0

02 Sep 16:20

📚🆕 Documentation and IF overhaul, new methods and bug fixes 💥🐞

This is our first β release! We have worked hard to deliver improvements across
the board, with a focus on documentation and usability. We have also reworked
the internals of the influence module, improved parallelism and handling of
randomness.

Added

  • Implemented solving the Hessian equation via spectral low-rank approximation PR #365
  • Enabled parallel computation for Leave-One-Out values PR #406
  • Added more abbreviations to documentation PR #415
  • Added seed to functions from pydvl.utils.numeric, pydvl.value.shapley and pydvl.value.semivalues. Introduced new type Seed and conversion function ensure_seed_sequence. PR #396

Changed

  • Replaced sphinx with mkdocs for documentation. Major overhaul of documentation PR #352
  • Made ray an optional dependency, relying on joblib as default parallel backend PR #408
  • Decoupled ray.init from ParallelConfig PR #373
  • Breaking Changes
    • Signature change: return information about Hessian inversion from compute_influence_factors PR #375
    • Major changes to IF interface and functionality. Foundation for a framework abstraction for IF computation. PR #278, PR #394
    • Renamed semivalues to compute_generic_semivalues PR #413
    • New joblib backend as default instead of ray. Simplify MapReduceJob. PR #355
    • Bump torch dependency for influence package to 2.0. PR #365

Fixed

  • Fixes to parallel computation of generic semi-values: properly handle all samplers and stopping criteria, irrespective of parallel backend. PR #372
  • Optimize memory usage in IF calculation PR #375
  • Fix adding valuation results with overlapping indices and different lengths PR #370
  • Fixed bugs in conjugate gradient and linear_solve PR #358
  • Fix installation of dev requirements for Python 3.10 PR #382
  • Improvements to IF documentation PR #371

Full Changelog: v0.6.1...v0.7.0

v0.6.1

13 Apr 12:18

🏗 Bug fixes and minor improvements

  • Fix parsing keyword arguments of compute_semivalues dispatch function by @kosmitive in #333
  • Create new RayExecutor class based on the concurrent.futures API, use the new class to fix an issue with Truncated Monte Carlo Shapley (TMCS) starting too many processes and dying, plus other small changes by @AnesBenmerzoug in #329
  • Fix creation of GroupedDataset objects using the from_arrays and from_sklearn class methods by @AnesBenmerzoug in #334
  • Fix release job not triggering on CI when a new tag is pushed by @AnesBenmerzoug in #331
  • Added alias ApproShapley from Castro et al. 2009 for permutation Shapley by @mdbenito in #332

Full Changelog: v0.6.0...v0.6.1

v0.6.0

16 Mar 11:06

🆕 New algorithms, cleanup and bug fixes 🏗

Full Changelog: v0.5.0...v0.6.0

v0.5.0

21 Feb 07:57

🛠️ Fixes, nicer interfaces and... more breaking changes 💥😒

Slow and steady does it

What’s changed

  • Fixed parallel and antithetic Owen sampling for Shapley values. Simplified and extended tests. #267
  • Added Scorer class for a cleaner interface. Fixed minor bugs around Group-Testing Shapley, added more tests and switched to cvxpy for the solver. #264
  • Generalised stopping criteria for valuation algorithms. Improved classes ValuationResult and Status with more operations. Some minor issues fixed. #250
  • Fixed a bug whereby compute_shapley_values would only spawn one process when using n_jobs=-1 and Monte Carlo methods. #270
  • Bugfix in RayParallelBackend: wrong semantics for kwargs. #268
  • Splitting of problem preparation and solution in Least-Core computation. Umbrella function for LC methods. #257
  • Operations on ValuationResult and Status and some cleanup #248
  • Bug fixes and minor improvements: fixes a bug in TMCS with a remote Ray cluster; raises an error for the dummy sequential parallel backend with TMCS; clones the model inside Utility before fitting by default (the flag clone_before_fit disables this); catches all warnings in Utility when show_warnings is False. Adds Miner and Gloves toy game utilities. #247

Full Changelog: v0.4.0...v0.5.0