[tmva] Implementation of a new shuffling strategy in RBatchGenerator #18887

martinfoell · 2025-05-28T15:13:58Z

This pull request:

Introduces a new shuffling strategy for creating training batches, so that each batch contains data from different parts of the RDataFrame. Each chunk loaded into memory, from which the batches are created, is now assembled from blocks of data drawn from different parts of the dataframe.
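To make the idea concrete, here is a minimal sketch of block-based chunk construction. It is not the code in this PR: the names `Block`, `MakeBlocks` and `MakeShuffledChunks`, and the parameters `blockSize`, `blocksPerChunk` and `seed`, are hypothetical and chosen only for illustration. The sketch assumes the dataset is addressed by row index, splits it into consecutive blocks, shuffles the blocks, and groups them into chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical type: half-open range [first, second) of dataset row indices.
using Block = std::pair<std::size_t, std::size_t>;

// Split [0, numEntries) into consecutive blocks of at most blockSize rows.
std::vector<Block> MakeBlocks(std::size_t numEntries, std::size_t blockSize)
{
   assert(blockSize > 0);
   std::vector<Block> blocks;
   for (std::size_t start = 0; start < numEntries; start += blockSize)
      blocks.emplace_back(start, std::min(start + blockSize, numEntries));
   return blocks;
}

// Shuffle the blocks, then group them into chunks of up to blocksPerChunk blocks,
// so that every chunk mixes rows from different parts of the dataset.
std::vector<std::vector<Block>> MakeShuffledChunks(std::size_t numEntries, std::size_t blockSize,
                                                   std::size_t blocksPerChunk, unsigned seed)
{
   assert(blocksPerChunk > 0);
   std::vector<Block> blocks = MakeBlocks(numEntries, blockSize);
   std::mt19937 rng(seed);
   std::shuffle(blocks.begin(), blocks.end(), rng);

   std::vector<std::vector<Block>> chunks;
   for (std::size_t i = 0; i < blocks.size(); i += blocksPerChunk) {
      const std::size_t end = std::min(i + blocksPerChunk, blocks.size());
      std::vector<Block> chunk(blocks.begin() + i, blocks.begin() + end);
      // Assumption: sorting the blocks of a chunk by start row lets them be read in file order.
      std::sort(chunk.begin(), chunk.end());
      chunks.push_back(std::move(chunk));
   }
   return chunks;
}
```

For example, `MakeShuffledChunks(10, 2, 2, 42)` yields three chunks holding 2, 2 and 1 blocks that together cover all 10 rows. Shuffling the rows inside a loaded chunk before slicing it into batches would be a separate step and is not shown here.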


github-actions · 2025-05-28T19:15:28Z

Test Results

21 files 21 suites 3d 6h 23m 27s ⏱️
3 217 tests 3 217 ✅ 0 💤 0 ❌
65 834 runs 65 834 ✅ 0 💤 0 ❌

Results for commit 8d8ecf2.

♻️ This comment has been updated with latest results.

martamaja10

Hi @martinfoell, thank you for this nice PR! This is only a first look, so I mainly went over your code to see what could still be improved before going into more detail. I left a few comments where I spotted typos, but the main thing I would focus on is adding more documentation (Doxygen comments) to your functions, as this would be very helpful for me and other reviewers, as well as for future users. I would also suggest adding a few lines of comments in the more complex functions to explain the logic behind them, for example CreateTrainingChunksIntervals(), LoadTrainingChunk(), etc. There are also some commented-out lines here and there that should now be cleaned up.

bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py

tmva/tmva/inc/TMVA/BatchGenerator/RBatchGenerator.hxx


martamaja10 · 2025-06-05T13:09:13Z

tmva/tmva/inc/TMVA/BatchGenerator/RBatchLoader.hxx

-   /// @param eventIndices
-   void SaveRemainingData(TMVA::Experimental::RTensor<float> &remainderTensor, const std::size_t remainderTensorRow,
-                          const std::vector<std::size_t> eventIndices, const std::size_t start = 0)
+   TMVA::Experimental::RTensor<float> GetValidationBatch()


It would be nice to have a description of what each function does, so that it is also easy to find in Doxygen. Please review all the functions you added and give each of them a /// @brief and the related tags.
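As an illustration of the comment style meant here, a documented function could look like the sketch below. `CopyRows` and its parameters are hypothetical and not part of this PR; only the `/// @brief`, `/// @param`, `/// @pre` and `/// @return` tags are the point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// @brief Copy a contiguous range of rows out of a flat, row-major buffer.
/// @param data Flat buffer holding numColumns values per row.
/// @param numColumns Number of values in each row.
/// @param firstRow Index of the first row to copy.
/// @param numRows Number of rows to copy.
/// @pre (firstRow + numRows) * numColumns <= data.size()
/// @return A new buffer with numRows * numColumns values.
std::vector<float> CopyRows(const std::vector<float> &data, std::size_t numColumns, std::size_t firstRow,
                            std::size_t numRows)
{
   assert((firstRow + numRows) * numColumns <= data.size());
   const auto begin = data.begin() + firstRow * numColumns;
   return std::vector<float>(begin, begin + numRows * numColumns);
}
```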

martinfoell · 2025-07-22T08:07:17Z

Thank you for your comments, @martamaja10! I have addressed them and added more Doxygen comments to document the functions. I also added more general comments at the beginning of each class to describe what it is used for.

Martin Foll added 8 commits May 28, 2025 16:44

Add RChunkConstructor.hxx for constructing chunks from blocks of data in the dataframe

e21acb8

Add RChunkConstructor.hxx to CMakeLists.txt

4807c95

Update RChunkLoader.hxx for loading chunks into memory

53da378

Update RBatchLoader.hxx for creating batches from the chunks

dfa37b7

Update RBatchGenerator.hxx for generating batches from a dataframe

499b970

Update pythonization of RBatchGenerator

1d07702

Adjust RBatchGenerator tests and add tests for set_seed and vector padding

da69c1f

Update RBatchGenerator tutorials

955cc84

martinfoell self-assigned this May 28, 2025

martinfoell requested review from bellenot, couet, lmoneta, dpiparo and vepadulano as code owners May 28, 2025 15:13

martamaja10 reviewed Jun 5, 2025


Martin Foll added 2 commits June 6, 2025 14:02

Fix typos and clean up

d79c12d

Add documentation and comments

b893283

Optimization of loading chunks by sorting BlocksInChunks first

8d8ecf2
