Skip to content

[tmva] Implementation of a new shuffling strategy in RBatchGenerator #18887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

martinfoell
Copy link
Contributor

This Pull request:

Introduces a new shuffling strategy for creating training batches, ensuring that each batch consists of data from different parts of the RDataFrame. Each chunk loaded into memory, which is used to create batches, now contains blocks of data drawn from different parts of the dataframe.

Copy link

github-actions bot commented May 28, 2025

Test Results

    21 files      21 suites   3d 6h 23m 27s ⏱️
 3 217 tests  3 217 ✅ 0 💤 0 ❌
65 834 runs  65 834 ✅ 0 💤 0 ❌

Results for commit 8d8ecf2.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@martamaja10 martamaja10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @martinfoell, thank you for this nice PR! I am taking just a very first look, so I mainly went over your code and tried to see what could still be improved before we go into some more details. I left a few comments here and there, when I spotted some typos, but the main thing I would focus on is adding more documentation (doxygen handles) to your functions, as this would be very helpful for me and other reviewers as well as for the users in the future. I would also suggest that you add a few lines of comments in the more complex functions to explain a bit the logic behind, for example, CreateTrainingChunksIntervals(), LoadTrainingChunk() etc. There are also some lines here and there that were commented out and should be cleaned up now.

/// @param eventIndices
void SaveRemainingData(TMVA::Experimental::RTensor<float> &remainderTensor, const std::size_t remainderTensorRow,
const std::vector<std::size_t> eventIndices, const std::size_t start = 0)
TMVA::Experimental::RTensor<float> GetValidationBatch()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have some description of what each function does so that it can be easily accessed in doxygen as well - please review all the functions added and add the /// @brief etc to them.

@martinfoell
Copy link
Contributor Author

Thank you for your comments @martamaja10 ! I have addressed the comments you gave and added more doxygen comments for the documentation of the functions. I also added more general comments at the beginning of each class to describe what they are used for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants