-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[tmva] Implementation of a new shuffling strategy in RBatchGenerator #18887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[tmva] Implementation of a new shuffling strategy in RBatchGenerator #18887
Conversation
… in the dataframe
Test Results 21 files 21 suites 3d 6h 23m 27s ⏱️ Results for commit 8d8ecf2. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @martinfoell, thank you for this nice PR! I am taking just a very first look, so I mainly went over your code and tried to see what could still be improved before we go into some more details. I left a few comments here and there, when I spotted some typos, but the main thing I would focus on is adding more documentation (doxygen handles) to your functions, as this would be very helpful for me and other reviewers as well as for the users in the future. I would also suggest that you add a few lines of comments in the more complex functions to explain a bit the logic behind, for example, CreateTrainingChunksIntervals()
, LoadTrainingChunk()
etc. There are also some lines here and there that were commented out and should be cleaned up now.
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py
Outdated
Show resolved
Hide resolved
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py
Outdated
Show resolved
Hide resolved
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py
Outdated
Show resolved
Hide resolved
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py
Outdated
Show resolved
Hide resolved
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py
Outdated
Show resolved
Hide resolved
/// @param eventIndices | ||
void SaveRemainingData(TMVA::Experimental::RTensor<float> &remainderTensor, const std::size_t remainderTensorRow, | ||
const std::vector<std::size_t> eventIndices, const std::size_t start = 0) | ||
TMVA::Experimental::RTensor<float> GetValidationBatch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some description of what each function does so that it can be easily accessed in doxygen as well - please review all the functions added and add the /// @brief
etc to them.
Thank you for your comments @martamaja10 ! I have addressed the comments you gave and added more doxygen comments for the documentation of the functions. I also added more general comments at the beginning of each class to describe what they are used for. |
This Pull request:
Introduces a new shuffling strategy for creating training batches, ensuring that each batch consists of data from different parts of the RDataFrame. Each chunk loaded into memory, which is used to create batches, now contains blocks of data drawn from different parts of the dataframe.