[skip-ci] Create histv7, document terminology and design #19213
Merged
fd82076 [hist] Remove outdated comments in CMakeLists.txt (hahnjo)
d572af8 [hist] Create directory for new histograms (hahnjo)
76aaebe [hist] Document histogram terminology (hahnjo)
4f74506 [hist] Document design decisions and implementation choices (hahnjo)
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# Design and Implementation | ||
|
||
This document describes key design decisions and implementation choices. | ||
|
||
## Templating | ||
|
||
Classes are only templated if required for data members, in particular the bin content type `T`. | ||
We use member function templates to accept a variable number of arguments (see also below). | ||
Classes are **not** templated to improve performance, in particular not on the axis type(s). | ||
This avoids an explosion of types and simplifies serialization. | ||
Instead, axis objects are run-time choices and stored in a `std::variant`. | ||
With a careful design, this still results in excellent performance. | ||
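The run-time choice described above can be illustrated with a minimal sketch. The names `RegularAxis`, `VariableBinAxis`, `AxisVariant`, and `ComputeBin` are hypothetical and not taken from the histogram package; flow bins are left out for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical axis types, for illustration only.
struct RegularAxis {
   std::size_t fNBins;
   double fLow;
   double fHigh;
   // Assumes fLow <= x < fHigh; flow bins are omitted in this sketch.
   std::size_t ComputeBin(double x) const
   {
      return static_cast<std::size_t>((x - fLow) / (fHigh - fLow) * fNBins);
   }
};

struct VariableBinAxis {
   std::vector<double> fEdges; // sorted bin edges e_0 < e_1 < ... < e_n
   // Assumes e_0 <= x < e_n; flow bins are omitted in this sketch.
   std::size_t ComputeBin(double x) const
   {
      auto it = std::upper_bound(fEdges.begin(), fEdges.end(), x);
      return static_cast<std::size_t>(it - fEdges.begin()) - 1;
   }
};

// The axis type is a run-time choice, so the histogram class itself
// does not need a template parameter for it.
using AxisVariant = std::variant<RegularAxis, VariableBinAxis>;

std::size_t ComputeBin(const AxisVariant &axis, double x)
{
   return std::visit([x](const auto &a) { return a.ComputeBin(x); }, axis);
}
```

With this layout, a histogram could hold something like a `std::vector<AxisVariant>` regardless of which axis types the user picks, which is what keeps the number of instantiated types small.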
|
||
## Performance Optimizations | ||
|
||
If required, it would be possible to template performance-critical functions on the axis types. | ||
This was shown to be beneficial in microbenchmarks for one-dimensional histograms. | ||
However, it will not be implemented until it is shown to be useful in a real-world application. | ||
In virtually all cases, the cost of filling a (one-dimensional) histogram is negligible compared to reading, decompressing, and processing the data. | ||
|
||
The same applies for other optimizations, such as caching the pointer to the axis object stored in the `std::variant`. | ||
Such optimizations should only be implemented when they are carefully motivated by real-world applications. | ||
|
||
## Functions with Variable Number of Arguments | ||
|
||
Many member functions have two overloads: one accepting a function parameter pack and one accepting a `std::tuple` or `std::array`. | ||
|
||
### Arguments with Different Types | ||
|
||
Functions that take arguments with different types expect a `std::tuple`. | ||
An example is `template <typename... A> void Fill(const std::tuple<A...> &args)`. | ||
|
||
For user convenience, a variadic function template forwards to the `std::tuple` overload: | ||
```cpp | ||
template <typename... A> void Fill(const A &...args) { | ||
   Fill(std::forward_as_tuple(args...)); | ||
} | ||
``` | ||
This forwards the arguments as references, so no potentially expensive copy constructors are called. | ||
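The following sketch illustrates the forwarding behavior with a copy-counting argument type. `CountedArg` and `CountArgs` are hypothetical names introduced only for this example; `CountArgs` stands in for a function such as `Fill` and simply reports how many arguments arrived in the tuple.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical argument type that counts how often it is copied.
struct CountedArg {
   static inline int fgCopies = 0;
   CountedArg() = default;
   CountedArg(const CountedArg &) { ++fgCopies; }
};

// Tuple overload: reports the number of arguments it received.
template <typename... A>
std::size_t CountArgs(const std::tuple<A...> &)
{
   return sizeof...(A);
}

// Variadic overload: wraps references to the arguments in a tuple
// and forwards to the overload above without copying any argument.
template <typename... A>
std::size_t CountArgs(const A &...args)
{
   return CountArgs(std::forward_as_tuple(args...));
}
```

Calling `CountArgs(a, b, 42)` with two `CountedArg` objects leaves `CountedArg::fgCopies` at zero, because the tuple only holds `const` references. When a tuple is passed, overload resolution prefers the `std::tuple` overload because it is the more specialized template.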
|
||
### Arguments with Same Type | ||
|
||
In this case, the function has a `std::size_t N` template argument and accepts a `std::array`. | ||
An example is `template <std::size_t N> const T &GetBinContent(const std::array<RBinIndex, N> &args)`. | ||
|
||
For user convenience, a variadic function template forwards to the `std::array` overload: | ||
```cpp | ||
template <typename... A> const T &GetBinContent(const A &...args) { | ||
   std::array<RBinIndex, sizeof...(A)> a{args...}; | ||
   return GetBinContent(a); | ||
} | ||
``` | ||
This will copy the arguments, which is fine in this case because `RBinIndex` is small (see below). | ||
|
||
### Special Arguments | ||
|
||
Special arguments are passed last. | ||
Examples include | ||
```cpp | ||
template <typename... A> void Fill(const std::tuple<A...> &args, RWeight w); | ||
template <std::size_t N> void SetBinContent(const std::array<RBinIndex, N> &args, const T &content); | ||
``` | ||
The same works for the variadic function templates, which check the type of the last argument. | ||
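One possible way to check the type of the last argument in C++17 is sketched below. This is an assumption about how such a check could look, not the package's implementation: the `RWeight` struct is a stand-in with a hypothetical member `fValue`, and `EndsWithWeight` and `ExtractWeight` are hypothetical helpers.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for the weight type, for illustration only.
struct RWeight {
   double fValue;
};

// True if the parameter pack is non-empty and its last type is RWeight.
template <typename... A>
constexpr bool EndsWithWeight()
{
   if constexpr (sizeof...(A) == 0) {
      return false;
   } else {
      using Last = std::tuple_element_t<sizeof...(A) - 1, std::tuple<A...>>;
      return std::is_same_v<Last, RWeight>;
   }
}

// Returns the trailing weight if present, otherwise 1 (unweighted filling).
template <typename... A>
double ExtractWeight(const A &...args)
{
   if constexpr (EndsWithWeight<A...>()) {
      return std::get<sizeof...(A) - 1>(std::forward_as_tuple(args...)).fValue;
   } else {
      return 1.0;
   }
}
```

Because the decision is made with `if constexpr`, it happens at compile time and a call without a trailing weight pays no run-time cost for the check.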
|
||
For profiles, we accept the value with a template type as well to allow automatic conversion to `double`, for example from `int`. | ||
|
||
## Miscellaneous | ||
|
||
The implementation uses standard [C++17](https://en.cppreference.com/w/cpp/17.html): | ||
* No backports from later C++ versions, such as `std::span`, and | ||
* No ROOT types, to make sure the histogram package can be compiled standalone. | ||
|
||
Small objects are passed by value instead of by reference (`RBinIndex`, `RWeight`). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# Histogram Terminology | ||
|
||
This document collects, defines, and explains terms that are used in ROOT's histogram package. | ||
The goal is to start from a common understanding, which should avoid ambiguities and ease discussions. | ||
It also helps (future) developers to navigate the code because classes and methods are named accordingly. | ||
The list is ordered alphabetically, though dependent terms are kept together with their parent. | ||
It is supposed to be exhaustive; any missing term should be added when needed. | ||
|
||
An *axis* is a bin configuration in one dimension. | ||
A *regular axis* has equidistant bins in the interval $[a, b)$. | ||
A *variable bin axis* is configured with explicit bin edges $[e_{n}, e_{n+1})$. | ||
A *categorical axis* has a unique label per bin. | ||
*Axes* is the plural of axis and usually means the bin configurations for all dimensions of a histogram. | ||
|
||
A *bin content* is the value of a single bin. | ||
The *bin content type* can be an integer type, a floating-point type, the special `RDoubleBinWithError`, or a user-defined type. | ||
|
||
A *bin error* is the Poisson error of a bin content. | ||
With the special `RDoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$. | ||
Otherwise it is the square root of the bin content, which is only correct with unweighted filling. | ||
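The difference between the two definitions can be sketched as follows. `BinWithError` is a hypothetical stand-in written for this example and does not reproduce the interface of `RDoubleBinWithError`; `PoissonError` is likewise a hypothetical helper.

```cpp
#include <cmath>

// Hypothetical bin content type that tracks the sum of weights squared.
struct BinWithError {
   double fSum = 0;  // sum of weights
   double fSum2 = 0; // sum of weights squared
   void Fill(double w = 1.0)
   {
      fSum += w;
      fSum2 += w * w;
   }
   // Bin error as the square root of the sum of weights squared.
   double Error() const { return std::sqrt(fSum2); }
};

// Fallback for plain bin contents: only correct for unweighted filling.
inline double PoissonError(double content)
{
   return std::sqrt(content);
}
```

For unweighted filling both definitions agree, since every weight is 1 and the sum of weights squared equals the bin content. With weights 3 and 4, the first definition gives $\sqrt{9 + 16} = 5$, while the square root of the content gives $\sqrt{7}$.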
|
||
A *bin index* (plural *indices*) refers to a single bin of a dimension; an array of indices refers to a bin in a histogram. | ||
A *normal bin* is inside an axis and its index starts from 0. | ||
*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value. | ||
|
||
The *invalid bin index* is another special value. | ||
|
||
A *bin index range* is a range from `begin` (inclusive) to `end` (exclusive). | ||
For its purpose, the underflow bin is ordered before all normal bins while the overflow bin is placed after. | ||
As the `end` is exclusive, the invalid bin index is ordered last to make it possible to include the overflow bin. | ||
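The ordering can be made concrete with a small sketch. The `BinIndex` struct, its `Kind` enumeration, and the helpers `OrderKey` and `InRange` are hypothetical and only model the ordering described above; they are not the package's `RBinIndex`.

```cpp
#include <cstddef>

// Hypothetical bin index, for illustration only.
struct BinIndex {
   enum class Kind { kUnderflow, kNormal, kOverflow, kInvalid };
   Kind fKind;
   std::size_t fValue; // only meaningful for kNormal
};

// Position in the ordering: underflow < normal bins < overflow < invalid.
std::size_t OrderKey(BinIndex idx, std::size_t nNormalBins)
{
   switch (idx.fKind) {
   case BinIndex::Kind::kUnderflow: return 0;
   case BinIndex::Kind::kNormal: return idx.fValue + 1;
   case BinIndex::Kind::kOverflow: return nNormalBins + 1;
   case BinIndex::Kind::kInvalid: break;
   }
   return nNormalBins + 2;
}

// True if idx lies in the half-open range [begin, end).
bool InRange(BinIndex idx, BinIndex begin, BinIndex end, std::size_t nNormalBins)
{
   const std::size_t key = OrderKey(idx, nNormalBins);
   return OrderKey(begin, nNormalBins) <= key && key < OrderKey(end, nNormalBins);
}
```

In this model, a range from the underflow bin to the invalid index covers every bin including the overflow bin, whereas a range that ends at the overflow bin excludes it.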
|
||
*Filling* a histogram means adding an entry to it. | ||
*Concurrent filling* allows modifying the same histogram without (external) synchronization. | ||
|
||
A *histogram* is the combination of an axes configuration and storage of bin contents. | ||
For most use cases, it also includes (global) *histogram statistics*. | ||
These include the number of entries, the sum of weights, and the sum of weights squared. | ||
The number of *effective entries* can be computed as the ratio $\frac{(\sum w_i)^2}{\sum w_i^2}$. | ||
Furthermore, for each dimension the histogram statistics include the sum of weights times value and the sum of weights times value squared. | ||
This allows computing the arithmetic mean and the standard deviation of the values before binning. | ||
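As a minimal sketch of how these sums yield the derived quantities, consider the hypothetical struct `Stats1D` for a single dimension. The member names are made up for this example, and the standard deviation uses the weighted population formula $\sqrt{\sum w_i x_i^2 / \sum w_i - \bar{x}^2}$, which is an assumption about the intended definition.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical one-dimensional statistics, for illustration only.
struct Stats1D {
   std::uint64_t fNEntries = 0;
   double fSumW = 0;   // sum of weights
   double fSumW2 = 0;  // sum of weights squared
   double fSumWX = 0;  // sum of weights times value
   double fSumWX2 = 0; // sum of weights times value squared

   void Fill(double x, double w = 1.0)
   {
      fNEntries++;
      fSumW += w;
      fSumW2 += w * w;
      fSumWX += w * x;
      fSumWX2 += w * x * x;
   }

   double EffectiveEntries() const { return fSumW * fSumW / fSumW2; }
   double Mean() const { return fSumWX / fSumW; }
   double StdDev() const
   {
      const double mean = Mean();
      return std::sqrt(fSumWX2 / fSumW - mean * mean);
   }
};
```

Filling the values 1 and 3 without weights gives two effective entries, a mean of 2, and a standard deviation of 1. The sketch does not guard against an empty histogram, where the sums are zero and the ratios are undefined.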
|
||
A *linearized index* runs from 0 up to, but not including, the total number of bins, potentially including flow bins. | ||
For a single axis, it places the flow bins after the normal bins. | ||
The *global index* is a combination of the linearized indices from all axes. | ||
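A sketch of one possible scheme follows. The text only states that flow bins come after the normal bins and that the global index combines the per-axis indices; the order of underflow before overflow and the row-major combination are assumptions of this example, as are the names `AxisSize`, `LinearizeUnderflow`, `LinearizeOverflow`, and `GlobalIndex`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical description of one axis, for illustration only.
struct AxisSize {
   std::size_t fNNormalBins;
   bool fFlowBins;
   std::size_t TotalBins() const { return fNNormalBins + (fFlowBins ? 2 : 0); }
};

// Normal bin i keeps linearized index i; flow bins follow the normal bins.
// Assumption: underflow is placed directly before overflow.
std::size_t LinearizeUnderflow(AxisSize axis) { return axis.fNNormalBins; }
std::size_t LinearizeOverflow(AxisSize axis) { return axis.fNNormalBins + 1; }

// Assumption: linearized indices are combined in row-major order.
template <std::size_t N>
std::size_t GlobalIndex(const std::array<AxisSize, N> &axes, const std::array<std::size_t, N> &linearized)
{
   std::size_t global = 0;
   for (std::size_t i = 0; i < N; i++) {
      global = global * axes[i].TotalBins() + linearized[i];
   }
   return global;
}
```

For a first axis with 10 normal bins plus flow bins (12 bins in total) and a second axis with 5 normal bins and no flow bins, the global index ranges from 0 to 59.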
|
||
A *profile* is a histogram that computes the arithmetic mean and standard deviation per bin. | ||
During filling, it accepts an additional `double` value and accumulates its sum and sum of squares. | ||
|
||
*Slicing* means extracting a subset of the normal bins in each dimension. | ||
Bin contents of excluded normal bins are added to the flow bins. | ||
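For one dimension, slicing could be sketched as below. `Hist1D` and `Slice` are hypothetical, and the sketch assumes that bins below the kept range are added to the underflow bin and bins above it to the overflow bin.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional histogram storage, for illustration only.
struct Hist1D {
   std::vector<double> fNormal; // contents of the normal bins
   double fUnderflow = 0;
   double fOverflow = 0;
};

// Keeps the normal bins [begin, end); assumes begin <= end <= h.fNormal.size().
Hist1D Slice(const Hist1D &h, std::size_t begin, std::size_t end)
{
   Hist1D out;
   out.fUnderflow = h.fUnderflow;
   out.fOverflow = h.fOverflow;
   // Excluded bins below the range are added to the underflow bin.
   for (std::size_t i = 0; i < begin; i++) {
      out.fUnderflow += h.fNormal[i];
   }
   out.fNormal.assign(h.fNormal.begin() + begin, h.fNormal.begin() + end);
   // Excluded bins above the range are added to the overflow bin.
   for (std::size_t i = end; i < h.fNormal.size(); i++) {
      out.fOverflow += h.fNormal[i];
   }
   return out;
}
```

Moving the excluded contents into the flow bins keeps the sum over all bins unchanged by the slice.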
|
||
A *snapshot* is a consistent clone of the histogram during concurrent filling. | ||
|
||
A *weight* is an optional floating-point value passed during filling. | ||
It defaults to $1$ if not specified, which is also called unweighted filling. |