root-project · hahnjo · Jul 2, 2025 · Jun 27, 2025 · Jun 27, 2025 · Jun 27, 2025
@@ -18,6 +18,7 @@
 /graf3d/ @couet
 /gui/ @bellenot
 /hist/ @lmoneta
+/hist/histv7/ @hahnjo
 /html/ @dpiparo
 /icons/ @bellenot
 /interpreter/ @dpiparo

@@ -4,11 +4,14 @@
 # For the licensing terms see $ROOTSYS/LICENSE.
 # For the list of contributors see $ROOTSYS/README/CREDITS.
 
-add_subdirectory(hist)             # special CMakeLists.txt
-add_subdirectory(histpainter)      # special CMakeLists.txt
+add_subdirectory(hist)
+add_subdirectory(histpainter)
+if(root7)
+  add_subdirectory(histv7)
+endif()
 if (spectrum)
    add_subdirectory(spectrum)
-   add_subdirectory(spectrumpainter)  # special CMakeLists.txt
+   add_subdirectory(spectrumpainter)
 endif()
 if(unfold)
   add_subdirectory(unfold)

@@ -0,0 +1,73 @@
+# Design and Implementation
+
+This document describes key design decisions and implementation choices.
+
+## Templating
+
+Classes are only templated if required for data members, in particular the bin content type `T`.
+We use member function templates to accept a variable number of arguments (see also below).
+Classes are **not** templated to improve performance, in particular not on the axis type(s).
+This avoids an explosion of types and simplifies serialization.
+Instead, axis objects are run-time choices and stored in a `std::variant`.
+With a careful design, this still results in excellent performance.
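A minimal sketch of the run-time axis choice described above. The names `RegularAxisSketch`, `VariableAxisSketch`, `AxisVariant`, and the free function `FindBin` are hypothetical and only illustrate the technique; they are not the classes of the histogram package.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical, simplified axis types (assumptions for illustration only).
struct RegularAxisSketch {
   std::size_t fNBins;
   double fLow;
   double fHigh;
   // Returns the normal bin index, or fNBins if x is outside [fLow, fHigh).
   std::size_t FindBin(double x) const {
      if (!(x >= fLow && x < fHigh))
         return fNBins;
      const auto bin = static_cast<std::size_t>((x - fLow) / (fHigh - fLow) * fNBins);
      return bin < fNBins ? bin : fNBins - 1; // guard against rounding at the upper edge
   }
};

struct VariableAxisSketch {
   std::vector<double> fEdges; // sorted bin edges, size = number of bins + 1
   // Returns the normal bin index, or the number of bins if x is outside.
   std::size_t FindBin(double x) const {
      const std::size_t nBins = fEdges.size() - 1;
      for (std::size_t i = 0; i < nBins; i++) {
         if (x >= fEdges[i] && x < fEdges[i + 1])
            return i;
      }
      return nBins;
   }
};

// The axis type is a run-time choice; code using it is not templated on it.
using AxisVariant = std::variant<RegularAxisSketch, VariableAxisSketch>;

std::size_t FindBin(const AxisVariant &axis, double x) {
   return std::visit([x](const auto &a) { return a.FindBin(x); }, axis);
}
```

A container such as `std::vector<AxisVariant>` can then hold any mix of axis types without changing the type of the enclosing class.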
+
+## Performance Optimizations
+
+If required, it would be possible to template performance-critical functions on the axis types.
+This was shown to be beneficial in microbenchmarks for one-dimensional histograms.
+However, it will not be implemented until shown useful in a real-world application.
+In virtually all cases, the cost of filling a (one-dimensional) histogram is negligible compared to reading, decompressing, and processing the data.
+
+The same applies to other optimizations, such as caching the pointer to the axis object stored in the `std::variant`.
+Such optimizations should only be implemented when clearly motivated by real-world applications.
+
+## Functions with Variable Number of Arguments
+
+Many member functions have two overloads: one accepting a function parameter pack and one accepting a `std::tuple` or `std::array`.
+
+### Arguments with Different Types
+
+Functions that take arguments with different types expect a `std::tuple`.
+An example is `template <typename... A> void Fill(const std::tuple<A...> &args)`.
+
+For user convenience, a variadic function template forwards to the `std::tuple` overload:
+```cpp
+template <typename... A> void Fill(const A &...args) {
+   Fill(std::forward_as_tuple(args...));
+}
+```
+This forwards the arguments as references, so no potentially expensive copy constructors are called.
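The absence of copies can be checked with a small experiment. In this sketch, `CopyCounter` is a hypothetical helper type, and both `Fill` overloads are stand-ins that only return the number of arguments.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical type that counts how often it is copied.
struct CopyCounter {
   static inline int sCopies = 0;
   CopyCounter() = default;
   CopyCounter(const CopyCounter &) { sCopies++; }
};

// Stand-in for the std::tuple overload; returns the number of arguments.
template <typename... A>
std::size_t Fill(const std::tuple<A...> &) {
   return sizeof...(A);
}

// Variadic convenience overload: builds a tuple of references, not of values.
template <typename... A>
std::size_t Fill(const A &...args) {
   return Fill(std::forward_as_tuple(args...));
}
```

After `CopyCounter c; Fill(c, 1, 2.5);` the counter `CopyCounter::sCopies` is still zero. By contrast, `std::make_tuple` would store copies of its arguments.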
+
+### Arguments with Same Type
+
+In this case, the function has a `std::size_t N` template argument and accepts a `std::array`.
+An example is `template <std::size_t N> const T &GetBinContent(const std::array<RBinIndex, N> &args)`.
+
+For user convenience, a variadic function template forwards to the `std::array` overload:
+```cpp
+template <typename... A> const T &GetBinContent(const A &...args) {
+   std::array<RBinIndex, sizeof...(A)> a{args...};
+   return GetBinContent(a);
+}
+```
+This will copy the arguments, which is fine in this case because `RBinIndex` is small (see below).
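A self-contained sketch of the two overloads working together. `BinIndexSketch` is a hypothetical stand-in for `RBinIndex`, and `Hist2DSketch` with its fixed 3x2 storage is an assumption made only to keep the example concrete.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for RBinIndex: a small value type wrapping an index.
struct BinIndexSketch {
   std::size_t fIndex;
   BinIndexSketch(std::size_t index) : fIndex(index) {} // implicit on purpose
};

// Minimal two-dimensional storage (assumption for illustration only).
class Hist2DSketch {
   static constexpr std::size_t kNX = 3;
   static constexpr std::size_t kNY = 2;
   std::array<double, kNX * kNY> fContents{};

public:
   void SetBinContent(BinIndexSketch x, BinIndexSketch y, double content) {
      fContents[x.fIndex * kNY + y.fIndex] = content;
   }

   // Overload taking all indices as one std::array.
   template <std::size_t N>
   const double &GetBinContent(const std::array<BinIndexSketch, N> &indices) const {
      static_assert(N == 2, "this sketch has exactly two dimensions");
      return fContents[indices[0].fIndex * kNY + indices[1].fIndex];
   }

   // Variadic convenience overload: copies the (small) indices into an array.
   template <typename... A>
   const double &GetBinContent(const A &...args) const {
      std::array<BinIndexSketch, sizeof...(A)> indices{args...};
      return GetBinContent(indices);
   }
};
```

With this, `h.GetBinContent(2, 1)` and a call passing a `std::array<BinIndexSketch, 2>` return a reference to the same bin.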
+
+### Special Arguments
+
+Special arguments are passed last.
+Examples include
+```cpp
+template <typename... A> void Fill(const std::tuple<A...> &args, RWeight w);
+template <std::size_t N> void SetBinContent(const std::array<RBinIndex, N> &args, const T &content);
+```
+The same works for the variadic function templates that will check the type of the last argument.
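One possible way for a variadic template to check the type of its last argument is sketched below. `WeightSketch`, `FillRecord`, `LastType`, and `FirstElements` are hypothetical names, and the actual implementation may detect the weight differently.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

struct WeightSketch { double fValue; }; // hypothetical stand-in for RWeight

// What the sketch records per call: number of coordinates and weight used.
struct FillRecord { std::size_t fNCoordinates; double fWeight; };

// Type of the last element of a non-empty parameter pack.
template <typename... A>
using LastType = std::tuple_element_t<sizeof...(A) - 1, std::tuple<A...>>;

template <typename... A>
FillRecord Fill(const std::tuple<A...> &) {
   return {sizeof...(A), 1.0}; // unweighted
}

template <typename... A>
FillRecord Fill(const std::tuple<A...> &, WeightSketch w) {
   return {sizeof...(A), w.fValue};
}

// Builds a tuple of references to the elements of t selected by I...
template <typename Tuple, std::size_t... I>
auto FirstElements(const Tuple &t, std::index_sequence<I...>) {
   return std::forward_as_tuple(std::get<I>(t)...);
}

template <typename... A>
FillRecord Fill(const A &...args) {
   static_assert(sizeof...(A) >= 1);
   auto all = std::forward_as_tuple(args...);
   if constexpr (std::is_same_v<LastType<A...>, WeightSketch>) {
      // Last argument is the weight: split it off and pass it separately.
      constexpr std::size_t N = sizeof...(A) - 1;
      return Fill(FirstElements(all, std::make_index_sequence<N>{}), std::get<N>(all));
   } else {
      return Fill(all);
   }
}
```

In this sketch, `Fill(1.5, 2)` records two coordinates with weight 1, while `Fill(1.5, 2, WeightSketch{0.5})` records two coordinates with weight 0.5.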
+
+For profiles, we accept the value with a template type as well to allow automatic conversion to `double`, for example from `int`.
+
+## Miscellaneous
+
+The implementation uses standard [C++17](https://en.cppreference.com/w/cpp/17.html):
+ * No backports from later C++ versions, such as `std::span`, and
+ * No ROOT types, to make sure the histogram package can be compiled standalone.
+
+Small objects are passed by value instead of by reference (`RBinIndex`, `RWeight`).
@@ -0,0 +1,54 @@
+# Histogram Terminology
+
+This document collects, defines, and explains terms that are used in ROOT's histogram package.
+The goal is to start from a common understanding, which should avoid ambiguities and ease discussions.
+It also helps (future) developers to navigate the code because classes and methods are named accordingly.
+The list is ordered alphabetically, though dependent terms are kept together with their parent.
+It is supposed to be exhaustive; any missing term should be added when needed.
+
+An *axis* is a bin configuration in one dimension.
+A *regular axis* has equidistant bins in the interval $[a, b)$.
+A *variable bin axis* is configured with explicit bin edges $[e_{n}, e_{n+1})$.
+A *categorical axis* has a unique label per bin.
+*Axes* is the plural of axis and usually means the bin configurations for all dimensions of a histogram.
+
+A *bin content* is the value of a single bin.
+The *bin content type* can be an integer type, a floating-point type, the special `RDoubleBinWithError`, or a user-defined type.
+
+A *bin error* is the Poisson error of a bin content.
+With the special `RDoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$.
+Otherwise it is the square root of the bin content, which is only correct with unweighted filling.
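The difference between the two error definitions can be illustrated with a small accumulator. `BinWithErrorSketch` and `PoissonError` are hypothetical names and do not reproduce the interface of `RDoubleBinWithError`.

```cpp
#include <cmath>

// Hypothetical accumulator that tracks the sum of weights and of weights squared.
struct BinWithErrorSketch {
   double fSum = 0;  // sum of weights (the bin content)
   double fSum2 = 0; // sum of weights squared
   void Add(double w = 1.0) {
      fSum += w;
      fSum2 += w * w;
   }
   double Error() const { return std::sqrt(fSum2); }
};

// Error derived from a plain bin content; only correct for unweighted filling.
inline double PoissonError(double content) {
   return std::sqrt(content);
}
```

For four unweighted entries both definitions give 2. For two entries with weights 3 and 4 the content is 7, so `PoissonError` gives about 2.65 whereas the sum of weights squared gives 5.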
+
+A *bin index* (plural *indices*) refers to a single bin of a dimension; an array of indices refers to a bin in a histogram.
+A *normal bin* is inside an axis and its index starts from 0.
+*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value.
+The *invalid bin index* is another special value.
+
+A *bin index range* is a range from `begin` (inclusive) to `end` (exclusive).
+For its purpose, the underflow bin is ordered before all normal bins while the overflow bin is placed after.
+As the `end` is exclusive, the invalid bin index is ordered last to make it possible to include the overflow bin.
+
+*Filling* a histogram means to add an entry to a histogram.
+*Concurrent filling* allows modifying the same histogram without (external) synchronization.
+
+A *histogram* is the combination of an axes configuration and storage of bin contents.
+For most use cases, it also includes (global) *histogram statistics*.
+These include the number of entries, the sum of weights, and the sum of weights squared.
+The number of *effective entries* can be computed as the ratio $$\frac{(\sum w_i)^2}{\sum w_i^2}$$.
+Furthermore, for each dimension the histogram statistics include the sum of weights times value and the sum of weights times value squared.
+This allows computing the arithmetic mean and the standard deviation of the values before binning.
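As a sketch for one dimension, the sums listed above are sufficient to derive these quantities. `StatsSketch` is a hypothetical name, and the standard deviation is assumed here to be the weighted population form $\sqrt{\sum w_i x_i^2 / \sum w_i - \bar{x}^2}$.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical one-dimensional statistics accumulator.
struct StatsSketch {
   std::size_t fNEntries = 0;
   double fSumW = 0;   // sum of weights
   double fSumW2 = 0;  // sum of weights squared
   double fSumWX = 0;  // sum of weights times value
   double fSumWX2 = 0; // sum of weights times value squared

   void Fill(double x, double w = 1.0) {
      fNEntries++;
      fSumW += w;
      fSumW2 += w * w;
      fSumWX += w * x;
      fSumWX2 += w * x * x;
   }

   double EffectiveEntries() const { return fSumW * fSumW / fSumW2; }
   double Mean() const { return fSumWX / fSumW; }
   // Assumption: weighted population standard deviation.
   double StdDev() const {
      const double mean = Mean();
      return std::sqrt(fSumWX2 / fSumW - mean * mean);
   }
};
```

For unweighted filling, the number of effective entries equals the number of entries.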
+
+A *linearized index* starts from 0 up to the total number of bins, potentially including flow bins.
+For a single axis, it places the flow bins after the normal bins.
+The *global index* is a combination of the linearized indices from all axes.
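A sketch of one possible mapping. The placement of the underflow bin before the overflow bin and the row-major combination into the global index are assumptions; the text above only states that flow bins follow the normal bins. `AxisSizeSketch` and `GlobalIndex` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical description of an axis: number of normal bins and flow bins on/off.
struct AxisSizeSketch {
   std::size_t fNNormalBins;
   bool fFlowBins;

   std::size_t TotalBins() const { return fNNormalBins + (fFlowBins ? 2 : 0); }
   // Normal bins keep their index; flow bins are placed after them.
   std::size_t Linearize(std::size_t normalBin) const { return normalBin; }
   std::size_t LinearizeUnderflow() const { return fNNormalBins; }    // assumed order
   std::size_t LinearizeOverflow() const { return fNNormalBins + 1; } // assumed order
};

// Assumption: row-major combination of the per-axis linearized indices.
template <std::size_t N>
std::size_t GlobalIndex(const std::array<AxisSizeSketch, N> &axes,
                        const std::array<std::size_t, N> &linearized) {
   std::size_t global = 0;
   for (std::size_t i = 0; i < N; i++) {
      global = global * axes[i].TotalBins() + linearized[i];
   }
   return global;
}
```

With a first axis of 3 normal bins plus flow bins and a second axis of 2 normal bins without flow bins, the global index in this sketch ranges over 10 values.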
+
+A *profile* is a histogram that computes the arithmetic mean and standard deviation per bin.
+During filling, it accepts an additional `double` value and accumulates its sum and sum of squares.
+
+*Slicing* means to extract a subset of the normal bins in each dimension.
+Bin contents of excluded normal bins are added to the flow bins.
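A one-dimensional sketch of slicing. It assumes that excluded bins below the kept range are added to the underflow bin and those above to the overflow bin; `Hist1DSketch` and `Slice` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional histogram contents.
struct Hist1DSketch {
   std::vector<double> fNormal; // contents of the normal bins
   double fUnderflow = 0;
   double fOverflow = 0;
};

// Keeps the normal bins [begin, end); requires begin <= end <= h.fNormal.size().
Hist1DSketch Slice(const Hist1DSketch &h, std::size_t begin, std::size_t end) {
   Hist1DSketch result;
   result.fUnderflow = h.fUnderflow;
   result.fOverflow = h.fOverflow;
   for (std::size_t i = 0; i < h.fNormal.size(); i++) {
      if (i < begin) {
         result.fUnderflow += h.fNormal[i]; // excluded below the range (assumption)
      } else if (i < end) {
         result.fNormal.push_back(h.fNormal[i]);
      } else {
         result.fOverflow += h.fNormal[i]; // excluded above the range (assumption)
      }
   }
   return result;
}
```

The sum over all bins, including flow bins, is the same before and after slicing.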
+
+A *snapshot* is a consistent clone of the histogram during concurrent filling.
+
+A *weight* is an optional floating-point value passed during filling.
+It defaults to $1$ if not specified, which is also called unweighted filling.