[skip-ci] Create `histv7`, document terminology and design #19213

hahnjo · 2025-06-27T11:05:09Z

No description provided.

Design documents and the implementation will be added over time.

hageboeck

Hello, LGTM!

Of the comments I added, the "flow bins" are my biggest concern.

hageboeck · 2025-06-27T13:12:49Z

hist/histv7/doc/Terminology.md

+
+A *bin index* (plural *indices*) refers to a single bin of a dimension.
+A *normal bin* is inside an axis and its index starts from 0.
+*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value.


Could "flow bins" be an unfortunate abbreviation? I have never heard it outside UHI discussions and/or the ROOT team. I find it imprecise, because integers and floating-point numbers under- or overflow, but they don't "flow" (other than through neural networks and the like). Along the same lines, a value can under- or overflow an axis range, but it doesn't flow around the axis.

Outside of UHI, it is the terminology employed by boost-histogram as well

OK, let me demonstrate my point a bit better 😄
https://www.google.com/search?q=flow-bin

https://www.google.com/search?q=underflow-bin

I see your point: 'flow bins' isn't widely used in the histogramming world yet.
But it appears extensively in boost-histogram documentation (e.g., views) and in UHI documentation (e.g., slicing). I think it's gaining traction, and ROOT histograms could benefit from adopting it.
I personally find 'flow bins' short and clear, but 'under/overflow bins' or 'out-of-range bins' work as well, I guess 🤷🏻‍♀️

I agree with your point, Stephan; I had also not really heard the term before Silia mentioned it. I do think, though, that we can still establish it now, because there is no clear and concise term yet. Thinking about it over the weekend, a potential alternative would be "excess bins", but that has the disadvantage of being completely new...

jblomer

Great! Very good starting point.

jblomer · 2025-06-27T13:28:03Z

hist/histv7/doc/DesignImplementation.md

@@ -0,0 +1,73 @@
+# Design and Implementation


Maybe "Interface Design"? Unless that will get extended by more aspects.

It already has some aspects that go beyond the interface in the "Miscellaneous" section. The way I view it, this document covers the key choices that led to the histogram package. For example, one aspect still missing is the goal of supporting concurrent filling without degrading sequential filling, which we get by storing the bin contents as-is and only using atomic instructions on top. I plan to have the "Code Architecture" (i.e. how it's organized into classes) in a separate document.

What about "Package Design" maybe, in the sense of "software architecture design"?

jblomer · 2025-06-27T13:32:52Z

hist/histv7/doc/Terminology.md

+The *bin content type* can be an integer type, a floating-point type, the special `DoubleBinWithError`, or a user-defined type.
+
+A *bin error* is the Poisson error of a bin content.
+With the special `DoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$


For later: can we prevent weighted filling with the wrong bin content type (`double` instead of `DoubleBinWithError`)?

Yes, in the prototype I have `static_assert`s that fail when trying to use weighted filling on integer bin content types. So far I've opted to allow it for floating-point types because that may be interesting when only looking at the bin content, but not the error. It's possible to reconsider at a later stage.
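To make the behaviour described above concrete, here is a minimal C++17 sketch, not the prototype's code. The helpers `FillUnweighted`, `FillWeighted` and `BinError`, and the members `fSum` and `fSum2`, are hypothetical names chosen for illustration; only `DoubleBinWithError` and the error definitions come from the Terminology diff.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical stand-in for the special bin content type named in the diff.
struct DoubleBinWithError {
   double fSum = 0;  // sum of weights
   double fSum2 = 0; // sum of squared weights
};

// Hypothetical helper: unweighted filling works for every bin content type.
template <typename BinContentType>
void FillUnweighted(BinContentType &bin)
{
   if constexpr (std::is_same_v<BinContentType, DoubleBinWithError>) {
      bin.fSum += 1;
      bin.fSum2 += 1;
   } else {
      bin += 1;
   }
}

// Hypothetical helper: weighted filling is rejected at compile time for
// integer bin content types, but allowed for floating-point types.
template <typename BinContentType>
void FillWeighted(BinContentType &bin, double weight)
{
   static_assert(!std::is_integral_v<BinContentType>,
                 "weighted filling is not supported for integer bin content types");
   if constexpr (std::is_same_v<BinContentType, DoubleBinWithError>) {
      bin.fSum += weight;
      bin.fSum2 += weight * weight;
   } else {
      bin += weight;
   }
}

// Bin error as defined in the diff: sqrt(sum of w_i^2) for DoubleBinWithError,
// otherwise sqrt(bin content), which is only correct for unweighted filling.
template <typename BinContentType>
double BinError(const BinContentType &bin)
{
   if constexpr (std::is_same_v<BinContentType, DoubleBinWithError>) {
      return std::sqrt(bin.fSum2);
   } else {
      assert(bin >= 0 && "square root of a negative bin content is undefined");
      return std::sqrt(static_cast<double>(bin));
   }
}
```

With this sketch, `FillWeighted` on an `int` bin fails to compile, while a `double` bin accepts weights but `BinError` then returns the square root of the content rather than $\sqrt{\sum w_i^2}$.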

jblomer · 2025-06-27T13:34:46Z

hist/histv7/doc/Terminology.md

+With the special `DoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$
+Otherwise it is the square root of the bin content, which is only correct with unweighted filling.
+
+A *bin index* (plural *indices*) refers to a single bin of a dimension.


What does this mean for multi-dimensional histograms?

For a multi-dimensional histogram, you'll need an array of bin indices to refer to a single bin content. It's indeed ambiguous how we define a "bin": is it per dimension, or is it only once you give coordinates for all dimensions, i.e. what really has the bin content in the end?

The ambiguity theoretically also exists for the TH1-style histograms, but in practice, it is always clear from the context whether you are talking about a bin on an axis (1-dim) or a bin in the histogram (possibly an index tuple). I guess one could add a sentence about bins in a histogram and cover all cases.

Yes, that's why there is no definition of a "bin" itself 😇 for "bin content" it is clear that it's the bin of a histogram, and for a single "bin index" it makes life easier if that's on a single axis.

Could you propose concretely what you would like to see added, and where?
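As an illustration of the distinction discussed in this thread, the following C++17 sketch treats a bin index as a per-axis quantity and addresses a bin content of an N-dimensional histogram with an array of such indices. All names (`BinIndex`, `kUnderflow`, `kOverflow`, `NormalBin`, `BinIndices`, `AllNormal`) and the choice of special values are assumptions made for this example, not the histv7 interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical per-axis bin index. Normal bins start from 0; the two flow
// bins use special values (the concrete values here are an assumption).
struct BinIndex {
   static constexpr std::size_t kUnderflow = std::numeric_limits<std::size_t>::max() - 1;
   static constexpr std::size_t kOverflow = std::numeric_limits<std::size_t>::max();

   std::size_t fValue = 0;

   constexpr bool IsNormal() const { return fValue < kUnderflow; }
   constexpr bool IsUnderflow() const { return fValue == kUnderflow; }
   constexpr bool IsOverflow() const { return fValue == kOverflow; }
};

// Hypothetical helper to create the index of a normal bin.
inline BinIndex NormalBin(std::size_t i)
{
   assert(i < BinIndex::kUnderflow && "value is reserved for flow bins");
   return BinIndex{i};
}

// One bin index per dimension refers to a single bin content.
template <std::size_t N>
using BinIndices = std::array<BinIndex, N>;

// True if the addressed bin content lies inside all axes.
template <std::size_t N>
bool AllNormal(const BinIndices<N> &indices)
{
   for (const BinIndex &index : indices) {
      if (!index.IsNormal())
         return false;
   }
   return true;
}
```

In this sketch, `BinIndices<2>{NormalBin(3), BinIndex{BinIndex::kOverflow}}` addresses the bin content at normal bin 3 of the first axis and the overflow bin of the second axis.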

jblomer · 2025-06-27T13:38:25Z

hist/histv7/doc/Terminology.md

+
+A *linear index* starts from 0 up to the total number of bins, potentially including flow bins.
+For a single axis, it places the flow bins after the normal bins.
+The *global index* is a combination of the linear indices from all axes.


Perhaps give more details here on the flattening.

I'm not sure, in the end I would consider this an implementation detail. I plan to have API functions to compute the global index, but in principle I don't want users to do this themselves...
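For readers who want a picture of the flattening anyway, the sketch below shows one possible scheme; it is an illustration under stated assumptions, not the planned API. The diff only states that a linear index places the flow bins after the normal bins and that the global index combines the linear indices of all axes. The order of the two flow bins, the row-major combination, and the names `Axis`, `BinKind`, `LinearIndex` and `GlobalIndex` are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class BinKind { Normal, Underflow, Overflow };

// Hypothetical axis description: number of normal bins and whether the
// axis has flow bins.
struct Axis {
   std::size_t fNormalBins;
   bool fFlowBins;

   std::size_t TotalBins() const { return fNormalBins + (fFlowBins ? 2 : 0); }

   // Linear index on this axis: normal bins first, flow bins afterwards.
   // Placing underflow before overflow is an assumption.
   std::size_t LinearIndex(BinKind kind, std::size_t i = 0) const
   {
      switch (kind) {
      case BinKind::Normal: assert(i < fNormalBins); return i;
      case BinKind::Underflow: assert(fFlowBins); return fNormalBins;
      case BinKind::Overflow: assert(fFlowBins); return fNormalBins + 1;
      }
      return 0; // not reached for valid BinKind values
   }
};

// Global index as a row-major combination of the per-axis linear indices
// (the last axis varies fastest). The layout is an assumption.
template <std::size_t N>
std::size_t GlobalIndex(const std::array<Axis, N> &axes, const std::array<std::size_t, N> &linear)
{
   std::size_t global = 0;
   for (std::size_t d = 0; d < N; ++d) {
      assert(linear[d] < axes[d].TotalBins());
      global = global * axes[d].TotalBins() + linear[d];
   }
   return global;
}
```

For example, with a first axis of 10 normal bins plus flow bins (12 in total) and a second axis of 5 normal bins without flow bins, the underflow bin of the first axis has linear index 10, and combining it with normal bin 3 of the second axis gives the global index 10 * 5 + 3 = 53.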

siliataider

I think this is a great start to unify the terminology, thanks!

siliataider · 2025-06-27T14:00:58Z

hist/histv7/doc/Terminology.md

+
+A *bin index* (plural *indices*) refers to a single bin of a dimension.
+A *normal bin* is inside an axis and its index starts from 0.
+*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value.


Outside of UHI, it is the terminology employed by boost-histogram as well

hahnjo added 2 commits June 27, 2025 12:40

[hist] Remove outdated comments in CMakeLists.txt

fd82076

[hist] Create directory for new histograms

d572af8

Design documents and the implementation will be added over time.

hahnjo requested review from jblomer, hageboeck and siliataider June 27, 2025 11:05

hahnjo self-assigned this Jun 27, 2025

hahnjo requested a review from bellenot as a code owner June 27, 2025 11:05

hahnjo added the in:Hist label Jun 27, 2025

hahnjo requested review from lmoneta and dpiparo as code owners June 27, 2025 11:05

hageboeck reviewed Jun 27, 2025

jblomer approved these changes Jun 27, 2025

siliataider approved these changes Jun 27, 2025

hahnjo added 2 commits June 30, 2025 08:40

[hist] Document histogram terminology

dc089c5

[hist] Document design decisions and implementation choices

ccfa9cb

hahnjo force-pushed the hist-doc branch from 09818fa to ccfa9cb Compare June 30, 2025 06:43
