Create Monte Carlo Workspace Algorithm #38341
base: release-next
Conversation
Framework/Algorithms/inc/MantidAlgorithms/CreateMonteCarloWorkspace.h
Sorry for sticking my oar in here - I found this very interesting (I do a similar thing to generate random data in 3D Q for single-crystal peak integration testing).
I'm aware it may be a work in progress/unfinished, so please forgive me, but I had a couple of questions:
(1) What is the advantage of this over using numpy random number generation like so (assuming the bin width is small enough that you don't need to worry about variation in the underlying pdf over the width of a bin)?
# imports assuming the usual Mantid python API locations
import numpy as np
from mantid.api import FunctionFactory
from mantid.simpleapi import CreateWorkspace, FlatBackground, FunctionWrapper

params = np.array([2.5e4, 0.06, 0.015, 30000, 30, 50])
peak_func = FunctionFactory.Instance().createPeakFunction("BackToBackExponential")
[peak_func.setParameter(ipar, params[ipar]) for ipar in range(peak_func.nParams())]
comp_func = FunctionWrapper(peak_func) + FlatBackground(A0=params[-1])
# simulate data
np.random.seed(1)
x = np.linspace(params[3] - 350, params[3] + 500, 61)
y = np.random.poisson(comp_func(x))
e = np.sqrt(y)
ws = CreateWorkspace(DataX=x, DataY=y, DataE=e)
(2) This seems to only support single histogram workspaces, but it doesn't check the number of histograms in the input workspace (unless I missed it?) - would it just make sense to generalise it and add a (parallelised) loop over histograms?
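A rough sketch of the kind of loop I mean is below (the helper name computeNormalizedCDF and the exact fillHistogramWithRandomData signature are assumptions on my part; a real version would use Mantid's usual threading macros to parallelise the outer loop):
// Hypothetical generalisation: process every spectrum of the input workspace
// rather than assuming a single histogram.
const auto numHistograms = static_cast<int>(instWs->getNumberHistograms());
for (int i = 0; i < numHistograms; ++i) {
  const auto &yData = instWs->y(i);                             // counts in this spectrum
  const std::vector<double> cdf = computeNormalizedCDF(yData);  // assumed helper
  outputWs->mutableY(i) = fillHistogramWithRandomData(cdf, numIterations, seed, progress);
}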
int CreateMonteCarloWorkspace::computeNumberOfIterations(const Mantid::HistogramData::HistogramY &yData) {
  double total_counts = std::accumulate(yData.begin(), yData.end(), 0.0);
  return static_cast<int>(std::round(total_counts));
What if this rounded to 0? I suppose you'd get 0s in the data, which might be confusing? Perhaps at least throw a warning?
Would it make more sense to add the number of MC events as an input parameter? If, for example, you wanted to see how counting times/stats affect optimiser performance, you would currently need to scale the input workspace, which seems a bit clunky.
I think having the parameter to specify number of MC events is good, and the default could be the integral of the input.
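A minimal sketch of the property I have in mind (the name, validator and wording here are only illustrative, not necessarily what ends up in the PR):
// In init(): optional number of Monte Carlo events; 0 means "use the integral of the input".
auto mustBeNonNegative = std::make_shared<Kernel::BoundedValidator<int>>();
mustBeNonNegative->setLower(0);
declareProperty("MonteCarloEvents", 0, mustBeNonNegative,
                "Number of Monte Carlo events to simulate. If left at 0, the integral of the input workspace is used.");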
I committed the suggestion above, please have a look and let me know. The scaling function is not perfect: if the number of MC events is too small, the new workspace appears to be a flat line when overplotted onto the original workspace. I am open to any suggestions on ways to improve it.
MatrixWorkspace_sptr outputWs = WorkspaceFactory::Instance().create(instWs);
progress.report();
std::this_thread::sleep_for(std::chrono::milliseconds(100));
Maybe a stupid question, sorry, but why do you need these sleep statements?
Not a stupid question. I found it was the only way to get the progress bar to actually work: if I remove the sleep statements it stops updating. I believe the execution completes so quickly that the progress bar doesn't have enough time to refresh or visually update its state.
If you happen to know a better way to implement it, please feel free to let me know.
I think you might have the progress bar usage a little wrong (although I'm no expert).
If your algorithm, for example, iterated over the spectra of an input workspace and performed some kind of operation on each, you would have the progress bar update after each spectrum is processed. That's the idea anyway: it shows the progress of the main bulk of the computation. For you this would be in fillHistogramWithRandomData.
I think you can also set it to display text (e.g. "writing output"). Maybe have a look at some other algorithms and see how it's used there.
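Something like the following is the pattern I mean (a rough sketch; the step count and message are placeholders, not a prescription):
// Construct Progress with the number of work units, then call report() as each
// unit completes; no sleeps are needed because the bar tracks real work.
API::Progress progress(this, 0.0, 1.0, numIterations);
for (int i = 0; i < numIterations; ++i) {
  // ... draw one random number and increment the bin it falls in ...
  progress.report("Filling histogram with random data");
}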
I updated the code for the progress bar; although I personally prefer the previous look, the new version does not need the sleep statements.
Thank you for taking the time to review my algorithm. To answer your first question, based on my understanding: the main advantage of my approach is that it doesn't assume any specific underlying functional form for the data. By constructing a cumulative distribution function directly from the input data and sampling from it, we capture the actual empirical distribution, including any irregularities or complexities that may not be well represented by a predefined function. This is intended for data where the underlying distribution is unknown, complex, or doesn't fit standard models. Please correct the above if I am wrong; to be honest I hadn't considered using numpy random number generation. Let me know if you still think it would be a better alternative.
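In simplified form, the idea is the following (a sketch of the approach only, not the algorithm's exact code):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Build a normalised cumulative distribution from the input bin counts.
std::vector<double> computeCdf(const std::vector<double> &yData) {
  std::vector<double> cdf(yData.size());
  std::partial_sum(yData.begin(), yData.end(), cdf.begin());
  const double total = cdf.back();
  for (double &value : cdf)
    value /= total; // last entry becomes 1.0
  return cdf;
}

// Draw numIterations uniform deviates and add a count to the bin each one falls in.
std::vector<double> sampleFromCdf(const std::vector<double> &cdf, int numIterations, int seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  std::vector<double> counts(cdf.size(), 0.0);
  for (int i = 0; i < numIterations; ++i) {
    const auto it = std::lower_bound(cdf.begin(), cdf.end(), uniform(gen));
    counts[static_cast<size_t>(std::distance(cdf.begin(), it))] += 1.0;
  }
  return counts;
}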
...source/release/v6.12.0/Inelastic/Algorithms/New_features/38196_CreateMonteCarloWorkspace.rst
// The total counts should be less than numIterations because some random numbers are not counted
auto sumCounts = std::accumulate(outputY.begin(), outputY.end(), 0.0);
TS_ASSERT_LESS_THAN(sumCounts, numIterations);
Do you know that this will always happen? Is there a chance these two values could be equal?
I think these are equal by design in this algorithm. This is in contrast to the method of randomly generating data I posted, where the total counts would itself be Poisson distributed (I believe; I should check...). I've asked @Despiix to check that the distribution of counts in each bin, over multiple simulations, is Poisson distributed (or close enough).
I think having the parameter to specify number of MC events is good, and the default could be the integral of the input.
@thomashampson - what about making the number of MC events equal to a random number drawn from a Poisson distribution with mean equal to the integral of the input? This might help the Poisson-ness of the stats in each bin (if that makes sense...).
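i.e. something along these lines (sketch only; totalCounts and seed are assumed to be already available):
#include <random>

// Draw the total number of MC events from a Poisson distribution whose mean is
// the integral of the input, so the grand total fluctuates the way real counts would.
std::mt19937 gen(seed);
std::poisson_distribution<long long> poissonTotal(totalCounts);
const long long numIterations = poissonTotal(gen);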
//----------------------------------------------------------------------------------------------
Mantid::HistogramData::HistogramY CreateMonteCarloWorkspace::fillHistogramWithRandomData(const std::vector<double> &cdf,
I'm wondering whether one could optionally output an event workspace (which is essentially what you are simulating), or produce an event workspace and then rebin to match the input workspace?
Does this need to move to v6.13?
No, I am waiting for someone to review it. All the additional features will be added through a separate PR.
Thanks for addressing some of the points, in particular setting the errors.
This works well and the tests are good (checking reproducibility with the seed, respects number of MC events etc.).
I checked with the test workspace from the code I posted earlier, and I also checked that the variance of the simulated counts agrees with the average over ~5000 simulations, so I think it does seem reasonable to set e = sqrt(y), at least for large counts. I haven't tested the validity of the Poisson distribution etc.
There are some small changes requested: in particular, there is perhaps some unnecessary normalisation done if the Monte Carlo events are user-specified.
I think there are some additional features I would like (support for workspaces with more than 1 spectrum, perhaps event workspaces as well) but these can be addressed in a separate PR as and when it becomes useful.
@@ -0,0 +1,44 @@
// Mantid Repository : https://github.com/mantidproject/mantid
//
// Copyright © 2024 ISIS Rutherford Appleton Laboratory UKRI,
Probably 2025 now!
src/PolarizationCorrections/PolarizationEfficienciesWildes.cpp
src/PolarizationCorrections/PolarizerEfficiency.cpp
These changes look unrelated to the PR, do you need to rebase?
}

Mantid::HistogramData::HistogramY
CreateMonteCarloWorkspace::scaleInputToMatchMCEvents(const Mantid::HistogramData::HistogramY &yData,
Seems to me like the CDF gets normalized (such that the summation equals 1) anyway, so is this necessary? Is it not possible to just set the total MC events equal to the sum of counts in the workspace, or to the number passed by the user? It seems wasteful to scale the data and then sum it up again later to get the total number of MC events in computeNumberOfIterations.
You’re right that once you’ve normalized to get the CDF, you only deal with ratios. The separate scaling step is mostly to ensure the raw Y data sums to the desired total of MC events before the CDF is computed; computeNumberOfIterations then just picks up that (optionally scaled) total.
Practically, you could skip calling scaleInputToMatchMCEvents if you’ve already decided your total MC events. In that case, you’d just ensure your iteration count equals the sum of Y (or the user value), normalize to get the CDF, and sample from it. The scaling still needs to be used when no MC events are entered, because there are times when the output looks odd without it; I am not exactly sure why.
TS_ASSERT_DELTA(totalScaledCounts, 20.0, 1e-6); // Verify the scaled sum matches targetMCEvents
}

MatrixWorkspace_sptr createInputWorkspace(int numBins, double initialValue) {
Looks like there are a few helper functions here. These are good, but perhaps separate them from the tests at the bottom of the file? It makes the file easier to read (perhaps just personal preference).
fig.show()

.. image:: ../../../images/New.png
Would it be possible to rename this file to something more self-explanatory, e.g. CreateMonteCarloWorkspace_spectrum.png?
Thanks for the changes, in particular renaming the .png and removing the unnecessary additional normalisation when the user supplies the number of events.
I know this PR has been going a while; I'm happy to approve now as I think this algorithm works well and is important for continuing the fitbenchmarking work. There are a few nit-picking comments, but as there are a few other features I'd like added separately after this release, I think these can be addressed later! What do you think @jclarkeSTFC, @thomashampson, @jhaigh0 and @sf1919 ?
 * Determine how many iterations to use for MC sampling.
 * If userMCEvents > 0, use that directly; otherwise use the integral of the input data.
 */
int CreateMonteCarloWorkspace::computeNumberOfIterations(const Mantid::HistogramData::HistogramY &yData,
Stylistically I would prefer to have this if (userMCEvents > 0) in the exec, and then this method would be called integrateYData or similar. But I'm happy to merge as is!
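Roughly what I mean (hypothetical names; integrateYData stands in for the renamed helper and the property name is assumed):
// In exec(): decide the iteration count up front, keep the helper purely numeric.
const int userMCEvents = getProperty("MonteCarloEvents");
const int numIterations = userMCEvents > 0
                              ? userMCEvents
                              : static_cast<int>(std::round(integrateYData(inputWs->y(0))));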
Oh, just noticed the copyright is still 2024 - is that correct or should it be 2025? Again, very minor...
The base branch was changed.
I think we would benefit from a better image for the documentation to showcase the algorithm.
Description of work
Summary of work
To benchmark the Mantid fitting engine, we need a method to generate randomly distributed data based on specific fit functions or workspaces. This will allow us to compare the generated data with known true values, providing insights into the performance of the fitting engine. I implemented a Mantid algorithm that uses a Cumulative Distribution Function (CDF) approach.
Fixes #38196.
Report to: @thomashampson
Further detail of work
**Expected Outcome:** This algorithm allows us to create random distributions based on known values, making it easier to assess the fitting engine’s accuracy and performance across different scenarios.
To test:
Reviewer
Please comment on the points listed below (full description).
Your comments will be used as part of the gatekeeper process, so please comment clearly on what you have checked during your review. If changes are made to the PR during the review process then your final comment will be the most important for gatekeepers. In this comment you should make it clear why any earlier review is still valid, or confirm that all requested changes have been addressed.
Code Review
Functional Tests
Does everything look good? Mark the review as Approve. A member of @mantidproject/gatekeepers will take care of it.
Gatekeeper
If you need to request changes to a PR then please add a comment and set the review status to "Request changes". This will stop the PR from showing up in the list for other gatekeepers.