
Insane memory consumption with single crystal powder model simulation #193

Open
artsiommiksiuk opened this issue Aug 8, 2024 · 15 comments


@artsiommiksiuk

I'm running a lot of simulations and have serious issues with both simulation time and memory. I can address the time issue (wrap everything in joblib and run simulations in parallel), but this library uses an insane amount of memory even for single-crystal simulations.

import numpy as np
import xrayutilities as xu
# Required to make the simulation faster (it stops overusing multiprocessing and avoids long waits for process joins)
xu.config.NTHREADS = 1

# from https://www.crystallography.net/cod/1000021.html
cryst = xu.materials.Crystal.fromCIF("1000021.cif")

powder = xu.simpack.Powder(cryst, 1, crystallite_size_gauss=1e-7)
pm = xu.simpack.PowderModel(
    powder, 
    wl=0.1647,
    tt_cutoff=13.5,
    fpsettings={ 
        "axial": { 
            "AxDiv": None,
            "slit_length_source": 0.000001, 
            "slit_length_target": 0.0000001, 
            "length_sample": 0.0000001,
            "angD_deg": 0.000001
        }, 
        "global": {
            "equatorial_divergence_deg": 0,
            "diffractometer_radius": 1500
        },
        "emission": {
            "emiss_gauss_widths": (1.06886840e+00*3e-14),
            "emiss_lor_widths": (7.13941570e-01*0.5e-14)
        },
    }
)

try:
    x = np.linspace(0, 13.5, 10000)
    y = pm.simulate(x, mode="local")

    print(x, y)
finally:
    pm.close()
    pm = None
    powder = None
    cryst = None

Generally, memory use correlates with how large the CIF file is (but not always) and with the lattice size. I couldn't find a direct correlation here either; the memory issues can occur for bigger or smaller crystals.

So my main problem is the unpredictable amount of resources per crystal simulation: some use only a few hundred megabytes, another uses 32 GB of RAM plus 32 GB of swap, which is quite an insane amount at this point.

I also tried a rather exotic solution to check whether a very big swap would help: I attached a 1 TB SSD and assigned it as swap. In the end some of the simulations were using 80+ GB of swap plus 32 GB of RAM, and they still couldn't complete within a 10-minute task.

I have a feeling that some configurations produce infinite or extremely large solutions which the code then tries to convolve, but I don't know.

Another side issue: running this single simulation in a Jupyter notebook (in VS Code) leaks memory heavily, leaving ~28 GB of unreleased memory behind. I suspected the notebook itself might be holding references, so I set all the xu-related variables to None, but it didn't help. Only restarting the kernel frees the memory.

Tested on:

OS: macOS 14.5 (Apple M3), Ubuntu 24.04 (Intel i9-13900K)
Python: 3.12.2
xrayutilities: 1.7.7 - 1.7.8
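One stdlib-only workaround for both the leak and the runaway simulations (a sketch; `run_isolated` and its timeout are my own names, not part of xrayutilities) is to run each simulation in a short-lived child process, so the OS reclaims all memory when the process exits:

```python
import multiprocessing as mp

def _worker(fn, args, conn):
    # Child side: run the simulation and ship the result back.
    conn.send(fn(*args))
    conn.close()

def run_isolated(fn, args=(), timeout=600):
    # Run fn(*args) in a short-lived child process. All memory the
    # simulation allocates (model buffers, convolver caches) is
    # reclaimed by the OS when the child exits, and a runaway
    # simulation can be killed after `timeout` seconds.
    parent, child = mp.Pipe(duplex=False)
    proc = mp.Process(target=_worker, args=(fn, args, child))
    proc.start()
    result = parent.recv() if parent.poll(timeout) else None
    proc.join(5)
    if proc.is_alive():
        proc.kill()  # past the timeout: kill and reclaim memory
        proc.join()
    return result
```

The function passed in would build the `PowderModel`, call `simulate`, and return `(x, y)`; it must be picklable, i.e. defined at module level.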

@dkriegner
Owner

The problem is that you are not telling the PowderModel up front that you only need the simulation up to 13.5 degrees. By default it prepares for calculations up to 180 deg, and together with your large unit cell this causes a huge number of peaks. I assume this is the origin of the problem you are facing.

Can you try adding a reasonable value (maybe a bit bigger than your 13.5) for tt_cutoff (an optional argument to PowderModel)? Since you are the second person having trouble with this in a short time, I think I need to document it better.

@artsiommiksiuk
Author

@dkriegner, but it is set in the code passed to PowderModel, and the x values range from 0 to tt_cutoff.

Or should it be provided somewhere else?

pm = xu.simpack.PowderModel(
    ...
    wl=0.1647,
    tt_cutoff=13.5,
    fpsettings={ 
    ...

@artsiommiksiuk
Author

And I verified that tt_cutoff is also set to the correct value in the underlying PowderDiffraction.

@dkriegner
Owner

Sorry, I missed that.

Can you try reducing the number of custom settings for the fundamental parameters model (keeping your wavelength, of course) and check whether that has any impact? Did you check whether zero divergence is handled well by the underlying convolvers?

@artsiommiksiuk
Author

With the whole fpsettings section removed, nothing changed (maybe 1-2 GB less), but it still uses 32 GB of RAM plus about 15 GB of swap.

@artsiommiksiuk
Author

Zero axial divergence doesn't change anything either.

@dkriegner
Owner

OK, I'll look into it. The PowderDiffraction code allocates a lot of buffers and does a lot of caching, which in your case seems to be too much. The buffering is implemented to speed up subsequent calculations (e.g. during a fitting procedure), but in a scenario where unrelated structures are calculated it is likely not helpful. I believe it is particularly extreme here due to the large number of peaks and the high point density.
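To narrow down which step allocates the buffers, one can log the process's peak resident set size around the individual calls. A stdlib-only sketch using the Unix-only `resource` module (note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS):

```python
import resource
import sys

def peak_rss_mb():
    # Peak resident set size of this process so far, in megabytes.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    return rss / 1024 if sys.platform != "darwin" else rss / (1024 * 1024)

# usage sketch: print peak RSS after PowderModel(...) and after simulate(...)
# print(f"after init: {peak_rss_mb():.0f} MB")
```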

@artsiommiksiuk
Author

Is there any workaround I can apply? Would a fix or a config option for this be hard to add?

@dkriegner
Owner

If the buffer generation is indeed the problem, it's rooted very deep in the code; the original author of that section should probably be consulted. It certainly all scales with the number of points and the angular range you request in the output, and internally the calculation uses an even higher point density. So one question is, of course, whether you really need 10000 points in the output. Other than reducing these values, I am not sure there is an easy fix.

@artsiommiksiuk
Author

artsiommiksiuk commented Aug 8, 2024

Well, 10000 points are required because of the very sharp peaks we are getting. Fewer points would just lose all the peak-shape information, which is not much even with 10000 points (at best only ~100 points fall into a single peak).

Okay, good to know at least that I'm not missing anything. It would still be very nice to have this resolved somehow, or to have a workaround; tell me if something comes to mind.
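If the constraint is resolving sharp peaks rather than uniform coverage, one way to cut the point count is a nonuniform grid that is dense only near the known peak positions. A sketch (the peak positions below are made up, and it assumes `PowderModel.simulate` accepts an arbitrary twotheta array):

```python
import numpy as np

def peak_focused_grid(peak_positions, lo, hi, window=0.05,
                      fine_step=0.001, coarse_step=0.05):
    # Dense sampling within +/- window around each peak, coarse elsewhere.
    coarse = np.arange(lo, hi, coarse_step)
    fine = [np.arange(max(lo, p - window), min(hi, p + window), fine_step)
            for p in peak_positions]
    return np.unique(np.concatenate([coarse, *fine]))

# hypothetical peak positions; in practice take them from the model
x = peak_focused_grid([2.1, 4.7, 9.3], 0, 13.5)
```

This keeps ~100 points per peak while dropping the total from 10000 to a few hundred.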

@dkriegner
Owner

I was thinking a bit about this problem, and I acknowledge that the way I included the FP_profile class by @mendenmh is oriented towards performance for many recalculations of the powder pattern, without thinking about memory use at all. For materials with a smaller unit cell and the commonly used Cu Kalpha wavelength this is not an issue at all. If one has a much shorter wavelength and a larger unit cell, however, one runs into the problems you are observing.

I can imagine providing a "low memory" variant of PowderDiffraction which would not initialize an FP_profile instance for each powder line during initialization, but instead generate them dynamically during the calculation (one per available thread). This would mean a somewhat slower calculation but should bring the memory use down dramatically.

I am currently not able to look into this for time reasons. It is likely not difficult, but one needs to make sure all the logic for finding the right parameters keeps working together. Are you willing and able to work on the code changes for this? I can provide some guidance on what to look at.
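The low-memory variant sketched in this comment (create a convolver per worker thread on first use, instead of one per powder line up front) is essentially a lazy per-thread cache. A generic illustration, with names of my own invention and the real FP_profile construction not shown:

```python
import threading

class LazyPerThread:
    """Build one expensive object per thread, on first use."""

    def __init__(self, factory):
        self._factory = factory       # e.g. a closure constructing a convolver
        self._local = threading.local()

    def get(self):
        # First call on a given thread pays the construction cost;
        # later calls on that thread reuse the cached object.
        if not hasattr(self._local, "obj"):
            self._local.obj = self._factory()
        return self._local.obj
```

Trading the up-front per-line allocation for on-demand construction caps memory at O(threads) convolvers instead of O(peaks), at the cost of reconfiguring the convolver for each line.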

@mendenmh

mendenmh commented Aug 12, 2024 via email

@artsiommiksiuk
Author

@dkriegner, no promises for now. I understood most of the code and I would be able to do it, but I don't know whether this will be a priority in our project. I think that will be clearer over the next two months.

As a workaround for my issue, I handpicked samples that require very little CPU time, and I was able to run those and get output for them.

Thanks for the hints and the feedback! I'll get in touch if I end up working on this.

@mendenmh

mendenmh commented Aug 13, 2024 via email

@dkriegner
Owner

Thanks to both of you. I'll definitely keep this issue open.
If I get to make changes in this part of the code (unlikely at the moment), I will keep it in mind.
