4RDM implementation #105

Merged: 32 commits merged into master on Apr 10, 2025
Conversation

DanielCalero1 (Collaborator) commented Mar 25, 2025

This PR addresses issue #85. It includes the following implementations:

  1. Add to compute_rdms_1234() in rdm.cpp the code to compute the elements of the 4RDM.
  2. Update the Python bindings in rdm.cpp, as well as the bindings in binding.cpp and pyci.h.
  3. Update the function spinize_rdms_1234 in utility.py to receive the vector elements of the 4RDM, build the full 4RDM tensor, and symmetrize it.
  4. Update the function spin_free_rdms() in utility.py to compute the spin-free 4RDM.
  5. Add new tests in test_routines.py to check the correctness of the 4RDM.

Besides that, I think I should add an option to spinize_rdms_1234 that lets the user select whether to compute only the 3RDM or also the 4RDM.
What do you think would be the best way to do that? I thought of adding a flag to the function with three possible values: "3rdm", "4rdm", or "34rdm". Depending on the choice, the 3RDM and/or the 4RDM would be computed, as in the sketch after this comment.

Do you have another suggestion?
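
For concreteness, a minimal sketch of the proposed flag; the signature and the build_rdm3/build_rdm4 helpers are hypothetical stand-ins, not the actual utility.py code:

    # Hypothetical sketch of the proposed flag; build_rdm3/build_rdm4
    # stand in for the actual construction code in utility.py.
    def spinize_rdms_1234(d1, d2, d3, d4, mode="34rdm"):
        if mode not in ("3rdm", "4rdm", "34rdm"):
            raise ValueError(f"unknown mode: {mode!r}")
        rdm3 = build_rdm3(d1, d2, d3) if mode in ("3rdm", "34rdm") else None
        rdm4 = build_rdm4(d1, d2, d3, d4) if mode in ("4rdm", "34rdm") else None
        return rdm3, rdm4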

DanielCalero1 (Collaborator, Author) commented Mar 25, 2025

I found a problem; I will reopen the pull request once it is fixed.

msricher (Collaborator) commented Mar 25, 2025

Hi, thanks for the hard work! I think having a flag determine which RDMs to compute is a good way to do it.

There's some dynamic programming you can do too, if you want: basically, you'd have an array of numbers/functions [X...] corresponding to which RDMs you want, and at each level of the loop (for p..., for q..., etc.) you'd iterate through the list and call a corresponding update_rdm_X() to update the state of the RDMs you're currently computing. Essentially, the loop you have now would be a static loop, but you'd be doing dynamically chosen computations at each level of it. You'd likely need some kind of class and method to store the intermediates, which would then be gathered into a tuple and returned from the top-level function.

    # Determine which RDMs to compute (e.g. the 1- and 4-RDM) and make a
    # list of callable objects containing the corresponding intermediate
    # RDM state, e.g. comps = [RDMState1(), RDMState4()].
    def compute_rdms(norb):
        comps = [RDMState1(norb), RDMState4(norb)]
        for p in range(norb):
            for comp in comps:
                comp.update_p(p)
            for q in range(norb):
                for comp in comps:
                    comp.update_pq(p, q)
                # ... deeper index levels (r, s, ...) follow the same pattern
        return tuple(comp.rdm for comp in comps)
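
One of those intermediate-state objects might look like this (an illustrative sketch only, not actual PyCI code):

    import numpy as np

    class RDMState1:
        """Illustrative container for 1-RDM intermediates."""
        def __init__(self, norb):
            self.rdm = np.zeros((norb, norb))

        def update_p(self, p):
            pass  # accumulate contributions that depend only on p

        def update_pq(self, p, q):
            pass  # accumulate contributions that depend on (p, q)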

Doing it that way probably doesn't give us much, and the operation is slow anyway, so I'm happy to just use a flag and if/else.

On second thought, a flag is way simpler and probably faster.

DanielCalero1 (Collaborator, Author)

I fixed some problems I had with the mixed-spin blocks. Now it gives the correct result. I ran tests with BH3, H8, and H10 on Compute Canada, and all of them work correctly.
I also added the option to compute only the 3RDM, or both the 3- and 4RDM. I did not add an option to compute only the 4RDM because it would slightly complicate the assignment of elements (I would have to add some extra conditionals or make the function longer); since the 3RDM does not occupy much memory compared to the 4RDM, I just added the option to compute only the 3RDM or both of them.

Evaluating the 4RDM increases the computation time significantly: while evaluating the 1-, 2-, and 3RDM for H8 takes about 5 seconds, the computation including the 4RDM takes about 8 minutes.
Additionally, as expected, since the 4RDM is an 8-index tensor, it requires a lot of memory. For example, for H8 and BH3 in STO-6G it requires ~70 GB; for H10 in the same basis it requires ~250 GB.
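
As a rough editorial check of these figures (not part of the PR), the size of the spin-resolved 4RDM follows directly from its shape:

    # Back-of-the-envelope size of the spin-resolved 4RDM in float64:
    # an 8-index tensor, each index running over 2*n spin-orbitals.
    n = 10                                 # spatial orbitals, e.g. H10 in STO-6G
    n_elems = (2 * n) ** 8                 # 20**8 = 2.56e10 elements
    print(f"{n_elems * 8 / 1e9:.0f} GB")   # ~205 GB, same order as the ~250 GB above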

The most expensive part of the code, in both runtime and memory usage, is the symmetrization of the 4RDM. This object has 12 symmetries among the upper indices and 12 among the lower indices (plus the antisymmetries), so there are 24 symmetrization operations in total, each done with numpy.einsum. These einsum symmetrizations seem to be quite slow, but I think einsum is the fastest and simplest way to do it. I am currently testing whether passing an 'optimize' argument to einsum helps with performance.
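
To illustrate the kind of einsum-based symmetrization meant here (the index permutation below is a made-up example, not one of the PR's actual 24 operations):

    import numpy as np

    def add_permutation(rdm4, spec):
        # Accumulate one permutational symmetry of the 8-index tensor,
        # e.g. spec="pqrstuvw->qprsutvw" swaps the first two upper and
        # the first two lower indices. Each call reads and writes every
        # element and allocates a full-size temporary, which is why this
        # step dominates both runtime and memory.
        return rdm4 + np.einsum(spec, rdm4)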

Aside from that issue, everything seems to be working correctly.

DanielCalero1 reopened this Apr 2, 2025
DanielCalero1 (Collaborator, Author)

I tested the numpy.einsum operations with different 'optimize' arguments, and it does not help. I've been reading, and it seems that for an array as big as the 4RDM it might be better to use numpy.transpose instead of numpy.einsum, since transpose does not create a copy of the array but just a view of it. I don't know how that will affect overall performance; I am going to check and then post here what I find to be the best approach.

DanielCalero1 (Collaborator, Author)

I did some tests using numpy.transpose instead of numpy.einsum. It is significantly slower, and it also has memory problems. Normally numpy.transpose returns only a view of the transposed array; however, when the array is as big as the 4RDM, NumPy sometimes needs to materialize a copy anyway, and when a copy is created the operation incurs a lot of cache misses, strongly decreasing performance.
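
A small sketch of the view-versus-copy behaviour described here (array sizes shrunk for illustration):

    import numpy as np

    a = np.zeros((4,) * 8)                         # small stand-in for the 4RDM
    b = np.transpose(a, (1, 0, 2, 3, 5, 4, 6, 7))  # a view: no data moved yet
    assert b.base is a                             # still shares a's memory
    c = a + b  # the arithmetic walks b in non-contiguous order (or NumPy
               # materializes a contiguous copy), which is cache-unfriendly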

For that reason, I think that numpy.einsum gives the best performance we can get from the Python side, at least without adding any external package.

So I'll leave the code as it is right now.

PaulWAyers (Member) commented Apr 3, 2025

@DanielCalero1, when you have a seniority-zero RDM, do you have the option of storing only the unique and/or non-zero elements in some sensible way? It obviously reduces the memory (by a square root) and the number of operations (also by a square root, I think).

DanielCalero1 (Collaborator, Author)

Well, on the C++ side I directly compute the seniority-zero elements of the 4RDM, which I store in different vectors (e.g. d1 = $\langle pp|qq \rangle$, d2 = $\langle pq|pq \rangle$, d3 = $\langle pqr|pqr \rangle$, etc.). Those vectors are fast to compute and do not occupy much memory. I then use those vectors to create the big 4RDM tensor (on the Python side).

The problem is that d1, d2, ... have different structures: some of them are matrices, others 3-index tensors, others 4-index tensors.

So yes, I can store only the non-zero elements, but they are in principle in different objects. Additionally, even if I take a tensor product with the identity to give all the objects the same number of indices and add them up, I don't know whether I would end up with an object of the appropriate structure for other computations. (At least on my side, the CT commutators force the 4RDM to have eight indices, so I wouldn't know how to use an object containing just the non-zero elements of the 4RDM.)

I would need to think more about it.

The other idea, as I mentioned to you the other day, would be to compute the spin-traced RDM directly from the d1, d2, d3, ... vectors. Since the 4RDM is an 8-index tensor where each index runs over 2*n_orbital values, and the spin-traced 4RDM is an 8-index tensor where each index runs over n_orbital values, the spin-traced 4RDM is $2^8$ times smaller. However, I am not really sure how to do this.

PaulWAyers (Member)

Perfect. If the DM is seniority-zero I guess we need to either return only the special blocks (but that would be a special function with its own API) or return the 4DM as a sparse matrix/object. It's worth thinking about but not, for now, getting bogged down in.

msricher (Collaborator) commented Apr 7, 2025

Yep, I think this is fine for now. I purposely took the approach of only storing the unique spin-blocks of the RDMs, so that other users can expand them out into a (giant) spin-resolved N-RDM whenever they like.

PaulWAyers (Member)

Sounds good. I will let @msricher or @DanielCalero1 do the final work of merging.

msricher (Collaborator) commented Apr 7, 2025

It looks fine to merge once the conflicts are fixed.

DanielCalero1 (Collaborator, Author)

I can fix the conflicts.

DanielCalero1 (Collaborator, Author)

I fixed the conflicts. The test is failing because it is not able to allocate enough memory. What are your suggestions in this case, @msricher? I tested on Compute Canada several times and it worked, but I don't know how to handle it here if there is not enough memory.

msricher (Collaborator) commented Apr 8, 2025

Okay. Can you add a Pytest mark to these tests (https://gist.github.com/devops-school/c0b260e7b845dff98556511071d0bf7c#8-marking-tests), maybe a "bigmem" mark, which is not enabled by default?

msricher (Collaborator) commented Apr 8, 2025

https://docs.pytest.org/en/7.1.x/example/simple.html#control-skipping-of-tests-according-to-command-line-option

DanielCalero1 (Collaborator, Author)

Great! I added a new file to configure the "bigmem" mark so that the test for the function is skipped by default unless specified ("pytest --bigmem").
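
For reference, a minimal sketch of such a file, following the pytest recipe linked above (the actual conftest.py in the PR may differ):

    # conftest.py -- sketch of the "bigmem" mark, per the linked pytest recipe
    import pytest

    def pytest_addoption(parser):
        parser.addoption("--bigmem", action="store_true", default=False,
                         help="run tests that need large amounts of memory")

    def pytest_configure(config):
        config.addinivalue_line("markers", "bigmem: mark test as memory-hungry")

    def pytest_collection_modifyitems(config, items):
        if config.getoption("--bigmem"):
            return  # --bigmem given: run everything
        skip_bigmem = pytest.mark.skip(reason="need --bigmem option to run")
        for item in items:
            if "bigmem" in item.keywords:
                item.add_marker(skip_bigmem)
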
I am also thinking of modifying the Makefile so that the test with the "bigmem" flag can be called from it. I have two options. One is adding an extra argument to the test rule in the Makefile:

    .PHONY: test
    test:
    	@set -e; $(PYTHON) -m pytest -sv ./pyci $(PYTEST_ARGS)

Then executing the test would be something like "make test PYTEST_ARGS=--bigmem".

The other option is to create a new rule:

    .PHONY: test-bigmem
    test-bigmem:
    	@set -e; $(PYTHON) -m pytest -sv ./pyci --bigmem

Then executing the test would be just "make test-bigmem".

Which one would you prefer, @msricher?

msricher (Collaborator) commented Apr 9, 2025

Thanks! I really don't have a preference so long as it works with GitHub Actions. The Makefile dispatchers are just for really quick convenience.

DanielCalero1 (Collaborator, Author) commented Apr 10, 2025

Great! I updated the test so that it checks compute_rdm_1234 only when specified (make test-bigmem). I just tested again on Compute Canada and it is working fine. Additionally, the tests here seem to be working. I think the PR is ready to merge.

PaulWAyers (Member) left a comment

@msricher if you think it is ready to merge then either you or @DanielCalero1 can do it.

msricher merged commit b85fc93 into master on Apr 10, 2025 (1 check passed).
DanielCalero1 mentioned this pull request on Apr 10, 2025.