4RDM implementation #105
Conversation
I found a problem; when it is fixed I will reopen the pull request. |
Hi, thanks for the hard work! I think having a flag determine which RDMs to compute is a good way to do it. There's some dynamic programming you can do too, if you want: basically you'd have an array of numbers/functions [X...] corresponding to which RDMs you want, and at each level in the loop (for p..., for q..., etc.) you'd iterate through the list and call some corresponding update_rdm_X() to update the state of the RDMs you're currently computing. Essentially, the loop you have now would be a static loop, but you'd be doing dynamically chosen computations at each level of the loop. You'd likely need some kind of class and method to store the intermediates, which would then be gathered into a tuple and returned from the top-level function.
Doing it that way probably doesn't give us much, and the operation is slow anyway, so I'm happy to just use a flag and if/else. On second thought, a flag is way simpler and probably faster. |
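For concreteness, a minimal sketch of the dynamically-dispatched variant described above, assuming hypothetical update_rdm_X helpers and an intermediates class (none of these names come from the codebase):

```python
import numpy as np

class RDMIntermediates:
    """Holds the partially built RDMs while the static loop runs."""
    def __init__(self, n, orders):
        # one zero-initialized 2k-index array per requested RDM order
        self.rdms = {k: np.zeros((n,) * (2 * k)) for k in orders}

def update_rdm_1(state, p, q, value):
    state.rdms[1][p, q] += value          # placeholder update rule

def update_rdm_2(state, p, q, value):
    state.rdms[2][p, q, p, q] += value    # placeholder update rule

UPDATERS = {1: update_rdm_1, 2: update_rdm_2}

def compute_rdms(n, orders=(1, 2)):
    state = RDMIntermediates(n, orders)
    for p in range(n):            # the static loop structure stays fixed
        for q in range(n):
            value = 1.0           # stand-in for the real CI contraction
            for k in orders:      # dynamically chosen updates at each level
                UPDATERS[k](state, p, q, value)
    return tuple(state.rdms[k] for k in orders)
```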
I fixed some problems I had with the mixed-spin blocks. Now it is giving the correct result. I ran tests with BH3, H8, and H10 on Compute Canada and all of them are working alright. Evaluating the 4RDM increases the computation time significantly: while evaluating the 1-, 2-, and 3RDM for H8 can take 5 seconds, the computation including the 4RDM can take 8 minutes. The most expensive part of the code, both in performance and memory usage, is the symmetrization of the 4RDM. This object has 12 symmetries for the upper indices and 12 symmetries for the lower indices (plus the antisymmetries), so there are a total of 24 symmetrization operations, done with numpy einsum. Those symmetrizations seem to be pretty slow with einsum; however, I think einsum is the fastest and simplest way to do it. I am testing right now whether adding an 'optimize' argument to the einsum calls helps with performance. Aside from that issue, everything seems to be working correctly. |
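As an illustration of what one such symmetrization step looks like with einsum (a minimal sketch on a 4-index tensor; the real case applies 24 permutations to an 8-index array, which is why it dominates the cost):

```python
import numpy as np

def symmetrize_upper(dm):
    """Sketch: symmetrize a 4-index tensor over one upper-index permutation."""
    sym = dm.copy()
    sym += np.einsum("pqrs->qprs", dm)   # one index permutation via einsum
    return sym / 2.0
```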
I tested the numpy.einsum operations with different 'optimize' arguments and it does not help. I've been reading, and it seems that for an array as big as the 4RDM it might be better to use numpy.transpose instead of numpy.einsum, as the transpose does not create a copy of the array but just a view of it. I don't know how that will affect performance; I am going to check and then post here what I found to be the best approach. |
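For a pure index permutation there is no contraction path for 'optimize' to improve, which is consistent with the finding above. A quick timing check (using only standard numpy/timeit):

```python
import numpy as np
from timeit import timeit

dm = np.random.rand(8, 8, 8, 8)

# Compare a plain permutation against the path-optimized variants.
for opt in (False, True, "greedy", "optimal"):
    t = timeit(lambda: np.einsum("pqrs->qprs", dm, optimize=opt), number=100)
    print(opt, t)
```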
I did some tests using numpy.transpose instead of numpy.einsum. It is significantly slower, and for some reason it also has memory problems. Normally numpy.transpose returns only a view of the transposed array; however, when the array is as big as the 4RDM, downstream operations sometimes force a copy. When a copy is created, this results in a lot of cache misses, strongly decreasing performance. For that reason, I think numpy.einsum gives the best performance we can get from the Python side, at least without adding any external package. So, I'll leave the code as it is right now. |
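The view-versus-copy behaviour can be checked directly with np.shares_memory (a standard numpy utility); a transpose is a view until something forces contiguous data:

```python
import numpy as np

a = np.random.rand(4, 4, 4, 4)

t = a.transpose(1, 0, 2, 3)        # a view: no data moved yet
print(np.shares_memory(a, t))      # True

c = np.ascontiguousarray(t)        # forces the copy (and the cache misses)
print(np.shares_memory(a, c))      # False
```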
@DanielCalero1 when you have a seniority-zero RDM, do you have the option of storing only the unique and/or non-zero elements in some sensible way? It obviously reduces the memory (by a square root) and the number of operations (also by a square root, I think). |
Well, on the C++ side I compute the seniority-zero elements of the 4RDM directly and store them in different vectors (e.g. d1 = …). The problem is that d1, d2, ... have different structures: some of them are matrices, others are 3-index tensors, others are 4-index tensors. So yes, I can store only the non-zero elements, but they are in principle in different objects. Additionally, even if I do a tensor product with the identity to give all the objects the same number of indices and add them up, I don't know if I will end up with an object with the appropriate structure to use for other computations (at least on my side, the CT commutators force the 4RDM to have eight indices, so I wouldn't know how to use an object with just the non-zero elements of the 4RDM). I would need to think more about it. The other idea, as I mentioned the other day, would be to compute the spin-traced RDM directly from the d1, d2, d3, ... vectors. I know that, since the 4RDM is an 8-index tensor where each index runs over 2*n_orbital values, and the spin-traced 4RDM is an 8-index tensor where each index runs over n_orbital values, the spin-traced 4RDM is … |
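As a toy illustration of that index relation (shown for the 1RDM only, assuming a blocked alpha-then-beta spin-orbital ordering; the 4RDM case is the same idea over eight indices):

```python
import numpy as np

def spin_trace_1rdm(dm_so, n):
    """Sum the alpha-alpha and beta-beta blocks of a (2n x 2n)
    spin-orbital 1RDM into an (n x n) spin-traced 1RDM."""
    return dm_so[:n, :n] + dm_so[n:, n:]
```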
Perfect. If the DM is seniority-zero I guess we need to either return only the special blocks (but that would be a special function with its own API) or return the 4DM as a sparse matrix/object. It's worth thinking about but not, for now, getting bogged down in. |
Yep, I think this is fine for now, I purposely took the approach of only storing the unique spin-blocks of the RDMs so that other users can expand it out into a (giant) spin-resolved N-RDM whenever they like. |
Sounds good. I will let @msricher or @DanielCalero1 do the final work of merging. |
It looks fine to merge once the conflicts are fixed. |
I can fix the conflicts. |
I fixed the conflicts. The test is failing because it is not able to allocate enough memory. What is your suggestion in this case, @msricher? I tested on Compute Canada several times and it worked; however, I don't know how to do it here if there is not enough memory. |
Okay. Can you add a Pytest mark to these tests (https://gist.github.com/devops-school/c0b260e7b845dff98556511071d0bf7c#8-marking-tests), maybe a "bigmem" mark, which is not enabled by default? |
Great! I added a new file to configure the mark "bigmem" so that the test for the function is normally skipped unless specified ("pytest --bigmem").
Then executing the test would be something like "make test PYTEST_ARGS="--bigmem"". The other option is just to create a new rule:
Then executing the test would be just "make test-bigmem". Which one would you prefer, @msricher? |
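For reference, the usual pytest recipe for such an opt-in mark looks like this (a sketch of the standard pattern from the pytest documentation, not necessarily the exact file added here):

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--bigmem", action="store_true", default=False,
                     help="run tests that need large amounts of memory")

def pytest_configure(config):
    config.addinivalue_line("markers", "bigmem: test requires large memory")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--bigmem"):
        return  # --bigmem given: run everything
    skip = pytest.mark.skip(reason="needs --bigmem option to run")
    for item in items:
        if "bigmem" in item.keywords:
            item.add_marker(skip)
```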
Thanks! I really don't have a preference so long as it works with GitHub Actions. The Makefile dispatchers are just for really quick convenience. |
Great! I updated the test so that compute_rdm_1234 is checked only when specified (make test-bigmem). I just tested again on Compute Canada and it is working fine. Additionally, the tests here seem to be passing. I think the PR is ready to merge. |
@msricher if you think it is ready to merge then either you or @DanielCalero1 can do it.
This PR addresses issue #85. It includes the following implementations:
Besides that, I think I should add an option in the function spinize_rdms_1234 to let the user select whether they only want to compute the 3RDM or the 4RDM.
What do you think would be the best way of doing that? I thought of adding a flag to the function. This flag can have three values: "3rdm", "4rdm", or "34rdm". Depending on the choice, the 3RDM and/or the 4RDM will be computed.
Do you have another suggestion?
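For concreteness, a minimal sketch of the flag idea (the parameter name mode and the compute_rdm_3/compute_rdm_4 helpers are hypothetical stand-ins, not names from the codebase):

```python
def compute_rdm_3(civec):
    ...  # hypothetical stand-in for the real 3RDM builder

def compute_rdm_4(civec):
    ...  # hypothetical stand-in for the real 4RDM builder

def spinize_rdms_1234(civec, mode="34rdm"):
    """Dispatch on a flag: "3rdm", "4rdm", or "34rdm"."""
    if mode not in ("3rdm", "4rdm", "34rdm"):
        raise ValueError(f"unknown mode: {mode}")
    rdm3 = compute_rdm_3(civec) if mode in ("3rdm", "34rdm") else None
    rdm4 = compute_rdm_4(civec) if mode in ("4rdm", "34rdm") else None
    return rdm3, rdm4
```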