-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cocycles feature #23
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a first quick look at ripser_bindings.cpp
Even if longer, I would slightly prefer |
|
Are you asking if it should be positional or keyword-based with a default? I would strongly support keyword-based with a default of |
a0828ca
to
ff7f925
Compare
|
* Enable feature in C++ backend * Cannot enable the computation in Python right now * A lot of changes happened in the bindings, but in reality is not much, I forgot to run previously clang-format Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
This allows that finite cocycles are sorted with the corresponding finite barcode Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Remove check for first insertion for pivot_to_death_idx hash table Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
Signed-off-by: julian <[email protected]>
72aff5e
to
2ab0b4b
Compare
@ulupo updated ! |
using namespace pybind11::literals; | ||
m.doc() = "Ripser python interface"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this indentation intentionally 2 spaces instead of 4?
.def_readwrite("flag_persistence_generators_by_dim", | ||
&ripserResults::flag_persistence_generators); | ||
|
||
m.def( | ||
"rips_dm", | ||
[](py::array_t<value_t>& D, py::array_t<value_t>& diag, int modulus, | ||
int dim_max, float threshold, int num_threads, | ||
bool return_generators) { | ||
bool return_generators, const bool do_cocycles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation: 3 spaces?
Hey everyone! I'm pretty sure I'll end up being able to provide a nice notebook that showcases the cocycles feature. My first attempt is in the attached ipynb, where the resulting circular coordinate ends up not being great (it should visualize with all the colors of the rainbow...) |
Thanks a lot @michiexile! Do you think there is a bug in the cocycles implementation in this PR, or in the necessary post-processing to produce the graphics? |
I honestly don't know, and I haven't had time to dig into it. The post-processing, we've been using relatively successfully with |
I see. Since |
should have been signed instead of unsigned Signed-off-by: julian <[email protected]>
Hi @michiexile, Thank you so much for helping us with this. I looked at your notebook (thank you for sharing it with us ;)), at I compared the results with the cocycles obtained with Regarding:
If I am not mistaken, the permutations in Again thank you so much to help us, please let me know if I can do anything here ;) |
Great! Thank you!!
It's not clear to me what you might want to do with an explicit cocycle that does not involve the explicit boundary matrix in one form or another - in my example code, we use the distance matrix to create the relevant boundary matrix. Maybe add them (as many of "distance matrix" and "boundary matrix" as you can provide) as optional outputs? |
@MonkeyBreaker thanks for chasing the issue and fixing it! @michiexile returning the distance matrix (as |
Hey everyone, Sorry I was unexpectedly busy this last week ... About returning the computed distance Matrix, we did it before and we remove it because of memory and performance implications if I remember right. What are your thoughts on this (@ulupo and @michiexile), if you think it makes sense to always return it when computing cocycles makes sense, then sure, lets do it. Also, another point, @michiexile would you have an idea of an example of cocycles related to an essential pair (non-finite) ? Is that even possible ? Thank you again for the help you provide us with :) ! |
About returning the computed distance Matrix, we did it before and we remove it because of memory and performance implications if I remember right.
I don't have a really strong opinion on the matter, I don't think that adding an additional parameter for returning the computed the distance matrix is a good idea, make the interface more heavy. But, I am not sure neither if we should always return when cocycles are enable the distance matrix (prevent memory duplication if the user provided already a distance matrix to ripser interface).
Shouldn't it be _really_ easy to detect whether the user is providing a distance matrix or not, and only produce the distance matrix when the user asks for cocycles _and_ has not provided a distance matrix themselves?
What are your thoughts on this ***@***.*** <https://github.com/ulupo> and @michiexile <https://github.com/michiexile>), if you think it makes sense to always return it when computing cocycles makes sense, then sure, lets do it.
I have a hard time thinking of a situation where I would want the cocycles explicitly, but have no need for the distance matrix (or more specifically, no need for the VR-complex coboundary map). If you're willing to build the coboundary map for me, I can probably survive without a distance matrix pretty often.
Also, another point, @michiexile <https://github.com/michiexile> would you have an idea of an example of cocycles related to an essential pair (non-finite) ? Is that even possible ?
Ummmm... I would pretty much _always_ work with essential pairs. Because non-essential pairs no longer produce cocycles at the radius you are looking at (because they have both a birth and a death time smaller than the "current" time)
So the typical way I would be computing with circular coordinates goes something like this:
1. (optional) Compute a barcode to infinity, or to some high parameter value. Look for a parameter value during which a feature I am interested in studying will be essential.
2. Compute cocycle representatives at this parameter value, chosen specifically to make some of the pairs essential.
3. Compute the coboundary matrix at this parameter value.
4. Use cocycle and coboundary to compute circular coordinate.
Key to this workflow is being able to pick a parameter value to force some cocycles to become essential with respect to the (truncated) filtration we are looking at. Because if they are not, they will not actually be cocycles, and the computation in step 4 will fail.
I hope this helps describe the usecase I'm interested in?
All the best,
--
Mikael Vejdemo-Johansson
***@***.*** ***@***.***>
|
Sorry if I made it sound like it could be difficult. My concern here is if user provides an already computed distance matrix and we return it, that we don't duplicate memory but just forward (move) the memory. Per example, with a dataset of 50k points, it took on my machine like 10GB, if we are not careful, it could take 20GB when returning the distance matrix, because Python copies to return it. I will try this week to prepare something here and check if we don't have this issue of memory duplication.
I think the coboundary map could be interesting, but would require some heavy changes in c++ part. I would for the moment avoid this but if this might be useful for additional use cases, then, it would be part of another PR.
Right, sorry I was confused, I don't have a background in mathematics, without @ulupo I am lost sometimes.
Definitely, I need to prepare a reproducible test where I always use the same input dataset and compute twice cocycles with a different threshold on each run. But, as Umberto already discussed with me, we would need to have access to the coboundary matrix, that, currently is only present in the c++ part... Anyway, don't worry to much on this, I will still think about it. Also, concerning your notebook, do you need additional features from our side ? Best, |
I'm on vacation now and can only reply very briefly but this is just to say that returning the coboundary matrix may not be a terrible idea if it saves the user the extra effort. @MonkeyBreaker I already coded this for the unit tests that should be included in this PR and I included it in a notebook I shared with you a while back. @michiexile is there something the distance matrix would provide beyond the ability to build the coboundary matrix yourself? Sorry if I'm missing something due to the rush. |
Hey,
I think that returning the coboundary matrix would be a good idea, but not for this PR (or if we compute it in Python). The reason is that one of the tricks user in Otherwise, returning the distance matrix if computed, only for when returning cocycles, I am fine with this. I think we have 3 cases:
|
@MonkeyBreaker, thanks for the comments! What I meant to say initially is that we could compute the coboundary matrix in pure Python, as I already partly did that work for those unit tests I mentioned. But now I think that, while those unit tests should be there, it would be a bit ambitious to construct the coboundary matrix in full generality. I only covered small homology dimensions in those tests, and only mod 2 coefficients. I could do it in full generality by including further dependencies (such as PHAT or GUDHI), but we probably shouldn't go there just yet. So I also now agree that we should return the distance matrix only if the user wants cocycles. We should always return the distance matrix just before it is passed to the C++ backend. So, in the case with weights, the distance matrix after reweighting. |
I can't think of anything the distance matrix brings us that the coboundary matrix doesn't (and if the distance matrix is _really_ important, the user can and will probably compute it themselves for those other applications)
What I _do_ however need to be able to do is to carve out the coboundary matrix _at a specific threshold_. It is not enough for me to get the coboundary matrix at the end of the computation - for the circular coordinate computations to work, I need to be able to get the coboundary during a point where the cocycles I want to study are actually alive.
All the best,
--
Mikael Vejdemo-Johansson
***@***.*** ***@***.***>
… On 13Sep, 2022, at 02:15, Umberto Lupo ***@***.***> wrote:
I'm on vacation now and can only reply very briefly but this is just to say that returning the coboundary matrix may not be a terrible idea if it saves the user the extra effort. @MonkeyBreaker <https://github.com/MonkeyBreaker> I already coded this for the unit tests that should be included in this PR and I included it in a notebook I shared with you a while back.
@michiexile <https://github.com/michiexile> is there something the distance matrix would provide beyond the ability to build the coboundary matrix yourself?
Sorry if I'm missing something due to the rush.
—
Reply to this email directly, view it on GitHub <#23 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAH6LWF7ZGT7IGX6U3JYGLV6ALYDANCNFSM5CHSXXNQ>.
You are receiving this because you were mentioned.
|
This PR allows users to retrieve cocycles from the Persistent Homology computation.
cocycles
with argumentdo_cocycles
do_cocycles
?cocycles
with maybe some interaction with a plot for a nice visualization.