selection of sparse canonical variates (w_1, ..., w_N) #29

Open
yugeji opened this issue Oct 26, 2022 · 5 comments
Labels
Algorithm question Questions about the algorithm and mathematical/statistical modeling

Comments


yugeji commented Oct 26, 2022

Hello,

The Methods section of your paper, under the selection of sparse canonical variates, states that $w_i$ and $w_j$ are fit according to the multi-factor PMD algorithm in Witten 2009. The difference in DIALOGUE, however, is that an additional summation $\sum_{i \lt j}$ is included, so that $w_i$ and $w_j$ are optimized over all pairwise combinations of cell types.

We are confused about how $w_1, ..., w_N$ - specifically, a single $w_i$ for each cell type - is selected when there are multiple $w_i$ output by MultiCCA per cell type (due to the pairwise combinations).

For example, given three cell types 1, 2, and 3, DIALOGUE computes $w$ such that

$\text{maximize} \sum_{i \lt j} w_i^T X_i^T X_j w_j$ $\rightarrow$ $\max w_1^T X_1^T X_2 w_2 + \max w_2^T X_2^T X_3 w_3 + \max w_1^T X_1^T X_3 w_3$

In this case we end up computing two $w_1$, two $w_2$, and two $w_3$, because each $\max$ is calculated independently with MultiCCA. How do you choose which of these $w$ to use for $w_1, ..., w_N$?

Alternatively, does MultiCCA actually optimize the full objective $\sum_{i \lt j} w_i^T X_i^T X_j w_j$, summation included, as a single problem?

Owner

livnatje commented Mar 1, 2023

Hi @yugeji, thank you for your interest in our method, and apologies for the late response.

You are asking an important question. The optimization is done iteratively (using UpdateW), such that the Ws are selected considering all the pairwise combinations and the total value of the objective function (summing over all the pairs).

You can see the documentation of MultiCCA here: https://rdrr.io/cran/PMA/man/MultiCCA.html
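For readers following along, here is a minimal NumPy sketch of the kind of iterative update described above. It is not the PMA source: the names `multicca_sketch` and `sparse_direction`, the random initialization, the single L1 bound `c` shared by all data sets, and the absence of any standardization are all assumptions made for illustration. What it is meant to show is that each $w_i$ is updated against the sum over all the other data sets, so only one $w_i$ per data set ever exists.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator S(a, delta)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def unit(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def sparse_direction(a, c, n_bisect=60):
    """Approximately maximize w @ a subject to ||w||_2 <= 1 and ||w||_1 <= c.

    Assumes c > 1. Bisection finds the soft-threshold level that meets the L1 bound.
    """
    w = unit(a)
    if np.abs(w).sum() <= c:
        return w
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.abs(unit(soft_threshold(a, mid))).sum() > c:
            lo = mid
        else:
            hi = mid
    return unit(soft_threshold(a, hi))  # the hi side satisfies the L1 bound


def objective(Xs, ws):
    """Sum of w_i^T X_i^T X_j w_j over all pairs i < j."""
    K = len(Xs)
    return sum((Xs[i] @ ws[i]) @ (Xs[j] @ ws[j])
               for i in range(K) for j in range(i + 1, K))


def multicca_sketch(Xs, c=1.5, n_iter=20, seed=0):
    """Hypothetical sequential sweep; Xs are matrices with matching rows."""
    rng = np.random.default_rng(seed)  # assumption: random start
    ws = [unit(rng.standard_normal(X.shape[1])) for X in Xs]
    history = []
    for _ in range(n_iter):
        for i in range(len(Xs)):
            # Every other data set contributes to the update of w_i,
            # using the most recent value of each w_j.
            target = sum(Xs[j] @ ws[j] for j in range(len(Xs)) if j != i)
            ws[i] = sparse_direction(Xs[i].T @ target, c)
        history.append(objective(Xs, ws))
    return ws, history
```

Calling `multicca_sketch([X1, X2, X3])` returns three vectors, one per matrix, together with the value of the summed objective after each sweep.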

livnatje added the Algorithm question label Mar 1, 2023
Author

yugeji commented Mar 1, 2023

Hi @livnatje ,

Thank you very much for your response!! (And no worries.)

In the meantime, we've been investigating the behavior of the update_w function as implemented in the link you posted above. We'd like to note that, because of the specific pairwise iteration in the maximization implementation, the order in which datasets (cell types, in your case) are passed matters for the final latent factor ($w$) calculation, whereas we believe it should not matter in the theoretical formulation proposed in Witten 2009.

But this is just an addendum. I'll leave the issue open in case there are additional comments on the topic, but otherwise please feel free to close it, and thank you very much for your response!
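One way to spell out the distinction drawn above, offered as an illustration rather than as something taken from the paper or the PMA source: the objective is symmetric under relabelling of the data sets, because each term is a scalar and therefore equals its own transpose.

$$
F(w_1, \ldots, w_N) = \sum_{i \lt j} w_i^T X_i^T X_j w_j = \frac{1}{2} \sum_{i \neq j} w_i^T X_i^T X_j w_j
$$

A sequential sweep, by contrast, updates one $w_i$ at a time using whatever values the other $w_j$ currently hold (the superscript "cur" is notation assumed here):

$$
w_i \leftarrow \arg\max_{w} \; w^T X_i^T \sum_{j \neq i} X_j w_j^{\mathrm{cur}}
$$

Here $w_j^{\mathrm{cur}}$ is already updated for every $j$ visited earlier in the sweep, so the sequence of iterates depends on the visiting order even though $F$ does not. Each term of $F$ is bilinear in $(w_i, w_j)$ rather than jointly concave, so it seems plausible that different orders can stop at different stationary points.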

Owner

livnatje commented Mar 1, 2023

Oh, interesting and thanks for surfacing that! We will look into that too and examine sensitivity. DIALOGUE outputs have been pretty robust (as CCA is convex and the HLM gives unique solutions).

Did you find substantial variation?

Author

yugeji commented Mar 1, 2023

We have yet to investigate the robustness of the $w$ calculations in a biologically relevant setting. However, @meritback and @rengesf constructed an alternate implementation of multi-factor PMD in a linear programming framework which performs the optimization simultaneously and therefore produces the same solution regardless of order.

A toy example demonstrating this permutation invariance, and the effect of permutation on the current MultiCCA implementation, can be found here: https://github.com/theislab/sparsecca/blob/main/examples/linear_programming_multicca.ipynb
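A permutation check of this kind can be sketched generically. The helper below is hypothetical and is not taken from the linked notebook: `permutation_gap` accepts any `fit` callable that maps a list of matrices to one vector per matrix, and the two stand-in fits (`svd_fit`, `chained_fit`) exist only to show what an order-invariant and an order-dependent procedure look like under the check. Comparing both signs of each vector is also an assumption about how solutions should be matched.

```python
import numpy as np


def permutation_gap(fit, Xs, perm):
    """Largest change in any w_i when the data sets are passed in a new order.

    fit  : callable taking a list of matrices, returning one vector per matrix
    perm : new order, e.g. [2, 0, 1] passes Xs[2] first
    """
    base = fit(Xs)
    permuted = fit([Xs[k] for k in perm])
    gap = 0.0
    for pos, k in enumerate(perm):
        a, b = base[k], permuted[pos]
        # Assumption: vectors that differ only by sign count as the same solution.
        gap = max(gap, min(np.linalg.norm(a - b), np.linalg.norm(a + b)))
    return gap


def svd_fit(Xs):
    """Order-invariant stand-in: leading right singular vector of each matrix."""
    return [np.linalg.svd(X, full_matrices=False)[2][0] for X in Xs]


def chained_fit(Xs):
    """Order-dependent stand-in: every w_i is tied to whichever matrix comes first."""
    ws = svd_fit(Xs[:1])
    anchor = Xs[0] @ ws[0]
    for X in Xs[1:]:
        v = X.T @ anchor
        ws.append(v / np.linalg.norm(v))
    return ws
```

A gap near zero for every permutation is what an order-invariant solver should produce; a clearly positive gap indicates that the result depends on the order in which the data sets were passed.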

Owner

livnatje commented Mar 1, 2023

OK, so we will explore this as well. I will leave this issue open as you suggested and we can follow up.
