Problem with inconsistent lines output from K-means clustering #1253

Open
magnusdottir opened this issue Sep 7, 2023 · 0 comments

Hi,
I'm having problems with plotHeatmap and K-means clustering. I can't plot heatmaps reproducibly: the number of lines in them varies between runs, even when I give the exact same command. The initial runs seem to have the expected line density but subsequent runs don't, which is strange because I'm running the scripts on SLURM. I do get a high number of lines in unclustered heatmaps, and fewer clusters tend to perform "better" in terms of including the full number of lines, but this is still erratic.

For example, I computed a matrix for two data points and plotted a heatmap. It looked good without clustering, and also with two and with three clusters, but four clusters gave me what looks like a much lower-density heatmap (in terms of lines).

I was outputting .pdf files and started thinking this might have something to do with how the program plots PDFs. I therefore ran the exact same script with .png output, and it gave the denser heatmap (i.e. what appears to be the same total line density as the original heatmap and the 2-cluster heatmap). But increasing to 6 clusters gave me a coarser heatmap again, and then going back to 4 clusters, still with .png, gave me the less dense heatmap again. The only things I've changed in the script below between runs are the --kmeans cluster number and the file name.


Python: Python/3.9.6-GCCcore-11.2.0
deepTools: deepTools/3.5.1-foss-2021b

This is my code (with the file names changed), which gave different results when run twice on the same matrix:

plotHeatmap     -m $outPath/Matrix/Matrix_TSS_2Kb \
                -out $outPath/Plots/TSS_2Kb_4Clusters.png \
                --colorMap RdBu \
                --whatToShow 'heatmap and colorbar' \
                --zMin -3 --zMax 3 \
                --kmeans 4
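
A possible way to check whether regions are actually being dropped, or only drawn differently, would be to also write the clustered regions to a file with plotHeatmap's `--outFileSortedRegions` option and count the rows per cluster for each run. The sketch below is only an illustration: the helper name `count_regions_per_group` and the file names in the comments are hypothetical, and it assumes that header lines in the BED output start with `#` and that the last tab-separated column holds the cluster label.

```shell
# Hypothetical helper: count regions per cluster in a sorted-regions BED file.
# Assumptions: comment/header lines start with '#', and the last
# tab-separated column of each data line is the cluster label.
count_regions_per_group() {
    awk -F'\t' '!/^#/ && NF > 0 { n[$NF]++ }
                END { for (g in n) print g "\t" n[g] }' "$1" | sort
}

# Usage (placeholder file names):
#   plotHeatmap -m matrix.gz -out run1.png --kmeans 4 --outFileSortedRegions run1.bed
#   plotHeatmap -m matrix.gz -out run2.png --kmeans 4 --outFileSortedRegions run2.bed
#   count_regions_per_group run1.bed
#   count_regions_per_group run2.bed
```

If the per-cluster counts add up to the same total in both runs, the difference between the plots would more likely come from how the image is rendered than from rows being lost.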

This is the top of the clustered heatmaps that I get; the plot on the right-hand side seems to be a lot sparser in terms of number of data points:
[image]

Has anyone had a similar problem?
