
leiden and umap not reproducible on different CPUs #2014

@grst

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

I noticed that running the same single-cell analysis on different nodes of our HPC produces different results.
Starting from the same anndata object with a precomputed X_scVI latent representation, the UMAP and leiden clustering look different.

On

  • Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz
  • AMD EPYC 7352 24-Core Processor
  • Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz

image

adata.obs["leiden"].value_counts()
0     4268
1     2132
2     1691
3     1662
4     1659
5     1563
...

On

  • Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz

image

0     3856
1     2168
2     2029
3     1659
4     1636
5     1536
...
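To make the discrepancy easier to read, the two listings above can be compared cluster by cluster. This is only a sketch: it uses the six cluster sizes shown (the remaining clusters are elided in the report), the names `counts_group_a`, `counts_group_b` and `size_differences` are made up for illustration, and it assumes nothing about whether cluster "0" contains the same cells in both runs.

```python
# Cluster sizes copied from the two value_counts() listings above
# (first six clusters only; the rest are elided in the report).
counts_group_a = {"0": 4268, "1": 2132, "2": 1691, "3": 1662, "4": 1659, "5": 1563}
counts_group_b = {"0": 3856, "1": 2168, "2": 2029, "3": 1659, "4": 1636, "5": 1536}


def size_differences(a, b):
    """Return b[k] - a[k] for every cluster label k present in a."""
    return {label: b[label] - a[label] for label in a}


diffs = size_differences(counts_group_a, counts_group_b)
for label, delta in diffs.items():
    print(f"cluster {label}: {delta:+d} cells")
```

Equal sizes would not prove that the clusterings match, but unequal sizes, as here, are enough to show that they differ.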

Minimal code sample (that we can copy&paste without having any data)

A git repository with example data, a notebook, and a Nextflow pipeline is available here:
https://github.com/grst/scanpy_reproducibility

A report of the analysis executed on four different CPU architectures is available here:
https://grst.github.io/scanpy_reproducibility/
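For a quick check without opening the repository, the steps described above could be sketched roughly as follows. This is not the notebook from the repository: the parameter choices (default settings, `seed=0`) and the helper names `label_fingerprint` and `run_and_fingerprint` are assumptions made for illustration. The idea is to run the same code on each node and compare the short digests of the leiden labels.

```python
import hashlib
import platform


def label_fingerprint(labels):
    """Return a short SHA-256 digest of a sequence of cluster labels."""
    joined = ",".join(str(label) for label in labels)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def run_and_fingerprint(adata, seed=0):
    """Recompute neighbors, UMAP and leiden on X_scVI, then fingerprint the labels.

    Assumes `adata.obsm["X_scVI"]` exists, as in the report above.
    """
    import scanpy as sc  # imported here so the helper above works without scanpy

    sc.pp.neighbors(adata, use_rep="X_scVI", random_state=seed)
    sc.tl.umap(adata, random_state=seed)
    sc.tl.leiden(adata, random_state=seed)
    return {
        "node": platform.node(),  # hostname of the machine that ran the analysis
        "leiden": label_fingerprint(adata.obs["leiden"]),
    }
```

If two nodes report different `leiden` digests for the same input and seed, the label vectors differ in at least one cell; the digest says nothing about how large the difference is.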

Versions

WARNING: If you miss a compact list, please try `print_header`!
-----
anndata     0.7.5
scanpy      1.6.0
sinfo       0.3.1
-----
PIL                 8.0.1
anndata             0.7.5
backcall            0.2.0
cairo               1.20.0
cffi                1.14.4
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
decorator           4.4.2
get_version         2.1
h5py                3.1.0
igraph              0.8.3
ipykernel           5.3.4
ipython_genutils    0.2.0
jedi                0.17.2
joblib              0.17.0
kiwisolver          1.3.1
legacy_api_wrap     0.0.0
leidenalg           0.8.3
llvmlite            0.35.0
matplotlib          3.3.3
mpl_toolkits        NA
natsort             7.1.0
numba               0.52.0
numexpr             2.7.1
numpy               1.19.4
packaging           20.7
pandas              1.1.4
parso               0.7.1
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.8
ptyprocess          0.6.0
pycparser           2.20
pygments            2.7.2
pyparsing           2.4.7
pytz                2020.4
scanpy              1.6.0
scipy               1.5.3
setuptools_scm      NA
sinfo               0.3.1
six                 1.15.0
sklearn             0.23.2
sphinxcontrib       NA
storemagic          NA
tables              3.6.1
texttable           1.6.3
tornado             6.1
traitlets           5.0.5
umap                0.4.6
wcwidth             0.2.5
yaml                5.3.1
zmq                 20.0.0
-----
IPython             7.19.0
jupyter_client      6.1.7
jupyter_core        4.7.0
-----
Python 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 19:31:52) [GCC 9.3.0]
Linux-3.10.0-1160.11.1.el7.x86_64-x86_64-with-glibc2.10
64 logical CPU cores, x86_64
-----
Session information updated at 2021-10-15 09:58
