Array in 1-dimension error #28

Closed
preetida opened this issue Jul 6, 2020 · 2 comments
Labels
question Further information is requested

Comments

preetida commented Jul 6, 2020

Hi,
I am getting this error while adding a new column for MT genes.
What am I doing wrong?

adata = sc.read_10x_h5(dir_path + 'filtered_feature_bc_matrix.h5')
adata.var_names_make_unique() 
print(adata.X)
print(adata.obs['sample'].value_counts())
print(f'Number of cells before filter: {adata.n_obs}')

# Quality control - calculate QC covariates
adata.obs['n_counts'] = adata.X.sum(1)
adata.obs['log_counts'] = np.log(adata.obs['n_counts'])
adata.obs['n_genes'] = (adata.X > 0).sum(1)

mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']
AV_24    571
Name: sample, dtype: int64
Number of cells before filter: 571
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-26-5e7c1a502972> in <module>
     11 
     12 mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
---> 13 adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    634         # for binary ops, use our custom dunder methods
    635         result = ops.maybe_dispatch_ufunc_to_dunder_op(
--> 636             self, ufunc, method, *inputs, **kwargs
    637         )
    638         if result is not NotImplemented:

pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/common.py in new_method(self, other)
     62         other = item_from_zerodim(other)
     63 
---> 64         return method(self, other)
     65 
     66     return new_method

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    503         result = arithmetic_op(lvalues, rvalues, op, str_rep)
    504 
--> 505         return _construct_result(left, result, index=left.index, name=res_name)
    506 
    507     wrapper.__name__ = op_name

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
    476     # We do not pass dtype to ensure that the Series constructor
    477     #  does inference in the case where `result` has object-dtype.
--> 478     out = left._constructor(result, index=index)
    479     out = out.__finalize__(left)
    480 

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
flying-sheep added the bug (Something isn't working) label on Jan 22, 2024
Echo-Nie commented Mar 10, 2025

import scanpy as sc
import numpy as np
from anndata import AnnData

# Read single-cell data
adata = sc.read_10x_h5('./data/filtered_feature_bc_matrix.h5')

# Debug variable names
var = adata.var_names

# Ensure variable names are unique
adata.var_names_make_unique()

# Debug variable names after making them unique
var = adata.var_names

# Print the data matrix
print(adata.X)

# Check and handle the 'sample' column
if 'sample' in adata.obs.columns:
    print(adata.obs['sample'].value_counts())
else:
    print("Warning: 'sample' column not found in adata.obs")
    adata.obs['sample'] = 'default_sample'
    print(adata.obs['sample'].value_counts())

# Print the number of cells before filtering
print(f'Number of cells before filter: {adata.n_obs}')

# Quality control - Calculate QC metrics
adata.obs['n_counts'] = adata.X.sum(1)  # Total UMI counts per cell
adata.obs['log_counts'] = np.log(adata.obs['n_counts'])  # Log-transformed UMI counts
adata.obs['n_genes'] = (adata.X > 0).sum(1)  # Number of detected genes per cell

# Calculate mitochondrial gene fraction
mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]  # Mask for mitochondrial genes
mt_counts = adata.X[:, mt_gene_mask].sum(1)  # Total mitochondrial gene counts per cell
total_counts = adata.obs['n_counts']
adata.obs['mt_frac'] = mt_counts.A.flatten() / total_counts  # Fraction of mitochondrial gene counts

# Visualize QC metrics
sc.pl.violin(adata, ['n_counts', 'n_genes', 'mt_frac'], groupby='sample')

# Filter cells based on QC metrics
adata = adata[adata.obs['n_counts'] < 20000, :]  # Filter out cells with high UMI counts
adata = adata[adata.obs['mt_frac'] < 0.2, :]  # Filter out cells with high mitochondrial gene fraction

print(f'Number of cells after filter: {adata.n_obs}')

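Because adata.X is a sparse matrix, slicing it and calling .sum(1) returns a 2D numpy matrix of shape (n_cells, 1) rather than a 1D array. Dividing that 2D result by the pandas Series adata.obs['n_counts'] is what raises "Data must be 1-dimensional", so the sum has to be flattened to 1D first.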
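
The shape mismatch described above can be reproduced without any 10x data. The following is a minimal sketch on a made-up 3-cell by 4-gene matrix; the gene names, cell labels, and the `obs` DataFrame standing in for `adata.obs` are all hypothetical. It flattens with `np.asarray(...).ravel()`, which is one option among several and should work whether `.sum(1)` returns a 2D matrix or an already 1D array.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy counts: 3 cells x 4 genes, stored like adata.X (CSR, float32)
X = sparse.csr_matrix(
    np.array([[1, 0, 2, 0],
              [0, 3, 0, 1],
              [4, 0, 0, 5]], dtype=np.float32)
)
gene_names = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]  # made-up var_names
mt_mask = np.array([g.startswith("MT-") for g in gene_names])

# On a csr_matrix, a row sum is a 2D numpy.matrix of shape (n_cells, 1)
row_sums = X.sum(1)
print(type(row_sums).__name__, row_sums.shape)

# Flatten to 1D before handing the values to pandas
n_counts = np.asarray(X.sum(1)).ravel()
mt_counts = np.asarray(X[:, mt_mask].sum(1)).ravel()

obs = pd.DataFrame(index=["cell0", "cell1", "cell2"])  # stand-in for adata.obs
obs["n_counts"] = n_counts
obs["mt_frac"] = mt_counts / obs["n_counts"]
print(obs)
```

Flattening both sums, not only the mitochondrial one, keeps every QC column a plain 1D float array regardless of how a given pandas version treats 2D input.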

The printed info is as follows:

<Compressed Sparse Row sparse matrix of dtype 'float32'
	with 5959380 stored elements and shape (4226, 33538)>
  Coords	Values
  (0, 33508)	5.0
  (0, 33505)	8.0
  (0, 33504)	2.0
  (0, 33503)	8.0
  (0, 33502)	12.0
  (0, 33501)	10.0
  (0, 33499)	16.0
  (0, 33498)	8.0
  (0, 33497)	8.0
  (0, 33496)	11.0
  (0, 33494)	3.0
  (0, 33474)	1.0
  (0, 33376)	1.0
  (0, 33360)	1.0
  (0, 33254)	1.0
  (0, 33209)	1.0
  (0, 33157)	1.0
  (0, 33131)	1.0
  (0, 33098)	1.0
  (0, 33097)	1.0
  (0, 33078)	1.0
  (0, 32987)	2.0
  (0, 32910)	1.0
  (0, 32844)	1.0
  (0, 32808)	1.0
  :	:
  (4225, 472)	1.0
  (4225, 458)	1.0
  (4225, 449)	1.0
  (4225, 443)	2.0
  (4225, 439)	2.0
  (4225, 421)	1.0
  (4225, 412)	2.0
  (4225, 411)	1.0
  (4225, 410)	1.0
  (4225, 407)	1.0
  (4225, 396)	1.0
  (4225, 259)	1.0
  (4225, 226)	2.0
  (4225, 220)	1.0
  (4225, 219)	3.0
  (4225, 214)	1.0
  (4225, 201)	2.0
  (4225, 190)	2.0
  (4225, 172)	1.0
  (4225, 161)	1.0
  (4225, 152)	1.0
  (4225, 93)	1.0
  (4225, 86)	1.0
  (4225, 70)	1.0
  (4225, 32)	2.0
Warning: 'sample' column not found in adata.obs
sample
default_sample    4226
Name: count, dtype: int64
Number of cells before filter: 4226

Member

flying-sheep commented Mar 10, 2025

yeah, you can do .sum().toarray() to get a numpy array. (I think the .A alias of that method is going away, so using the method is better)

In the future:

  • scanpy will support cs{rc}_arrays, for which .sum() returns a 1D array (unlike cs{rc}_matrix, for which .sum() returns a 2D matrix)
  • we’ll publish fast-arrayutils, whose sum() function also returns a 1D ndarray
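
The difference between the two SciPy sparse containers mentioned in the list above can be checked directly. This is a small sketch on made-up data, assuming a SciPy version that ships `csr_array` (1.8 or later); it only illustrates SciPy's behaviour and says nothing about when scanpy will accept such arrays.

```python
import numpy as np
from scipy import sparse

# Hypothetical 2 x 3 count matrix
dense = np.array([[1, 0, 2],
                  [0, 3, 0]], dtype=np.float32)

as_matrix = sparse.csr_matrix(dense)  # older matrix-style container
as_array = sparse.csr_array(dense)    # newer array-style container

matrix_sum = as_matrix.sum(axis=1)  # 2D numpy.matrix, one column
array_sum = as_array.sum(axis=1)    # 1D numpy.ndarray

print(type(matrix_sum).__name__, matrix_sum.shape)
print(type(array_sum).__name__, array_sum.shape)
```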

flying-sheep added the question (Further information is requested) label and removed the bug (Something isn't working) label on Mar 10, 2025