Commit 7c74743

update
1 parent 8976049 commit 7c74743

6 files changed, +175 -19 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Authors@R:
 email = "[email protected]")
 )
 Depends: R (>= 4.2.0)
-Imports: scuttle, stringr, AnnotationDbi, org.Hs.eg.db, org.Mm.eg.db, stats, grDevices, graphics, speckle, ggplot2, SingleCellExperiment
+Imports: scuttle, stringr, AnnotationDbi, org.Hs.eg.db, org.Mm.eg.db, stats,SingleCellExperiment, speckle
 VignetteBuilder: knitr
 Suggests: knitr, rmarkdown, CellBench, scater, patchwork
 Description: The cellXY package contains functions for predicting sex labels for single cell RNA sequencing data.

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 export(classifySex)
 export(findMfDoublet)
 export(preprocess)
+export(preprocessDb)
 importFrom(AnnotationDbi,select)
 importFrom(org.Hs.eg.db,org.Hs.eg.db)
 importFrom(org.Mm.eg.db,org.Mm.eg.db)

R/findMFDoublet.R

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ findMfDoublet<-function(x, genome=NULL, qc = FALSE)
     genome <- match.arg(genome,c("Hs","Mm"))
 
     # pre-process
-    processed.data<-preprocess(x, genome = genome, qc = FALSE)
+    processed.data<-preprocessDb(x, genome = genome, qc = FALSE)
 
     # the processed transposed count matrix
     tcm <-processed.data$tcm.final
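
Note on the call above: findMfDoublet passes both genome and qc explicitly, which matters because preprocessDb (added in this commit) declares its defaults as genome=genome and qc=qc. In R, a default that refers to its own argument name cannot be evaluated, so such a function only works when the caller supplies those arguments. The sketch below illustrates this with a hypothetical stand-in function f; it is not code from the package.

```r
# Hypothetical stand-in with the same self-referential defaults as
# preprocessDb(x, genome=genome, qc=qc); not part of cellXY.
f <- function(x, genome = genome, qc = qc) {
    list(genome = genome, qc = qc)
}

# Supplying both arguments explicitly works, as findMfDoublet does
ok <- f(1, genome = "Hs", qc = FALSE)

# Omitting them forces the default, which refers to itself and errors;
# the error message is captured here instead of stopping the script
err <- tryCatch(f(1), error = function(e) conditionMessage(e))
print(err)
```

If a default of qc = TRUE is intended, as the roxygen text for preprocessDb suggests, writing it out in the signature would be one way to make the documentation and the behaviour agree.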

R/preprocessDb.R

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+
+#' Pre-processing function for sex classification
+#'
+#' The purpose of this function is to process a single cell counts matrix into
+#' the appropriate format for the \code{classifySex} function.
+#'
+#' This function will filter out cells that are unable to be classified due to
+#' zero counts on *XIST/Xist* and all of the Y chromosome genes. If
+#' \code{qc=TRUE} additional cells are removed as identified by the
+#' \code{perCellQCMetrics} and \code{quickPerCellQC} functions from the
+#' \code{scuttle} package. The resulting counts matrix is then log-normalised
+#' and scaled.
+#'
+#' @param x the counts matrix, rows are genes and columns are cells. Row names
+#' must be gene symbols.
+#' @param genome the genome the data arises from. Current options are
+#' human: genome = "Hs" or mouse: genome = "Mm".
+#' @param qc logical, indicates whether to perform additional quality control
+#' on the cells. qc = TRUE will predict cells that pass quality control only
+#' and the filtered cells will not be classified. qc = FALSE will predict
+#' every cell except the cells with zero counts on *XIST/Xist* and the sum
+#' of the Y genes. Default is TRUE.
+#'
+#' @return outputs a list object with the following components
+#' \item{tcm.final }{A transposed count matrix where rows are cells and columns
+#' are the features used for classification.}
+#' \item{data.df }{The normalised and scaled \code{tcm.final} matrix.}
+#' \item{discarded.cells }{Character vector of cell IDs for the cells that are
+#' discarded when \code{qc=TRUE}.}
+#' \item{zero.cells }{Character vector of cell IDs for the cells that can not
+#' be classified as male/female due to zero counts on *Xist* and all the
+#' Y chromosome genes.}
+#'
+#' @importFrom AnnotationDbi select
+#' @importFrom stringr str_to_title
+#' @importFrom scuttle perCellQCMetrics
+#' @importFrom scuttle quickPerCellQC
+#' @importFrom org.Hs.eg.db org.Hs.eg.db
+#' @importFrom org.Mm.eg.db org.Mm.eg.db
+#' @export preprocessDb
+#'
+
+preprocessDb<- function(x, genome=genome, qc=qc){
+
+    x <- as.matrix(x)
+    row.names(x)<- toupper(row.names(x))
+    # genes located in the X chromosome that have been reported to escape
+    # X-inactivation
+    # http://bioinf.wehi.edu.au/software/GenderGenes/index.html
+    Xgenes<- c("ARHGAP4","STS","ARSD", "ARSL", "AVPR2", "BRS3", "S100G",
+               "CHM", "CLCN4", "DDX3X","EIF1AX","EIF2S3", "GPM6B",
+               "GRPR", "HCFC1", "L1CAM", "MAOA", "MYCLP1", "NAP1L3",
+               "GPR143", "CDK16", "PLXNB3", "PRKX", "RBBP7", "RENBP",
+               "RPS4X", "TRAPPC2", "SH3BGRL", "TBL1X","UBA1", "KDM6A",
+               "XG", "XIST", "ZFX", "PUDP", "PNPLA4", "USP9X", "KDM5C",
+               "SMC1A", "NAA10", "OFD1", "IKBKG", "PIR", "INE2", "INE1",
+               "AP1S2", "GYG2", "MED14", "RAB9A", "ITM2A", "MORF4L2",
+               "CA5B", "SRPX2", "GEMIN8", "CTPS2", "CLTRN", "NLGN4X",
+               "DUSP21", "ALG13","SYAP1", "SYTL4", "FUNDC1", "GAB3",
+               "RIBC1", "FAM9C","CA5BP1")
+
+    # genes belonging to the male-specific region of chromosome Y (unique genes)
+    # http://bioinf.wehi.edu.au/software/GenderGenes/index.html
+    Ygenes<-c("AMELY", "DAZ1", "PRKY", "RBMY1A1", "RBMY1HP", "RPS4Y1", "SRY",
+              "TSPY1", "UTY", "ZFY","KDM5D", "USP9Y", "DDX3Y", "PRY", "XKRY",
+              "BPY2", "VCY", "CDY1", "EIF1AY", "TMSB4Y","CDY2A", "NLGN4Y",
+              "PCDH11Y", "HSFY1", "TGIF2LY", "TBL1Y", "RPS4Y2", "HSFY2",
+              "CDY2B", "TXLNGY","CDY1B", "DAZ3", "DAZ2", "DAZ4")
+
+    # build artificial genes
+    Xgene.set <-Xgenes[Xgenes %in% row.names(x)]
+    Ygene.set <-Ygenes[Ygenes %in% row.names(x)]
+    cm.new<-as.data.frame(matrix(rep(0, 3*ncol(x)), ncol = ncol(x),nrow = 3))
+    row.names(cm.new) <- c("XIST","superX","superY")
+    colnames(cm.new) <- colnames(x)
+    cm.new["XIST", ]<- x["XIST", ]
+    cm.new["superX", ] <-colSums(x[Xgene.set,])
+    cm.new["superY", ] <-colSums(x[Ygene.set,])
+
+    ############################################################################
+    # Pre-processing
+    # perform simple QC
+    # keep a copy of library size
+    discarded.cells <- NA
+    if (qc == TRUE){
+        #data.sce <-SingleCellExperiment(assays = list(counts = x))
+        qcstats <- scuttle::perCellQCMetrics(x,subsets=list(Mito=1:100))
+        qcfilter <- scuttle::quickPerCellQC(qcstats,
+                                            percent_subsets=c("subsets_Mito_percent"))
+        # save the discarded cells
+        discarded.cells <- colnames(x[,qcfilter$discard])
+
+        # cm.new only contains cells that pass the quality control
+        cm.new <-cm.new[,!qcfilter$discard]
+    }
+
+    tcm.final <- t(cm.new)
+    tcm.final <- as.data.frame(tcm.final)
+
+    # Do Not Classify
+    # zero.cells <- NA
+    # dnc <- tcm.final$superY==0 & tcm.final$superX==0
+    #
+    # if(any(dnc)==TRUE){
+    # zero.cells <- row.names(tcm.final)[dnc]
+    # message(length(zero.cells), "cell/s are unable to be classified
+    # due to an abundance of zeroes on X and Y chromosome genes\n")
+    # }
+    # tcm.final <- tcm.final[!dnc, ]
+    #
+    # cm.new <- cm.new[,!dnc]
+
+    cm.lib.size<- colSums(x[,colnames(cm.new)], na.rm=TRUE)
+    med.ls = median(cm.lib.size)
+
+
+    # log-normalisation performed for each cell
+    # scaling performed for each gene
+    normsca.cm <- data.frame(lognormCounts(cm.new, log = TRUE,
+                                           prior.count = 0.5,lib.size=cm.lib.size))
+    data.df <- t(normsca.cm)
+    data.df <- as.data.frame(data.df)
+    data.df$med.ls = cm.lib.size/med.ls
+    tcm.final$med.ls = cm.lib.size/med.ls
+
+    list(tcm.final=tcm.final, data.df=data.df, discarded.cells=discarded.cells)
+}
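
The "build artificial genes" step in preprocessDb collapses the X and Y gene lists into three features per cell (XIST, superX, superY) and records each cell's library size relative to the median. The following is a minimal base-R sketch of that idea on a made-up toy matrix. The gene lists are shortened, the counts are invented for illustration, and `drop = FALSE` is an addition not present in the commit (it keeps the subset a matrix when only one listed gene is found, which `colSums` requires).

```r
# Toy counts matrix: rows are genes, columns are cells (values are invented)
x <- matrix(c(5, 0, 2,    # XIST
              3, 1, 0,    # DDX3X
              0, 4, 1,    # RPS4Y1
              0, 2, 0,    # UTY
              9, 7, 8),   # ACTB (not on the X or Y lists)
            nrow = 5, byrow = TRUE,
            dimnames = list(c("XIST", "DDX3X", "RPS4Y1", "UTY", "ACTB"),
                            c("cell1", "cell2", "cell3")))

# Shortened versions of the gene lists in the diff (assumption: a subset only)
Xgenes <- c("XIST", "DDX3X", "KDM6A")
Ygenes <- c("RPS4Y1", "UTY", "DDX3Y")

# Keep only the listed genes that are present in the data
Xgene.set <- Xgenes[Xgenes %in% row.names(x)]
Ygene.set <- Ygenes[Ygenes %in% row.names(x)]

# Three features per cell: XIST alone, summed X genes, summed Y genes
cm.new <- rbind(XIST   = x["XIST", ],
                superX = colSums(x[Xgene.set, , drop = FALSE]),
                superY = colSums(x[Ygene.set, , drop = FALSE]))

# Library size of each cell relative to the median library size
cm.lib.size <- colSums(x)
med.ls <- cm.lib.size / median(cm.lib.size)

print(cm.new)
print(med.ls)
```

Note that XIST is itself on the X gene list, so its counts contribute to superX as well as to the separate XIST feature, which matches the lists shown in the diff.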

README.md

Lines changed: 1 addition & 17 deletions
@@ -7,15 +7,6 @@
 The cellXY package currently contains trained models to classify cells as male or
 female and to predict whether a cell is a male-female doublet or not.
 
-The propeller function performs statistical tests for differences in cell
-type composition in single cell data. In order to test for differences in cell
-type proportions between multiple experimental conditions at least one of the
-groups must have some form of biological replication (i.e. at least two
-samples). For a two group scenario, the absolute minimum sample size is thus
-three. Since there are many technical aspects which can affect cell type
-proportion estimates, having biological replication is essential for a
-meaningful analysis.
-
 The propeller function takes a SingleCellExperiment or Seurat object as input,
 extracts the relevant cell information, and tests whether the cell type
 proportions are statistically significantly different between experimental
@@ -53,7 +44,7 @@ devtools::install_github("phipsonlab/cellXY")
 
 ## Sex label prediction example
 
-This is a basic example which shows you how to obtain a sex label preidction for each cell.
+This is a basic example which shows you how to obtain a sex label prediction for each cell.
 
 ``` r
 library(speckle)
@@ -75,11 +66,4 @@ sex <- classifySex(counts, genome="Hs")
 table(sex$prediction)
 boxplot(counts["XIST",]~sex$prediction)
 ```
-Please note that this basic implementation is for when you are only modelling
-group information. When you have additional covariates that you would like to
-account for, please use the propeller.ttest() and propeller.anova() functions
-directly. Please read the vignette for examples on how to model a continuous
-variable, account for additional covariates and include a random effect in the
-analysis.
-
 

man/preprocessDb.Rd

Lines changed: 44 additions & 0 deletions
Some generated files are not rendered by default.