Updating a bit the documentation and small bug correction.
jracle85 committed Jul 12, 2023
1 parent f467f77 commit 50a4f40
Showing 12 changed files with 234 additions and 128 deletions.
54 changes: 27 additions & 27 deletions DESCRIPTION
@@ -1,27 +1,27 @@
Package: EPIC
Type: Package
Title: Estimate the Proportion of Immune and Cancer cells
Version: 1.1.6
Authors@R: as.person(c(
"Julien Racle <[email protected]> [aut, cre]",
"David Gfeller <[email protected]> [aut]"
))
Description: Package implementing EPIC method to estimate the proportion of
immune, stromal, endothelial and cancer or other cells from bulk gene
expression data.
It is based on reference gene expression profiles for the main non-malignant
cell types and it predicts the proportion of these cells and of the
remaining "other cells" (that are mostly cancer cells) for which no
reference profile is given.
Depends:
R (>= 3.2.0)
License: file LICENSE
LazyData: TRUE
RoxygenNote: 7.2.1
Suggests:
testthat,
knitr,
rmarkdown
Imports:
stats
VignetteBuilder: knitr
Package: EPIC
Type: Package
Title: Estimate the Proportion of Immune and Cancer cells
Version: 1.1.7
Authors@R: as.person(c(
"Julien Racle <[email protected]> [aut, cre]",
"David Gfeller <[email protected]> [aut]"
))
Description: Package implementing EPIC method to estimate the proportion of
immune, stromal, endothelial and cancer or other cells from bulk gene
expression data.
It is based on reference gene expression profiles for the main non-malignant
cell types and it predicts the proportion of these cells and of the
remaining "other cells" (that are mostly cancer cells) for which no
reference profile is given.
Depends:
R (>= 3.2.0)
License: file LICENSE
LazyData: TRUE
RoxygenNote: 7.2.1
Suggests:
testthat,
knitr,
rmarkdown
Imports:
stats
VignetteBuilder: knitr
6 changes: 3 additions & 3 deletions NAMESPACE
@@ -1,3 +1,3 @@
# Generated by roxygen2: do not edit by hand

export(EPIC)
# Generated by roxygen2: do not edit by hand
export(EPIC)
10 changes: 10 additions & 0 deletions NEWS
@@ -1,3 +1,13 @@
Version 1.1.7
------------------------------------------------------------------------
* Small changes in the documentation (in particular, explaining in the
README's FAQ section when to use the *mRNAProportions* or *cellFractions*).
* Removed the warning message about unknown *mRNA_cell* values that was issued
in nearly all runs (the corresponding caution is now given directly in the FAQ
section).
* Fixed a bug that occurred when there were duplicated *empty* gene names
(i.e., genes named simply "").

Version 1.1.6
------------------------------------------------------------------------
* Changed person of contact for commercial licenses to Nadette Bulgin.
4 changes: 2 additions & 2 deletions R/EPIC_descr.R
@@ -5,8 +5,8 @@
#' estimate the proportion of immune, stromal, endothelial and cancer or other
#' cells from bulk gene expression data.
#'
#' See the package \link[=../doc/info.html]{vignette} and function definitions
#' below.
#' See the package vignette (command in the R console: \emph{vignette("EPIC")} )
#' and function definitions below.
#'
#' @section EPIC functions:
#' \code{\link{EPIC}} is the main function to call to estimate the
36 changes: 22 additions & 14 deletions R/EPIC_fun.R
@@ -112,7 +112,11 @@
#' @return A list of 3 matrices:\describe{
#' \item{\code{mRNAProportions}}{(\code{nSamples} x (\code{nCellTypes+1})) the
#' proportion of mRNA coming from all cell types with a ref profile + the
#' uncharacterized other cell.}
#' uncharacterized other cell. Please note that if working with reconstructed
#' in silico bulk samples built for example from single-cell RNA-seq data,
#' then you should compare the 'true' proportions against these
#' 'mRNAProportions', while if working with true bulk samples, then you should
#' compare the cell proportions against the 'cellFractions'.}
#' \item{\code{cellFractions}}{(\code{nSamples} x (\code{nCellTypes+1})) this
#' gives the proportion of cells from each cell type after accounting for
#' the mRNA / cell value.}
@@ -392,18 +396,20 @@ EPIC <- function(bulk, reference=NULL, mRNA_cell=NULL, mRNA_cell_sub=NULL,
  if (anyNA(tInds)){
  defaultInd <- match("default", names(mRNA_cell))
  if (is.na(defaultInd)){
- tStr <- paste(" and no default value is given for this mRNA per cell,",
- "so we cannot estimate the cellFractions, only",
- "the mRNA proportions")
+ warning("mRNA_cell value unknown for some cell types: ",
+ paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
+ " and no default value is given for the mRNA per cell, so we cannot ",
+ "estimate the cellFractions, only the mRNA proportions")
  } else {
- tStr <- paste(" - using the default value of", mRNA_cell[defaultInd],
- "for these but this might bias the true cell proportions from",
- "all cell types.")
+ # warning("mRNA_cell value unknown for some cell types: ",
+ # paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
+ # " - using the default value of", mRNA_cell[defaultInd], " for these but ",
+ # "this might bias the true cell proportions from all cell types.")
+ # Not indicating this warning message as it comes about always if the
+ # user doesn't define additional mRNA_cell values by himself. Instead,
+ # I've indicated this warning in the documentation directly.
+ tInds[is.na(tInds)] <- defaultInd
  }
- warning("mRNA_cell value unknown for some cell types: ",
- paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
- tStr)
- tInds[is.na(tInds)] <- defaultInd
  }
  cellFractions <- t( t(mRNAProportions) / mRNA_cell[tInds])
  cellFractions <- cellFractions / rowSums(cellFractions, na.rm=FALSE)
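The normalization at the end of this hunk can be illustrated with a small standalone sketch. The proportions, the mRNA-per-cell values and the cell-type names below are made-up for illustration; they are not EPIC's reference values, and the sketch only mirrors the two `cellFractions` lines and the `default` fallback shown above.

```r
# Toy mRNA proportions for 2 samples x 3 cell types (made-up values).
mRNAProportions <- matrix(c(0.2, 0.3, 0.5,
                            0.1, 0.6, 0.3),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(c("s1", "s2"),
                                          c("Bcells", "CAFs", "otherCells")))

# Hypothetical mRNA-per-cell values; "default" is the fallback used for
# cell types without a known value (here: CAFs).
mRNA_cell <- c(Bcells = 0.5, otherCells = 0.4, default = 0.4)

# Look up each cell type; unknown ones fall back to the "default" entry.
tInds <- match(colnames(mRNAProportions), names(mRNA_cell))
tInds[is.na(tInds)] <- match("default", names(mRNA_cell))

# Divide each column by its mRNA-per-cell value, then renormalize each
# sample (row) so that its cell fractions sum to 1.
cellFractions <- t(t(mRNAProportions) / mRNA_cell[tInds])
cellFractions <- cellFractions / rowSums(cellFractions)
```

Because the default value is applied silently after this commit, any bias it introduces for cell types such as CAFs shows up only in `cellFractions`, not in `mRNAProportions`.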
@@ -465,15 +471,17 @@ merge_duplicates <- function(mat, warn=TRUE, in_type=NULL){
  if (warn){
  warning("There are ", length(dupl_genes), " duplicated gene names",
  ifelse(!is.null(in_type), paste(" in the", in_type), ""),
- ". We'll use the median value for each of these cases.")
+ " (e.g., ", paste0("'", dupl_genes[1:(min(5, length(dupl_genes)))],
+ "'", collapse=", "), "). We'll use the median value for ",
+ "each of these cases.")
  }
  mat_dupl <- mat[rownames(mat) %in% dupl_genes,,drop=F]
  mat_dupl_names <- rownames(mat_dupl)
  mat <- mat[!dupl,,drop=F]
  # First put the dupl cases in a separate matrix and keep only the unique
  # gene names in the mat matrix.
- mat[dupl_genes,] <- t(sapply(dupl_genes, FUN=function(cgene)
- apply(mat_dupl[mat_dupl_names == cgene,,drop=F], MARGIN=2, FUN=median)))
+ mat[match(dupl_genes, rownames(mat)),] <- t(sapply(dupl_genes, FUN=function(cgene)
+ apply(mat_dupl[mat_dupl_names == cgene,,drop=F], MARGIN=2, FUN=stats::median)))
  }
  return(mat)
  }
}
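The switch from name-based indexing to `match()` in this hunk is what handles empty gene names. A minimal sketch of the underlying R behaviour, using a made-up matrix rather than EPIC data: character subscripts are not expected to match an empty name `""`, whereas `match()` compares the strings directly and returns a row position.

```r
# Small matrix with one row named "" (an empty gene name); values are made-up.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("geneA", "", "geneB"), c("s1", "s2")))

# Indexing by the empty name is expected to fail, since "" as a character
# subscript does not match any row name; the error is caught here.
by_name <- tryCatch(mat["", ], error = function(e) "error")

# match() finds the position of the empty name, so positional indexing works.
idx <- match("", rownames(mat))
by_pos <- mat[idx, ]
```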
25 changes: 20 additions & 5 deletions README.Rmd
@@ -84,6 +84,21 @@ and David Gfeller ([[email protected]](mailto:[email protected])).


## FAQ
##### Which proportions returned by EPIC should I use?
* EPIC returns two sets of proportions: *mRNAProportions* and *cellFractions*.
The second represents the true proportion of cells coming from the different
cell types once differences in mRNA content per cell between cell types are
taken into account. So in principle, it is best to use the *cellFractions*.

However, please note that when the goal is to benchmark EPIC predictions and
the 'bulk samples' are in fact in silico samples reconstructed, for example,
from single-cell RNA-seq data, it is usually better to compare the 'true'
proportions against the *mRNAProportions* from EPIC. Indeed, when building such
in silico samples, the fact that different cell types express different
amounts of mRNA is usually not taken into account. On the other hand, when
working with true bulk samples, you should compare the true cell proportions
(measured, e.g., by FACS) against the *cellFractions*.
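
As a rough sketch of the rule above: `pick_epic_output` is a hypothetical helper, not part of EPIC, and `epic_out` is a stand-in with made-up numbers for the list that `EPIC()` returns (only the two element names are taken from the package documentation).

```r
# Hypothetical helper (not part of EPIC): choose which EPIC output to compare
# against known proportions, depending on how the bulk samples were obtained.
pick_epic_output <- function(epic_out, in_silico) {
  if (in_silico) epic_out$mRNAProportions else epic_out$cellFractions
}

# Stand-in for an EPIC result, with made-up numbers.
cell_types <- c("Bcells", "otherCells")
epic_out <- list(
  mRNAProportions = matrix(c(0.2, 0.8), nrow = 1,
                           dimnames = list("s1", cell_types)),
  cellFractions   = matrix(c(0.3, 0.7), nrow = 1,
                           dimnames = list("s1", cell_types))
)

# In silico bulk built from single-cell data: compare against mRNAProportions.
pred_in_silico <- pick_epic_output(epic_out, in_silico = TRUE)
# True bulk with, e.g., FACS measurements: compare against cellFractions.
pred_true_bulk <- pick_epic_output(epic_out, in_silico = FALSE)
```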

##### What do the "*other cells*" represent?
* EPIC predicts the proportions of the various cell types for which we have
gene expression reference profiles (and corresponding gene signatures). But,
@@ -99,7 +114,7 @@ epithelial cells for example.
Please make sure that your bulk data is in the form of a matrix (and also
your reference gene expression profiles if using custom ones).

##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?
* As described in our manuscript, EPIC first estimates the proportion of mRNA
per cell type in the bulk and then it uses the fact that some cell types have
more mRNA copies per cell than other to normalize this and obtain an estimate of
@@ -108,10 +123,10 @@ if you need the one or the other). For this normalization we had either measured
the amount of mRNA per cell or found it in the literature (fig. 1 – fig.
supplement 2 of our paper). However we don’t currently have such values for the
endothelial cells and CAFs. Therefore for these two cell types, we use an average
value, which might not reflect their true value and this is the reason why we
output this message. If you have some values for these mRNA/cell abundances, you
can also add them into EPIC, with help of the parameter "*mRNA_cell*" or
*mRNA_cell_sub*” (and that would be great to share these values).
value, which might not reflect their true value and this could slightly bias the
predictions, especially for these cell types. If you have values for these
mRNA/cell abundances, you can also add them into EPIC with the help of the
parameter "*mRNA_cell*" or "*mRNA_cell_sub*" (and it would be great if you
shared these values).

If the mRNA proportions of these cell types are low, then even if you don't
correct the results with their true mRNA/cell abundances, it would not really
30 changes: 24 additions & 6 deletions README.md
@@ -85,6 +85,24 @@ Julien Racle (<[email protected]>), and David Gfeller

## FAQ

##### Which proportions returned by EPIC should I use?

- EPIC returns two sets of proportions: *mRNAProportions* and
*cellFractions*. The second represents the true proportion of cells
coming from the different cell types once differences in mRNA content
per cell between cell types are taken into account. So in principle,
it is best to use the *cellFractions*.

However, please note that when the goal is to benchmark EPIC
predictions and the ‘bulk samples’ are in fact in silico samples
reconstructed, for example, from single-cell RNA-seq data, it is
usually better to compare the ‘true’ proportions against the
*mRNAProportions* from EPIC. Indeed, when building such in silico
samples, the fact that different cell types express different amounts
of mRNA is usually not taken into account. On the other hand, when
working with true bulk samples, you should compare the true cell
proportions (measured, e.g., by FACS) against the *cellFractions*.

##### What do the “*other cells*” represent?

- EPIC predicts the proportions of the various cell types for which we
@@ -104,7 +122,7 @@ Julien Racle (<[email protected]>), and David Gfeller
matrix (and also your reference gene expression profiles if using
custom ones).

##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?

- As described in our manuscript, EPIC first estimates the proportion of
mRNA per cell type in the bulk and then it uses the fact that some
@@ -115,11 +133,11 @@ Julien Racle (<[email protected]>), and David Gfeller
mRNA per cell or found it in the literature (fig. 1 – fig. supplement
2 of our paper). However we don’t currently have such values for the
endothelial cells and CAFs. Therefore for these two cell types, we use
an average value, which might not reflect their true value and this is
the reason why we output this message. If you have some values for
these mRNA/cell abundances, you can also add them into EPIC, with help
of the parameter “*mRNA_cell*” or*mRNA_cell_sub*” (and that would be
great to share these values).
an average value, which might not reflect their true value and this
could slightly bias the predictions, especially for these cell types.
If you have values for these mRNA/cell abundances, you can also add
them into EPIC with the help of the parameter “*mRNA_cell*” or
“*mRNA_cell_sub*” (and it would be great if you shared these values).

If the mRNA proportions of these cell types are low, then even if you
don’t correct the results with their true mRNA/cell abundances, it
25 changes: 20 additions & 5 deletions inst/doc/EPIC.Rmd
@@ -80,6 +80,21 @@ and David Gfeller ([[email protected]](mailto:[email protected])).


## FAQ
##### Which proportions returned by EPIC should I use?
* EPIC returns two sets of proportions: *mRNAProportions* and *cellFractions*.
The second represents the true proportion of cells coming from the different
cell types once differences in mRNA content per cell between cell types are
taken into account. So in principle, it is best to use the *cellFractions*.

However, please note that when the goal is to benchmark EPIC predictions and
the 'bulk samples' are in fact in silico samples reconstructed, for example,
from single-cell RNA-seq data, it is usually better to compare the 'true'
proportions against the *mRNAProportions* from EPIC. Indeed, when building such
in silico samples, the fact that different cell types express different
amounts of mRNA is usually not taken into account. On the other hand, when
working with true bulk samples, you should compare the true cell proportions
(measured, e.g., by FACS) against the *cellFractions*.

##### What do the "*other cells*" represent?
* EPIC predicts the proportions of the various cell types for which we have
gene expression reference profiles (and corresponding gene signatures). But,
@@ -95,7 +110,7 @@ epithelial cells for example.
Please make sure that your bulk data is in the form of a matrix (and also
your reference gene expression profiles if using custom ones).

##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?
* As described in our manuscript, EPIC first estimates the proportion of mRNA
per cell type in the bulk and then it uses the fact that some cell types have
more mRNA copies per cell than other to normalize this and obtain an estimate of
@@ -104,10 +119,10 @@ if you need the one or the other). For this normalization we had either measured
the amount of mRNA per cell or found it in the literature (fig. 1 – fig.
supplement 2 of our paper). However we don’t currently have such values for the
endothelial cells and CAFs. Therefore for these two cell types, we use an average
value, which might not reflect their true value and this is the reason why we
output this message. If you have some values for these mRNA/cell abundances, you
can also add them into EPIC, with help of the parameter "*mRNA_cell*" or
*mRNA_cell_sub*” (and that would be great to share these values).
value, which might not reflect their true value and this could slightly bias the
predictions, especially for these cell types. If you have values for these
mRNA/cell abundances, you can also add them into EPIC with the help of the
parameter "*mRNA_cell*" or "*mRNA_cell_sub*" (and it would be great if you
shared these values).

If the mRNA proportions of these cell types are low, then even if you don't
correct the results with their true mRNA/cell abundances, it would not really
41 changes: 31 additions & 10 deletions inst/doc/EPIC.html
@@ -12,7 +12,7 @@

<meta name="author" content="Julien Racle and David Gfeller" />

<meta name="date" content="2023-03-13" />
<meta name="date" content="2023-07-12" />

<title>EPIC package</title>

@@ -340,7 +340,7 @@

<h1 class="title toc-ignore">EPIC package</h1>
<h4 class="author">Julien Racle and David Gfeller</h4>
<h4 class="date">2023-03-13</h4>
<h4 class="date">2023-07-12</h4>



@@ -409,6 +409,26 @@ <h2>Contact information</h2>
</div>
<div id="faq" class="section level2">
<h2>FAQ</h2>
<div id="which-proportions-returned-by-epic-should-i-use" class="section level5">
<h5>Which proportions returned by EPIC should I use?</h5>
<ul>
<li><p>EPIC returns two sets of proportions: <em>mRNAProportions</em>
and <em>cellFractions</em>. The second represents the true proportion
of cells coming from the different cell types once differences in
mRNA content per cell between cell types are taken into account. So in
principle, it is best to use the <em>cellFractions</em>.</p>
<p>However, please note that when the goal is to benchmark EPIC
predictions and the ‘bulk samples’ are in fact in silico samples
reconstructed, for example, from single-cell RNA-seq data, it is
usually better to compare the ‘true’ proportions against the
<em>mRNAProportions</em> from EPIC. Indeed, when building such in silico
samples, the fact that different cell types express different amounts of
mRNA is usually not taken into account. On the other hand, when working
with true bulk samples, you should compare the true cell
proportions (measured, e.g., by FACS) against the
<em>cellFractions</em>.</p></li>
</ul>
</div>
<div id="what-do-the-other-cells-represent" class="section level5">
<h5>What do the “<em>other cells</em>” represent?</h5>
<ul>
@@ -433,9 +453,9 @@ <h5>I receive an error message “<em>attempt to set ‘colnames’ on an
ones).</li>
</ul>
</div>
<div id="what-is-the-meaning-of-the-warning-message-telling-that-some-mrna_cell-values-are-unknown" class="section level5">
<h5>What is the meaning of the warning message telling that some
mRNA_cell values are unknown?</h5>
<div id="is-there-some-caution-to-consider-about-the-cellfractions-and-mrna_cell-values" class="section level5">
<h5>Is there some caution to consider about the <em>cellFractions</em>
and <em>mRNA_cell</em> values?</h5>
<ul>
<li><p>As described in our manuscript, EPIC first estimates the
proportion of mRNA per cell type in the bulk and then it uses the fact
@@ -446,11 +466,12 @@ <h5>What is the meaning of the warning message telling that some
mRNA per cell or found it in the literature (fig. 1 – fig. supplement 2
of our paper). However we don’t currently have such values for the
endothelial cells and CAFs. Therefore for these two cell types, we use
an average value, which might not reflect their true value and this is
the reason why we output this message. If you have some values for these
mRNA/cell abundances, you can also add them into EPIC, with help of the
parameter “<em>mRNA_cell</em>” or “<em>mRNA_cell_sub</em>” (and that
would be great to share these values).</p>
an average value, which might not reflect their true value and this
could slightly bias the predictions, especially for these cell types. If
you have values for these mRNA/cell abundances, you can also add
them into EPIC with the help of the parameter “<em>mRNA_cell</em>” or
“<em>mRNA_cell_sub</em>” (and it would be great if you shared these
values).</p>
<p>If the mRNA proportions of these cell types are low, then even if you
don’t correct the results with their true mRNA/cell abundances, it would
not really have a big impact on the results. On the other side, if there