Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

names() does not include the description #273

Open
SamGG opened this issue Jan 20, 2025 · 6 comments
Open

names() does not include the description #273

SamGG opened this issue Jan 20, 2025 · 6 comments

Comments

@SamGG
Copy link
Contributor

SamGG commented Jan 20, 2025

Hi guys,
I noticed that the documentation of the names() function/method does not match its result as the description is not included.
Best,
Samuel

library(flowCore)
data("GvHD")
# names Extract pretty formated names of the parameters including parameter descriptions.
names(GvHD[[1]])
#> [1] "<FSC-H> FSC-H" "<SSC-H> SSC-H" "<FL1-H> FL1-H" "<FL2-H> FL2-H"
#> [5] "<FL3-H> FL3-H" "<FL2-A> FL2-A" "<FL4-H> FL4-H" "<Time> Time"
pData(parameters(GvHD[[1]]))
#>      name              desc range minRange maxRange
#> $P1 FSC-H        FSC-Height  1024        0     1023
#> $P2 SSC-H        SSC-Height  1024        0     1023
#> $P3 FL1-H         CD15 FITC  1024        1    10000
#> $P4 FL2-H           CD45 PE  1024        1    10000
#> $P5 FL3-H        CD14 PerCP  1024        1    10000
#> $P6 FL2-A              <NA>  1024        0     1023
#> $P7 FL4-H          CD33 APC  1024        1    10000
#> $P8  Time Time (51.20 sec.)  1024        0     1023
sessionInfo()
#> R version 4.4.2 (2024-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=French_France.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] flowCore_2.18.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37       fastmap_1.2.0       xfun_0.50          
#>  [4] matrixStats_1.5.0   glue_1.8.0          RProtoBufLib_2.18.0
#>  [7] cytolib_2.18.1      knitr_1.49          BiocGenerics_0.52.0
#> [10] Biobase_2.66.0      htmltools_0.5.8.1   rmarkdown_2.29     
#> [13] stats4_4.4.2        lifecycle_1.0.4     cli_3.6.3          
#> [16] reprex_2.1.1        withr_3.0.2         compiler_4.4.2     
#> [19] rstudioapi_0.17.1   tools_4.4.2         evaluate_1.0.3     
#> [22] yaml_2.3.10         S4Vectors_0.44.0    rlang_1.1.4        
#> [25] fs_1.6.5

Created on 2025-01-20 with reprex v2.1.1

@mikejiang
Copy link
Member

parameter descriptions is referring to "desc" column, which is marker/protein name

@SamGG
Copy link
Contributor Author

SamGG commented Jan 21, 2025

Hi Mike.
I don't get your point. My understanding is that the names() function/method should report the association of the name column and the desc column. IIRC, this was case at one moment, but not any more. Instead of " FSC-H", it should be " FSC-Height". Otherwise, I don't see the need of this function.
Best.

@SamGG
Copy link
Contributor Author

SamGG commented Jan 24, 2025

names() aims to combine the name and description columns as it is stated in its heading comment and in the doc. It is even clearly described in the vignette that never changed. In its code, names() combines the results of colnames() and featureNames().

colnames() always reported the same values, ie the name column.

featureNames() reports values from the name column since Bioc 2.2, but not before. In Bioc 2.1, names() reports the desc column. Currently, colnames() and featureNames() are said synonyms in the doc, which they are. Functions that perform the same thing should sound suspect IMHO and are misleading package users.

names() changes in Bioc 2.2 (2007-2008), and I don't know why Nolwenn did it. The current doc and vignette are still misleading in some parts since and did mislead me a few times.

names() does not fill the original goal which is to get a string combining name ($PnN) and description ($PnS, if present).

As names() currently combines the same information twice, which sounds useless IMO, could we revert to something more useful and centralized in flowCore?

I mean, either changing featureNames() to report desc column, or changing names() to really combine name and desc column. I think the latter is safer. It implies changing line 41 in names-methods.R IIRC.

If none of these changes are OK, please, update the documentation.

Thanks guys for reading.
@mikejiang @gfinak @jspidlen @DillonHammill

## ==========================================================================
## This return a pretified version of the parameter names, including the
## parameter description if present
## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#' @export
setMethod("names",
signature=signature(x="flowFrame"),
definition=function(x)
{
cn <- colnames(x)
fn <- featureNames(x)
if(length(fn) == length(cn)) {
cn <- paste("<", cn, ">", sep="")
for(i in seq(along=fn)) {
if(!is.na(fn[i]) && fn[i]!="")
cn[i] <- paste(cn[i],fn[i])
}
}
cn
})

## ==========================================================================
## accessor method for feature names
## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#' @export
setMethod("featureNames",
signature=signature(object="flowFrame"),
definition=function(object)
object@parameters$name
)

flowCore/R/AllClasses.R

Lines 278 to 280 in 152b8e4

#' \item{featureNames, colnames, colnames<-}{. \code{colnames} and
#' \code{featureNames} are synonyms, they extract parameter names (i.e., the
#' colnames of the data matrix) .

flowCore/R/AllClasses.R

Lines 294 to 295 in 152b8e4

#' \item{names}{Extract pretty formated names of the parameters
#' including parameter descriptions.

\Rfunction{colnames} methods. The \Rfunction{names} function
returns a concatenated version of \Rfunction{colnames} and
\Rfunction{featureNames} using a format similar to the one
employed by most flow cytometry analysis software. The
\Rfunction{colnames} method returns the detector names,
often named for the fluorochrome detected, while the
\Rfunction{featureNames} method returns the description
field of the parameters, which will typically be an
identifier for the antibody.

https://code.bioconductor.org/browse/flowCore/blob/RELEASE_2_1/R/flowFrame-accessors.R#L90

https://code.bioconductor.org/browse/flowCore/blob/RELEASE_2_2/R/flowFrame-accessors.R#L283

@jspidlen
Copy link
Contributor

jspidlen commented Jan 24, 2025 via email

@SamGG
Copy link
Contributor Author

SamGG commented Jan 24, 2025

Hi Josef, I agree with this principle in general. Changing featuresNames() back to its orign, my first proposal, is probably problematic as developer could hesitate between featureName() and colnames(). As they both return the same information we could imagine that developers used both.
Changing names() to match its definition would not break any code in my view. The current result is a string (and will stay a string) and this string is nearly useless as it repeats the same information. So I don't think any developer is using it and relies on it. I saw names() called in flowViz. In those calls, the change brings back sense to the label of the x and y axes of the plot.
In the end, it would take less work changing L41 than changing the doc and vignette, what should be done as they are not in agreement with the code. This is what I did, and I think dropping a single line in the NEWS would be sufficient.

> names(GvHD[[1]])
[1] "<FSC-H> FSC-Height"       "<SSC-H> SSC-Height"       "<FL1-H> CD15 FITC"        "<FL2-H> CD45 PE"         
[5] "<FL3-H> CD14 PerCP"       "<FL2-A>"                  "<FL4-H> CD33 APC"         "<Time> Time (51.20 sec.)"

> GvHD[[1]]
flowFrame object 's5a01'
with 3420 cells and 8 observables:
      name              desc     range  minRange  maxRange
$P1  FSC-H        FSC-Height      1024         0      1023
$P2  SSC-H        SSC-Height      1024         0      1023
$P3  FL1-H         CD15 FITC      1024         1     10000
$P4  FL2-H           CD45 PE      1024         1     10000
$P5  FL3-H        CD14 PerCP      1024         1     10000
$P6  FL2-A                NA      1024         0      1023
$P7  FL4-H          CD33 APC      1024         1     10000
$P8   Time Time (51.20 sec.)      1024         0      1023
170 keywords are stored in the 'description' slot

@jspidlen
Copy link
Contributor

jspidlen commented Jan 25, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants