-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r] Restore SOMASparseNDArray$read_sparse_matrix()
#1414
Conversation
Restore reading of one-based sparse matrices. Zero-based sparse matrices cause issues with the downstream ecosystems. This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices. Code is shared between both readers to reduce duplication.
apis/r/R/SOMASparseNDArray.R
Outdated
coords = NULL, | ||
result_order = "auto", | ||
repr = c("C", "T", "R"), | ||
iterated = FALSE, | ||
log_level = "warn" | ||
) { | ||
private$check_open_for_read() | ||
|
||
repr <- repr[1L] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need that given the next line. match.arg()
will not let you pass a vector through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had match.arg()
fail when passing default vectors, eg
f <- function(repr = c("C", "T", "R")) {
match.arg(repr)
}
g <- function(repr = c("C", "T", "R")) {
f(x)
}
g() # throws an error
but can't seem to reproduce it; I'll remove the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g() is 'outside the spec' methinks. match.arg() does some black art with the signature of its enclosing function. You break that in the example, you get to keep the pieces.
At least that's how it reads to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Department of Couldn't be quicker and dirtier if we tried calling on line 2:
> f <- function(repr = c("C", "T", "R")) {
match.arg(repr)
}
> f(c("T", "R", "R"))
Error in match.arg(repr) : 'arg' must be of length 1
>
You had the right idea there but is already baked in. And by default more explicit (--> actual error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked in full detail but it looks generally ok. One worry is whether this will bite with eg #1411 ?
Codecov ReportPatch coverage has no change and project coverage change:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #1414 +/- ##
===========================================
- Coverage 63.95% 52.10% -11.85%
===========================================
Files 98 68 -30
Lines 8170 5711 -2459
===========================================
- Hits 5225 2976 -2249
+ Misses 2945 2735 -210
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Good question; haven't seen #1411 so I'm tagging @pablo-gar to ensure he sees this. From a cursory glance, it seems like the only things changing are the use of |
apis/r/R/SOMASparseNDArray.R
Outdated
@@ -188,7 +188,6 @@ SOMASparseNDArray <- R6::R6Class( | |||
log_level = "warn" | |||
) { | |||
private$check_open_for_read() | |||
repr <- repr[1L] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@mojaveazure What's the problem of reading a zero-based matrix and then converting to one-based? Like here:
I think that gives you effectively the same result as which after #1411 it would look like this one_based <- soma_sparse$read_sparse_matrix_zero_based(repr="C")$get_one_based_matrix() |
@pablo-gar I believe it does give the same result, but in a far less convoluted way. What
In perusing #1411, step 5 seems to get a lot more complicated TileDB-SOMA/apis/r/R/SOMAExperimentAxisQuery.R Lines 592 to 596 in f0b9d51
This PR instead simplifies this process to
I've argued before (and again) that the vast majority of users, including Seurat/SingleCellExperiment, won't need zero-based matrices, so providing easy access for one-based matrices makes life easier. Moreover, because the zero-based view is a wrapper around one-based matrices, this simply exposes the first part of that process |
@mojaveazure If this is a UX wrapper I'm not opposed nor in favor but I would suggest we don't duplicate code. So I propose we either implement these function this way: read_sparse_matrix = function(...) {
mat <- self$read_sparse_matrix_zero_based(...)
mat$get_one_based_matrix()
} OR the converse: read_sparse_matrix_zero_based = function(...) {
mat <- self$read_sparse_matrix(...)
matrixZeroBasedView$new(mat)
} you did the latter! I'm ok with that |
@pablo-gar This PR does just that: TileDB-SOMA/apis/r/R/SOMASparseNDArray.R Lines 238 to 266 in 06e0105
|
@pablo-gar Whoops, didn't see that your edit saying you already saw that, my bad |
@mojaveazure Np! I'm processing as I type! I should process and then type after. The other part that I'm trying to grasp is how this will collide with #1274 but I think that should be an "easy fix" on that other PR |
@mojaveazure also see that we may get rid altogether of the ability to do "full" reads |
@pablo-gar Hmm, interesting. I'm not sure how well that will work in the R-world as R doesn't have a useful version of iterator and both Seurat and SingleCellExperiment need the full matrices (at least for now, there are ways around that but it'll involve a lot more work) |
@mojaveazure yup it'll be a long call like in Python full_mat <- soma_sparse$read()$sparse_matrix()$concat() |
Belated- I'm also neutral on this, but having merged it, shouldn't we also update SOMAExperimentAxisQuery.R to use it instead of |
SOMASparseNDArray$read_sparse_matrix()
SOMASparseNDArray$read_sparse_matrix()
Restore reading of one-based sparse matrices. Zero-based sparse matrices cause issues with the downstream ecosystems (eg. #1279). This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices. Code is shared between both readers to reduce duplication.
Also makes uses of
Matrix::sparseMatrix(index1 = FALSE)
to reduce the amount of coercion from zero-based in SOMA → 1-based in call toMatrix::sparseMatrix()
→ 0-based internally within Matrix