Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Implement as generics for SOMASparseNDArrayReader #1458

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions apis/r/DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -70,3 +70,49 @@ VignetteBuilder: knitr
Config/testthat/edition: 2
OS_type: unix
SystemRequirements: cmake, git
Collate:
'ConfigList.R'
'Factory.R'
'MappingBase.R'
'PlatformConfig.R'
'RcppExports.R'
'ReadIter.R'
'SOMAArrayBase.R'
'SOMAAxisIndexer.R'
'SOMAAxisQuery.R'
'SOMAAxisQueryResult.R'
'SOMACollection.R'
'SOMACollectionBase.R'
'SOMAContextBase.R'
'SOMADataFrame.R'
'SOMADenseNDArray.R'
'SOMAExperiment.R'
'SOMAExperimentAxisQuery.R'
'SOMAMeasurement.R'
'SOMAOpen.R'
'SOMASparseNDArray.R'
'SOMASparseNDArrayRead.R'
'SOMATileDBContext.R'
'ScalarMap.R'
'SparseReadIter.R'
'TableReadIter.R'
'TileDBArray.R'
'TileDBCreateOptions.R'
'TileDBGroup.R'
'TileDBObject.R'
'TileDBURI.R'
'datasets.R'
'ephemeral.R'
'pad_matrix.R'
'roxygen.R'
'utils-arrow.R'
'utils-assertions.R'
'utils-coercions.R'
'utils-matrixZeroBasedView.R'
'utils-readerTransformers.R'
'utils-seurat.R'
'utils-tiledb.R'
'utils-uris.R'
'utils.R'
'write_seurat.R'
'write_soma.R'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Why would we need this now when we did not before?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the Collate secion is not necessary. My guess is it was generated when @includes were part of the code, then added as part of this PR

5 changes: 5 additions & 0 deletions apis/r/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,11 @@

S3method("[[",MappingBase)
S3method("[[<-",MappingBase)
S3method(as.data.frame,SOMASparseNDArrayRead)
S3method(as.data.frame,TableReadIter)
S3method(as.list,MappingBase)
S3method(as_arrow_table,SOMASparseNDArrayRead)
S3method(as_arrow_table,TableReadIter)
S3method(length,MappingBase)
S3method(names,MappingBase)
S3method(pad_matrix,default)
Expand Down Expand Up @@ -78,6 +82,7 @@ importFrom(Matrix,as.matrix)
importFrom(Matrix,sparseMatrix)
importFrom(Rcpp,evalCpp)
importFrom(arrow,RecordBatch)
importFrom(arrow,as_arrow_table)
importFrom(arrow,concat_arrays)
importFrom(bit64,as.integer64)
importFrom(bit64,lim.integer64)
Expand Down
40 changes: 40 additions & 0 deletions apis/r/R/utils-coercions.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#' Coercion methods for SOMA classes

#' @importFrom arrow as_arrow_table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reexport as_arrow_table?

#' @importFrom arrow as_arrow_table
#' @export
#'
arrow::as_arrow_table

If not, these methods will be inaccessible to the end-user without library(arrow) first

#' @export
as_arrow_table.SOMASparseNDArrayRead <- function(x){
x$tables()$concat()
}

#' Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to \link{data.frame} or \link[tibble]{tibble}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could mention the existence of these methods in the docs for SOMASparseNDArray but I don't think we should add a description here, which results in a separate documentation entry.

#' @export
as.data.frame.SOMASparseNDArrayRead <- function(x, ...){
as.data.frame(x$tables()$concat(), ...)
}

# Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to Matrix::\link[Matrix]{dgTMatrix}
setAs(from = "SOMASparseNDArrayRead",
to = "TsparseMatrix",
def = function(from) from$sparse_matrix()$concat()
)
Comment on lines +26 to +30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The R6 classes may need to be declared as an old class with methods::setOldClass()

methods::setOldClass("SOMASparseNDArrayRead")

We may also need to import the target classes from Matrix

#' @importClassesFrom Matrix TsparseMatrix CsparseMatrix RsparseMatrix 
#'
NULL


# Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to Matrix::\link[Matrix]{dgCMatrix}
setAs(from = "SOMASparseNDArrayRead",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also provide a delayed method for SeuratObject::as.sparse(); this is called in SeuratObject::CreateSeuratObject() and SeuratObject::CreateAssayObject() and would allow passing a sparse array/sparse array read directly to those functions

#' @exportS3Method SeuratObject::as.sparse
#'
as.sparse.SOMASparseNDArrayRead <- function(x, ...) {
  as(x, "CsparseMatrix")
}

to = "CsparseMatrix",
def = function(from) as(as(from, "TsparseMatrix"), "CsparseMatrix")
)

# Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to Matrix::\link[Matrix]{dgRMatrix}
setAs(from = "SOMASparseNDArrayRead",
to = "RsparseMatrix",
def = function(from) as(as(from, "TsparseMatrix"), "RsparseMatrix")
)

#' @importFrom arrow as_arrow_table
#' @export
as_arrow_table.TableReadIter <- function(x) x$concat()

#' @export
as.data.frame.TableReadIter <- function(x, row.names = NULL, optional = FALSE, ...){
as.data.frame(x$concat(), row.names = row.names, optional = optional, ...)
}
2 changes: 1 addition & 1 deletion apis/r/R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ arrow_to_dt <- function(arrlst) {
}

##' @noRd
as_arrow_table <- function(arrlst) {
to_arrow_table <- function(arrlst) {
Copy link
Contributor

@eddelbuettel eddelbuettel Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could possibly make that simple list of two external pointers a simple S3 class (which I considered for the simple features like pretty printing). That may open the door for a dispatch of as_arrow_table.CLASSHERE as in your coercion utilities. Is that better?

In the short term the renaming is fine but I do like these "verbs" to start with 'as' ...

(We could also decide to make it .as_arrow_table() with a leading dot. It is already a non-documented, non-exported internal helper.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could possibly make that simple list of two external pointers a simple S3 class

Love this idea. Almost did it myself #1461

Copy link
Member

@aaronwolen aaronwolen Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But @pablo-gar, note that I removed the internal as_arrow_table() in #1461 since it was redundant with soma_array_to_arrow_table() and conflicted with arrow::as_arrow_table().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For correctness, let's add "proposing to remove as_arrow_table() in #1461" and maybe we should move a little slower here / not quite do that.

Copy link
Contributor

@eddelbuettel eddelbuettel Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea.

Come to think about it, in tiledb-r I changed this and am now returning at arrow Table each time. That is simpler. Maybe we should do that here too. So then the as_arrow_table() would become an R-level internal function, not exported, not visible that wraps around sr_next() and other data gathers (i.e. soma_array_reader() and instead of being handed a list of two (external pointers to Arrow structs) it returns an arrow Table made from them.

check_arrow_pointers(arrlst)
arrow::as_arrow_table(arrow::RecordBatch$import_from_c(arrlst[[1]], arrlst[[2]]))
}
Expand Down
11 changes: 11 additions & 0 deletions apis/r/man/as.data.frame.SOMASparseNDArrayRead.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 11 additions & 0 deletions apis/r/man/as_arrow_table.SOMASparseNDArrayRead.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 8 additions & 8 deletions apis/r/tests/testthat/test-SOMAArrayReader-Arrow.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,21 +8,21 @@ test_that("Arrow Interface from SOMAArrayReader", {
columns <- c("n_counts", "n_genes", "louvain")

z <- soma_array_reader(uri, columns)
tb <- as_arrow_table(z)
tb <- to_arrow_table(z)
expect_true(inherits(tb, "Table"))
rb <- arrow::as_record_batch(tb) #arch::from_arch_array(z, arrow::RecordBatch)
expect_true(inherits(rb, "RecordBatch"))


soma_array_reader(uri, columns) |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect() -> D
expect_equal(nrow(D), 2638)

arr <- tiledb_array(uri) # need array for schema access to qc parser
qc <- parse_query_condition(n_counts < 1000 && n_genes >= 400, ta=arr)
soma_array_reader(uri, columns, qc@ptr) |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect() -> D

expect_equal(nrow(D), 47)
Expand All @@ -31,7 +31,7 @@ test_that("Arrow Interface from SOMAArrayReader", {


soma_array_reader(uri) |> # read everything
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect() -> D
expect_equal(nrow(D), 2638)
expect_equal(ncol(D), 6)
Expand All @@ -41,7 +41,7 @@ test_that("Arrow Interface from SOMAArrayReader", {
dim_ranges=list(soma_joinid=rbind(bit64::as.integer64(c(1000, 1004)),
bit64::as.integer64(c(2000, 2004)))),
dim_points=list(soma_joinid=bit64::as.integer64(seq(0, 100, by=20)))) |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect() -> D
expect_equal(nrow(D), 16)
expect_equal(ncol(D), 4)
Expand All @@ -54,17 +54,17 @@ test_that("Arrow Interface from SOMAArrayReader", {
ndarray$close()

M1 <- soma_array_reader(uri = uri, result_order = "auto") |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect()
expect_equal(M, matrix(M1$soma_data, 4, 4, byrow=TRUE))

M2 <- soma_array_reader(uri = uri, result_order = "row-major") |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect()
expect_equal(M, matrix(M2$soma_data, 4, 4, byrow=TRUE))

M3 <- soma_array_reader(uri = uri, result_order = "column-major") |>
as_arrow_table() |>
to_arrow_table() |>
dplyr::collect()
expect_equal(M, matrix(M3$soma_data, 4, 4, byrow=FALSE))

Expand Down
10 changes: 9 additions & 1 deletion apis/r/tests/testthat/test-SOMADataFrame.R
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ test_that("SOMADataFrame read", {
expect_equal(z$num_columns, 3L)
expect_equal(z$ColumnNames(), columns)
sdf$close()

columns <- c("n_counts", "does_not_exist")
sdf <- SOMADataFrameOpen(uri)
expect_error(sdf$read(column_names=columns))
Expand All @@ -233,6 +233,14 @@ test_that("SOMADataFrame read", {
z <- sdf$read(coords = list(soma_joinid=coords))$concat()
expect_equal(z$num_rows, 10L)
sdf$close()

# coercion from TableReader to arrow Table and data.frame
sdf <- SOMADataFrameOpen(uri)
all.equal(sdf$read()$concat(),
arrow::as_arrow_table(sdf$read()))
all.equal(as.data.frame(sdf$read()$concat()),
as.data.frame(sdf$read()))
sdf$close()
})

test_that("soma_ prefix is reserved", {
Expand Down
103 changes: 66 additions & 37 deletions apis/r/tests/testthat/test-SOMASparseNDArray.R
Original file line number Diff line number Diff line change
Expand Up @@ -73,69 +73,98 @@ test_that("SOMASparseNDArray creation", {

})

test_that("SOMASparseNDArray read_sparse_matrix", {
test_that("SOMASparseNDArray read", {
uri <- withr::local_tempdir("sparse-ndarray")
ndarray <- SOMASparseNDArrayCreate(uri, arrow::int32(), shape = c(10, 10))

# For this test, write 9x9 data into 10x10 array. Leaving the last row & column
# empty touches corner cases with setting dims() correctly
mat <- create_sparse_matrix_with_int_dims(9, 9)
mat <- create_sparse_matrix_with_int_dims(9, 9, repr = "T")
ndarray$write(mat)
expect_equal(as.numeric(ndarray$shape()), c(10, 10))
ndarray$close()

# read_sparse_matrix

ndarray <- SOMASparseNDArrayOpen(uri)
mat2 <- ndarray$read()$sparse_matrix(zero_based = T)$concat()

# read sparse matrix directly
mat2 <- ndarray$read()$sparse_matrix(zero_based = F)$concat()
expect_s4_class(mat2, "sparseMatrix")
expect_s4_class(mat2, "dgTMatrix")
expect_equal(dim(mat2), c(10, 10))
expect_equal(nrow(mat2), 10)
expect_equal(ncol(mat2), 10)
all.equal(mat, mat2)
all.equal(mat[1:9, 1:9], mat2[1:9, 1:9])

# test coerced matrix: dgTMatrix
mat2 <- ndarray$read()$sparse_matrix(zero_based = F)$concat()
mat_coerced <- as(ndarray$read(), "TsparseMatrix")
expect_s4_class(mat_coerced, "dgTMatrix")
all.equal(mat2, mat_coerced)

# test coerced matrix: dgCMatrix
mat2 <- ndarray$read()$sparse_matrix(zero_based = F)$concat()
mat_coerced <- as(ndarray$read(), "CsparseMatrix")
expect_s4_class(mat_coerced, "dgCMatrix")
all.equal(as(mat2, "CsparseMatrix"), mat_coerced)

# test coerced matrix: dgTMatrix
mat2 <- ndarray$read()$sparse_matrix(zero_based = F)$concat()
mat_coerced <- as(ndarray$read(), "RsparseMatrix")
expect_s4_class(mat_coerced, "dgRMatrix")
all.equal(as(mat2, "RsparseMatrix"), mat_coerced)


# test zero-based Matrix view
mat2 <- ndarray$read()$sparse_matrix(zero_based=T)$concat()
expect_true(inherits(mat2, "matrixZeroBasedView"))
expect_s4_class(mat2$get_one_based_matrix(), "sparseMatrix")
expect_s4_class(mat2$get_one_based_matrix(), "dgTMatrix")
expect_equal(mat2$dim(), c(10, 10))
expect_equal(mat2$nrow(), 10)
expect_equal(mat2$ncol(), 10)
## not sure why all.equal(mat, mat2) does not pass
expect_true(all.equal(as.numeric(mat[1:9, 1:9]), as.numeric(mat2$take(0:8, 0:8)$get_one_based_matrix())))
expect_equal(sum(mat), sum(mat2$get_one_based_matrix()))

ndarray <- SOMASparseNDArrayOpen(uri)

ndarray$close()
})

test_that("SOMASparseNDArray read_sparse_matrix_zero_based", {
uri <- withr::local_tempdir("sparse-ndarray")
ndarray <- SOMASparseNDArrayCreate(uri, arrow::int32(), shape = c(10, 10))

# For this test, write 9x9 data into 10x10 array. Leaving the last row & column
# empty touches corner cases with setting dims() correctly
mat <- create_sparse_matrix_with_int_dims(9, 9)
ndarray$write(mat)
expect_equal(as.numeric(ndarray$shape()), c(10, 10))
ndarray$close()

# read_sparse_matrix
ndarray <- SOMASparseNDArrayOpen(uri)
mat2 <- ndarray$read()$sparse_matrix(zero_based=T)$concat()
all.equal(mat, mat2$get_one_based_matrix)
all.equal(mat, mat2$take(0:8,0:8)$get_one_based_matrix())

# test zero-based view and convert to sparse matrix
mat2 <- ndarray$read()$sparse_matrix(zero_based = T)$concat()
expect_true(inherits(mat2, "matrixZeroBasedView"))
expect_s4_class(mat2$get_one_based_matrix(), "sparseMatrix")
expect_equal(mat2$dim(), c(10, 10))
expect_equal(mat2$nrow(), 10)
expect_equal(mat2$ncol(), 10)
## not sure why all.equal(mat, mat2) does not pass
expect_true(all.equal(as.numeric(mat), as.numeric(mat2$take(0:8,0:8)$get_one_based_matrix())))
expect_equal(sum(mat), sum(mat2$get_one_based_matrix()))

ndarray <- SOMASparseNDArrayOpen(uri)
all.equal(mat, mat2)
all.equal(mat[1:9, 1:9], mat2$take(0:8, 0:8)$get_one_based_matrix())

# repeat with iterated reader
# test with iterated reader
iterator <- ndarray$read()$sparse_matrix(zero_based = T)
mat2 <- iterator$read_next()
expect_true(inherits(mat2, "matrixZeroBasedView"))
expect_s4_class(mat2$get_one_based_matrix(), "sparseMatrix")
expect_equal(mat2$dim(), c(10, 10))
expect_equal(mat2$nrow(), 10)
expect_equal(mat2$ncol(), 10)
expect_true(all.equal(as.numeric(mat), as.numeric(mat2$take(0:8,0:8)$get_one_based_matrix())))
expect_equal(sum(mat), sum(mat2$get_one_based_matrix()))
all.equal(mat, mat2$get_one_based_matrix)
all.equal(mat, mat2$take(0:8,0:8)$get_one_based_matrix())

# test arrow table reader
df <- data.frame(soma_dim_0 = mat@i, soma_dim_1 = mat@j, soma_data = mat@x)
tbl <- arrow::arrow_table(df[order(df$soma_dim_0, df$soma_dim_1),])
tbl2 <- ndarray$read(result_order = "ROW_MAJOR")$tables()$concat()
all.equal(tbl, tbl2)

# test arrow table coercion
df <- data.frame(soma_dim_0 = mat@i, soma_dim_1 = mat@j, soma_data = mat@x)
tbl <- arrow::arrow_table(df[order(df$soma_dim_0, df$soma_dim_1),])
tbl2 <- arrow::as_arrow_table(ndarray$read(result_order = "ROW_MAJOR"))
all.equal(tbl, tbl2)

# test data.frame coercion
df <- data.frame(soma_dim_0 = mat@i, soma_dim_1 = mat@j, soma_data = mat@x)
df <- df[order(df$soma_dim_0, df$soma_dim_1),]
df2 <- as.data.frame(ndarray$read(result_order = "ROW_MAJOR"))
expect_true(inherits(df2, "data.frame"))
all.equal(df, df2)

ndarray$close()
})

Expand Down
Loading