Releases · SebKrantz/collapse

14 Apr 20:51

SebKrantz

v2.1.1

79803d7

collapse version 2.1.1 Latest

alloc(list(1), 2) now gives list(1, 1) instead of list(list(1), list(1)), which can still be generated with alloc(list(1), 2, simplify = FALSE). This change also affects ftransform()/fmutate(), making, e.g., fmutate(data, y = list(1)) consistent with dplyr::mutate(data, y = list(1)). Thanks @MattAFiedler (#753).
fslice() now works with sf data frames.
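
The `alloc()` change can be checked with a minimal sketch, assuming collapse >= 2.1.1 is installed. The expected shapes in the comments are those stated in the note above; the variable names are only illustrative.

```r
library(collapse)

# New default: the length-1 list is simplified, so its element is repeated
flat <- alloc(list(1), 2)
str(flat)      # per the note above: list(1, 1)

# Previous behaviour, still available on request
nested <- alloc(list(1), 2, simplify = FALSE)
str(nested)    # per the note above: list(list(1), list(1))
```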

Contributors

MattAFiedler

10 Mar 04:05

SebKrantz

v2.1.0

3e84d8e

collapse version 2.1.0

collapse 2.1.0, released in March 2025, introduces a fast slicing function, an improved weighted quantile algorithm, a few convenience features, and removes some legacy functions from the package.

Potentially breaking changes

Functions pwNobs, as.factor_GRP, as.factor_qG, is.GRP, is.qG, is.unlistable, is.categorical, is.Date, as.numeric_factor, as.character_factor, and Date_vars, which were renamed in v1.6.0 by either replacing '.' with '_' or using all lower-case letters, and deprecated since then, are now finally removed from the package.
num_vars() (and thus also cat_vars() and collap()) were changed to a simpler C-definition of numeric data types which is more in line with is.numeric(): is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") && !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr")). The previous definition was: is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") && (!is.object(x) || inherits(x, c("ts", "units", "integer64"))). Thus, the definition changed from including only certain classes to excluding the most important classes. Thanks @maouw for flagging this (#727).
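
To see which objects are classified differently, the two definitions quoted above can be run side by side in base R. This is only an illustrative comparison on a handful of hand-picked objects (the `examples` list is an assumption made for this sketch), not a test of the C implementation.

```r
# New definition, as quoted in the release note
is_numeric_C <- function(x) {
  typeof(x) %in% c("integer", "double") &&
    !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))
}
# Previous definition, as quoted in the release note
is_numeric_C_old <- function(x) {
  typeof(x) %in% c("integer", "double") &&
    (!is.object(x) || inherits(x, c("ts", "units", "integer64")))
}

# Hand-picked objects for illustration
examples <- list(
  integer  = 1:3,
  factor   = factor(c("a", "b")),
  date     = as.Date("2025-03-10"),
  ts       = ts(c(1.5, 2.5)),
  difftime = as.difftime(c(1, 2), units = "hours")
)
comparison <- data.frame(
  new = vapply(examples, is_numeric_C, logical(1)),
  old = vapply(examples, is_numeric_C_old, logical(1))
)
print(comparison)
```

Among these objects only the difftime vector changes classification: it is a classed double that the old inclusion list rejected and the new exclusion list accepts.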

Bug Fixes

Fixed some issues using collapse and the tidyverse together, particularly regarding tidyverse methods for 'grouped_df' - thanks @NicChr (#645).
More consistent handling of zero-length inputs - they are now also returned in fmean() and fmedian()/fnth() instead of returning NA (#628).

Additions

Added function fslice(): a fast alternative to dplyr::slice_[head|tail|min|max] that also works with matrices. Thanks @alinacherkas for the proposal and initial implementation (#725).
Added function groupv() as programmer's version of group(), or rather, groupv() is now identical to the former group(), and group() now supports multiple vectors as input, e.g., group(v1, v2). This is done for convenience and consistency with radixorder[v](). For backwards compatibility, group() also supports a single list as input.
join() has a new argument require allowing the user to generate messages or errors if the join operation is not successful enough:

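library(collapse)
# Illustrative inputs (assumed; chosen to be consistent with the output shown below)
df1 <- data.frame(id1 = c(1, 1, 2, 3), id2 = c("a", "b", "b", "c"),
                  name = c("John", "Jane", "Bob", "Carl"), age = c(35, 28, 42, 50))
df2 <- data.frame(id1 = c(1, 2, 3, 3), id2 = c("a", "b", "c", "e"),
                  salary = c(60000, 55000, 70000, 80000), dept = c("IT", "Marketing", "Sales", "IT"))
join(df1, df2, require = list(x = 0.8, fail = "warning"))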
#> Warning: Matched 75.0% of records in table df1 (x), but 80.0% is required
#> left join: df1[id1, id2] 3/4 (75%) <1:1st> df2[id1, id2] 3/4 (75%)
#>   id1 id2 name age salary      dept
#> 1   1   a John  35  60000        IT
#> 2   1   b Jane  28     NA      <NA>
#> 3   2   b  Bob  42  55000 Marketing
#> 4   3   c Carl  50  70000     Sales

psmat() now has a fill argument to fill empty slots in matrix/array with other elements (default NULL/NA).
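
A short sketch of the fslice() and group()/groupv() additions listed above, assuming collapse >= 2.1.0. The vectors `v1` and `v2` are made up for illustration, and the call forms follow the descriptions in the notes rather than an exhaustive reading of the documentation.

```r
library(collapse)

# fslice(): first two rows within each cyl group of mtcars
first_two <- fslice(mtcars, cyl, n = 2)
nrow(first_two)

# group() with several vectors, with a single list, and groupv() with a list
v1 <- c("a", "b", "a", "b", "a")
v2 <- c(1, 1, 2, 1, 2)
g_multi <- group(v1, v2)
g_list  <- group(list(v1, v2))   # backwards-compatible form
g_prog  <- groupv(list(v1, v2))  # programmer's version

# All three calls should encode the same grouping
as.integer(g_multi)
as.integer(g_prog)
```

In package code, groupv() is the natural choice because the grouping columns usually arrive as a list or data frame rather than as separate arguments.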

Improvements

The weighted quantile algorithm in fquantile()/fnth() was improved to a more theoretically sound method following excellent notes by Matthew Kay. It now also supports quantile type 4, but it does not skip zero weights anymore, as the new algorithm makes it difficult to skip them 'on the fly'. Note that the existing collapse algorithm already had very good properties after a bug fix in v2.0.17, but the new algorithm is more exact and also faster.
The collapse arXiv article has been updated and significantly enhanced. It is an excellent resource to get an overview of the package.

Notes

On CRAN, collapse R dependency was changed to >= 4.1.0 to be able to use the base pipe in examples without generating a NOTE on R CMD check (another absolutely unnecessary restriction). The package depends on R >= 3.5.0 and the DESCRIPTION file on GitHub/R-universe will continue to reflect this.

Contributors

alinacherkas, maouw, and NicChr

09 Jan 16:53

SebKrantz

v2.0.19

ee6f69f

collapse version 2.0.19

fmatch(factor(NA), NA) now gives 1 instead of NA. Thanks @NicChr (#675).
New developer-focused vignette on developing with collapse.
Fixed minor CRAN issues (#676, #702).
Fixed bug with integer64 types in rowbind(). Thanks @arthurgailes for reporting and @aitap for providing a fix (#697).
collapse now also has a Bluesky account at https://bsky.app/profile/rcollapse.bsky.social.

Contributors

aitap, arthurgailes, and NicChr

23 Nov 12:03

SebKrantz

v2.0.18

4c0501f

collapse version 2.0.18

Cases in pivot(..., how = "longer") with no values columns now no longer give an error. Thanks @alvarocombo for flagging this (#663).
Fixed bug in qF(c(4L, 1L, NA), sort = FALSE): hash function failure due to a coding bug. Thanks @mayer79 for flagging this (#666).
If x is already a qG object of the right properties, calling qG(x) now does not copy x anymore. Thanks @mayer79 (mayer79/effectplots#11).

Contributors

alvarocombo and mayer79

02 Nov 21:24

SebKrantz

v2.0.17

6f2515d

collapse version 2.0.17

In GRP.default(), the "group.starts" attribute is always returned, even if there is only one group or every observation is its own group. Thanks @JamesThompsonC (#631).
Fixed a bug in pivot() if na.rm = TRUE and how = "wider"|"recast" and there are multiple value columns with different missingness patterns. In this case na_omit(values) was applied with default settings to the original (long) value columns, implying potential loss of information. The fix applies na_omit(values, prop = 1), i.e., only removes completely missing rows.
qDF()/qDT()/qTBL() now allow a length-2 vector of names to row.names.col if X is a named atomic vector, e.g., qDF(fmean(mtcars), c("car", "mean")) gives the same as pivot(fmean(mtcars, drop = FALSE), names = list("car", "mean")).
Added a subsection on using internal (ad-hoc) grouping to the collapse for tidyverse users vignette.
qsu() now adds a WeightSum column giving the sum of (non-zero or missing) weights if the w argument is used. Thanks @mayer79 for suggesting (#650). For panel data (pid) the 'Between' sum of weights is also simply the number of groups, and the 'Within' sum of weights is the 'Overall' sum of weights divided by the number of groups.
Fixed an inaccuracy in fquantile()/fnth() with weights: As per documentation the target sum is sumwp = (sum(w) - min(w)) * p, however, in practice, the weight of the minimum element of x was used instead of the minimum weight. Since the smallest element in the sample usually has a small weight this was unnoticed for a long while, but thanks to @Jahnic-kb now reported and fixed (#659).
Fixed a bug in recode_char() when regex = TRUE and the default argument was used. Thanks @alinacherkas for both reporting and fixing (#654).
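
The weighted-quantile inaccuracy described above comes down to which weight is subtracted in the target sum. The following base R arithmetic uses made-up data (`x`, `w`, `p` are assumptions for this sketch) and shows only the target-sum computation, not the full quantile algorithm.

```r
x <- c(3, 1, 2)      # data values
w <- c(0.5, 2, 1)    # weights; the smallest x does not carry the smallest weight
p <- 0.5             # probability

# Documented target sum: subtract the minimum weight
sumwp_documented <- (sum(w) - min(w)) * p          # (3.5 - 0.5) * 0.5

# Behaviour before the fix: subtract the weight of the minimum element of x
sumwp_pre_fix <- (sum(w) - w[which.min(x)]) * p    # (3.5 - 2) * 0.5

c(documented = sumwp_documented, pre_fix = sumwp_pre_fix)
```

The two targets coincide whenever the smallest observation also has the smallest weight, which is consistent with the note that the discrepancy went unnoticed for a long time.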

Contributors

JamesThompsonC, mayer79, and 2 other contributors

20 Aug 10:19

SebKrantz

v2.0.16

f29f7fd

collapse version 2.0.16

Fixes an installation bug on some Linux systems (conflicting types) (#613).
collapse now enforces string encoding in fmatch() / join(), which caused problems if strings being matched had different encodings (#566, #579, and #618). To avoid noticeable performance implications, checks are done heuristically, i.e., the first, middle and last string of a character vector are checked, and if not UTF-8, the entire vector is coerced to UTF-8 strings before the matching process. In general, character vectors in R can contain strings of different encodings, but this is not the case with most regular data. For performance reasons, collapse assumes that character vectors are uniform in terms of string encoding.
Fixes a bug using qualified names for fast statistical functions inside across() (#621, thanks @alinacherkas).
collapse now depends on R >= 3.4.0 due to the enforcement of STRICT_R_HEADERS = 1 from R v4.5.0. In particular R API functions were renamed Calloc -> R_Calloc and Free -> R_Free.
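
The encoding heuristic described above for fmatch() / join() can be sketched in base R. The helper `ensure_utf8_heuristic()` is hypothetical and written only for illustration: the actual check lives in the package's C code, and this sketch simplifies "not UTF-8" to "explicitly marked as latin1".

```r
# Hypothetical helper: a base R sketch of the heuristic, not collapse's C code
ensure_utf8_heuristic <- function(x) {
  stopifnot(is.character(x))
  n <- length(x)
  if (n == 0L) return(x)
  # Probe only the first, middle and last string
  probe <- unique(c(1L, (n + 1L) %/% 2L, n))
  # Simplification: only strings explicitly marked "latin1" count as non-UTF-8
  if (any(Encoding(x[probe]) == "latin1")) enc2utf8(x) else x
}

x <- c("a", "b", "caf\xe9")
Encoding(x) <- "latin1"          # marks the non-ASCII element as latin1
y <- ensure_utf8_heuristic(x)
Encoding(y)
```

Because only three positions are probed, a differently encoded string elsewhere in the vector would go undetected, which is exactly the uniform-encoding assumption stated in the note.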

Contributors

alinacherkas

08 Jul 13:15

SebKrantz

v2.0.15

1b852bb

collapse version 2.0.15

Some changes on the C-side to move the package closer to C API compliance (demanded by R-Core). One notable change is that gsplit() no longer supports S4 objects (because SET_S4_OBJECT is not part of the API and asS4() is too expensive for tight loops). I cannot think of a single example where it would be necessary to split an S4 object, but if you do have applications please file an issue.
pivot() has new arguments FUN = "last" and FUN.args = NULL, allowing wide and recast pivots with aggregation (default last value as before). FUN currently supports a single function returning a scalar value. Fast Statistical Functions receive vectorized execution. FUN.args can be used to supply a list of function arguments, including data-length arguments such as weights. There are also a couple of internal functions callable using function strings: "first", "last", "count", "sum", "mean", "min", or "max". These are built into the reshaping C-code and thus extremely fast. Thanks @AdrianAntico for the request (#582).

join() now provides enhanced verbosity, indicating the average order of the join between the two tables, e.g.

join(data.frame(id = c(1, 2, 2, 4)), data.frame(id = c(rep(1,4), 2:3)))
#> left join: x[id] 3/4 (75%) <1.5:1st> y[id] 2/6 (33.3%)
#>   id
#> 1  1
#> 2  2
#> 3  2
#> 4  4
join(data.frame(id = c(1, 2, 2, 4)), data.frame(id = c(rep(1,4), 2:3)), multiple = TRUE)
#> left join: x[id] 3/4 (75%) <1.5:2.5> y[id] 5/6 (83.3%)
#>   id
#> 1  1
#> 2  1
#> 3  1
#> 4  1
#> 5  2
#> 6  2
#> 7  4

In collap(), with multiple functions passed to FUN or catFUN and return = "long", the "Function" column is now generated as a factor variable instead of character (which is more efficient).
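
As a rough illustration of the new FUN argument to pivot() described above, the sketch below widens a small long table that contains a duplicated id/key combination. The data frame `long` is made up for this example, and collapse >= 2.0.15 is assumed.

```r
library(collapse)

# Made-up long data: (id = 1, key = "a") occurs twice
long <- data.frame(
  id  = c(1, 1, 1, 2, 2),
  key = c("a", "a", "b", "a", "b"),
  val = c(1, 2, 3, 4, 5)
)

# Default FUN = "last": duplicates of (id, key) keep the last value
wide_last <- pivot(long, ids = "id", names = "key", values = "val", how = "wider")

# Aggregate duplicates with the internal "sum" function instead
wide_sum <- pivot(long, ids = "id", names = "key", values = "val",
                  how = "wider", FUN = "sum")

wide_last
wide_sum
```

The string "sum" selects one of the internal functions named in the note; a regular function such as fsum could presumably be passed instead, at the cost of leaving the built-in C path.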

Contributors

AdrianAntico

20 May 15:14

SebKrantz

v2.0.14

71141c2

collapse version 2.0.14

Updated 'collapse and sf' vignette to reflect the recent support for units objects, and added a few more examples.
Fixed a bug in join() where a full join silently became a left join if there are no matches between the tables (#574). Thanks @D3SL for reporting.
Added function group_by_vars(): A standard evaluation version of fgroup_by() that is slimmer and safer for programming, e.g. data |> group_by_vars(ind1) |> collapg(custom = list(fmean = ind2, fsum = ind3)). Or, using magrittr:

library(magrittr)
set_collapse(mask = "manip") # for fgroup_vars -> group_vars

data %>% 
  group_by_vars(ind1) %>% {
  add_vars(
    group_vars(., "unique"),
    get_vars(., ind2) %>% fmean(keep.g = FALSE) %>% add_stub("mean_"),
    get_vars(., ind3) %>% fsum(keep.g = FALSE) %>% add_stub("sum_")
  ) 
}

Added function as_integer_factor() to turn factors/factor columns into integer vectors. as_numeric_factor() already exists, but is memory inefficient for most factors where levels can be integers.
join() now internally checks if the rows of the joined datasets match exactly. This check, using identical(m, seq_row(y)), is inexpensive, but, if TRUE, saves a full subset and deep copy of y. Thus join() now inherits the intelligence already present in functions like fsubset(), roworder() and funique() - a key for efficient data manipulation is simply doing less.
In join(), if attr = TRUE, the count option to fmatch() is always invoked, so that the attribute attached always has the same form, regardless of verbose or validate settings.
roworder[v]() has optional setting verbose = 2L to indicate if x is already sorted, making the call to roworder[v]() redundant.
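
A small sketch of as_integer_factor() from the list above, compared with a common base R idiom. The factor `f` is made up for illustration; collapse >= 2.0.14 is assumed.

```r
library(collapse)

# A factor whose levels are integer-like strings
f <- factor(c("10", "20", "10", "30"))

# Convert via collapse: levels are interpreted as integers
via_collapse <- as_integer_factor(f)

# Base R idiom for comparison: convert the levels once, then index by the codes
via_base <- as.integer(levels(f))[f]

via_collapse
via_base
```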

Contributors

D3SL

13 Apr 21:10

SebKrantz

v2.0.13

0e93792

collapse version 2.0.13

collapse now explicitly supports xts/zoo and units objects and concurrently removes an additional check in the .default method of statistical functions that called the matrix method if is.matrix(x) && !inherits(x, "matrix"). This was a smart solution to account for the fact that xts objects are matrix-based but don't inherit the "matrix" class, thus wrongly calling the default method. The same is the case for units, but here, my recent more intensive engagement with spatial data convinced me that this should be changed. For one, under the previous heuristic solution, it was not possible to call the default method on a units matrix, e.g., fmean.default(st_distance(points_sf)) called fmean.matrix() and yielded a vector. This should not be the case. Secondly, aggregation e.g. fmean(st_distance(points_sf)) or fmean(st_distance(points_sf), g = group_vec) yielded a plain numeric object that lost the units class (in line with the general attribute handling principles). Therefore, I have now decided to remove the heuristic check within the default methods, and explicitly support zoo and units objects. For Fast Statistical Functions, the methods are FUN.zoo <- function(x, ...) if(is.matrix(x)) FUN.matrix(x, ...) else FUN.default(x, ...) and FUN.units <- function(x, ...) if(is.matrix(x)) copyMostAttrib(FUN.matrix(x, ...), x) else FUN.default(x, ...). While the behavior for xts/zoo remains the same, the behavior for units is enhanced, as now the class is preserved in aggregations (the .default method preserves attributes except for ts), and it is possible to manually invoke the .default method on a units matrix and obtain an aggregate statistic. This change may impact computations on other matrix based classes which don't inherit from "matrix" (mts does inherit from "matrix", and I am not aware of any other affected classes, but user code like m <- matrix(rnorm(25), 5); class(m) <- "bla"; fmean(m) will now yield a scalar instead of a vector. Such code must be adjusted to either class(m) <- c("bla", "matrix") or fmean.matrix(m)). Overall, the change makes collapse behave in a more standard and predictable way, and enhances its support for units objects central in the sf ecosystem.
fquantile() now also preserves the attributes of the input, in line with quantile().
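
The dispatch change for matrix-based classes can be reproduced with the example from the note above, here with fixed values instead of rnorm() so the results are repeatable. This assumes collapse >= 2.0.13; the class name "bla" is the placeholder used in the note.

```r
library(collapse)

m <- matrix(as.numeric(1:25), 5)
class(m) <- "bla"                # matrix-based, but does not inherit "matrix"

n_default <- length(fmean(m))          # default method: one overall mean
n_matrix  <- length(fmean.matrix(m))   # matrix method called directly: column means

# Alternative adjustment: let the class inherit from "matrix"
class(m) <- c("bla", "matrix")
n_inherit <- length(fmean(m))          # dispatches to the matrix method

c(default = n_default, matrix = n_matrix, inherit = n_inherit)
```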

01 Apr 10:58

SebKrantz

v2.0.12

9a08762

collapse version 2.0.12

Fixes some issues with signed int overflows inside hash functions and possible protect bugs flagged by RCHK. With few exceptions these fixes are cosmetic to appease the C/C++ code checks on CRAN.
