Releases: easystats/datawizard

datawizard 1.2.0

17 Jul 12:22

BREAKING CHANGES

  • The following deprecated arguments have been removed (#603):
    • drop_na in data_match()
    • safe, pattern, and verbose in data_rename()

CHANGES

  • data_read() and data_write() now support the .parquet file format, via
    the nanoparquet package (#625).

  • data_tabulate() gets a display() method (#627).

  • data_tabulate() gets an as.table() method to coerce the frequency or
    contingency table into a (list of) table() object(s). This can be useful for
    further statistical analysis, e.g. in combination with chisq.test() (#629).

  • The print() method for data_tabulate() now appears in the documentation,
    making the big_mark argument visible (#627).
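As a rough sketch of the as.table() use case mentioned above, the following passes a cross table to chisq.test(). It assumes datawizard >= 1.2.0 is installed; because the result may be a single table or a list of tables, the code handles both. The choice of mtcars columns is only for illustration.

```r
library(datawizard)

# Cross table of cylinders by transmission type.
ct <- data_tabulate(mtcars, "cyl", by = "am")

# as.table() may return one table or a list of tables; take the first
# element if a list comes back (a plain table object is not a list).
tab <- as.table(ct)
if (is.list(tab)) tab <- tab[[1]]

# Small cell counts trigger an approximation warning, suppressed here.
res <- suppressWarnings(chisq.test(tab))
res
```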

BUG FIXES

  • Fixed an issue when printing cross tables using data_tabulate(by = ...),
    which was caused by the recent changes in insight::export_table().

  • Fixed another issue when printing cross tables using data_tabulate(by = ...),
    when more than one variable was selected for select (#630).

  • Fixed typo in the documentation of data_match().

datawizard 1.1.0

10 May 07:17
a81733b

BREAKING CHANGES

  • data_read() now also returns Bayesian models from packages brms and
    rstanarm as original model objects, and no longer coerces them into data
    frames (#606).

  • The output format of describe_distribution() on grouped data has changed.
    Before, it printed one table per group combination. Now, it prints a single
    table with group columns at the start (#610).

  • The output format of describe_distribution() when confidence intervals are
    requested has changed. Now, for each centrality measure a confidence interval
    is calculated (#617).

  • data_modify() now always uses the values of a vector for a modified or newly
    created variable, and no longer tries to detect whether a character value
    might contain an expression. To provide an expression as a string (or
    character vector), wrap it in the helper function as_expr(). Only literal
    expressions or strings wrapped in as_expr() are evaluated as expressions;
    everything else is treated as a vector of values for the new variables
    (#605).
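A minimal sketch of the data_modify() change described above, assuming datawizard >= 1.1.0. The variable names label and doubled are made up for illustration.

```r
library(datawizard)

xpr <- "2 * Sepal.Width"

# A character value is now stored as-is and recycled over all rows.
out1 <- data_modify(iris, label = xpr)

# Wrapped in as_expr(), the same string is evaluated as an expression.
out2 <- data_modify(iris, doubled = as_expr(xpr))

head(out1$label, 3)
head(out2$doubled, 3)
```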

CHANGES

  • display() is now re-exported from package insight.

  • data_read() and data_write() now rely on base-R functions for files of
    type .rds, .rda or .rdata. Thus, package rio is no longer required
    to be installed for these file types (#607).

  • data_codebook() gives an informative warning when no column names matched
    the selection pattern (#601).

  • data_to_long() now errors when columns selected to reshape do not exist in
    the data, to avoid nonsensical results that could be missed (#602).

  • New argument by in describe_distribution() (#604).

  • describe_distribution() now gives informative errors when column names
    in the input data frame conflict with columns from the output table (#612).

  • The methods for parameters_distribution objects are now defined in
    datawizard (they were previously in parameters) (#613).

BUG FIXES

  • Fixed bug in data_to_wide(), where new column names in names_from were
    ignored when that column only contained one unique value.

  • Fixed bug in describe_distribution() when some group combinations
    didn't appear in the data (#609).

  • Fixed bug in describe_distribution() when more than one value for the
    centrality argument was specified (#617).

  • Fixed bug in describe_distribution() where setting verbose = FALSE
    didn't hide some warnings (#617).

  • Fixed warning in data_summary() when a variable had the same name as
    another object in the global environment (#585).

datawizard 1.0.2

25 Mar 10:37

BUG FIXES

  • Fixed failing R CMD check on ATLAS, noLD, and OpenBLAS due to small numerical
    differences (#592).

datawizard 1.0.1

07 Mar 10:19
5b7f717

BUG FIXES

  • Fixed issue in data_arrange() for data frames that only had one column.
    Formerly, the data frame was coerced into a vector; now the data frame class
    is preserved.

  • Fixed issue in R-devel (4.5.0) due to a change in how grep() handles logical
    arguments with missing values (#588).

datawizard 1.0.0

10 Jan 10:05

BREAKING CHANGES AND DEPRECATIONS

  • datawizard now requires R >= 4.0 (#515).

  • Argument drop_na in data_match() is now deprecated. Please use
    remove_na instead (#556).

  • In data_rename() (#567):

    • argument pattern is deprecated. Use select instead.
    • argument safe is deprecated. The function now errors when select
      contains unknown column names.
    • when replacement is NULL, an error is now thrown (previously, column
      indices were used as new names).
    • if select (previously pattern) is a named vector, then all elements
      must be named, e.g. c(length = "Sepal.Length", "Sepal.Width") errors.

  • Order of arguments by and probability_weights in rescale_weights() has
    changed, because for method = "kish", the by argument is optional (#575).

  • The rescaled weights variables in rescale_weights() have been renamed.
    pweights_a and pweights_b are now named rescaled_weights_a and
    rescaled_weights_b (#575).

  • print() methods for data_tabulate() with multiple sub-tables (i.e. when
    length of by was > 1) were revised. Now, an integrated table is returned
    instead of multiple tables. Furthermore, print_html() did not work, which
    has also been fixed (#577).

  • demean() (and degroup()) gets an append argument that defaults to TRUE,
    to append the centered variables to the original data frame, instead of
    returning the de- and group-meaned variables only. Use append = FALSE for
    the previous default behaviour (i.e. only returning the newly created
    variables) (#579).
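The data_rename() changes listed above can be illustrated roughly as follows, assuming datawizard >= 1.0.0. The new column names are arbitrary.

```r
library(datawizard)

# `select` replaces the deprecated `pattern` argument.
renamed <- data_rename(iris, select = "Sepal.Length", replacement = "length")

# A fully named vector carries the new names itself (new = "old").
renamed2 <- data_rename(
  iris,
  select = c(length = "Sepal.Length", width = "Sepal.Width")
)

colnames(renamed)[1]
colnames(renamed2)[1:2]
```

A partially named vector such as c(length = "Sepal.Length", "Sepal.Width") errors, as noted in the list above.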

CHANGES

  • rescale_weights() gets a method argument, to choose the method used to
    rescale weights. Options are "carle" (the default) and "kish" (#575).

  • The select argument, which is available in different functions to select
    variables, can now also be a character vector with quoted variable names,
    including a colon to indicate a range of several variables (e.g. "cyl:gear")
    (#551).

  • New function row_sums(), to calculate row sums (optionally with a minimum
    number of valid values), as complement to row_means() (#552).

  • New function row_count(), to count specific values row-wise (#553).

  • data_read() no longer shows warning about forthcoming breaking changes
    in upstream packages when reading .RData files (#557).

  • data_modify() now recognizes n(), for example to create an index for data
    groups with 1:n() (#535).

  • The replacement argument in data_rename() now supports glue-styled
    tokens (#563).

  • data_summary() also accepts the results of bayestestR::ci() as summary
    function (#483).

  • ranktransform() has a new argument zeros to determine how zeros should be
    handled when sign = TRUE (#573).
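Two of the additions above, the quoted range in select and row_sums(), might be used as in this sketch. It assumes datawizard >= 1.0.0; the toy data frame dat is invented for the example.

```r
library(datawizard)

# Quoted range: all columns from cyl through gear.
sub <- data_select(mtcars, "cyl:gear")
colnames(sub)

dat <- data.frame(
  a = c(1, 2, NA),
  b = c(3, NA, NA),
  c = c(5, 6, NA)
)

# Row sums, requiring at least two valid (non-missing) values per row;
# rows with fewer valid values yield NA.
rs <- row_sums(dat, min_valid = 2)
rs
```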

BUG FIXES

  • describe_distribution() no longer errors if the sample was too sparse to compute
    CIs. Instead, it warns the user and returns NA (#550).

  • data_read() preserves variable types when importing files from rds or
    rdata format (#558).

datawizard 0.13.0

06 Oct 10:46

BREAKING CHANGES

  • data_rename() now errors when the replacement argument contains NA values
    or empty strings (#539).

  • Removed deprecated functions get_columns(), data_find(), format_text() (#546).

  • Removed deprecated arguments group and na.rm in multiple functions. Use by and remove_na instead (#546).

  • The default value for the argument dummy_factors in to_numeric() has
    changed from TRUE to FALSE (#544).

CHANGES

  • The pattern argument in data_rename() can also be a named vector. In this
    case, names are used as values for the replacement argument (i.e. pattern
    can be a character vector using <new name> = "<old name>").

  • categorize() gains a new breaks argument, to decide whether breaks are
    inclusive or exclusive (#548).

  • The labels argument in categorize() gets two new options, "range" and
    "observed", to use the range of categorized values as labels (i.e. factor
    levels) (#548).

  • Minor additions to reshape_ci() to work with forthcoming changes in the
    {bayestestR} package.

datawizard 0.12.3

02 Sep 12:25

CHANGES

  • demean() (and degroup()) now also work for nested designs, if argument
    nested = TRUE and by specifies more than one variable (#533).

  • Vignettes are no longer provided in the package; they are now only available
    on the website. Only one "Overview" vignette remains in the package, and
    it contains links to the other vignettes on the website. This is because
    CRAN errors occurred when building vignettes on macOS and we couldn't
    determine the cause after multiple patch releases (#534).

datawizard 0.12.2

21 Jul 07:50
389738d

  • Remove htmltools from Suggests in an attempt to fix an error in CRAN
    checks due to failures to build a vignette (#528).

datawizard 0.12.0

11 Jul 12:30

BREAKING CHANGES

  • The argument include_na in data_tabulate() and data_summary() has been
    renamed to remove_na. Consequently, to mimic the former behaviour, FALSE and
    TRUE need to be switched (i.e. remove_na = TRUE is equivalent to the former
    include_na = FALSE).

  • Class names for objects returned by data_tabulate() have been changed to
    datawizard_table and datawizard_crosstable (resp. the plural forms,
    *_tables), to provide a clearer and more consistent naming scheme.
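A short sketch of the renamed argument and the new class name, assuming datawizard >= 0.12.0. The vector x is made up for illustration.

```r
library(datawizard)

x <- c(1, 2, 2, NA)

# Equivalent to the former include_na = FALSE: the logic is inverted
# under the new argument name.
tab <- data_tabulate(x, remove_na = TRUE)

# Frequency tables now carry the class "datawizard_table".
class(tab)
```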

CHANGES

  • data_select() can directly rename selected variables when a named vector
    is provided in select, e.g. data_select(mtcars, c(new1 = "mpg", new2 = "cyl")).

  • data_tabulate() gains an as.data.frame() method, to return the frequency
    table as a data frame. The returned object is a nested data frame, where the
    first column contains the name of the variable for which frequencies were
    calculated, and the second column contains the frequency table.

  • demean() (and degroup()) now also work for cross-classified designs, or
    more generally, for data with multiple grouping or cluster variables (i.e.
    by can now specify more than one variable).
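The renaming behaviour of data_select() noted above, using the call from the changelog entry itself (assuming datawizard >= 0.12.0):

```r
library(datawizard)

# Names of the vector become the new column names (new = "old").
out <- data_select(mtcars, c(new1 = "mpg", new2 = "cyl"))
colnames(out)
```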

datawizard 0.11.0

05 Jun 19:41

BREAKING CHANGES

  • Arguments named group or group_by are deprecated and will be removed
    in a future release. Please use by instead. This affects the following
    functions in datawizard (#502):

    • data_partition()
    • demean() and degroup()
    • means_by_group()
    • rescale_weights()

  • The following aliases are deprecated and will be removed in a future release (#504):

    • get_columns(), use data_select() instead.
    • data_find() and find_columns(), use extract_column_names() instead.
    • format_text(), use text_format() instead.

CHANGES

  • recode_into() is more relaxed regarding checking the type of NA values.
    If you recode into a numeric variable, and one of the recode values is NA,
    you no longer need to use NA_real_ for numeric NA values.

  • Improved documentation for some functions.

BUG FIXES

  • data_to_long() did not work for data frames whose columns had attributes
    (like labelled data).