Merge pull request #131 from weecology/juniper_active
LDATS 0.2.0
juniperlsimonis authored Jul 10, 2019
2 parents 8fe7099 + 8d079b6 commit bf918db
Showing 99 changed files with 2,483 additions and 1,233 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -6,3 +6,4 @@
^\.Rproj\.user$
^CONTRIBUTING\.md$
^CODE_OF_CONDUCT\.md$
^_pkgdown\.yml$
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: LDATS
Title: Latent Dirichlet Allocation coupled with Time Series analyses
Version: 0.1.0
Version: 0.2.0
Authors@R: c(
person(c("Juniper", "L."), "Simonis",
email = "[email protected]", role = c("aut", "cre"),
@@ -30,6 +30,7 @@ Imports:
coda,
digest,
dplyr,
extraDistr,
graphics,
grDevices,
here,
21 changes: 16 additions & 5 deletions NAMESPACE
@@ -1,7 +1,7 @@
# Generated by roxygen2: do not edit by hand

S3method(AIC,TS_fit)
S3method(logLik,LDA_VEM)
S3method(logLik,TS_fit)
S3method(logLik,multinom_TS_fit)
S3method(plot,LDA_TS)
S3method(plot,LDA_VEM)
@@ -10,15 +10,16 @@ S3method(plot,TS_fit)
S3method(print,LDA_TS)
S3method(print,TS_fit)
S3method(print,TS_on_LDA)
export(AICc)
export(LDA_TS)
export(LDA_TS_controls_list)
export(LDA_controls_list)
export(LDA_TS_control)
export(LDA_msg)
export(LDA_plot_bottom_panel)
export(LDA_plot_top_panel)
export(LDA_set)
export(LDA_set_control)
export(TS)
export(TS_controls_list)
export(TS_control)
export(TS_diagnostics_plot)
export(TS_on_LDA)
export(TS_summary_plot)
@@ -40,6 +41,7 @@ export(check_seeds)
export(check_timename)
export(check_topics)
export(check_weights)
export(conform_LDA_TS_data)
export(count_trips)
export(diagnose_ptMCMC)
export(document_weights)
@@ -49,6 +51,8 @@ export(est_regressors)
export(eta_diagnostics_plots)
export(eval_step)
export(expand_TS)
export(iftrue)
export(logsumexp)
export(measure_eta_vcov)
export(measure_rho_vcov)
export(memoise_fun)
@@ -59,6 +63,7 @@ export(multinom_TS_chunk)
export(normalize)
export(package_LDA_TS)
export(package_LDA_set)
export(package_TS)
export(package_TS_on_LDA)
export(package_chunk_fits)
export(posterior_plot)
@@ -88,8 +93,11 @@ export(set_LDA_plot_colors)
export(set_TS_summary_plot_cols)
export(set_gamma_colors)
export(set_rho_hist_colors)
export(sim_LDA_TS_data)
export(sim_LDA_data)
export(sim_TS_data)
export(softmax)
export(step_chains)
export(summarize_TS)
export(summarize_etas)
export(summarize_rhos)
export(swap_chains)
@@ -106,6 +114,8 @@ importFrom(coda,autocorr)
importFrom(coda,autocorr.diag)
importFrom(coda,effectiveSize)
importFrom(digest,digest)
importFrom(extraDistr,rcat)
importFrom(extraDistr,rdirichlet)
importFrom(grDevices,devAskNewPage)
importFrom(grDevices,rgb)
importFrom(graphics,abline)
@@ -132,6 +142,7 @@ importFrom(stats,ecdf)
importFrom(stats,logLik)
importFrom(stats,median)
importFrom(stats,rgeom)
importFrom(stats,rnorm)
importFrom(stats,runif)
importFrom(stats,sd)
importFrom(stats,terms)
40 changes: 40 additions & 0 deletions NEWS.md
@@ -1,5 +1,45 @@
# LDATS (development version)

Version numbers follow [Semantic Versioning](https://semver.org/).

# [LDATS 0.2.0](https://github.com/weecology/ldats/releases/tag/v0.2.0)
*2019-07-09*

## API updates
* At the `LDA_TS` function level, the separate inputs for data tables (`document_term_table` and `document_covariate_table`) have been merged into a single input `data`, which can be just the `document_term_table` or a list including the `document_term_table` and optionally also a `document_covariate_table`. If covariates aren't provided, the function now constructs a covariate table assuming equi-spaced observations. If using a list, the function assumes that one and only one element of the list will have a name containing the letters "term", and at most one element containing the letters "covariate" (regular expressions are used for matching). ([addresses issue 119](https://github.com/weecology/LDATS/issues/119))
* `timename` has been moved from within the `TS_controls_list` to a main argument in all associated functions.
* The control lists have been made easier to interact with. Primarily, the arguments that previously required `LDA_controls_list`, `TS_controls_list`, or `LDA_TS_controls_list` inputs now take general `list` inputs (so `LDA_TS` does not need to have a nested set of control functions). Each control list is passed through a function (`LDA_set_control`, `TS_control`, or `LDA_TS_control`) to set any non-input values to their defaults. This also allows the removal of those controls list class definitions. ([addresses issue 130](https://github.com/weecology/LDATS/issues/130))
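
The default-filling pattern described for the control lists can be sketched as follows. This is a minimal illustration, not the package code: `set_control_sketch` is a hypothetical stand-in whose arguments mirror the `LDA_set_control` signature shown in this commit, and `user_control` is an invented input.

```r
# Hypothetical stand-in that mirrors the LDA_set_control signature.
set_control_sketch <- function(quiet = FALSE, measurer = AIC, selector = min,
                               iseed = 2, ...){
  list(quiet = quiet, measurer = measurer, selector = selector,
       iseed = iseed, ...)
}

# The user supplies a plain list holding only the values to change.
user_control <- list(quiet = TRUE, iseed = 10)

# do.call() spreads the list elements as named arguments, so anything
# not supplied falls back to the defaults in the function signature.
control <- do.call(set_control_sketch, user_control)

# An empty list yields the full set of defaults.
defaults <- do.call(set_control_sketch, list())
```

Because the user input is an ordinary `list`, no dedicated controls class is needed, which is consistent with the removal of the class definitions noted above.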

## Fixed and updated example code to improve user experience
* Reduced the complexity of the example in the README ([addresses issue 115](https://github.com/weecology/LDATS/issues/115))
* Added `control` input in the `plot` call in the example in the README ([addresses issue 116](https://github.com/weecology/LDATS/issues/116))
* Reduced the number of seeds in the rodent vignette example ([addresses issue 117](https://github.com/weecology/LDATS/issues/117))

## Updated calculation of the number of observations in LDA
* The number of observations for a VEM-fit LDA is now calculated as the number of entries in the document-term matrix (following Hoffman et al. and Buntine; see `?logLik.LDA_VEM` for references).
* Relatedly, we now include a general `AICc` function, which also works in this specific case as defined ([addresses issue 129](https://github.com/weecology/LDATS/issues/129))
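
To show why the observation count matters, here is a small sketch of the standard small-sample correction, AICc = AIC + (2k² + 2k) / (n − k − 1), applied to a `logLik` object. Whether the exported `AICc` uses exactly this form is an assumption; `aicc_sketch` and the toy values are hypothetical.

```r
# Hypothetical helper: computes AICc from the "df" and "nobs"
# attributes that a logLik object carries.
aicc_sketch <- function(ll){
  k <- attr(ll, "df")     # number of estimated parameters
  n <- attr(ll, "nobs")   # number of observations
  aic <- -2 * as.numeric(ll) + 2 * k
  aic + (2 * k^2 + 2 * k) / (n - k - 1)
}

# Toy logLik object with made-up values, for illustration only.
ll <- structure(-100, df = 3, nobs = 20, class = "logLik")
aicc_sketch(ll)   # 206 + 24 / 16 = 207.5
```

The correction term depends on `nobs`, so counting observations as entries in the document-term matrix rather than as rows changes the AICc value.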

## Fixed bug in plotting across multiple outputs
* A few plotting functions use `devAskNewPage` to help flip through multiple outputs, but were only resetting it with `devAskNewPage(FALSE)` at the end of a clean execution. The code has been updated with `on.exit(devAskNewPage(FALSE))`, which accounts for failed executions. ([addresses issue 118](https://github.com/weecology/LDATS/issues/118))
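
The `on.exit` pattern can be sketched without a graphics device. In this hedged illustration a global option stands in for the `devAskNewPage` state; `flip_pages_sketch` and the option name `ldats_sketch_ask` are hypothetical.

```r
# Hypothetical function: sets a piece of global state, then registers
# the reset with on.exit() so it runs even if the body errors.
flip_pages_sketch <- function(fail = FALSE){
  old <- options(ldats_sketch_ask = TRUE)
  on.exit(options(old))
  if (fail){
    stop("plotting failed part-way through")
  }
  invisible(NULL)
}

# The call errors, but the option is still restored afterwards.
try(flip_pages_sketch(fail = TRUE), silent = TRUE)
is.null(getOption("ldats_sketch_ask"))
```

A reset placed on the last line of the function body would be skipped by the `stop()` call, which is the failure mode described in the bullet above.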

## Renamed functions
* `summarize_TS` has been renamed `package_TS` to align with the other `package_` functions that package model output.

## Simulate functions
* Basic simulation functionality has been added for help with generating data sets to analyze. ([addresses issue 114](https://github.com/weecology/LDATS/issues/114))
* `sim_LDA_data` simulates an LDA model's document-term-matrix
* `sim_TS_data` simulates a TS model's document-topic distribution matrix
* `sim_LDA_TS_data` simulates an LDA_TS model's document-term-matrix
* `softmax` and `logsumexp` are added as utility functions
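
For readers unfamiliar with the two utilities, the sketch below shows the usual numerically stable formulations. These are not the package implementations; `logsumexp_sketch` and `softmax_sketch` are hypothetical names, and the exported functions may differ in signature and detail.

```r
# Hypothetical stable log-sum-exp: subtracting the maximum before
# exponentiating avoids overflow for large inputs.
logsumexp_sketch <- function(x){
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Hypothetical softmax built on top of it; the result sums to 1.
softmax_sketch <- function(x){
  exp(x - logsumexp_sketch(x))
}

# A naive exp(x) / sum(exp(x)) would overflow on these values.
p <- softmax_sketch(c(1000, 1001, 1002))
sum(p)
```

Softmax maps unconstrained values to a probability vector, which is the form a simulated document-topic distribution needs.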

## Improved pkgdown site
* Function organization ([addresses issue 122](https://github.com/weecology/LDATS/issues/122)) and navbar formatting.

## Editing of output from `TS`
* Due to a misread of earlier code, the AIC value in the output from `TS` was named "deviance". The output has been updated to return the AIC.

## Replacement of `AIC` method with `logLik` method for `TS_fit`

# [LDATS 0.1.0](https://github.com/weecology/LDATS/pull/105)
*2019-02-11*

91 changes: 53 additions & 38 deletions R/LDA.R
@@ -21,8 +21,11 @@
#' @param nseeds Number of seeds (replicate starts) to use for each
#' value of \code{topics}. Must be conformable to \code{integer} value.
#'
#' @param control Class \code{LDA_controls} list of control parameters to be
#' used in \code{LDA} (note that \code{seed} will be overwritten).
#' @param control A \code{list} of parameters to control the running and
#' selecting of LDA models. Values not input assume default values set
#' by \code{\link{LDA_set_control}}. Values for running the LDAs replace
#' defaults in \code{LDAcontrol} (see \code{\link[topicmodels]{LDA}}), but if
#' \code{seed} is given, it will be overwritten; use \code{iseed} instead.
#'
#' @return List (class: \code{LDA_set}) of LDA models (class: \code{LDA_VEM}).
#'
@@ -46,10 +49,12 @@
#' @export
#'
LDA_set <- function(document_term_table, topics = 2, nseeds = 1,
control = LDA_controls_list()){
control = list()){
check_LDA_set_inputs(document_term_table, topics, nseeds, control)
control <- do.call("LDA_set_control", control)
mod_topics <- rep(topics, each = length(seq(2, nseeds * 2, 2)))
mod_seeds <- rep(seq(2, nseeds * 2, 2), length(topics))
iseed <- control$iseed
mod_seeds <- rep(seq(iseed, iseed + (nseeds - 1) * 2, 2), length(topics))
nmods <- length(mod_topics)
mods <- vector("list", length = nmods)
for (i in 1:nmods){
@@ -63,9 +68,13 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1,

#' @title Calculate the log likelihood of a VEM LDA model fit
#'
#' @description Imported calculations from topicmodels package, as applied to
#' Latent Dirichlet Allocation fit with Variational Expectation Maximization
#' via \code{\link[topicmodels]{LDA}}.
#' @description Imported but updated calculations from topicmodels package, as
#' applied to Latent Dirichlet Allocation fit with Variational Expectation
#' Maximization via \code{\link[topicmodels]{LDA}}.
#'
#' @details The number of degrees of freedom is 1 (for alpha) plus the number
#' of entries in the document-topic matrix. The number of observations is
#' the number of entries in the document-term matrix.
#'
#' @param object A \code{LDA_VEM}-class object.
#'
@@ -75,17 +84,26 @@ LDA_set <- function(document_term_table, topics = 2, nseeds = 1,
#' (degrees of freedom) and \code{nobs} (number of observations) values.
#'
#' @references
#' Buntine, W. 2002. Variational extensions to EM and multinomial PCA.
#' \emph{European Conference on Machine Learning, Lecture Notes in Computer
#' Science} \strong{2430}:23-34. \href{https://bit.ly/327sltH}{link}.
#'
#' Grun, B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic
#' Models. \emph{Journal of Statistical Software} \strong{40}:13.
#' \href{https://www.jstatsoft.org/article/view/v040i13}{link}.
#'
#' Hoffman, M. D., D. M. Blei, and F. Bach. 2010. Online learning for
#' latent Dirichlet allocation. \emph{Advances in Neural Information
#' Processing Systems} \strong{23}:856-864.
#' \href{https://bit.ly/2LEr5sb}{link}.
#'
#' @export
#'
logLik.LDA_VEM <- function(object, ...){
val <- sum(object@loglikelihood)
df <- as.integer(object@control@estimate.alpha) + length(object@beta)
attr(val, "df") <- df
attr(val, "nobs") <- object@Dim[1]
attr(val, "nobs") <- object@Dim[1] * object@Dim[2]
class(val) <- "logLik"
val
}
@@ -104,16 +122,15 @@ check_LDA_set_inputs <- function(document_term_table, topics, nseeds,
check_document_term_table(document_term_table)
check_topics(topics)
check_seeds(nseeds)
check_control(control, "LDA_controls")
check_control(control)
}

#' @title Set the control inputs to include the seed
#'
#' @description Update the control list for the LDA model with the specific
#' seed as indicated.
#' seed as indicated, and remove controls not used within the LDA itself.
#'
#' @param seed \code{number} of seeds (replicate starts) to use for the
#' specific model.
#' @param seed \code{integer} used to set the seed of the specific model.
#'
#' @param control Named list of control parameters to be used in
#' \code{\link[topicmodels]{LDA}}. Note that if \code{control} has an
@@ -124,17 +141,12 @@ check_LDA_set_inputs <- function(document_term_table, topics, nseeds,
#'
#' @export
#'
prep_LDA_control <- function(seed, control = NULL){
if("LDA_controls" %in% class(control)){
class(control) <- "list"
control$quiet <- NULL
control$measurer <- NULL
control$selector <- NULL
control$seed <- seed
}
if(is.null(control)){
control <- list(seed = seed)
}
prep_LDA_control <- function(seed, control = list()){
control$quiet <- NULL
control$measurer <- NULL
control$selector <- NULL
control$iseed <- NULL
control$seed <- seed
control
}

@@ -147,9 +159,11 @@ prep_LDA_control <- function(seed, control = NULL){
#' @param LDA_models An object of class \code{LDA_set} produced by
#' \code{\link{LDA_set}}.
#'
#' @param control Class \code{LDA_controls} list (generated by
#' \code{\link{LDA_controls_list}}) including named elements
#' corresponding to the \code{measurer} and \code{evaluator} functions.
#' @param control A \code{list} of parameters to control the running and
#' selecting of LDA models. Values not input assume default values set
#' by \code{\link{LDA_set_control}}. Values for running the LDAs replace
#' defaults in \code{LDAcontrol} (see \code{\link[topicmodels]{LDA}}), but if
#' \code{seed} is given, it will be overwritten; use \code{iseed} instead.
#'
#' @return A reduced version of \code{LDA_models} that only includes the
#' selected LDA model(s). The returned object is still an object of
@@ -165,14 +179,13 @@ prep_LDA_control <- function(seed, control = NULL){
#'
#' @export
#'
select_LDA <- function(LDA_models = NULL, control = LDA_controls_list()){

measurer <- control$measurer
selector <- control$selector
select_LDA <- function(LDA_models = NULL, control = list()){
if("LDA_set" %in% attr(LDA_models, "class") == FALSE){
stop("LDA_models must be of class LDA_set")
}

control <- do.call("LDA_set_control", control)
measurer <- control$measurer
selector <- control$selector
lda_measured <- vapply(LDA_models, measurer, 0) %>%
matrix(ncol = 1)
lda_selected <- apply(lda_measured, 2, selector)
@@ -227,15 +240,16 @@ package_LDA_set <- function(mods, mod_topics, mod_seeds){
#'
#' @export
#'
LDA_msg <- function(mod_topics, mod_seeds, control){
LDA_msg <- function(mod_topics, mod_seeds, control = list()){
control <- do.call("LDA_set_control", control)
check_topics(mod_topics)
check_seeds(mod_seeds)
topic_msg <- paste0("Running LDA with ", mod_topics, " topics ")
seed_msg <- paste0("(seed ", mod_seeds, ")")
qprint(paste0(topic_msg, seed_msg), "", control$quiet)
}

#' @title Create control list for LDA model
#' @title Create control list for set of LDA models
#'
#' @description This function provides a simple creation and definition of
#' the list used to control the set of LDA models. It is set up to be easy
@@ -250,16 +264,17 @@ LDA_msg <- function(mod_topics, mod_seeds, control){
#' and \code{selector} operates on the values to choose the model(s) to
#' pass on.
#'
#' @param iseed \code{integer} initial seed for the model set.
#'
#' @param ... Additional arguments to be passed to
#' \code{\link[topicmodels]{LDA}} as a \code{control} input.
#'
#' @return \code{list} for controlling the LDA model fit.
#'
#' @export
#'
LDA_controls_list <- function(quiet = FALSE, measurer = AIC, selector = min,
...){
out <- list(quiet = quiet, measurer = measurer, selector = selector, ...)
class(out) <- c("LDA_controls", "list")
out
LDA_set_control <- function(quiet = FALSE, measurer = AIC, selector = min,
iseed = 2, ...){
list(quiet = quiet, measurer = measurer, selector = selector,
iseed = iseed, ...)
}
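
As a worked illustration of how the new `iseed` control shifts the seed sequence built in `LDA_set`, the sketch below reproduces the two `rep()` calls with toy inputs. The values of `topics`, `nseeds`, and `iseed` are invented for the example.

```r
# Toy inputs, chosen for illustration only.
topics <- c(2, 3)
nseeds <- 3
iseed <- 2

# Seeds start at iseed and step by 2, one per replicate start.
seeds <- seq(iseed, iseed + (nseeds - 1) * 2, 2)    # 2 4 6

# Every topic count is paired with every seed.
mod_topics <- rep(topics, each = length(seeds))     # 2 2 2 3 3 3
mod_seeds <- rep(seeds, length(topics))             # 2 4 6 2 4 6
```

With the default `iseed = 2` this matches the seeds used before the change (2, 4, 6, ...), so existing results should be reproducible unless `iseed` is set explicitly.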
16 changes: 10 additions & 6 deletions R/LDATS.R
@@ -1,5 +1,6 @@
#' @importFrom coda as.mcmc autocorr autocorr.diag effectiveSize HPDinterval
#' @importFrom digest digest
#' @importFrom extraDistr rcat rdirichlet
#' @importFrom graphics abline axis hist mtext par plot points rect text
#' @importFrom grDevices devAskNewPage rgb
#' @importFrom lubridate is.Date
@@ -9,8 +10,8 @@
#' @importFrom mvtnorm rmvnorm
#' @importFrom nnet multinom
#' @importFrom progress progress_bar
#' @importFrom stats acf AIC as.formula coef ecdf logLik median rgeom runif sd
#' terms var vcov
#' @importFrom stats acf AIC as.formula coef ecdf logLik median rgeom rnorm
#' runif sd terms var vcov
#' @importFrom topicmodels LDA
#' @importFrom viridis viridis
#'
@@ -23,12 +24,15 @@
#' 2003) and Bayesian Time Series models (Western and Kleykamp 2004) that we
#' extend for multinomial data using softmax regression (Venables and Ripley
#' 2002) following Christensen \emph{et al.} (2018).
#' \cr \cr
#' \href{https://github.com/weecology/LDATS/blob/master/manuscript/simonis_et_al.pdf}{Technical mathematical manuscript}
#'
#' @section Documentation:
#' \href{https://bit.ly/2Jq73A5}{Technical mathematical manuscript}
#' \cr \cr
#' \href{https://weecology.github.io/LDATS/articles/rodents-example.html}{End-user-focused vignette worked example}
#' \href{https://bit.ly/2Jvj9GS}{End-user-focused vignette worked example}
#' \cr \cr
#' \href{https://weecology.github.io/LDATS/articles/LDATS_codebase.html}{Computational pipeline vignette}
#' \href{https://bit.ly/2xFzJOW}{Computational pipeline vignette}
#' \cr \cr
#' \href{https://bit.ly/2NFTVLh}{Comparison to Christensen \emph{et al.}}
#'
#' @references
#'