Title: | Latent Dirichlet Allocation Coupled with Time Series Analyses |
Version: | 0.3.0 |
Description: | Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>. |
URL: | https://weecology.github.io/LDATS/, https://github.com/weecology/LDATS |
BugReports: | https://github.com/weecology/LDATS/issues |
Depends: | R (≥ 3.5.0) |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | coda, digest, extraDistr, graphics, grDevices, lubridate, magrittr, memoise, methods, mvtnorm, nnet, progress, stats, topicmodels, viridis |
Suggests: | knitr, pkgdown, rmarkdown, testthat, vdiffr |
SystemRequirements: | gsl |
VignetteBuilder: | knitr |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2023-09-18 16:29:40 UTC; dappe |
Author: | Juniper L. Simonis |
Maintainer: | Juniper L. Simonis <juniper.simonis@weecology.org> |
Repository: | CRAN |
Date/Publication: | 2023-09-19 09:10:06 UTC |
Calculate AICc
Description
Calculate the small sample size correction of AIC for the input object.
Usage
AICc(object)
Arguments
object |
Value
numeric value of AICc.
Examples
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(dat)
AICc(mod)
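For orientation, the small-sample correction can be sketched in base R. This is a minimal sketch assuming the standard formula AICc = AIC + 2k(k + 1)/(n - k - 1), with k the number of estimated parameters and n the number of observations; aicc_sketch is a hypothetical helper for illustration, not the LDATS implementation of AICc.

```r
# Hypothetical helper (not part of LDATS): AICc from a fitted model object
aicc_sketch <- function(object) {
  ll <- logLik(object)
  k <- attr(ll, "df")   # number of estimated parameters
  n <- nobs(object)     # number of observations
  aic <- -2 * as.numeric(ll) + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

set.seed(1)
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(y ~ x, data = dat)
aicc_value <- aicc_sketch(mod)
aicc_value
```

The correction term shrinks toward 0 as n grows relative to k, so AICc and AIC agree for large samples.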
Package to conduct two-stage analyses combining Latent Dirichlet Allocation with Bayesian Time Series models
Description
Performs two-stage analysis of multivariate temporal data using a combination of Latent Dirichlet Allocation (Blei et al. 2003) and Bayesian Time Series models (Western and Kleykamp 2004) that we extend for multinomial data using softmax regression (Venables and Ripley 2002) following Christensen et al. (2018).
Documentation
Technical mathematical manuscript
End-user-focused vignette worked example
Computational pipeline vignette
Comparison to Christensen et al.
References
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
Run a full set of Latent Dirichlet Allocations and Time Series models
Description
Conduct a complete LDATS analysis (Christensen et al. 2018), including running a suite of Latent Dirichlet Allocation (LDA) models (Blei et al. 2003, Grun and Hornik 2011) via LDA_set, selecting LDA model(s) via select_LDA, running a complete set of Bayesian Time Series (TS) models (Western and Kleykamp 2004) via TS_on_LDA on the chosen LDA model(s), and selecting the best TS model via select_TS.
conform_LDA_TS_data converts the data input to match internal and sub-function specifications.
check_LDA_TS_inputs checks that the inputs to LDA_TS are of proper classes for a full analysis.
Usage
LDA_TS(
data,
topics = 2,
nseeds = 1,
formulas = ~1,
nchangepoints = 0,
timename = "time",
weights = TRUE,
control = list()
)
conform_LDA_TS_data(data, quiet = FALSE)
check_LDA_TS_inputs(
data = NULL,
topics = 2,
nseeds = 1,
formulas = ~1,
nchangepoints = 0,
timename = "time",
weights = TRUE,
control = list()
)
Arguments
data |
Either a document term table or a list including at least
a document term table (with the word "term" in the name of the element)
and optionally also a document covariate table (with the word
"covariate" in the name of the element).
|
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
nseeds |
|
formulas |
Vector of |
nchangepoints |
Vector of |
timename |
|
weights |
Optional input for overriding standard weighting for
documents in the time series. Defaults to |
control |
A |
quiet |
|
Value
LDA_TS: a class LDA_TS list object including all fitted LDA and TS models and selected models specifically as elements "LDA models" (from LDA_set), "Selected LDA model" (from select_LDA), "TS models" (from TS_on_LDA), and "Selected TS model" (from select_TS).
conform_LDA_TS_data: a data list that is ready for analyses using the stage-specific functions.
check_LDA_TS_inputs: an error message is thrown if any input is improper, otherwise NULL.
References
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
Examples
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
nchangepoints = 1, timename = "newmoon")
conform_LDA_TS_data(rodents)
check_LDA_TS_inputs(rodents, timename = "newmoon")
Create the controls list for the LDATS model
Description
Create and define a list of control options used to run the LDATS model, as implemented by LDA_TS.
Usage
LDA_TS_control(
quiet = FALSE,
measurer_LDA = AIC,
selector_LDA = min,
iseed = 2,
memoise = TRUE,
response = "gamma",
lambda = 0,
measurer_TS = AIC,
selector_TS = min,
ntemps = 6,
penultimate_temp = 2^6,
ultimate_temp = 1e+10,
q = 0,
nit = 10000,
magnitude = 12,
burnin = 0,
thin_frac = 1,
summary_prob = 0.95,
seed = NULL,
...
)
Arguments
quiet |
|
measurer_LDA , selector_LDA |
Function names for use in evaluation of
the LDA models. |
iseed |
|
memoise |
|
response |
|
lambda |
|
measurer_TS , selector_TS |
Function names for use in evaluation of the
TS models. |
ntemps |
|
penultimate_temp |
Penultimate temperature in the ptMCMC sequence. |
ultimate_temp |
Ultimate temperature in the ptMCMC sequence. |
q |
Exponent controlling the ptMCMC temperature sequence from the focal chain (reference with temperature = 1) to the penultimate chain. 0 (default) implies a geometric sequence. 1 implies squaring before exponentiating. |
nit |
|
magnitude |
Average magnitude (defining a geometric distribution) for the proposed step size in the ptMCMC algorithm. |
burnin |
|
thin_frac |
Fraction of iterations to retain, from the ptMCMC. Must be
|
summary_prob |
Probability used for summarizing the posterior
distributions (via the highest posterior density interval, see
|
seed |
Input to |
... |
Additional arguments to be passed to
|
Value
list of control lists, with named elements LDAcontrol, TScontrol, and quiet.
Examples
LDA_TS_control()
Create the model-running-message for an LDA
Description
Produce and print the message for a given LDA model.
Usage
LDA_msg(mod_topics, mod_seeds, control = list())
Arguments
mod_topics |
|
mod_seeds |
|
control |
Class |
Examples
LDA_msg(mod_topics = 4, mod_seeds = 2)
Run a set of Latent Dirichlet Allocation models
Description
For a given dataset consisting of counts of words across multiple documents in a corpus, conduct multiple Latent Dirichlet Allocation (LDA) models (using the Variational Expectation Maximization (VEM) algorithm; Blei et al. 2003) to account for [1] uncertainty in the number of latent topics and [2] the impact of initial values in the estimation procedure.
LDA_set is a list wrapper of LDA in the topicmodels package (Grun and Hornik 2011).
check_LDA_set_inputs checks that all of the inputs are proper for LDA_set (that the table of observations is conformable to a matrix of integers, the number of topics is an integer, the number of seeds is an integer, and the controls list is proper).
Usage
LDA_set(document_term_table, topics = 2, nseeds = 1, control = list())
check_LDA_set_inputs(document_term_table, topics, nseeds, control)
Arguments
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
nseeds |
Number of seeds (replicate starts) to use for each
value of |
control |
A |
Value
LDA_set: list (class: LDA_set) of LDA models (class: LDA_VEM).
check_LDA_set_inputs: an error message is thrown if any input is improper, otherwise NULL.
References
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
Create control list for set of LDA models
Description
This function creates and defines the list used to control the set of LDA models. It is designed to work easily with the existing control capacity of LDA.
Usage
LDA_set_control(quiet = FALSE, measurer = AIC, selector = min, iseed = 2, ...)
Arguments
quiet |
|
measurer , selector |
Function names for use in evaluation of the LDA
models. |
iseed |
|
... |
Additional arguments to be passed to
|
Value
list for controlling the LDA model fit.
Examples
LDA_set_control()
Conduct a single multinomial Bayesian Time Series analysis
Description
This is the main interface function for the LDATS application of Bayesian change point Time Series analyses (Christensen et al. 2018), which extends the model of Western and Kleykamp (2004; see also Ruggieri 2013) to multinomial (proportional) response data using softmax regression (Ripley 1996, Venables and Ripley 2002, Bishop 2006) within a generalized linear modeling approach (McCullagh and Nelder 1989). The models are fit using parallel tempering Markov Chain Monte Carlo (ptMCMC) methods (Earl and Deem 2005) to locate change points and neural networks (Ripley 1996, Venables and Ripley 2002, Bishop 2006) to estimate regressors.
check_TS_inputs checks that the inputs to TS are of proper classes for a full analysis.
Usage
TS(
data,
formula = gamma ~ 1,
nchangepoints = 0,
timename = "time",
weights = NULL,
control = list()
)
check_TS_inputs(
data,
formula = gamma ~ 1,
nchangepoints = 0,
timename = "time",
weights = NULL,
control = list()
)
Arguments
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
Value
TS: TS_fit-class list containing the following elements, many of which are hidden for printing, but are accessible:
- data: data input to the function.
- formula: formula input to the function.
- nchangepoints: nchangepoints input to the function.
- weights: weights input to the function.
- control: control input to the function.
- lls: Iteration-by-iteration logLik values for the full time series fit by multinom_TS.
- rhos: Iteration-by-iteration change point estimates from est_changepoints.
- etas: Iteration-by-iteration marginal regressor estimates from est_regressors, which have been unconditioned with respect to the change point locations.
- ptMCMC_diagnostics: ptMCMC diagnostics, see diagnose_ptMCMC.
- rho_summary: Summary table describing rhos (the change point locations), see summarize_rhos.
- rho_vcov: Variance-covariance matrix for the estimates of rhos (the change point locations), see measure_rho_vcov.
- eta_summary: Summary table describing etas (the regressors), see summarize_etas.
- eta_vcov: Variance-covariance matrix for the estimates of etas (the regressors), see measure_eta_vcov.
- logLik: Across-iteration average of log-likelihoods (lls).
- nparams: Total number of parameters in the full model, including the change point locations and regressors.
- deviance: Penalized negative log-likelihood, based on logLik and nparams.
check_TS_inputs: An error message is thrown if any input is not proper, else NULL.
References
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY, USA.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7:3910-3916.
McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models. 2nd Edition. Chapman and Hall, New York, NY, USA.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.
Ruggieri, E. 2013. A Bayesian approach to detecting change points in climatic records. International Journal of Climatology 33:520-528.
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
check_TS_inputs(data, timename = "newmoon")
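The softmax link that maps regressors to topic proportions can be illustrated in base R. This is a minimal sketch, assuming a reference-category parameterization in which the first topic has its linear predictor fixed at 0; softmax_sketch and the eta values are hypothetical and are not part of LDATS.

```r
# Hypothetical helper (not part of LDATS): softmax over linear predictors
softmax_sketch <- function(eta) {
  z <- exp(eta - max(eta))   # subtract the max for numerical stability
  z / sum(z)
}

# Linear predictors for three topics; the first is the reference (fixed at 0)
eta <- c(0, 0.8, -0.4)
props <- softmax_sketch(eta)
props        # topic proportions, all positive
sum(props)   # proportions sum to 1
```

Because the proportions always sum to 1, only the non-reference topics carry free regressors, which is why a model with K topics estimates K - 1 sets of coefficients per segment.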
Create the controls list for the Time Series model
Description
This function creates and defines the list used to control the time series model fit occurring within TS.
Usage
TS_control(
memoise = TRUE,
response = "gamma",
lambda = 0,
measurer = AIC,
selector = min,
ntemps = 6,
penultimate_temp = 2^6,
ultimate_temp = 1e+10,
q = 0,
nit = 10000,
magnitude = 12,
quiet = FALSE,
burnin = 0,
thin_frac = 1,
summary_prob = 0.95,
seed = NULL
)
Arguments
memoise |
|
response |
|
lambda |
|
measurer , selector |
Function names for use in evaluation of the TS
models. |
ntemps |
|
penultimate_temp |
Penultimate temperature in the ptMCMC sequence. |
ultimate_temp |
Ultimate temperature in the ptMCMC sequence. |
q |
Exponent controlling the ptMCMC temperature sequence from the focal chain (reference with temperature = 1) to the penultimate chain. 0 (default) implies a geometric sequence. 1 implies squaring before exponentiating. |
nit |
|
magnitude |
Average magnitude (defining a geometric distribution) for the proposed step size in the ptMCMC algorithm. |
quiet |
|
burnin |
|
thin_frac |
Fraction of iterations to retain, must be |
summary_prob |
Probability used for summarizing the posterior
distributions (via the highest posterior density interval, see
|
seed |
Input to |
Value
list, with named elements corresponding to the arguments.
Examples
TS_control()
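To make the roles of ntemps, penultimate_temp, ultimate_temp, and q concrete, the following base R sketch builds one plausible temperature ladder consistent with the argument descriptions above. It is an assumption for illustration: temp_ladder_sketch is a hypothetical helper, and the exact formula used inside LDATS is not stated here and may differ.

```r
# Hypothetical helper (not part of LDATS): build a ptMCMC temperature ladder
temp_ladder_sketch <- function(ntemps = 6, penultimate_temp = 2^6,
                               ultimate_temp = 1e10, q = 0) {
  top <- log2(penultimate_temp)
  # evenly spaced exponents from the focal chain (2^0 = 1) to the penultimate
  steps <- seq(0, top, length.out = ntemps - 1)
  # q = 0 leaves the exponents linear (geometric temperatures);
  # q = 1 squares them (rescaled) before exponentiating
  log_temps <- steps^(1 + q) / top^q
  c(2^log_temps, ultimate_temp)
}

temps <- temp_ladder_sketch()
temps
temps[2:5] / temps[1:4]   # constant ratio, i.e. a geometric sequence
```

Under these assumptions the focal chain sits at temperature 1, the penultimate chain at 2^6, and the final chain at the very large ultimate temperature.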
Plot the diagnostics of the parameters fit in a TS model
Description
Plot 4-panel figures (showing trace plots, posterior ECDF, posterior density, and iteration autocorrelation) for each of the parameters (change point locations and regressors) fitted within a multinomial time series model (fit by TS).
eta_diagnostics_plots creates the diagnostic plots for the regressors (etas) of a time series model.
rho_diagnostics_plots creates the diagnostic plots for the change point locations (rho) of a time series model.
Usage
TS_diagnostics_plot(x, interactive = TRUE)
eta_diagnostics_plots(x, interactive)
rho_diagnostics_plots(x, interactive)
Arguments
x |
Object of class |
interactive |
|
Value
NULL
.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
TS_diagnostics_plot(TSmod)
Conduct a set of Time Series analyses on a set of LDA models
Description
This is a wrapper function that expands the main Time Series analyses function (TS) across the LDA models (estimated using LDA or LDA_set) and the Time Series models, with respect to both continuous time formulas and the number of discrete changepoints. This function allows direct passage of the control parameters for the parallel tempering MCMC through to the main Time Series function, TS, via the control argument.
check_TS_on_LDA_inputs checks that the inputs to TS_on_LDA are of proper classes for a full analysis.
Usage
TS_on_LDA(
LDA_models,
document_covariate_table,
formulas = ~1,
nchangepoints = 0,
timename = "time",
weights = NULL,
control = list()
)
check_TS_on_LDA_inputs(
LDA_models,
document_covariate_table,
formulas = ~1,
nchangepoints = 0,
timename = "time",
weights = NULL,
control = list()
)
Arguments
LDA_models |
List of LDA models (class |
document_covariate_table |
Document covariate table (rows: documents,
columns: time index and covariate options). Every model needs a
covariate to describe the time value for each document (in whatever
units and whose name in the table is input in |
formulas |
Vector of |
nchangepoints |
Vector of |
timename |
|
weights |
Optional class |
control |
A |
Value
TS_on_LDA: TS_on_LDA-class list of results from TS applied for each model on each LDA model input.
check_TS_on_LDA_inputs: An error message is thrown if any input is not proper, else NULL.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
nchangepoints = 0:1, timename = "newmoon", weights)
Create the summary plot for a TS fit to an LDA model
Description
Produces a two-panel figure of [1] the change point distributions as histograms over time and [2] the time series of the fitted topic proportions over time, based on a selected set of change point locations.
pred_gamma_TS_plot produces a time series of the fitted topic proportions over time, based on a selected set of change point locations.
rho_hist makes a plot of the change point distributions as histograms over time.
Usage
TS_summary_plot(
x,
cols = set_TS_summary_plot_cols(),
bin_width = 1,
xname = NULL,
border = NA,
selection = "median",
LDATS = FALSE
)
pred_gamma_TS_plot(
x,
selection = "median",
cols = set_gamma_colors(x),
xname = NULL,
together = FALSE,
LDATS = FALSE
)
rho_hist(
x,
cols = set_rho_hist_colors(x$rhos),
bin_width = 1,
xname = NULL,
border = NA,
together = FALSE,
LDATS = FALSE
)
Arguments
x |
Object of class |
cols |
|
bin_width |
Width of the bins used in the histograms, in units of the x-axis (the time variable used to fit the model). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use. Currently only defined for "median" and "mode". |
LDATS |
|
together |
|
Value
NULL
.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
TS_summary_plot(TSmod)
pred_gamma_TS_plot(TSmod)
rho_hist(TSmod)
Produce the autocorrelation panel for the TS diagnostic plot of a parameter
Description
Produce a vanilla ACF plot using acf for the parameter of interest (rho or eta) as part of TS_diagnostics_plot.
Usage
autocorr_plot(x)
Arguments
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
Value
NULL
.
Examples
autocorr_plot(rnorm(100, 0, 1))
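The underlying computation can be reproduced directly with stats::acf. This is a minimal sketch using simulated autocorrelated draws (an assumption standing in for real posterior draws); it is not the internal code of autocorr_plot.

```r
set.seed(3)
# Simulated AR(1) series standing in for iteration-ordered posterior draws
draws <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# Compute the autocorrelation function without plotting, then inspect it
acf_out <- acf(draws, lag.max = 20, plot = FALSE)
round(acf_out$acf[1:4], 2)   # lag 0 is always 1

# Draw the ACF panel
plot(acf_out, main = "Autocorrelation of draws")
```

Slowly decaying autocorrelation suggests that the chain mixes poorly and that thinning or a longer run may be worth considering.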
Check that LDA model input is proper
Description
Check that the LDA_models input is either a set of LDA models (class LDA_set, produced by LDA_set) or a singular LDA model (class LDA, produced by LDA).
Usage
check_LDA_models(LDA_models)
Arguments
LDA_models |
List of LDA models or singular LDA model to evaluate. |
Value
An error message is thrown if LDA_models is not proper, else NULL.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2, nseeds = 1)
LDA_models <- select_LDA(LDAs)
check_LDA_models(LDA_models)
Check that a set of change point locations is proper
Description
Check that the change point locations are numeric and conformable to integer values.
Usage
check_changepoints(changepoints = NULL)
Arguments
changepoints |
Change point locations to evaluate. |
Value
An error message is thrown if changepoints are not proper, else NULL.
Examples
check_changepoints(100)
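The idea of a value being "conformable to integer" can be sketched in base R as follows. The helper is_integer_conformable is hypothetical and only illustrates the kind of test involved; it is not the check performed inside check_changepoints.

```r
# Hypothetical helper (not part of LDATS): TRUE if x is numeric and
# every element is a finite whole number
is_integer_conformable <- function(x) {
  is.numeric(x) && all(is.finite(x)) && all(x %% 1 == 0)
}

is_integer_conformable(100)           # whole number stored as a double
is_integer_conformable(c(10, 25.5))   # contains a fractional value
is_integer_conformable("100")         # character input is not numeric
```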
Check that a control list is proper
Description
Check that a list of controls is of the right class.
Usage
check_control(control, eclass = "list")
Arguments
control |
Control list to evaluate. |
eclass |
Expected class of the list to be evaluated. |
Value
An error message is thrown if the input is improper, otherwise NULL.
Examples
check_control(list())
Check that the document covariate table is proper
Description
Check that the table of document-level covariates is conformable to a data frame and of the right size (correct number of documents) for the document-topic output from the LDA models.
Usage
check_document_covariate_table(
document_covariate_table,
LDA_models = NULL,
document_term_table = NULL
)
Arguments
document_covariate_table |
Document covariate table to evaluate. |
LDA_models |
Reference LDA model list (class |
document_term_table |
Optional input for checking when
|
Value
An error message is thrown if document_covariate_table is not proper, else NULL.
Examples
data(rodents)
check_document_covariate_table(rodents$document_covariate_table)
Check that document term table is proper
Description
Check that the table of observations is conformable to a matrix of integers.
Usage
check_document_term_table(document_term_table)
Arguments
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
Value
An error message is thrown if the input is improper, otherwise NULL.
Examples
data(rodents)
check_document_term_table(rodents$document_term_table)
Check that a formula is proper
Description
Check that formula is actually a formula and that the response and predictor variables are all included in data.
Usage
check_formula(data, formula)
Arguments
data |
|
formula |
|
Value
An error message is thrown if formula is not proper, else NULL.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
check_formula(data, gamma ~ 1)
Check that formulas vector is proper and append the response variable
Description
Check that the vector of formulas is actually formatted as a vector of formula objects and that the predictor variables are all included in the document covariate table.
Usage
check_formulas(formulas, document_covariate_table, control = list())
Arguments
formulas |
Vector of the formulas to evaluate. |
document_covariate_table |
Document covariate table used to evaluate the availability of the data required by the formula inputs. |
control |
A |
Value
An error message is thrown if formulas is not proper, else NULL.
Examples
data(rodents)
check_formulas(~ 1, rodents$document_covariate_table)
Check that nchangepoints vector is proper
Description
Check that the vector of numbers of changepoints is conformable to non-negative integers (0 is a valid value, as in the first example below).
Usage
check_nchangepoints(nchangepoints)
Arguments
nchangepoints |
Vector of the number of changepoints to evaluate. |
Value
An error message is thrown if nchangepoints is not proper, else NULL.
Examples
check_nchangepoints(0)
check_nchangepoints(2)
Check that nseeds value or seeds vector is proper
Description
Check that the vector of numbers of seeds is conformable to integers greater than 0.
Usage
check_seeds(nseeds)
Arguments
nseeds |
|
Value
An error message is thrown if the input is improper, otherwise NULL.
Examples
check_seeds(1)
check_seeds(2)
Check that the time vector is proper
Description
Check that the vector of time values is included in the document covariate table and that it is either integer-conformable or a date. If it is a date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.
Usage
check_timename(document_covariate_table, timename)
Arguments
document_covariate_table |
Document covariate table used to query for the time column. |
timename |
Column name for the time variable to evaluate. |
Value
An error message is thrown if timename is not proper, else NULL.
Examples
data(rodents)
check_timename(rodents$document_covariate_table, "newmoon")
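The consequence of converting dates to integers can be seen with base R alone. This is a small illustration with hypothetical monthly sampling dates, showing why a 1-day timestep may not be what is wanted and how an explicit integer index avoids it.

```r
# Hypothetical monthly sampling dates
sample_dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Dates become days since 1970-01-01 when converted to integers
day_index <- as.integer(sample_dates)
day_index         # 18262 18293 18322
diff(day_index)   # 31 29: uneven steps, measured in days

# An explicit integer time index gives one step per sampling event
month_index <- seq_along(sample_dates)
diff(month_index)
```

Supplying an integer column such as month_index as the time variable keeps the timestep equal to one sampling period.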
Check that topics vector is proper
Description
Check that the vector of numbers of topics is conformable to integers greater than 1.
Usage
check_topics(topics)
Arguments
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
Value
An error message is thrown if the input is improper, otherwise NULL.
Examples
check_topics(2)
Check that weights vector is proper
Description
Check that the vector of document weights is numeric and positive and inform the user if the average weight isn't 1.
Usage
check_weights(weights)
Arguments
weights |
Vector of the document weights to evaluate, or |
Value
An error message is thrown if weights is not proper, else NULL.
Examples
check_weights(1)
wts <- runif(100, 0.1, 100)
check_weights(wts)
wts2 <- wts / mean(wts)
check_weights(wts2)
check_weights(TRUE)
Count trips of the ptMCMC particles
Description
Count the full trips (from one extreme temperature chain to the other and back again; Katzgraber et al. 2006) for each of the ptMCMC particles, as identified by their id on initialization.
This function was designed to work within TS and process the output of est_changepoints as a component of diagnose_ptMCMC, but has been generalized and would work with any output from a ptMCMC as long as ids is formatted properly.
Usage
count_trips(ids)
Arguments
ids |
|
Value
list of [1] vector of within-particle trip counts ($trip_counts), and [2] vector of within-particle average trip rates ($trip_rates).
References
Katzgraber, H. G., S. Trebst, D. A. Huse, and M. Troyer. 2006. Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment 3:P03018.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
count_trips(rho_dist$ids)
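The notion of a full trip can be sketched for a single particle in base R. This is a simplified illustration under stated assumptions: the input is a hypothetical vector giving the chain (temperature level) a particle occupies at each iteration, which may differ from the ids format expected by count_trips, and the counting convention here may not match the package exactly.

```r
# Hypothetical helper (not part of LDATS): count round trips between the
# two extreme chains (1 and n_chains) for one particle
count_round_trips_sketch <- function(position, n_chains) {
  # keep only the iterations spent at an extreme chain
  extremes <- position[position %in% c(1, n_chains)]
  if (length(extremes) == 0) return(0)
  # collapse consecutive repeats into a sequence of alternating visits
  visits <- rle(extremes)$values
  # each round trip needs two crossings: there and back again
  (length(visits) - 1) %/% 2
}

# A particle moving among 3 chains over 11 iterations
position <- c(1, 2, 3, 2, 1, 1, 2, 3, 3, 2, 1)
trips <- count_round_trips_sketch(position, n_chains = 3)
trips                      # 2 round trips
trips / length(position)   # per-iteration trip rate
```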
Calculate ptMCMC summary diagnostics
Description
Summarize the step and swap acceptance rates as well as trip metrics from the saved output of a ptMCMC estimation.
Usage
diagnose_ptMCMC(ptMCMCout)
Arguments
ptMCMCout |
Named |
Details
Within-chain step acceptance rates are averaged for each of the chains from the raw step acceptance histories (ptMCMCout$step_accepts) and between-chain swap acceptance rates are similarly averaged for each of the neighboring pairs of chains from the raw swap acceptance histories (ptMCMCout$swap_accepts).
Trips are defined as movement from one extreme chain to the other and back again (Katzgraber et al. 2006). Trips are counted and turned to per-iteration rates using count_trips.
This function was first designed to work within TS and process the output of est_changepoints, but has been generalized and would work with any output from a ptMCMC as long as ptMCMCout is formatted properly.
Value
list of [1] within-chain average step acceptance rates ($step_acceptance_rate), [2] average between-chain swap acceptance rates ($swap_acceptance_rate), [3] within-particle trip counts ($trip_counts), and [4] within-particle average trip rates ($trip_rates).
References
Katzgraber, H. G., S. Trebst, D. A. Huse, and M. Troyer. 2006. Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment 3:P03018.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, gamma ~ 1, 1, "newmoon",
weights, TS_control())
diagnose_ptMCMC(rho_dist)
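The averaging of acceptance histories described in Details can be sketched in base R. This is a minimal sketch on simulated data, assuming the histories are logical matrices with one row per chain (or per neighboring pair of chains) and one column per iteration; that layout is an assumption for illustration, not a statement about the structure of ptMCMCout.

```r
set.seed(42)
n_chains <- 3
n_iter <- 200

# Simulated step acceptance history (rows: chains, columns: iterations);
# the recycled thresholds give rows acceptance probabilities 0.6, 0.4, 0.2
step_accepts <- matrix(runif(n_chains * n_iter) < c(0.6, 0.4, 0.2),
                       nrow = n_chains)
# Simulated swap acceptance history for the neighboring pairs of chains
swap_accepts <- matrix(runif((n_chains - 1) * n_iter) < 0.3,
                       nrow = n_chains - 1)

# Average over iterations to get one rate per chain or pair
step_acceptance_rate <- rowMeans(step_accepts)
swap_acceptance_rate <- rowMeans(swap_accepts)
step_acceptance_rate
swap_acceptance_rate
```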
Calculate document weights for a corpus
Description
Simple calculation of document weights based on the average number of words in a document within the corpus (mean value = 1).
Usage
document_weights(document_term_table)
Arguments
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
Value
Vector of weights, one for each document, with the average sample receiving a weight of 1.0.
Examples
data(rodents)
document_weights(rodents$document_term_table)
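The weighting described above can be reproduced on a toy table in base R. This is a minimal sketch, assuming that each weight is the document's total word count divided by the mean total across documents; the toy matrix is hypothetical and the code is not the internal implementation of document_weights.

```r
# Hypothetical document term table (rows: documents, columns: terms)
dtt <- matrix(c(10,  0,  5,
                20, 10, 15,
                 5,  5, 20),
              nrow = 3, byrow = TRUE)

doc_totals <- rowSums(dtt)                  # words per document: 15 45 30
wts_sketch <- doc_totals / mean(doc_totals) # scale so the mean weight is 1
wts_sketch                                  # 0.5 1.5 1.0
mean(wts_sketch)                            # 1
```

Documents with more words than average receive weights above 1, so they contribute more to the time series fit.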
Produce the posterior distribution ECDF panel for the TS diagnostic plot of a parameter
Description
Produce a vanilla ECDF (empirical cumulative distribution function) plot using ecdf for the parameter of interest (rho or eta) as part of TS_diagnostics_plot. A horizontal line is added to show the median of the posterior.
Usage
ecdf_plot(x, xlab = "parameter value")
Arguments
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
xlab |
|
Value
NULL
.
Examples
ecdf_plot(rnorm(100, 0, 1))
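An equivalent panel can be built from stats::ecdf directly. This is a minimal sketch using simulated draws in place of a real posterior sample; it is not the internal code of ecdf_plot.

```r
set.seed(11)
draws <- rnorm(100)   # stand-in for posterior draws of a parameter

# Build the empirical cumulative distribution function and plot it
post_ecdf <- ecdf(draws)
plot(post_ecdf, main = "Posterior ECDF", xlab = "parameter value")

# Horizontal line at 0.5: the ECDF reaches it at the sample median
abline(h = 0.5, lty = 2)
post_ecdf(median(draws))
```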
Use ptMCMC to estimate the distribution of change point locations
Description
This function executes ptMCMC-based estimation of the change point location distributions for multinomial Time Series analyses.
Usage
est_changepoints(
data,
formula,
nchangepoints,
timename,
weights,
control = list()
)
Arguments
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
Value
List of saved data objects from the ptMCMC estimation of change point locations (unless nchangepoints is 0, then NULL is returned).
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon",
weights, control)
Estimate the distribution of regressors, unconditional on the change point locations
Description
This function uses the marginal posterior distributions of the change point locations (estimated by est_changepoints) in combination with the conditional (on the change point locations) posterior distributions of the regressors (estimated by multinom_TS) to estimate the marginal posterior distribution of the regressors, unconditional on the change point locations.
Usage
est_regressors(rho_dist, data, formula, timename, weights, control = list())
Arguments
rho_dist |
List of saved data objects from the ptMCMC estimation of
change point locations (unless |
data |
|
formula |
|
timename |
|
weights |
Optional class |
control |
A |
Details
The general approach follows that of Western and Kleykamp
(2004), although we note some important differences. Our regression
models are fit independently for each chunk (segment of time), and
therefore the variance-covariance matrix for the full model
has 0
entries for covariances between regressors in different
chunks of the time series. Further, because the regression model here
is a standard (non-hierarchical) softmax (Ripley 1996, Venables and
Ripley 2002, Bishop 2006), there is no error term in the regression
(as there is in the normal model used by Western and Kleykamp 2004),
and so the posterior distribution used here is a multivariate normal,
as opposed to a multivariate t, as used by Western and Kleykamp (2004).
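The block-diagonal structure and the multivariate normal draw can be illustrated with a short base R sketch for one fixed set of change points. All numbers and names (`eta_1`, `vcov_1`, and so on) are hypothetical, and the Cholesky-based draw is just one way to sample from a multivariate normal, not necessarily what the package does internally.

```r
set.seed(1)

# Hypothetical per-chunk estimates: coefficient means and covariance matrices.
eta_1 <- c(0.5, -0.2)
vcov_1 <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)
eta_2 <- c(1.1, 0.3)
vcov_2 <- matrix(c(0.02, 0.00, 0.00, 0.05), 2, 2)

# Chunks are fit independently, so cross-chunk covariances are 0 and the
# full variance-covariance matrix is block diagonal.
mu <- c(eta_1, eta_2)
sigma <- matrix(0, 4, 4)
sigma[1:2, 1:2] <- vcov_1
sigma[3:4, 3:4] <- vcov_2

# Draw from N(mu, sigma) using the Cholesky factor (base R only).
ndraws <- 1000
z <- matrix(rnorm(ndraws * length(mu)), nrow = ndraws)
draws <- sweep(z %*% chol(sigma), 2, mu, "+")
dim(draws)
```

Each row of `draws` is one joint draw of all coefficients. The full procedure described above would repeat such draws across the posterior distribution of change point locations to remove the conditioning on them.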
Value
matrix
of draws (rows) from the marginal posteriors of the
coefficients across the segments (columns).
References
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY, USA.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon",
weights, control)
eta_dist <- est_regressors(rho_dist, data, formula, "newmoon", weights,
control)
Expand the TS models across the factorial combination of LDA models, formulas, and number of change points
Description
Expand the completely crossed combination of model inputs: LDA model results, formulas, and number of change points.
Usage
expand_TS(LDA_models, formulas, nchangepoints)
Arguments
LDA_models |
List of LDA models (class |
formulas |
Vector of |
nchangepoints |
Vector of |
Value
Expanded data.frame
table of the three values (columns) for
each unique model run (rows): [1] the LDA model (indicated
as a numeric element reference to the LDA_models
object), [2] the
regressor formula, and [3] the number of changepoints.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
nchangepoints <- 0:1
expand_TS(LDA_models, formulas, nchangepoints)
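A completely crossed table of this kind can be sketched with base R's `expand.grid`. This is only an illustration of the factorial expansion: the column names and the choice to store formulas as strings are assumptions, not the package's internal representation.

```r
# Hypothetical inputs: two LDA models referenced by index, two formulas
# stored as strings, and zero or one change point.
grid <- expand.grid(LDA = 1:2,
                    formula = c("~ 1", "~ newmoon"),
                    nchangepoints = 0:1,
                    stringsAsFactors = FALSE)

# One row per unique model run: 2 x 2 x 2 combinations.
nrow(grid)
```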
Replace if TRUE
Description
If the focal input is TRUE
, replace it with
the alternative.
Usage
iftrue(x = TRUE, alt = NULL)
Arguments
x |
Focal input. |
alt |
Alternative value. |
Value
x
if not TRUE
, alt
otherwise.
Examples
iftrue()
iftrue(TRUE, 1)
iftrue(2, 1)
iftrue(FALSE, 1)
Jornada rodent data
Description
Counts of 17 rodent species across 24 sampling events, with the count being the total number observed across three trapping webs (146 traps in total) (Lightfoot et al. 2012).
Usage
jornada
Format
A list
of two data.frame
-class objects with rows
corresponding to documents (sampling events). One element is the
document term table (called document_term_table
), which contains
counts of the species (terms) in each sample (document), and the other is
the document covariate table (called document_covariate_table
)
with columns of covariates (time step, year, season).
Source
https://lter.jornada.nmsu.edu/data-catalog/
References
Lightfoot, D. C., A. D. Davidson, D. G. Parker, L. Hernandez, and J. W. Laundre. 2012. Bottom-up regulation of desert grassland and shrubland rodent communities: implications of species-specific reproductive potentials. Journal of Mammalogy 93:1017-1028.
Calculate the log likelihood of a VEM LDA model fit
Description
Imported but updated calculations from topicmodels package, as
applied to Latent Dirichlet Allocation fit with Variational Expectation
Maximization via LDA
.
Usage
## S3 method for class 'LDA_VEM'
logLik(object, ...)
Arguments
object |
A |
... |
Not used, simply included to maintain method compatibility. |
Details
The number of degrees of freedom is 1 (for alpha) plus the number of entries in the document-topic matrix. The number of observations is the number of entries in the document-term matrix.
Value
Log likelihood of the model logLik
, also with df
(degrees of freedom) and nobs
(number of observations) values.
References
Buntine, W. 2002. Variational extensions to EM and multinomial PCA. European Conference on Machine Learning, Lecture Notes in Computer Science 2430:23-34.
Grun, B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
Hoffman, M. D., D. M. Blei, and F. Bach. 2010. Online learning for latent Dirichlet allocation. Advances in Neural Information Processing Systems 23:856-864.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2)
logLik(r_LDA[[1]])
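To show how a `logLik` object carries its `df` and `nobs` values, here is a small base R sketch. The numbers are hypothetical and are not taken from a fitted LDA. The object is built by hand only to show the attributes that generic functions such as `AIC` and `nobs` read.

```r
# Hypothetical values; not the result of fitting any model.
ll_sketch <- structure(-1234.5, df = 11L, nobs = 200L, class = "logLik")

attr(ll_sketch, "df")   # degrees of freedom stored on the object
nobs(ll_sketch)         # number of observations stored on the object
AIC(ll_sketch)          # computed as -2 * logLik + 2 * df
```

Because the penalty is read from the `df` attribute, the way degrees of freedom are counted directly affects any information criterion computed from the returned object.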
Determine the log likelihood of a Time Series model
Description
Convenience function to extract and format the log likelihood
of a TS_fit
-class object fit by multinom_TS
.
Usage
## S3 method for class 'TS_fit'
logLik(object, ...)
Arguments
object |
Class |
... |
Not used, simply included to maintain method compatibility. |
Value
Log likelihood of the model logLik
, also with df
(degrees of freedom) and nobs
(number of observations) values.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
logLik(TSmod)
Log likelihood of a multinomial TS model
Description
Convenience function to simply extract the logLik
element (and df
and nobs
) from a multinom_TS_fit
object fit by multinom_TS
. Extends
logLik
from multinom
to
multinom_TS_fit
objects.
Usage
## S3 method for class 'multinom_TS_fit'
logLik(object, ...)
Arguments
object |
A |
... |
Not used, simply included to maintain method compatibility. |
Value
Log likelihood of the model, as class logLik
, with
attributes df
(degrees of freedom) and nobs
(the number of
weighted observations, accounting for size differences among documents).
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20,50),
timename = "newmoon", weights = weights)
logLik(mts)
Calculate the log-sum-exponential (LSE) of a vector
Description
Exponentiate a vector (offset by its maximum), sum the elements, take the log, and add the offset back.
Usage
logsumexp(x)
Arguments
x |
|
Value
The LSE.
Examples
logsumexp(1:10)
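The steps in the description can be written out directly. The following is a minimal sketch with a hypothetical name (`logsumexp_sketch`), not the package source. It shows why the offset matters for large inputs.

```r
logsumexp_sketch <- function(x) {
  offset <- max(x)                     # subtracting the max avoids overflow
  offset + log(sum(exp(x - offset)))   # add the offset back after the log
}

# The naive log(sum(exp(x))) overflows to Inf here; the offset version is finite.
naive <- log(sum(exp(c(1000, 1000))))
stable <- logsumexp_sketch(c(1000, 1000))
c(naive = naive, stable = stable)
```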
Logical control on whether or not to memoise
Description
This function provides a simple, logical toggle control on
whether the function fun
should be memoised via
memoise
or not.
Usage
memoise_fun(fun, memoise_tf = TRUE)
Arguments
fun |
Function name to (potentially) be memoised. |
memoise_tf |
|
Value
fun
, memoised if desired.
Examples
sum_memo <- memoise_fun(sum)
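The toggle idea can be sketched without the memoise package. The hand-rolled cache below is a simplified stand-in for illustration only. `memoise_toggle_sketch` is a hypothetical name, and the key construction via `deparse` is an assumption that is adequate only for small, simple arguments.

```r
memoise_toggle_sketch <- function(fun, memoise_tf = TRUE) {
  if (!memoise_tf) return(fun)             # toggle off: return fun unchanged
  cache <- new.env(parent = emptyenv())    # private store for past results
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(...), envir = cache) # compute once per distinct input
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Count how often the underlying function actually runs.
calls <- 0
slow_square <- function(x) { calls <<- calls + 1; x^2 }
memo_square <- memoise_toggle_sketch(slow_square, TRUE)
first <- memo_square(3)
second <- memo_square(3)   # served from the cache
calls
```

Memoisation pays off when the same inputs recur, for example when an iterative sampler repeatedly proposes the same segment of a time series.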
Optionally generate a message based on a logical input
Description
Given the input to quiet
, generate the message(s)
in msg
or not.
Usage
messageq(msg = NULL, quiet = FALSE)
Arguments
msg |
|
quiet |
|
Examples
messageq("hello")
messageq("hello", TRUE)
Create a properly symmetric variance covariance matrix
Description
A wrapper on vcov
to produce a symmetric
matrix. If the default matrix returned by vcov
is
symmetric it is returned simply. If it is not, in fact, symmetric
(as occurs occasionally with multinom
applied to
proportions), the matrix is made symmetric by averaging the lower and
upper triangles. If the relative difference between the upper and lower
triangles for any entry is more than 0.1
Usage
mirror_vcov(x)
Arguments
x |
Model object that has a defined method for
|
Value
Properly symmetric variance covariance matrix
.
Examples
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(dat)
mirror_vcov(mod)
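Averaging the lower and upper triangles amounts to averaging a matrix with its transpose. The sketch below illustrates only that step, using a hypothetical function name and a toy matrix. The relative-difference check mentioned in the description is omitted because its behaviour is not fully specified there.

```r
symmetrize_sketch <- function(m) {
  if (isSymmetric(m)) return(m)   # already symmetric: return as is
  (m + t(m)) / 2                  # average each lower/upper pair of entries
}

# Toy matrix with slightly mismatched off-diagonal entries.
m <- matrix(c(1.00, 0.20,
              0.21, 2.00), nrow = 2, byrow = TRUE)
sym <- symmetrize_sketch(m)
isSymmetric(sym)
```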
Determine the mode of a distribution
Description
Find the most common entry in a vector. Ties are not allowed: if there are ties, the first value encountered within the modal set is deemed the mode.
Usage
modalvalue(x)
Arguments
x |
|
Value
Numeric value of the mode.
Examples
d1 <- c(1, 1, 1, 2, 2, 3)
modalvalue(d1)
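One way to implement this tie-breaking rule is sketched below. It assumes that "first value encountered" means order of appearance in the input vector, which is an interpretation rather than something the text states. `modal_sketch` is a hypothetical name.

```r
modal_sketch <- function(x) {
  values <- unique(x)                    # unique values, in order of appearance
  counts <- tabulate(match(x, values))   # frequency of each unique value
  values[which.max(counts)]              # which.max returns the first maximum
}

m1 <- modal_sketch(c(1, 1, 1, 2, 2, 3))  # clear mode
m2 <- modal_sketch(c(3, 3, 2, 2, 1))     # tie between 3 and 2; 3 appears first
c(m1, m2)
```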
Fit a multinomial change point Time Series model
Description
Fit a set of multinomial regression models (via
multinom
, Venables and Ripley 2002) to a time series
of data divided into multiple segments (a.k.a. chunks) based on given
locations for a set of change points.
check_multinom_TS_inputs
checks that the inputs to
multinom_TS
are of proper classes for an analysis.
Usage
multinom_TS(
data,
formula,
changepoints = NULL,
timename = "time",
weights = NULL,
control = list()
)
check_multinom_TS_inputs(
data,
formula = gamma ~ 1,
changepoints = NULL,
timename = "time",
weights = NULL,
control = list()
)
Arguments
data |
|
formula |
|
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
weights |
Optional class |
control |
A |
Value
multinom_TS
: Object of class multinom_TS_fit
,
which is a list of [1]
chunk-level model fits ("chunk models"
), [2] the total log
likelihood combined across all chunks ("logLik"
), and [3] a
data.frame
of chunk beginning and ending times ("chunks"
with columns "start"
and "end"
).
check_multinom_TS_inputs
: an error message is thrown if any
input is improper, otherwise NULL
.
References
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
check_multinom_TS_inputs(dct, timename = "newmoon")
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20,50),
timename = "newmoon", weights = weights)
Fit a multinomial Time Series model chunk
Description
Fit a multinomial regression model (via
multinom
, Ripley 1996, Venables and Ripley 2002)
to a defined chunk of time (a.k.a. segment)
[chunk$start, chunk$end]
within a time series.
Usage
multinom_TS_chunk(
data,
formula,
chunk,
timename = "time",
weights = NULL,
control = list()
)
Arguments
data |
Class |
formula |
Formula as a |
chunk |
Length-2 vector of times: [1] |
timename |
|
weights |
Optional class |
control |
A |
Value
Fitted model object for the chunk, of classes multinom
and
nnet
.
References
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
chunk <- c(start = 0, end = 100)
mtsc <- multinom_TS_chunk(dct, formula = gamma ~ 1, chunk = chunk,
timename = "newmoon", weights = weights)
Normalize a vector
Description
Normalize a numeric
vector to be on the scale of [0,1].
Usage
normalize(x)
Arguments
x |
|
Value
Normalized x
.
Examples
normalize(1:10)
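A common way to rescale a vector onto [0,1] is min-max scaling. The sketch below assumes that is what is meant here, and `normalize_sketch` is a hypothetical name.

```r
normalize_sketch <- function(x) {
  # Min-max scaling; the result is undefined if all values are equal.
  (x - min(x)) / (max(x) - min(x))
}

scaled <- normalize_sketch(1:10)
range(scaled)
```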
Package the output of LDA_TS
Description
Combine the objects returned by LDA_set
,
select_LDA
, TS_on_LDA
, and
select_TS
, name them as elements of the list, and
set the class of the list as LDA_TS
, for the return from
LDA_TS
.
Usage
package_LDA_TS(LDAs, sel_LDA, TSs, sel_TSs)
Arguments
LDAs |
List (class: |
sel_LDA |
A reduced version of |
TSs |
Class |
sel_TSs |
A reduced version of |
Value
LDA_TS
-class object including all fitted models and
selected models specifically, ready to be returned from
LDA_TS
.
Examples
data(rodents)
data <- rodents
control <- LDA_TS_control()
dtt <- data$document_term_table
dct <- data$document_covariate_table
weights <- document_weights(dtt)
LDAs <- LDA_set(dtt, 2, 1, control$LDA_set_control)
sel_LDA <- select_LDA(LDAs, control$LDA_set_control)
TSs <- TS_on_LDA(sel_LDA, dct, ~1, 1, "newmoon", weights,
control$TS_control)
sel_TSs <- select_TS(TSs, control$TS_control)
package_LDA_TS(LDAs, sel_LDA, TSs, sel_TSs)
Package the output from LDA_set
Description
Name the elements (LDA models) and set the class
(LDA_set
) of the models returned by LDA_set
.
Usage
package_LDA_set(mods, mod_topics, mod_seeds)
Arguments
mods |
Fitted models returned from |
mod_topics |
Vector of |
mod_seeds |
Vector of |
Value
list
(class: LDA_set
) of LDA models (class:
LDA_VEM
).
Examples
data(rodents)
document_term_table <- rodents$document_term_table
topics <- 2
nseeds <- 2
control <- LDA_set_control()
mod_topics <- rep(topics, each = length(seq(2, nseeds * 2, 2)))
iseed <- control$iseed
mod_seeds <- rep(seq(iseed, iseed + (nseeds - 1)* 2, 2), length(topics))
nmods <- length(mod_topics)
mods <- vector("list", length = nmods)
for (i in 1:nmods){
LDA_msg(mod_topics[i], mod_seeds[i], control)
control_i <- prep_LDA_control(seed = mod_seeds[i], control = control)
mods[[i]] <- topicmodels::LDA(document_term_table, k = mod_topics[i],
control = control_i)
}
package_LDA_set(mods, mod_topics, mod_seeds)
Summarize the Time Series model
Description
Calculate relevant summaries for the run of a Time Series
model within TS
and package the output as a
TS_fit
-class object.
Usage
package_TS(data, formula, timename, weights, control, rho_dist, eta_dist)
Arguments
data |
|
formula |
|
timename |
|
weights |
Optional class |
control |
A |
rho_dist |
List of saved data objects from the ptMCMC estimation of
change point locations returned by |
eta_dist |
Matrix of draws (rows) from the marginal posteriors of the
coefficients across the segments (columns), as estimated by
|
Value
TS_fit
-class list containing the following elements, many of
which are hidden for printing, but are accessible:
- data: data input to the function.
- formula: formula input to the function.
- nchangepoints: nchangepoints input to the function.
- weights: weights input to the function.
- timename: timename input to the function.
- control: control input to the function.
- lls: Iteration-by-iteration logLik values for the full time series fit by multinom_TS.
- rhos: Iteration-by-iteration change point estimates from est_changepoints.
- etas: Iteration-by-iteration marginal regressor estimates from est_regressors, which have been unconditioned with respect to the change point locations.
- ptMCMC_diagnostics: ptMCMC diagnostics, see diagnose_ptMCMC.
- rho_summary: Summary table describing rhos (the change point locations), see summarize_rhos.
- rho_vcov: Variance-covariance matrix for the estimates of rhos (the change point locations), see measure_rho_vcov.
- eta_summary: Summary table describing etas (the regressors), see summarize_etas.
- eta_vcov: Variance-covariance matrix for the estimates of etas (the regressors), see measure_eta_vcov.
- logLik: Across-iteration average of log-likelihoods (lls).
- nparams: Total number of parameters in the full model, including the change point locations and regressors.
- AIC: Penalized negative log-likelihood, based on logLik and nparams.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon",
weights, control)
eta_dist <- est_regressors(rho_dist, data, formula, "newmoon", weights,
control)
package_TS(data, formula, "newmoon", weights, control, rho_dist,
eta_dist)
Package the output of TS_on_LDA
Description
Set the class and name the elements of the results list
returned from applying TS
to the combination of TS models
requested for the LDA model(s) input.
Usage
package_TS_on_LDA(TSmods, LDA_models, models)
Arguments
TSmods |
list of results from |
LDA_models |
List of LDA models (class |
models |
|
Value
Class TS_on_LDA
list of results from TS
applied for each model on each LDA model input.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
mods <- expand_TS(LDA_models, c(~ 1, ~ newmoon), 0:1)
nmods <- nrow(mods)
TSmods <- vector("list", nmods)
for(i in 1:nmods){
formula_i <- mods$formula[[i]]
nchangepoints_i <- mods$nchangepoints[i]
data_i <- prep_TS_data(document_covariate_table, LDA_models, mods, i)
TSmods[[i]] <- TS(data_i, formula_i, nchangepoints_i, "newmoon",
weights, TS_control())
}
package_TS_on_LDA(TSmods, LDA_models, mods)
Package the output of the chunk-level multinomial models into a multinom_TS_fit list
Description
Takes the list of fitted chunk-level models returned from
TS_chunk_memo
(the memoised version of
multinom_TS_chunk
) and packages it as a
multinom_TS_fit
object. This involves naming the model fits based
on the chunk time windows, combining the log likelihood values across the
chunks, and setting the class of the output object.
Usage
package_chunk_fits(chunks, fits)
Arguments
chunks |
Data frame of |
fits |
List of chunk-level fits returned by |
Value
Object of class multinom_TS_fit
, which is a list of [1]
chunk-level model fits, [2] the total log likelihood combined across
all chunks, and [3] the chunk time data table.
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
formula <- gamma ~ 1
changepoints <- c(20,50)
timename <- "newmoon"
TS_chunk_memo <- memoise_fun(multinom_TS_chunk, TRUE)
chunks <- prep_chunks(dct, changepoints, timename)
nchunks <- nrow(chunks)
fits <- vector("list", length = nchunks)
for (i in 1:nchunks){
fits[[i]] <- TS_chunk_memo(dct, formula, chunks[i, ], timename,
weights, TS_control())
}
package_chunk_fits(chunks, fits)
Plot the key results from a full LDATS analysis
Description
Generalization of the plot
function to
work on fitted LDA_TS model objects (class LDA_TS
) returned by
LDA_TS
.
Usage
## S3 method for class 'LDA_TS'
plot(
x,
...,
cols = set_LDA_TS_plot_cols(),
bin_width = 1,
xname = NULL,
border = NA,
selection = "median"
)
Arguments
x |
A |
... |
Additional arguments to be passed to subfunctions. Not currently
used, just retained for alignment with |
cols |
|
bin_width |
Width of the bins used in the histograms of the summary time series plot, in units of the time variable used to fit the model (the x-axis). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use in the time series
summary plot. Currently only defined for |
Value
NULL
.
Examples
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
nchangepoints = 1, timename = "newmoon")
plot(mod, bin_width = 5, xname = "New moon")
Plot the results of an LDATS LDA model
Description
Create an LDATS LDA summary plot, with a top panel showing
the topic proportions for each word and a bottom panel showing the topic
proportions of each document/over time. The plot function is defined for
class LDA_VEM
specifically (see LDA
).
LDA_plot_top_panel
creates an LDATS LDA summary plot
top panel showing the topic proportions word-by-word.
LDA_plot_bottom_panel
creates an LDATS LDA summary plot
bottom panel showing the topic proportions over time/documents.
Usage
## S3 method for class 'LDA_VEM'
plot(
x,
...,
xtime = NULL,
xname = NULL,
cols = NULL,
option = "C",
alpha = 0.8,
LDATS = FALSE
)
LDA_plot_top_panel(
x,
cols = NULL,
option = "C",
alpha = 0.8,
together = FALSE,
LDATS = FALSE
)
LDA_plot_bottom_panel(
x,
xtime = NULL,
xname = NULL,
cols = NULL,
option = "C",
alpha = 0.8,
together = FALSE,
LDATS = FALSE
)
Arguments
x |
Object of class |
... |
Not used, retained for alignment with base function. |
xtime |
Optional x values used to plot the topic proportions according to a specific time value (rather than simply the order of observations). |
xname |
Optional name for the x values used in plotting the topic proportions (otherwise defaults to "Document"). |
cols |
Colors to be used to plot the topics.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
LDATS |
|
together |
|
Value
NULL
.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 4, nseeds = 10)
best_lda <- select_LDA(r_LDA)[[1]]
plot(best_lda, option = "cividis")
LDA_plot_top_panel(best_lda, option = "cividis")
LDA_plot_bottom_panel(best_lda, option = "cividis")
Plot a set of LDATS LDA models
Description
Generalization of the plot
function to
work on a list of LDA topic models (class LDA_set
).
Usage
## S3 method for class 'LDA_set'
plot(x, ...)
Arguments
x |
An |
... |
Additional arguments to be passed to subfunctions. |
Value
NULL
.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
plot(r_LDA)
Plot an LDATS TS model
Description
Generalization of the plot
function to
work on fitted TS model objects (class TS_fit
) returned from
TS
.
Usage
## S3 method for class 'TS_fit'
plot(
x,
...,
plot_type = "summary",
interactive = FALSE,
cols = set_TS_summary_plot_cols(),
bin_width = 1,
xname = NULL,
border = NA,
selection = "median",
LDATS = FALSE
)
Arguments
x |
A |
... |
Additional arguments to be passed to subfunctions. Not currently
used, just retained for alignment with |
plot_type |
"diagnostic" or "summary". |
interactive |
|
cols |
|
bin_width |
Width of the bins used in the histograms of the summary time series plot, in units of the x-axis (the time variable used to fit the model). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use in the time series
summary plot. Currently only defined for |
LDATS |
|
Value
NULL
.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
plot(TSmod)
Produce the posterior distribution histogram panel for the TS diagnostic plot of a parameter
Description
Produce a vanilla histogram plot using hist
for the
parameter of interest (rho or eta) as part of
TS_diagnostics_plot
. A vertical line is added to show the
median of the posterior.
Usage
posterior_plot(x, xlab = "parameter value")
Arguments
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
xlab |
|
Value
NULL
.
Examples
posterior_plot(rnorm(100, 0, 1))
Set the control inputs to include the seed
Description
Update the control list for the LDA model with the specified seed, and remove any controls that are not used within the LDA itself.
Usage
prep_LDA_control(seed, control = list())
Arguments
seed |
|
control |
Named list of control parameters to be used in
|
Value
list
of controls to be used in the LDA.
Examples
prep_LDA_control(seed = 1)
Prepare the model-specific data to be used in the TS analysis of LDA output
Description
Append the estimated topic proportions from a fitted LDA model
to the document covariate table to create the data structure needed for
TS
.
Usage
prep_TS_data(document_covariate_table, LDA_models, mods, i = 1)
Arguments
document_covariate_table |
Document covariate table (rows: documents,
columns: time index and covariate options). Every model needs a
covariate to describe the time value for each document (in whatever
units and whose name in the table is input in |
LDA_models |
List of LDA models (class |
mods |
The |
i |
|
Value
Class data.frame
object including [1] the time variable
(indicated in control
), [2] the predictor variables (required by
formula
) and [3] the multinomial response variable (indicated
in formula
), ready for input into TS
.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- expand_TS(LDA_models, formulas = ~1, nchangepoints = 0)
data1 <- prep_TS_data(document_covariate_table, LDA_models, mods)
Prepare the time chunk table for a multinomial change point Time Series model
Description
Creates the table containing the start and end times for each
chunk within a time series, based on the change points (used to break up
the time series) and the range of the time series. If there are no
change points (i.e. changepoints
is NULL
), there is still a
single chunk defined by the start and end of the time series.
Usage
prep_chunks(data, changepoints = NULL, timename = "time")
Arguments
data |
Class |
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
Value
data.frame
of start
and end
times (columns)
for each chunk (rows).
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
chunks <- prep_chunks(dct, changepoints = 100, timename = "newmoon")
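The chunk table can be sketched from a vector of times and a set of change points. The boundary convention used here is an assumption not stated in the text: each change point is the last time in its chunk and the next chunk starts one time unit later. `chunks_sketch` is a hypothetical name.

```r
chunks_sketch <- function(times, changepoints = NULL) {
  # Assumed convention: a chunk ends at a change point and the next
  # chunk starts one time unit later.
  start <- c(min(times), changepoints + 1)
  end <- c(changepoints, max(times))
  data.frame(start = start, end = end)
}

two_cpts <- chunks_sketch(1:100, changepoints = c(20, 50))  # three chunks
no_cpts <- chunks_sketch(1:100)                             # a single chunk
two_cpts
```

With no change points the table still has one row spanning the full range of the time series, matching the description above.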
Initialize and update the change point matrix used in the ptMCMC algorithm
Description
Each of the chains is initialized by prep_cpts
using a
draw from the available times (i.e. assuming a uniform prior), the best
fit (by likelihood) draw is put in the focal chain with each subsequently
worse fit placed into the subsequently hotter chain. update_cpts
updates the change points after every iteration in the ptMCMC algorithm.
Usage
prep_cpts(data, formula, nchangepoints, timename, weights, control = list())
update_cpts(cpts, swaps)
Arguments
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
cpts |
The existing matrix of change points. |
swaps |
Chain configuration after among-temperature swaps. |
Value
list
of [1] matrix
of change points (rows) for
each temperature (columns) and [2] vector
of log-likelihood
values for each of the chains.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
steps <- step_chains(i, cpts, inputs)
swaps <- swap_chains(steps, inputs, ids)
saves <- update_saves(i, saves, steps, swaps)
cpts <- update_cpts(cpts, swaps)
ids <- update_ids(ids, swaps)
}
Initialize and update the chain ids throughout the ptMCMC algorithm
Description
prep_ids
creates and update_ids
updates
the active vector of identities (ids) for each of the chains in the
ptMCMC algorithm. These ids are used to track trips of the particles
among chains.
These functions were designed to work within TS
and
specifically est_changepoints
, but have been generalized
and would work within any general ptMCMC as long as control
,
ids
, and swaps
are formatted properly.
Usage
prep_ids(control = list())
update_ids(ids, swaps)
Arguments
control |
A |
ids |
The existing vector of chain ids. |
swaps |
Chain configuration after among-temperature swaps. |
Value
The vector of chain ids.
Examples
prep_ids()
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
steps <- step_chains(i, cpts, inputs)
swaps <- swap_chains(steps, inputs, ids)
saves <- update_saves(i, saves, steps, swaps)
cpts <- update_cpts(cpts, swaps)
ids <- update_ids(ids, swaps)
}
Initialize and tick through the progress bar
Description
prep_pbar
creates and update_pbar
steps
through the progress bars (if desired) in TS.
Usage
prep_pbar(control = list(), bar_type = "rho", nr = NULL)
update_pbar(pbar, control = list())
Arguments
control |
A |
bar_type |
"rho" (for change point locations) or "eta" (for regressors). |
nr |
|
pbar |
The progress bar object returned from |
Value
prep_pbar
: the initialized progress bar object.
update_pbar
: the ticked-forward pbar
.
Examples
pb <- prep_pbar(control = list(nit = 2)); pb
pb <- update_pbar(pb); pb
pb <- update_pbar(pb); pb
Pre-calculate the change point proposal distribution for the ptMCMC algorithm
Description
Calculate the proposal distribution in advance of actually running the ptMCMC algorithm in order to decrease computation time. The proposal distribution is a joint of three distributions: [1] a multinomial distribution selecting among the change points within the chain, [2] a binomial distribution selecting the direction of the step of the change point (earlier or later in the time series), and [3] a geometric distribution selecting the magnitude of the step.
Usage
prep_proposal_dist(nchangepoints, control = list())
Arguments
nchangepoints |
Integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a singular time series model), and the current implementation can reasonably include up to 6 change points. The number of change points is used to dictate the segmentation of the data for each continuous model and each LDA model. |
control |
A |
Value
list of two matrix elements: [1] the size of the proposed step for each iteration of each chain and [2] the identity of the change point location to be shifted by the step for each iteration of each chain.
Examples
prep_proposal_dist(nchangepoints = 2)
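The three-part proposal described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the function name sketch_proposal_dist and the arguments nit, ntemps, and magnitude are assumptions made for this example, and the change point is chosen with equal probability. The element names steps and which_steps follow the way the proposal distribution is indexed in the examples elsewhere in this manual.

```r
# Hypothetical helper (not part of LDATS): pre-draw all change point
# proposals for nit iterations of ntemps chains.
sketch_proposal_dist <- function(nchangepoints, nit = 100, ntemps = 6,
                                 magnitude = 12) {
  if (nchangepoints == 0) {
    # No change points to move, so every proposed step is zero.
    return(list(steps = matrix(0, nit, ntemps),
                which_steps = matrix(NA_integer_, nit, ntemps)))
  }
  n <- nit * ntemps
  # [1] which change point within the chain is moved (equal probability)
  which_cp <- sample.int(nchangepoints, n, replace = TRUE)
  # [2] direction of the move: -1 (earlier) or +1 (later)
  direction <- 2 * rbinom(n, size = 1, prob = 0.5) - 1
  # [3] size of the move: geometric, shifted so the smallest step is 1
  size <- 1 + rgeom(n, prob = 1 / magnitude)
  list(steps = matrix(direction * size, nit, ntemps),
       which_steps = matrix(which_cp, nit, ntemps))
}

set.seed(1)
pd <- sketch_proposal_dist(nchangepoints = 2, nit = 50, ntemps = 4)
pd$steps[1:3, ]
```

Drawing everything up front means each iteration of the sampler only has to index a row of the two matrices.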
Prepare the inputs for the ptMCMC algorithm estimation of change points
Description
Package the static inputs (controls and data structures) used by the ptMCMC algorithm in the context of estimating change points.
This function was designed to work within TS and specifically est_changepoints. It is still hardcoded to do so, but has the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
Usage
prep_ptMCMC_inputs(
data,
formula,
nchangepoints,
timename,
weights = NULL,
control = list()
)
Arguments
data |
Class |
formula |
|
nchangepoints |
Integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a singular time series model), and the current implementation can reasonably include up to 6 change points. The number of change points is used to dictate the segmentation of the data for each continuous model and each LDA model. |
timename |
|
weights |
Optional class |
control |
A |
Value
Class ptMCMC_inputs list, containing the static inputs for use within the ptMCMC algorithm for estimating change points.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
Prepare and update the data structures to save the ptMCMC output
Description
prep_saves creates the data structure used to save the output from each iteration of the ptMCMC algorithm, which is added via update_saves. Once the ptMCMC is complete, the saved data objects are then processed (burn-in iterations are dropped and the remaining iterations are thinned) via process_saves.
This set of functions was designed to work within TS and specifically est_changepoints. They are still hardcoded to do so, but have the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
Usage
prep_saves(nchangepoints, control = list())
update_saves(i, saves, steps, swaps)
process_saves(saves, control = list())
Arguments
nchangepoints |
|
control |
A |
i |
|
saves |
The existing list of saved data objects. |
steps |
Chain configuration after within-temperature steps. |
swaps |
Chain configuration after among-temperature swaps. |
Value
list of ptMCMC objects: change points ($cpts), log-likelihoods ($lls), chain ids ($ids), step acceptances ($step_accepts), and swap acceptances ($swap_accepts).
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
steps <- step_chains(i, cpts, inputs)
swaps <- swap_chains(steps, inputs, ids)
saves <- update_saves(i, saves, steps, swaps)
cpts <- update_cpts(cpts, swaps)
ids <- update_ids(ids, swaps)
}
process_saves(saves, TS_control())
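The post-processing step (drop burn-in iterations, then thin what remains) can be illustrated on a plain matrix of draws. This is a sketch under stated assumptions rather than the package's implementation: sketch_process, burnin, and thin_frac are names chosen for this example, and the draws are assumed to be stored with one iteration per row.

```r
# Hypothetical helper: drop the first `burnin` rows, then keep an evenly
# spaced fraction `thin_frac` of the remaining rows.
sketch_process <- function(draws, burnin = 0, thin_frac = 1) {
  stopifnot(is.matrix(draws), burnin >= 0, burnin < nrow(draws),
            thin_frac > 0, thin_frac <= 1)
  kept <- seq_len(nrow(draws))
  if (burnin > 0) kept <- kept[-seq_len(burnin)]
  n_keep <- max(1, round(length(kept) * thin_frac))
  idx <- unique(round(seq(1, length(kept), length.out = n_keep)))
  draws[kept[idx], , drop = FALSE]
}

draws <- matrix(1:40, nrow = 20, ncol = 2)
out <- sketch_process(draws, burnin = 10, thin_frac = 0.5)
dim(out)
```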
Prepare the ptMCMC temperature sequence
Description
Create the series of temperatures used in the ptMCMC algorithm.
This function was designed to work within TS and est_changepoints specifically, but has been generalized and would work with any ptMCMC model as long as control includes the relevant control parameters (and provided that the check_control function and its use here are generalized).
Usage
prep_temp_sequence(control = list())
Arguments
control |
A |
Value
vector
of temperatures.
Examples
prep_temp_sequence()
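For readers unfamiliar with temperature ladders, the following base R sketch builds a generic geometric sequence of temperatures starting at 1. It is an assumption-laden illustration: sketch_temps, ntemps, and max_temp are hypothetical names, and the sequence produced by prep_temp_sequence is governed by its own control parameters and may be spaced differently.

```r
# Hypothetical helper: temperatures evenly spaced on the log scale, so the
# ratio between neighboring chains is constant.
sketch_temps <- function(ntemps = 6, max_temp = 64) {
  stopifnot(ntemps >= 2, max_temp > 1)
  exp(seq(0, log(max_temp), length.out = ntemps))
}

temps <- sketch_temps(ntemps = 4, max_temp = 8)
temps
```

The coldest chain (temperature 1) samples the target distribution itself, while hotter chains see a flattened surface and move more freely.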
Print the selected LDA and TS models of LDA_TS object
Description
Convenience function to print only the selected elements of an LDA_TS-class object returned by LDA_TS.
Usage
## S3 method for class 'LDA_TS'
print(x, ...)
Arguments
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
Value
The selected models in x as a two-element list with the TS component only returning the non-hidden components.
Examples
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
nchangepoints = 1, timename = "newmoon")
print(mod)
Print a Time Series model fit
Description
Convenience function to print only the most important components of a TS_fit-class object fit by TS.
Usage
## S3 method for class 'TS_fit'
print(x, ...)
Arguments
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
Value
The non-hidden parts of x as a list.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
print(TSmod)
Print a set of Time Series models fit to LDAs
Description
Convenience function to print only the names of a TS_on_LDA-class object generated by TS_on_LDA.
Usage
## S3 method for class 'TS_on_LDA'
print(x, ...)
Arguments
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
Value
character vector of the names of x's models.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
nchangepoints = 0:1, timename = "newmoon", weights)
print(mods)
Print the message to the console about which combination of the Time Series and LDA models is being run
Description
If desired, print a message at the beginning of every model combination stating the TS model and the LDA model being evaluated.
Usage
print_model_run_message(models, i, LDA_models, control)
Arguments
models |
|
i |
|
LDA_models |
List of LDA models (class |
control |
A |
Value
NULL
.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
nchangepoints <- 0:1
mods <- expand_TS(LDA_models, formulas, nchangepoints)
print_model_run_message(mods, 1, LDA_models, TS_control())
Fit the chunk-level models to a time series, given a set of proposed change points within the ptMCMC algorithm
Description
This function wraps around TS_memo (optionally memoised multinom_TS) to provide a simpler interface within the ptMCMC algorithm and is implemented within propose_step.
Usage
proposed_step_mods(prop_changepts, inputs)
Arguments
prop_changepts |
|
inputs |
Class |
Value
List of models associated with the proposed step, with an element for each chain.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
i <- 1
pdist <- inputs$pdist
ntemps <- length(inputs$temps)
selection <- cbind(pdist$which_steps[i, ], 1:ntemps)
prop_changepts <- cpts$changepts
curr_changepts_s <- cpts$changepts[selection]
prop_changepts_s <- curr_changepts_s + pdist$steps[i, ]
if(all(is.na(prop_changepts_s))){
prop_changepts_s <- NULL
}
prop_changepts[selection] <- prop_changepts_s
mods <- proposed_step_mods(prop_changepts, inputs)
Add change point location lines to the time series plot
Description
Adds vertical lines to the plot of the time series of fitted proportions associated with the change points of interest.
Usage
rho_lines(spec_rhos)
Arguments
spec_rhos |
|
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
pred_gamma_TS_plot(TSmod)
rho_lines(200)
Portal rodent data
Description
An example LDATS dataset, functionally that used in Christensen et al. (2018). The data are counts of 21 rodent species across 436 sampling events, with the count being the total number observed across eight 50 m x 50 m plots, each sampled using 49 live traps (Brown 1998, Ernest et al. 2016).
Usage
rodents
Format
A list of two data.frame-class objects with rows corresponding to documents (sampling events). One element is the document term table (called document_term_table), which contains counts of the species (terms) in each sample (document), and the other is the document covariate table (called document_covariate_table) with columns of covariates (newmoon number, sin and cos of the fraction of the year).
Source
https://github.com/weecology/PortalData/tree/master/Rodents
References
Brown, J. H. 1998. The desert granivory experiments at Portal. Pages 71-95 in W. J. Resetarits Jr. and J. Bernardo, editors, Experimental Ecology. Oxford University Press, New York, New York, USA.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Ernest, S. K. M., et al. 2016. Long-term monitoring and experimental manipulation of a Chihuahuan desert ecosystem near Portal, Arizona (1977-2013). Ecology 97:1082.
Select the best LDA model(s) for use in time series
Description
Select the best model(s) of interest from an LDA_set object, based on a set of user-provided functions. The functions default to choosing the model with the lowest AIC value.
Usage
select_LDA(LDA_models = NULL, control = list())
Arguments
LDA_models |
An object of class |
control |
A |
Value
A reduced version of LDA_models that only includes the selected LDA model(s). The returned object is still an object of class LDA_set.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
select_LDA(r_LDA)
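The "measure each model, then select" pattern described above can be shown with ordinary linear models in base R. This sketch assumes nothing about the package's internals: sketch_select, measurer, and selector are illustrative names, and lm fits stand in for LDA models.

```r
# Hypothetical helper: score every model with `measurer`, then keep the
# model(s) whose score equals the value chosen by `selector`.
sketch_select <- function(models, measurer = AIC, selector = min) {
  vals <- vapply(models, measurer, numeric(1))
  models[vals == selector(vals)]
}

set.seed(42)
dat <- data.frame(x = rnorm(60))
dat$y <- 2 * dat$x + rnorm(60)
fits <- list(null = lm(y ~ 1, data = dat), slope = lm(y ~ x, data = dat))
names(sketch_select(fits))
```

Because ties are kept, the result can contain more than one model, which mirrors the "model(s)" wording in the description.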
Select the best Time Series model
Description
Select the best model of interest from a TS_on_LDA object generated by TS_on_LDA, based on a set of user-provided functions. The functions default to choosing the model with the lowest AIC value.
Presently, the set of functions should result in a single selected model. If multiple models are chosen via the selection, only the first is returned.
Usage
select_TS(TS_models, control = list())
Arguments
TS_models |
An object of class |
control |
A |
Value
A reduced version of TS_models that only includes the selected TS model. The returned object is a single TS model object of class TS_fit.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
nchangepoints = 0:1, timename = "newmoon", weights)
select_TS(mods)
Create the list of colors for the LDATS summary plot
Description
A default list generator function that produces the options for the colors controlling the panels of the LDATS summary plots, needed because the change point histogram panel should be in a different color scheme than the LDA and fitted time series model panels, which should be in a matching color scheme. See set_LDA_plot_colors, set_TS_summary_plot_cols, set_gamma_colors, and set_rho_hist_colors for specific details on usage.
Usage
set_LDA_TS_plot_cols(
rho_cols = NULL,
rho_option = "D",
rho_alpha = 0.4,
gamma_cols = NULL,
gamma_option = "C",
gamma_alpha = 0.8
)
Arguments
rho_cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
rho_option |
A |
rho_alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
gamma_cols |
Colors to be used to plot the LDA topic proportions,
time series of observed topic proportions, and time series of fitted
topic proportions. Any valid color values (e.g., see
|
gamma_option |
A |
gamma_alpha |
Numeric value [0,1] that indicates the transparency of
the colors used. Supported only on some devices, see
|
Value
list of elements used to define the colors for the two panels of the summary plot, as generated simply using set_LDA_TS_plot_cols. cols has two elements: LDA and TS, each corresponding to the set of plots for its stage in the full model. LDA contains entries cols and options (see set_LDA_plot_colors). TS contains two entries, rho and gamma, each corresponding to the related panel, and each containing default values for entries named cols, option, and alpha (see set_TS_summary_plot_cols, set_gamma_colors, and set_rho_hist_colors).
Examples
set_LDA_TS_plot_cols()
Prepare the colors to be used in the LDA plots
Description
Based on the inputs, create the set of colors to be used in the LDA plots made by plot.LDA_TS.
Usage
set_LDA_plot_colors(x, cols = NULL, option = "C", alpha = 0.8)
Arguments
x |
Object of class |
cols |
Colors to be used to plot the topics.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
Value
vector of character hex codes indicating colors to use.
Examples
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 4, nseeds = 10)
set_LDA_plot_colors(r_LDA[[1]])
Create the list of colors for the TS summary plot
Description
A default list generator function that produces the options for the colors controlling the panels of the TS summary plots, needed because the panels should be in different color schemes. See set_gamma_colors and set_rho_hist_colors for specific details on usage.
Usage
set_TS_summary_plot_cols(
rho_cols = NULL,
rho_option = "D",
rho_alpha = 0.4,
gamma_cols = NULL,
gamma_option = "C",
gamma_alpha = 0.8
)
Arguments
rho_cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
rho_option |
A |
rho_alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
gamma_cols |
Colors to be used to plot the LDA topic proportions,
time series of observed topic proportions, and time series of fitted
topic proportions. Any valid color values (e.g., see
|
gamma_option |
A |
gamma_alpha |
Numeric value [0,1] that indicates the transparency of
the colors used. Supported only on some devices, see
|
Value
list of elements used to define the colors for the two panels. Contains two elements rho and gamma, each corresponding to the related panel, and each containing default values for entries named cols, option, and alpha.
Examples
set_TS_summary_plot_cols()
Prepare the colors to be used in the gamma time series
Description
Based on the inputs, create the set of colors to be used in the time series of the fitted gamma (topic proportion) values.
Usage
set_gamma_colors(x, cols = NULL, option = "D", alpha = 1)
Arguments
x |
Object of class |
cols |
Colors to be used to plot the time series of fitted topic proportions. |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
Value
Vector of character hex codes indicating colors to use.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
set_gamma_colors(TSmod)
Prepare the colors to be used in the change point histogram
Description
Based on the inputs, create the set of colors to be used in the change point histogram.
Usage
set_rho_hist_colors(x = NULL, cols = NULL, option = "D", alpha = 1)
Arguments
x |
|
cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
Value
Vector of character hex codes indicating colors to use.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
set_rho_hist_colors(TSmod$rhos)
Simulate LDA_TS data from LDA and TS model structures and parameters
Description
For a given set of covariates X; parameters Beta, Eta, rho, and err; and document-specific time stamps tD and lengths N, simulate a document-by-term matrix.
Additional structuring variables (the numbers of topics (k), terms (V), documents (M), segments (S), and covariates per segment (C)) are inferred from input objects.
Usage
sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err = 0, seed = NULL)
Arguments
N |
A vector of document sizes (total word counts). Must be integer conformable. Is used to infer the total number of documents. |
Beta |
|
X |
|
Eta |
|
rho |
Vector of integer-conformable time locations of changepoints or
|
tD |
Vector of integer-conformable times of the documents. Must be
of length M (as determined by |
err |
Additive error on the link-scale. Must be a non-negative
|
seed |
Input to |
Value
A document-by-term matrix of counts (dim: M x V).
Examples
N <- c(10, 22, 15, 31)
tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
err <- 1
sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err)
Simulate LDA data from an LDA structure given parameters
Description
For a given set of parameters alpha and Beta and document-specific total word counts, simulate a document-by-term matrix.
Additional structuring variables (the numbers of topics (k), documents (M), terms (V)) are inferred from input objects.
Usage
sim_LDA_data(N, Beta, alpha = NULL, Theta = NULL, seed = NULL)
Arguments
N |
A vector of document sizes (total word counts). Must be integer conformable. Is used to infer the total number of documents. |
Beta |
|
alpha |
Single positive numeric value for the Dirichlet distribution
parameter defining topics within documents. To specifically define
document topic probabilities, use |
Theta |
|
seed |
Input to |
Value
A document-by-term matrix of counts (dim: M x V).
Examples
N <- c(10, 22, 15, 31)
alpha <- 1.2
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
sim_LDA_data(N, Beta, alpha = alpha)
Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2,
byrow = TRUE)
sim_LDA_data(N, Beta, Theta = Theta)
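The generative process behind such a simulation can be sketched in base R. This is an illustrative stand-in, not the package's code: sketch_sim_lda is a hypothetical name, the Dirichlet draw is built from normalized gamma variates, and topic assignments are marginalized so that each document's counts come from a single multinomial draw.

```r
# Hypothetical helper: simulate a document-by-term count matrix (M x V).
sketch_sim_lda <- function(N, Beta, alpha = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  M <- length(N)   # number of documents
  k <- nrow(Beta)  # number of topics
  V <- ncol(Beta)  # number of terms
  out <- matrix(0L, M, V)
  for (d in seq_len(M)) {
    # Symmetric Dirichlet(alpha) draw of topic proportions for document d
    g <- rgamma(k, shape = alpha)
    theta <- g / sum(g)
    # Term probabilities are the topic-weighted mixture of the rows of Beta
    out[d, ] <- rmultinom(1, size = N[d], prob = as.vector(theta %*% Beta))
  }
  out
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
sim <- sketch_sim_lda(N, Beta, alpha = 1.2, seed = 1)
rowSums(sim)
```

Each row of the result sums to the corresponding document size in N.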
Simulate TS data from a TS model structure given parameters
Description
For a given set of covariates X; parameters Eta, rho, and err; and document-specific time stamps tD, simulate a document-by-topic matrix. Additional structuring variables (numbers of topics (k), documents (M), segments (S), and covariates per segment (C)) are inferred from input objects.
Usage
sim_TS_data(X, Eta, rho, tD, err = 0, seed = NULL)
Arguments
X |
|
Eta |
|
rho |
Vector of integer-conformable time locations of changepoints or
|
tD |
Vector of integer-conformable times of the documents. Must be
of length M (as determined by |
err |
Additive error on the link-scale. Must be a non-negative
|
seed |
Input to |
Value
A document-by-topic matrix of probabilities (dim: M x k).
Examples
tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
sim_TS_data(X, Eta, rho, tD, err = 1)
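A segmented softmax regression of this kind can be sketched as follows. Several details here are assumptions made for illustration and are not confirmed by this manual: the name sketch_sim_ts, the layout of Eta (C rows per segment, stacked in segment order), and the convention that a document observed exactly at a change point belongs to the earlier segment.

```r
# Hypothetical helper: simulate a document-by-topic probability matrix.
sketch_sim_ts <- function(X, Eta, rho, tD, err = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  C <- ncol(X)
  k <- ncol(Eta)
  # Segment index per document (assumed: times <= rho fall in the earlier one)
  seg <- if (is.null(rho)) {
    rep(1, length(tD))
  } else {
    findInterval(tD, sort(rho), left.open = TRUE) + 1
  }
  link <- matrix(0, nrow(X), k)
  for (d in seq_len(nrow(X))) {
    rows <- (seg[d] - 1) * C + seq_len(C)  # this segment's block of Eta
    link[d, ] <- X[d, ] %*% Eta[rows, , drop = FALSE]
  }
  # Additive error on the link scale, then a row-wise softmax
  link <- link + rnorm(length(link), mean = 0, sd = err)
  e <- exp(link - apply(link, 1, max))
  e / rowSums(e)
}

tD <- c(1, 3, 4, 6)
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
gam <- sketch_sim_ts(X, Eta, rho = 3, tD = tD, err = 1, seed = 1)
rowSums(gam)
```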
Calculate the softmax of a vector or matrix of values
Description
Calculate the softmax (normalized exponential) of a vector of values or a set of vectors stacked rowwise.
Usage
softmax(x)
Arguments
x |
|
Value
The softmax of x.
Examples
dat <- matrix(runif(100, -1, 1), 25, 4)
softmax(dat)
softmax(dat[,1])
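For reference, the normalized exponential can be written in a few lines of base R. The sketch below is an independent illustration (sketch_softmax is a hypothetical name, not the package function); it subtracts the maximum before exponentiating, a common guard against overflow that leaves the result unchanged.

```r
# Hypothetical helper: softmax of a vector, or of each row of a matrix.
sketch_softmax <- function(x) {
  if (is.matrix(x)) {
    # apply() returns one maximum per row; subtraction recycles down the
    # columns, so row i is shifted by its own maximum.
    e <- exp(x - apply(x, 1, max))
    return(e / rowSums(e))
  }
  e <- exp(x - max(x))
  e / sum(e)
}

sketch_softmax(c(0, 0))
m <- matrix(c(1, 2, 3, 1000, 1001, 1002), 2, 3, byrow = TRUE)
sketch_softmax(m)
```

Both rows of m give the same probabilities, because softmax is invariant to adding a constant to every element of a row.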
Conduct a within-chain step of the ptMCMC algorithm
Description
This set of functions steps the chains forward one iteration of the within-chain component of the ptMCMC algorithm. step_chains is the main function, composed of a proposal (made by propose_step), an evaluation of that proposal (made by eval_step), and then an update of the configuration (made by take_step).
This set of functions was designed to work within TS and specifically est_changepoints. They are still hardcoded to do so, but have the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
Usage
step_chains(i, cpts, inputs)
propose_step(i, cpts, inputs)
eval_step(i, cpts, prop_step, inputs)
take_step(cpts, prop_step, accept_step)
Arguments
i |
|
cpts |
|
inputs |
Class |
prop_step |
Proposed step output from |
accept_step |
|
Details
For each iteration of the ptMCMC algorithm, all of the chains have the potential to take a step. The possible step is proposed under a proposal distribution (here for change points we use a symmetric geometric distribution), the proposed step is then evaluated and either accepted or not (following the Metropolis-Hastings rule; Metropolis et al. 1953, Hastings 1970, Gupta et al. 2018), and then accordingly taken or not (the configurations are updated).
Value
step_chains: list of change points, log-likelihoods, and logical indicators of acceptance for each chain.
propose_step: list of change points and log-likelihood values for the proposal.
eval_step: logical vector indicating if each chain's proposal was accepted.
take_step: list of change points, log-likelihoods, and logical indicators of acceptance for each chain.
References
Gupta, S., L. Hainsworth, J. S. Hogg, R. E. C. Lee, and J. R. Faeder. 2018. Evaluation of parallel tempering to accelerate Bayesian parameter estimation in systems biology.
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains and their applications. Biometrika 57:97-109.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1092.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
steps <- step_chains(i, cpts, inputs)
swaps <- swap_chains(steps, inputs, ids)
saves <- update_saves(i, saves, steps, swaps)
cpts <- update_cpts(cpts, swaps)
ids <- update_ids(ids, swaps)
}
# within step_chains()
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
i <- 1
prop_step <- propose_step(i, cpts, inputs)
accept_step <- eval_step(i, cpts, prop_step, inputs)
take_step(cpts, prop_step, accept_step)
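The accept-or-reject decision described in Details can be sketched for a set of tempered chains. This is a generic Metropolis rule written under assumptions, not a copy of eval_step: sketch_accept_step is a hypothetical name, the proposal is taken to be symmetric, and the prior over change point locations is taken to be flat, so only the tempered log-likelihood difference matters.

```r
# Hypothetical helper: one Metropolis decision per chain.
# ll_curr, ll_prop: log-likelihoods of current and proposed configurations
# temps: temperature of each chain (1 = the untempered target)
sketch_accept_step <- function(ll_curr, ll_prop, temps) {
  # Log acceptance ratio; values >= 0 are always accepted
  log_ratio <- (ll_prop - ll_curr) / temps
  log(runif(length(temps))) < log_ratio
}

set.seed(1)
sketch_accept_step(ll_curr = c(-10, -10),
                   ll_prop = c(-5, -1e6),
                   temps = c(1, 1))
```

Dividing by the temperature shrinks the penalty for a worse proposal, which is why hotter chains accept downhill moves more often.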
Summarize the regressor (eta) distributions
Description
summarize_etas calculates summary statistics for each of the chunk-level regressors.
measure_eta_vcov generates the variance-covariance matrix for the regressors.
Usage
summarize_etas(etas, control = list())
measure_eta_vcov(etas)
Arguments
etas |
Matrix of regressors (columns) across iterations of the
ptMCMC (rows), as returned from |
control |
A |
Value
summarize_etas: table of summary statistics for chunk-level regressors including mean, median, mode, posterior interval, standard deviation, MCMC error, autocorrelation, and effective sample size for each regressor.
measure_eta_vcov: variance-covariance matrix for chunk-level regressors.
Examples
etas <- matrix(rnorm(100), 50, 2)
summarize_etas(etas)
measure_eta_vcov(etas)
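A few of the listed summaries can be reproduced with base R alone, which may help when checking the output by hand. The sketch below is illustrative only: sketch_summarize and prob are hypothetical names, the interval is a simple equal-tailed quantile interval (the package may compute its posterior interval differently), and the MCMC error, autocorrelation, and effective sample size columns are omitted.

```r
# Hypothetical helper: per-column summaries of a draws matrix
# (iterations in rows, parameters in columns).
sketch_summarize <- function(draws, prob = 0.95) {
  tail_p <- (1 - prob) / 2
  data.frame(
    mean   = colMeans(draws),
    median = apply(draws, 2, median),
    lower  = apply(draws, 2, quantile, probs = tail_p),
    upper  = apply(draws, 2, quantile, probs = 1 - tail_p),
    sd     = apply(draws, 2, sd)
  )
}

set.seed(1)
etas <- matrix(rnorm(100), 50, 2)
summ <- sketch_summarize(etas)
summ
vc <- var(etas)  # variance-covariance matrix across the columns
```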
Summarize the rho distributions
Description
summarize_rhos calculates summary statistics for each of the change point locations.
measure_rho_vcov generates the variance-covariance matrix for the change point locations.
Usage
summarize_rhos(rhos, control = list())
measure_rho_vcov(rhos)
Arguments
rhos |
Matrix of change point locations (columns) across iterations of
the ptMCMC (rows) or |
control |
A |
Value
summarize_rhos: table of summary statistics for change point locations including mean, median, mode, posterior interval, standard deviation, MCMC error, autocorrelation, and effective sample size for each change point location.
measure_rho_vcov: variance-covariance matrix for change point locations.
Examples
rhos <- matrix(sample(80:100, 100, TRUE), 50, 2)
summarize_rhos(rhos)
measure_rho_vcov(rhos)
Conduct a set of among-chain swaps for the ptMCMC algorithm
Description
This function handles the among-chain swapping based on temperatures and likelihood differentials.
This function was designed to work within TS and specifically est_changepoints. It is still hardcoded to do so, but has the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
Usage
swap_chains(chainsin, inputs, ids)
Arguments
chainsin |
Chain configuration to be evaluated for swapping. |
inputs |
Class |
ids |
The vector of integer chain ids. |
Details
The ptMCMC algorithm couples the chains (which are taking their own walks on the distribution surface) through "swaps", where neighboring chains exchange configurations (Geyer 1991, Falcioni and Deem 1999) following the Metropolis criterion (Metropolis et al. 1953). This allows them to share information and search the surface in combination (Earl and Deem 2005).
Value
list of updated change points, log-likelihoods, and chain ids, as well as a vector of acceptance indicators for each swap.
References
Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7: 3910-3916.
Falcioni, M. and M. W. Deem. 1999. A biased Monte Carlo scheme for zeolite structure solution. Journal of Chemical Physics 110: 1754-1766.
Geyer, C. J. 1991. Markov Chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. pp 156-163. American Statistical Association, New York, USA.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1092.
Examples
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
steps <- step_chains(i, cpts, inputs)
swaps <- swap_chains(steps, inputs, ids)
saves <- update_saves(i, saves, steps, swaps)
cpts <- update_cpts(cpts, swaps)
ids <- update_ids(ids, swaps)
}
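The neighbor-swap rule summarized in Details can be sketched with log-likelihoods and temperatures alone. This is the standard parallel tempering criterion written as a stand-alone illustration, not the body of swap_chains: sketch_swap is a hypothetical name, chains are assumed to be ordered from coldest to hottest, and the order in which pairs are visited is an arbitrary choice made here.

```r
# Hypothetical helper: one pass of Metropolis swaps between neighbors.
# lls: log-likelihood of the configuration held by each chain
# temps: chain temperatures in increasing order
sketch_swap <- function(lls, temps) {
  n <- length(temps)
  ids <- seq_len(n)
  accepted <- logical(n - 1)
  for (j in rev(seq_len(n - 1))) {
    # Positive when the hotter chain holds the better configuration
    log_ratio <- (1 / temps[j] - 1 / temps[j + 1]) * (lls[j + 1] - lls[j])
    if (log(runif(1)) < log_ratio) {
      lls[c(j, j + 1)] <- lls[c(j + 1, j)]
      ids[c(j, j + 1)] <- ids[c(j + 1, j)]
      accepted[j] <- TRUE
    }
  }
  list(lls = lls, ids = ids, accepted = accepted)
}

set.seed(1)
sw <- sketch_swap(lls = c(-10, -5), temps = c(1, 2))
sw$ids
```

In this usage the hotter chain starts with the higher log-likelihood, so the swap is always accepted and the better configuration moves to the cold chain.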
Produce the trace plot panel for the TS diagnostic plot of a parameter
Description
Produce a trace plot for the parameter of interest (rho or eta) as part of TS_diagnostics_plot. A horizontal line is added to show the median of the posterior.
Usage
trace_plot(x, ylab = "parameter value")
Arguments
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
ylab |
|
Value
NULL
.
Examples
trace_plot(rnorm(100, 0, 1))
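A comparable panel can be drawn with base graphics, which may be useful for customizing the display. The sketch below is an illustration only (sketch_trace is a hypothetical name and the styling choices are assumptions): it plots the draws against iteration order and overlays the median.

```r
# Hypothetical helper: trace of posterior draws with a median reference line.
sketch_trace <- function(x, ylab = "parameter value") {
  plot(seq_along(x), x, type = "l", xlab = "iteration", ylab = ylab)
  abline(h = median(x), lwd = 2)  # horizontal line at the posterior median
  invisible(NULL)
}

set.seed(1)
pdf(NULL)  # draw to a null device so the example runs without a display
res <- sketch_trace(rnorm(100, 0, 1))
dev.off()
```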
Verify the change points of a multinomial time series model
Description
Verify that a time series can be broken into a set of chunks based on input change points.
Usage
verify_changepoint_locations(data, changepoints = NULL, timename = "time")
Arguments
data |
Class |
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
Value
Logical indicator of the check passing (TRUE) or failing (FALSE).
Examples
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
verify_changepoint_locations(dct, changepoints = 100,
timename = "newmoon")
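The kind of check involved can be sketched on a bare vector of time stamps. The rules below are assumptions chosen for illustration (change points must be strictly increasing and fall strictly inside the observed time range); sketch_verify_cpts is a hypothetical name, and verify_changepoint_locations may apply different or additional conditions.

```r
# Hypothetical helper: can `times` be split into chunks at `changepoints`?
sketch_verify_cpts <- function(times, changepoints = NULL) {
  if (is.null(changepoints)) return(TRUE)  # no change points: one chunk
  ordered <- !is.unsorted(changepoints, strictly = TRUE)
  inside <- all(changepoints > min(times) & changepoints < max(times))
  ordered && inside
}

times <- 1:10
sketch_verify_cpts(times, c(3, 6))
sketch_verify_cpts(times, c(6, 3))
sketch_verify_cpts(times, 12)
```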