Title: | Transcriptomic Dynamics Models in Field Conditions |
Version: | 0.0.6 |
Description: | Provides functionality for constructing statistical models of transcriptomic dynamics in field conditions. It further offers the function to predict expression of a gene given the attributes of samples and meteorological data. Nagano, A. J., Sato, Y., Mihara, M., Antonio, B. A., Motoyama, R., Itoh, H., Naganuma, Y., and Izawa, T. (2012). <doi:10.1016/j.cell.2012.10.048>. Iwayama, K., Aisaka, Y., Kutsuna, N., and Nagano, A. J. (2017). <doi:10.1093/bioinformatics/btx049>. |
Depends: | R (≥ 3.2.2) |
Imports: | methods, Rcpp (≥ 0.11.2), XML, gglasso (≥ 1.4), MASS |
License: | MPL (≥ 2) | file LICENSE |
LazyData: | true |
LinkingTo: | Rcpp, RcppEigen (≥ 0.3.2.1.2) |
SystemRequirements: | C++11 |
NeedsCompilation: | yes |
RoxygenNote: | 6.0.1 |
Packaged: | 2019-01-01 04:26:18 UTC; koji |
Author: | Koji Iwayama [cre], Yuri Aisaka [aut] |
Maintainer: | Koji Iwayama <fieldtranscriptome@gmail.com> |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Repository: | CRAN |
Date/Publication: | 2019-01-07 14:50:15 UTC |
FIT: a statistical modeling tool for transcriptome dynamics under fluctuating field conditions
Description
Provides functionality for constructing statistical models of transcriptomic dynamics in field conditions. It further offers the function to predict expression of a gene given the attributes of samples and meteorological data. Nagano, A. J., Sato, Y., Mihara, M., Antonio, B. A., Motoyama, R., Itoh, H., Naganuma, Y., and Izawa, T. (2012). <doi:10.1016/j.cell.2012.10.048>. Iwayama, K., Aisaka, Y., Kutsuna, N., and Nagano, A. J. (2017). <doi:10.1093/bioinformatics/btx049>.
Overview
The FIT package is an R
implementation
of a class of transcriptomic models that
relates gene expressions of plants and weather conditions to which
the plants are exposed.
(The reader is referred to [Nagano et al.] for the detail of
the class of models concerned.)
By providing (a) gene expression profiles of plants brought up in a field condition, and (b) the relevant weather history (temperature etc.) of the said field, the user of the package is able to (1) construct optimized models (one for each gene) for their expressions, and (2) use them to predict the expressions for another weather history (possibly in a different field).
Below, we briefly explain the construction of the optimized models (“training phase”) and the way to use them to make predictions (“prediction phase”).
Model training phase
The model of [Nagano et al.] belongs to the class of statistical models called “linear models” and are specified by a set of “parameters” and “(linear regression) coefficients”. The former are used to convert weather conditions to the “input variables” for a regression, and the latter are then multiplied to the input variables to form the expectation values for the gene expressions. The reader is referred to the original article [Nagano et al.] for the formulas for the input variables. (See also [Iwayama] for a review.)
The training phase consists of three stages:
-
Init
: fixes the initial model parameters -
Optim
: optimizes the model parameters -
Fit
: fixes the linear regression coefficients
The user can configure the training phase
through a custom data structure (“recipe”),
which can be constructed by using the utility function
FIT::make.recipe()
.
The role of the first stage Init
is to fix the initial values
for the model parameters from which the parameter optimization is performed.
At the moment two methods, 'manual'
and 'gridsearch'
,
are implemented.
With the 'manual'
method the user can simply specify the set of
initial values that he thinks is promising.
For the 'gridsearch'
method the user discretizes
the parameter space to a grid by providing
a finite number of candidate values for each parameter.
FIT then performs a search over the grid
for the “best” combinations of the initial parameters.
The second stage Optim
is the main step of the model training,
and FIT tries to gradually improve the model parameters
using the Nelder-Mead method.
This stage could be run one or more times where each can be run
using the method 'none'
, 'lm'
or 'lasso'
.
The 'none'
method passes the given parameter as-is
to the next method in the Optim
pipeline or to the next stage Fit
.
(Basically, the method is there so that the user can skip the entire
Optim
stage, but the method could be used for slightly warming-up the CPU as well.)
The 'lm'
method uses the a simple (weighted) linear regression to
guide the parameter optimization. That is, FIT
first computes the “input variables” from the current parameters and
associated weather data, and then finds the set of linear coefficients
that best explains the “output variables” (gene expressions).
Finally, the quadratic residual is used as the measure for the
error and is fed back to the Nelder-Mead method.
The 'lasso'
method is similar to the 'lm'
method
but uses the (weighted) Lasso regression
(“linear” regression with an L1-regularization for the regression coefficients)
instead of the simple linear regression.
FIT uses the glmnet package to perform
the Lasso regression and the strength of the L1-regularization
is fixed via a cross validation. (See cv.glmnet()
from the glmnet
package.
The Lasso regression is said to suppress irrelevant input variables automatically
and tends to create models with better prediction ability.
On the other hand, 'lasso'
runs considerably slower than 'lm'
.
For example, passing a vector c('lm', 'lasso')
to the
argument optim
(of make.recipe()
) creates a recipe
that instructs the Optim
stage to
(1) first optimize using the 'lm'
method,
(2) and then fine tunes the parameters using the 'lasso'
method.
After fixing the model parameters in the Optim
stage,
the Fit
stage can be used to fix the linear coefficients
of the models.
Here, either 'fit.lm'
or 'fit.lasso'
can be used
to find the “best” coefficients, the main difference being that
the coefficients are penalized by an L1-norm for the latter.
Note that it is perfectly okay to use 'fit.lasso'
for
the parameters optimized using 'lm'
.
In order to prepare for the possibly huge variations
of expression data as measured by RNA-seq,
FIT provides a way to weight regression penalties from each sample
with different weights as in
sum_{s in samples} (weight_s) (error_s)^2
.
Prediction phase
For each gene, the trained model of the previous subsection
can be thought of as a black box that maps
the field conditions (weather data),
to which a plant containing the gene is exposed,
to its expected expression.
FIT provides a simple function
FIT::predict()
that does just this.
FIT::predict()
takes as its argument
a list of pretrained models
as well as actual/hypothetical plant sample attributes and weather data,
and returns the predicted values of gene expressions.
When there is a set of actually measured expressions,
an associated function FIT::prediction.errors()
)
can be used to check the validity of the predictions made by
the models.
Namespece contamination
The FIT package exports fairly ubiquitous names
auch as optim
, predict
etc.\ as its API.
Users, therefore, are advised to load FIT
via requireNamespace('FIT')
and use its API function with
a namaspace qualifier (e.g.~FIT::optim()
)
rather than loading and attaching it via library('FIT')
.
Basic usage
See vignettes for examples of actual scripts that use FIT.
References
[Nagano et al.] A.J.~Nagano, et al. “Deciphering and prediction of transcriptome dynamics under fluctuating field conditions,” Cell~151, 6, 1358–69 (2012)
[Iwayama] K.~Iwayama, et al. “FIT: statistical modeling tool for transcriptome dynamics under fluctuating field conditions,” Bioinformatics, btx049 (2017)
Examples
## Not run:
# The following snippet shows the structure of a typical
# driver script of the FIT package.
# See vignettes for examples of actual scripts that use FIT.
##############
## training ##
##############
## discretized parameter space (for 'gridsearch')
grid.coords <- list(
clock.phase = seq(0, 23*60, 1*60),
# :
gate.radiation.amplitude = c(-5, 5)
)
## create a training recipe
recipe <- FIT::make.recipe(c('temperature', 'radiation'),
init = 'gridsearch',
init.data = grid.coords,
optim = c('lm'),
fit = 'fit.lasso',
time.step = 10,
opts =
list(lm = list(maxit = 900),
lasso = list(maxit = 1000))
)
## names of genes to construct models
genes <- c('Os12g0189300', 'Os02g0724000')
## End(Not run)
## Not run:
## load training data
training.attribute <- FIT::load.attribute('attribute.2008.txt')
training.weather <- FIT::load.weather('weather.2008.dat', 'weather')
training.expression <- FIT::load.expression('expression.2008.dat', 'ex', genes)
## models will be a list of trained models (length: ngenes)
models <- FIT::train(training.expression,
training.attribute,
training.weather,
recipe)
## End(Not run)
################
## prediction ##
################
## Not run:
## load validation data
prediction.attribute <- FIT::load.attribute('attribute.2009.txt');
prediction.weather <- FIT::load.weather('weather.2009.dat', 'weather')
prediction.expression <- FIT::load.expression('expression.2009.dat', 'ex', genes)
## predict
prediction.result <- FIT::predict(models[[1]],
prediction.attribute,
prediction.weather)
## End(Not run)
Converts attribute data from a dataframe into an object.
Description
Converts attribute data from a dataframe into an object.
Usage
convert.attribute(data, sample = NULL)
Arguments
data |
A dataframe of the attributes of microarray/RNA-seq data. |
sample |
An optional numeric array that designates the samples, that is rows, of the dataframe to be loaded. |
Value
An object that represents the attributes of microarray/RNA-seq data. Internally, the object holds a dataframe whose number of entries (rows) equals that of the samples.
converts expression data from a dataframe into an object.
Description
converts expression data from a dataframe into an object.
Usage
convert.expression(data, entries = NULL)
Arguments
data |
A dataframe of expression data to be loaded. |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that represents the expression data of microarray/RNA-seq.
Internally, the object holds a matrix of size
nsamples * ngenes
.
Converts weather data from a dataframe into an object.
Description
Converts weather data from a dataframe into an object.
Usage
convert.weather(data, entries = IO$weather.entries)
Arguments
data |
A dataframe of weather data to be converted. |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that reprents the timeseries data of weather factors.
Internally, the object holds a dataframe of size
ntimepoints * nfactors
.
Converts regression weight data from a dataframe into an object.
Description
Converts regression weight data from a dataframe into an object.
Usage
convert.weight(data, entries = NULL)
Arguments
data |
A dataframe that contains weight data to be loaded. |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that represents the weights
Internally, the object holds a matrix of size
nsamples * ngenes
.
A raw API for fixing linear regression coefficients.
Description
Note: use train()
unless the user is willing to
accept breaking API changes in the future.
Usage
fit.models(expression, weight, attribute, weather, recipe, models)
Arguments
expression |
An object that represents gene expression data.
The object can be created from a dumped/saved dataframe
of size |
weight |
A matrix of size Note that, unlike for |
attribute |
An object that represents the attributes of
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
of size |
weather |
An object that represents actual or hypothetical weather data
with which the training of models are done.
The object can be created from a dumped/saved dataframe
of size |
recipe |
An object that represents the training protocol of models.
A recipe can be created using |
models |
A collection of models being trained as is returnd by
|
Value
A collection of models whose parameters and regression coeffients are optimized.
A raw API for initializing model parameters.
Description
Note: use train()
unless the user is willing to
accept breaking API changes in the future.
Usage
init(expression, weight, attribute, weather, recipe)
Arguments
expression |
An object that represents gene expression data.
The object can be created from a dumped/saved dataframe
of size |
weight |
A matrix of size Note that, unlike for |
attribute |
An object that represents the attributes of a
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
of size |
weather |
An object that represents actual or hypothetical weather data
with which the training of models are done.
The object can be created from a dumped/saved dataframe
of size |
recipe |
An object that represents the training protocol of models.
A recipe can be created using |
Value
A collection of models whose parameters are
set by using the 'init'
method in the argument recipe
.
Loads attribute data.
Description
Loads attribute data.
Usage
load.attribute(path, variable = NULL, sample = NULL)
Arguments
path |
A path of a file that contains attribute data to be loaded.
When the file is a loadable |
variable |
An optional string that designates the name of a
dataframe object that has been saved in an |
sample |
An optional numeric array that designates the samples, that is rows, of the dataframe to be loaded. |
Value
An object that represents the attributes of microarray/RNA-seq data. Internally, the object holds a dataframe whose number of entries (rows) equals that of the samples.
Loads expression data.
Description
Loads expression data.
Usage
load.expression(path, variable = NULL, entries = NULL)
Arguments
path |
A path of a file that contains attribute data to be loaded.
When the file is a loadable |
variable |
An optional string that designates the name of a
dataframe object that has been saved in an |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that represents the expression data of microarray/RNA-seq.
Internally, the object holds a matrix of size
nsamples * ngenes
.
Loads weather data.
Description
Loads weather data.
Usage
load.weather(path, variable = NULL, entries = IO$weather.entries)
Arguments
path |
A path of a file that contains weather data to be loaded.
When the file is a loadable |
variable |
An optional string that designates the name of a
dataframe object that has been saved in an |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that reprents the timeseries data of weather factors.
Internally, the object holds a dataframe of size
ntimepoints * nfactors
.
Loads regression weight data.
Description
Loads regression weight data.
Usage
load.weight(path, variable = NULL, entries = NULL)
Arguments
path |
A path of a file that contains weight data to be loaded.
When the file is a loadable |
variable |
An optional string that designates the name of a
dataframe object that has been saved in an |
entries |
An optional string array that designates the entries of the dataframe to be loaded. |
Value
An object that represents the weights
Internally, the object holds a matrix of size
nsamples * ngenes
.
Creates a recipe for training models.
Description
Creates a recipe for training models.
Usage
make.recipe(envs, init, optim, fit, init.data, time.step, gate.open.min = 0,
opts = NULL)
Arguments
envs |
An array of weather factors to be taken into account
during the construction of models.
At the moment, the array |
init |
A string to specify the method to choose the initial parameters.
(One of |
optim |
A string to specify the method to be used for optimizing
the model parameters.
(One of |
fit |
A string to specify the method to be used for fixing
the linear regression coefficients.
(One of |
init.data |
Auxiliary data needed to perform the Init stage
using the method specified by the |
time.step |
An integer to specify the basic unit of time (in minute) for the transcriptomic models. Must be a multiple of the time step of weather data. |
gate.open.min |
The minimum opning length in minutes of the gate function for environmental inputs. |
opts |
An optional named list that specifies the arguments to be passed to methods that constitute each stage of the model training. Each key of the list corresponds to a name of a method. See examples for the supported options. |
Value
An object representing the procedure to construct models.
Examples
## Not run:
init.params <- .. # choose them wisely
# Defined in Train.R:
# default.opts <- list(
# none = list(),
# lm = list(maxit=1500, nfolds=-1), # nfolds for lm is simply ignored
# lasso = list(maxit=1000, nfolds=10)
# )
recipe <- FIT::make.recipe(c('wind', 'temperature'),
init = 'manual',
init.data = init.params,
optim = c('lm', 'none', 'lasso'),
fit = 'fit.lasso',
time.step = 10,
opts =
list(lm = list(maxit = 900),
lasso = list(maxit = 1000)))
## End(Not run)
Makes trivial weight data
Description
Makes trivial weight data
Usage
make.trivial.weights(samples.n, genes)
Arguments
samples.n |
A number of samples. |
genes |
A list of genes. |
Value
An object that represens the trivial weights.
Internally, the object holds an identity matrix of size
nsamples * ngenes
.
A raw API for optimizing model parameters.
Description
Note: use train()
unless the user is willing to
accept breaking API changes in the future.
Usage
optim(expression, weight, attribute, weather, recipe, models, maxit = NULL,
nfolds = NULL)
Arguments
expression |
An object that represents gene expression data.
The object can be created from a dumped/saved dataframe
of size |
weight |
A matrix of size Note that, unlike for |
attribute |
An object that represents the attributes of
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
of size |
weather |
An object that represents actual or hypothetical weather data
with which the training of models are done.
The object can be created from a dumped/saved dataframe
of size |
recipe |
An object that represents the training protocol of models.
A recipe can be created using |
models |
A collection of models being trained as is returnd by
At this moment, it must be a list (genes) of a list (envs) of models and must contain at least one model. (THIS MIGHT CHANGE IN A FUTURE.) |
maxit |
An optional number that specifies the maximal number of times that the parameter optimization is performed. The user can control this parameter by using the |
nfolds |
An optional number that specifies the order of
cross validation when |
Value
A collection of models whose parameters are
optimized by using the 'optim'
pipeline
in the argument recipe
.
Predicts gene expressions using pretrained models.
Description
Predicts gene expressions using pretrained models.
Usage
predict(models, attribute, weather)
Arguments
models |
A list of trained models for the genes of interest. At the moment the collection of trained models returned
by |
attribute |
An object that represents the attributes of
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
of size |
weather |
An object that represents actual or hypothetical weather data
with which predictions of gene expressions are made.
The object can be created from a dumped/saved dataframe
of size |
Value
A list of prediction results as returned by the models.
Examples
## Not run:
# prepare models
# NOTE: FIT::train() returns a nested list of models
# so we have to flatten it using FIT::train.to.predict.adaptor()
# before passing it to FIT::predict().
models <- FIT::train(..)
models.flattened <- FIT::train.to.predict.adaptor(models)
# load data used for prediction
prediction.attribute <- FIT::load.attribute('attribute.2009.txt')
prediction.weather <- FIT::load.weather('weather.2009.dat', 'weather')
prediction.expression <- FIT::load.expression('expression.2009.dat', 'ex', genes)
prediction.results <- FIT::predict(models.flattened,
prediction.attribute,
prediction.weather)
## End(Not run)
Computes the prediction errors using the trained models.
Description
Computes the prediction errors using the trained models.
Usage
prediction.errors(models, expression, attribute, weather)
Arguments
models |
A list of trained models for the genes of interest. At the moment the collection of trained models returned
by |
expression |
An object that represents the actual measured data of
gene expressions.
The object can be created from a dumped/saved dataframe
of size |
attribute |
An object that represents the attributes of
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
using |
weather |
An object that represents actual or hypothetical weather data
with which predictions of gene expressions are made.
The object can be created from a dumped/saved dataframe
using |
Value
A list of deviance (a measure of validity of predictions,
as is defined by each model) between the prediction results
and the measured results (as is provided by the user through
expression
argument).
Examples
## Not run:
# see the usage of FIT::predict()
## End(Not run)
Constructs models following a recipe.
Description
Constructs models following a recipe.
Usage
train(expression, attribute, weather, recipe, weight = NULL,
min.expressed.rate = 0.01)
Arguments
expression |
An object that represents gene expression data.
The object can be created from a dumped/saved dataframe
of size |
attribute |
An object that represents the attributes of
microarray/RNA-seq data.
The object can be created from a dumped/saved dataframe
of size |
weather |
An object that represents actual or hypothetical weather data
with which the training of models are done.
The object can be created from a dumped/saved dataframe
of size |
recipe |
An object that represents the training protocol of models.
A recipe can be created using |
weight |
An optional numerical matrix of size This argument is optional for a historical reason, and when it is omitted, all samples are equally penalized. |
min.expressed.rate |
A number used to
A gene with |
Value
A collection of trained models.
Examples
## Not run:
# create recipe
recipe <- FIT::make.recipe(..)
#load training data
training.attribute <- FIT::load.attribute('attribute.2008.txt');
training.weather <- FIT::load.weather('weather.2008.dat', 'weather')
training.expression <- FIT::load.expression('expression.2008.dat', 'ex', genes)
training.weight <- FIT::load.weight('weight.2008.dat', 'weight', genes)
# train models
models <- FIT::train(training.expression,
training.attribute,
training.weather,
recipe,
training.weight)
## End(Not run)
Supported weather factors.
Description
Supported weather factors.
Usage
weather.entries
Format
An object of class character
of length 6.
Examples
length(FIT::weather.entries)