Title: | Random-Effects Stochastic Reaction Networks |
Version: | 1.0.1 |
Description: | A random-effects stochastic model that allows quick detection of clonal dominance events from clonal tracking data collected in gene therapy studies. Starting from the Ito-type equation describing the dynamics of cells duplication, death and differentiation at clonal level, we first considered its local linear approximation as the base model. The parameters of the base model, which are inferred using a maximum likelihood approach, are assumed to be shared across the clones. Although this assumption makes inference easier, in some cases it can be too restrictive and does not take into account possible scenarios of clonal dominance. Therefore we extended the base model by introducing random effects for the clones. In this extended formulation the dynamic parameters are estimated using a tailor-made expectation maximization algorithm. Further details on the methods can be found in L. Del Core et al., (2022) <doi:10.1101/2022.05.31.494100>. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3.9000 |
Imports: | Matrix, xtable, scales, stringr, ggplot2, scatterpie, RColorBrewer |
Depends: | R (≥ 2.10) |
LazyData: | true |
Suggests: | R.rsp |
VignetteBuilder: | R.rsp |
NeedsCompilation: | no |
Packaged: | 2024-02-15 10:41:07 UTC; pmzld |
Author: | Luca Del Core |
Maintainer: | Luca Del Core <l.del.core@rug.nl> |
Repository: | CRAN |
Date/Publication: | 2024-02-15 11:00:02 UTC |
Bhattacharyya distance
Description
Bhattacharyya distance between P and Q for multivariate normal distributions
Usage
BhattDist(mu1, S1, mu2, S2)
Arguments
mu1 |
mean vector of P. |
S1 |
covariance matrix of P. |
mu2 |
mean vector of Q. |
S2 |
covariance matrix of Q. |
Value
Bhattacharyya distance between P and Q.
Kullback-Leibler divergence
Description
Kullback-Leibler divergence KL(P \Vert Q)
of Q from P for multivariate normal distributions
Usage
KLDiv(mu1, S1, mu2, S2)
Arguments
mu1 |
mean vector of P. |
S1 |
covariance matrix of P. |
mu2 |
mean vector of Q. |
S2 |
covariance matrix of Q. |
Value
Kullback-Leibler divergence KL(P \Vert Q)
of Q from P.
E[u \vert y]
and V[u \vert y]
Description
Conditional expectation E[u \vert y]
and variance V[u \vert y]
of the latent states u given the observed states y
Usage
VEuy(theta_curr, M, M_bdiag, y, V, VCNs, nObs)
Arguments
theta_curr |
current p-dimensional vector parameter. |
M |
A |
M_bdiag |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
Value
the conditional expectation E[u \vert y]
and variance V[u \vert y]
of the latent states u given the observed states y.
Rhesus Macaque clonal tracking dataset
Description
A dataset containing clonal tracking cell counts from a Rhesus Macaque study.
Usage
Y_RM
Format
A list containing clonal tracking data for each animal (ZH33, ZH17, ZG66). Each clonal tracking dataset is a 3-dimensional array whose dimensions identify
- 1
time, in months
- 2
cell types: T, B, NK, Macrophages(M) and Granulocytes(G)
- 3
unique barcodes (clones)
Source
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3979461/bin/NIHMS567927-supplement-02.xlsx
Generate hazard function
Description
This function dynamically builds the function get.h() that will be used to get the hazard vector.
Usage
compile.h(rct.lst, envir)
Arguments
rct.lst |
list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0" |
Value
No return value.
Conditional AIC (cAIC)
Description
Conditional AIC (cAIC) of the conditional log-likelihood l(y \vert u)
of y given the random effects u
Usage
condAIC(X, Z, y, theta, Delta, V, VCNs, nObs, verbose = TRUE)
Arguments
X |
A |
Z |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
theta |
p-dimensional vector parameter. |
Delta |
covariance matrix of the random effects u |
V |
A |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console. |
Value
Conditional AIC (cAIC) of the conditional log-likelihood l(y \vert u)
of y given the random effects u.
Gradient of E[u \vert \theta]
Description
Gradient of the expected values of the random effects u w.r.t. the free parameters
Usage
dBu(K, p)
Arguments
K |
number of clones being analyzed. |
p |
number of free parameters. |
Value
the p-dimensional gradient of u.
Gradient of V[u \vert \theta]
Description
Gradient of the covariance matrix of the random effects u w.r.t. the free parameters
Usage
dD2(K, p)
Arguments
K |
number of clones being analyzed. |
p |
number of free parameters. |
Value
the gradient of the covariance matrix of u.
Multivariate Normal density function
Description
Proability density function of a n-variate normal distribution
Usage
dmvNorm(x, Mu, Sigma, SigmaInv)
Arguments
x |
n-dimensional vector. |
Mu |
n-dimensional mean vector. |
Sigma |
n-dimensional covariance matrix. |
SigmaInv |
n-dimensional precision matrix. |
Value
the probability density of a n-variate normal distribution with mean vector Mu and covariance Sigma evaluated at x.
Fit the base (null) model
Description
This function builds the design matrix of the null model and returns the fitted values and the corresponding statistics.
Usage
fit.null(
Y,
rct.lst,
maxit = 10000,
factr = 1e+07,
pgtol = 1e-08,
lmm = 100,
trace = TRUE,
verbose = TRUE
)
Arguments
Y |
A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively. |
rct.lst |
list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0" |
maxit |
maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page. |
factr |
controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function. |
pgtol |
helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function. |
lmm |
is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function. |
trace |
Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.) |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console. |
Value
A 3-length list. First element is the output returned by "optim()" function (see "optim()" documentation for details). Second element is a vector of statistics associated to the fitted null model:
nPar | number of parameters of the base(null) model |
cll | value of the conditional log-likelihood, in this case just the log-likelihood |
mll | value of the marginal log-likelihood, in this case just the log-likelihood |
cAIC | conditional Akaike Information Criterion (cAIC), in this case simply the AIC. |
mAIC | marginal Akaike Information Criterion (mAIC), in this case simply the AIC. |
Chi2 | value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta) . |
p-value | p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom. |
The third element, called "design", is a list including:
M | A n \times K dimensional (design) matrix. |
V | A p \times K dimensional net-effect matrix. |
Examples
rcts <- c("A->1", "B->1", "C->1", "D->1",
"A->0", "B->0", "C->0", "D->0",
"A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
.001, .007, .004, .002,
.13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
.001, .007, .004, .002,
.13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
.001, .007, .004, .002,
.13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
dim = c(S + 1, length(ctps), nC),
dimnames = list(seq(from = 0, to = S*tau, by = tau),
ctps,
1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
Y[,,cl] <- get.sim.tl(Yt = Y0,
theta = theta_allcls[,cl],
S = S,
s2 = s20,
tau = tau,
rct.lst = rcts,
verbose = TRUE)
}
null.res <- fit.null(Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting
Fit the random-effects model
Description
This function builds the design matrix of the random-effects model and returns the fitted values and the corresponding statistics.
Usage
fit.re(
theta_0,
Y,
rct.lst,
maxit = 10000,
factr = 1e+07,
pgtol = 1e-08,
lmm = 100,
maxemit = 100,
eps = 1e-05,
trace = TRUE,
verbose = TRUE
)
Arguments
theta_0 |
A p-dimensional vector parameter as the initial guess for the inference. |
Y |
A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively. |
rct.lst |
list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0" |
maxit |
maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page. |
factr |
controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function. |
pgtol |
helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function. |
lmm |
is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function. |
maxemit |
maximum number of iterations for the expectation-maximization algorithm. |
eps |
relative error for the value x and the objective function f(x) that has to be optimized in the expectation-maximization algorithm. |
trace |
Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.) |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console. |
Value
A 3-length list. First element is the output returned by "optim()" function (see "optim()" documentation for details)
along with the conditional expectation E[u \vert y]
and variance V[u \vert y]
of the latent states u given the observed states y from the last step of the expectation-maximization algorithm.
Second element is a vector of statistics associated to the fitted random-effects model:
nPar | number of parameters of the base(null) model |
cll | value of the conditional log-likelihood, in this case just the log-likelihood |
mll | value of the marginal log-likelihood, in this case just the log-likelihood |
cAIC | conditional Akaike Information Criterion (cAIC), in this case simply the AIC. |
mAIC | marginal Akaike Information Criterion (mAIC), in this case simply the AIC. |
Chi2 | value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta) . |
p-value | p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom. |
KLdiv | Kullback-Leibler divergence of the random-effects model from the null model. |
KLdiv/N | Rescaled Kullback-Leibler divergence of the random-effects model from the null model. |
BhattDist_nullCond | Bhattacharyya distance between the random-effects model and the null model. |
BhattDist_nullCond/N | Rescaled Bhattacharyya distance between the random-effects model and the null model. |
The third element, called "design", is a list including:
M | A n \times K dimensional (design) matrix. |
M_bdiag | An \times Jp dimensional block-diagonal design matrix. |
V | A p \times K dimensional net-effect matrix. |
Examples
rcts <- c("A->1", "B->1", "C->1", "D->1",
"A->0", "B->0", "C->0", "D->0",
"A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
.001, .007, .004, .002,
.13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
.001, .007, .004, .002,
.13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
.001, .007, .004, .002,
.13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
dim = c(S + 1, length(ctps), nC),
dimnames = list(seq(from = 0, to = S*tau, by = tau),
ctps,
1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
Y[,,cl] <- get.sim.tl(Yt = Y0,
theta = theta_allcls[,cl],
S = S,
s2 = s20,
tau = tau,
rct.lst = rcts,
verbose = TRUE)
}
null.res <- fit.null(Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting
re.res <- fit.re(theta_0 = null.res$fit$par,
Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting
Design matrix M
Description
This function generates time-adjacent increments from a cell counts matrix y.
Usage
get.M(y, V, dT, get.h)
Arguments
y |
clone-specific |
V |
A |
dT |
A (t-1)-dimensional vector of the time-increments. |
Value
A tp \times K
dimensional (design) matrix.
Net-effect matrix
Description
This function builds the net-effect matrix V.
Usage
get.V(rct.lst)
Arguments
rct.lst |
list of biochemical reactions. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0" |
Value
The net-effect matrix V
Stochastic covariance matrix W
Description
This function returns the stochastic covariance matrix W associated to a design matrix M, a net-effect matrix V, and a vector parameter theta.
Usage
get.W(M, V, VCNs, theta, nObs)
Arguments
M |
A |
V |
A |
theta |
p-dimensional vector parameter. |
Value
A n \times n
dimensional stochastic covariance matrix.
Clonal boxplots
Description
Draw clonal boxplots of a random-effects reaction network.
Usage
get.boxplots(re.res)
Arguments
re.res |
output list returned by fit.re(). |
Details
This function generates the boxplots of the conditional expectations
w_k = E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\alpha_{l}}] - E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\delta_{l}}]
,
computed from the estimated parameters \hat{\psi}
for the clone-specific net-duplication in each cell lineage l (different colors).
The whiskers extend to the data extremes.
Value
No return value.
Examples
rcts <- c("A->1", "B->1", "C->1", "D->1",
"A->0", "B->0", "C->0", "D->0",
"A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
.001, .007, .004, .002,
.13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
.001, .007, .004, .002,
.13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
.001, .007, .004, .002,
.13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
dim = c(S + 1, length(ctps), nC),
dimnames = list(seq(from = 0, to = S*tau, by = tau),
ctps,
1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
Y[,,cl] <- get.sim.tl(Yt = Y0,
theta = theta_allcls[,cl],
S = S,
s2 = s20,
tau = tau,
rct.lst = rcts,
verbose = TRUE)
}
null.res <- fit.null(Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting
re.res <- fit.re(theta_0 = null.res$fit$par,
Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting
get.boxplots(re.res)
Time-adjacent increments of the cell counts
Description
This function generates time-adjacent increments from a cell counts matrix y.
Usage
get.dx(y)
Arguments
y |
clone-specific |
Value
A (t-1) \times p
dimensional vector containing the time-adjacent increments.
Gradient of the negative log-likelihood of the base (null) model
Description
Compute the gradient of the negative log-likelihood of the null model, namely the linear model without rondom effects at all.
Usage
get.gnl(theta, M, y, V, VCNs, nObs, dW)
Arguments
theta |
p-dimensional vector parameter. |
M |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
dW |
p-dimensional list of the partial derivatives of W w.r.t. theta. |
Value
p-dimensional vector of the gradient of the negative log-likelihood.
Negative log-likelihood of the base (null) model
Description
Compute the negative log-likelihood of the null model, namely the linear model without rondom effects at all.
Usage
get.nl(theta, M, y, V, VCNs, nObs, dW)
Arguments
theta |
p-dimensional vector parameter. |
M |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
dW |
p-dimensional list of the partial derivatives of W w.r.t. theta. |
Value
the value of the negative log-likelihood.
Rescaling a clonal tracking dataset
Description
Rescales a clonal tracking dataset based on the sequencing depth.
Usage
get.rescaled(Y)
Arguments
Y |
A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively. |
Details
This function rescales a clonal tracking dataset Y according to the formula
Y_{ijk} \leftarrow Y_{ijk} \cdot \frac{min_{ij}\sum_cY_{ijc}}{\sum_cY_{ijc}}
Value
A rescaled clonal tracking dataset.
Examples
get.rescaled(Y_RM[["ZH33"]])
Clonal pie-chart
Description
Draw a clonal pie-chart of a random-effects reaction network.
Usage
get.scatterpie(re.res, txt = FALSE, legend = FALSE)
Arguments
re.res |
output list returned by fit.re(). |
txt |
logical (defaults to FALSE). If TRUE, barcode names will be printed on the pies. |
legend |
logical (defaults to FALSE). If TRUE, the legend of the pie-chart will be printed. |
Details
This function generates a clonal pie-chart given a previously fitted random-effects model.
In this representation each clone k
is identified with a pie whose slices are
lineage-specific and weighted with w_k
, defined as the difference between the
conditional expectations of the random-effects on duplication and death parameters, that is
w_k = E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\alpha_{lin}}] - E_{u\vert \Delta Y; \hat{\psi}}[u^k_{\delta_{lin}}]
,
where \texttt{lin}
is a cell lineage.
The diameter of the k
-th pie is proportional to the euclidean 2-norm of w_k
.
Therefore, the larger the diameter, the more the corresponding clone is expanding
into the lineage associated to the largest slice.
Value
No return value.
Examples
rcts <- c("A->1", "B->1", "C->1", "D->1",
"A->0", "B->0", "C->0", "D->0",
"A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
.001, .007, .004, .002,
.13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
.001, .007, .004, .002,
.13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
.001, .007, .004, .002,
.13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
dim = c(S + 1, length(ctps), nC),
dimnames = list(seq(from = 0, to = S*tau, by = tau),
ctps,
1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
Y[,,cl] <- get.sim.tl(Yt = Y0,
theta = theta_allcls[,cl],
S = S,
s2 = s20,
tau = tau,
rct.lst = rcts,
verbose = TRUE)
}
null.res <- fit.null(Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
) ## null model fitting
re.res <- fit.re(theta_0 = null.res$fit$par,
Y = Y,
rct.lst = rcts,
maxit = 0, ## needs to be increased (>=100) for real applications
lmm = 0, ## needs to be increased (>=5) for real applications
maxemit = 1 ## needs to be increased (>= 100) for real applications
) ## random-effects model fitting
get.scatterpie(re.res, txt = TRUE)
\tau
-leaping simulation algorithm
Description
Simulate a trajectory of length S for a stochastic reaction network.
Usage
get.sim.tl(Yt, theta, S, s2 = 0, tau = 1, rct.lst, verbose = TRUE)
Arguments
Yt |
starting point of the trajectory |
theta |
vector parameter for the reactions. |
S |
length of the simulated trajectory. |
s2 |
noise variance (defaults to 0). |
tau |
time interval length (defaults to 1). |
rct.lst |
list of biochemical reactions. |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the simulation progress are printed to the console. |
Details
This function allows to simulate a trajectory of a single clone given an
initial conditions Y_0
for the cell counts, and obeying to a particular cell differentiation network
defined by a net-effect (stoichiometric) matrix V
and an hazard function h()
.
The function allows to consider only three cellular events, such as
cell duplication (Y_{it} \rightarrow 1
), cell death (Y_{it} \rightarrow \emptyset
)
and cell differentiation (Y_{it} \rightarrow Y_{jt}
) for a clone-specific time counting process
Y_t = (Y_{1t},\dots,Y_{Nt})
observed in $N$ distinct cell lineages.
In particular, the cellular events of duplication, death and differentiation are
respectively coded with the character labels \texttt{"A->1"}
, \texttt{"A->0"}
,
and \texttt{"A->B"}
, where \texttt{A}
and \texttt{B}
are two distinct cell types.
The output is a 3
-dimensional array Y
whose ijk
-entry Y_{ijk}
is the number of cells of clone k
for cell type j
collected at time i
.
More mathematical details can be found in the vignette of this package.
Value
A S \times p
dimensional matrix of the simulated trajectory.
Examples
rcts <- c("A->1", "B->1", "C->1", "D->1",
"A->0", "B->0", "C->0", "D->0",
"A->B", "A->C", "C->D") ## set of reactions
ctps <- head(LETTERS,4)
nC <- 3 ## number of clones
S <- 10 ## trajectory length
tau <- 1 ## for tau-leaping algorithm
u_1 <- c(.2, .15, .17, .09*5,
.001, .007, .004, .002,
.13, .15, .08)
u_2 <- c(.2, .15, .17, .09,
.001, .007, .004, .002,
.13, .15, .08)
u_3 <- c(.2, .15, .17*3, .09,
.001, .007, .004, .002,
.13, .15, .08)
theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
rownames(theta_allcls) <- rcts
s20 <- 1 ## additional noise
Y <- array(data = NA,
dim = c(S + 1, length(ctps), nC),
dimnames = list(seq(from = 0, to = S*tau, by = tau),
ctps,
1:nC)) ## empty array to store simulations
Y0 <- c(100,0,0,0) ## initial state
names(Y0) <- ctps
for (cl in 1:nC) { ## loop over clones
Y[,,cl] <- get.sim.tl(Yt = Y0,
theta = theta_allcls[,cl],
S = S,
s2 = s20,
tau = tau,
rct.lst = rcts,
verbose = TRUE)
}
Y
Matrix determinant
Description
Compute the determinant of a matrix.
Usage
ldet(A, logarithm = TRUE)
Arguments
A |
a matrix |
logarithm |
logical; if TRUE (default) return the logarithm of the modulus of the determinant. |
Value
the determinant of A.
E-step function Q
Description
Negative E-step function -Q of the expectation-maximization algorithm
Usage
nQ(theta, euy_curr, vuy_curr, M, M_bdiag, y, V, VCNs, nObs, dW)
Arguments
theta |
p-dimensional vector parameter. |
euy_curr |
current value of the conditional expectation |
vuy_curr |
current value of the conditional variance |
M |
A |
M_bdiag |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
dW |
p-dimensional list of the partial derivatives of W w.r.t. theta. |
Value
The current value of the negative E-step function -Q.
Gradient of the E-step function Q
Description
Gradient -\nabla Q
of the negative E-step function -Q of the expectation-maximization algorithm
Usage
ngQ(theta, euy_curr, vuy_curr, M, M_bdiag, y, V, VCNs, nObs, dW)
Arguments
theta |
p-dimensional vector parameter. |
euy_curr |
current value of the conditional expectation |
vuy_curr |
current value of the conditional variance |
M |
A |
M_bdiag |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
dW |
p-dimensional list of the partial derivatives of W w.r.t. theta. |
Value
p-dimensional vector of the gradient -\nabla Q
of the negative E-step function -Q.
Euclidean 2-norm
Description
Compute the euclidean 2-norm of a vector x.
Usage
norm2(x)
Arguments
x |
a vector. |
Value
the euclidean 2-norm of x.
Fit base model
Description
Fit the base (null) model to the given data using a maximum likelihood approach.
Usage
nullModelFitting(
theta_start,
M,
y,
V,
psiLB,
psiUB,
maxit,
factr,
pgtol,
lmm,
VCNs,
nObs,
trace = TRUE
)
Arguments
theta_start |
p-dimensional vector parameter used as initial guess in the inference procedure. |
M |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
psiLB |
p-dimensional vector of lower bound values for theta. |
psiUB |
p-dimensional vector of upper bound values for theta. |
maxit |
maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page. |
factr |
controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function. |
pgtol |
helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function. |
lmm |
is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function. |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
trace |
Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.) |
Value
A 3-length list. First element is the output returned by optim() function (see optim() documentation for details). Second element is a vector of statistics associated to the fitted null model:
nPar | number of parameters of the base(null) model |
cll | value of the conditional log-likelihood, in this case just the log-likelihood |
mll | value of the marginal log-likelihood, in this case just the log-likelihood |
cAIC | conditional Akaike Information Criterion (cAIC), in this case simply the AIC. |
mAIC | marginal Akaike Information Criterion (mAIC), in this case simply the AIC. |
Chi2 | value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta) . |
p-value | p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom. |
The third element, called design, is a list including:
M | A n \times K dimensional (design) matrix. |
V | A p \times K dimensional net-effect matrix. |
Conditional negative log-likelihood
Description
Conditional negative log-likelihood -l(y \vert u)
of y given the random effects u
Usage
ny_u(theta, Z, X, y, V, VCNs, nObs)
Arguments
theta |
p-dimensional vector parameter. |
Z |
A |
X |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
V |
A |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
Value
Conditional negative log-likelihood -l(y \vert u)
of y given the random effects u.
Multivariate relative error
Description
Compute the multivariate relative error between two vectors.
Usage
relErr(x, y)
Arguments
x |
a vector. |
y |
a vector. |
Value
the multivariate relative error between x and y.
Random generator from a Multivariate Normal distribution
Description
This function generate a random vector from a multivariate normal distribution.
Usage
rmvNorm(d, p, Mu, Sigma)
Arguments
d |
sample d. |
p |
vector dimension. |
Mu |
mean vector. |
Sigma |
covariance matrix. |
Value
A p \times d
matrix. Each column is a p-variate random vector from
a multivariate normal with mean vector Mu and covariance matrix Sigma.
Fit random-effects model
Description
Fit the random-effects model to the given data using an expectation-maximization algorithm.
Usage
rndEffModelFitting(
theta_0,
V,
M,
M_bdiag,
y,
VCNs,
nObs,
maxit,
maxemit,
eps = 1e-05,
thetaLB,
thetaUB,
factr,
pgtol,
lmm,
trace = TRUE,
verbose = TRUE
)
Arguments
theta_0 |
p-dimensional vector parameter used as initial guess in the inference procedure. |
V |
A |
M |
A |
M_bdiag |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
maxit |
maximum number of iterations for the optimization step. This argument is passed to optim() function. Details on "maxit" can be found in "optim()" documentation page. |
maxemit |
maximum number of iterations for the expectation-maximization algorithm. |
eps |
relative error for the value x and the objective function f(x) that has to be optimized in the expectation-maximization algorithm. |
thetaLB |
p-dimensional vector of lower bound values for theta. |
thetaUB |
p-dimensional vector of upper bound values for theta. |
factr |
controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7, that is a tolerance of about 1e-8. This argument is passed to optim() function. |
pgtol |
helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed. This argument is passed to optim() function. |
lmm |
is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method, It defaults to 5. This argument is passed to optim() function. |
trace |
Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.) |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console. |
Value
The output returned by "optim()" function (see "optim()" documentation for details) along with
the conditional expectation E[u \vert y]
and variance V[u \vert y]
of the latent states u given the observed states y from the last step of the expectation-maximization algorithm.
Base and random-effects model statistics
Description
Main statistics from the fitted base and random-effects models
Usage
rndEffModelStats(
theta_null,
theta_rndEff,
V,
M,
M_bdiag,
y,
VCNs,
nObs,
verbose = TRUE
)
Arguments
theta_null |
the estimated p-dimensional vector parameter for the base (null) model. |
theta_rndEff |
the estimated p-dimensional vector parameter for the random-effects model. |
M |
A |
M_bdiag |
A |
y |
n-dimensional vector of the time-adjacent cellular increments |
VCNs |
A n-dimensional vector including values of the vector copy number corresponding to the cell counts of y. |
nObs |
A K-dimensional vector including the frequencies of each clone k ( |
verbose |
(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the algorithm are printed to the console. |
Value
A vector of statistics associated to the fitted base and random-effects models:
nPar | number of parameters of the base(null) model |
cll | value of the conditional log-likelihood, in this case just the log-likelihood |
mll | value of the marginal log-likelihood, in this case just the log-likelihood |
cAIC | conditional Akaike Information Criterion (cAIC), in this case simply the AIC. |
mAIC | marginal Akaike Information Criterion (mAIC), in this case simply the AIC. |
Chi2 | value of the \chi^2 statistic (y - M\theta)'S^{-1}(y - M\theta) . |
p-value | p-value of the \chi^2 test for the null hypothesis that Chi2 follows a \chi^2 distribution with n - nPar degrees of freedom. |
KLdiv | Kullback-Leibler divergence of the random-effects model from the null model. |
KLdiv/N | Rescaled Kullback-Leibler divergence of the random-effects model from the null model. |
BhattDist_nullCond | Bhattacharyya distance between the random-effects model and the null model. |
BhattDist_nullCond/N | Rescaled Bhattacharyya distance between the random-effects model and the null model. |
Matrix trace
Description
Trace of a matrix
Usage
tr(M)
Arguments
M |
a matrix. |
Value
the trace of M.