Type: | Package |
Title: | Simulation and Comparison of Dissolution Profiles |
Version: | 0.4.1 |
Date: | 2021-08-19 |
Description: | Compare dissolution profiles with confidence interval of similarity factor f2 using bootstrap methodology as described in the literature, such as Efron and Tibshirani (1993, ISBN:9780412042317), Davison and Hinkley (1997, ISBN:9780521573917), and Shah et al. (1998) <doi:10.1023/A:1011976615750>. The package can also be used to simulate dissolution profiles based on mathematical modelling and multivariate normal distribution. |
License: | GPL (≥ 3) |
URL: | https://github.com/zhengguoxu/bootf2 |
BugReports: | https://github.com/zhengguoxu/bootf2/issues |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
Depends: | R (≥ 4.0.0) |
Imports: | ggplot2, minpack.lm, MASS, readxl, stats |
Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2021-08-19 15:04:06 UTC; zhengguo |
Author: | Zhengguo Xu [aut, cre] |
Maintainer: | Zhengguo Xu <zhengguoxu@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-08-25 12:50:22 UTC |
Estimate 90% Confidence Intervals of f_2
with Bootstrap Methodology
Description
Main function to estimate 90% confidence intervals of f_2
using
bootstrap methodology.
Usage
bootf2(test, ref, path.in, file.in, path.out, file.out,
n.boots = 10000L, seed = 306L, digits = 2L, alpha = 0.05,
regulation = c("EMA", "FDA", "WHO","Canada", "ANVISA"),
min.points = 1L, both.TR.85 = FALSE, print.report = TRUE,
report.style = c("concise", "intermediate", "detailed"),
f2.type = c("all", "est.f2", "exp.f2", "bc.f2",
"vc.exp.f2", "vc.bc.f2"),
ci.type = c("all", "normal", "basic", "percentile",
"bca.jackknife", "bca.boot"),
quantile.type = c("all", as.character(1:9), "boot"),
jackknife.type = c("all", "nt+nr", "nt*nr", "nt=nr"),
time.unit = c("min", "h"), output.to.screen = FALSE,
sim.data.out = FALSE)
Arguments
test , ref |
Data frames of dissolution profiles of test and reference
product if |
path.in , file.in , path.out , file.out |
Character strings of input and output directories and file names. See Input/Output in Details. |
n.boots |
An integer indicating the number of bootstrap samples. |
seed |
Integer seed value for reproducibility. If missing, a random seed will be generated for reproducibility purpose. |
digits |
An integer indicating the decimal points for the output. |
alpha |
A numeric value between 0 and 1 to estimate
|
regulation |
Character strings indicating regulatory guidelines.
@seealso |
min.points |
An integer indicating the minimum time points to be used
to calculate |
both.TR.85 |
Logical. If |
print.report |
Logical. If |
report.style |
|
f2.type |
Character strings indicating which type of |
ci.type |
Character strings indicating which type of confidence interval should be estimated. See Types of confidence intervals in Details. |
quantile.type |
Character strings indicating the type of percentile. |
jackknife.type |
Character strings indicating the type of jackknife method. See Details. |
time.unit |
Character strings indicating the unit of time. It should
be either |
output.to.screen |
Logical. If |
sim.data.out |
Logical. If |
Details
Minimum required arguments that must be provided by the user
Arguments test
and ref
must be provided by the user. They should be R
data frames
, with time as the first column, and all individual profiles
profiles as the rest columns. The actual names of the columns do not matter
since they will be renamed internally.
Input/Output
The dissolution data of test and reference product can either be provided as
data frames for test
and ref
, as explained above, or be read from an
Excel file with data of test and reference stored in separate worksheets.
In the latter case, the argument path.in
, the directory where the Excel
file is, and file.in
, the name of the Excel file including the file
extension .xls
or .xlsx
, must be provided. In such case, the argument
test
and ref
must be the names of the worksheets in quotation marks.
The first column of each Excel worksheet must be time, and the rest columns
are individual dissolution profiles. The first row should be column names,
such as time, unit01, unit02, ... The actual names of the columns do not
matter as they will be renamed internally.
Arguments path.out
and file.out
are the names of the output directory
and file. If they are not provided, but argument print.report
is TRUE
,
then a plain text report will be generated automatically in the current
working directory with file name test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt
,
where test
and ref
are data set names of test and reference, TZ
is the
time zone such as CEST
, YYYY-MM-DD
is the numeric date format and
HHMMSS
is the numeric time format for hour, minute, and second.
For a quick check, set argument output.to.screen = TRUE
, a summary report
very similar to concise
style report will be printed on screen.
Types of Estimators
According to Shah et al, the population f_2
for the inference is
f_2 = 100-25\log\left(1 + \frac{1}{P}\sum_{i=1}^P%
\left(\mu_{\mathrm{T},i} - \mu_{\mathrm{R},i}\right)^2 \right)\,,
where P
is the number of time points; \mu_{\mathrm{T},i}
and \mu_{\mathrm{R},i}
are population mean of test and
reference product at time point i
, respectively; \sum_{i=1}^P
is the summation from i = 1
to P
.
Five estimators for f_2
are included in the function:
The estimated
f_2
, denoted by\hat{f}_2
, is the one written in various regulatory guidelines. It is expressed differently, but mathematically equivalently, as\hat{f}_2 = 100-25\log\left(1 + \frac{1}{P}\sum_{i=1}^P\left(% \bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 \right)\:,
where
P
is the number of time points;\bar{X}_{\mathrm{T},i}
and\bar{X}_{\mathrm{R},i}
are mean dissolution data at thei
th time point of random samples chosen from the test and the reference population, respectively. Compared to the equation of populationf_2
above, the only difference is that in the equation of\hat{f}_2
the sample means of dissolution profiles replace the population means for the approximation. In other words, a point estimate is used for the statistical inference in practice.The Bias-corrected
f_2
, denoted by\hat{f}_{2,\mathrm{bc}}
, was described in the article of Shah et al, as\hat{f}_{2,\mathrm{bc}} = 100-25\log\left(1 + \frac{1}{P}% \left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - % \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P% \left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,,
where
S_{\mathrm{T},i}^2
andS_{\mathrm{R},i}^2
are unbiased estimates of variance at thei
th time points of random samples chosen from test and reference population, respectively; andn
is the sample size.The variance- and bias-corrected
f_2
, denoted by\hat{f}_{2,\mathrm{vcbc}}
, does not assume equal weight of variance as\hat{f}_{2,\mathrm{bc}}
does.\hat{f}_{2, \mathrm{vcbc}} = 100-25\log\left(1 +% \frac{1}{P}\left(\sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} -% \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P% \left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 +% w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,,
where
w_{\mathrm{T},i}
andw_{\mathrm{R},i}
are weighting factors for variance of test and reference products, respectively, which can be calculated as follows:w_{\mathrm{T},i} = 0.5 + \frac{S_{\mathrm{T},i}^2}% {S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,,
and
w_{\mathrm{R},i} = 0.5 + \frac{S_{\mathrm{R},i}^2}% {S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,.
The expected
f_2
, denoted by\hat{f}_{2, \mathrm{exp}}
, is calculated based on the mathematical expectation of estimatedf_2
,\hat{f}_{2, \mathrm{exp}} = 100-25\log\left(1 + \frac{1}{P}% \left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - % \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P% \left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,,
using mean dissolution profiles and variance from samples for the approximation of population values.
The variance-corrected expected
f_2
, denoted by\hat{f}_{2, \mathrm{vcexp}}
, is calculated as\hat{f}_{2, \mathrm{vcexp}} = 100-25\log\left(1 +% \frac{1}{P}\left(\sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} -% \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P% \left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 +% w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,.
Types of Confidence Interval
The following confidence intervals are included:
The Normal interval with bias correction, denoted by
normal
in the function, is estimated according to Davison and Hinkley,\hat{f}_{2, \mathrm{L,U}} = \hat{f}_{2, \mathrm{S}} - E_B \mp % \sqrt{V_B}\cdot Z_{1-\alpha}\,,
where
\hat{f}_{2, \mathrm{L,U}}
are the lower and upper limit of the confidence interval estimated from bootstrap samples;\hat{f}_{2, \mathrm{S}}
denotes the estimators described above;Z_{1-\alpha}
represents the inverse of standard normal cumulative distribution function with type I error\alpha
;E_B
andV_B
are the resampling estimates of bias and variance calculated asE_B = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_{2,b}^\star - % \hat{f}_{2, \mathrm{S}} = \bar{f}_2^\star - \hat{f}_{2,\mathrm{S}}\,,
and
V_B = \frac{1}{B-1}\sum_{b=1}^{B} \left(\hat{f}_{2,b}^\star -\bar{f}_2^\star\right)^2\,,
where
B
is the number of bootstrap samples;\hat{f}_{2,b}^\star
is thef_2
estimate withb
th bootstrap sample, and\bar{f}_2^\star
is the mean value.The basic interval, denoted by
basic
in the function, is estimated according to Davison and Hinkley,\hat{f}_{2, \mathrm{L}} = 2\hat{f}_{2, \mathrm{S}} -% \hat{f}_{2,(B+1)(1-\alpha)}^\star\,,
and
\hat{f}_{2, \mathrm{U}} = 2\hat{f}_{2, \mathrm{S}} -% \hat{f}_{2,(B+1)\alpha}^\star\,,
where
\hat{f}_{2,(B+1)\alpha}^\star
and\hat{f}_{2,(B+1)(1-\alpha)}^\star
are the(B+1)\alpha
th and(B+1)(1-\alpha)
th ordered resampling estimates off_2
, respectively. When(B+1)\alpha
is not an integer, the following equation is used for interpolation,\hat{f}_{2,(B+1)\alpha}^\star = \hat{f}_{2,k}^\star + % \frac{\Phi^{-1}\left(\alpha\right)-\Phi^{-1}\left(\frac{k}{B+1}\right)}% {\Phi^{-1}\left(\frac{k+1}{B+1}\right)-\Phi^{-1}% \left(\frac{k}{B+1}\right)}\left(\hat{f}_{2,k+1}^\star-% \hat{f}_{2,k}^\star\right),
where
k
is the integer part of(B+1)\alpha
,\hat{f}_{2,k+1}^\star
and\hat{f}_{2,k}^\star
are the(k+1)
th and thek
th ordered resampling estimates off_2
, respectively.The percentile intervals, denoted by
percentile
in the function, are estimated using nine different types of quantiles, Type 1 to Type 9, as summarized in Hyndman and Fan's article and implemented inR
'squantile
function. UsingR
'sboot
package, programbootf2BCA
outputs a percentile interval using the equation above for interpolation. To be able to compare the results among different programs, the same interval, denoted byPercentile (boot)
in the function, is also included in the function.The bias-corrected and accelerated (BCa) intervals are estimated according to Efron and Tibshirani,
\hat{f}_{2, \mathrm{L}} = \hat{f}_{2, \alpha_1}^\star\,,
\hat{f}_{2, \mathrm{U}} = \hat{f}_{2, \alpha_2}^\star\,,
where
\hat{f}_{2, \alpha_1}^\star
and\hat{f}_{2, \alpha_2}^\star
are the100\alpha_1
th and the100\alpha_2
th percentile of the resampling estimates off_2
, respectively. Type I errors\alpha_1
and\alpha_2
are obtained as\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_\alpha}% {1-\hat{a}\left(\hat{z}_0 + \hat{z}_\alpha\right)}\right),
and
\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + % \hat{z}_{1-\alpha}}{1-\hat{a}\left(\hat{z}_0 + % \hat{z}_{1-\alpha}\right)}\right),
where
\hat{z}_0
and\hat{a}
are called bias-correction and acceleration, respectively.There are different methods to estimate
\hat{z}_0
and\hat{a}
. Shah et al. used jackknife technique, denoted bybca.jackknife
in the function,\hat{z}_0 = \Phi^{-1}\left(\frac{N\left\{\hat{f}_{2,b}^\star <% \hat{f}_{2,\mathrm{S}} \right\}}{B}\right),
and
\hat{a} = \frac{\sum_{i=1}^{n}\left(\hat{f}_{2,\mathrm{m}} -% \hat{f}_{2, i}\right)^3}{6\left(\sum_{i=1}^{n}\left(% \hat{f}_{2,\mathrm{m}} - \hat{f}_{2, i}\right)^2\right)^{3/2}}\,,
where
N\left\{\cdot\right\}
denotes the number of element in the set,\hat{f}_{2, i}
is thei
th jackknife statistic,\hat{f}_{2,\mathrm{m}}
is the mean of the jackknife statistics, and\sum
is the summation from 1 to sample sizen
.Program
bootf2BCA
gives a slightly different BCa interval withR
'sboot
package. This approach, denoted bybca.boot
in the function, is also implemented in the function for estimating the interval.
Notes on the argument jackknife.type
For any sample with size n
, the jackknife estimator is obtained by
estimating the parameter for each subsample omitting the i
th
observation. However, when two samples (e.g., test and reference) are
involved, there are several possible ways to do it. Assuming sample size
of test and reference are n_\mathrm{T}
and n_\mathrm{R}
,
the following three possibility are considered:
Estimated by removing one observation from both test and reference samples. In this case, the prerequisite is
n_\mathrm{T} = n_\mathrm{R}
, denoted bynt=nr
in the function. So if there are 12 units in test and reference data sets, there will be 12 jackknife estimators.Estimate the jackknife for test sample while keeping the reference data unchanged; and then estimate jackknife for reference sample while keeping the test sample unchanged. This is denoted by
nt+nr
in the function. This is the default method. So if there are 12 units in test and reference data sets, there will be12 + 12 = 24
jackknife estimators.For each observation deleted from test sample, estimate jackknife for reference sample. This is denoted by
nt*nr
in the function. So if there are 12 units in test and reference data sets, there will be12 \times 12 = 144
jackknife estimators.
Value
A list of 3 or 5 components.
-
boot.ci
: A data frame of bootstrapf_2
confidence intervals. -
boot.f2
: A data frame of all individualf_2
values for all bootstrap data set. -
boot.info
: A data frame with detailed information of bootstrap for reproducibility purpose, such as all arguments used in the function, time points used for calculation off_2
, and the number ofNA
s. -
boot.summary
: A data frame with descriptive statistics of the bootstrapf_2
. -
boot.t
andboot.r
: Lists of individual bootstrap samples for test and reference product ifsim.data.out = TRUE
.
References
Shah, V. P.; Tsong, Y.; Sathe, P.; Liu, J.-P. In Vitro
Dissolution Profile Comparison—Statistics and Analysis of the
Similarity Factor, f_2
. Pharmaceutical Research 1998,
15 (6), 889–896. DOI: 10.1023/A:1011976615750.
Davison, A. C.; Hinkley, D. V. Bootstrap Methods and Their Application. Cambridge University Press, 1997.
Hyndman, R. J.; Fan, Y. Sample Quantiles in Statistical Packages. The American Statistician 1996, 50 (4), 361–365. DOI: /10.1080/00031305.1996.10473566.
Efron, B.; Tibshirani, R. An Introduction to the Bootstrap. Chapman & Hall, 1993.
Examples
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)
# model.par for reference with low variability
par.r <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 14,
tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 8)
# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)
# model.par for test
par.t <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 12,
tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 9)
# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)
# bootstrap. to reduce test run time, n.boots of 100 was used in the example.
# In practice, it is recommended to use n.boots of 5000--10000.
# Set `output.to.screen = TRUE` to view the result on screen
d <- bootf2(dt$sim.disso, dr$sim.disso, n.boots = 100, print.report = FALSE)
Calculate Similarity Factor f_2
Description
Main function to calculate f_2
according to different regulatory
guidelines.
Usage
calcf2(test, ref, path.in, file.in, path.out, file.out,
regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
cv.rule = TRUE, message = FALSE, min.points = 3L,
f2.type = c("est.f2", "exp.f2", "bc.f2", "vc.exp.f2",
"vc.bc.f2", "all"), both.TR.85 = FALSE,
digits = 2L, time.unit = c("min", "h"), plot = TRUE,
plot.start.time = 0, plot.max.unit = 24L)
Arguments
test , ref |
Data frames of dissolution profiles of test and reference
product if |
path.in , file.in , path.out , file.out |
Character strings of input and output directories and file names. See Input/Output in Details. |
regulation |
Character strings indicating regulatory guidelines. See Regulation in Details. |
cv.rule |
Logical. If |
message |
Logical. If |
min.points |
An integer indicating the minimum time points to be used
to calculate |
f2.type |
Character strings indicating which |
both.TR.85 |
Logical. If |
digits |
An integer indicating the decimal points for the output. |
time.unit |
Character strings indicating the unit of time. It should
be either |
plot |
Logical. If |
plot.start.time |
Numeric value indicating the starting time for the plot. |
plot.max.unit |
Integer. If the number of individual units is no more
than this value, the mean and all individual profiles will be plotted;
otherwise, individual profiles will be represented by boxplots at each
time point. Therefore, to avoid overplotting, this value should not be
too large. @seealso |
Details
Minimum required arguments that must be provided by the user
Arguments test
and ref
must be provided by the user. They should be R
data frames
, with time as the first column, and all individual profiles
profiles as the rest columns, or mean profile as the second column if only
mean profile is available. The actual names of the columns do not matter
since they will be renamed internally.
Input/Output
The dissolution data of test and reference product can either be provided as
data frames for test
and ref
, as explained above, or be read from an
Excel file with data of test and reference stored in separate worksheets.
In the latter case, the argument path.in
, the directory where the Excel
file is, and file.in
, the name of the Excel file including the file
extension .xls
or .xlsx
, must be provided. In such case, the argument
test
and ref
must be the names of the worksheets in quotation marks.
The first column of each Excel worksheet must be time, and the rest columns
are individual dissolution profiles, or the second column must be mean
profile if only mean data is available. The first row should be column names,
such as time, unit01, unit02, ... The actual names of the columns do not
matter as they will be renamed internally.
Arguments path.out
and file.out
are the names of the output directory
and file. It is an overkill to output such simple calculations; therefore,
unless these two arguments are specified by the user, results are printed
on screen by default.
Regulation
To apply f_2
method, different regulatory guidelines have slightly
different requirements. Some requirements are almost universal, such as same
time points for the test and reference product, minimum 3 time points
(excluding time zero), and twelve individual profiles for each formulation.
Other requirements are slightly different among different regulatory
guidelines, or at least interpreted differently. Two main issues are the
rules for the variability (CV Rule) and time points where dissolution is more
than 85% (85% Rule).
CV rule
-
EMA
,Canada
, andANVISA
: The CV of the first time point should not be greater than 20%, and the CV of the rest time points should not be greater than 10%. -
WHO
: The CV should not be greater than 20% for time points up to 10 min, and not greater than 10% for the rest time points. -
FDA
: US FDA is more flexible. The CV for the early time points should not be greater than 20%, and for the rest time points, not greater than 10%.
The phrase the first time point in EMA
rule was later interpreted as all
time points up to 10 min, according to an unofficial communication with an
European regulator. This makes the EMA
rule the same as WHO
rule. For
example, if there are 5 min and 10 min time points in the dissolution
profiles, the CV for both 5 min and 10 min should not be greater than 20%.
The first time point in ANVISA
rule corresponds to 40% of the total
collected points. For example, for a dissolution profile with five
collection times, the first two collection times are considered first points.
The phrase early time points in FDA
rule is typically interpreted as
those points up to 15 min, sometimes even up to 20 min according to
an unofficial communication with FDA staff. In the function calcf2()
, the
cutting point for FDA rule is 15 min.
85% Rule
This rule is implemented as follows:
-
EMA
,FDA
,Canada
, andANVISA
: Only one measurement is considered after 85% of dissolution for any product. -
WHO
: Dissolution profiles should be 'cut' at the time point where the reference release more than 85%. Therefore,WHO
rule only differs from rule ofEMA
,FDA
,Canada
, andANVISA
when test product dissolve faster than reference. If reference product dissolve faster, then rules of all five regulatory bodies are same in this regard.
Notes on conventional FDA rule
The exact phrase in the guidance of US FDA regarding this rule is that
"Only one measurement should be considered after 85% dissolution of both
the products." Due to the ambiguous word "both" used in the sentence, the
conventional interpretation was that all measurements up to the time point
at which both test and reference dissolved more than 85% should be included
in the calculation of f_2
. However, this is only true when both
test and reference dissolve more than 85% at the same time points.
Consider the following example:
time | test | reference |
5 | 7 | 10 |
10 | 15 | 20 |
15 | 50 | 55 |
20 | 69 | 86 |
30 | 82 | 90 |
45 | 84 | 95 |
60 | 86 | 97 |
According to conventional interpretation, all measurements up to 60 min
should be included to calculate f_2
because both test and reference
dissolved more than 85% only at 60 min, not at any earlier time point.
However, in such case, there would be 4 measurement of reference (20, 30, 45,
and 60 min) included in the calculation, which would be a direct
contradictory to the phrase "Only one measurement should be considered
after 85% ..." in the same statement in the guidance!
In an unofficial communication using this example, an FDA staff confirmed that only the first 4 time points (up to 20 min) would be used. In other words, FDA rule in this regard is the same as EMA rule.
The statement in ANVISA
guideline also uses the word "ambos" (means both),
which could also lead to the similar confusion. Follow the same logic as
demonstrated above, it should also be interpreted as the same rule in EMA
guideline.
Read vignette Introduction to bootf2 for more details.
Value
A data frame of f_2
type and f_2
value, the
number of time points used for the calculation (f2.tp
), indication if
both test and reference dissolve more than 85% at 15 min (d85at15
), and
other information used for the calculation.
Examples
tp <- c(5, 10, 15, 20, 30, 45, 60)
mod.par.t <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
mdt = 20, mdt.cv = 5, beta = 2.2, beta.cv = 5)
d.t <- sim.dp(tp, model.par = mod.par.t, seed = 100, n.units = 120L,
plot = FALSE)$sim.disso
mod.par.r <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
mdt = 25, mdt.cv = 4, beta = 2.1, beta.cv = 3)
d.r <- sim.dp(tp, model.par = mod.par.r, seed = 100, n.units = 120L,
plot = FALSE)$sim.disso
# set `message = TRUE` to view the compliance of the regulatory guidelines.
calcf2(d.t, d.r, plot = FALSE)
Helper Functions
Description
Helper Functions
Usage
bpwhisker.l(x)
bpwhisker.u(x)
ci.header(boot.info)
mod.ref(tp, ref.dp, digits = 4, model, max.disso, time.unit)
rpt.ci(f2type, btsum, boot.info)
rpt.f2(f2type, f2o, boot.info, a.jack, btsum)
rpt.concise(boot.f2.ci, boot.info, f2o, a.jack, btsum)
rpt.detailed(data.t, data.r, boot.t, boot.r, boot.f2, boot.info, f2o)
rpt.info(boot.info)
rpt.intermediate(boot.info, boot.f2)
rpt.screen(boot.f2.ci, boot.info, f2o, a.jack, btsum)
Arguments
x |
Numeric vector |
boot.info |
A data frame of bootstrap information from |
tp , ref.dp |
Numeric vector of time points |
digits |
An integer indicating the decimal points for the output. |
model |
Strings of model names. Currently only 'Weibull' and 'first-order' models are supported. |
max.disso |
Numeric value indicating the maximum dissolution. |
time.unit |
Character strings indicating the unit of time. It should
be either |
f2type |
Character strings indicating the f2 type. |
btsum |
A data frame of descriptive statistics of the bootstrap data set. |
f2o |
Vector of f2 values calculated with the original data set |
a.jack |
Data frame of acceleration from |
boot.f2.ci |
Matrix of f2 values from bootstrap data sets |
data.t , data.r |
Input data sets for test and reference. |
boot.t , boot.r |
List of bootstrap data sets for test and reference. |
boot.f2 |
Matrix of f2 calculated from bootstrap data sets. |
Dissolution data from the article of Shah et al 1998
Description
Dissolution data of reference and 5 batches of test products published in the article of Shah et al, Pharm. Res.1998, 15(6), 889–896.
Usage
shah1998
Format
A list of 6 data frames. Each data frame is a set of dissolution profiles with the format: the first column is time, the rest columns are individual profiles.
- ref
dissolution of reference batch
- test1
dissolution of test batch 1
- test2
dissolution of test batch 2
- test3
dissolution of test batch 3
- test4
dissolution of test batch 4
- test5
dissolution of test batch 5
Source
Shah VP, Tsong Y, Sathe P, Liu JP. In vitro dissolution profile comparison–statistics and analysis of the similarity factor, f2. Pharm Res. 1998 Jun;15(6):889-96. doi: 10.1023/a:1011976615750.
Simulate Dissolution Profiles
Description
Function to simulate dissolution profiles based on mathematical models or multivariate normal distribution.
Usage
sim.dp(tp, dp, dp.cv, model = c("Weibull", "first-order"),
model.par = NULL, seed = NULL, n.units = 12L, product,
max.disso = 100, ascending = FALSE, message = FALSE,
time.unit = c("min", "h"), plot = TRUE,
plot.max.unit = 36L)
Arguments
tp |
Numeric vector of time points for the dissolution profiles. See Details. |
dp , dp.cv |
Numeric vectors of the target mean dissolution profile
( |
model |
Character strings of model names. Currently only |
model.par |
A list with model parameters. If missing, a list of
random |
seed |
Integer seed value for reproducibility. If missing, a random seed will be generated for reproducibility purpose. |
n.units |
An integer indicating the number of individual profiles to be simulated. |
product |
Character strings indicating the product name of the simulated profiles. If missing, a random name with 3 letters and 4 digits will be generated. |
max.disso |
Numeric value for the maximum possible dissolution. In theory, the maximum dissolution is 100%, but in practice, it is not uncommon to see higher values, such as 105%, or much lower values, especially for products with low solubility. |
ascending |
Logical. If |
message |
Logical. If |
time.unit |
Character strings indicating the unit of time. It should
be either |
plot |
Logical. If |
plot.max.unit |
Integer. If the number of individual units is no more
than this value, the mean and all individual profiles will be plotted;
otherwise, individual profiles will be represented by boxplots at each
time point. Therefore, to avoid overplotting, this value should not be
too large. @seealso |
Details
Simulation approaches
The approach used to simulate individual dissolution profiles depends on if
the target mean dissolution profile dp
is provided by the user or not.
If
dp
is not provided, then it will be calculated usingtp
,model
, andmodel.par
. The parameters defined bymodel.par
are considered as the population parameter; consequently, the calculateddp
is considered as the targeted population profile. In addition,n.units
sets of individual model parameters will be simulated based on exponential error model, and individual dissolution profiles will be generated using those individual parameters. The mean of all simulated individual profiles,sim.mean
, included in one of the outputs data frames,sim.summary
, will be similar, but not identical, todp
. The difference betweensim.mean
anddp
is determined by the number of units and the variability of the model parameters. In general, the larger the number of units and the lower of the variability, the smaller the difference. Additional details on the mathematical models are given below.If
dp
is provided, thenn.units
of individual dissolution profiles will be simulated using multivariate normal distribution. The mean of all simulated individual profiles,sim.mean
, will be identical todp
. In such case, it is recommended thatdp.cv
, the CV at time pointstp
, should also be provided by the user. Ifdp.cv
is not provided, it will be generated automatically such that the CV is between 10% and 20% for time points up to 10 min, between 1% and 3% for time points where the dissolution is more than 95%, between 3% and 5% for time points where the dissolution is between 90% and 95%, and between 5% and 10% for the rest of time points. Whether thedp.cv
is provided or generated automatically, the CV calculated using all individual profiles will be equal todp.cv
. Additional details on this approach are given below.
Minimum required arguments that must be provided by the user
If dp
is provided by the user, logically, tp
must also be provided, and
the approach based on multivariate normal distribution is used, as explained
above. If dp
is not provided, tp
could be missing, i.e., a simple
function call such as sim.dp()
will simulate dissolution profiles. In such
case, a default tp
will be generated depending on the time.unit
: if the
time.unit
is "min"
, then tp
would be c(5, 10, 15, 20, 30, 45, 60)
;
otherwise the tp
would be c(1, 2, 3, 4, 6, 8, 10, 12)
. The rest
arguments are either optional, or required by the function but default
values will be used.
Notes on mathematical models
The first-order model is expressed as
f_t = f_\mathrm{max} \left(1 - %
e^{-k\left(t - t_\mathrm{lag}\right)}\right),
and the Weibull model was expressed either as
f_t = f_\mathrm{max} \left(1 - %
e^{-\left(\frac{t - t_\mathrm{lag}}{\mathrm{MDT}}%
\right)^\beta}\right)
or
f_t = f_\mathrm{max} \left(1 - %
e^{-\frac{(t - t_\mathrm{lag})^\beta}{\alpha}}\right)
where f_\mathrm{max}
is the maximum dissolution,
\mathrm{MDT}
is the mean dissolution time,
t_\mathrm{lag}
is the lag time, \alpha
and
\beta
are the scale and shape factor in Weibull function,
and k
is the rate constant in the first-order model. Obviously,
The two Weibull models are mathematically equivalent by letting
\alpha = \mathrm{MDT}^\beta
.
Individual model parameter were simulated according to the exponential error model
P_i = P_\mu e^{N\left(0, \sigma^2\right)}
where P_i
and P_\mu
denote the individual and
population model parameters; N\left(0, \sigma^2\right)
represents the normal distribution with mean zero and variance \sigma^2
(\sigma = \mathrm{CV}/100
).
How to supply model.par
The argument model.par
should be supplied as a named list of model
parameters as explained above, and their respective CV for simulation of
individual parameters. Therefore, for the first-order model, three pairs of
parameters should be specified: fmax/fmax.cv
, k/k.cv
, and tlag/tlag.cv
;
and for Weibull model, four pairs: fmax/fmax.cv
, tlag/tlag.cv
,
beta/beta.cv
, and either alpha/alpha.cv
or mdt/mdt.cv
, depending on
the mathematical formula used. CV should be given in percentage, e.g., if
CV for beta
is 30%, it should be specified as beta.cv = 30
, not
beta.cv = 0.3
. The order of the parameters does not matter, but the name
of the parameters must be given exactly same as described above.
For example:
-
model.par = list(fmax = 100, fmax.cv = 5, k = 0.6, k.cv = 25, tlag = 0, tlag.cv = 0)
for the first-order model. -
model.par = list(fmax = 100, fmax.cv = 5, tlag = 5, tlag.cv = 10, mdt = 15, mdt.cv = 20, beta = 1.5, beta.cv = 5)
, or -
model.par = list(fmax = 100, fmax.cv = 5, tlag = 5, tlag.cv = 10, alpha = 60, alpha.cv = 20, beta = 1.5, beta.cv = 5)
for Weibull models.
Notes on multivariate normal distribution approach
When this approach is used, depending on dp/dp.cv
, there is no guarantee
that all individual profiles increase with time; near the end of the time
points, some individual profiles may decrease, especially when the
dissolution is close to the plateau and the variability is high. This can
happen to real life data, especially for those products with drug substances
that are unstable in dissolution medium. To force increase for all profiles,
set ascending = TRUE
. Depending on the data, the program may take long
time to run, or may even fail.
Value
A list of 3 to 5 components:
-
sim.summary
: A data frame with summary statistics of all individual dissolution profiles. -
sim.disso
: A data frame with all individual dissolution profiles. -
sim.info
: A data frame with information of the simulation such as the seed number and number of individual profiles. If modelling approach is used, the data frame will contain model parameters as well. -
model.par.ind
: A data frame of all individual model parameters that were used for the simulation of individual dissolution profiles. Available only if the modelling approach is used, i.e., whendp
is missing. -
sim.dp
: A plot. Available only if the argumentplot
isTRUE
.
Examples
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)
# using all default values to simulate profiles
d1 <- sim.dp(tp, plot = FALSE)
# summary statistics
d1$sim.summary
# individual profiles
d1$sim.disso
# simulation information
d1$sim.info
#individual model parameters
d1$mod.par.ind
# using Weibull model to simulate 100 profiles without lag time
mod.par <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
mdt = 20, mdt.cv = 5, beta = 2.2, beta.cv = 5)
d2 <- sim.dp(tp, model.par = mod.par, seed = 100, n.units = 100L,
plot = FALSE)
# using multivariate normal distribution approach
# target mean profile with same length as tp
dp <- c(39, 56, 67, 74, 83, 90, 94)
# CV% at each time point
dp.cv <- c(19, 15, 10, 8, 8, 5, 3)
# simulation
d3 <- sim.dp(tp, dp = dp, dp.cv = dp.cv, seed = 1234, plot = FALSE)
Simulate Dissolution Profiles by f_2
Values
Description
Given any mean dissolution profile dp
, this function will simulate a mean
dissolution profile such that when the newly simulated profile is compared
to dp
, the calculated f_2
will be equal to the predefined target
f_2
value.
Usage
sim.dp.byf2(tp, dp, target.f2, seed = NULL, min.points = 3L,
regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
model = c("Weibull", "first-order"), digits = 2L,
max.disso = 100, message = FALSE, both.TR.85 = FALSE,
time.unit = c("min", "h"), plot = TRUE, sim.dp.out,
sim.target = c("ref.pop", "ref.median", "ref.mean"),
model.par.cv = 50, fix.fmax.cv = 0, random.factor = 3)
Arguments
tp , dp |
Numeric vector of time points |
target.f2 |
Numeric value of target |
seed |
Integer seed value for reproducibility. If missing, a random seed will be generated for reproducibility purpose. |
min.points |
An integer indicating the minimum time points to be used
to calculate |
regulation |
Character strings indicating regulatory guidelines.
@seealso |
model |
Character strings of model names. Currently only |
digits |
An integer indicating the decimal points for the output. |
max.disso |
Numeric value for the maximum possible dissolution. In theory, the maximum dissolution is 100%, but in practice, it is not uncommon to see higher values, such as 105%, or much lower values, especially for products with low solubility. |
message |
Logical. If |
both.TR.85 |
Logical. If |
time.unit |
Character strings indicating the unit of time. It should
be either |
plot |
Logical. If |
sim.dp.out |
Output of function |
sim.target |
Character strings indicating to which target profile
should the newly simulated profile be compared for the calculation of
|
model.par.cv , fix.fmax.cv |
Numeric values expressed as percentages
used for random generation of model parameters and fmax when optimization
algorithm is not used, i.e., when |
random.factor |
Numeric value used for random generation of model
parameters when optimization algorithm is used, i.e., when |
Details
The main principle of the function is as follows:
For any given mean dissolution profile
dp
, fit a suitable mathematical model and obtain model parameters.No precise fitting is required since those parameters will be served as initial value for further model fitting.
If
sim.dp.out
, the output of the functionsim.dp()
, is available, no initial fitting is necessary as model parameters can be read directly from the output, unless multivariate normal distribution approach was used insim.dp()
. In such case, initial model fitting will be done.
Find a suitable model parameters and simulate a new dissolution profile, comparing the new profile to the provided reference profile
dp
by calculatingf_2
. If the the obtainedf_2
is equal to, or within the lower and upper limit of, thetarget.f2
, then the function will output the obtained model parameters and the new profile.
There are two approaches used to find the suitable model parameters:
If
target.f2
is a single value, optimization algorithm will be used and the newly simulated dissolution profile will havef_2
equal totarget.f2
when compared todp
(within numeric precision defined by the tolerance).If
target.f2
is a vector of two numbers representing the lower and upper limit of targetf_2
value, such astarget.f2 = c(lower, upper)
, then dissolution will be obtained by random searching and the calculatedf_2
will be within the range of lower and upper.
For example, you can set target.f2 = c(54.95, 55.04)
to have target
f_2
of 55. Since f_2
should be normally reported without
decimal, in practice, this precision is enough. You might be able to do with
c(54.99, 55.01)
if you really need more precision. However, the narrower
the range, the longer it takes the function to run. With narrow range such as
c(54.999, 55.001)
, it is likely the program will fail. In such case,
provide single value to use optimization algorithm.
Arguments model.par.cv
, fix.fmax.cv
, and random.factor
are certain
numeric values used to better control the random generation of model
parameters. The default values should work in most scenarios. Those values
should be changed only when the function failed to return any value. Read
vignette of the function (vignette("sim.dp.byf2", package = "bootf2")
)
for more details.
The data frame sim.summary
in sim.dp.out
, the output of function
sim.dp()
, contains dp
, the population profile, and sim.mean
and
sim.median
, the mean and median profiles calculated with n.units
of
simulated individual profiles. All these three profiles could be used as the
target profile that the newly simulated profile will be compare to. Argument
sim.target
defines which of the three will be used: ref.pop
, ref.mean
,
and ref.median
correspond to dp
, sim.mean
and sim.median
,
respectively.
Value
A list of 2 components: a data frame of model parameters and a
data frame of mean dissolution profile generated using the said model
parameters. The output can be passed to function sim.dp()
to further
simulate multiple individual profiles.
Examples
tp <- c(5, 10, 15, 20, 30, 45, 60)
mod.par.r <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
mdt = 25, mdt.cv = 4, beta = 2.1, beta.cv = 3)
d.r <- sim.dp(tp, model.par = mod.par.r, seed = 100, n.units = 120L,
plot = FALSE)
model.par1 <- sim.dp.byf2(sim.dp.out = d.r, target.f2 = 60, seed = 123)
model.par2 <- sim.dp.byf2(sim.dp.out = d.r, target.f2 = c(59.95, 60.04),
seed = 123)