Version: | 0.0-22 |
Date: | 2021-08-19 |
Title: | Discretised Beta Distribution |
Author: | Rolf Turner <r.turner@auckland.ac.nz> |
Maintainer: | Rolf Turner <r.turner@auckland.ac.nz> |
Description: | Tools for working with a new versatile discrete distribution, the db ("discretised Beta") distribution. This package provides density (probability), distribution, inverse distribution (quantile) and random data generation functions for the db family. It provides functions to effect conveniently maximum likelihood estimation of parameters, and a variety of useful plotting functions. It provides goodness of fit tests and functions to calculate the Fisher information, different estimates of the hessian of the log likelihood and Monte Carlo estimation of the covariance matrix of the maximum likelihood parameter estimates. In addition it provides analogous tools for working with the beta-binomial distribution which has been proposed as a competitor to the db distribution. |
Depends: | R (≥ 3.2.2) |
Suggests: | hmm.discnp, MASS, rmutil, spcadjust |
LazyData: | true |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
NeedsCompilation: | no |
Packaged: | 2021-08-19 05:10:08 UTC; rolf |
Repository: | CRAN |
Date/Publication: | 2021-08-19 13:40:05 UTC |
Analytic hessian.
Description
Compute the hessian of the negative log likelihood of a db or beta binomial distribution from an analytic expression for this quantity.
Usage
aHess(object,x)
Arguments
object |
An object of class |
x |
A numeric vector of observations appropriate for the model that
was fitted to produce |
Details
This function is essentially the same as the finfo()
functions and differs from it only in that it is designed to
act up "mleDb"
or "mleBb"
objects, from which
(estimates of) the relevant parameters are extracted.
Value
A two-by-two positive definite (with any luck!) numeric matrix. Its inverse is an estimate of the covariance matrix of the parameter estimates.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
x <- X[[19]]$y
fit <- mleDb(x, ntop=5)
H <- aHess(fit)
print(solve(H)) # Equal to ...
print(vcov(fit))
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
fit <- mleBb(top1e,size=10)
H <- aHess(fit,x=top1e)
print(solve(H)) # Equal to ...
print(vcov(fit))
The db (“discretised Beta”) distribution.
Description
Density, distribution function, quantile function and random generation for the db
distribution with parameters alpha
, beta
and ntop
.
Usage
ddb(x, alpha, beta, ntop, zeta=FALSE, log=FALSE)
pdb(x, alpha, beta, ntop, zeta=FALSE)
qdb(p, alpha, beta, ntop, zeta=FALSE)
rdb(n, alpha, beta, ntop, zeta=FALSE)
Arguments
x |
Numeric vector of values at which the “density” (probability
mass function) |
alpha |
Positive scalar. The first “shape” parameter of the db distribution. |
beta |
Positive scalar. The second “shape” parameter of the db distribution. |
ntop |
Integer scalar, strictly greater than 1. The maximum possible value of the db distribution. |
zeta |
Logical scalar. Should zero origin indexing be used?
I.e. should the range of values of the distribution be taken to
be |
log |
Logical scalar. Should logs of the probabilities calculated by
|
p |
Vector of probablilities (i.e. values between 0 and 1). The
corresponding quantiles of the db distribution are calculated
by |
n |
Integer scalar. An independent sample of size |
Details
In the predecessor of this package (hse
versions 0.1-15
and earlier), the probability function of the distribution
was calculated as dbeta(x/(ntop+1),alpha,beta)/
sum(dbeta((nbot:ntop)/(ntop+k),alpha,beta))
where nbot
and k
were set to 1 if zeta
was FALSE
,
and nbot
was set to 0 and k
to 2 if zeta
was TRUE
.
However the probability function is calculated in a more
“direct” manner, using an exponential family representation
of this function. The Beta
distribution is no longer called
upon (although it still of course conceptually underlies the
distribution).
The function ddb()
is a probability mass function for
an ad hoc finite discrete distribution of ordered values,
with a “reasonably flexible” shape.
The p
th quantile of a random variable X
is defined to be
the infimum over the range of X
of those values of x
such that F(x) \geq p
where F(x)
is the cumulative
distribution function for X
. Note that if we did not impose the
“over the range of X
” restriction, then the 0th quantile of
e.g. an exponential distribution would be -\infty
(since F(x) \geq 0
for all x
) whereas we
actually want this quantile to be 0.
Consequently qdb(p,alpha,beta,ntop)
is equal to the
least value of i
such that pdb(i,alpha,beta,ntop)
\geq
p
. The set of values of i
to be
considered is {1,2,...,ntop}
if zeta
is
FALSE
and is {0,1,2,...,ntop}
if zeta
is TRUE
.
Value
For
ddb()
andpdb()
vectors of probabilities.For
qdb()
a vector of quantiles.For
rdb()
a vector of lengthn
, of integers betweennbot
andntop
, independently sampled from the db distribution, wherenbot
is 1 ifzeta
isFALSE
and is 0 ifzeta
isTRUE
.
Note
In the predecessor of this package (hse
, versions 0.1-14
and earlier) the density/probability function threw an error if
any values of argument i
were not in the set of integers
nbot:ntop
. In accordance with a suggestion from Duncan
Murdoch this behaviour was changed so that the density/probability
function returns 0 for such values. It also issues a warning
if any of the values are non-integer. The criterion used
for “non-integer” is that abs(i-round(i)) >
sqrt(.Machine$double.eps)
. The new behaviour is analogous to
that of other probability functions used in R, dbinom()
in particular.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
parz <- list(c(0.5,0.5),c(5,1),c(1,3),c(2,2),c(2,5))
for(i in 1:5) {
p1 <- ddb(1:15,parz[[i]][1],parz[[i]][2],15)
names(p1) <- 1:15
eckslab <- paste0("alpha=",parz[[i]][1]," beta=",parz[[i]][2])
barplot(p1,xlab=eckslab,main="db probabilities",
space=1.5,col="black")
abline(h=0)
if(i < 5) readline("Go? ")
}
x <- c(-1.5,-1,-0.5,0,0.5,1,1.5)
ddb(x,2.5,1,5,TRUE) # Produces 0 for all but the 4th and 6th
# entries of x, and issues a warning.
Internal Verdis functions.
Description
Internal Verdis functions.
Usage
grad(x, distr=c("db","betabinom"),gpar)
gradDb(x,gpar)
gradBb(x,gpar)
hess(x, distr=c("db","betabinom"),hpar)
hessDb(hpar)
hessBb(x,hpar)
mcCovMatEngine(fitz,par0,seed)
meDb(x, ntop)
meBb(x, size, warn=FALSE)
Details
These functions are auxiliary and are not intended to be called by the user.
Set or query the value of the "maxitErrorOrWarn"
option.
Description
Chooses (set.eow()
) or queries (get.eow()
), the
reaction to maxit
being exceeded in mleDb()
or mleBb()
. The possible reactions are to throw
an error or to issue a warning. The choice is effected
by calling set.eow()
which sets the value of
options()[["maxitErrorOrWarning"]]
. The current choice
is revealed by get.eow()
. This choice is set equal to
"error"
at startup.
Usage
set.eow(eow = c("error", "warn"))
get.eow()
Arguments
eow |
Character string that specifies the reaction to |
Value
No value is returned by set.eow()
. the value of
"maxitErrorOrWarn"
in options()
.
The function get.eow()
returns the current value of
options[["maxitErrorOrWarn"]]
.
Note
It seems unlikely that you would want to change the option from the value that is set at startup. This function is provided “just in case”.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
get.eow() # Is "error" at startup.
set.eow("w") # Changes the option from "error" to "warning".
set.eow("e") # Changes it back again.
Exact moment estimates for the db distribution.
Description
Attempts to calculate “exact” moment estimates of the
parameters of a db distribution. This is done by minimising
the sum of squared differences between the sample mean and variance
(xbar
and s2
) and the theoretical mean and variance.
Calls upon optim()
with the "BFGS"
method.
Usage
exactMeDb(x, ntop, zeta=FALSE, par0 = NULL, maxit = 1000)
Arguments
x |
A random sample from the db distribution whose parameters are being estimated. Missing values are allowed. |
ntop |
The |
zeta |
See |
par0 |
Optional starting values for the iterative estimation procedure.
A vector with entries |
maxit |
Integer scalar. The maximum number of iterations to be undertaken
by |
Details
This function is really an “intellectual curiosity”. The
results produced may be compared with those produced via maximum
likelihood (using mleDb()
) which in theory should
be “better”. Since numerical optimisation has to be applied
to calculate the “exact” moment estimates, there is no
real saving in terms of computation cost.
Value
An object of class "exactMeDb"
. Such an object consists
of a named vector with entries "alpha"
and "beta"
,
which are the “exact” moment estimates of the corresponding
parameters. It has a number of attributes:
-
"ntop"
The value of thentop
argument. -
"zeta"
The value of thezeta
argument. -
"minSqDiff"
The (minimised) value of the sum of the squared differences between the sample mean and variance (xbar
ands2
) and the theoretical mean and variance. Ideally this minimised value should be zero. -
ndata
The number of non-missing values in the data set for which the likelihood was maximised, i.e.sum(!is.na(x))
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
ddb
meDb()
mleDb()
expValDb()
varDb()
optim()
Examples
set.seed(42)
x <- rdb(500,3,5,2)
eMom <- exactMeDb(x,ntop=2,zeta=FALSE)
eMle <- mleDb(x,ntop=2)
# Get much better results using true parameter values
# as starting values; pity we can't do this in real life!
eMom <- exactMeDb(x,ntop=2,zeta=FALSE,par0=c(alpha=3,beta=5))
eMle <- mleDb(x,2,par0=c(alpha=3,beta=5))
# Larger ntop value
x <- rdb(500,3,5,20)
eMom <- exactMeDb(x,ntop=20,zeta=FALSE)
eMle <- mleDb(x,ntop=20)
# Binomial, n = 10, p = 0.3.
set.seed(42)
x <- rbinom(1000,10,0.3)
eMom <- exactMeDb(x,ntop=10,zeta=TRUE)
eMle <- mleDb(x,ntop=10,zeta=TRUE)
p1 <- dbinom(0:10,10,0.3)
p2 <- dbinom(0:10,10,mean(x)/10)
p3 <- table(factor(x,levels=0:10))/1000
p4 <- ddb(0:10,alpha=eMom["alpha"],beta=eMom["beta"],ntop=10,zeta=TRUE)
plot(eMle,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,p4)))
lines(0.2+(0:10),p1,col="orange",type="h",ylim=c(0,max(p1,p2)))
lines(0.3+(0:10),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green","black"),
legend=c("dbMle","observed","true binomial","fitted binomial","dbMom"),bty="n")
Expected value of a beta binomial distribution.
Description
Calculate the expected value (theoretical mean) of a random variable having a beta binomial distribution.
Usage
expValBb(mo,...)
## S3 method for class 'mleBb'
expValBb(mo,...)
## Default S3 method:
expValBb(mo, size, ...)
Arguments
mo |
For the |
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
... |
Not used. |
Details
For the "mleBb"
method, the single argument should really
be called (something like) “object
” and for the
default method the first argument should be called m
.
However the argument lists must satisfy the restrictions that
“A method must have all the arguments of the generic,
including ... if the generic does.” and “A method
must have arguments in exactly the same order as the generic.”
For the "mleBb"
method, the values of m
and size
are extracted from the attributes of mo
.
The expected value of a beta binomial distribution is trivial to calculate “by hand”. These functions are provided for convenience and to preserve parallelism with the db distribution.
Value
Numeric scalar equal to the expected value of a beta binomial distributed random variable with the given parameters.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
expValBb(0.3,15)
X <- hmm.discnp::Downloads
fit <- mleBb(X,size=15)
expValBb(fit)
Expected value of a db distribution.
Description
Calculate the expected value (theoretical mean) of a random variable having a db distribution.
Usage
expValDb(ao,...)
## S3 method for class 'mleDb'
expValDb(ao,...)
## Default S3 method:
expValDb(ao, beta, ntop, zeta=FALSE,...)
Arguments
ao |
For the |
beta |
See |
ntop |
See |
zeta |
See |
... |
Not used. |
Details
For the "mleDb"
method, the single argument should really
be called (something like) “object
” and for the
default method the first argument should be called alpha
.
However the argument lists must satisfy the restrictions that
“A method must have all the arguments of the generic,
including ... if the generic does.” and “A method
must have arguments in exactly the same order as the generic.”
For the "mleDb"
method, the values of alpha
,
beta
, ntop
and zeta
(passed to ddb()
)
are extracted from the attributes of ao
.
The expected value of a db distribution is theoretically intractable but is readily calculable numerically as
\sum x \times \Pr(X=x)
.
Value
Numeric scalar equal to the expected value of a db distributed random variable with the given parameters.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
expValDb(3,4,15)
X <- hmm.discnp::Downloads
fit <- mleDb(X,ntop=15,zeta=TRUE)
expValDb(fit)
Fisher information.
Description
Compute the Fisher information for a db distribution or a beta binomial distribution given the parameters of that distribution. In the case of the db distribution a specified number of observations must be supplied. In the case of the beta binomial distribution the actual observations must be supplied. The inverse of the Fisher information is an estimate of the covariance matrix of the parameter estimates.
Usage
finfo(distr=c("db","betabinom"),alpha, beta, ntop, ndata,
zeta = FALSE, x, m, s, size)
Arguments
distr |
Text string specifying which distribution to consisder.
May be abbreviated (e.g. to |
alpha |
See |
beta |
See |
ntop |
See |
ndata |
The number of observations for which the Fisher
information is being determined. Ignored if |
zeta |
See |
x |
A numeric vector of observations appropriate for the model
under consideration. Ignored if |
m |
A numeric scalar, between 0 and 1, which may be interpreted
as the “success” probability. (See the help for
|
s |
Numeric scalar, greater than 0. The overdispersion parameter of
the distribution. (See the help for |
size |
Integer scalar specifying the upper limit of the “support”
of the betabinom distribution under consideration. The support
is the set of integers |
Details
This function differs from aHess()
in that its
arguments are prescribed “individually” rather than being
extracted from an "mleDb"
or "mleBb"
object.
This allows finfo()
to be applied to “true”
parameters (where these are known) rather than estimated ones.
Note that if distr
is "db"
, the number
of observations must be supplied explicitly, whereas for
aHess()
this number is extracted from the object
argument. If distr
is "betabinom"
then a vector
of actual observations must be supplied.
If distr
is "db"
then finfo()
in effect
calculates the expected information, since the information
matrix does not depend on the parameters. This is not the case
if distr
is "betabinom"
. If the parameters
supplied are the maximum likelihood estimates based on the
supplied vector of observations x
, then the value returned
by finfoBb()
is the observed Fisher information.
Value
A two-by-two positive definite (with any luck!) numeric matrix. Its inverse is an estimate of the covariance matrix of the parameter estimates.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
link{aHess}()
link{nHess}()
link{mleDb}()
link{mleBb}()
Examples
print(finfo(alpha=0.6,beta=0.3,ntop=5,ndat=54))
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
x <- X[[19]]$y
fit <- mleDb(x, ntop=5)
alpha <- fit["alpha"]
beta <- fit["beta"]
ntop <- attr(fit,"ntop")
zeta <- attr(fit,"zeta")
ndat <- ndata(fit)
print(finfo(alpha=alpha,beta=beta,ntop=ntop,ndat=ntop,zeta=zeta))
print(aHess(fit)) # Same
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
fit <- mleBb(top1e,size=10)
print(finfo(distr="b",x=top1e,m=fit["m"],s=fit["s"],
size=10)) # Observed Fisher info.
print(aHess(fit,x=top1e)) # Same
Goodness of fit test for db and beta binomial distributions.
Description
Either a chi-squared or a Monte Carlo test of goodness of fit of a db distribution.
Usage
gof(object, obsd, ...)
## S3 method for class 'mleDb'
gof(object,obsd,...,test=TRUE,MC=FALSE,seed=NULL,
nsim=99,maxit=1000,verb=FALSE)
## S3 method for class 'mleBb'
gof(object,obsd,...,test=TRUE,MC=FALSE,seed=NULL,
nsim=99,maxit=1000,verb=FALSE)
Arguments
object |
An object of class |
obsd |
The data to which |
... |
Not used. |
test |
Logical scalar. Should a hypothesis test be carried out? If |
MC |
Logical scalar. Should a Monte Carlo test be used rather than a chi squared test? |
seed |
Integer scalar. The seed for the random number generator used
when |
nsim |
The number of simulated replicates on which the Monte Carlo test is
to be based. Ignored if |
maxit |
Integer scalar. The maximum number of iterations to be undertaken
by |
verb |
Logical scalar. Should rudimentary “progress reports” be
issued during the course of the simulations invoked by the Monte
Carlo test procedure? Ignored if |
Details
The function gof()
is a generic function with two methods,
gof.mleDb()
and gof.mleBb()
.
The test statistic is calculated as
\sum((O-E)^2/E)
where O
means “observed” and E
means “expected”.
If the mean of E
is less than 5 or if any of the entries of E
is less than 1, then the chi squared test is invalid and a warning to this
effect is issued. In this case the expected values are returned as an
attribute of the value returned by gof()
. The foregoing applies
of course only if a chi squared test (as opposed to a Monte Carlo test)
is being used.
The degrees of freedom for the chi squared test are length(E) - 3
.
The value 3 is equal to 2 (for the number of parameters estimated) plus
1 (for the costraint that the probabilities of the values sum to 1).
If it were actually true that, under the null hypothesis, the
observed test statistic and those calculated from simulated
data are exchangeable, the Monte Carlo test would
be exact. However the real data are distributed as
f(x,\theta)
whereas the simulated data
are distributed as f(x,\hat{\theta})
where \hat{\theta}
is the estimate of
\theta
based on the observed data. Consequently the
observed test statistic and simulated test statistics are
“not quite” exchangeable. Nevertheless it appears that
in practice the Monte Carlo test is very close to being exact.
The meaning of “exact” here is that if the null hypothesis
is true then, over the set of instances of collecting the data
and simulating the required replicates, the p
-value
is uniformly distributed on the set \{1/N, 2/N, \ldots,
(N-1)/N, 1\}
where N
is equal to nsim
.
Value
A list with components
stat |
The test statistic. |
pval |
The p-value of the test. |
degFree |
The degrees of freedom of the chi squared test. |
The last component is present only if a chi squared test (rather than a Monte Carlo test) is used.
If a chi squared test is used and turns out to be invalid, then
the returned value has an attribute "expVals"
, consisting
of the (problematic) expected values.
If a Monte Carlo test is used the returned value has an attribute
"seed"
which is equal to the seed
argument or to the
random value selected to replace it if the seed
argument was
not supplied.
Notes
The Monte Carlo p
-value is calculated as
(m+1)/(nsim+1)
where m
is the number of simulated
statistics which greater than or equal to the observed statistic
(computed from the “real” data.
The smallest that the Monte Carlo
p
-value can be is 1/(nsim + 1)
, e.g. 0.01 when
nsim
is 99. For “finer distinctions” you must use
larger values of nsim
, such as 999 or 9999.
The p
-value is random; if you repeat the test (with
the same data) you may well get a different p
-value.
Resist the temptation to repeat the test until you get a
p
-value that you like!!! This invalidates your inference!
Remark on the Examples
In the Examples, db and beta binomial distributions are
fitted to the Parsonnet scores from the cardiacsurgery
data set which comes from the spcadjust
package. It is not
completely clear what the value of ntop
(db distribution)
or size
(beta binomial distribution) should be. The data
are not actually counts, and in particular they are not counts
of successes out of a given number (“size
”) of trials.
In the event I chose to use the value 71, the maximium value of the
Parsonnet scores, for the value of both ntop
and size
.
This was the value chosen for use as size
by Wittenberg
(2021) when he fitted a beta binomial distribution to these data.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
References
Philipp Wittenberg (2021). Modeling the patient mix for risk-adjusted CUSUM charts. To appear in Statistical Methods in Medical Research.
Axel Gandy and Jan Terje Kvaloy (2013). Guaranteed
conditional performance of control charts via bootstrap
methods. Scandinavian Journal of Statistics 40,
pp. 647–668. (Reference for spcadjust
package.)
See Also
mleDb()
Examples
X <- hmm.discnp::Downloads
f <- mleDb(X,15,TRUE)
tst1 <- gof(f,X) # Gives warning that the chi squared test is invalid.
tst2 <- gof(f,X,MC=TRUE,seed=42)
# The p-value is 0.03 so we reject the adequacy of the fit at the 0.05
# significance level. Note that the p-value that we get, when the
# random number generator seed is set equal to 42, is very similar in
# value to the p-value (0.0347) from the "invalid" chi squared test.
#
## Not run: # Takes too long.
if(requireNamespace("spcadjust")) {
data("cardiacsurgery", package = "spcadjust")
xxx <- cardiacsurgery$Parsonnet
fit1 <- mleDb(xxx,ntop=71,zeta=TRUE)
g1 <- gof(fit1,obsd=xxx,MC=TRUE,verb=TRUE,seed=42)
fit2 <- mleBb(xxx,size=71)
g2 <- gof(fit2,obsd=xxx,MC=TRUE,verb=TRUE,seed=17)
}
## End(Not run)
Horse race prediction data.
Description
Counts of correct predictions of the outcomes of 10 harness races made by “experts” and “non-experts”.
Usage
hrsRcePred
Format
A data frame with 30 observations on the following 4 variables.
sbjType
A character vector with entries
"NonXpert"
and"Expert"
, which classifies the “subjects” (the people making the predictions of the race outcomes).subject
An integer vector indexing the subjects. (Not of any real consequence.)
top1
An integer vector giving the counts of correct predictions of the winners of 10 harness races.
top3
An integer vector giving the counts of correct predictions of the top three horses (“win/place/show” in 10 harness races.
Details
In Ceci and Liker (1986) it is stated that subjects were classified as “experts” and “nonexperts” based on their ability to predict post-time odds on the basis of factual information about horses.
It appears that the counts in top1
and top3
pertain
to the same 10 races, but this is not completely clear.
Source
These data are taken from the paper cited in the first of the two given in the References below. They were provided by a generous email correspondent who prefers to remain anonymous.
References
Ceci, S. J. and Liker, J. K. (1986). A day at the races: A study of IQ, expertise, and cognitive complexity. Journal of Experimental Psychology, General 115, pp. 255 – 266.
Ceci, S. J. and Liker, J. K. (1988). Stalking the IQ-expertise relation: When the critics go fishing. Journal of Experimental Psychology, General 117, pp. 96 – 100.
Examples
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
top1n <- X[X$sbjType=="NonXpert","top1"]
top3e <- X[X$sbjType=="Expert","top3"]
top3n <- X[X$sbjType=="NonXpert","top3"]
dbfit1e <- mleDb(top1e,ntop=10,zeta=TRUE)
dbfit1n <- mleDb(top1n,ntop=10,zeta=TRUE)
dbfit3e <- mleDb(top3e,ntop=10,zeta=TRUE)
dbfit3n <- mleDb(top3n,ntop=10,zeta=TRUE)
# Set seeds to get repeatable Monte Carlo p-values.
## Not run: # Takes too long.
print(gof(dbfit1e,obsd=top1e,MC=TRUE,maxit=5000,verb=TRUE,seed=49)$pval) # 0.02
print(gof(dbfit1n,obsd=top1n,MC=TRUE,verb=TRUE,seed=128)$pval) # 0.79
print(gof(dbfit3e,obsd=top3e,MC=TRUE,verb=TRUE,seed=303)$pval) # 0.35
print(gof(dbfit3n,obsd=top3n,MC=TRUE,maxit=3000,verb=TRUE,seed=24)$pval) # 0.40
## End(Not run)
bbfit1e <- mleBb(top1e,size=10)
bbfit1n <- mleBb(top1n,size=10)
bbfit3e <- mleBb(top3e,size=10)
bbfit3n <- mleBb(top3n,size=10)
# Set seeds to get repeatable Monte Carlo p-values.
## Not run: # Takes too long.
print(gof(bbfit1e,obsd=top1e,MC=TRUE,verb=TRUE,seed=792)$pval) # 0.11
print(gof(bbfit1n,obsd=top1n,MC=TRUE,verb=TRUE,seed=48)$pval) # 0.64
print(gof(bbfit3e,obsd=top3e,MC=TRUE,verb=TRUE,seed=969)$pval) # 0.62
print(gof(bbfit3n,obsd=top3n,MC=TRUE,verb=TRUE,seed=834)$pval) # 0.75
## End(Not run)
# Reality check: goodness of fit tests for the fit of just plain *binomial*
# distributions to these data sets yielded Monte Carlo p-values equal to
# 0.22, 0.17, 0.32 and 0.73 respectively. I.e. binomial fits appear to
# work just fine!
Plot the log likelihood surface for the data.
Description
Plot, as a perspective plot or a contour plot, the log likelihood surface for the data set from which parameters are being estimated.
Usage
llPlot(x, distr=c("db","betabinom"),ntop, zeta, size, alim = NULL, blim = NULL,
ngrid = c(100, 100), plotType = c("persp", "contour", "none"),
theta = -30, phi = 40, ...)
Arguments
x |
A vector of numeric data purportedly arising from a db or beta binomial distribution. |
distr |
Character string specifying which of the two relevant distributions (db, or beta binomial) is to be considered. |
ntop |
|
zeta |
|
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
alim |
Numeric vector of length 2; the range of |
blim |
Numeric vector of length 2; the range of |
ngrid |
The dimensions of the grid of paramter values at
which the log likelihood is to be evaluated in order to plot
the surface. Note that |
plotType |
Character string specifying the nature of the plot to be
produced. If it is |
theta |
An argument to be passed to |
phi |
An argument to be passed to |
... |
Other arguments that may be passed to |
Details
This function could conceivably be useful in diagnosing problems with parameter estimation should these arise.
Value
A list with entries
x |
The vector of values of the first parameter ( |
y |
The vector of values of the second parameter ( |
z |
An |
dxy |
A data frame with columns named |
fxy |
A numeric vector of length |
There is obviously considerable redundancy in the returned value.
The names x
and y
that are used for the first two
entries of this list conform to the names of the
arguments of persp()
and contour
.
If plotType
is "persp"
or "contour"
the
value is returned invisibly.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
link{mleDb}()
link{mleBb}()
link{persp}()
link{contour}()
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
x <- X[[19]]$y
srf <- llPlot(x,ntop=5,zeta=FALSE,alim=c(0.5,0.7),blim=c(0.2,0.4),plotType="c")
## Not run:
if(require(rgl)) {
with(srf,plot3d(ab$alpha,ab$beta,fab)
# Allows dynamic rotation of the surface.
}
## End(Not run)
# Negative (!) parameters for the db distribution.
set.seed(42)
xs <- rdb(100,-1,-1,5)
fit <- mleDb(xs,5)
llPlot(xs,ntop=5,zeta=FALSE,alim=c(-4,2),blim=c(-4,2),plotType="c",
main="log likelihood contours")
points(fit[1],fit[2],pch=20,col="red")
points(-1,-1,pch=20,col="blue")
legend("topright",pch=20,col=c("red","blue"),
legend=c("estimate","true value"),bty="n")
Retrieve the (maximised) log likelihood from an "mleDb"
or
an "mleBb"
object.
Description
Extract the log likelihood attribute an object of class
"mleDb"
or "mleBb"
. I.e. obtain the maximum log
likelihood in respect of the estimation of the parameters of a
db or beta-binomial distribution.
Usage
## S3 method for class 'mleDb'
logLik(object, ...)
## S3 method for class 'mleBb'
logLik(object, ...)
Arguments
object |
An object of class |
... |
Not used. |
Value
An object of class "logLik"
, which consists of
a numeric scalar equal to the maximum log likelihood for the parameters
of a db or beta-binomial distribution. It has an attribute "df"
equal to 2.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
fitz <- lapply(X,function(x){mleDb(x$y,ntop=5)})
sapply(fitz,logLik)
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
fit <- mleBb(top1e,10)
logLik(fit)
Create an object of class "Bbdpars"
.
Description
Create an object of class "Bbdpars"
which may be used
as an argument of the simulate()
function.
Usage
makeBbdpars(m, s, size, ndata)
Arguments
m |
Numeric scalar between 0 and 1. May be interpreted as a “success probability”. |
s |
Numeric scalar, greater than 0. The overdispersion parameter
of the beta binomial distribution. Note that if overdispersion
is defined to equal the ratio of the variance of the data to
the corresponding “binomial variance” (i.e.
the actual variance over |
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
ndata |
Integer vector specifying the lengths of the data sets to
be simulated. If it is of length less than the |
Value
An object of class "Bbdpars"
which is a list with components
m
, s
, size
and ndata
. The entries
of this list are simply the corresponding function arguments.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
obj1 <- makeBbdpars(m=0.35,s=0.3,size=20,ndata=500)
obj2 <- makeBbdpars(m=0.85,s=1.7,size=20,ndata=30*(1:10))
## Not run:
sdat1 <- simulate(obj1,nsim=100)
sdat2 <- simulate(obj2,nsim=100)
## End(Not run)
sdat3 <- simulate(obj2,nsim=10)
## Not run:
sdat4 <- simulate(obj2,nsim=100,ndata=100*(2:6)) # The ndata component of
# obj2 is ignored.
## End(Not run)
Create an object of class "Dbdpars"
.
Description
Create an object of class "Dbdpars"
which may be used
as an argument of the simulate()
function.
Usage
makeDbdpars(alpha, beta, ntop, zeta, ndata)
Arguments
alpha |
The first “shape” parameter of the db distribution. |
beta |
The second “shape” parameter of the db distribution. |
ntop |
Integer scalar, strictly greater than 1. The maximum possible value of the db distribution. |
zeta |
Logical scalar. Should zero origin indexing be used?
I.e. should the range of values of the distribution be taken to
be |
ndata |
Integer vector specifying the lengths of the data sets to
be simulated. If it is of length less than the |
Value
An object of class "Dbdpars"
which is a list with
components alpha
, beta
, ntop
, zeta
and ndata
. The entries of this list are simply the
corresponding function arguments.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
obj1 <- makeDbdpars(alpha=2,beta=3,ntop=20,zeta=TRUE,ndata=500)
obj2 <- makeDbdpars(alpha=0.2,beta=0.25,ntop=20,zeta=FALSE,ndata=30*(1:10))
sdat1 <- simulate(obj1,nsim=100)
sdat2 <- simulate(obj2,nsim=100)
sdat3 <- simulate(obj2,nsim=10)
sdat4 <- simulate(obj2,nsim=100,ndata=100*(2:6)) # The ndata component of
# obj2 is ignored.
Monte Carlo estimation of a covariance matrix.
Description
Calculate an estimate of the covariance matrix for the parameter estimates of a db or beta binomial distribution via simulation.
Usage
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
## S3 method for class 'mleDb'
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
## S3 method for class 'mleBb'
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
## S3 method for class 'Dbdpars'
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
## S3 method for class 'Bbdpars'
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
## Default S3 method:
mcCovMat(object, nsim = 100, seed=NULL, maxit=1000)
Arguments
object |
An object of class either |
nsim |
Integer scalar. The number of simulations to be used to produce the Monte Carlo estimate of the covariance matrix. |
seed |
Integer scalar. The seed for the random number generator. If not
specified it is randomly sampled from the sequence |
maxit |
Integer scalar. The maximum number of iterations to be undertaken
by |
Details
The procedure is to simulate nsim
data sets, all of
the same size. This will be the size of the data set to which
object
was fitted), in the case of the "mleDb"
and
"mleBb"
methods, and will be the value of the ndata
argument supplied to the “make
” function in the
case of the "Dbdpars"
and "Bbdpars"
methods. The
simulations are from models determined by the parameter value
contained in object
.
From each such simulated data, parameter estimates are obtained. The covariance matrix of these latter parameter estimates (adjusted for the fact that the true parameters are known in a simulation) is taken to be the required covariance matrix estimated.
The default method simply throws an error.
Value
A two-by-two positive definite (with any luck!) numeric matrix. It is an estimate of the covariance matrix of the parameter estimates.
It has an attribute "seed"
which is the seed that was used
for the random number generator. This is either the value of the
argument seed
or (if this argument was left NULL
) the
value that was randomly sampled from 1:1e5
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
link{aHess}()
link{nHess}()
link{vcov.mleDb}()
link{vcov.mleBb}()
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
x <- X[[19]]$y
fit <- mleDb(x, ntop=5)
set.seed(42)
CM.m <- mcCovMat(fit,nsim=500) # Lots of simulations!
CM.a <- vcov(fit)
CM.n <- solve(nHess(fit,x))
cat("Monte Carlo:\n\n")
print(CM.m)
cat("Analytic:\n\n")
print(CM.a)
cat("Numeric:\n\n")
print(CM.n)
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
fit <- mleBb(top1e,size=10)
CM.m <- mcCovMat(fit,nsim=500) # Lots of simulations!
CM.a <- vcov(fit)
CM.n <- solve(nHess(fit,top1e))
cat("Monte Carlo:\n\n")
print(CM.m)
cat("Analytic:\n\n")
print(CM.a)
cat("Numeric:\n\n")
print(CM.n)
Maximum likelihood estimation of the parameters of a beta binomial distribution.
Description
Calculates maximum likelihood estimates of the m
and
s
parameters of a beta binomial distribution. Calls upon
optim()
with the "L-BFGS-B"
method.
Usage
mleBb(x, size, par0, maxit=1000, covmat=TRUE, useGinv=FALSE)
Arguments
x |
Integer vector of counts to which a beta binomial distribution is to be fitted. Missing values are allowed. (These are discarded before the data are analysed.) |
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
par0 |
Optional starting values for the iterative estimation procedure.
A vector with entries |
maxit |
Integer scalar. The maximum number of iterations to be undertaken
by |
covmat |
Logical scalar. Should the covariance matrix of the parameter
estimates be calculated? In simulation studies, in which the
covariance matrix is not of interest, calculations might be
speeded up a bit by setting |
useGinv |
Logical scalar. Should the |
Details
This function is provided so as to give a convenient means of comparing the fit of a beta binomial distribution with that of the discretised Beta (db) distribution which is the focus of this package.
Value
An object of class "mleBb"
which is a vector of length two.
Its first entry m
is the estimate of the (so-called) success
probability m
; its second entry s
is the estimate of the
overdispersion parameter s
. It has a number of attributes:
-
"size"
The value of thesize
argument. -
"log.like"
The (maximised) value of the log likelihood of the data. -
"covMat"
An estimate of the (2 \times 2
) covariance matrix of the parameter estimates. This is formed as the inverse of the hessian (of the negative log likelihood) calculated byaHess()
. -
ndata
The number of non-missing values in the data set for which the likelihood was maximised, i.e.sum(!is.na(x))
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
References
Bruce Swihart and Jim Lindsey (2020). rmutil: Utilities for Nonlinear Regression and Repeated Measurements Models. R package version 1.1.4. https://CRAN.R-project.org/package=rmutil
Wikipedia, https://en.wikipedia.org/wiki/Beta-binomial_distribution
See Also
mleDb()
optim()
aHess()
vcov.mleBb()
hrsRcePred
visRecog
Examples
if(require(hmm.discnp)) {
X <- hmm.discnp::Downloads
f <- mleBb(X,15)
}
set.seed(42)
X <- c(rbinom(20,10,0.3),rbinom(20,10,0.7))
f <- mleBb(X,10)
g <- mleDb(X,10,TRUE)
print(attr(f,"log.like"))
print(attr(g,"log.like")) # Not much difference.
dbfit5 <- with(visRecog,mleDb(tot5,20,TRUE))
print(vcov(dbfit5))
# See the help for data sets "hrsRcePred" and "visRecog" for
# other examples.
Maximum likelihood estimates of db parameters.
Description
Calculates maximum likelihood estimates of the alpha
and
beta
parameters of a db distribution. Calls upon
optim()
with the "BFGS"
method.
Usage
mleDb(x, ntop, zeta=FALSE, par0=NULL, UB=10, maxit=1000,
covmat=TRUE, useGinv=FALSE)
Arguments
x |
A random sample from the db distribution whose parameters are being estimated. Missing values are allowed. |
ntop |
The |
zeta |
See |
par0 |
Optional starting values for the iterative estimation procedure.
A vector with entries |
UB |
Positive numeric scalar, providing an upper bound on the starting
values used by |
maxit |
Integer scalar. The maximum number of iterations to be undertaken
by |
covmat |
Logical scalar. Should the covariance matrix of the parameter
estimates be calculated? In simulation studies, in which the
covariance matrix is not of interest, calculations might be
speeded up a bit by setting |
useGinv |
Logical scalar. Should the |
Details
The ntop
and zeta
parameters must be supplied; they
are not formally estimated (although the choice of ntop
may be influenced by the data — see below). The parameter
zeta
has a default value, FALSE
.
If the generating mechanism from which the observed data x
arose has a (known) theoretical least upper bound, then ntop
should probably be set equal to this upper bound. If the data
are theoretically unbounded, then ntop
should probably
be set equal to 1 + max(x)
. In this case \Pr(X =
\textrm{ntop})
should probably be interpreted
as \Pr(X \geq \textrm{ntop})
. Otherwise
ntop
should should probably be set equal to max(x)
.
The choice depends on circumstances and is up to the user.
Missing values are removed from x
before it is passed to
optim()
. (Note that ddb()
doesn't mind missing
values but returns missing values when evaluated at them.
This in turn produces a missing value for the log likelihood.)
In previous versions of this package (0.1-17 and earlier)
optim()
was called with method "L-BFGS-B"
.
The change was made possible by the fact that, with the new
“direct” version of ddb()
, it is no longer
necessary to bound the parameters away from (above) zero.
Value
An object of class "mleDb"
. Such an object consists of a
named vector with entries "alpha"
and "beta"
, which
are the estimates of the corresponding parameters. It has a
number of attributes:
-
"ntop"
The value of thentop
argument. -
"zeta"
The value of thezeta
argument. -
"log.like"
The (maximised) value of the log likelihood of the data. -
"covMat"
An estimate of the (2 \times 2
) covariance matrix of the parameter estimates. This is formed as the inverse of the hessian (of the negative log likelihood) calculated byaHess()
. -
ndata
The number of non-missing values in the data set for which the likelihood was maximised, i.e.sum(!is.na(x))
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
ddb
meDb()
optim()
aHess()
vcov.mleDb()
Examples
set.seed(42)
x <- rdb(500,3,5,2)
ests <- mleDb(x,2) # Bad! Mind you, 2 is a "bad" value for ntop!
# Hessian is singular; covMat is NA.
# Get much better results using true parameter values
# as starting values; pity we can't do this in real life!
ests <- mleDb(x,2,par0=c(alpha=3,beta=5))
x <- rdb(500,3,5,20)
ests <- mleDb(x,20) # Pretty good.
print(vcov(ests))
# Binomial, n = 10, p = 0.3.
set.seed(42)
x <- rbinom(1000,10,0.3)
fit <- mleDb(x,10,zeta=TRUE)
print(vcov(fit))
p1 <- dbinom(0:10,10,0.3)
p2 <- dbinom(0:10,10,mean(x)/10)
p3 <- table(factor(x,levels=0:10))/1000
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
ddb(0:10,fit[1],fit[2],10,zeta=TRUE))))
lines(0.2+(0:10),p1,col="orange",type="h",ylim=c(0,max(p1,p2)))
lines(0.3+(0:10),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
legend=c("db","observed","true binomial","fitted binomial"),bty="n")
print(attr(fit,"log.like")) # -1778.36
print(sum(dbinom(x,10,mean(x)/10,log=TRUE))) # -1777.36
# Slightly better fit with only one estimated parameter,
# but then binomial is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dbinom(x,10,0.3,log=TRUE))) # -1778.37
# Poisson mean = 5
set.seed(42)
x <- rpois(1000,5)
fit <- mleDb(x,14,zeta=TRUE) # max(x) = 13, take ntop = 1+13
print(vcov(fit))
p1 <- c(dpois(0:13,5),1-ppois(13,5))
lhat <- mean(x)
p2 <- c(dpois(0:13,lhat),1-ppois(13,lhat))
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
ddb(0:14,fit[1],fit[2],14,zeta=TRUE))))
lines(0.2+0:14,p1,col="orange",type="h")
lines(0.3+(0:14),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
legend=c("db","observed","true Poisson","fitted Poisson"),bty="n")
print(attr(fit,"log.like")) # -2198.594
print(sum(dpois(x,lhat,log=TRUE))) # -2197.345
# Slightly better fit with only one estimated parameter,
# but then Poisson is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dpois(x,5,log=TRUE))) # -2198.089
Numerical hessian calculation.
Description
Calculate an approximation to the hessian of the negative log likelihood of a db or beta binomial distribution via a numerical (finite differencing based) procedure as effected by optimHess().
Usage
nHess(object, x, silent=TRUE)
Arguments
object |
An object of class |
x |
Numeric vector of non-negative integer data, presumably the
data set on the basis of which |
silent |
Logical scalar. If the call to |
Details
It is up to the user to make sure that (when object
is of class "mleBb"
) object
and x
are
“mutually compatible”, i.e. are appropriately paired up.
Note that this function calculates the hessian of the
negative log likelihood of the distribution in question,
as minimised by optim()
. Hence its inverse
is an estimate of the covariance matrix of the parameter estimates.
(Do not take the negative of this hessian before inverting
it to get the desired covariance matrix!)
This function is mainly present to investigate possible differences
between the numerical approximation to the hessian, which is what
optim()
uses in its maximisation procedure, and the analytic
form of the hessian.
Value
A two-by-two positive definite (with any luck!) numeric matrix. Its inverse is an estimate of the covariance matrix of the parameter estimates.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
aHess()
mleDb()
mleBb()
optim()
optimHess()
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
x <- X[[19]]$y
fit <- mleDb(x, ntop=5)
H <- nHess(fit,x)
print(solve(H)) # Compare with ...
print(vcov(fit))
Retrieve the "ndata"
attribute of an "mleDb"
object.
Description
Retrieve the number of (non-missing) values in the data set
to which an "mleDb"
object was fitted.
Usage
ndata(object)
Arguments
object |
An object of class |
Value
Integer scalar equal to the number of (non-missing) values in
the data set to which object
was fitted.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
mleDb()
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
fitz <- lapply(X,function(x){mleDb(x$y,ntop=5)})
sapply(fitz,ndata)
Plot a maxium likelihood fit to data from a beta binomial distribution.
Description
Creates a plot of type "h"
of the probabilities of each
possible x
value of a beta binomial distribution where
the probabilities are calculated on the basis of parameters
estimated by the function mleBb()
. If obsd
is supplied it also superimposes/juxtaposes vertical lines
representing the observed proportions.
Usage
## S3 method for class 'mleBb'
plot(x, ..., plot = TRUE, col.fit = "red", col.obsd = "blue",
tikx = NULL, xlim=NULL, ylim=NULL, xlab = NULL,
ylab = NULL, obsd = NULL, incr = NULL,
main = "", legPos = "topright")
Arguments
x |
An object of class |
... |
Not used. |
plot |
Logical scalar; should a plot be produced (or should the function simply return a data frame consisting of the relevant values)? |
col.fit |
The colour for the (vertical) lines corresponding to the “fitted” probabilities, i.e. the probabilities calculated from the fitted parameters. |
col.obsd |
The colour for the (vertical) lines corresponding to the
“observed” probabilities (proportions), i.e. the
probabilities calculated by tabulating the data (from which the
parameters were estimated. Ignored if |
tikx |
(Optional) vector of locations of the tick marks on the |
xlim |
A numeric vector of length 2 specifying the limits
of the |
ylim |
A numeric vector of length 2 specifying the limits
of the |
xlab |
A label for the |
ylab |
A label for the |
obsd |
The data set from which the parameters were estimated, i.e. from
which |
incr |
Numeric scalar; defaults to 0.1 if |
main |
A main title for the plot; defaults to the empty string. |
legPos |
A list with components |
Value
A data frame with numeric columns x
, p
and
possibly po
. The x
column consists of the integers
from 0 to size
. The p
column consists of the
appropriate probabilities of the x
values, calculated by
dbetabinom()
from the rmutil
package. The po
column is present only if obsd
is supplied and consists
of the observed proportions. The value is returned invisibly.
A plot is produced as a side-effect if plot
is TRUE
.
Note
This function calls plotBb()
to do the heavy lifting.
Warning
It is up to the user to make sure that the obsd
argument,
if specified, is indeed the data set from which the object x
was calculated.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
if(require(hmm.discnp)) {
xxx <- hmm.discnp::Downloads
fit <- mleBb(xxx,size=14)
plot(fit)
plot(fit,obsd=xxx)
plot(fit,obsd=xxx,legPos=list(x=3,y=0.25))
plot(fit,obsd=xxx,legPos=NULL) # No legend is plotted.
}
set.seed(42)
yyy <- rbinom(300,10,0.7)
fit <- mleBb(yyy,size=10)
plot(fit,obsd=yyy,legPos="topleft")
Plot a maxium likelihood fit to data from a db distribution.
Description
Creates a plot of type "h"
of the probabilities of
each possible x
value of a db distribution where
the probabilities are calculated on the basis of parameters
estimated by the function mleDb()
. If obsd
is supplied it also superimposes/juxtaposes vertical lines
representing the observed proportions.
Usage
## S3 method for class 'mleDb'
plot(x, ..., plot = TRUE, col.fit = "red", col.obsd = "blue",
tikx=NULL, xlim=NULL, ylim=NULL, xlab = NULL, ylab = NULL,
obsd = NULL, incr = NULL, main = "", legPos = "topright")
Arguments
x |
An object of class |
... |
Not used. |
plot |
Logical scalar; should a plot be produced (or should the function simply return a data frame consisting of the relevant values)? |
col.fit |
The colour for the (vertical) lines corresponding to the “fitted” probabilities, i.e. the probabilities calculated from the fitted parameters. |
col.obsd |
The colour for the (vertical) lines corresponding to the
“observed” probabilities (proportions), i.e. the
probabilities calculated by tabulating the data (from which the
parameters were estimated. Ignored if |
tikx |
(Optional) vector of locations of the tick marks on the |
xlim |
A numeric vector of length 2 specifying the limits
of the |
ylim |
A numeric vector of length 2 specifying the limits
of the |
xlab |
A label for the |
ylab |
A label for the |
obsd |
The data set from which the parameters were estimated, i.e. from
which |
incr |
Numeric scalar; defaults to 0.1 if |
main |
A main title for the plot; defaults to the empty string. |
legPos |
A list with components |
Value
A data frame with numeric columns x
, p
and possibly
po
. The x
column consists of the integers from
0 to ntop
or from 1 to ntop
depending on whether
zeta
is TRUE
. The p
column consists of
the appropriate probabilities of the x
values, calculated
by link{ddb}()
. The po
column is present only if
obsd
is supplied and consists of the observed proportions.
The value is returned invisibly. A plot is produced as a
side-effect if plot
is TRUE
.
Note
This function calls plotDb()
to do the heavy lifting.
Warning
It is up to the user to make sure that the obsd
argument,
if specified, is indeed the data set from which the object x
was calculated.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
if(require(hmm.discnp)) {
xxx <- hmm.discnp::Downloads
fit <- mleDb(xxx,ntop=14,z=TRUE)
plot(fit)
plot(fit,obsd=xxx)
plot(fit,obsd=xxx,legPos=list(x=3,y=0.25))
plot(fit,obsd=xxx,legPos=NULL) # No legend is plotted.
}
set.seed(42)
yyy <- rbinom(300,10,0.7)
fit <- mleDb(yyy,ntop=10,z=TRUE)
plot(fit,obsd=yyy,legPos="topleft")
Plot a beta binomial distribution.
Description
Plots the probabilities of a specified beta binomial distributon.
Usage
plotBb(m, s, size, ..., plot = TRUE, tikx = NULL, xlim = NULL,
ylim = NULL, xlab = NULL, ylab = NULL, main = "")
Arguments
m |
Numeric scalar between 0 and 1. May be interpreted as the “success probability”. |
s |
Numeric scalar, greater than 0. The overdispersion parameter of the distribution. |
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
... |
Extra arguments that are passed to the |
plot |
Logical scalar; should a plot be produced (or should the function simply return a data frame consisting of the relevant values)? |
tikx |
(Optional) vector of locations of the tick marks on the |
xlim |
The |
ylim |
The |
xlab |
A label for the |
ylab |
A label for the |
main |
An overall title for the plot. (See |
Value
A data frame with numeric columns x
and p
. The
x
column consists of the integers from 0 to size
.
The p
column consists of the appropriate probabilities of
the x
values, calculated by dbetabinom()
from the
rmutil
package. The value is returned invisibly. A plot
is produced as a side-effect if plot
is TRUE
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
plot.mleBb()
plotDb()
plot.mleDb()
Examples
plotBb(0.7,3,14,main="An exempular plot")
plotBb(0.3,3,14,col="red",xlab="count",main="A communist plot")
plotBb(0.1,10,14,col="blue",main="A royal plot")
plotBb(0.5,20,14,col="green",main="An ecological plot")
plotBb(0.5,20,14,xlim=c(0,15))
plotBb(0.5,20,14,xlim=c(0,15),tikx=3*(0:5))
Plot a db distribution.
Description
Plots the probabilities of a specified db distributon.
Usage
plotDb(alpha, beta, ntop, zeta, ..., plot = TRUE, tikx = NULL, xlim = NULL,
ylim = NULL, xlab = NULL, ylab = NULL, main = "")
Arguments
alpha |
See |
beta |
See |
ntop |
See |
zeta |
See |
... |
Extra arguments that are passed to the |
plot |
Logical scalar; should a plot be produced (or should the function simply return a data frame consisting of the relevant values)? |
tikx |
(Optional) vector of locations of the tick marks on the |
xlim |
The |
ylim |
The |
xlab |
A label for the |
ylab |
A label for the |
main |
An overall title for the plot. (See |
Value
A data frame with numeric columns x
and p
. The
x
column consists of the integers from 0 to ntop
or from 1 to ntop
depending on whether zeta
is TRUE
. The p
column consists of the
appropriate probabilities of the x
values, calculated by
ddb()
. The value is returned invisibly. A plot
is produced as a side-effect if plot
is TRUE
.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
plotDb(2,3,14,FALSE,main="An exempular plot")
plotDb(2,3,14,TRUE,col="red",xlab="count",main="A communist plot")
plotDb(0.1,3,14,TRUE,col="blue",main="A royal plot")
plotDb(0.1,0.3,14,TRUE,col="green",main="An ecological plot")
plotDb(2,3,14,FALSE,xlim=c(0,15))
plotDb(2,3,14,FALSE,xlim=c(0,15),tikx=3*(0:5))
par(mfrow=c(2,1))
plotDb(2,2,5,FALSE,main=bquote(paste(alpha == 2,", ",beta == 2)),col="red")
plotDb(-2,-2,5,FALSE,main=bquote(paste(alpha == -2,", ",beta == -2)),col="blue")
Simulate data from a db or beta binomial distribution.
Description
Simulate one or more data sets from a db or beta binomial
distribution. The parameters of the distribution may be
equal to those obtained from fitting the distibution to data,
using mleDb()
or mleBb()
. They may also be
specified by the user via the function makeDbdpars()
or makeBbdpars()
.
Usage
## S3 method for class 'mleDb'
simulate(object, nsim = 1, seed = NULL, ...,
ndata = NULL, drop = TRUE)
## S3 method for class 'mleBb'
simulate(object, nsim = 1, seed = NULL, ...,
ndata = NULL, drop = TRUE)
## S3 method for class 'Dbdpars'
simulate(object, nsim = 1, seed = NULL, ...,
ndata = NULL, drop = TRUE)
## S3 method for class 'Dbdpars'
simulate(object, nsim = 1, seed = NULL, ...,
ndata = NULL, drop = TRUE)
Arguments
object |
An object of class |
nsim |
The number of data sets to simulate. |
seed |
Integer vector of seeds for random number generation. It should
be of length either 1 or |
... |
Not used. |
ndata |
Integer vector specifying the lengths of the data sets to
be simulated. If it is of length less than |
drop |
Logical scalar; if |
Details
The actual simulation is done by rdb()
or by the
rbetabinom()
function from the rmutil
package.
Value
A list, of length nsim
, whose entries are integer
vectors, the length of of the i
th entry being equal to
ndata[i]
. Each entry has an attribute "seed"
which
is the random number generation seed that was used in the generation
of the data in that entry. If nsim==1
and if drop
is
TRUE
, then the value is simply an integer vector
(of length ndata[1]
).
See Also
Examples
X <- hmm.discnp::Downloads
fit <- mleDb(X,ntop=15,zeta=TRUE)
s1 <- simulate(fit)
s2 <- simulate(fit,nsim=5) # All data sets of length 267.
s3 <- simulate(fit,nsim=5,ndata=100*(2:6))
obj <- makeDbdpars(alpha=2,beta=3,ntop=20,zeta=TRUE,ndata=500)
s4 <- simulate(obj,nsim=5,seed=1:5)
Variance of a beta binomial distribution.
Description
Calculate the variance of a random variable having a beta binomial distribution.
Usage
varBb(mo,...)
## S3 method for class 'mleBb'
varBb(mo,...)
## Default S3 method:
varBb(mo, s, size, ...)
Arguments
mo |
For the |
s |
Numeric scalar, greater than 0. The overdispersion parameter
of the distribution. (See the help for |
size |
Integer scalar specifying the upper limit of the “support”
of the beta binomial distribution under consideration. The support
is the set of integers |
... |
Not used. |
Details
For the "mleBb"
method, the single argument should really
be called (something like) “object
” and for the
default method the first argument should be called m
.
However the argument lists must satisfy the restrictions that
“A method must have all the arguments of the generic,
including ... if the generic does.” and “A method
must have arguments in exactly the same order as the generic.”
For the "mleBb"
method, the values of m
and s
are obtained from mo
, and size
is extracted from
the attributes of mo
.
The variance of a beta binomial distribution is readily calculable “by hand”. These functions are provided for convenience and to preserve parallelism with the db distribution.
Value
Numeric scalar equal to the variance of a beta binomial distributed random variable with the given parameters.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
varBb(0.7,0.1,15)
varBb(0.7,400,15)
X <- hmm.discnp::Downloads
fit <- mleBb(X,size=15)
varBb(fit)
Variance of a db distribution.
Description
Calculate the variance of a random variable having a db distribution.
Usage
varDb(ao,...)
## S3 method for class 'mleDb'
varDb(ao,...)
## Default S3 method:
varDb(ao, beta, ntop, zeta=FALSE,...)
Arguments
ao |
For the |
beta |
See |
ntop |
See |
zeta |
See |
... |
Not used. |
Details
For the "mleDb"
method, the single argument should really
be called (something like) “object
” and for the
default method the first argument should be called alpha
.
However the argument lists must satisfy the restrictions that
“A method must have all the arguments of the generic,
including ... if the generic does.” and “A method
must have arguments in exactly the same order as the generic.”
For the "mleDb"
method, the values of alpha
and
beta
are obtained from ao
, and ntop
, and
zeta
are extracted from the attributes of ao
.
The variance of a db distribution is theoretically intractable but is readily calculable numerically as
\sum (x - \mu)^2
\times \Pr(X=x)
, where \mu
is the expected value of the given distribution.
Value
Numeric scalar equal to the variance of a db distributed random variable with the given parameters.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
Examples
varDb(3,4,15)
varDb(3,4,15,TRUE)
X <- hmm.discnp::Downloads
fit <- mleDb(X,ntop=15,zeta=TRUE)
varDb(fit)
Retrieve the covariance matrix from an "mleBb"
object.
Description
Extract the covariance matrix attribute an object of class
"mleBb"
. I.e. obtain the estimated covariance matrix
of the maximum likelihood estimates of the parameters of a
beta binomial distribution.
Usage
## S3 method for class 'mleBb'
vcov(object, ...)
Arguments
object |
An object of class |
... |
Not used. |
Details
The estimated covariance matrix is the inverse of the hessian of the negative log likelihood. (This may also be referred to as the observed Fisher information — the Fisher information evaluated at the maximum likelihood estimates of the parameters).
Value
A two-by-two positive definite (with any luck!) numeric matrix. It is an estimate of the covariance matrix of the parameter estimates.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
vcov.mleDb)
mleBb()
Examples
X <- hrsRcePred
top1e <- X[X$sbjType=="Expert","top1"]
fit <- mleBb(top1e,size=10)
vcov(fit)
Retrieve the covariance matrix from an "mleDb"
object.
Description
Extract the covariance matrix attribute an object of class
"mleDb"
. I.e. obtain the estimated covariance matrix
of the maximum likelihood estimates of the parameters of a
db distribution.
Usage
## S3 method for class 'mleDb'
vcov(object, ...)
Arguments
object |
An object of class |
... |
Not used. |
Details
The estimated covariance matrix is the inverse of the hessian of the negative log likelihood. (This may also be referred to as the observed Fisher information — the Fisher information evaluated at the maximum likelihood estimates of the parameters).
Value
A two-by-two positive definite (with any luck!) numeric matrix. It is an estimate of the covariance matrix of the parameter estimates.
Author(s)
Rolf Turner r.turner@auckland.ac.nz
See Also
vcov.mleBb)
mleDb()
Examples
X <- hmm.discnp::SydColDisc
X$y <- as.numeric(X$y)
X <- split(X,f=with(X,interaction(locn,depth)))
fitz <- lapply(X,function(x){mleDb(x$y,ntop=5)})
lapply(fitz,vcov)
Visual recognition data.
Description
Counts of successes in visual recognition memory for large and small binary pictures.
Usage
data("visRecog")
Format
A data frame with 30 observations on the following 4 variables.
deck
An integer vector indicating which of two decks of cards, bearing graphic images, was used in the given experiment.
subject
An integer vector indexing the (human) subjects in the experiments.
tot5
An integer vector whose entries are counts of successes when the cards used consist of a
5 \times 5
grid of “facets”.tot10
An integer vector whose entries are counts of successes when the cards used consist of a
10 \times 10
grid of “facets”.
Details
Adult subjects were shown a series of cards, each bearing a
simple graphic image. Each image resembled one face of a Rubik's
cube, formed of either a 5x5 or a 10x10 grid of facets, each facet
being either black or white. Later, each subject was shown a
series of 20 similar cards, exactly 10 of which had been shown to
the subject previously. The subject's task was to identify each
image as a new one, or as a previously seen one. The response
variable tot5
is the number of correct identifications,
out of 20, for the 5 \times 5
cards. Similarly the
variable tot10
is the number of correct identifications for
the 10 \times 10
cards.
Subjects 21–30 were (deliberately) tested with a different set of cards than subjects 1–20, to ensure that results were not a function of the original deck of cards. (This seems to have no actual relevance.)
Source
The data are taken from the paper sited in References below. They were provided by a generous email correspondent who prefers to remain anonymous.
References
Green, D. M., and Purohit, A. K. (1976). Visual recognition memory for large and small binary pictures. Journal of Experimental Psycholology: Human Learning and Memory 2, pp. 32–37.
Examples
dbfit5 <- with(visRecog,mleDb(tot5,20,TRUE))
dbfit10 <- with(visRecog,mleDb(tot10,20,TRUE))
set.seed(42) # To get repeatable Monte Carlo p-values.
print(gof(dbfit5,obsd=visRecog[["tot5"]],MC=TRUE)$pval) # 0.86
print(gof(dbfit10,obsd=visRecog[["tot10"]],MC=TRUE)$pval) # 0.68
bbfit5 <- with(visRecog,mleBb(tot5,20))
bbfit10 <- with(visRecog,mleBb(tot10,20))
set.seed(42) # To get repeatable Monte Carlo p-values.
print(gof(bbfit5,obsd=visRecog[["tot5"]],MC=TRUE)$pval) # 0.94
print(gof(bbfit10,obsd=visRecog[["tot10"]],MC=TRUE)$pval) # 0.70