Title: Miscellaneous Functions for Bioinformatics and Bayesian Statistics
Version: 1.0.0
Description: A hodgepodge of hopefully helpful functions. Two of these perform shrinkage estimation: one using a simple weighted method where the user can specify the degree of shrinkage required, and one using James-Stein shrinkage estimation for the case of unequal variances.
Depends: R (≥ 3.2.0)
Suggests: ggplot2, RISmed, testthat
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 5.0.1
NeedsCompilation: no
Packaged: 2016-05-24 13:50:20 UTC; amckenz
Author: Andrew McKenzie [aut, cre]
Maintainer: Andrew McKenzie <amckenz@gmail.com>
Repository: CRAN
Date/Publication: 2016-05-24 16:32:42

bayesbio: Miscellaneous functions useful in bioinformatics and Bayesian statistics

Description

A hodgepodge of hopefully helpful functions. Two of these perform shrinkage estimation: one using a simple weighted method where the user can specify the degree of shrinkage required, and one using James-Stein shrinkage estimation for the case of unequal variances.


Likelihood function of the James-Stein shrinkage factor.

Description

To be used in MLE computation of the James-Stein shrinkage factor.

Usage

a_hat_mle(stat, vars, a_hat)

Arguments

stat

Input statistics to be shrinkage estimated.

vars

Variances corresponding to each entry of stat, in a vector of equal length.

a_hat

Shrinkage intensity to be estimated.

Value

The likelihood evaluated at the given parameters.

References

http://projecteuclid.org/euclid.ss/1331729986


Identify all duplicate values in a vector.

Description

By default, the base R function duplicated marks only the second and later occurrences of a value in a vector as TRUE. This function marks every occurrence of a duplicated value as TRUE, including the first.

Usage

allDups(x)

Arguments

x

The input vector.

Value

A logical vector.
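
The difference from duplicated can be sketched in base R. This is a minimal illustration of the idea, not necessarily how allDups is implemented; the variable names are made up for the example.

```r
x <- c("a", "b", "a", "c", "b")

# Base behaviour: only repeats after the first occurrence are flagged.
later_only <- duplicated(x)

# Flag every member of a duplicated group by scanning from both ends.
all_occurrences <- duplicated(x) | duplicated(x, fromLast = TRUE)

print(later_only)
print(all_occurrences)
```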


cbind while converting missing entries to NA.

Description

When given vectors of unequal length, cbind recycles the shorter ones (with a warning if the lengths are not multiples of each other); this function instead combines vectors of unequal length while filling the missing entries with NAs.

Usage

cbindFill(...)

Arguments

...

A set of vectors separated by commas.

Value

A matrix that combines the inputted vectors.

References

http://r.789695.n4.nabble.com/How-to-join-matrices-of-different-row-length-from-a-list-td3177212.html; http://stackoverflow.com/a/7962286/560791
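
A small base R sketch of the padding idea follows. It is an illustration under the assumption that each vector becomes one column; it is not taken from the cbindFill source, and the names vecs and filled are invented for the example.

```r
a <- 1:3
b <- c(10, 20)

# Plain cbind recycles the shorter vector (a warning is suppressed here).
recycled <- suppressWarnings(cbind(a, b))

# Padding approach: extend every vector to the longest length with NA.
vecs <- list(a = a, b = b)
n <- max(lengths(vecs))
filled <- sapply(vecs, function(v) {
  length(v) <- n  # assigning a longer length pads with NA
  v
})

print(recycled)
print(filled)
```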


Creates random, unique character strings.

Description

The strings are built from randomly chosen characters; if any of them collide, numbers are added to the end using make.unique so that every string is unique.

Usage

createStrings(number, length, upper = FALSE)

Arguments

number

Specifies the number of character strings that should be created.

length

Specifies the length of each character string in letters.

upper

Logical specifying whether the character strings should be uppercase. Default = FALSE, so the character strings are all lowercase.

References

http://stackoverflow.com/a/1439541/560791
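
The approach described above can be sketched in base R as follows. The function make_strings and its argument names are hypothetical and only mirror the documented behaviour; the package code may differ.

```r
# Hypothetical helper: random letter strings, made unique with make.unique.
make_strings <- function(n_strings, n_chars, upper = FALSE) {
  pool <- if (upper) LETTERS else letters
  out <- replicate(
    n_strings,
    paste(sample(pool, n_chars, replace = TRUE), collapse = "")
  )
  # make.unique appends ".1", ".2", ... to any repeated string.
  make.unique(out)
}

set.seed(1)
ids <- make_strings(5, 4)
print(ids)
```

With very short strings and many draws, collisions become likely, which is when the make.unique suffixes appear.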


Create a color-labeled horizontal bar plot in ggplot2.

Description

This function takes a data frame and creates a horizontal (by default) bar plot from it while ordering the values.

Usage

ggHorizBar(data_df, dataCol, namesCol, labelsCol, decreasing = TRUE)

Arguments

data_df

Data frame with columns to specify the data values, the row names, and the fill colors of each of the bars.

dataCol

The column name that specifies the values to be plotted.

namesCol

The column name that specifies the corresponding name for each of the bars to be plotted.

labelsCol

The column name that specifies the groups of the labels.

decreasing

Logical specifying whether the values in dataCol should be in decreasing order.

Value

A ggplot2 object, which can be plotted via the plot() function or saved via the ggsave() function.
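
A comparable plot can be sketched directly with ggplot2. This is an assumed reconstruction of the general technique (order a factor by value, then flip the coordinates), not the body of ggHorizBar; the data frame and its column names are invented for the example.

```r
df <- data.frame(
  name  = c("geneA", "geneB", "geneC"),
  value = c(3.2, 1.1, 2.4),
  group = c("up", "down", "up"),
  stringsAsFactors = FALSE
)

# Order the bars by value by fixing the factor levels.
df$name <- factor(df$name, levels = df$name[order(df$value)])

# Build the plot only if ggplot2 is installed (it is a suggested package).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = name, y = value, fill = group)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::coord_flip()  # turns the vertical bars horizontal
}
```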


Jaccard index of two character vectors.

Description

This function compares the elements in two character vectors to find the Jaccard index, i.e., the number of elements in the intersection of the two sets divided by the number of elements in their union.

Usage

jaccardSets(set1, set2)

Arguments

set1

Character vector.

set2

Character vector.

Value

A number (one-element numeric vector) specifying the Jaccard index from comparing the two sets.

References

https://en.wikipedia.org/wiki/Jaccard_index
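
The definition can be checked by hand in base R. The helper jaccard below is a hypothetical stand-in written for this example, not the package function.

```r
# Jaccard index: |intersection| / |union|, treating the vectors as sets.
jaccard <- function(set1, set2) {
  length(intersect(set1, set2)) / length(union(set1, set2))
}

set1 <- c("a", "b", "c")
set2 <- c("b", "c", "d")

# Intersection is {b, c}; union is {a, b, c, d}; so the index is 2 / 4.
idx <- jaccard(set1, set2)
print(idx)
```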


Multiple pattern gsub.

Description

An extension to gsub that handles vectors of patterns and replacements, avoiding the recursion problems associated with overlapping patterns at the expense of computation time.

Usage

mgsub(pattern, replacement, x, ...)

Arguments

pattern

Character vector of patterns to match.

replacement

Character vector of replacements for each pattern.

x

Character vector in which the gsub should be performed.

...

Additional arguments to grep.

References

http://stackoverflow.com/a/15254254/560791
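
The recursion problem arises when one replacement is itself matched by a later pattern. The sketch below shows the problem with chained gsub calls and one possible single-pass workaround. The helper single_pass_replace is hypothetical, assumes literal patterns without regular-expression metacharacters, and is not claimed to be how mgsub works.

```r
x <- "ab"
pattern <- c("a", "b")
replacement <- c("b", "a")

# Chained gsub: "a" becomes "b", then every "b" (old and new) becomes "a".
chained <- x
for (i in seq_along(pattern)) {
  chained <- gsub(pattern[i], replacement[i], chained)
}

# Single pass: find all matches first, then replace each match once.
single_pass_replace <- function(pattern, replacement, x) {
  stopifnot(length(pattern) == length(replacement))
  m <- gregexpr(paste(pattern, collapse = "|"), x)
  hits <- regmatches(x, m)
  regmatches(x, m) <- lapply(hits, function(h) replacement[match(h, pattern)])
  x
}

swapped <- single_pass_replace(pattern, replacement, x)
print(chained)
print(swapped)
```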


Merge data frames based on the nearest datetime differences.

Description

Takes two data frames, each with a time/date column in date-time or date format (i.e., able to be compared using the function difftime), finds the row of df2 that minimizes the absolute datetime difference for each row of df1, and merges the corresponding rows of df2 into df1 for downstream processing.

Usage

nearestTime(df1, df2, timeCol1, timeCol2)

Arguments

df1

Data frame containing the dates for which the difference from the other data frame's date column should be minimized, row by row.

df2

Data frame containing the dates to compare against, as well as the other values to be merged into df1 at the nearest datetime.

timeCol1

Character vector specifying the date/time column in df1.

timeCol2

Character vector specifying the date/time column in df2.

Value

A merged data frame that minimizes datetime differences.
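
The matching step can be sketched in base R with difftime and which.min. The data frames and column names below are invented for the example, and the code only illustrates the nearest-row idea; it is not the nearestTime implementation.

```r
df1 <- data.frame(
  t1 = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-02 12:00:00"), tz = "UTC"),
  a  = 1:2
)
df2 <- data.frame(
  t2 = as.POSIXct(c("2016-01-01 09:30:00", "2016-01-02 13:00:00",
                    "2016-01-05 00:00:00"), tz = "UTC"),
  b  = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# For each row of df1, find the df2 row with the smallest absolute time gap.
idx <- vapply(seq_len(nrow(df1)), function(i) {
  gaps <- abs(as.numeric(difftime(df2$t2, df1$t1[i], units = "secs")))
  which.min(gaps)
}, integer(1))

# Attach the matched df2 rows to df1.
merged <- cbind(df1, df2[idx, , drop = FALSE])
print(merged)
```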


Merge data frames based on the nearest datetime differences and an ID column. Also removes duplicate column names from the result.

Description

Takes two data frames, each with a time/date column in date-time or date format (i.e., able to be compared using the function difftime), finds, among the rows of df2 with the same ID, the row that minimizes the absolute datetime difference for each row of df1, and merges the corresponding rows of df2 into df1 for downstream processing.

Usage

nearestTimeandID(df1, df2, timeCol1, timeCol2, IDcol)

Arguments

df1

Data frame containing the dates for which the difference from the other data frame's date column should be minimized, row by row.

df2

Data frame containing the dates to compare against, as well as the other values to be merged into df1 at the nearest datetime.

timeCol1

Character vector specifying the date/time column in df1.

timeCol2

Character vector specifying the date/time column in df2.

IDcol

The ID column. Values must be unique by row in df1. Repeated values are allowed in df2 (and expected in at least some rows, as that is the point of the function).

Value

A merged data frame that minimizes datetime differences.
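
Restricting the search to rows with the same ID can be sketched as below. The data, the column names, and the helper pick are assumptions made for the illustration; the package function may handle ties, missing IDs, and duplicate column names differently.

```r
df1 <- data.frame(
  id = c("p1", "p2"),
  t1 = as.Date(c("2016-03-01", "2016-03-10")),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id    = c("p1", "p1", "p2", "p2"),
  t2    = as.Date(c("2016-02-20", "2016-03-03", "2016-03-09", "2016-01-01")),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# For row i of df1, consider only df2 rows sharing its ID, then take the
# candidate with the smallest absolute gap in days.
# (This sketch assumes every df1 ID occurs at least once in df2.)
pick <- function(i) {
  candidates <- which(df2$id == df1$id[i])
  gaps <- abs(as.numeric(difftime(df2$t2[candidates], df1$t1[i], units = "days")))
  candidates[which.min(gaps)]
}
idx <- vapply(seq_len(nrow(df1)), pick, integer(1))

# Drop the repeated id column from df2 before combining.
merged <- cbind(df1, df2[idx, c("t2", "value"), drop = FALSE])
print(merged)
```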


Adjust p-values where n is less than p.

Description

This function recapitulates p.adjust but allows the number of hypothesis tests n to be less than the number of p-values p. Statistical properties of the p-value adjustments may not hold.

Usage

p.adjust.nlp(p, method = p.adjust.methods, n = length(p))

Arguments

p

Numeric vector of p-values.

method

Correction method.

n

Number of comparisons to be made.

References

http://stackoverflow.com/a/30110186/560791
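
The restriction being lifted can be seen in base R, where p.adjust stops if n is smaller than the number of p-values. The sketch below shows that error and a hand-written Bonferroni adjustment with a smaller n. The helper bonferroni_small_n is hypothetical and covers only one method; it is not the p.adjust.nlp code.

```r
p <- c(0.01, 0.02, 0.03)

# Base p.adjust refuses n < length(p).
base_result <- try(p.adjust(p, method = "bonferroni", n = 2), silent = TRUE)
base_failed <- inherits(base_result, "try-error")

# Bonferroni by hand: multiply by n and cap at 1, with no check on n.
bonferroni_small_n <- function(p, n) pmin(1, p * n)
adj <- bonferroni_small_n(p, n = 2)

print(base_failed)
print(adj)
```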


Perform PubMed queries on 2x2 combinations of term vectors.

Description

Perform PubMed queries on the intersections of two character vectors. This function is a wrapper to RISmed::EUtilsSummary with type = 'esearch', db = 'pubmed'.

Usage

pubmedQuery(rowTerms, colTerms, sleepTime = 0.01)

Arguments

rowTerms

Character vector of terms that should make up the rows of the resulting mention count data frame.

colTerms

Character vector of terms for the columns.

sleepTime

How much time (in seconds) to sleep between successive PubMed queries. If you set this too low, PubMed may shut down your connection to prevent overloading their servers.

Value

A data frame of the number of mentions for each combination of terms.
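
The overall shape of such a pairwise query loop can be sketched without touching the network. In the code below, query_grid and count_fun are hypothetical: count_fun stands in for the real PubMed request, and joining the two terms with "AND" is an assumption about how the queries are formed.

```r
# Hypothetical sketch: one query per (row term, column term) pair,
# pausing between calls so the remote service is not flooded.
query_grid <- function(rowTerms, colTerms, count_fun, sleepTime = 0.01) {
  counts <- matrix(NA_real_, nrow = length(rowTerms), ncol = length(colTerms),
                   dimnames = list(rowTerms, colTerms))
  for (i in seq_along(rowTerms)) {
    for (j in seq_along(colTerms)) {
      query <- paste(rowTerms[i], "AND", colTerms[j])  # assumed query form
      counts[i, j] <- count_fun(query)
      Sys.sleep(sleepTime)
    }
  }
  as.data.frame(counts)
}

# Offline stand-in for the real count: the number of characters in the query.
fake_count <- function(query) nchar(query)
res <- query_grid(c("tau", "amyloid"), c("sleep", "diet"), fake_count)
print(res)
```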


Add values to the super- and sub-diagonals of a matrix.

Description

Takes a matrix and adds values to the entries that are one above the diagonal (i.e., the superdiagonal) and the entries that are one below the diagonal (i.e., the subdiagonal).

Usage

subsupDiag(matrix, x)

Arguments

matrix

Matrix whose super- and sub-diagonal values should be replaced.

x

Numeric vector used to replace values in the matrix. If the inputted vector is not of the same length as both the super- and sub-diagonals of the matrix, then short vector recycling will occur (e.g., x can be one value to replace all of the super- and sub-diagonals of the matrix with that one value).

Value

The original matrix with the values added.

References

http://stackoverflow.com/a/9885186/560791
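
The super- and sub-diagonal entries can be addressed in base R with row and col. The sketch follows the replacement behaviour described under Arguments and is an assumption about the technique, not the subsupDiag source.

```r
m <- matrix(0, nrow = 3, ncol = 3)

# Entries exactly one step off the main diagonal satisfy |row - col| == 1.
off_by_one <- abs(row(m) - col(m)) == 1

# A single value is recycled across all selected entries.
m[off_by_one] <- 5
print(m)
```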


Perform James-Stein shrinkage estimation using unequal variances

Description

Traditional JS shrinkage estimation assumes equal variances for each of the data points, while this algorithm extends JS shrinkage estimation to entries with different variances.

Usage

unequalVarShrink(stat, vars, verbose = TRUE)

Arguments

stat

Input statistics to be shrinkage estimated.

vars

Variances corresponding to each entry of stat, in a vector of equal length.

verbose

Whether information about the algorithm should be reported.

Value

A data frame containing the shrinkage estimated statistics.

References

http://projecteuclid.org/euclid.ss/1331729986
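
One common way to shrink with unequal variances is the normal-normal empirical Bayes model, sketched below. It assumes each statistic is marginally normal with a common mean and variance A plus its own known variance, estimates A by maximum likelihood, and shrinks each value by the factor B = vars / (vars + A). This is a generic illustration under those assumptions; the function neg_loglik and the example numbers are invented, and the sketch is not claimed to match unequalVarShrink or a_hat_mle.

```r
# Negative log-likelihood of A (up to a constant), with the common mean
# profiled out as the precision-weighted average of the statistics.
neg_loglik <- function(A, stat, vars) {
  w <- 1 / (vars + A)
  mu <- sum(w * stat) / sum(w)
  0.5 * sum(log(vars + A) + w * (stat - mu)^2)
}

stat <- c(2.1, -0.4, 1.3, 0.2, 3.0)
vars <- c(0.5, 1.0, 0.25, 2.0, 0.75)

# One-dimensional search for A on an assumed interval.
fit <- optimize(neg_loglik, interval = c(0, 100), stat = stat, vars = vars)
A_hat <- fit$minimum

w <- 1 / (vars + A_hat)
mu_hat <- sum(w * stat) / sum(w)

# Noisier statistics (larger variance) get a larger B and move further.
B <- vars / (vars + A_hat)
shrunk <- mu_hat + (1 - B) * (stat - mu_hat)
print(data.frame(stat, vars, B, shrunk))
```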


Weighted shrinkage estimation.

Description

Shrink values towards the mean (of the sample or of the overall cohort) to a degree inversely related to the confidence you assign to each observation.

Usage

weightedShrink(x, n, m = NULL, meanVal = NULL)

Arguments

x

Numeric vector of values to be shrunken towards the mean.

n

Numeric vector with corresponding entries to x, specifying the number of observations used to calculate x, or some other confidence weight to associate with x.

m

Number specifying weight of the shrinkage estimation, relative to the number of observations in the input vector n. Defaults to the minimum of n, but this is an arbitrary value and should be explored to find an optimal value for your use case.

meanVal

Number specifying the overall mean towards which the values should be shrunken. Defaults to NULL, in which case it is calculated as the (non-weighted) arithmetic mean of the values in the inputted vector x.

Value

A numeric vector with shrunken data values.

References

http://math.stackexchange.com/a/41513
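
A common form of this kind of weighted shrinkage is the weighted average (n * x + m * mean) / (n + m). The sketch below assumes that form; the helper shrink_towards_mean is hypothetical and is not taken from the weightedShrink source.

```r
# Assumed formula: each value is averaged with the mean, weighted by its own
# confidence n against the shrinkage weight m.
shrink_towards_mean <- function(x, n, m = min(n), meanVal = mean(x)) {
  (n * x + m * meanVal) / (n + m)
}

x <- c(0.9, 0.5, 0.1)  # observed values
n <- c(2, 10, 50)      # observations behind each value

# With m = min(n) = 2 and meanVal = 0.5, the value backed by only two
# observations moves the most, and the one backed by fifty barely moves.
shrunk <- shrink_towards_mean(x, n)
print(shrunk)
```

Larger values of m pull every entry closer to the mean, which is why the choice of m is worth exploring for a given data set.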