Type: | Package |
Title: | Estimating Frequencies, Prevalence and Multiplicity of Infection |
Version: | 0.1.2 |
Maintainer: | Meraj Hashemi <meraj.hashemi.esh@gmail.com> |
Description: | The implemented methods reach out to scientists that seek to estimate multiplicity of infection (MOI) and lineage (allele) frequencies and prevalences at molecular markers using the maximum-likelihood method described in Schneider (2018) <doi:10.1371/journal.pone.0194148>, and Schneider and Escalante (2014) <doi:10.1371/journal.pone.0097899>. Users can import data from Excel files in various formats, and perform maximum-likelihood estimation on the imported data by the package's moimle() function. |
Depends: | R (≥ 4.3.0) |
Imports: | openxlsx (≥ 4.2.5.2), Rdpack (≥ 2.6), Rmpfr (≥ 0.9-3), |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
RdMacros: | Rdpack |
NeedsCompilation: | no |
Packaged: | 2023-12-21 19:29:48 UTC; meraj |
Author: | Meraj Hashemi [cre, aut, com], Kristan Schneider [aut, ths] |
Repository: | CRAN |
Date/Publication: | 2023-12-21 22:30:08 UTC |
MLMOI: An R Package to preprocess molecular data and derive prevalences, frequencies and multiplicity of infection (MOI)
Description
The MLMOI package provides three functions:
moimport();
moimle();
moimerge().
Details
The package is intended for scientists who seek to
estimate MOI and lineage frequencies at molecular markers
using the maximum-likelihood method described in
(Schneider 2018),
(Schneider and Escalante 2018) and
(Schneider and Escalante 2014). Users can
import data from Excel files in various formats and
perform maximum-likelihood estimation on the imported data
with the package's moimle() function.
Types of molecular data
Molecular data can be of types:
microsatellite repeats (STRs);
single nucleotide polymorphisms (SNPs);
amino acids;
codons (base triplets).
Import function
The function moimport() is
designed to import molecular data. It imports molecular
data in various formats and transforms them into a
standard format.
Merging Datasets
Two datasets in standard format
can be merged with the function moimerge().
Estimation of MOI and frequencies
The function moimle() is designed to derive
maximum-likelihood estimates (MLEs) from molecular
data in standard format.
References
Schneider KA (2018). “Large and finite sample properties of a maximum-likelihood estimator for multiplicity of infection.” PLOS ONE, 13(4), 1-21. doi:10.1371/journal.pone.0194148.
Schneider KA, Escalante AA (2018). “Correction: A Likelihood Approach to Estimate the Number of Co-Infections.” PLOS ONE, 13(2), 1-3. doi:10.1371/journal.pone.0192877.
Schneider KA, Escalante AA (2014). “A Likelihood Approach to Estimate the Number of Co-Infections.” PLOS ONE, 9(7), e97899. doi:10.1371/journal.pone.0097899.
Removes punctuation characters and typos from data entries
Description
This function is designed to find the lineages (STRs) present on a microsatellite marker in a single cell.
Usage
corrector_numeric(y, c_l, r_w, conm, cons, cha_num, rw_col, multsh)
Arguments
y |
string; a cell entry. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
conm |
numeric; the multiple column per marker identifier. For the data of format multiple columns conm > 1. |
cons |
numeric; the multiple row per sample identifier. For the data of format multiple rows cons > 1. |
cha_num |
string vector; the vector of punctuation
characters plus alphabets. See
|
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of following elements: 1) a vector of lineages found on a microsatellite marker in a single cell. Each element corresponds to one and only one lineage and it is free from any punctuation character. 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker.
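As a rough illustration of the cleaning step described above (not the package's internal code), the following base-R sketch pulls the distinct repeat lengths out of a single cell entry. The helper name `extract_str_lineages` is hypothetical, and the sketch assumes that every run of non-digit characters acts as a separator, so decimal allele sizes would need different handling.

```r
# Hypothetical helper, not part of MLMOI: split one cell entry on any run of
# non-digit characters and return the distinct numeric lineages.
extract_str_lineages <- function(cell) {
  parts <- unlist(strsplit(cell, "[^0-9]+"))
  parts <- parts[nzchar(parts)]   # drop empty pieces caused by leading separators
  unique(as.numeric(parts))
}

lineages <- extract_str_lineages("120/ 124;;128/120")
print(lineages)
```

Repeated entries collapse to one value, in line with the requirement that each element corresponds to exactly one lineage.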
Removes punctuation characters and typos from data entries
Description
This function is designed to find the lineages present on a SNP, amino-acid and codon marker in a single cell.
Usage
corrector_string(y, c_l, r_w, conm, cons, cha_string, rw_col, coding, multsh)
Arguments
y |
string; entry of a cell. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
conm |
numeric; the multiple column per marker identifier. For the data of format multiple columns conm > 1. |
cons |
numeric; the multiple row per sample identifier. For the data of format multiple rows cons > 1. |
cha_string |
string vector; the vector of
punctuation characters plus numerics from 1 to 9. See
|
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of following elements: 1) a vector of lineages found on a marker (SNP, amino-acid or codon) in a single cell. Each element corresponds to one and only one lineage and it is free from typos, 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker.
Translates the standard ambiguity codes for nucleotides (amino acid decoder)
Description
Translates the standard ambiguity codes for nucleotides in amino acid forms from a pre-specified coding class to 3-letter designation of amino acids.
Usage
decoder_aminoacid(
y,
c_l,
r_w,
aa_1,
aa_2,
let_3,
amino_acid,
aa_symbol,
coding,
rw_col,
multsh
)
Arguments
y |
numeric vector; entries in a cell corresponding to a specific sample and a specific marker. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
aa_1 |
string vector; vector of different amino acids. |
aa_2 |
string vector; vector of different codons. |
let_3 |
string vector; vector of amino acids in 3 letter designation. |
amino_acid |
string vector; vector of amino acids in full name. |
aa_symbol |
string vector; vector of amino acids in one letter designation. |
coding |
string; coding class of the molecular marker. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of two elements: 1) a vector of 3-letter designation of amino acids on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker and corrector_string. See also the
vignette 'StandardAmbiguityCodes'.
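To make the target coding concrete, the sketch below maps one-letter amino-acid symbols to the standard 3-letter designation using a plain lookup vector. The helper `to_three_letter` is hypothetical and is not how decoder_aminoacid is implemented; it only illustrates the kind of translation the function performs.

```r
# Standard one-letter and three-letter designations of the 20 amino acids.
aa_one   <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
              "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
aa_three <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
              "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val")

# Hypothetical helper: translate one-letter entries to three-letter codes.
# Symbols that are not in the table become NA so they can be reported.
to_three_letter <- function(x) {
  unname(setNames(aa_three, aa_one)[toupper(x)])
}

three <- to_three_letter(c("k", "T", "B"))
print(three)
```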
Translates the standard ambiguity codes for nucleotides (codon decoder)
Description
Translates the standard ambiguity codes for nucleotides in codon form from a pre-specified coding class to triplet designation of codons.
Usage
decoder_codon(
y,
c_l,
r_w,
aa_1,
aa_2,
compact,
codon_s,
coding,
rw_col,
multsh
)
Arguments
y |
numeric vector; entries in a cell corresponding to a specific sample and a specific marker. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
aa_1 |
string vector; vector of different amino acids. |
aa_2 |
string vector; vector of different codons. |
compact |
string vector; vector of different codons in compact form. |
codon_s |
string vector; vector of different codons. |
coding |
string; coding class of the molecular marker. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of two elements: 1) a vector of codons in triplet designation on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker and corrector_string.
Converts ambiguity codes to represented bases
Description
Translates the nucleotide ambiguity codes as defined in DNA Sequence Assembler from a pre-specified coding class to 4-letter codes.
Usage
decoder_snp(
y,
c_l,
r_w,
ambeguity_code,
represented_bases,
coding,
rw_col,
multsh
)
Arguments
y |
numeric vector; entries in a cell corresponding to a specific sample and a specific marker. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
ambeguity_code |
string vector; ambiguity codes for SNP data. |
represented_bases |
string vector; represented bases for those ambiguity codes. |
coding |
string; coding class of the molecular marker. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of two elements: 1) a vector of represented bases on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker and corrector_string. See also the vignette
'StandardAmbiguityCodes'.
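The translation from ambiguity codes to represented bases can be pictured as a table lookup. The sketch below uses the standard IUPAC nucleotide codes; the names `iupac` and `expand_iupac` are hypothetical, and whether the package accepts every code listed here is not stated in this manual.

```r
# Standard IUPAC nucleotide ambiguity codes and the bases they represent.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper: expand the codes found in one cell into the distinct
# bases they stand for (4-letter coding).
expand_iupac <- function(codes) {
  sort(unique(unlist(iupac[toupper(codes)], use.names = FALSE)))
}

bases <- expand_iupac(c("r", "C"))
print(bases)
```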
Transforms entries to the desired coding class
Description
Transforms the data entries in a cell to a pre-specified coding class.
Usage
decoder_str(y, c_l, r_w, coding, rw_col, multsh)
Arguments
y |
numeric vector; entries in a cell corresponding to a specific sample and a specific marker. |
c_l |
string; marker label. |
r_w |
numeric; sample ID's row number in the excel file. |
coding |
string; coding class of the molecular marker. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of two elements: 1) a vector of STR entries in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport, moi_marker.
Derives MLE
Description
Derives MLE
Usage
mle(nnk, nn)
Arguments
nnk |
numeric vector; vector of lineage prevalence counts. |
nn |
numeric; sample size |
Value
a list with the following elements: 1) log likelihood at MLE, MLE of lambda and MLE of psi, 2) MLE of lineage frequencies.
See Also
For further details see: moimle
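The following base-R sketch illustrates how an estimate of this kind can be computed from prevalence counts. It is not the package's mle() code. It assumes the conditional (zero-truncated) Poisson model of the cited papers, under which a lineage with frequency p_k is found in a sample with probability (1 - exp(-lambda * p_k)) / (1 - exp(-lambda)). Equating this to the observed prevalence and requiring the frequencies to sum to one gives a one-dimensional equation for lambda. The function name `sketch_mle` and the search interval are assumptions made for this example, and the sketch only covers regular data.

```r
# Hypothetical sketch, not the package's mle(): estimate lambda and the
# lineage frequencies from prevalence counts nnk and sample size nn.
sketch_mle <- function(nnk, nn) {
  prev <- nnk / nn
  # Frequencies implied by a given lambda:
  # p_k = -log(1 - prev_k * (1 - exp(-lambda))) / lambda
  freq_at <- function(lambda) -log1p(prev * expm1(-lambda)) / lambda
  # Choose lambda so that the implied frequencies sum to one.
  root <- uniroot(function(l) sum(freq_at(l)) - 1,
                  interval = c(1e-6, 50), tol = 1e-10)
  list(lambda = root$root, freq = freq_at(root$root))
}

est <- sketch_mle(nnk = c(60, 50, 20), nn = 100)
print(est)
```

If no sample shows more than one lineage, or one lineage occurs in every sample, the equation has no positive root and uniroot() stops with an error, which mirrors the pathological cases described for moimle().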
Derives profile-likelihood MLE of lineage frequencies
Description
Derives profile-likelihood MLE of lineage frequencies
Usage
mle_fixed(lambda, nnk)
Arguments
lambda |
numeric; MOI parameter |
nnk |
numeric vector; vector of lineage prevalence counts. |
Value
vector of lineage frequency estimates
See Also
For further details see: moimle
Administrator function
Description
Administrator function
Usage
moi_administrator(
set_d,
s_total,
m_total,
nummtd,
cha,
rw_col,
nwsh,
transposed,
multsh
)
Arguments
set_d |
data frame; imported dataset. |
s_total |
string vector; vector of sample IDs. |
m_total |
string vector; vector of marker labels. |
nummtd |
numeric; number of metadata columns plus 2. |
cha |
string vector; vector of punctuation
characters. See |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
nwsh |
numeric; worksheet number in multiple worksheet dataset. |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of the following elements: 1) rows in which new samples start, 2) all sample IDs in the worksheet, 3) number of samples in the worksheet, 4) multiple row per sample identifier, 5) another multiple row per sample identifier, 6) columns in which new markers start, 7) all marker labels in the worksheet, 8) number of markers in the worksheet, 9) multiple column per marker identifier, 10) another multiple column per marker identifier, 11) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport.
Finds forbidden sample ID repetitions
Description
Sample IDs need to be uniquely assigned to samples. This function checks if a sample ID is assigned to two or more different samples. Similarly, the marker labels need to be uniquely defined. This function is also used to check forbidden marker label repetitions.
Usage
moi_duplicatefinder(total, sam_mark, nummtd, rw_col)
Arguments
total |
string vector; vector of sample IDs. |
sam_mark |
string; a string which is either "Sample ID" or "Marker". |
nummtd |
numeric; number of metadata columns plus 2. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
Value
a warning which informs the user of forbidden repetition of sample ID's or marker labels.
See Also
For further details see: moimport, moi_labels.
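One way to picture the check is with run-length encoding: an ID may span several consecutive rows, but it must not start a second run further down. The sketch below is only an illustration in base R; the helper name `find_forbidden_repeats` is hypothetical and the package may implement the check differently.

```r
# Hypothetical helper: return IDs that occur in more than one run of
# consecutive rows (repeats within a single run are allowed).
find_forbidden_repeats <- function(ids) {
  runs <- rle(ids)
  unique(runs$values[duplicated(runs$values)])
}

ids <- c("S1", "S1", "S2", "S3", "S2")
forbidden <- find_forbidden_repeats(ids)
if (length(forbidden) > 0) {
  message("ID repeated in non-consecutive rows: ",
          paste(forbidden, collapse = ", "))
}
```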
Reports and deletes empty rows/columns
Description
Reports and deletes empty rows/columns.
Usage
moi_empty(
set_d,
setnoempty,
nummtd,
rw_col,
multsheets,
alllabels,
molecular,
molid,
n
)
Arguments
set_d |
data frame; imported dataset. |
setnoempty |
data frame; imported dataset. |
nummtd |
numeric; number of metadata columns plus 2. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsheets |
logical; indicating whether data is
contained in a single or multiple worksheets. The
default value is |
alllabels |
all the marker labels. |
molecular |
molecular argument. |
molid |
identifier. It is greater than zero when molecular argument is just a single value. |
Value
A list of 4 elements: 1) dataset without empty rows/columns; 2) dataset with empty rows/columns; 3) corrected labels; 4) corrected number of metadata columns.
See Also
For further details see: moimport.
Exports the dataset in standard format to a new excel file.
Description
This function exports the modified dataset in standard format to a new excel file.
Usage
moi_export(export, final)
Arguments
export |
string; the path where the imported data is stored in standardized format. |
final |
data frame; modified dataset in standard format. |
Value
An Excel file that contains the dataset in standard format.
See Also
For further details see: moimport.
Finds sample IDs contained in a dataset
Description
Each dataset consists of several
samples. Samples are specified by their sample ID, which
is placed in the first column of the Excel worksheet.
The function moi_labels finds sample IDs and
the rows in which they start.
Usage
moi_labels(s_total)
Arguments
s_total |
sample IDs. |
Value
a list whose first element is a numeric vector specifying the rows in which a new sample starts, and whose second element contains the sample IDs.
See Also
For further details see: moimport and moi_duplicatefinder.
Extracts lineages of samples at a specific marker
Description
For a specific marker, the function goes
from one sample to another and finds lineages with the
help of the following functions:
corrector_numeric along with decoder_str,
corrector_string along with decoder_aminoacid, and
corrector_string along with decoder_snp.
Each of these functions is
suitable for a particular type of molecular data.
Usage
moi_marker(
col_j,
c_l,
sam,
samorder,
conm,
cons,
molecular,
coding,
cha_num,
cha_string,
ambeguity_code,
represented_bases,
aa_1,
aa_2,
let_3,
amino_acid,
aa_symbol,
compact,
codon_s,
rw_col,
multsh
)
Arguments
col_j |
vector; column vector of a specific marker. |
c_l |
string; marker label. |
sam |
numeric vector; vector whose elements specify where a new sample starts. |
samorder |
a vector whose elements specify where a new sample starts. |
conm |
numeric; the multiple column identifier. For the data of format multiple columns conm > 1. |
cons |
numeric; the multiple row identifier. For the data of format multiple rows cons > 1. |
molecular |
string; type of molecular data. |
coding |
string; coding class of the molecular marker. |
cha_num |
string vector; vector of symbols (used for microsatellite data). |
cha_string |
string vector; vector of symbols (used for SNP and amino acid). |
ambeguity_code |
string vector; ambiguity codes for SNP data. |
represented_bases |
string vector; represented bases for those ambiguity codes. |
aa_1 |
string vector; vector of different amino acids. |
aa_2 |
string vector; vector of different codons. |
let_3 |
string vector; vector of amino acids in 3-letter designation. |
amino_acid |
string vector; vector of amino acids in full name. |
aa_symbol |
string vector; vector of amino acids in one letter designation. |
compact |
string vector; vector of different codons in compact form. |
codon_s |
string vector; vector of different codons. |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list with the following elements: 1) a list with elements containing lineages for a specific sample on a specific marker. The order in which samples are entered in dataset is preserved in the list. The lineages are free from typos and are transformed to pre-specified coding class, 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport.
Merges metadata
Description
Merges metadata
Usage
moi_mergemetadata(tempmtd, mtdall, samall, multsh)
Arguments
tempmtd |
matrix; matrix of temporary metadata. |
mtdall |
string vector; vector of all metadata labels. |
samall |
string vector; vector of all sample IDs in the worksheet. |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
list of following elements: 1) unique metadata labels, 2) unique metadata for different samples.
See Also
For further details see: moimport.
Checks the metadata entries
Description
Checks if a sample has unique metadata entries.
Usage
moi_metadata(
metadata,
mtdlabels,
nummtd,
samorder,
samall,
lsam,
nomtdneeded,
nomerge,
multsh
)
Arguments
metadata |
matrix; matrix of metadata columns. |
mtdlabels |
string vector; vector of metadata labels. |
nummtd |
numeric; number of metadata columns plus 2. |
samorder |
numeric vector; rows where samples start. |
samall |
string vector; vector of sample IDs in the worksheet. |
lsam |
numeric; number of samples in the worksheet. |
nomtdneeded |
numeric vector; samples which need no metadata. |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a list of following elements: 1) unique metadata for different samples, 2) an identifier whose value is 1 if a warning takes place.
See Also
For further details see: moimport.
Derives lineage prevalence counts
Description
Derives lineage prevalence counts
Usage
moi_nk(datmarker, samorder)
Arguments
datmarker |
vector; a column of data corresponding to a marker from the imported dataset. |
samorder |
numeric vector; row numbers in excel file where the new samples start. |
Value
a list with two elements: 1) sample size of the marker, 2) vector of lineage prevalence counts at the marker.
See Also
For further details see: moimle
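A lineage prevalence count is the number of samples in which a lineage is observed, so a lineage is counted at most once per sample. The sketch below shows the idea on toy data in base R. The data layout (sample ID repeated on every row) and all object names are assumptions made for this example; they are not the arguments of moi_nk.

```r
# Toy marker column in a long layout: one row per observed lineage, with the
# sample ID repeated on every row (an assumption made for this sketch).
dat <- data.frame(
  sample  = c("S1", "S1", "S2", "S3", "S3", "S3"),
  lineage = c(120, 124, 120, 124, 128, 124)
)

# Count each lineage at most once per sample, then tabulate.
present <- unique(dat[!is.na(dat$lineage), ])
nk <- table(present$lineage)            # lineage prevalence counts
nn <- length(unique(present$sample))    # sample size at this marker
print(nk)
print(nn)
```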
Contains different character vectors needed for importing molecular dataset
Description
Contains different character vectors needed for importing molecular dataset
Usage
moi_prerequisite()
Finds the most frequent separator
Description
This function is activated when the data is of 'One row per sample, one column per marker' format. In such a dataset, the user enters multiple items of information corresponding to a sample on a marker in one cell and separates the items with a symbol (punctuation character). The user is expected to use the separator symbol consistently; otherwise this function reports the inconsistencies with a warning.
Usage
moi_separator(set_d, nummtd, cha, rw_col, nwsh, multsh)
Arguments
set_d |
data frame; imported dataset. |
nummtd |
numeric; number of metadata columns plus 2. |
cha |
string vector; vector of punctuation
characters. See |
rw_col |
string vector; variable used to switch
between row and column in case of transposed data.
Namely, |
nwsh |
numeric; worksheet number in multiple worksheet dataset. |
multsh |
string; reports warnings for multiple worksheet datasets. |
Value
a warning is generated reporting inconsistencies in usage of separator.
See Also
For further details see: moimport.
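Finding the dominant separator amounts to tabulating the punctuation characters that occur in the cells. The base-R sketch below illustrates this on a few toy cell entries; the helper name `most_frequent_separator` is hypothetical and the package's own logic may differ.

```r
# Hypothetical helper: collect every punctuation character in the cells and
# return the one that occurs most often.
most_frequent_separator <- function(cells) {
  seps <- unlist(regmatches(cells, gregexpr("[[:punct:]]", cells)))
  names(which.max(table(seps)))
}

cells <- c("120/124", "124/128/132", "120;124", "128")
sep <- most_frequent_separator(cells)
print(sep)
```

Any other punctuation character found in the cells (here the semicolon) would be a candidate for an inconsistency warning.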
Writes warnings into a file
Description
This function writes the warnings generated during data import to a new Excel file.
Usage
moi_warning(
keepwarnings,
general_warnings,
metadata_warnings,
marker_warnings,
markerlabels
)
Arguments
keepwarnings |
string; the path where the warnings are stored. |
general_warnings |
list; general warnings. |
metadata_warnings |
list; metadata warnings. |
marker_warnings |
list; marker warnings. |
markerlabels |
string vector; marker labels. |
Value
An Excel file which contains warnings. Each worksheet corresponds to a marker.
See Also
For further details see: moimport.
Merges two molecular datasets.
Description
The function is designed to merge two datasets from separate Excel files. The data in each Excel file is placed in the first worksheet.
Usage
moimerge(
file1,
file2,
nummtd1,
nummtd2,
keepmtd = FALSE,
export = NULL,
keepwarnings = NULL
)
Arguments
file1 |
string; specifying the path of the first dataset. |
file2 |
string; specifying the path of the second dataset. |
nummtd1 |
numeric; number of metadata columns (see
|
nummtd2 |
numeric; number of metadata columns (see
|
keepmtd |
logical; determining whether metadata
(e.g., date) should be retained (default as
|
export |
string; the path where the data is stored. |
keepwarnings |
string; the path where the warnings are stored. |
Details
The two datasets should already be in standard
format (see moimport()). The datasets
are placed in the first worksheet of the two different
Excel files. Notice that marker labels (= column
labels) need to be unique.
Value
The output is a dataset in standard format that combines the input datasets.
Warnings
Warnings are generated if potential
inconsistencies are detected, e.g., if the same sample
occurs in both datasets and has contradicting metadata
entries. The function only prints the first 50 warnings.
If there are more than 50 warnings, the user is
advised to set the argument keepwarnings
in order to save the warnings in an Excel file.
See Also
To import and transform data into standard
format, please see the function moimport().
Examples
#The datasets 'testDatamerge1.xlsx' and 'testDatamerge2.xlsx' are already in standard format:
infile1 <- system.file("extdata", "testDatamerge1.xlsx", package = "MLMOI")
infile2 <- system.file("extdata", "testDatamerge2.xlsx", package = "MLMOI")
outfile <- moimerge(infile1, infile2, nummtd1 = 1, nummtd2 = 2, keepmtd = TRUE)
Estimates prevalences, frequency spectra and MOI parameter.
Description
moimle() derives the maximum-likelihood
estimate (MLE) of the MOI parameter (Poisson parameter)
and the lineage (allele) frequencies for each molecular
marker in a dataset. Additionally, the lineage
prevalence counts are derived.
Usage
moimle(file, nummtd = 0, bounds = c(NA, NA))
Arguments
file |
string or data.frame; if file is a path it
must specify the path to the file to be imported. The
dataset can also be a data.frame object in R. The dataset
must be in standard format (see |
nummtd |
numeric; number of metadata columns (e.g.
date, sample location, etc.) in the dataset (default
value is |
bounds |
numeric vector; a vector of size 2, specifying a lower bound (1st element) and an upper bound (2nd element) for the MOI parameter. The function derives lineage frequency ML estimates by profiling the likelihood function on one of the bounds. For a marker without sign of super-infections, the lower bound is employed. If one allele is contained in every sample, the upper bound is employed. |
Details
moimle() requires a dataset in standard
format which is free of typos (e.g. incompatible and
unidentified entries). Therefore, users need to
standardize the dataset by employing the
moimport() function.
If one or more molecular markers contain pathological
data, the ML estimate for the Poisson parameter is
either 0 or does not exist. Neither case yields a
meaningful estimate; however, in the former case frequency
estimates exist, while in the latter they do not. By
setting the option bounds as a range for the MOI
parameter \lambda, i.e., bounds =
c(<\lambda_min>, <\lambda_max>), this
problem is bypassed and the ML estimates are calculated
by profiling at \lambda_min or \lambda_max.
If no super-infections are observed at a marker,
moimle() uses \lambda_min as the MOI
parameter estimate, and \lambda_max if one lineage is
present in all samples. For regular data, the
profile-likelihood estimate using \lambda_min or
\lambda_max is returned if the
ML estimate falls below \lambda_min or above
\lambda_max, respectively.
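For regular data, the effect of bounds on the MOI parameter can be pictured as clamping the unconstrained estimate to the interval [\lambda_min, \lambda_max]. The base-R sketch below shows only this selection step. The helper name `apply_bounds` is hypothetical, and treating NA as "no bound" is an assumption based on the default bounds = c(NA, NA).

```r
# Hypothetical helper: pick the value of lambda at which frequencies would be
# profiled, given an unconstrained estimate and optional bounds.
apply_bounds <- function(lambda_hat, bounds = c(NA, NA)) {
  lower <- if (is.na(bounds[1])) -Inf else bounds[1]   # NA: no lower bound (assumed)
  upper <- if (is.na(bounds[2])) Inf else bounds[2]    # NA: no upper bound (assumed)
  min(max(lambda_hat, lower), upper)
}

apply_bounds(0.4, bounds = c(1, 3))   # estimate below the lower bound
apply_bounds(5,   bounds = c(1, 3))   # estimate above the upper bound
apply_bounds(2,   bounds = c(1, 3))   # estimate inside the range is kept
```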
Value
moimle() returns a nested list, where the
outer elements correspond to molecular markers in the
dataset. The inner elements for each molecular marker
contain the following information:
sample size,
allele prevalence counts,
observed prevalences,
log likelihood at MLE,
maximum-likelihood estimate of MOI parameter,
maximum-likelihood estimates of lineage frequencies.
Warnings
Warnings are issued if data is
pathological at one or multiple markers. If the option
bounds is set but the MLE of the MOI parameter at a
molecular marker is lower than \lambda_min or higher
than \lambda_max, a warning
is generated.
See Also
To import and transform data to standard format,
please see the function moimport().
Examples
#basic data analysis
infile1 <- system.file("extdata", "testDatamerge1.xlsx", package = "MLMOI")
mle1 <- moimle(infile1, nummtd = 1)
Imports molecular data in various formats and transforms them into a standard format.
Description
moimport() imports molecular data from
Excel workbooks. The function handles various types of
molecular data (e.g. STRs, SNPs), codings (e.g. 4-letter
vs. IUPAC format for SNPs), and detects inconsistencies
(e.g. typos, incorrect entries). moimport()
allows users to import data from single or multiple
worksheets.
Usage
moimport(
file,
multsheets = FALSE,
nummtd = 0,
molecular = "str",
coding = "integer",
transposed = FALSE,
keepmtd = FALSE,
export = NULL,
keepwarnings = NULL
)
Arguments
file |
string; specifying the path to the file to be imported. |
multsheets |
logical; indicating whether data is
contained in a single or multiple worksheets. The
default value is |
nummtd |
numeric number or vector; number of metadata
columns (e.g. date, sample location, etc.) in the
worksheet(s) to be imported (default value |
molecular |
string vector or list; specifies the type
of molecular data to be imported. STR, SNP, amino acid
and codon markers are specified with 'STR', 'SNP',
'amino' and 'codon' values, respectively (default value
|
coding |
string vector or list; specifies the coding
of each data variable (marker) depending on their type.
Admissible values for |
transposed |
logical or logical vector; if markers
are entered in rows and samples in columns, set
|
keepmtd |
logical; determines whether metadata (e.g.,
date) should be retained during import (default value
|
export |
string; the path where the imported data is
stored in standardized format. Data is not stored if no
path is specified (default value |
keepwarnings |
string; the path where the warnings
are stored. Warnings are not stored if no path is
specified (default value |
Details
Each worksheet of the data to be imported must have one of the following formats: i) one row per sample and one column per marker. Here cells can have multiple entries, separated by a special character (separator), e.g. a punctuation character. ii) one column per marker and multiple rows per sample (standard format). iii) one row per sample and multiple columns per marker. Importantly, within one worksheet formats ii) and iii) cannot be combined (see section Warnings and Errors). Combinations of other formats are permitted but might result in warnings. Additionally, occurrences of different separators are reported (see section Warnings and Errors).
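The relationship between format i) and the standard format ii) can be illustrated in base R. The sketch below is not the package's import code; it assumes a toy data frame with a single marker column named `m1` and "/" as the only separator.

```r
# Format i): one row per sample, several entries per cell (toy data).
wide <- data.frame(
  sample = c("S1", "S2", "S3"),
  m1     = c("120/124", "128", "120/124/132"),
  stringsAsFactors = FALSE
)

# Split each cell on the separator and repeat the sample ID once per entry,
# giving format ii): one column per marker, multiple rows per sample.
entries <- strsplit(wide$m1, "/", fixed = TRUE)
long <- data.frame(
  sample = rep(wide$sample, lengths(entries)),
  m1     = as.numeric(unlist(entries)),
  stringsAsFactors = FALSE
)
print(long)
```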
Users should check the following before data import:
the dataset is placed in the first worksheet of the workbook;
in case of multiple worksheets, all worksheets contain data (additional worksheets need to be removed);
sample IDs are placed in the first column (first row in case of transposed data; see section Exceptions);
marker labels are placed in the first row (first column in case of transposed data; see section Exceptions);
sample IDs as well as marker labels are unique (duplication of IDs/labels is allowed when a sample/marker contains data in consecutive rows/columns);
entries such as sentences (e.g. comments in the worksheet) or meaningless words (e.g. 'missing' for missing data) are removed from data;
metadata columns (rows in case of transposed data) are placed between sample IDs and molecular-marker columns.
If data is contained in multiple worksheets, the above requirements need to be fulfilled for every worksheet in the Excel workbook. Sample IDs need not occur in every worksheet. The sample ID must not be confused with the patient's ID: the former refers to a particular sample taken from a patient, the latter to a unique patient. Several sample IDs can have the same patient's ID. In case of multiple-worksheet datasets, all marker labels across all worksheets need to be unique.
The option molecular needs to be specified as a
vector for single-worksheet data (multsheets =
FALSE) containing different types of molecular markers.
A list is specified if data is spread across multiple
worksheets with different types of molecular data across the
worksheets. List elements are vectors or single values,
referring to the types of molecular data of the
corresponding worksheets. Users do not need to set a
vector if all markers are of the same molecular type
(single or multiple worksheet dataset).
Setting the option coding as a vector or list is
similar to setting the molecular type by molecular.
Every molecular data type has a pre-specified coding
class as default, which users do not need to specify,
namely 'integer' for STRs, '4let' for SNPs, '3let' for
amino acids and 'triplet' for codons.
Value
Returns a data frame. moimport() imports
heterogeneous data formats and converts them into a
standard format which is free from typos (e.g.
incompatible and unidentified entries) and appropriate for
further analyses. Metadata is retained (if keepmtd
= TRUE) and, in case of data from multiple worksheets,
unified if metadata variables have the same labels
across two or more worksheets. If the argument
export is set, then the result is saved in the
first worksheet of the workbook of the file specified by
export. The imported/exported dataset will be
appropriate for other functions of the package.
Warnings and Errors
Warnings are usually generated when data is corrected; they point to suspicious entries in the original data. Users should read warnings carefully, check the respective entries and apply manual corrections if necessary. In the cases listed below, an error occurs and the function stops.
Errors usually occur if arguments are not set properly. Other cases of errors are: i) if sample IDs in a worksheet are not uniquely defined, i.e., two samples in non-consecutive rows have the same sample ID; ii) if formats 'one column per marker and multiple rows per sample' and 'one row per sample and multiple columns per marker' are mixed.
Warnings are issued in several cases, above all when typos (e.g., punctuation characters) are found. Entries which cannot be identified as belonging to the molecular type/coding class specified by the user are also reported (e.g., '9' is reported when the marker is of type SNP, or 'L' is reported when the coding class of an amino-acid marker is '3let').
Empty rows and columns are deleted and reported. Samples with ambiguous metadata (in a worksheet or across worksheets in case of a multiple-worksheet dataset) or missing metadata are also reported.
The function only prints the first 50 warnings.
If there are more than 50 warnings, the user is
advised to set the argument keepwarnings
in order to save the warnings in an Excel file.
Exceptions
Transposed data: usually data is
entered with samples in rows and markers in columns.
However, some users might enter data the
opposite way, which is the case of transposed data. If
so, the argument transposed = TRUE is set, or a
vector in case of multiple worksheets with at least one
worksheet being transposed.
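What "transposed" means can be shown with a small character matrix. The sketch below is only an illustration in base R with made-up labels; it does not reproduce how moimport() handles transposed worksheets internally.

```r
# Toy transposed worksheet: markers in rows, samples in columns.
transposed <- rbind(
  c("ID", "S1",  "S2"),
  c("m1", "120", "124"),
  c("m2", "A",   "C")
)

# Swapping rows and columns gives the usual layout with samples in rows
# and markers in columns.
standard <- t(transposed)
print(standard)
```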
See Also
For further details, see the following vignettes:
vignette("dataimportcheck-list", package = "MLMOI")
vignette("StandardAmbiguityCodes", package = "MLMOI")
vignette("moimport-arguments", package = "MLMOI")
Examples
#datasets are provided by the package
#importing dataset with metadata variables:
infile <- system.file("extdata", "testDatametadata.xlsx", package = "MLMOI")
moimport(infile, nummtd = 3, keepmtd = TRUE)
##more examples are included in 'examples' vignette:
#vignette("examples", package = "MLMOI")
tempMLMOI: An R package for deriving multiplicity of infection (MOI) from molecular data.
Description
The tempMLMOI package provides three functions:
moimport;
moimle;
moimerge.
Details
The package is intended for scientists who seek to
estimate MOI and frequencies at molecular markers using
the maximum-likelihood method described in Schneider and
Escalante (2018) and Schneider (2018). Users can import
data from ‘.xlsx’ files in various formats and
perform maximum-likelihood estimation on the imported data
with the package's moimle function.
Types of molecular data
Molecular data can be of types:
microsatellite repeats (STRs);
single nucleotide polymorphisms (SNPs);
amino acids;
codons (base triplets).
Import function
The function moimport is
designed to import molecular data. It imports molecular
data in various formats and transforms them into a
standard format.
Merging Datasets
Two datasets in standard format
can be merged with the function moimerge.
Estimation of MOI and frequencies
The function moimle is designed to derive
maximum-likelihood estimates (MLEs) from molecular
data in standard format.
Captures all the warnings
Description
Captures all the warnings
Usage
withWarnings(expr)
Arguments
expr |
expression to be evaluated. |
Value
list with two elements. The first is the value of the evaluated expression. The second contains the warnings generated by the expression.
See Also
For further details see: moimport.
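A common base-R pattern for collecting warnings while still returning a value uses withCallingHandlers() together with the "muffleWarning" restart. The sketch below shows that pattern under a hypothetical name, `with_warnings`; it is an illustration and not necessarily identical to the package's withWarnings().

```r
# Hypothetical helper: evaluate an expression, collect every warning message
# it raises, and return both the value and the collected messages.
with_warnings <- function(expr) {
  collected <- list()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      collected[[length(collected) + 1L]] <<- conditionMessage(w)
      invokeRestart("muffleWarning")   # continue evaluation after recording
    }
  )
  list(value = value, warnings = collected)
}

res <- with_warnings({
  warning("first")
  warning("second")
  42
})
str(res)
```

Because the warnings are muffled after being recorded, evaluation continues to the end of the expression instead of stopping at the first warning.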