Help for package MLMOI

Type:

Package

Title:

Estimating Frequencies, Prevalence and Multiplicity of Infection

Version:

0.1.2

Maintainer:

Meraj Hashemi <meraj.hashemi.esh@gmail.com>

Description:

The implemented methods reach out to scientists that seek to estimate multiplicity of infection (MOI) and lineage (allele) frequencies and prevalences at molecular markers using the maximum-likelihood method described in Schneider (2018) <doi:10.1371/journal.pone.0194148>, and Schneider and Escalante (2014) <doi:10.1371/journal.pone.0097899>. Users can import data from Excel files in various formats, and perform maximum-likelihood estimation on the imported data by the package's moimle() function.

Depends:

R (≥ 4.3.0)

Imports:

openxlsx (≥ 4.2.5.2), Rdpack (≥ 2.6), Rmpfr (≥ 0.9-3)

License:

GPL-3

Encoding:

UTF-8

RoxygenNote:

7.2.3

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

RdMacros:

Rdpack

NeedsCompilation:

Packaged:

2023-12-21 19:29:48 UTC; meraj

Author:

Meraj Hashemi [cre, aut, com], Kristan Schneider [aut, ths]

Repository:

CRAN

Date/Publication:

2023-12-21 22:30:08 UTC

MLMOI: An R Package to preprocess molecular data and derive prevalences, frequencies and multiplicity of infection (MOI)

Description

The MLMOI package provides three functions:

moimport();
moimle();
moimerge().

Details

The package reaches out to scientists that seek to estimate MOI and lineage frequencies at molecular markers using the maximum-likelihood method described in (Schneider 2018), (Schneider and Escalante 2018) and (Schneider and Escalante 2014). Users can import data from Excel files in various formats, and perform maximum-likelihood estimation on the imported data by the package's moimle() function.

Types of molecular data

Molecular data can be of types:

microsatellite repeats (STRs);
single nucleotide polymorphisms (SNPs);
amino acids;
codons (base triplets).

Import function

The function moimport() is designed to import molecular data. It accepts data in various formats and transforms them into a standard format.

Merging Datasets

Two datasets in standard format can be merged with the function moimerge().

Estimation of MOI and frequencies

The function moimle() is designed to derive maximum-likelihood estimates (MLEs) from molecular data in standard format.
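The functions are typically chained: import and standardize first, then estimate. The following is a minimal sketch, assuming the MLMOI package is installed and using the example workbook that ships with it (the same file used in the moimport() examples). The wrapper name run_example is hypothetical and not part of the package.

```r
# Sketch of an import-then-estimate workflow.
# run_example() is a hypothetical wrapper, not an MLMOI function.
run_example <- function() {
  if (!requireNamespace("MLMOI", quietly = TRUE)) {
    message("MLMOI is not installed; skipping.")
    return(invisible(NULL))
  }
  infile <- system.file("extdata", "testDatametadata.xlsx", package = "MLMOI")
  # Import and standardize; metadata columns are dropped (keepmtd = FALSE).
  dat <- MLMOI::moimport(infile, nummtd = 3, keepmtd = FALSE)
  # Estimate MOI and lineage frequencies from the standardized data frame.
  MLMOI::moimle(dat, nummtd = 0)
}

res <- run_example()
```

Passing the data frame returned by moimport() directly to moimle() avoids a round trip through an exported Excel file.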

References

Schneider KA (2018). “Large and finite sample properties of a maximum-likelihood estimator for multiplicity of infection.” PLOS ONE, 13(4), 1-21. doi:10.1371/journal.pone.0194148.

Schneider KA, Escalante AA (2018). “Correction: A Likelihood Approach to Estimate the Number of Co-Infections.” PLOS ONE, 13(2), 1-3. doi:10.1371/journal.pone.0192877.

Schneider KA, Escalante AA (2014). “A Likelihood Approach to Estimate the Number of Co-Infections.” PLOS ONE, 9(7), e97899. doi:10.1371/journal.pone.0097899.

Removes punctuation characters and typos from data entries

Description

This function is designed to find the lineages (STRs) present on a microsatellite marker in a single cell.

Usage

corrector_numeric(y, c_l, r_w, conm, cons, cha_num, rw_col, multsh)

Arguments

y

string; a cell entry.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

conm

numeric; the multiple column per marker identifier. For the data of format multiple columns conm > 1.

cons

numeric; the multiple row per sample identifier. For the data of format multiple rows cons > 1.

cha_num

string vector; the vector of punctuation characters plus letters of the alphabet. See moi_prerequisite.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of the following elements: 1) a vector of lineages found on a microsatellite marker in a single cell. Each element corresponds to one and only one lineage and is free from any punctuation character. 2) an identifier whose value is 1 if a warning takes place.

Removes punctuation characters and typos from data entries

Description

This function is designed to find the lineages present on a SNP, amino-acid and codon marker in a single cell.

Usage

corrector_string(y, c_l, r_w, conm, cons, cha_string, rw_col, coding, multsh)

Arguments

y

string; entry of a cell.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

conm

numeric; the multiple column per marker identifier. For the data of format multiple columns conm > 1.

cons

numeric; the multiple row per sample identifier. For the data of format multiple rows cons > 1.

cha_string

string vector; the vector of punctuation characters plus numerics from 1 to 9. See moi_prerequisite.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of the following elements: 1) a vector of lineages found on a marker (SNP, amino-acid or codon) in a single cell. Each element corresponds to one and only one lineage and is free from typos. 2) an identifier whose value is 1 if a warning takes place.

Translates the standard ambiguity codes for nucleotides (amino acid decoder)

Description

Translates the standard ambiguity codes for nucleotides in amino acid forms from a pre-specified coding class to 3-letter designation of amino acids.

Usage

decoder_aminoacid(
  y,
  c_l,
  r_w,
  aa_1,
  aa_2,
  let_3,
  amino_acid,
  aa_symbol,
  coding,
  rw_col,
  multsh
)

Arguments

y

numeric vector; entries in a cell corresponding to a specific sample and a specific marker.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

aa_1

string vector; vector of different amino acids.

aa_2

string vector; vector of different codons.

let_3

string vector; vector of amino acids in 3 letter designation.

amino_acid

string vector; vector of amino acids in full name.

aa_symbol

string vector; vector of amino acids in one letter designation.

coding

string; coding class of the molecular marker.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of two elements: 1) a vector of 3-letter designation of amino acids on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
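To illustrate the kind of translation described here, the following base-R sketch maps one-letter amino-acid symbols to their 3-letter designations and warns about unidentified entries. The names aa_3let and to_3let are hypothetical; this is not the package's internal decoder, only a sketch using the standard amino-acid abbreviations.

```r
# Hypothetical lookup table: one-letter symbol -> 3-letter designation.
aa_3let <- c(A = "Ala", R = "Arg", N = "Asn", D = "Asp", C = "Cys",
             Q = "Gln", E = "Glu", G = "Gly", H = "His", I = "Ile",
             L = "Leu", K = "Lys", M = "Met", F = "Phe", P = "Pro",
             S = "Ser", T = "Thr", W = "Trp", Y = "Tyr", V = "Val")

# Translate a vector of one-letter symbols; unknown symbols become NA
# and trigger a warning.
to_3let <- function(x) {
  out <- unname(aa_3let[toupper(x)])
  if (anyNA(out)) {
    warning("unidentified amino-acid symbol(s): ",
            paste(x[is.na(out)], collapse = ", "))
  }
  out
}

to_3let(c("L", "k", "V"))
```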

Translates the standard ambiguity codes for nucleotides (codon decoder)

Description

Translates the standard ambiguity codes for nucleotides in codon form from a pre-specified coding class to triplet designation of codons.

Usage

decoder_codon(
  y,
  c_l,
  r_w,
  aa_1,
  aa_2,
  compact,
  codon_s,
  coding,
  rw_col,
  multsh
)

Arguments

y

numeric vector; entries in a cell corresponding to a specific sample and a specific marker.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

aa_1

string vector; vector of different amino acids.

aa_2

string vector; vector of different codons.

compact

string vector; vector of different codons in compact form.

codon_s

string vector; vector of different codons.

coding

string; coding class of the molecular marker.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of two elements: 1) a vector of codons in triplet designation on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.

Converts ambiguity codes to represented bases

Description

Translates the nucleotide ambiguity codes as defined in DNA Sequence Assembler from a pre-specified coding class to 4-letter codes.

Usage

decoder_snp(
  y,
  c_l,
  r_w,
  ambeguity_code,
  represented_bases,
  coding,
  rw_col,
  multsh
)

Arguments

y

numeric vector; entries in a cell corresponding to a specific sample and a specific marker.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

ambeguity_code

string vector; ambiguity codes for SNP data.

represented_bases

string vector; represented bases for those ambiguity codes.

coding

string; coding class of the molecular marker.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of two elements: 1) a vector of represented bases on a marker corresponding to a sample in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.
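The translation from ambiguity codes to represented bases can be sketched in base R as follows. The table uses the standard IUPAC nucleotide ambiguity codes; the names iupac_bases and decode_iupac are hypothetical and do not correspond to the package's internal objects.

```r
# Standard IUPAC nucleotide ambiguity codes and the bases they represent.
iupac_bases <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Return the distinct bases represented by the codes found in one cell.
decode_iupac <- function(codes) {
  codes <- toupper(codes)
  unknown <- setdiff(codes, names(iupac_bases))
  if (length(unknown) > 0) {
    warning("unidentified code(s): ", paste(unknown, collapse = ", "))
  }
  known <- intersect(codes, names(iupac_bases))
  sort(unique(as.character(unlist(iupac_bases[known], use.names = FALSE))))
}

decode_iupac(c("R", "t"))
```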

Transforms entries to the desired coding class

Description

Transforms the data entries in a cell to a pre-specified coding class.

Usage

decoder_str(y, c_l, r_w, coding, rw_col, multsh)

Arguments

y

numeric vector; entries in a cell corresponding to a specific sample and a specific marker.

c_l

string; marker label.

r_w

numeric; sample ID's row number in the excel file.

coding

string; coding class of the molecular marker.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of two elements: 1) a vector of STR entries in pre-specified coding class. 2) an identifier whose value is 1 if a warning takes place.

Derives MLE

Description

Derives MLE

Usage

mle(nnk, nn)

Arguments

nnk

numeric vector; vector of lineage prevalence counts.

nn

numeric; sample size

Value

a list with the following elements: 1) log likelihood at MLE, MLE of lambda and MLE of psi, 2) MLE of lineage frequencies.
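For orientation, the estimation step can be sketched as below. The sketch assumes the conditional (zero-truncated) Poisson model of the cited papers, with the log-likelihood taken to be -N log(e^lambda - 1) + sum_k N_k log(e^(lambda p_k) - 1); this formula is recalled from those papers and should be checked against them. The function mle_sketch is hypothetical and uses uniroot() rather than whatever iteration the package uses internally. It handles only regular data (every N_k < N and sum of N_k > N).

```r
# Sketch of ML estimation under an ASSUMED conditional-Poisson model.
# nnk: lineage prevalence counts N_k; nn: sample size N.
mle_sketch <- function(nnk, nn) {
  # Regular data only: no lineage in every sample, some super-infection.
  stopifnot(all(nnk < nn), sum(nnk) > nn)
  prev <- nnk / nn
  # The constraint sum(p_k) = 1 rewritten as a root-finding problem in lambda.
  f <- function(lambda) lambda + sum(log(1 - prev * (1 - exp(-lambda))))
  lambda <- uniroot(f, lower = 1e-6, upper = 50, tol = 1e-10)$root
  # Frequencies that reproduce the observed prevalences at this lambda.
  p <- -log(1 - prev * (1 - exp(-lambda))) / lambda
  loglik <- -nn * log(exp(lambda) - 1) + sum(nnk * log(exp(lambda * p) - 1))
  # Mean of the zero-truncated Poisson distribution (assumed to be "psi").
  psi <- lambda / (1 - exp(-lambda))
  list(loglik = loglik, lambda = lambda, psi = psi, p = p)
}

est <- mle_sketch(nnk = c(60, 50, 20), nn = 100)
```

The search interval for lambda (1e-6 to 50) is an arbitrary choice for this illustration.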

Derives profile-likelihood MLE of lineage frequencies

Description

Derives profile-likelihood MLE of lineage frequencies

Usage

mle_fixed(lambda, nnk)

Arguments

lambda

numeric; MOI parameter

nnk

numeric vector; vector of lineage prevalence counts.

Value

vector of lineage frequency estimates
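A profile-likelihood estimate at a fixed MOI parameter can be sketched as follows, under the same assumed log-likelihood sum_k N_k log(e^(lambda p_k) - 1) (up to terms constant in p) with the constraint sum(p_k) = 1. The Lagrange condition gives p_k = -log(1 - c N_k) / lambda for a constant c fixed by the constraint. The function mle_fixed_sketch is hypothetical and is not the package's implementation.

```r
# Sketch of profile-likelihood frequency estimates at a fixed lambda,
# under an ASSUMED conditional-Poisson model.
mle_fixed_sketch <- function(lambda, nnk) {
  # p_k(cc) = -log(1 - cc * N_k) / lambda; choose cc so that sum(p_k) = 1.
  g <- function(cc) sum(-log(1 - cc * nnk)) / lambda - 1
  upper <- (1 - 1e-12) / max(nnk)  # keeps 1 - cc * N_k strictly positive
  cc_hat <- uniroot(g, lower = 0, upper = upper, tol = 1e-14)$root
  -log(1 - cc_hat * nnk) / lambda
}

p_fixed <- mle_fixed_sketch(lambda = 1.5, nnk = c(60, 50, 20))
```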

Administrator function

Description

Administrator function

Usage

moi_administrator(
  set_d,
  s_total,
  m_total,
  nummtd,
  cha,
  rw_col,
  nwsh,
  transposed,
  multsh
)

Arguments

set_d

data frame; imported dataset.

s_total

string vector; vector of sample IDs.

m_total

string vector; vector of marker labels.

nummtd

numeric; number of metadata columns plus 2.

cha

string vector; vector of punctuation characters. See moi_prerequisite.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

nwsh

numeric; worksheet number in multiple worksheet dataset.

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of the following elements: 1) rows in which new samples start, 2) all sample IDs in the worksheet, 3) number of samples in the worksheet, 4) multiple row per sample identifier, 5) another multiple row per sample identifier, 6) columns in which new markers start, 7) all marker labels in the worksheet, 8) number of markers in the worksheet, 9) multiple column per marker identifier, 10) another multiple column per marker identifier, 11) an identifier whose value is 1 if a warning takes place.

Finds forbidden sample ID repetitions

Description

Sample IDs need to be uniquely assigned to samples. This function checks if a sample ID is assigned to two or more different samples. Similarly, the marker labels need to be uniquely defined. This function is also used to check forbidden marker label repetitions.

Usage

moi_duplicatefinder(total, sam_mark, nummtd, rw_col)

Arguments

total

string vector; vector of sample IDs.

sam_mark

string; a string which is either "Sample ID" or "Marker".

nummtd

numeric; number of metadata columns plus 2.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

Value

a warning which informs the user of forbidden repetitions of sample IDs or marker labels.
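The distinction between allowed (consecutive) and forbidden (non-consecutive) repetitions can be sketched with run-length encoding. The helper find_forbidden_repeats is hypothetical and only illustrates the idea; it is not the package's implementation.

```r
# Return labels that reappear after a different label has intervened.
find_forbidden_repeats <- function(ids) {
  runs <- rle(ids)  # collapse consecutive repetitions into single runs
  unique(runs$values[duplicated(runs$values)])
}

# "S1" repeats consecutively (allowed); "S2" reappears after "S3" (forbidden).
find_forbidden_repeats(c("S1", "S1", "S2", "S3", "S2"))
```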

Reports and deletes empty rows/columns

Description

Reports and deletes empty rows/columns.

Usage

moi_empty(
  set_d,
  setnoempty,
  nummtd,
  rw_col,
  multsheets,
  alllabels,
  molecular,
  molid,
  n
)

Arguments

set_d

data frame; imported dataset.

setnoempty

data frame; imported dataset.

nummtd

numeric; number of metadata columns plus 2.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsheets

logical; indicating whether data is contained in a single or multiple worksheets. The default value is multsheets = FALSE, corresponding to data contained in a single worksheet.

alllabels

all the marker labels.

molecular

molecular argument.

molid

identifier. It is greater than zero when molecular argument is just a single value.

Value

A list of 4 elements: 1) dataset without empty rows/columns; 2) dataset with empty rows/columns; 3) corrected labels; 4) corrected number of metadata columns.

Exports the dataset in standard format to a new excel file.

Description

This function exports the modified dataset in standard format to a new excel file.

Usage

moi_export(export, final)

Arguments

export

string; the path where the imported data is stored in standardized format.

final

data frame; modified dataset in standard format.

Value

An excel file which contains dataset in standard format.

Finds sample IDs contained in a dataset

Description

Each dataset consists of a number of samples. Samples are specified by their sample IDs, which are placed in the first column of the Excel worksheet. The function moi_labels finds the sample IDs and the rows in which they start.

Usage

moi_labels(s_total)

Arguments

s_total

sample IDs.

Value

a list whose first element is a numeric vector specifying the rows in which a new sample starts. The second element contains the sample IDs.

Extracts lineages of samples at a specific marker

Description

For a specific marker, the function goes from one sample to another and finds lineages with the help of the following functions: corrector_numeric along with decoder_str, corrector_string along with decoder_aminoacid and corrector_string along with decoder_snp. Each of these functions is suitable for a particular type of molecular data.

Usage

moi_marker(
  col_j,
  c_l,
  sam,
  samorder,
  conm,
  cons,
  molecular,
  coding,
  cha_num,
  cha_string,
  ambeguity_code,
  represented_bases,
  aa_1,
  aa_2,
  let_3,
  amino_acid,
  aa_symbol,
  compact,
  codon_s,
  rw_col,
  multsh
)

Arguments

col_j

vector; column vector of a specific marker.

c_l

string; marker label.

sam

numeric vector; vector whose elements specify where a new sample starts.

samorder

a vector whose elements specify where a new sample starts.

conm

numeric; the multiple column identifier. For the data of format multiple columns conm > 1.

cons

numeric; the multiple row identifier. For the data of format multiple rows cons > 1.

molecular

string; type of molecular data.

coding

string; coding class of the molecular marker.

cha_num

string vector; vector of symbols (used for microsatellite data).

cha_string

string vector; vector of symbols (used for snp and amino acid).

ambeguity_code

string vector; ambiguity codes for SNP data.

represented_bases

string vector; represented bases for those ambiguity codes.

aa_1

string vector; vector of different amino acids.

aa_2

string vector; vector of different codons.

let_3

string vector; vector of amino acids in 3-letter designation.

amino_acid

string vector; vector of amino acids in full name.

aa_symbol

string vector; vector of amino acids in one letter designation.

compact

string vector; vector of different codons in compact form.

codon_s

string vector; vector of different codons.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list with the following elements: 1) a list with elements containing lineages for a specific sample on a specific marker. The order in which samples are entered in dataset is preserved in the list. The lineages are free from typos and are transformed to pre-specified coding class, 2) an identifier whose value is 1 if a warning takes place.

Merges metadata

Description

Merges metadata

Usage

moi_mergemetadata(tempmtd, mtdall, samall, multsh)

Arguments

tempmtd

matrix; matrix of temporary metadata.

mtdall

string vector; vector of all metadata labels.

samall

string vector; vector of all sample IDs in the worksheet.

multsh

string; reports warnings for multiple worksheet datasets.

Value

list of following elements: 1) unique metadata labels, 2) unique metadata for different samples.

Checks the metadata entries

Description

Checks if a sample has unique metadata entries.

Usage

moi_metadata(
  metadata,
  mtdlabels,
  nummtd,
  samorder,
  samall,
  lsam,
  nomtdneeded,
  nomerge,
  multsh
)

Arguments

metadata

matrix; matrix of metadata columns.

mtdlabels

string vector; vector of metadata labels.

nummtd

numeric; number of metadata columns plus 2.

samorder

numeric vector; rows where samples start.

samall

string vector; vector of sample IDs in the worksheet.

lsam

numeric; number of samples in the worksheet.

nomtdneeded

numeric vector; samples which need no metadata.

multsh

string; reports warnings for multiple worksheet datasets.

Value

a list of following elements: 1) unique metadata for different samples, 2) an identifier whose value is 1 if a warning takes place.

Derives lineage prevalence counts

Description

Derives lineage prevalence counts

Usage

moi_nk(datmarker, samorder)

Arguments

datmarker

vector; a column of data corresponding to a marker from the imported dataset.

samorder

numeric vector; row numbers in excel file where the new samples start.

Value

a list with two elements: 1) sample size of the marker, 2) vector of lineage prevalence counts at the marker.
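As an illustration, lineage prevalence counts (the number of samples in which each lineage is observed) can be derived from a single marker column as sketched below. The toy data frame is hypothetical and repeats the sample ID on every row, which is an assumption made for this sketch rather than a statement about the package's standard format.

```r
# Toy data: one marker, multiple rows per sample (sample ID repeated).
toy <- data.frame(
  sample = c("S1", "S1", "S2", "S3", "S3", "S3"),
  marker = c(12, 14, 12, 12, 14, 16)
)

# Keep each (sample, lineage) pair once and ignore missing entries.
obs <- unique(toy[!is.na(toy$marker), ])

nn  <- length(unique(obs$sample))  # sample size at this marker
nnk <- table(obs$marker)           # prevalence count per lineage
```

Here the counts sum to more than the sample size, which indicates that some samples carry more than one lineage.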

Contains different character vectors needed for importing molecular dataset

Description

Contains different character vectors needed for importing molecular dataset

Usage

moi_prerequisite()

Finds the most frequent separator

Description

This function is activated when the data is of 'One row per sample, one column per marker' format. In such a dataset, the user enters multiple pieces of information corresponding to a sample on a marker in one cell and separates the items with a symbol (punctuation character). Users are expected to use the separator symbol consistently. Otherwise, this function reports the inconsistencies with a warning.

Usage

moi_separator(set_d, nummtd, cha, rw_col, nwsh, multsh)

Arguments

set_d

data frame; imported dataset.

nummtd

numeric; number of metadata columns plus 2.

cha

string vector; vector of punctuation characters. See moi_prerequisite.

rw_col

string vector; variable used to switch between row and column in case of transposed data. Namely, c("rows ", "row ", "column ", "columns ").

nwsh

numeric; worksheet number in multiple worksheet dataset.

multsh

string; reports warnings for multiple worksheet datasets.

Value

a warning is generated reporting inconsistencies in usage of separator.
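The idea of identifying the most frequent separator and flagging the others can be sketched in base R as follows. The cell contents and variable names are hypothetical, and the sketch treats every punctuation character as a candidate separator, which is a simplification (for example, a decimal point would also be counted).

```r
# Hypothetical cell entries of one marker column, one row per sample.
cells <- c("12/14", "12", "10/12/16", "14;16", NA)
cells <- cells[!is.na(cells)]

# Collect every punctuation character and tabulate its frequency.
seps   <- unlist(regmatches(cells, gregexpr("[[:punct:]]", cells)))
counts <- table(seps)

most_frequent <- names(counts)[which.max(counts)]
inconsistent  <- setdiff(names(counts), most_frequent)

if (length(inconsistent) > 0) {
  message("separator '", most_frequent, "' is used most often; also found: ",
          paste(inconsistent, collapse = " "))
}
```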

Writes warnings into a file

Description

This function writes the warnings generated during data import to a new Excel file.

Usage

moi_warning(
  keepwarnings,
  general_warnings,
  metadata_warnings,
  marker_warnings,
  markerlabels
)

Arguments

keepwarnings

string; the path where the warnings are stored.

general_warnings

list; general warnings.

metadata_warnings

list; metadata warnings.

marker_warnings

list; marker warnings.

markerlabels

string vector; marker labels.

Value

An Excel file which contains warnings. Each worksheet corresponds to a marker.

Merges two molecular datasets.

Description

The function is designed to merge two datasets from separate Excel files. The data in each Excel file is placed in the first worksheet.

Usage

moimerge(
  file1,
  file2,
  nummtd1,
  nummtd2,
  keepmtd = FALSE,
  export = NULL,
  keepwarnings = NULL
)

Arguments

file1

string; specifying the path of the first dataset.

file2

string; specifying the path of the second dataset.

nummtd1

numeric; number of metadata columns (see moimport()) in the first file (default as 0).

nummtd2

numeric; number of metadata columns (see moimport()) in the second file (default as 0).

keepmtd

logical; determining whether metadata (e.g., date) should be retained (default as FALSE).

export

string; the path where the data is stored.

keepwarnings

string; the path where the warnings are stored.

Details

The two datasets should be already in standard format (see moimport()). The datasets are placed in the first worksheet of the two different Excel files. Notice that marker labels (=column labels) need to be unique.

Value

The output is a dataset in standard format which consists of the combined input datasets.

Warnings

Warnings are generated if potential inconsistencies are detected, e.g., if the same sample occurs in both datasets and has contradicting metadata entries. The function only prints the first 50 warnings. If there are more than 50 warnings, the user is recommended to set the argument keepwarnings in order to save the warnings in an Excel file.

Examples

#The datasets 'testDatamerge1.xlsx' and 'testDatamerge2.xlsx' are already in standard format:

infile1 <- system.file("extdata", "testDatamerge1.xlsx", package = "MLMOI")
infile2 <- system.file("extdata", "testDatamerge2.xlsx", package = "MLMOI")
outfile <- moimerge(infile1, infile2, nummtd1 = 1, nummtd2 = 2, keepmtd = TRUE)

Estimates prevalences, frequency spectra and MOI parameter.

Description

moimle() derives the maximum-likelihood estimate (MLE) of the MOI parameter (Poisson parameter) and the lineage (allele) frequencies for each molecular marker in a dataset. Additionally, the lineage prevalence counts are derived.

Usage

moimle(file, nummtd = 0, bounds = c(NA, NA))

Arguments

file

string or data.frame; if file is a path it must specify the path to the file to be imported. The dataset can also be a data.frame object in R. The dataset must be in standard format (see moimport()). The first column must contain sample IDs. Adjacent columns can contain metadata, followed by columns corresponding to molecular markers.

nummtd

numeric; number of metadata columns (e.g. date, sample location, etc.) in the dataset (default value is nummtd = 0).

bounds

numeric vector; a vector of size 2, specifying a lower bound (1st element) and an upper bound (2nd element) for the MOI parameter. The function derives lineage frequency ML estimates by profiling the likelihood function on one of the bounds. For a marker without sign of super-infections, the lower bound is employed. If one allele is contained in every sample, the upper bound is employed.

Details

moimle() requires a dataset in standard format which is free of typos (e.g. incompatible and unidentified entries). Therefore, users need to standardize the dataset by employing the moimport() function.

If one or more molecular markers contain pathological data, the ML estimate for the Poisson parameter is either 0 or does not exist. Both estimates are meaningless; however, in the former case frequency estimates exist, while in the latter they do not. Setting the option bounds to a range for the MOI parameter \lambda, i.e., bounds = c(<\lambda_min>, <\lambda_max>), bypasses this problem: the ML estimates are calculated by profiling at \lambda_min or \lambda_max. If no super-infections are observed at a marker, moimle() uses \lambda_min as the MOI parameter estimate; if one lineage is present in all samples, it uses \lambda_max. For regular data, the profile-likelihood estimate using \lambda_min or \lambda_max is returned if the ML estimate falls below \lambda_min or above \lambda_max, respectively.

Value

moimle() returns a nested list, where the outer elements correspond to molecular markers in the dataset. The inner elements for each molecular marker contain the following information:

sample size,
allele prevalence counts,
observed prevalences,
log likelihood at MLE,
maximum-likelihood estimate of MOI parameter,
maximum-likelihood estimates of lineage frequencies.

Warnings

Warnings are issued, if data is pathological at one or multiple markers. If the option bounds is set, but MLE of MOI parameter at a molecular marker takes a lower or higher value than \lambda_min or \lambda_max respectively, a warning is generated.

Examples


#basic data analysis
infile1 <- system.file("extdata", "testDatamerge1.xlsx", package = "MLMOI")
mle1 <- moimle(infile1, nummtd = 1)
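The bounds argument can be exercised on the same example file. The following sketch assumes the MLMOI package is installed; the bound values 1.5 and 2 are arbitrary illustrations, not recommendations.

```r
# Estimation with user-specified bounds on the MOI parameter.
# The bounds c(1.5, 2) are hypothetical values chosen for illustration.
if (requireNamespace("MLMOI", quietly = TRUE)) {
  infile1 <- system.file("extdata", "testDatamerge1.xlsx", package = "MLMOI")
  mle_bounded <- MLMOI::moimle(infile1, nummtd = 1, bounds = c(1.5, 2))
  # Inspect the estimates for the first marker.
  str(mle_bounded[[1]])
}
```

According to the Warnings section above, a warning is to be expected for any marker whose unconstrained estimate falls outside the chosen range.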

Imports molecular data in various formats and transforms them into a standard format.

Description

moimport() imports molecular data from Excel workbooks. The function handles various types of molecular data (e.g. STRs, SNPs), codings (e.g. 4-letter vs. IUPAC format for SNPs), and detects inconsistencies (e.g. typos, incorrect entries). moimport() allows users to import data from single or multiple worksheets.

Usage

moimport(
  file,
  multsheets = FALSE,
  nummtd = 0,
  molecular = "str",
  coding = "integer",
  transposed = FALSE,
  keepmtd = FALSE,
  export = NULL,
  keepwarnings = NULL
)

Arguments

file

string; specifying the path to the file to be imported.

multsheets

logical; indicating whether data is contained in a single or multiple worksheets. The default value is multsheets = FALSE, corresponding to data contained in a single worksheet.

nummtd

numeric number or vector; number of metadata columns (e.g. date, sample location, etc.) in the worksheet(s) to be imported (default value nummtd = 0). In case of multiple worksheet dataset, if all worksheets have the same number of metadata columns an integer value is sufficient. If the numbers differ, they have to be specified by an integer vector.

molecular

string vector or list; specifies the type of molecular data to be imported. STR, SNP, amino acid and codon markers are specified with 'STR', 'SNP', 'amino' and 'codon' values, respectively (default value molecular = 'str'). For importing single worksheets, molecular is a single string or string vector. When importing multiple worksheets, molecular is a string in case the data contains only one type of molecular data. Else it is a list, with the k-th element being a string value or a vector describing the data types of the k-th worksheet.

coding

string vector or list; specifies the coding of each data variable (marker) depending on its type. Admissible values for coding depend on the molecular data type: 'integer', 'nearest', 'ceil' and 'floor' for STRs; '4let' and 'iupac' for SNPs; '3let', '1let' and 'full' for amino acids; and 'triplet' and 'compact' for codons.

transposed

logical or logical vector; if markers are entered in rows and samples in columns, set transposed = TRUE (default value transposed = FALSE). When importing multiple worksheets, transposed can be logical vector specifying for each worksheet whether it is in transposed format.

keepmtd

logical; determines whether metadata (e.g., date) should be retained during import (default value keepmtd = FALSE).

export

string; the path where the imported data is stored in standardized format. Data is not stored if no path is specified (default value export = NULL).

keepwarnings

string; the path where the warnings are stored. Warnings are not stored if no path is specified (default value keepwarnings = NULL).

Details

Each worksheet of the data to be imported must have one of the following formats: i) one row per sample and one column per marker. Here cells can have multiple entries, separated by a special character (separator), e.g. a punctuation character. ii) one column per marker and multiple rows per sample (standard format). iii) one row per sample and multiple columns per marker. Importantly, within one worksheet formats ii) and iii) cannot be combined (see section Warnings and Errors). Combinations of other formats are permitted but might result in warnings. Additionally, the occurrence of different separators is reported (see section Warnings and Errors).
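The relation between format i) and the standard format ii) can be illustrated with a small base-R sketch. The toy data, the separator "/" and the column names are hypothetical. Whether the standard format repeats the sample ID on continuation rows is not stated here; the sketch repeats it for clarity.

```r
# Format i): one row per sample, one column per marker,
# multiple entries in a cell separated by "/".
wide <- data.frame(
  sample = c("S1", "S2"),
  m1 = c("12/14", "12"),
  stringsAsFactors = FALSE
)

# Split each cell into its entries.
entries <- strsplit(wide$m1, "/", fixed = TRUE)

# Format ii): one column per marker, multiple rows per sample.
long <- data.frame(
  sample = rep(wide$sample, lengths(entries)),
  m1 = as.numeric(unlist(entries))
)
long
```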

Users should check the following before data import:

the dataset is placed in the first worksheet of the workbook;
in case of multiple worksheets, all worksheets contain data (additional worksheets need to be removed);
sample IDs are placed in the first column (first row in case of transposed data; see section Exceptions);
marker labels are placed in the first row (first column in case of transposed data; see section Exceptions);
sample IDs as well as marker labels are unique (duplication of IDs/labels is allowed when a sample/marker contains data in consecutive rows/columns);
entries such as sentences (e.g. comments in the worksheet) or meaningless words (e.g. 'missing' for missing data) are removed from data;
metadata columns (rows in case of transposed data) are placed between sample IDs and molecular-marker columns.

If data is contained in multiple worksheets, above requirements need to be fulfilled for every worksheet in the Excel workbook. Not all sample IDs must occur in every worksheet. The sample ID must not be confused with the patient's ID, the former refers to a particular sample taken from a patient, the latter to a unique patient. Several sample IDs can have the same patient's ID. In case of multiple-worksheet datasets, all marker labels across all worksheets need to be unique.

The option molecular needs to be specified as a vector for single-worksheet data (multsheets = FALSE) containing different types of molecular markers. A list is specified if data is spread across multiple worksheets with different types of molecular data across the worksheets. List elements are vectors or single values, referring to the types of molecular data of the corresponding worksheets. Users do not need to set a vector if all markers are of the same molecular type (single or multiple worksheet dataset).

Setting the option coding as vector or list is similar to setting molecular type by molecular. Every molecular data type has a pre-specified coding class as default which users do not need to specify. Namely, 'integer' for STRs, '4let' for SNPs, '3let' for amino acids and 'triplet' for codons.

Value

Returns a data frame. moimport() imports heterogeneous data formats and converts them into a standard format that is free from typos (e.g. incompatible and unidentified entries) and appropriate for further analyses. Metadata is retained (if keepmtd = TRUE) and, in case of data from multiple worksheets, unified if metadata variables have the same labels across two or more worksheets. If the argument export is set, then the result is saved in the first worksheet of the workbook of the file specified by export. The imported/exported dataset will be appropriate for other functions of the package.

Warnings and Errors

Warnings are usually generated when data is corrected, pointing to suspicious entries in the original data. Users should read warnings carefully, check the respective entries and apply manual corrections if necessary. In case of issues an error occurs and the function is stopped.

Usually, if arguments are not set properly, errors occur. Other cases of errors are: i) if sample IDs in a worksheet are not uniquely defined, i.e., two samples in non-consecutive rows have the same sample ID; ii) if formats 'one column per marker and multiple rows per sample' and 'one row per sample and multiple columns per marker' are mixed.

Warnings are issued in several cases, above all when typos (e.g., punctuation characters) are found. Entries that cannot be identified as belonging to the molecular type/coding class specified by the user are also reported (e.g., '9' is reported when the marker is of type SNP, or 'L' is reported when the coding class of an amino-acid marker is '3let').
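The two kinds of checks described above can be sketched as follows for SNP entries in the '4let' coding class. This is a simplified illustration under stated assumptions, not the package's routine: the helper names clean_entry and is_4let are hypothetical, each entry is assumed to hold a single allele, and matching is assumed to be case-insensitive.

```r
# Strip punctuation characters that are likely typos.
clean_entry <- function(x) gsub("[[:punct:]]", "", x)

# TRUE for entries that are one of the four nucleotide letters.
is_4let <- function(x) toupper(x) %in% c("A", "C", "G", "T")

entries <- c("A", "G.", "9", "t")
cleaned <- clean_entry(entries)

# Entries that still cannot be identified after cleaning would be reported.
unidentified <- cleaned[!is_4let(cleaned)]
print(unidentified)
```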

Empty rows and columns are deleted and reported. Samples with ambiguous metadata (within a worksheet or, in case of a multiple-worksheet dataset, across worksheets) or with missing metadata are also reported.
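Removing and reporting empty rows and columns can be sketched in base R as below. The function drop_empty and its return structure are hypothetical and only illustrate the idea; a cell is assumed to be empty if it is NA or an empty string.

```r
# Drop rows and columns in which every cell is NA or "" and report their positions.
drop_empty <- function(df) {
  blank <- is.na(df) | df == ""
  empty_rows <- which(rowSums(!blank) == 0)
  empty_cols <- which(colSums(!blank) == 0)
  list(
    data = df[setdiff(seq_len(nrow(df)), empty_rows),
              setdiff(seq_len(ncol(df)), empty_cols), drop = FALSE],
    empty_rows = empty_rows,
    empty_cols = empty_cols
  )
}

raw <- data.frame(
  id = c("S1", NA, "S2"),
  m1 = c("12", NA, "14"),
  m2 = c(NA, NA, ""),
  stringsAsFactors = FALSE
)
res <- drop_empty(raw)
print(res$data)
```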

The function prints only the first 50 warnings. If there are more than 50 warnings, the user is advised to set the argument keepwarnings in order to save the warnings in an Excel file.

Exceptions

Transposed data: data is usually entered with samples in rows and markers in columns. Some users, however, enter data the opposite way, with markers in rows and samples in columns; this is the case of transposed data. In this case the argument transposed = TRUE is set, or a vector is set in case of multiple worksheets with at least one worksheet being transposed.
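What "transposed" means can be illustrated with a small base-R sketch. It only demonstrates the layout and is not how moimport() handles such data internally; the object names transposed_sheet, flipped and standard, and the toy values, are made up for the example.

```r
# A worksheet entered the "opposite way": markers in rows, samples in columns.
transposed_sheet <- matrix(
  c("SampleID", "S1", "S2",
    "marker1",  "12", "14",
    "marker2",  "7",  "9"),
  nrow = 3, byrow = TRUE
)

# Flip it so that each sample occupies a row and each marker a column.
flipped <- t(transposed_sheet)
standard <- as.data.frame(flipped[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(standard) <- flipped[1, ]
print(standard)
```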

Examples

#datasets are provided by the package

#importing dataset with metadata variables:
infile <- system.file("extdata", "testDatametadata.xlsx", package = "MLMOI")
moimport(infile, nummtd = 3, keepmtd = TRUE)


##more examples are included in 'examples' vignette:

#vignette("examples", package = "MLMOI")

tempMLMOI: An R package for deriving multiplicity of infection (MOI) from molecular data.

Description

The tempMLMOI package provides three functions:

moimport;
moimle;
moimerge.

Details

The package reaches out to scientists that seek to estimate MOI and frequencies at molecular markers using the maximum-likelihood method described in Schneider and Escalante (2018) and Schneider (2018). Users can import data from '.xlsx' files in various formats, and perform maximum-likelihood estimation on the imported data by the package's moimle function.

Types of molecular data

Molecular data can be of types:

microsatellite repeats (STRs);
single nucleotide polymorphisms (SNPs);
amino acids;
codons (base triplets).

Import function

The function moimport is designed to import molecular data. It imports molecular data in various formats and transforms them into a standard format.

Merging Datasets

Two datasets in standard format can be merged with the function moimerge.

Estimation of MOI and frequencies

The function moimle is designed to derive maximum-likelihood estimates (MLEs) from molecular data in standard format.

Captures all the warnings

Description

Captures all the warnings

Usage

withWarnings(expr)

Arguments

expr

expression to be evaluated.

Value

A list with two elements. The first is the value of the evaluated expression. The second contains the warnings generated by the expression.
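A helper of this kind is commonly written with withCallingHandlers(). The following is a minimal sketch of that general idiom and not necessarily the package's implementation; the name capture_warnings and the element names value and warnings are assumptions.

```r
# Evaluate an expression, collect every warning it raises, and return both.
capture_warnings <- function(expr) {
  warns <- list()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warns[[length(warns) + 1L]] <<- w   # store the warning condition
      invokeRestart("muffleWarning")      # continue evaluation without printing it
    }
  )
  list(value = value, warnings = warns)
}

res <- capture_warnings({
  warning("first issue")
  warning("second issue")
  42
})
print(vapply(res$warnings, conditionMessage, character(1)))
```

Because the handler muffles each warning and lets evaluation continue, all warnings are collected rather than only the first one.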

Removes punctuation characters and typos from data entries

Description

Usage

Arguments

Value

See Also

Translates the standard ambiguity codes for nucleotides (amino acid decoder)

Description

Usage

Arguments

Value

See Also

Translates the standard ambiguity codes for nucleotides (codon decoder)

Description

Usage

Arguments

Value

See Also

Converts ambiguity codes to represented bases

Description

Usage

Arguments

Value

See Also

Transforms entries to the desired coding class

Description

Usage

Arguments

Value

See Also

Derives MLE

Description

Usage

Arguments

Value

See Also

Derives profile-likelihood MLE of lineage frequencies

Description

Usage

Arguments

Value

See Also

Administrator function

Description

Usage

Arguments

Value

See Also

Finds forbidden sample ID repetitions

Description

Usage

Arguments

Value

See Also

Reports and deletes empty rows/columns

Description

Usage

Arguments

Value

See Also

Exports the dataset in standard format to a new Excel file

Description

Usage

Arguments

Value

See Also