Help for package SomaDataIO

Type:

Package

Title:

Input/Output 'SomaScan' Data

Version:

6.3.0

Description:

Load and export 'SomaScan' data via the 'Standard BioTools, Inc.' structured text file called an ADAT ('*.adat'). For file format see https://github.com/SomaLogic/SomaLogic-Data/blob/main/README.md. The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.

License:

MIT + file LICENSE

URL:

https://somalogic.github.io/SomaDataIO/, https://somalogic.com

BugReports:

https://github.com/SomaLogic/SomaDataIO/issues

Depends:

R (≥ 4.1.0)

Imports:

cli, dplyr (≥ 1.0.6), ggplot2, lifecycle (≥ 1.0.0), magrittr (≥ 2.0.1), methods, readxl (≥ 1.3.1), tibble (≥ 3.1.2), tidyr (≥ 1.1.3)

Suggests:

Biobase, knitr, purrr, recipes, rlang, rmarkdown, spelling, testthat (≥ 3.0.0), usethis (≥ 2.0.1), withr

VignetteBuilder:

knitr

Standard BioTools, Inc. 2025

Encoding:

UTF-8

Language:

en-US

LazyData:

true

LazyDataCompression:

LazyLoad:

true

Config/testthat/edition:

Config/Needs/website:

tidyverse/tidytemplate

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-05-06 16:42:05 UTC; cscheidel

Author:

Stu Field

[aut], Caleb Scheidel [cre], Standard BioTools, Inc. [cph, fnd]

Maintainer:

Caleb Scheidel <calebjscheidel@gmail.com>

Repository:

CRAN

Date/Publication:

2025-05-06 17:00:02 UTC

SomaDataIO: Input/Output 'SomaScan' Data

Description

Details

Load an ADAT file into the global workspace with a call to read_adat(). This function parses the main data table into a data.frame object and assigns the remaining data from the file as object attributes, i.e. call attributes(adat). Other functions in the package are designed to make extracting, manipulating, and wrangling data in the newly created soma_adat object more convenient.

Those familiar with micro-array data analysis and associated packages, e.g. Biobase, will notice that the feature data (proteins) are arranged as columns and the samples (arrays) are the rows. This is the transpose of typical micro-array data. This conflict can be easily solved using the transpose function, t(), which is part of the ⁠base R⁠. In addition, those familiar with the standard ExpressionSet object, available from Bioconductor, might find the functions adat2eSet() and pivotExpressionSet() particularly useful.

Author(s)

Maintainer: Caleb Scheidel calebjscheidel@gmail.com

Authors:

Stu Field stu.g.field@gmail.com (ORCID)

Other contributors:

Standard BioTools, Inc. [copyright holder, funder]

Examples

# a listing of all pkg functions
library(help = SomaDataIO)

# the `soma_adat` class
class(example_data)
is.soma_adat(example_data)

# Annotations Lookup Table
anno_tbl <- getAnalyteInfo(example_data)
anno_tbl

# Find all analytes starting with "MMP" in `anno_tbl`
dplyr::filter(anno_tbl, grepl("^MMP", Target))

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).

Analyte Annotations, Col.Meta, and Row Info

Description

In a standard SomaLogic ADAT, the section of information that sits directly above the measurement data (RFU data matrix) is the column meta data (Col.Meta), which contains detailed information and annotations about the analytes, SeqId()s, and their targets. See section below for further information about available fields and their descriptions. Use getAnalyteInfo() to obtain an object containing this information for programmatic analyses, and use getMeta() to obtain the column names representing the row-specific meta data about the samples (see section below).

Col Meta (Analyte Annotations)

Information describing the analytes is found to the above the data matrix in a standard SomaLogic ADAT. This information may consist of the any or all of the following:

Field	Description	Example
SeqId	SomaLogic sequence identifier	2182-54_1
SeqidVersion	Version of SOMAmer sequence	2
SomaId	Target identifier, of the form SLnnnnnn (8 characters in length)	SL000318
TargetFullName	Target name curated for consistency with UniProt name	Complement C4b
Target	SomaLogic Target Name	C4b
UniProt	UniProt identifier(s)	P0C0L4 P0C0L5
EntrezGeneID	Entrez Gene Identifier(s)	720 721
EntrezGeneSymbol	Entrez Gene Symbol names	C4A C4B
Organism	Protein Source Organism	Human
Units	Relative Fluorescence Units	RFU
Type	SOMAmer target type	Protein
Dilution	Dilution mix assignment	0.01%
PlateScale_Reference	PlateScale reference value	1378.85
CalReference	Calibration sample reference value	1378.85
medNormRef_ReferenceRFU	Median normalization reference value	490.342
Cal_V4_`⁠<YY>_<SSS>_<PPP>⁠`	Calibration scale factor (for given Year_Study_Plate)	0.64
ColCheck	QC acceptance criteria across all plates/sets	PASS
QcReference_`⁠<LLLLL>⁠`	QC sample reference value (for given QC lot)	PASS
CalQcRatio_V4_`⁠<YY>_<SSS>_<PPP>⁠`	Post calibration median QC ratio to reference (for given Year_Study_Plate)	1.04

Row Meta (Sample Annotations)

Information describing the samples is typically found to the left of the data matrix in a standard SomaLogic ADAT. This information may consist of clinical information provided by the client, or run-specific diagnostic information included for assay quality control. Below are some examples of what may be present in this section:

Field	Description	Examples
PlateId	Plate identifier	V4-18-004_001, V4-18-004_002
ScannerID	Scanner used to analyze slide	SG12064173, SG14374437
PlatePosition	Location on 96 well plate (A1-H12)	A1, H12
SlideId	Agilent slide barcode	2.58E+11
Subarray	Agilent subarray (1 – 8)	1,8
SampleId	1st form is Subject Identifier, 2nd form (calibrators, buffers)	2031
SampleType	1st form for clinical samples (Sample), 2nd form as above	Sample, QC, Calibrator, Buffer
PercentDilution	Highest concentration the SOMAmer dilution groups	20
SampleMatrix	Sample matrix	Plasma-PPT
Barcode	1D Barcode of aliquot	S622225
Barcode2d	2D Barcode of aliquot	1.91E+08
SampleNotes	Assay team sample observation	Cloudy, Low sample volume, Reddish
SampleDescription	Supplemental sample information	Plasma QC 1
AssayNotes	Assay team run observation	Beads aspirated, Leak/Hole, Smear
TimePoint	Sample time point	Baseline
ExtIdentifier	Primary key for Subarray	EXID40000000032037
SsfExtId	Primary key for sample	EID102733
SampleGroup	Sample group	A, B
SiteId	Collection site	SomaLogic, Covance
TubeUniqueID	Unique tube identifier	1.12E+11
CLI	Cohort definition identifier	CLI6006F001
HybControlNormScale	Hybridization control scale factor	0.948304
RowCheck	Normalization acceptance criteria for all row scale factors	PASS, FLAG
NormScale_0_5	Median signal normalization scale factor (0.5% mix)	1.02718
NormScale_0_005	Median signal normalization scale factor (0.005% mix)	1.119754
NormScale_20	Median signal normalization scale factor (20% mix)	0.996148

Examples

# Annotations/Col.Meta
tbl <- getAnalyteInfo(example_data)
tbl

# Row/sample Meta
r_m <- getMeta(example_data)
head(r_m)

# Normalization Scale Factors
grep("NormScale", r_m, value = TRUE)

# adat subset
example_data[1:3, head(r_m)]

Working with SomaLogic SeqIds

Description

The SeqId is the cornerstone used to uniquely identify SomaLogic analytes. SeqIds follow the format ⁠<Pool>-<Clone>_<Version>⁠, for example "1234-56_7" can be represented as:

Pool	Clone	Version
`1234`	`56`	`7`

See Details below for the definition of each sub-unit. The ⁠<Pool>-<Clone>⁠ combination is sufficient to uniquely identify a specific analyte and therefore versions are no longer provided (though they may be present in legacy ADATs). The tools below enable users to extract, test, identify, compare, and manipulate SeqIds across assay runs and/or versions.

Usage

getSeqId(x, trim.version = FALSE)

regexSeqId()

locateSeqId(x, trailing = TRUE)

seqid2apt(x)

apt2seqid(x)

is.apt(x)

is.SeqId(x)

matchSeqIds(x, y, order.by.x = TRUE)

getSeqIdMatches(x, y, show = FALSE)

Arguments

x

Character. A vector of strings, usually analyte/feature column names, AptNames, or SeqIds. For seqid2apt(), a vector of SeqIds. For apt2seqid(), a character vector containing SeqIds. For matchSeqIds(), a vector of pattern matches containing SeqIds. Can be AptNames with GeneIDs, the seq.XXXX format, or even "naked" SeqIds.

trim.version

Logical. Whether to remove the version number, i.e. "1234-56_7" -> "1234-56". Primarily for legacy ADATs.

trailing

Logical. Should the regular expression explicitly specify trailing SeqId pattern match, i.e. "regex$"? This is the most common case and the default.

y

Character. A second vector of AptNames containing SeqIds to match against those in contained in x. For matchSeqIds() these values are returned if there are matching elements.

order.by.x

Logical. Order the returned character string by the x (first) argument?

show

Logical. Return the data frame visibly?

Details

Pool:	ties back to the original well during SELEX
Clone:	ties to the specific sequence within a pool
Version:	refers to custom modifications (optional/defunct)

AptName: a SeqId combined with a string, usually a GeneId- or seq.-prefix, for convenient, human-readable manipulation from within R.

Value

getSeqId(): a character vector of SeqIds captured from a string.

regexSeqId(): a regular expression (regex) string pre-defined to match SomaLogic the SeqId pattern.

locateSeqId(): a data frame containing the start and stop integer positions for SeqId matches at each value of x.

seqid2apt(): a character vector with the ⁠seq.*⁠ prefix, i.e. the inverse of getSeqId().

apt2seqid(): a character vector of SeqIds. is.SeqId() will return TRUE for all elements.

is.apt(), is.SeqId(): Logical. TRUE or FALSE.

matchSeqIds(): a character string corresponding to values in y of the intersect of x and y. If no matches are found, character(0).

getSeqIdMatches(): a n x 2 data frame, where n is the length of the intersect of the matching SeqIds. The data frame is named by the passed arguments, x and y.

Functions

getSeqId(): extracts/captures the the SeqId match from an analyte column identifier, i.e. column name of an ADAT loaded with read_adat(). Assumes the SeqId pattern occurs at the end of the string, which for the vast majority of cases will be true. For edge cases, see the trailing argument to locateSeqId().
regexSeqId(): generates a pre-formatted regular expression for matching of SeqIds. Note the trailing match, which is most commonly required, but locateSeqId() offers an alternative to mach anywhere in a string. Used internally in many utility functions
locateSeqId(): generates a data frame of the positional SeqId matches. Specifically designed to facilitate SeqId extraction via substr(). Similar to stringr::str_locate().
seqid2apt(): converts a SeqId into anonymous-AptName format, i.e. 1234-56 -> seq.1234.56. Version numbers (⁠1234-56_ver⁠) are always trimmed when present.
apt2seqid(): converts an anonymous-AptName into SeqId format, i.e. seq.1234.56 -> 1234-56. Version numbers (seq.1234.56.ver) are always trimmed when present.
is.apt(): regular expression match to determine if a string contains a SeqId, and thus is probably an AptName format string. Both legacy EntrezGeneSymbol-SeqId combinations or newer so-called "anonymous-AptNames" formats (seq.1234.45) are matched.
is.SeqId(): tests for SeqId format, i.e. values returned from getSeqId() will always return TRUE.
matchSeqIds(): matches two character vectors on the basis of their intersecting SeqIds. Note that elements in y not containing a SeqId regular expression are silently dropped.
getSeqIdMatches(): matches two character vectors on the basis of their intersecting SeqIds only (irrespective of the GeneID-prefix). This produces a two-column data frame which then can be used as to map between the two sets.

The final order of the matches/rows is by the input corresponding to the first argument (x).

By default the data frame is invisibly returned to avoid dumping excess output to the console (see the ⁠show =⁠ argument.)

Author(s)

Stu Field

Examples

x <- c("ABDC.3948.48.2", "3948.88",
       "3948.48.2", "3948-48_2", "3948.48.2",
       "3948-48_2", "3948-88",
       "My.Favorite.Apt.3948.88.9")

tibble::tibble(orig       = x,
               SeqId      = getSeqId(x),
               SeqId_trim = getSeqId(x, TRUE),
               AptName    = seqid2apt(SeqId))

# Logical Matching
is.apt("AGR2.4959.2") # TRUE
is.apt("seq.4959.2")  # TRUE
is.apt("4959-2")      # TRUE
is.apt("AGR2")        # FALSE


# SeqId Matching
x <- c("seq.4554.56", "seq.3714.49", "PlateId")
y <- c("Group", "3714-49", "Assay", "4554-56")
matchSeqIds(x, y)
matchSeqIds(x, y, order.by.x = FALSE)

# vector of features
feats <- getAnalytes(example_data)

match_df <- getSeqIdMatches(feats[1:100], feats[90:500])  # 11 overlapping
match_df

a <- utils::head(feats, 15)
b <- withr::with_seed(99, sample(getSeqId(a)))   # => SeqId & shuffle
(getSeqIdMatches(a, b))                          # sorted by first vector "a"

Deprecated function(s) of the SomaDataIO package

Description

These functions have either been or in the current version of SomaDataIO package. They may eventually be completely removed, so please re-code your scripts accordingly based on the suggestions below:

Function		Now Use
`getSomamers()`		`getAnalytes()`
`getSomamerData()`		`getAnalyteInfo()`

Details

Some badges you may see in SomaDataIO:

Example Data and Objects

Description

The example_data object is intended to provide existing and prospective SomaLogic customers with example data to enable analysis preparation prior to receipt of SomaScan data, and also for those generally curious about the SomaScan data deliverable. It is not intended to be used as a control group for studies or provide any metrics for SomaScan data in general.

Format

example_data

a soma_adat parsed via read_adat() containing 192 samples (see below for breakdown of sample type). There are 5318 columns containing 5284 analyte features and 34 clinical meta data fields. These data have been pre-processed via the following steps:

hybridization normalized (all samples)
calibrators and buffers median normalized
plate scaled
calibrated
Adaptive Normalization by Maximum Likelihood (ANML) of QC and clinical samples

Note1: The Age and Sex (M/F) fields contain simulated values designed to contain biological signal.

**Note2:** The `SampleType` column contains sample source/type information
and usually the `SampleType == Sample` represents the "client" samples.

**Note3:** The original source file can be found at
\url{https://github.com/SomaLogic/SomaLogic-Data}.

ex_analytes

character string of the analyte features contained in the soma_adat object, derived from a call to getAnalytes().

ex_anno_tbl

a lookup table corresponding to a transposed data frame of the "Col.Meta" attribute of an ADAT, with an index key field AptName included in column 1, derived from a call to getAnalyteInfo().

ex_target_names

A lookup table mapping SeqId feature names -> target names contained in example_data. This object (or one like it) is convenient at the console via auto-complete for labeling and/or creating plot titles on the fly.

ex_clin_data

A table containing SampleId, smoking_status, and alcohol_use fields for each clinical sample in example_data used to demonstrate how to merge sample annotation information to an existing soma_adat object.

Data Description

The example_data object contains a SomaScan V4 study from healthy normal individuals. The RFU measurements themselves and other identifiers have been altered to protect personally identifiable information (PII), but also retain underlying biological signal as much as possible. There are 192 total EDTA-plasma samples across two 96-well plate runs which are broken down by the following types:

170 clinical samples (client study samples)
10 calibrators (replicate controls for combining data across runs)
6 QC samples (replicate controls used to assess run quality)
6 Buffer samples (no protein controls)

Data Processing

The standard V4 data normalization procedure for EDTA-plasma samples was applied to this dataset. For more details on the data standardization process see the Data Standardization and File Specification Technical Note. General details are outlined above.

Source

https://github.com/SomaLogic/SomaLogic-Data

Standard BioTools, Inc.

Examples

# S3 print method
example_data

# print header info
print(example_data, show_header = TRUE)

class(example_data)

# Features/Analytes
head(ex_analytes, 20L)

# Feature info table (annotations)
ex_anno_tbl

# Search via `filter()`
dplyr::filter(ex_anno_tbl, grepl("^MMP", Target))

# Lookup table -> targets
# MMP-9
ex_target_names$seq.2579.17

# gender hormone FSH
tapply(example_data$seq.3032.11, example_data$Sex, median)

# gender hormone LH
tapply(example_data$seq.2953.31, example_data$Sex, median)

# Target lookup
ex_target_names$seq.2953.31     # tab-completion at console

# Sample Type/Source
table(example_data$SampleType)

# Sex/Gender Variable
table(example_data$Sex)

# Age Variable
summary(example_data$Age)

Helpers to Extract Information from an ADAT

Description

Retrieve elements of the HEADER attribute of a soma_adat object:

getAdatVersion() determines the the ADAT version number from a parsed ADAT header.

getSomaScanVersion() determines the original SomaScan assay version that generated RFU measurements within a soma_adat object.

checkSomaScanVersion() determines if the version of is a recognized version of SomaScan.

Table of SomaScan assay versions:

Version	Commercial Name	Size
`V4`	5k	5284
`v4.1`	7k	7596
`v5.0`	11k	11083

getSignalSpace() determines the current signal space of the RFU values, which may differ from the original SomaScan signal space if the data have been lifted. See lift_adat() and vignette("lifting-and-bridging", package = "SomaDataIO").

getSomaScanLiftCCC() accesses the lifting Concordance Correlation Coefficients between various SomaScan versions. For more about CCC metrics see lift_adat().

Usage

getAdatVersion(x)

getSomaScanVersion(adat)

getSignalSpace(adat)

checkSomaScanVersion(ver)

getSomaScanLiftCCC(matrix = c("plasma", "serum"))

Arguments

x

Either a soma_adat object with intact attributes or the attributes themselves of a soma_adat object.

adat

A soma_adat object (with intact attributes), typically created using read_adat().

ver

character(1). The SomaScan version as a string. Note: the "v"-prefix is case insensitive.

matrix

Character. A string of (usually) either "serum" or "plasma".

Value

\link[=getAdatVersion]{getAdatVersion()}

The key-value of the Version as a string.

\link[=getSomaScanVersion]{getSomaScanVersion()}

The key-value of the AssayVersion as a string.

\link[=getSignalSpace]{getSignalSpace()}

The key-value of the SignalSpace as a string.

\link[=checkSomaScanVersion]{checkSomaScanVersion()}

Returns NULL (invisibly) if checks pass.

\link[=getSomaScanLiftCCC]{getSomaScanLiftCCC()}

Returns a tibble of either the serum or plasma CCC between various versions of the SomaScan assay.

Author(s)

Stu Field

References

Lin, Lawrence I-Kuei. 1989. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics. 45:255-268.

Examples

getAdatVersion(example_data)

attr(example_data, "Header.Meta")$HEADER$Version <- "99.9"
getAdatVersion(example_data)

ver <- getSomaScanVersion(example_data)
ver

rfu_space <- getSignalSpace(example_data)
rfu_space

is.null(checkSomaScanVersion(ver))

# plasma (default)
getSomaScanLiftCCC()

# serum
getSomaScanLiftCCC("serum")

Convert ADAT to ExpressionSet Object

Description

Utility to convert a SomaLogic soma_adat object to an ExpressionSet object via the Biobase package from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Biobase.html.

Usage

adat2eSet(adat)

Arguments

adat

A soma_adat class object as read into the R environment using read_adat().

Details

The Biobase package is required and must be installed from Bioconductor via the following at the R console:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Biobase", version = remotes::bioc_version())

Value

A Bioconductor object of class ExpressionSet.

Author(s)

Stu Field

References

https://bioconductor.org/install/

Examples


eSet <- adat2eSet(example_data)
class(eSet)
eSet

ft <- Biobase::exprs(eSet)
head(ft[, 1:10L], 10L)

Add Attributes to `soma_adat` Objects

Description

Adds a set of attributes, typically "Header.Meta" and "Col.Meta", to a data.frame, tibble, soma_adat or similar tabular object. Existing attributes data are not over-written. Typically untouched are:

names
class
row.names

Usage

addAttributes(data, new.atts)

Arguments

data

The receiving data.frame object for new attributes.

new.atts

A named list object containing new attributes to add to the existing ones.

Value

A data frame object corresponding to data but with the attributes of new.atts grafted on to it. Existing attribute names are not over-written.

Author(s)

Stu Field

Add a Class to an Object

Description

Utility to add (prepend) a class(es) to existing objects.

Usage

addClass(x, class)

Arguments

x

The object to receive new class(es).

class

Character. The name of additional class(es).

Value

An object with new classes.

Author(s)

Stu Field

Examples

class(iris)

addClass(iris, "new") |> class()

addClass(iris, c("A", "B")) |> class()    # 2 classes

addClass(iris, c("A", "data.frame")) |> class()    # no duplicates

addClass(iris, c("data.frame", "A")) |> class()    # re-orders if exists

Calculate MAD Outlier Map

Description

Calculate the median absolute deviation (statistical) outliers measurements and fold-change criteria from an ADAT. Two values are required for the calculation: median absolute deviation (MAD) and fold-change (FC). Outliers are determined based on the result of both 6*MAD and x*FC , where x is the number of fold changes defined.

Usage

calcOutlierMap(
  data,
  anno_tbl = NULL,
  apt.order = c(NA, "dilution", "signal"),
  sample.order = NULL,
  fc.crit = 5
)

## S3 method for class 'outlier_map'
print(x, ...)

Arguments

data

A soma_adat object containing RFU feature data.

anno_tbl

An annotations table produced via getAnalyteInfo(). Used to calculate analyte dilutions for the matrix column ordering. If NULL, a table is generated internally from data (if possible), and the analytes are plotted in dilution order.

apt.order

Character. How should the columns/features be ordered? Options include: by dilution mix ("dilution"), by median overall signal ("signal"), or as-is in data (default).

sample.order

Either a character string indicating the column name with entries to be used to order the data frame rows, or a numeric vector representing the order of the data frame rows. The default (NULL) leaves the row ordering as it is in data.

fc.crit

Integer. The fold change criterion to evaluate. Defaults to 5x.

x

An object of class "outlier_map".

...

Arguments for S3 print methods.

Details

For the S3 plotting method, see plot.Map().

Value

A list of class c("outlier_map", "Map") containing:

matrix

A boolean matrix of TRUE/FALSE whether each sample is an outlier according the the stated criteria.

x.lab

A character string containing the plot x-axis label.

title

A character string containing the plot title.

rows.by.freq

A logical indicating if the samples are ordered by outlier frequency.

class.tab

A table containing the frequencies of each class if input sample.order is defined as a categorical variable.

sample.order

A numeric vector representing the order of the data frame rows.

legend.sub

A character string containing the plot legend subtitle.

Functions

print(outlier_map): There is a S3 print method for class "outlier_map".

Author(s)

Stu Field

Examples

dat <- example_data |> dplyr::filter(SampleType == "Sample")
om <- calcOutlierMap(dat)
class(om)

# S3 print method
om

# `sample.order = "frequency"` orders samples by outlier frequency
om <- calcOutlierMap(dat, sample.order = "frequency")
om$rows.by.freq
om$sample.order

# order samples field in Adat
om <- calcOutlierMap(dat, sample.order = "Sex")
om$sample.order

Calculate Estimated Limit of Detection (eLOD)

Description

Calculate the estimated limit of detection (eLOD) for SOMAmer reagent analytes in the provided input data. The input data should be filtered to include only buffer samples desired for eLOD calculation.

Usage

calc_eLOD(data)

Arguments

data

A soma_adat, data.frame, or tibble object including SeqId columns (seq.xxxxx.xx) containing RFU values.

Details

eLOD is calculated using the following steps:

For each SOMAmer, the median and adjusted median absolute deviation (MAD_{Adjusted}) are calculated, where

MAD_{Adjusted} = 1.4826 * MAD

The 1.4826 is a set constant used to adjust the MAD to be reflective of the standard deviation of the normal distribution.
For each SOMAmer, calculate

eLOD = median + 3.3 * MAD_{Adjusted}

Note: The eLOD is useful for non-core matrices, including cell lysate and CSF, but should be used carefully for evaluating background signal in plasma and serum.

Value

A tibble object with 2 columns: SeqId and eLOD.

Author(s)

Caleb Scheidel, Christopher Dimapasok

Examples

# filter data frame using vector of SampleId controls
df <- withr::with_seed(101, {
  data.frame(
    SampleType = rep(c("Sample", "Buffer"), each = 10),
    SampleId = paste0("Sample_", 1:20),
    seq.20.1.100 = runif(20, 1, 100),
    seq.21.1.100 = runif(20, 1, 100),
    seq.22.2.100 = runif(20, 1, 100)
  )
})
sample_ids <- paste0("Sample_", 11:20)
selected_samples <- df |> filter(SampleId %in% sample_ids)

selected_elod <- calc_eLOD(selected_samples)
head(selected_elod)
## Not run: 
# filter `soma_adat` object to buffer samples
buffer_samples <- example_data |> filter(SampleType == "Buffer")

# calculate eLOD
buffer_elod <- calc_eLOD(buffer_samples)
head(buffer_elod)

# use eLOD to calculate signal to noise ratio of samples
samples_median <- example_data |> dplyr::filter(SampleType == "Sample") |>
  dplyr::summarise(across(starts_with("seq"), median, .names = "median_{col}")) |>
  tidyr::pivot_longer(starts_with("median_"), names_to = "SeqId",
                      values_to = "median_signal") |>
  dplyr::mutate(SeqId = gsub("median_seq", "seq", SeqId))

# analytes with signal to noise > 2
ratios <- samples_median |>
  dplyr::mutate(signal_to_noise = median_signal / buffer_elod$eLOD) |>
  dplyr::filter(signal_to_noise > 2) |>
  dplyr::arrange(desc(signal_to_noise))

head(ratios)

## End(Not run)

Clean Up Character String

Description

Often the names, particularly within soma_adat objects, are messy due to varying inputs, this function attempts to remedy this by removing the following:

trailing/leading/internal whitespace
non-alphanumeric strings (except underscores)
duplicated internal dots (..), (...), etc.
SomaScan normalization scale factor format

Usage

cleanNames(x)

Arguments

x

Character. String to clean up.

Value

A cleaned up character string.

Author(s)

Stu Field

Examples

cleanNames("    sdkfj...sdlkfj.sdfii4994### ")

cleanNames("Hyb..Scale")

Diff Two ADAT Objects

Description

Diff tool for the differences between two soma_adat objects. When diffs of the table values are interrogated, only the intersect of the column meta data or feature data is considered

Usage

diffAdats(adat1, adat2, tolerance = 1e-06)

Arguments

adat1, adat2

Two soma_adat objects to compare.

tolerance

Numeric ⁠> 0⁠. Differences smaller than tolerance are not triggered. See all.equal().

Value

NULL, invisibly. Called for side effects.

Note

Only diffs of the column name intersect are reported.

Author(s)

Stu Field

Examples

# subset `example_data` for speed
# all SeqIds from 2000 -> 2999
seqs <- grep("^seq\\.2[0-9]{3}", names(example_data), value = TRUE)
ex_data_small <- head(example_data[, c(getMeta(example_data), seqs)], 10L)
dim(ex_data_small)

# no diff to itself
diffAdats(ex_data_small, ex_data_small)

# remove random column
rm <- withr::with_seed(123, sample(1:ncol(ex_data_small), 1))
diffAdats(ex_data_small, ex_data_small[, -rm])

# randomly shuffle Subarray
diffAdats(ex_data_small, dplyr::mutate(ex_data_small, Subarray = sample(Subarray)))

# modify 2 RFUs randomly
new <- ex_data_small
new[5L, c(rm, rm + 1L)] <- 999
diffAdats(ex_data_small, new)

Get Analyte Annotation Information

Description

Uses the Col.Meta attribute (analyte annotation data that appears above the protein measurements in the ⁠*.adat⁠ text file) of a soma_adat object, adds the AptName column key, conducts a few sanity checks, and generates a "lookup table" of analyte data that can be used for simple manipulation and indexing of analyte annotation information. Most importantly, the analyte column names of the soma_adat (e.g. seq.XXXX.XX) become the AptName column of the lookup table and represents the key index between the table and soma_adat from which it comes.

Usage

getAnalyteInfo(adat)

getTargetNames(tbl)

getFeatureData(adat)

Arguments

adat

A soma_adat object (with intact attributes), typically created using read_adat().

tbl

A tibble object containing analyte target annotation information. This is usually the result of a call to getAnalyteInfo().

Value

A tibble object with columns corresponding to the column meta data entries in the soma_adat. One row per analyte.

Functions

getTargetNames(): creates a lookup table (or dictionary) as a named list object of AptNames and Target names in key-value pairs. This is a convenient tool to quickly access a TargetName given the AptName in which the key-value pairs map the seq.XXXX.XX to its corresponding TargetName in tbl. This structure which provides a convenient auto-completion mechanism at the command line or for generating plot titles.
getFeatureData(): . Please now use getAnalyteInfo().

Author(s)

Stu Field

Examples

# Get Aptamer table
anno_tbl <- getAnalyteInfo(example_data)
anno_tbl

# Use `dplyr::group_by()`
dplyr::tally(dplyr::group_by(anno_tbl, Dilution))  # print summary by dilution

# Columns containing "Target"
anno_tbl |>
  dplyr::select(dplyr::contains("Target"))

# Rows of "Target" starting with MMP
anno_tbl |>
  dplyr::filter(grepl("^MMP", Target))

# Target names
tg <- getTargetNames(anno_tbl)

# how to use for plotting
feats <- sample(anno_tbl$AptName, 6)
op <- par(mfrow = c(2, 3))
sapply(feats, function(.x) plot(1:10, main = tg[[.x]]))
par(op)

Get Analytes

Description

Return the feature names (i.e. the column names for SOMAmer reagent analytes) from a soma_adat. S3 methods also exist for these classes:

#> [1] getAnalytes.character  getAnalytes.data.frame getAnalytes.default   
#> [4] getAnalytes.list       getAnalytes.matrix     getAnalytes.recipe    
#> [7] getAnalytes.soma_adat 
#> see '?methods' for accessing help and source code

getMeta() returns the inverse, a character vector of string names of non-analyte feature columns/variables, which typically correspond to the clinical ("meta") data variables. S3 methods exist for these classes:

#> [1] getMeta.character  getMeta.data.frame getMeta.default    getMeta.list      
#> [5] getMeta.matrix     getMeta.soma_adat 
#> see '?methods' for accessing help and source code

Usage

getAnalytes(x, n = FALSE, rm.controls = FALSE)

getMeta(x, n = FALSE)

getFeatures(x, n = FALSE, rm.controls = FALSE)

Arguments

x

Typically a soma_adat class object created using read_adat().

n

Logical. Return an integer corresponding to the length of the features?

rm.controls

Logical. Should all control and non-human analytes (e.g. HybControls, Non-Human, Non-Biotin, Spuriomer) be removed from the returned value?

Value

getAnalytes(): a character vector of ADAT feature ("analyte") names.

getMeta(): a character vector of ADAT clinical ("meta") data names.

For both, if n = TRUE, an integer corresponding to the length of the character vector.

Functions

getFeatures(): . Please now use getAnalytes().

Author(s)

Stu Field

Examples

# RFU feature variables
apts <- getAnalytes(example_data)
head(apts)
getAnalytes(example_data, n = TRUE)

# vector string
bb <- getAnalytes(names(example_data))
all.equal(apts, bb)

# create some control sequences
# ~~~~~~~~~ Spuriomer ~~~ HybControl ~~~
apts2 <- c("seq.2053.2", "seq.2171.12", head(apts))
apts2
no_crtl <- getAnalytes(apts2, rm.controls = TRUE)
no_crtl
setdiff(apts2, no_crtl)

# clinical variables
mvec <- getMeta(example_data)
head(mvec, 10)
getMeta(example_data, n = TRUE)

# test 'data.frame' and 'character' S3 methods are identical
identical(getMeta(example_data), getMeta(names(example_data))) # TRUE

Get Flagged Ids From MAD Outlier Map

Description

Return the IDs of flagged samples for objects of the outlier_map class. Samples are flagged based on the percent analytes (RFU columns) for a given sample that were identified as outliers using the median absolute deviation (MAD).

Usage

getOutlierIds(x, flags = 0.05, data = NULL, include = NULL)

Arguments

x

An object of class:

outlier_map - from calcOutlierMap()

flags

Numeric in ⁠[0, 1]⁠. For an "outlier_map", the proportion of the analytes (columns) for a given sample that must be outliers for a flag to be placed at the right-axis, right-axis, thus flagging that sample. If NULL (default), 0.05 (5%) is selected.

data

Optional. The data originally used to create the map x. If omitted, a single column data frame is returned.

include

Optional. Character vector of column name(s) in data to include in the resulting data frame. Ignored if data = NULL.

Value

A data.frame of the indices (idx) of flagged samples, along with any additional variables as specified by include.

Author(s)

Caleb Scheidel

Examples

# flagged outliers
# create a single sample outlier (12)
out_adat <- example_data
apts     <- getAnalytes(out_adat)
out_adat[12, apts] <- out_adat[12, apts] * 10

om <- calcOutlierMap(out_adat)
getOutlierIds(om, out_adat, flags = 0.05, include = c("Sex", "Subarray"))

Group Generics for `soma_adat` Class Objects

Description

S3 group generic methods to apply group specific prototype functions to the RFU data only of soma_adat objects. The clinical meta data are not transformed and remain unmodified in the returned object (Math() and Ops()) or are ignored for the Summary() group. See groupGeneric().

Usage

## S3 method for class 'soma_adat'
Math(x, ...)

antilog(x, base = 10)

## S3 method for class 'soma_adat'
Ops(e1, e2 = NULL)

## S3 method for class 'soma_adat'
Summary(..., na.rm = FALSE)

## S3 method for class 'soma_adat'
e1 == e2

Arguments

x

The soma_adat class object to perform the transformation.

...

Additional arguments passed to the various group generics as appropriate.

base

A positive or complex number: the base with respect to which logarithms are computed.

e1, e2

Objects.

na.rm

Logical. Should missing values be removed?

Value

A soma_adat object with the same dimensions of the input object with the feature columns transformed by the specified generic.

Functions

antilog(): performs the inverse or anti-log transform for a numeric vector of soma_adat object. note: default is base = 10, which differs from the log() default base e.
Ops(soma_adat): performs binary mathematical operations on class soma_adat. See Ops().
Summary(soma_adat): performs summary calculations on class soma_adat. See Summary().
== : compares left- and right-hand sides of the operator unless the RHS is also a soma_adat, in which case diffAdats() is invoked.

Math

Group members:

#>  [1] "abs"      "acos"     "acosh"    "asin"     "asinh"    "atan"    
#>  [7] "atanh"    "ceiling"  "cos"      "cosh"     "cospi"    "cummax"  
#> [13] "cummin"   "cumprod"  "cumsum"   "digamma"  "exp"      "expm1"   
#> [19] "floor"    "gamma"    "lgamma"   "log"      "log10"    "log1p"   
#> [25] "log2"     "sign"     "sin"      "sinh"     "sinpi"    "sqrt"    
#> [31] "tan"      "tanh"     "tanpi"    "trigamma" "trunc"

Commonly used generics of this group include:

log(), log10(), log2(), antilog(), abs(), sign(), floor(), sqrt(), exp()

Ops

Group members:

#>  [1] "+"   "-"   "*"   "^"   "%%"  "%/%" "/"   "=="  ">"   "<"   "!="  "<=" 
#> [13] ">="

Note that for the ⁠`==`⁠ method if the RHS is also a soma_adat, diffAdats() is invoked which compares LHS vs. RHS. Commonly used generics of this group include:

+, -, *, /, ^, ==, >, <

Summary

Group members:

#> [1] "all"   "any"   "max"   "min"   "prod"  "range" "sum"

Commonly used generics of this group include:

max(), min(), range(), sum(), any()

Author(s)

Stu Field

Examples

# subset `example_data` for speed
# all SeqIds from 2000 -> 2999
seqs <- grep("^seq\\.2[0-9]{3}", names(example_data), value = TRUE)
ex_data_small <- head(example_data[, c(getMeta(example_data), seqs)], 10L)
dim(ex_data_small)

ex_data_small$seq.2991.9

# Math Generics:
# -------------
# log-transformation
a <- log(ex_data_small)
a$seq.2991.9

b <- log10(ex_data_small)
b$seq.2991.9
isTRUE(all.equal(b, log(ex_data_small, base = 10)))

# floor
c <- floor(ex_data_small)
c$seq.2991.9

# square-root
d <- sqrt(ex_data_small)
d$seq.2991.9

# rounding
e <- round(ex_data_small)
e$seq.2991.9

# inverse log
antilog(1:4)

alog <- antilog(b)
all.equal(ex_data_small, alog)    # return `b` -> linear space

# Ops Generics:
# -------------
plus1 <- ex_data_small + 1
times2 <- ex_data_small * 2

sq <- ex_data_small^2
all.equal(sqrt(sq), ex_data_small)

gt100k <- ex_data_small > 100000
gt100k

ex_data_small == ex_data_small   # invokes diffAdats()

# Summary Generics:
# -------------
sum(ex_data_small)

any(ex_data_small < 100)  # low RFU analytes

sum(ex_data_small < 100)  # how many

min(ex_data_small)

min(ex_data_small, 0)

max(ex_data_small)

max(ex_data_small, 1e+7)

range(ex_data_small)

Are Attributes Intact?

Description

This function runs a series of checks to determine if a soma_adat object has a complete set of attributes. If not, this indicates that the object has been modified since the initial read_adat() call. Checks for the presence of both "Header.Meta" and "Col.Meta" in the attribute names. These entries are added during the read_adat() call. Specifically, within these sections it also checks for the presence of the following entries:

"Header.Meta" section:: "HEADER", "COL_DATA", and "ROW_DATA"
"Col.Meta" section:: "SeqId", "Target", "Units", and "Dilution"

If any of the above they are altered or missing, FALSE is returned.

is.intact.attributes() is . It remains for backward compatibility and may be removed in the future. You are encouraged to shift your code to is_intact_attr().

Usage

is_intact_attr(adat, verbose = interactive())

is.intact.attributes(adat, verbose = interactive())

Arguments

adat

A soma_adat object to query.

verbose

Logical. Should diagnostic information about failures be printed to the console? If the default, see interactive(), is invoked, only messages via direct calls are triggered. This prohibits messages generated deep in the call stack from bubbling up to the user.

Value

Logical. TRUE if all checks pass, otherwise FALSE.

Examples

# checking attributes
my_adat <- example_data
is_intact_attr(my_adat)           # TRUE
is_intact_attr(my_adat[, -303L])   # doesn't break atts; TRUE
attributes(my_adat)$Col.Meta$Target <- NULL    # break attributes
is_intact_attr(my_adat)  # FALSE (Target missing)

Test `AptName` Format

Description

Test whether an object is in the new seq.XXXX.XX format.

Usage

is_seqFormat(x)

Arguments

x

The object to be tested.

Value

A logical indicating whether x contains AptNames consistent with the new format, beginning with a seq. prefix.

Author(s)

Stu Field, Eduardo Tabacman

Examples

# character S3 method
is_seqFormat(names(example_data))   # no; meta data not ^seq.
is_seqFormat(tail(names(example_data), -20L))   # yes

# soma_adat S3 method
is_seqFormat(example_data)

Lift an ADAT Between Assay Versions

Description

The SomaScan platform continually improves its technical processes between assay versions. The primary change of interest is content expansion, and other protocol changes may be implemented including: changing reagents, liquid handling equipment, and well volumes.

Table of SomaScan assay versions:

Version	Commercial Name	Size
`V4`	5k	5284
`v4.1`	7k	7596
`v5.0`	11k	11083

However, for a given analyte, these technical upgrades can result in minute measurement signal differences, requiring a calibration (aka "lifting" or "bridging") to bring RFUs into a comparable signal space. This is accomplished by applying an analyte-specific scalar, a linear transformation, to each analyte RFU measurement (column). If you have an annotations file (⁠*.xlsx⁠) and wish to examine the bridging scalars themselves, please see read_annotations().

Lifting between SomaScan versions no longer requires an annotations file containing lifting scalars. We now enable users to pass a bridge parameter, indicating the direction of the bridge. For example, to "lift" between ⁠11k⁠ -> ⁠7k⁠, you must be acting on SomaScan data in ⁠11k⁠ RFU space and would pass bridge = "11k_to_7k". Likewise, ⁠7k⁠ -> ⁠5k⁠ requires bridge = "7k_to_5k". Lastly, you may also lift directly from ⁠11k⁠ -> ⁠5k⁠ (aka "double-bridge") with bridge = "11k_to_5k". See below for all options for the bridge argument.

Usage

lift_adat(
  adat,
  bridge = c("11k_to_7k", "11k_to_5k", "7k_to_11k", "7k_to_5k", "5k_to_11k", "5k_to_7k"),
  anno.tbl = deprecated()
)

is_lifted(adat)

Arguments

adat

A soma_adat object (with intact attributes), typically created using read_adat().

bridge

The direction of the lift (i.e. bridge).

anno.tbl

. Please now use the bridge argument.

Details

Matched samples across assay versions are used to calculate bridging scalars. For each analyte, this scalar is computed as the ratio of population medians across assay versions. Please see the lifting vignette vignette("lifting-and-bridging", package = "SomaDataIO") for more details.

Value

lift_adat(): A "lifted" soma_adat object corresponding to the scaling requested in the bridge parameter. RFU values are rounded to 1 decimal place to match standard SomaScan delivery format.

is_lifted(): Logical. Whether the RFU values in a soma_adat have been lifted from its original signal space to a new signal space.

Lin's CCC

The Lin's Concordance Correlation Coefficient (CCC) is calculated by computing the correlation between post-lift RFU values and the RFU values generated on the original SomaScan version. This CCC estimate is a measure of how well an analyte can be bridged across SomaScan versions. See vignette("lifting-and-bridging", package = "SomaDataIO"). As with the lifting scalars, if you have an annotations file you may view the analyte-specific CCC values via read_annotations(). Alternatively, getSomaScanLiftCCC() retrieves these values from an internal object for both "serum" and "plasma".

Analyte Setdiff

Newer versions of SomaScan typically have additional content, i.e. new reagents added to the multi-plex assay that bind to additional proteins. When lifting to a previous SomaScan version, new reagents that do not exist in the "earlier" assay version assay are scaled by 1.0, and thus maintained, unmodified in the returned object. Users may need to drop these columns in order to combine these data with a previous study from an earlier SomaScan version, e.g. with collapseAdats().
In the inverse scenario, lifting "forward" from a previous, lower-plex version, there will be extra reference values that are unnecessary to perform the lift, and a warning is triggered. The resulting data consists of RFU data in the "new" signal space, but with fewer analytes than would otherwise be expected (e.g. ⁠11k⁠ space with only 5284 analytes; see example below).

References

Lin, Lawrence I-Kuei. 1989. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics. 45:255-268.

Examples

# `example_data` is SomaScan (V4, 5k)
adat <- head(example_data, 3L)
dim(adat)

getSomaScanVersion(adat)

getSignalSpace(adat)

# perform 'lift'
lift_11k <- lift_adat(adat, "5k_to_11k")  # warning

is_lifted(lift_11k)

dim(lift_11k)

# attributes updated to reflect the 'lift'
attr(lift_11k, "Header")$HEADER$SignalSpace

attr(lift_11k, "Header")$HEADER$ProcessSteps

Load ADAT files as a list

Description

Load a series of ADATs and return a list of soma_adat objects, one for each ADAT file. collapseAdats() concatenates a list of ADATs from loadAdatsAsList(), while maintaining the relevant attribute entries (mainly the HEADER element). This makes writing out the final object possible without the loss of HEADER information.

Usage

loadAdatsAsList(files, collapse = FALSE, verbose = interactive(), ...)

collapseAdats(x)

Arguments

files

A character string of files to load.

collapse

Logical. Should the resulting list of ADATs be collapsed into a single ADAT object?

verbose

Logical. Should the function call be run in verbose mode.

...

Additional arguments passed to read_adat().

x

A list of soma_adat class objects returned from loadAdatsAsList().

Details

Note 1:

The default behavior is to "vertically bind" (rbind()) on the intersect of the column variables, with unique columns silently dropped.

Note 2:

If "vertically binding" on the column union is desired, use dplyr::bind_rows(), however this results in NAs in non-intersecting columns. For many files with little variable intersection, a sparse RFU-matrix will result (and will likely break ADAT attributes):

adats <- loadAdatsAsList(files)
union_adat <- dplyr::bind_rows(adats, .id = "SourceFile")

Value

A list of ADATs named by files, each a soma_adat object corresponding to an individual file in files. For collapseAdats(), a single, collapsed soma_adat object.

Author(s)

Stu Field

Examples

# only 1 file in directory
dir(system.file("extdata", package = "SomaDataIO"))

files <- system.file("extdata", package = "SomaDataIO") |>
  dir(pattern = "[.]adat$", full.names = TRUE) |> rev()

adats <- loadAdatsAsList(files)
class(adats)

# collapse into 1 ADAT
collapsed <- collapseAdats(adats)
class(collapsed)

# Alternatively use `collapse = TRUE`

  loadAdatsAsList(files, collapse = TRUE)

Merge Clinical Data into SomaScan

Description

Occasionally, additional clinical data is obtained after samples have been submitted to SomaLogic, or even after 'SomaScan' results have been delivered. This requires the new clinical variables, i.e. non-proteomic, data to be merged with 'SomaScan' data into a "new" ADAT prior to analysis. merge_clin() easily merges such clinical variables into an existing soma_adat object and is a simple wrapper around dplyr::left_join().

Usage

merge_clin(x, clin_data, by = NULL, by_class = NULL, ...)

Arguments

x

A soma_adat object (with intact attributes), typically created using read_adat().

clin_data

One of 2 options:

a data frame containing clinical variables to merge into x, or
a path to a file, typically a ⁠*.csv⁠, containing clinical variables to merge into x.

by

A character vector of variables to join by. See dplyr::left_join() for more details.

by_class

If clin_data is a file path, a named character vector of the variable and its class. This ensures the "by-key" is compatible for the join. For example, c(SampleId = "character"). See read.table() for details about its colClasses argument, and also the examples below.

...

Additional parameters passed to dplyr::left_join().

Details

This functionality also exists as a command-line tool (R script) contained in merge_clin.R that lives in the cli/merge system file directory. Please see:

dir(system.file("cli/merge", package = "SomaDataIO"), full.names = TRUE)
vignette("cli-merge-tool", package = "SomaDataIO")

Value

A soma_adat with new clinical variables merged.

Author(s)

Stu Field

Examples

# retrieve clinical data
clin_file <- system.file("cli/merge", "meta.csv",
                         package = "SomaDataIO",
                         mustWork = TRUE)
clin_file

# view clinical data to be merged:
# 1) `group`
# 2) `newvar`
clin_df <- read.csv(clin_file, colClasses = c(SampleId = "character"))
clin_df

# create mini-adat
apts <- withr::with_seed(123, sample(getAnalytes(example_data), 2L))
adat <- head(example_data, 9L) |>   # 9 x 2
  dplyr::select(SampleId, all_of(apts))

# merge clinical variables
merged <- merge_clin(adat, clin_df, by = "SampleId")
merged

# Alternative syntax:
#   1) pass file path
#   2) merge on different variable names
#   3) convert join type on-the-fly
clin_file2 <- system.file("cli/merge", "meta2.csv",
                          package = "SomaDataIO",
                          mustWork = TRUE)

id_type <- typeof(adat$SampleId)
merged2 <- merge_clin(adat, clin_file2,                # file path
                      by = c(SampleId = "ClinKey"),    # join on 2 variables
                      by_class = c(ClinKey = id_type)) # match types
merged2

Common Parameters in SomaDataIO

Description

The parameters below are commonly used throughout the SomaDataIO package.

Arguments

adat

A soma_adat object (with intact attributes), typically created using read_adat().

x

A soma_adat object (with intact attributes), typically created using read_adat().

matrix

Character. A string of (usually) either "serum" or "plasma".

Value

A soma_adat class object.

SomaLogic ADAT parser

Description

Parses the header section of an ADAT file.

Usage

parseHeader(file)

Arguments

file

Character. The elaborated path and file name of the ⁠*.adat⁠ file to be loaded into an R workspace environment.

Value

A list of relevant file information required by read_adat() in order to complete loading the ADAT file, including:

Header.Meta

list of notes and other information about the adat

Col.Meta

list of vectors that contain the column meta data about individual analytes, includes information about the target name and calibration and QC ratios

file_specs

list of values of the file parsing specifications

row_meta

character vector of the clinical variables; assay information that is included in the adat output along with the RFU data

Author(s)

Stu Field

Examples

f <- system.file("extdata", "example_data10.adat",
                 package = "SomaDataIO", mustWork = TRUE)
header <- parseHeader(f)
names(header)

header$Header.Meta

header$file_specs

header$row_meta

head(as.data.frame(header$Col.Meta))

Convert to Long Format

Description

Utility to convert an ExpressionSet class object from the "wide" data format to the "long" format via tidyr::pivot_longer(). The Biobase package is required for this function.

Usage

pivotExpressionSet(eSet)

meltExpressionSet(eSet)

Arguments

eSet

An ExpressionSet class object, created using adat2eSet().

Value

A tibble consisting of the long format conversion of an ExpressionSet object.

Functions

meltExpressionSet(): . Please now use pivotExpressionSet().

Author(s)

Stu Field

Examples


# subset into a reduced mini-ADAT object
# 10 samples (rows)
# 5 clinical variables and 3 features (cols)
sub_adat <- example_data[1:10, c(1:5, 35:37)]
ex_set   <- adat2eSet(sub_adat)

# convert ExpressionSet object to long format
adat_long <- pivotExpressionSet(ex_set)

Plot Image Maps

Description

Plotting function for objects of the outlier_map class. Produces a heatmap-style image using ggplot2 syntax, for objects produced by calcOutlierMap().

Usage

## S3 method for class 'Map'
plot(
  x,
  color.scheme = NULL,
  legend.ticks = 7,
  gridlines = NULL,
  gridlinecol = "red",
  gridlinelwd = 0.5,
  gridlinelty = 2,
  main = NULL,
  y.lab = NULL,
  x.lab = NULL,
  flags = NULL,
  legend.width = 1,
  legend.height = 2,
  filename = NULL,
  plot.width = 14,
  plot.height = 8,
  plot.scale = 1,
  ...
)

Arguments

x

An object of class: outlier_map

color.scheme

Which color scheme to use. Typical choices include:

legend.ticks

How many ticks to place on the color legend.

gridlines

Numeric vector or logical. Indicates where to draw the horizontal grid lines that can be used to separate samples (rows). This should be a vector of the cumulative sum of the horizontal lines to be drawn, typically something like cumsum(table(data$Sex)). Alternatively, TRUE can be passed whereby the lines are determined by the "class.tab" element of x$class.tab (if possible).

gridlinecol

Color of the gridlines.

gridlinelwd

Width of the gridlines.

gridlinelty

Line type of the gridlines.

main

Character. Main title for the plot. See ggplot2::ggtitle() for ggplot2 style graphics.

y.lab

Character. Optional string for the y-axis. Otherwise one is automatically generated (default).

x.lab

Character. Optional string for the x-axis. Otherwise one is automatically generated (default).

flags

legend.width

Width for the color legend.

legend.height

Height for the color legend.

filename

Optional. If provided, the plot will be written to a file. The file name must also include the desired file type extension; this will be used to determine the file type, e.g. a file named foo.png will be saved as a PNG. See ggplot2::ggsave() for a full list of file type (device) options.

plot.width

If "filename != NULL", the width of the plot image file.

plot.height

If "filename != NULL", the height of the plot image file.

plot.scale

If "filename != NULL", the scale of the plot image file.

...

Arguments required by the plot() generic. Currently unused.

Value

Plot an image of the passed matrix.

Author(s)

Stu Field, Amanda Hiser

Examples

example_data |>
  dplyr::filter(SampleType == "Sample") |>
  head(10) |>
  calcOutlierMap() |>
  plot(flags = 0.05)

Pre-Process an ADAT Object for Analysis

Description

Pre-process an ADAT file containing raw analyte RFU values in preparation for analysis. For more details please refer to the pre-processing how-to article

Usage

preProcessAdat(
  adat,
  filter.features = TRUE,
  filter.controls = TRUE,
  filter.qc = TRUE,
  filter.outliers = FALSE,
  data.qc = NULL,
  log.10 = FALSE,
  center.scale = FALSE
)

Arguments

adat

A soma_adat object created using read_adat(), including SeqId columns (seq.xxxxx.xx) containing raw RFU values.

filter.features

Logical. Should non-human protein features (SeqIds) be dropped? Default is TRUE.

filter.controls

Logical. Should SomaScan technical control samples be dropped? If TRUE, this retains all samples where SampleType = "Sample" (study samples) and discards all others including buffer, calibrator, and QC control samples. Default is TRUE.

filter.qc

Logical. If TRUE only samples that pass default normalization acceptance criteria will be retained. Default is TRUE.

filter.outliers

Logical. Should the adat object drop outlier samples? An outlier sample is defined by >= 5% of filtered SeqIds exceeding +/- 6 MAD and 5x fold-change from the median signal. This filter is typically appropriate for studies on plasma, serum, and other biological matrices generally exhibiting homeostatic characteristics. For studies on matrices such as tissue homogenate, cell culture, or study designs containing client-provided background lysis buffer controls (or similar), this filter will likely not be appropriate. Default is FALSE. If set to TRUE it is highly recommended that filter.controls is also set to TRUE

data.qc

Character. Character vector of variable names for which data QC plots are desired. Default is NULL, which does not generate any QC plots. Note: These plots are for visual inspection only, no samples or features are dropped from the output soma_adat object.

log.10

Logical. Should the RFU values be log10 transformed? Default is FALSE.

center.scale

Logical. Should the RFU values be Z-transformed (centered and scaled)? Default is FALSE. If set to set to TRUE it is highly recommended that log.10 is also set to TRUE

Details

The soma_adat object is pre-processed with the following steps:

Filter features -> down to human protein analytes
Filter samples -> by the following order and criteria: a) Retain study samples only (dropping buffer, calibrator, and QC samples) b) Only those that pass default normalization acceptance criteria c) Those not identified as outliers.
Data QC -> plots of normalization scale factors by clinical covariates
Transformations -> log10, center, and scale analyte RFU values

Value

A soma_adat object.

Author(s)

Caleb Scheidel

Examples

preProcessAdat(example_data, data.qc = c("Age", "Sex"))

Read (Load) SomaLogic ADATs

Description

The parse and load a ⁠*.adat⁠ file as a data.frame-like object into an R workspace environment. The class of the returned object is a soma_adat object.

read.adat() is . For backward compatibility it will likely never go away completely, but you are strongly encouraged to shift your code to use read_adat().

is.soma_adat() checks whether an object is of class soma_adat. See inherits().

Usage

read_adat(file, debug = FALSE, verbose = getOption("verbose"), ...)

read.adat(file, debug = FALSE, verbose = getOption("verbose"), ...)

is.soma_adat(x)

Arguments

file

Character. The elaborated path and file name of the ⁠*.adat⁠ file to be loaded into an R workspace.

debug

Logical. Used for debugging and development of an ADAT that fails to load, particularly out-of-spec, poorly modified, or legacy ADATs.

verbose

Logical. Should the function call be run in verbose mode, printing relevant diagnostic call information to the console.

...

Additional arguments passed ultimately to read.delim(), or additional arguments passed to either other S3 print or summary methods as required by those generics.

x

An R object to test.

Value

A data.frame-like object of class soma_adat consisting of SomaLogic RFU (feature) data and clinical meta data as columns, and samples as rows. Row names are labeled with the unique ID "SlideId_Subarray" concatenation. The sections of the ADAT header (e.g., "Header.Meta", "Col.Meta", ...) are stored as attributes (e.g. attributes(x)$Header.Meta).

Logical. Whether x inherits from class soma_adat.

Author(s)

Stu Field

Examples

# path to *.adat file
# replace with your file path
adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
adat_path

my_adat <- read_adat(adat_path)

is.soma_adat(my_adat)

Import a SomaLogic Annotations File

Description

Import a SomaLogic Annotations File

Usage

read_annotations(file)

Arguments

file

A path to an annotations file location. This is a sanctioned, versioned file provided by Standard BioTools, Inc. and should be an unmodified ⁠*.xlsx⁠ file.

Value

A tibble containing analyte-specific annotations and related (e.g. lift/bridging) information, keyed on SomaLogic SeqId, the unique SomaScan analyte identifier.

Examples

## Not run: 
  # for example
  file <- "~/Downloads/SomaScan_11K_Annotated_Content.xlsx"
  anno_tbl <- read_annotations(file)

## End(Not run)

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

dplyr: anti_join, arrange, count, filter, full_join, group_by, inner_join, left_join, mutate, rename, right_join, sample_frac, sample_n, semi_join, slice, slice_sample, ungroup
tidyr: separate, unite

Helpers for Working With Row Names

Description

Easily move row names to a column and vice-versa without the unwanted side-effects to object class and attributes. Drop-in replacement for tibble::rownames_to_column() and tibble::column_to_rownames() which can have undesired side-effects to complex object attributes. Does not import any external packages, modify the environment, or change the object (other than the desired column). When using col2rn(), if explicit row names exist, they are overwritten with a warning. add_rowid() does not affect row names, which differs from tibble::rowid_to_column().

Usage

rn2col(data, name = ".rn")

col2rn(data, name = ".rn")

has_rn(data)

rm_rn(data)

set_rn(data, value)

add_rowid(data, name = ".rowid")

Arguments

data

An object that inherits from class data.frame. Typically a soma_adat class object.

name

Character. The name of the column to move.

value

Character. The new set of names for the data frame. If duplicates exist they are modified on-the-fly via make.unique().

Value

All functions attempt to return an object of the same class as the input with fully intact and unmodified attributes (aside from those required by the desired action). has_rn() returns a scalar logical.

Functions

rn2col(): moves the row names of data to an explicit column whether they are explicit or implicit.
col2rn(): is the inverse of rn2col(). If row names exist, they will be overwritten (with warning).
has_rn(): returns a boolean indicating whether the data frame has explicit row names assigned.
rm_rn(): removes existing row names, leaving only "implicit" row names.
set_rn(): sets (and overwrites) existing row names for data frames only.
add_rowid(): adds a sequential integer row identifier; starting at 1:nrow(data). It does not remove existing row names currently, but may in the future (please code accordingly).

Examples

df <- data.frame(a = 1:5, b = rnorm(5), row.names = LETTERS[1:5])
df
rn2col(df)              # default name is `.rn`
rn2col(df, "AptName")   # pass `name =`

# moving columns
df$mtcars <- sample(names(mtcars), 5)
col2rn(df, "mtcars")   # with a warning

# Move back and forth easily
# Leaves original object un-modified
identical(df, col2rn(rn2col(df)))

# add "id" column
add_rowid(mtcars)

# remove row names
has_rn(mtcars)
mtcars2 <- rm_rn(mtcars)
has_rn(mtcars2)

The `soma_adat` Class and S3 Methods

Description

The soma_adat data structure is the primary internal R representation of SomaScan data. A soma_adat is automatically created via read_adat() when loading a ⁠*.adat⁠ text file. It consists of a data.frame-like object with leading columns as clinical variables and SomaScan RFU data as the remaining variables. Two main attributes corresponding to analyte and SomaScan run information contained in the ⁠*.adat⁠ file are added:

Header.Meta: information about the SomaScan run, see parseHeader() or attr(x, "Header.Meta")
Col.Meta: annotations information about the SomaScan reagents/analytes, see getAnalyteInfo() or attr(x, "Col.Meta")
file_specs: parsing specifications for the ingested ⁠*.adat⁠ file
row_meta: the names of the non-RFU fields. See getMeta().

See groupGenerics() for a details on Math(), Ops(), and Summary() methods that dispatch on class soma_adat.

See reexports() for a details on re-exported S3 generics from other packages (mostly dplyr and tidyr) to enable S3 methods to be dispatched on class soma_adat.

Below is a list of all currently available S3 methods that dispatch on the soma_adat class:

#>  [1] [              [[             [[<-           [<-           
#>  [5] ==             $              $<-            anti_join     
#>  [9] arrange        count          filter         full_join     
#> [13] getAdatVersion getAnalytes    getMeta        group_by      
#> [17] inner_join     is_seqFormat   left_join      Math          
#> [21] median         merge          mutate         Ops           
#> [25] print          rename         right_join     row.names<-   
#> [29] sample_frac    sample_n       select         semi_join     
#> [33] separate       slice_sample   slice          summary       
#> [37] Summary        transform      ungroup        unite         
#> see '?methods' for accessing help and source code

The S3 print() method returns summary information parsed from the object attributes, if present, followed by a dispatch to the tibble::tibble() print method. Rownames are printed as the first column in the print method only.

The S3 summary() method returns the following for each column of the ADAT object containing SOMAmer data (clinical meta data is excluded):

Target (if available)
Minimum value
1st Quantile
Median
Mean
3rd Quantile
Maximum value
Standard deviation
Median absolute deviation (mad())
Interquartile range (IQR())

The S3 Extract() method is used for sub-setting a soma_adat object and relies heavily on the [ method that maintains the soma_adat attributes intact and subsets the Col.Meta so that it is consistent with the newly created object.

S3 extraction via $ is fully supported, however, as opposed to the data.frame method, partial matching is not allowed for class soma_adat.

S3 extraction via [[ is supported, however, we restrict the usage of [[ for soma_adat. Use only a numeric index (e.g. 1L) or a character identifying the column (e.g. "SampleID"). Do not use ⁠[[i,j]]⁠ syntax with [[, use [ instead. As with $, partial matching is not allowed.

S3 assignment via [ is supported for class soma_adat.

S3 assignment via $ is fully supported for class soma_adat.

S3 assignment via [[ is supported for class soma_adat.

S3 median() is not currently supported for the soma_adat class, however a dispatch is in place to direct users to alternatives.

Usage

## S3 method for class 'soma_adat'
print(x, show_header = FALSE, ...)

## S3 method for class 'soma_adat'
summary(object, tbl = NULL, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'soma_adat'
x[i, j, drop = TRUE, ...]

## S3 method for class 'soma_adat'
x$name

## S3 method for class 'soma_adat'
x[[i, j, ..., exact = TRUE]]

## S3 replacement method for class 'soma_adat'
x[i, j, ...] <- value

## S3 replacement method for class 'soma_adat'
x$i, j, ... <- value

## S3 replacement method for class 'soma_adat'
x[[i, j, ...]] <- value

## S3 method for class 'soma_adat'
median(x, na.rm = FALSE, ...)

Arguments

x, object

A soma_adat class object.

show_header

Logical. Should all the ⁠Header Data⁠ information be displayed instead of the data frame (tibble) object?

...

Ignored.

tbl

An annotations table. If NULL (default), annotation information is extracted from the object itself (if possible). Alternatively, the result of a call to getAnalyteInfo(), from which Target names can be extracted.

digits

Integer. Used for number formatting with signif().

i, j

Row and column indices respectively. If j is omitted, i is used as the column index.

drop

Coerce to a vector if fetching one column via tbl[, j]. Default FALSE, ignored when accessing a column via tbl[j].

name

A name or a string.

exact

Ignored with a warning().

value

A value to store in a row, column, range or cell.

na.rm

a logical value indicating whether NA values should be stripped before the computation proceeds.

Value

The set of S3 methods above return the soma_adat object with the corresponding S3 method applied.

Examples

# S3 print method
example_data

# show the header info (no RFU data)
print(example_data, show_header = TRUE)

# S3 summary method
# MMP analytes (4)
mmps <- c("seq.2579.17", "seq.2788.55", "seq.2789.26", "seq.4925.54")
mmp_adat <- example_data[, c("Sex", mmps)]
summary(mmp_adat)

# Summarize by group
mmp_adat |>
  split(mmp_adat$Sex) |>
  lapply(summary)

# Alternatively pass annotations with Target info
anno <- getAnalyteInfo(mmp_adat)
summary(mmp_adat, tbl = anno)

Scale Transform `soma_adat` Columns/Rows

Description

Scale the i-th row or column of a soma_adat object by the i-th element of a vector. Designed to facilitate linear transformations of only the analyte/RFU entries by scaling the data matrix. If scaling the analytes/RFU (columns), v must have getAnalytes(adat, n = TRUE) elements. If scaling the samples (rows), v must have ⁠nrow(_data)⁠ elements.

Usage

## S3 method for class 'soma_adat'
transform(`_data`, v, dim = 2L, ...)

Arguments

_data

A soma_adat object.

v

A numeric vector of the appropriate length corresponding to dim.

dim

Integer. The dimension to apply elements of v to. 1 = rows; 2 = columns (default).

...

Currently not used but required by the S3 generic.

Details

Performs the following operations (quickly):

Columns:

M_{nxp} = A_{nxp} * diag(v)_{pxp}

Rows:

M_{nxp} = diag(v)_{nxn} * A_{nxp}

Value

A modified value of ⁠_data⁠ with either the rows or columns linearly transformed by v.

Note

This method in intentionally naive, and assumes the user has ordered v to match the columns/rows of ⁠_data⁠ appropriately. This must be done upstream.

Examples

# simplified example of underlying operations
M <- matrix(1:12, ncol = 4)
M

v <- 1:4
M %*% diag(v)    # transform columns

v <- 1:3
diag(v) %*% M    # transform rows

# dummy ADAT example:
v    <- c(2, 0.5)     # double seq1; half seq2
adat <- data.frame(sample      = paste0("sample_", 1:3),
                   seq.1234.56 = c(1, 2, 3),
                   seq.9999.88 = c(4, 5, 6) * 10)
adat

# `soma_adat` to invoke S3 method dispatch
class(adat) <- c("soma_adat", "data.frame")
trans <- transform(adat, v)
data.frame(trans)

Write an ADAT to File

Description

One can write an existing modified internal ADAT (soma_adat R object) to an external file. However the ADAT object itself must have intact attributes, see is_intact_attr().

Usage

write_adat(x, file)

Arguments

x

A soma_adat object (with intact attributes), typically created using read_adat().

file

Character. File path where the object should be written. For example, extensions should be ⁠*.adat⁠.

Details

The ADAT specification no longer requires Windows end of line (EOL) characters (⁠"\r\n"⁠). The current EOL spec is ⁠"\n"⁠ which is commonly used in POSIX systems, like MacOS and Linux. Since the EOL affects the resulting checksum, ADATs written on other systems generate slightly differing files. Standardizing to ⁠"\n"⁠ attempts to solve this issue. For reference, see the EOL encoding for operating systems below:

Symbol	Platform	Character
LF	Linux	`⁠"\n"⁠`
CR	MacOS	`⁠"\r"⁠`
CRLF	DOS/Windows	`⁠"\r\n"⁠`

Value

Invisibly returns the input x.

Author(s)

Stu Field

Examples

# trim to 1 sample for speed
adat_out <- head(example_data, 1L)

# attributes must(!) be intact to write
is_intact_attr(adat_out)

write_adat(adat_out, file = tempfile(fileext = ".adat"))

SomaDataIO: Input/Output 'SomaScan' Data

Description

Details

Author(s)

See Also

Examples

Pipe operator

Description

Usage

Arguments

Value

Analyte Annotations, Col.Meta, and Row Info

Description

Col Meta (Analyte Annotations)

Row Meta (Sample Annotations)

Examples

Working with SomaLogic SeqIds

Description

Usage

Arguments

Details

Value

Functions

Author(s)

See Also

Examples

Deprecated function(s) of the SomaDataIO package

Description

Details

Example Data and Objects

Description

Format

Data Description

Data Processing

Source

Examples

Helpers to Extract Information from an ADAT

Description

Usage

Arguments

Value

Author(s)

References

Examples

Convert ADAT to ExpressionSet Object

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Add Attributes to soma_adat Objects

Description

Usage

Arguments

Value

Author(s)

See Also

Add a Class to an Object

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Calculate MAD Outlier Map

Description

Usage

Arguments

Details

Value

Functions

Author(s)

See Also

Examples

Calculate Estimated Limit of Detection (eLOD)

Add Attributes to `soma_adat` Objects

Group Generics for `soma_adat` Class Objects