Help for package PTXQC

Type:

Package

Title:

Quality Report Generation for MaxQuant and mzTab Results

Version:

1.1.3

Date:

2025-07-15

Description:

Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>.

SystemRequirements:

pandoc (http://pandoc.org) for building Vignettes and output reports as HTML

Depends:

R (≥ 3.3.0)

Imports:

data.table, ggplot2 (≥ 3.4), ggdendro, grid, gridExtra, grDevices, gtable, htmlTable, knitr (≥ 1.10), magrittr, methods, plyr, R6, R6P, RColorBrewer, reshape2, rlang, rmarkdown, rmzqc (≥ 0.5.0), seqinr, stats, utils, UpSetR, xml2, yaml

Suggests:

testthat

VignetteBuilder:

knitr

License:

BSD_3_clause + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.2

URL:

https://github.com/cbielow/PTXQC

BugReports:

https://github.com/cbielow/PTXQC/issues

NeedsCompilation:

Packaged:

2025-07-15 09:30:55 UTC; bielow

Author:

Chris Bielow [aut, cre], Juliane Schmachtenberg [ctb], Swenja Wagner [ctb], Patricia Scheil [ctb], Tom Waschischek [ctb], Guido Mastrobuoni [dtc, rev]

Maintainer:

Chris Bielow <chris.bielow@bsc.fu-berlin.de>

Repository:

CRAN

Date/Publication:

2025-07-15 12:30:02 UTC

PTXQC: A package for computing Quality Control (QC) metrics for Proteomics (PTX)

Description

The following sections describe the main components:

Input

Valid input data are either the files from MaxQuant's .txt folder (all versions from MaxQuant >= 1.0 upwards are supported) or a single mzTab file. All mzTab files will work, but most metrics can be obtained from OpenMS' mzTab as produced by the QualityControl TOPP tool (from OpenMS 2.5 onwards).

Important functions

The central function of this package is called createReport and it accepts either MaxQuant or mzTab data, along with a configuration (optional). There is a parser for mzTab MzTabReader and MaxQuant txt files MQDataReader, as well as a plethora of QC metrics derived from a common qcMetric class and scoring functions qual..., e.g. qualGaussDev.

Configuration

The user can modify the behaviour of PTXQC, e.g. to enable/disable certain metrics or change scoring thresholds, via a YAML object/file. By default a Yaml file is written automatically side-by-side to the input files upon running PTXQC for the first time on a particular input. A custom Yaml object can be passed to the main createReport function for customization. Use yaml::yaml.load_file(input = 'myYAML.yaml') to load an existing file and pass the Yaml object along.

Output

Either a PDF and/or Html report which contains QC plots and a description of the metrics.

Author(s)

Maintainer: Chris Bielow chris.bielow@bsc.fu-berlin.de

Other contributors:

Juliane Schmachtenberg [contributor]
Swenja Wagner [contributor]
Patricia Scheil [contributor]
Tom Waschischek [contributor]
Guido Mastrobuoni [data contributor, reviewer]

A string concatenation function, more readable than 'paste()'.

Description

A string concatenation function, more readable than 'paste()'.

Usage

a %+% b

Arguments

a

Char vector

b

Char vector

Value

Concatenated string (no separator)

Coefficient of variation (CV)

Description

Computes sd(x) / mean(x)

Usage

CV(x)

Arguments

x

Vector of numeric values

Value

Make sure to call $readMappingFile(some_file) if you want to support a user-defined file mapping. Otherwise, calls to $getShortNames() will create/augment the mapping for filenames.

Description

Make sure to call $readMappingFile(some_file) if you want to support a user-defined file mapping. Otherwise, calls to $getShortNames() will create/augment the mapping for filenames.

Fields

raw_file_mapping: Data.frame with columns 'from', 'to' and maybe 'best.effort' (if shorting was unsuccessful)
mapping.creation: how the current mapping was obtained (user or auto)
external.mapping.file: Filename of user-defined mapping file; only defined if readMappingFile() was called

Methods

getShortNamesStatic(raw.files, max_len, fallbackStartNr = 1)

Static method: Shorten a set of Raw file names and return a data frame with the mappings. Mapping will have: $from, $to and optionally $best.effort (if shorting was unsuccessful and numbers had to be used)

⁠raw.files⁠ Vector of Raw files.
⁠max_len⁠ Maximal length of shortening results, before resorting to canonical names (file 1,...).
⁠fallbackStartNr⁠ Starting index for canonical names.

Return Value

data.frame with mapping.

Examples

a = FilenameMapper$new()
a$readMappingFile('filenamemapping.txt')

Compute longest common substring of two strings.

Description

Implementation is very inefficient (dynamic programming in R) –> use only on small instances

Usage

LCS(s1, s2)

Arguments

s1

String one

s2

String two

Value

String containing the longest common substring

Find longest common substring from 'n' strings.

Description

Warning: greedy heuristic! This is not guaranteed to find the best solution (or any solution at all), since its done pairwise with the shortest input string as reference.

Usage

LCSn(strings, min_LCS_length = 0)

Arguments

strings

A vector of strings in which to search for LCS

min_LCS_length

Minimum length expected. Empty string is returned if the result is shorter

Value

longest common substring (or "" if shorter than min_LCS_length)

Examples

LCSn(c("1_abcde...",
       "2_abcd...",
       "x_abc..."))  ## --> "_abc"
LCSn(c("16_IMU008_CISPLA_E5_R11", 
       "48_IMU008_CISPLA_P4_E7_R31",
       "60_IMU008_CISPLA_E7_R11"), 3) ## -->"_IMU008_CISPLA_"
LCSn(c("AAAAACBBBBB", 
       "AAAAADBBBBB",
       "AAAABBBBBEF",
       "AAABBBBBDGH")) ## -->  "BBBBB"
LCSn(c("AAAXXBBB",
       "BBBXXDDD",
       "XXAAADDD")) ## --> fails due to greedy approach; should be "XX"

S5-RefClass to read MaxQuant .txt files

Description

This class is used to read MQ data tables using MQDataReader::readMQ() while holding the internal raw file –> short raw file name mapping (stored in a member called 'fn_map') and updating/using it every time MQDataReader::readMQ() is called.

Arguments

file

(Relative) path to a MQ txt file.

filter

Searched for "C" and "R". If present, [c]ontaminants and [r]everse hits are removed if the respective columns are present. E.g. to filter both, filter = "C+R"

type

Allowed values are: "pg" (proteinGroups) [default], adds abundance index columns (*AbInd*, replacing 'intensity') "sm" (summary), splits into three row subsets (raw.file, condition, total) "ev" (evidence), will fix empty modified.sequence cells for older MQ versions (when MBR is active) "msms_scans", will fix invalid (negative) scan event numbers Any other value will not add/modify any columns

col_subset

A vector of column names as read by read.delim(), e.g., spaces are replaced by dot already. If given, only columns with these names (ignoring lower/uppercase) will be returned (regex allowed) E.g. col_subset=c("^lfq.intensity.", "protein.name")

add_fs_col

If TRUE and a column 'raw.file' is present, an additional column 'fc.raw.file' will be added with common prefix AND common substrings removed (simplifyNames) E.g. two rawfiles named 'OrbiXL_2014_Hek293_Control', 'OrbiXL_2014_Hek293_Treated' will give 'Control', 'Treated' If add_fs_col is a number AND the longest short-name is still longer, the names are discarded and replaced by a running ID of the form 'file <x>', where <x> is a number from 1 to N. If the function is called again and a mapping already exists, this mapping is used. Should some raw.files be unknown (ie the mapping from the previous file is incomplete), they will be augmented

check_invalid_lines

After reading the data, check for unusual number of NA's to detect if file was corrupted by Excel or alike

LFQ_action

[For type=='pg' only] An additional custom LFQ column ('cLFQ...') is created where zero values in LFQ columns are replaced by the following method IFF(!) the corresponding raw intensity is >0 (indicating that LFQ is erroneusly 0) "toNA": replace by NA "impute": replace by lowest LFQ value >0 (simulating 'noise')

...

Additional parameters passed on to read.delim()

colname

Name of the column (e.g. 'contaminants') in the mq.data table

valid_entries

Vector of values to be replaced (must contain all values expected in the column – fails otherwise)

replacements

Vector of values inserted with the same length as valid_entries.

Details

Since MaxQuant changes capitalization and sometimes even column names, it seemed convenient to have a function which just reads a txt file and returns unified column names, irrespective of the MQ version. So, it unifies access to columns (e.g. by using lower case for ALL columns) and ensures columns are identically named across MQ versions:

 alternative term          new term
 -----------------------------------------
 protease                  enzyme
 protein.descriptions      fasta.headers
 potential.contaminant     contaminant
 mass.deviations           mass.deviations..da.
 basepeak.intensity        base.peak.intensity

We also correct 'reporter.intensity.*' naming issues to MQ 1.6 convention, when 'reporter.intensity.not.corrected' is present. MQ 1.5 uses: reporter.intensity.X and reporter.intensity.not.corrected.X MQ 1.6 uses: reporter.intensity.X and reporter.intensity.corrected.X

Note: you must find a regex which matches both versions, or explicitly add both terms if you are requesting only a subset of columns!

Fixes for msmsScans.txt: negative Scan Event Numbers in msmsScans.txt are reconstructed by using other columns

Automatically detects UTF8-BOM encoding and deals with it (since MQ2.4).

Example of usage:

  mq = MQDataReader$new()
  d_evd = mq$readMQ("evidence.txt", type="ev", filter="R", col_subset=c("proteins", "Retention.Length", "retention.time.calibration"))

If the file is empty, this function shows a warning and returns NULL. If the file is present but cannot be read, the program will stop.

Wrapper to read a MQ txt file (e.g. proteinGroups.txt).

Value

A data.frame of the respective file

Replaces values in the mq.data member with (binary) values. Most MQ tables contain columns like 'contaminants' or 'reverse', whose values are either empty strings or "+", which is inconvenient and can be much better represented as TRUE/FALSE. The params valid_entries and replacements contain the matched pairs, which determine what is replaced with what.

Returns TRUE if successful.

Methods

getInvalidLines()

Detect broken lines (e.g. due to Excel import+export)

When editing a MQ txt file in Microsoft Excel, saving the file can cause it to be corrupted, since Excel has a single cell content limit of 32k characters (see http://office.microsoft.com/en-001/excel-help/excel-specifications-and-limits-HP010342495.aspx) while MQ can easily reach 60k (e.g. in oxidation sites column). Thus, affected cells will trigger a line break, effectively splitting one line into two (or more).

If the table has an 'id' column, we can simply check the numbers are consecutive. If no 'id' column is available, we detect line-breaks by counting the number of NA's per row and finding outliers. The line break then must be in this line (plus the preceeding or following one). Depending on where the break happened we can also detect both lines right away (if both have more NA's than expected).

Currently, we have no good strategy to fix the problem since columns are not aligned any longer, which leads to columns not having the class (e.g. numeric) they should have. (thus one would need to un-do the linebreak and read the whole file again)

[Solution to the problem: try LibreOffice 4.0.x or above – seems not to have this limitation]

@return Returns a vector of indices of broken (i.e. invalid) lines

Class to read an mzTab file and store the tables internally.

Description

The 'sections' field is initialized after $readMzTab was called. The 'fn_map' fields should be initialized via ...$fn_map$readMappingFile(...) manually if user-defined filename mappings are desired and is automatically updated/queried when $readMzTab is called.

Fields

sections: MzTab sections as list. Valid list entries are: "MTD", "PRT", "PEP", "PSM", "SML", "filename" and "comments"
fn_map: FilenameMapper which can translate raw filenames into something shorter

Methods

RTUnitCorrection(dt): Convert all RT columns from seconds (OpenMS default) to minutes (MaxQuant default)
getEvidence(): Basically the PSM table and additionally columns named 'raw.file' and 'fc.raw.file'.
getMSMSScans(identified_only = FALSE): Basically the PSM table (partially renamed columns) and additionally two columns 'raw.file' and 'fc.raw.file'. If identified_only is TRUE, only MS2 scans which were identified (i.e. a PSM) are returned – this is equivalent to msms.txt in MaxQuant.
getParameters(): Converts internal mzTab metadata section to a two column key-value data.frame similar to MaxQuants parameters.txt.
getProteins(): Basically the PRT table ...
getSummary(): Converts internal mzTab metadata section to a two data.frame with columns 'fc.raw.file', 'ms.ms.identified....' similar to MaxQuants summary.txt.
renameColumns(dt, namelist): Renames all columns and throws a warning if a column does not exist in the data

Quality control for proteomics data

Description

Quality control for proteomics data

Define a Singleton class which holds the full raw filenames (+path) and their PSI-MS CV terms for usage in the mzQC metadata

Description

The internal data is filled using, e.g. 'getMetaFilenames()'

Super class

R6P::Singleton -> QCMetaFilenames

Public fields

data: Stores the data of the singleton. Set the data once before using the singleton all over the place

Methods

Public methods

QCMetaFilenames$clone()

Inherited methods

R6P::Singleton$initialize()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

QCMetaFilenames$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Relative standard deviation (RSD)

Description

Simply CV*100

Usage

RSD(x)

Arguments

x

Vector of numeric values

Value

RSD

Return a tree plot with a possible alignment tree.

Description

This allows the user to judge which Raw files have similar corrected RT's (i.e. where aligned successfully). If there are clear sub-clusters, it might be worth introducing artifical fractions into MaxQuant, to avoid ID-transfer between these clusters (use the MBR-Align and MBR-ID-Transfer metrics to support the decision).

Usage

RTalignmentTree(df_evd, col_fraction = c())

Arguments

df_evd

Evidence table containing calibrated retention times and sequence information.

col_fraction

Empty vector or 1-values vector giving the name of the fraction column (if existing)

Details

If the input contains fractions, leaf nodes will be colored accordingly. Distinct sub-clusters should have their own color. If not, MaxQuant's fraction settings should be optimized. Note that introducing fractions in MaxQuant will naturally lead to a clustering here (it's somewhat circular).

Value

ggplot object containing the correlation tree

Compute the fraction of features per Raw file which have an acceptable RT difference after alignment

Description

Using the result from 'alignmentCheck()', score the features of every Raw file and see if they have been properly aligned. Returned value is between 0 (bad) and 1 (all aligned).

Usage

ScoreInAlignWindow(data, allowed.deltaRT = 1)

Arguments

data

A data.frame with columns 'rtdiff' and 'raw.file'

allowed.deltaRT

The allowed matching difference (1 minute by default)

Value

A data.frame with one row for each raw.file and columns 'raw.file' and 'withinRT' (0-1)

Query a YAML object for a certain parameter.

Description

If the object has the param, then return it. If the param is unknown, create it with the given default value and return the default.

Fields

yamlObj: A Yaml object as created by yaml.load

Methods

getYAML(param_name, default, min = NA, max = NA): Query this YAML object for a certain parameter and return its value. If it does not exist it is created with a default value. An optional min/max range can be specified and will be enforced if the value is known (default will be used upon violation).
setYAML(param_name, value): Set a YAML parameter to a certain value. Overwrites the old value or creates a new entry if hithero unknown.
writeYAML(filename): Write YAML config (including some documentation) to a YAML file. Returns TRUE on success (always), unless writing the file generates an error.

Examples

    yc = YAMLClass$new(list())
    val = yc$getYAML("cat$subCat", "someDefault")
    val  ## someDefault
    val = yc$setYAML("cat$subCat", "someValue")
    val  ## someValue
    yc$getYAML("cat$subCat", "someDefault") ## still 'someValue' (since its set already)

Verify an alignment by checking the retention time differences of identical peptides across Raw files

Description

The input is a data frame containing feature evidence with corrected retention times, e.g. a 'calibrated.retention.time' column.

Usage

alignmentCheck(data, referenceFile)

Arguments

data

A data.frame with columns 'calibrated.retention.time', 'id', 'modified.sequence', 'charge', 'raw.file' and 'fraction' (if present)

referenceFile

A raw file name as occuring in data$raw.file, serving as alignment reference (when no fractions are used).

Details

Note that this function must be given real MS/MS identifications only (type "MULTI-MSMS") in order to work correctly!

For each peptide sequence (and charge) in the reference Raw file, this function looks up the already calibrated retention time difference of the same feature in all other files. For every comparison made, we report the RT difference. If alignment worked perfectly, the differences are very small (<1 min).

An 'id' column must be present, to enable mapping the result of this function to the original data frame.

A reference Raw file can be identified using 'findAlignReference()'. If Maxquants experimental design included pre-fractionation, a column named 'fraction' should be given and 'referenceFile' should be empty. This function will pick the one Raw file for each fraction (the first in order) to use as reference. Only the immediately neighbouring fractions will be matched to this reference.

Value

A data.frame containing the RT diff for each feature found in a Raw file and the reference.

Add the value of a variable to an environment (fast append)

Description

The environment must exist, and its name must be given as string literal in 'env_name'! The value of the variable 'v' will be stored under the name given in 'v_name'. If 'v_name' is not given, a variable name will be created by increasing an internal counter and using the its value padded with zeros as name (i.e., "0001", "0002" etc).

Usage

appendEnv(env_name, v, v_name = NULL)

Arguments

env_name

String of the environment variable

v

Value to be inserted

v_name

String used as variable name. Automatically generated if omitted.

Value

Always TRUE

Collects all 'mzQC' members from each entry in lst_qcMetrics and stores them in an overall mzQC object, which can be written to disk (see writeMZQC()) or augmented otherwise

Description

Collects all 'mzQC' members from each entry in lst_qcMetrics and stores them in an overall mzQC object, which can be written to disk (see writeMZQC()) or augmented otherwise

Usage

assembleMZQC(lst_qcMetrics, raw_file_mapping)

Arguments

lst_qcMetrics

A list of qcMetric objects which have their mzQC member populated with "MzQCrunQuality" and/or "MzQCsetQuality" objects

raw_file_mapping

A data.frame with cols 'from', to' and maybe 'best.effort' (if shorting was unsuccessful), as e.g. obtained by a FilenameMapper$raw_file_mapping

Value

An MzQCmzQC object (root object of an mzQC document)

Assign set numbers to a vector of values.

Description

Each set has size set_size (internally optimized using correctSetSize), holding values from 'values'. This gives n such sets and the return value is just the set index for each value.

Usage

assignBlocks(values, set_size = 5, sort_values = TRUE)

Arguments

values

Vector of values

set_size

Number of distinct values allowed in a set

sort_values

Before assigning values to sets, sort the values?

Value

Vector (same length as input) with set numbers

Examples

 #library(PTXQC)
 assignBlocks(c(1:11, 1), set_size = 3, sort_values = FALSE)
 ## --> 1 1 1 2 2 2 3 3 3 4 4 1

Boxplots - one for each condition (=column) in a data frame.

Description

Given a data.frame with two/three columns in long format (name, value, [contaminant]; in that order), each group (given from 1st column) is plotted as a bar. Contaminants (if given) are separated and plotted as yellow bars.

Usage

boxplotCompare(
  data,
  log2 = TRUE,
  ylab = "intensity",
  mainlab = ylab,
  sublab = "",
  boxes_per_page = 30,
  abline = NA,
  coord_flip = TRUE,
  names = NA
)

Arguments

data

Data frame in long format with numerical expression data

log2

Apply log2 to the data (yes/no)

ylab

Label on Y-axis

mainlab

Main title

sublab

Sub title

boxes_per_page

Maximum number of boxplots per plot. Yields multiple plots if more groups are given.

abline

Draw a horizontal green line at the specified y-position (e.g. to indicate target median values)

coord_flip

Exchange Y and X-axis for better readability

names

An optional data.frame(long=.., short=..), giving a renaming scheme (long->short) for the 'name' column

Details

Boxes are shaded: many NA or Inf lead to more transparency. Allows to easily spot sparse groups

Value

List of ggplot objects

Return color brew palettes, but fail hard if number of requested colors is larger than the palette is holding.

Description

Internally calls 'brewer.pal(n, palette)', checking 'n' beforehand.

Usage

brewer.pal.Safe(n = 3, palette = "Set1")

Arguments

n

Number of colours

palette

Name of palette (e.g. "set1")

Value

character vector of colors

Calls FUN on a subset of data in blocks of size 'subset_size' of unique indices.

Description

One subset consists of 'subset_size' unique groups and thus of all rows which in 'data' which have any of these groups. The last subset might have less groups, if the number of unique groups is not dividable by subset_size.

Usage

byX(data, indices, subset_size = 5, FUN, sort_indices = TRUE, ...)

Arguments

data

Data.frame whose subsets to use on FUN

indices

Vector of group assignments, same length as nrow(data)

subset_size

Number of groups to use in one subset

FUN

Function applied to subsets of data

sort_indices

Sort groups (by their sorted character(!) names) before building subsets

...

More arguments to FUN

Details

FUN is applied on each subset.

Value

list of function result (one entry for each subset)

Examples

 byX(data.frame(d=1:10), 1:10, 2, sum)

Same as `byX`, but with more flexible group size, to avoid that the last group has only a few entries (<50% of desired size).

Description

The 'subset_size' param is internally optimized using correctSetSize and then byX is called.

Usage

byXflex(data, indices, subset_size = 5, FUN, sort_indices = TRUE, ...)

Arguments

data

Data.frame whose subset to use on FUN

indices

Vector of group assignments, same length as nrow(data)

subset_size

Ideal number of groups to use in one subset – this can be changed internally, from 75%-150%

FUN

function Applied to subsets of data

sort_indices

Groups are formed by their sorted character(!) names

...

More arguments to FUN

Value

list of function result (one entry for each subset)

Examples

 stopifnot(
   byXflex(data.frame(d=1:10), 1:10, 2, sum, sort_indices = FALSE) ==
   c(3, 7, 11, 15, 19)
 )

When MaxQuant is run with a wrong locale (i.e. the decimal separator is not a '.', but a ','), then MaxQuant results are plainly wrong and broken. The can be detected by, e.g. checking for negative charge annotation

Description

When MaxQuant is run with a wrong locale (i.e. the decimal separator is not a '.', but a ','), then MaxQuant results are plainly wrong and broken. The can be detected by, e.g. checking for negative charge annotation

Usage

checkEnglishLocale(df_evd)

Arguments

df_evd

Evidence table from which we only need the 'charge' column

Combine several data structs into a final picture for segmentation incurred by 'Match-between-runs'.

Description

qMBRSeg_Dist_inGroup might be empty if there are only singlets (transferred and genuine), but then the scores will be pretty boring as well (100

Usage

computeMatchRTFractions(qMBR, qMBRSeg_Dist_inGroup)

Arguments

qMBR

A data.frame as computed by peakSegmentation()

qMBRSeg_Dist_inGroup

A data.frame as computed by inMatchWindow()

Value

A data.frame which details the distribution of singlets and pairs (inRT and outRT) for each Raw file and genuine vs. all

Re-estimate a new set size to split a number of items into equally sized sets.

Description

This is useful for plotting large datasets where multiple pages are needed. E.g. you know that you need 101 barplots, but you only want to fit about 25 per page. Naively one would now do five plots, with the last one only containing a single barplot. Using this function with correctSetSize(101, 25) would tell you to use 26 barplots per page, so you end up with four plots, all roughly equally filled. It also works the other extreme case, where your initial size is chosen slightly too high, e.g. Sets of size 5 for just 8 items is too much, because we can reduce the set size to 4 and still need two sets but now they are much more equally filled (correctSetSize(8, 5) == 4).

Usage

correctSetSize(item_count, initial_set_size)

Arguments

item_count

Known number of items which need to assigned to sets

initial_set_size

Desired number of items a single set should hold

Details

We allow for up to set sizes of 150% from default, to avoid the last set being sparse (we remove it and distribute to the other bins) 150 Once the number of sets is fixed, we distribute all items equally.

E.g. 6 items & initial_set_size=5, would result in 2 bins (5 items, 1 item), but we'd rather have one bin of 6 items or 8 items & initial_set_size=5, would result in 2 bins (5+3 items), since the last set is more than half full, but we'd rather have 4+4

Value

re-estimated set size which a set should hold in order to avoid underfilled sets

Examples

 stopifnot(
   correctSetSize(8, 5) == 4
 )
 stopifnot(
   correctSetSize(101, 25) == 26
 )

Create a quality control report (in PDF format).

Description

This is the main function of the package and the only thing you need to call directly if you are just interested in getting a QC report.

Usage

createReport(
  txt_folder = NULL,
  mztab_file = NULL,
  yaml_obj = list(),
  report_filenames = NULL,
  enable_log = FALSE
)

Arguments

txt_folder

Path to txt output folder of MaxQuant (e.g. "c:/data/Hek293/txt")

mztab_file

Alternative to **txt_folder**, you can provide a single mzTab file which contains PSM, PEP and PRT tables

yaml_obj

A nested list object with configuration parameters for the report. Useful to switch off certain plots or skip entire sections.

report_filenames

Optional list with names (as generated by getReportFilenames). If not provided, will be created internally by calling getReportFilenames.

enable_log

If TRUE all console output (including warnings and errors) is logged to the file given in **report_filenames$log_file**. Note: warnings/errors can only be shown in either the log **or** the console, not both!

Details

You need to provide either a) the folder name of the 'txt' output, as generated by MaxQuant or an mzTab file or b) an mzTab file as generated by the OpenMS QualityControl TOPP tool (other mzTab files will probably not work)

Optionally, provide a YAML configuration object, which allows to (de)activate certain plots and holds other parameters. The yaml_obj is complex and best obtained by running this function once using the default (empty list). A full YAML configuration object will be written in the 'txt' folder you provide and can be loaded using yaml.load.

The PDF and the config file will be stored in the given txt folder.

Value

List with named filename strings, e.g. $yaml_file, $report_file etc..

Note

You need write access to the txt/mzTab folder!

For updates, bug fixes and feedback please visit https://github.com/cbielow/PTXQC.

Creates a yaml file storing the parameters that are used for creating the PTXQC report and returns these parameters as well as a list of available qc-Metrics objects.

Description

Valid parameters are: param_useMQPAR, add_fs_col, id_rate_bad, id_rate_great , pg_ratioLabIncThresh , param_PG_intThresh, param_EV_protThresh , param_EV_intThresh, param_EV_pepThresh , yaml_contaminants, param_EV_MatchingTolerance, param_evd_mbr , param_EV_PrecursorTolPPM, param_EV_PrecursorOutOfCalSD , param_EV_PrecursorTolPPMmainSearch, param_MSMSScans_ionInjThresh, param_OutputFormats and param_PageNumbers

Please provide them as a list() of this format: list$parameter_name

Usage

createYaml(
  yc,
  param = list(),
  DEBUG_PTXQC = FALSE,
  txt_files = NULL,
  metrics = NULL
)

Arguments

yc

A yaml class object created by YAMLClass$new()

param

list of parameters sorted by names; if empty, will be populated with defaults

DEBUG_PTXQC

print some debugging information; default FALSE

txt_files

list of paths to MaxQuant files; if NULL, it is assumed that the parameters are for mzTab-mode

metrics

list of metric names that should be plotted; if NULL, will be populated with defaults

Value

list of parameters used for creating the report and list of qc-Metrics objects

Make a color (given as name or in RGB) darker by factor x = [0 = black, 1=unchanged]

Description

Make a color (given as name or in RGB) darker by factor x = [0 = black, 1=unchanged]

Usage

darken(color, factor = 0.8)

Arguments

color

A color as understood by col2rgb

factor

Between 0 (make black) and 1 (leave color as is)

Value

darkened color

Replace 0 with NA in a vector

Description

Replace 0 with NA in a vector

Usage

del0(x)

Arguments

x

A numeric vector

Value

Vector of same size as 'x', with 0's replaced by NA

Removes the longest common prefix (LCP) from a vector of strings.

Description

You should provide only unique strings (to increase speed). If only a single string is given, the empty string will be returned unless minOutputLength is set.

Usage

delLCP(x, min_out_length = 0, add_dots = FALSE)

Arguments

x

Vector of strings with common prefix

min_out_length

Minimal length of the shortest element of x after LCP removal [default: 0, i.e. empty string is allowed] . If the output would be shorter, the last part of the LCP is kept.

add_dots

Prepend output with '..' if shortening was done.

Value

Shortened vector of strings

Examples

  delLCP(c("TK12345_H1"), min_out_length=0)
  ## ""

  delLCP(c("TK12345_H1"), min_out_length=4)
  ## "5_H1"

  delLCP(c("TK12345_H1"), min_out_length=4, add_dots = TRUE)
  ## "..5_H1"

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4)
  ## "5_H1" "5_H2"

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4, add_dots = TRUE)
  ## "..5_H1" "..5_H2"

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8)
  ## "12345_H1", "12345_H2"

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8, add_dots = TRUE)
  ## "TK12345_H1", "TK12345_H2" (unchanged, since '..' would add another two)

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60)
  ## "TK12345_H1", "TK12345_H2" (unchanged)

  delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60, add_dots = TRUE)
  ## "TK12345_H1", "TK12345_H2" (unchanged)

Removes the longest common suffix (LCS) from a vector of strings.

Description

Removes the longest common suffix (LCS) from a vector of strings.

Usage

delLCS(x)

Arguments

x

Vector of strings with common suffix

Value

Shortened vector of strings

Examples

delLCS(c("TK12345_H1"))                     ## ""
delLCS(c("TK12345_H1", "TK12345_H2"))       ## "TK12345_H1" "TK12345_H2" 
delLCS(c("TK12345_H1", "TK12!45_H1"))       ## "TK123"    "TK12!"

Return list of raw file names which were reported by MaxQuant as reference point for alignment.

Description

There is only one reference point which has '0' in 'retention.time.calibration' column in evidence.txt as corrected RT. This is true for most MaxQuant versions and also true for fractions. However, some evidence.txt files show 0.03 as an averaged minimum per Raw file. We use the raw.file with the smallest average as reference.

Usage

findAlignReference(data)

Arguments

data

The data.frame with columns 'retention.time.calibration' and 'raw.file'

Details

Note that MaxQuant uses a guide tree to align the Raw files, so the order of files does not influence the alignment. But the first file will always be used as reference point when reporting delta-RTs. And this file is also used by PTXQC as reference file vs all other files to find the real calibration function (see alignmentCheck()).

This function might return multiple raw file names (if MQ decides to change its mind at some point in the future). In this case the result should be treated with caution or (better) regarded as failure.

Value

List of reference raw files (usually just one)

Detect (and fix) MaxQuant mass recalibration columns, since they sometimes report wrong values.

Description

Returns a list of items for both diagnostics and possibly a fixed evidence data.frame. Also two strings with messages are returned, which can serve as user message for pre and post calibration status.

Usage

fixCalibration(
  df_evd,
  df_idrate = NULL,
  tolerance_sd_PCoutOfCal = 2,
  low_id_rate = 1
)

Arguments

df_evd

Evidence data.frame with columns ()

df_idrate

Data.frame from summary.txt, giving ID rates for each raw file (cols: "ms.ms.identified....", "fc.raw.file"). Can also be NULL.

tolerance_sd_PCoutOfCal

Maximal standard deviation allowed before considered 'failed'

low_id_rate

Minimum ID rate in Percent before a Raw file is considered 'failed'

Value

list of data (stats, affected_raw_files, df_evd, recal_message, recal_message_post)

Flatten lists of lists with irregular depths to just a list of items, i.e. a list of the leaves (if you consider the input as a tree).

Description

Flatten lists of lists with irregular depths to just a list of items, i.e. a list of the leaves (if you consider the input as a tree).

Usage

flattenList(x)

Arguments

x

List of 'stuff' (could be lists or items or a mix)

Value

A flat list

Assign a relative abundance class to a set of (log10) abundance values

Description

Abundances (should be logged already) are grouped into different levels, starting from the smallest values ("low") to the highest values ("high"). Intermediate abundances are either assigned as "mid", or "low-mid". If the range is too large, only "low" and "high" are assigned, the intermediate values are just numbers.

Usage

getAbundanceClass(x)

Arguments

x

Vector of numeric values (in log10)

Details

Example: getAbundanceClass(c(12.4, 17.1, 14.9, 12.3)) ## –> factor(c("low", "high", "mid", "low"))

Value

Vector of factors corresponding to input with abundance class names (e.g. low, high)

Estimate the empirical density and return it

Description

Estimate the empirical density and return it

Usage

getECDF(samples, y_eval = (1:100)/100)

Arguments

samples

Vector of input values (samples from the distribution)

y_eval

Vector of points where CDF is evaluated (each percentile by default)

Value

Data.frame with columns 'x', 'y'

Examples

  plot(getECDF(rnorm(1e4)))

Determine if a file is 'UTF-8' or 'UTF-8-BOM' (as of MQ2.4) or 'UTF-16BE' or 'UTF-16LE'

Description

Determine if a file is 'UTF-8' or 'UTF-8-BOM' (as of MQ2.4) or 'UTF-16BE' or 'UTF-16LE'

Usage

getFileEncoding(filename)

Arguments

filename

Relative or absolute path to a file

Value

” if the file does not exist or is not readable

Extract fragment mass deviation errors from a data.frame from msms.txt

Description

Given a data.frame as obtainable from a msms.txt with - a 'mass.analyzer' column which contains only a single value for the whole column - a 'mass.deviations..da.' and (if available) 'mass.deviations..ppm.' - a 'masses' column (only required if 'mass.deviations..ppm.' is unavailable and the mass.analyzer indicates hig-res data)

Usage

getFragmentErrors(x, recurse = 0)

Arguments

x

Data frame in long format with numerical expression data

recurse

Internal usage only. Leave at 0 when calling.

Details

Mass deviations are extracted from the columns, e.g. each cell containing values separated by semicolons is split into single values. The appropriate unit is chosen (Da or ppm, depending on ITMS or FTMS data). Also the fragmentation type can be used: CID indicates ITMS, HCD to FTMS. This is not 100

Sometimes, peptides are identified purely based on MS1, i.e. have no fragments. These will be ignored.

If ppm mass deviations are not available, errors in Da will be converted to ppm using the corresponding mass values.

Value

Data frame with mass errors ('msErr') and their 'unit' (Da or ppm) or NULL (if no fragments were given)

Create an HTML table with an extra header row

Description

Create an HTML table with an extra header row

Usage

getHTMLTable(data, caption = NA)

Arguments

data

A data.frame which serves as table

caption

A set of headlines, e.g. c("top line", "bottom line")

Value

table as html character string for cat()'ing into an html document

Examples

  data = data.frame(raw.file = letters[1:4],
                    id.rate = 3:6)
  getHTMLTable(data, 
               caption = "some header line")

Retrieve a parameter value from a mqpar.xml file

Description

If the file has the param, then return it as string. If the file is missing, warning is shown and NULL is returned. If the param (i.e. XML tag) is unknown or cannot be extracted, the program will quit (since this is a hard error). When multiple occurrences of the param are found (usually due to parameter groups), we test if the values are all identical. If so, the value is returned. If the values are different, a warning is emitted and NULL is returned unless 'allow_multiple = TRUE'

Usage

getMQPARValue(mqpar_filename, xpath, allow_multiple = FALSE)

Arguments

mqpar_filename

Filename (incl. absolute or relative path) to the mqpar.xml file

xpath

An XPath to extract the content of XML tag(s), e.g. '//firstSearchTol'

allow_multiple

If the XPath expression returns more than one value, all values must be identical (not allowing multiple different values) or 'stop()' is called

Details

E.g. calling getMQPARValue("mqpar.xml", "//firstSearchTol") will look up the line <firstSearchTol>20</firstSearchTol> and return "20" (string!).

Value

The stored value as string(!)

Find the local maxima in a vector of numbers.

Description

A vector of booleans is returned with the same length as input which contains TRUE when there is a maximum. Simply sum up the vector to get the number of maxima.

Usage

getMaxima(x, thresh_rel = 0.2)

Arguments

x

Vector of numbers

thresh_rel

Minimum relative intensity to maximum intensity of 'x' required to be a maximum (i.e., a noise threshold). Default is 20%.

Value

Vector of bool's, where TRUE indicates a local maximum.

Examples

    r = getMaxima(c(1,0,3,4,5,0))                                
    all(r == c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE))
    
    getMaxima(c(1, NA, 3, 2, 3, NA, 4, 2, 5))

Extract meta information (orderNr, metric name, category) from a list of Qc metric objects

Description

Extract meta information (orderNr, metric name, category) from a list of Qc metric objects

Usage

getMetaData(lst_qcMetrics)

Arguments

lst_qcMetrics

List of qcMetrics

Value

data.frame with columns 'name', 'order' and 'cat' (category)

Parses the given mqpar.xml file (or, if not found, tries the 'txt_folder' + '/../../' folder (i.e. where the raw data should be)) to extract the full filepaths for all Raw files

Description

Parses the given mqpar.xml file (or, if not found, tries the 'txt_folder' + '/../../' folder (i.e. where the raw data should be)) to extract the full filepaths for all Raw files

Usage

getMetaFilenames(mqpar_file, txt_folder)

Arguments

mqpar_file

Location of the mqpar.xml (can be empty, if unknown)

txt_folder

Fallback option: path to the txt folder (which contains evidence.txt, etc)

Value

May return 'NULL' if no mqpar.xml could be found. Otherwise: data.frame with columns:

'file' (no path), 'path' (full path incl. names)
'file_no_suffix' (as 'file' but without suffix)
'CV' (CV term for filetype, e.g. for Thermo Raw)

Get all currently available metrics

Description

Get all currently available metrics

Usage

getMetricsObjects(DEBUG_PTXQC = FALSE)

Arguments

DEBUG_PTXQC

Use qc objects from the package (FALSE) or from environment (TRUE/DEBUG)

Value

List of matric objects

Create a principal component analysis (PCA) plot for the first two dimensions.

Description

Create a principal component analysis (PCA) plot for the first two dimensions.

Usage

getPCA(data, do_plot = TRUE, connect_line_order = NA, gg_layer)

Arguments

data

Matrix(!) where each row is one high-dimensional point, with ncol dimensions, e.g. a mouse as an array of proteinexpressions rownames(data) give classes for colouring (can be duplicates in matrices, as opposed to data.frames)

do_plot

Show PCA plot? if ==2, then shows correlations plot as well

connect_line_order

Connect points by lines, the order is given by this vector. Default: NA (no lines)

gg_layer

More parameters added to a ggplot object (ggplot(x) + gg_layer)

Value

[invisible] Named list with "PCA": The PCA object as returned by prcomp, access $x for PC values and "plots": list of plot objects (one or two)

Examples

n = 5
m = 10
data = matrix(runif(n * m), nrow = n, ncol = m)
rownames(data) = 1:n
getPCA(data, connect_line_order = 1:n, gg_layer = ggplot2::ggtitle("test"))

Extract the number of peptides observed per Raw file from an evidence table.

Description

Required columns are "fc.raw.file", "modified.sequence" and "is.transferred".

Usage

getPeptideCounts(df_evd)

Arguments

df_evd

Data.frame of evidence.txt as read by MQDataReader

Details

If match-between-runs was enabled during the MaxQuant run, the data.frame returned will contain separate values for 'transferred' evidence plus an 'MBRgain' column, which will give the extra MBR evidence in percent.

Value

Data.frame with columns 'fc.raw.file', 'counts', 'category', 'MBRgain'

Extract the number of protein groups observed per Raw file from an evidence table.

Description

Required columns are "protein.group.ids", "fc.raw.file" and "is.transferred".

Usage

getProteinCounts(df_evd)

Arguments

df_evd

Data.frame of evidence.txt as read by MQDataReader

Details

Value

Data.frame with columns 'fc.raw.file', 'counts', 'category', 'MBRgain'

Generate a Heatmap from a list of QC measurements.

Description

Each list entry is a data.frame with two columns. The first one contains the Raw file name (or the short version). and should be named 'raw.file' (or 'fc.raw.file'). The second column's name must be an expression (see ?plotmath) and contains quality values in the range [0,1]. If values are outside this range, a warning is issued and values are cut to the nearest allowed value (e.g. '1.2' becomes '1'). List entries are merged and columns are ordered by name.

All substrings enclosed by 'X[0-9]*X.' will be removed (can be used for sorting columns). The resulting string is evaluated as an expression. E.g. parse(text = <colname>)

Usage

getQCHeatMap(lst_qcMetrics, raw_file_mapping)

Arguments

lst_qcMetrics

List of QCMetric objects

raw_file_mapping

Data.frame with 'from' and 'to' columns for name mapping to unify names from list entries

Details

To judge the overall quality of each raw file a summary column is added, values being the mean of all other columns per row.

Value

A ggplot object for printing

Assembles a list of output file names, which will be created during reporting.

Description

You can combine **report_name_has_folder** (and **mzTab_filename** for mzTab files) to obtain report filenames which are even more robust to moving around (since they contain infixes of the mzTab filename and the folder), e.g. '@em 'report_HEK293-study_myProjects.html“, where the input was ‘mzTab_filename=’HEK293-study.mzTab‘ and 'folder=’c:/somePath/myProjects/'.

Usage

getReportFilenames(
  folder,
  report_name_has_folder = TRUE,
  mzTab_filename = NULL
)

Arguments

folder

Directory where the MaxQuant output (txt folder) or the mzTab file resides

report_name_has_folder

Boolean: Should the report files (html, pdf) contain the name of the deepest(=last) subdirectory in **txt_folder** which is not 'txt'? Useful for discerning different reports in a PDF viewer. E.g. when flag is FALSE: 'report_v0.91.0.html'; and 'report_v0.91.0_bloodStudy.html' when flag is TRUE (and the txt folder is '.../bloodStudy/txt/' or '...bloodStudy/')

mzTab_filename

If input is an mzTab, specify its name, so that the filenames can use its basename as infix E.g. when ‘mzTab_filename = ’HEK293-study.mzTab'' then the output will be 'report_HEK293-study.html'. This allows to get reports on multiple mzTabs in the same folder without overwriting report results.

Value

List of output file names (just names, no file is created) with list entries: **yaml_file**, **heatmap_values_file**, **R_plots_file**, **filename_sorting**, **mzQC_file**, **log_file**, **report_file_prefix**, **report_file_PDF**, **report_file_HTML**

Get an mzQC runQuality without actual metrics, but with full metadata

Description

Get an mzQC runQuality without actual metrics, but with full metadata

Usage

getRunQualityTemplate(fc.raw.file, raw_file_mapping)

Arguments

fc.raw.file

For which run

raw_file_mapping

A data.frame with cols 'from', 'to' and maybe 'best.effort' (if shorting was unsuccessful), as e.g. obtained by a FilenameMapper$raw_file_mapping

Value

An MzQCrunQuality object

Function to thin out the number of labels shown on an axis in GGplot

Description

By default, 20 labels (or up to 40 see below) are shown. If the number of items is less than twice the number of desired labels, all labels will be shown (to avoid irregular holes for some labels). I.e. if n=20, and x has 22 entries, there would be only two labels removed, giving a very irregular picture. It only becomes somewhat regular if after any label there is at least one blank, i.e. at most half the entries are labeled. #' Example: ## p is any ggplot object p + scale_y_discrete(breaks = ggAxisLabels) ## customize 'n' my.ggAxisLabels = function(x) ggAxisLabels(x, n = 4) p + scale_y_discrete(breaks = my.ggAxisLabels)

Usage

ggAxisLabels(x, n = 20)

Arguments

x

Vector of labels (passed by GGplot)

n

Number of labels to show

Value

Shortened version of 'x'

Plot a text as graphic using ggplot2.

Description

Plot a text as graphic using ggplot2.

Usage

ggText(title, text, col = "black")

Arguments

title

The title of the plot

text

Centered text, can contain linebreaks

col

Colour of text (excluding the title)

Value

ggplot object

Grep with values returned instead of indices.

Description

The parameter 'value' should not be passed to this function since it is passed internally already.

Usage

grepv(reg, data, ...)

Arguments

reg

regex param

data

container

...

other params forwarded to grep()

Value

values of data which matched the regex

Examples

  grepv("x", c("abc", "xyz"))
  ## --> "xyz"

Check how close transferred ID's after alignment are to their genuine IDs within one Raw file.

Description

The input is a data.frame containing feature evidence with corrected retention times, e.g. a 'calibrated.retention.time' column.

Usage

idTransferCheck(df_evd_all)

Arguments

df_evd_all

A data.frame with columns 'type', 'calibrated.retention.time', 'modified.sequence', 'charge', 'raw.file'

Details

Note that this function must be given MS/MS identifications of type "MULTI-MSMS" and "MSMS-MATCH". It will stop() otherwise.

We compare for each peptide sequence (and charge) the RT difference within groups of either genuine as well as mixed pairs. For every comparison made, we report the RT span If alignment worked perfectly, the span are very small (<1 min), for the mixed group, i.e. the pairs are accidentally split 3D peaks. Alignment performance has no influence on the genuine-only groups.

Note: We found early MaxQuant versions (e.g. 1.2.2.5) to have an empty 'modified.sequence' column for 'MULTI-MATCH' entries. The sequence which SHOULD be present is equal to the immediate upper row. This is what we use to guess the sequence. However, this relies on the data.frame not being subsetted before (we can sort using the 'id' column)!

Value

A data.frame containing the RT diff for each ID-group found in a Raw file (bg = genuine).

For grouped peaks: separate them into in-width vs. out-width class.

Description

Looking at groups only: Compute the fraction of 3D-peak pair groups per Raw file which have an acceptable RT difference after alignment using the result from 'idTransferCheck()', i.e. compute the fraction of groups which are within a certain RT tolerance.

Usage

inMatchWindow(data, df.allowed.deltaRT)

Arguments

data

A data.frame with columns 'fc.raw.file', 'rtdiff_mixed', 'rtdiff_genuine'

df.allowed.deltaRT

The allowed matching difference for each Raw file (as data.frame(fc.rawfile, m))

Details

Returned value is between 0 (bad) and 1 (all within tolerance).

Value

A data.frame with one row for each raw.file and columns 'raw.file' and score 'withinRT' (0-1)

Count the number of chars of the longest common prefix

Description

Count the number of chars of the longest common prefix

Usage

lcpCount(x)

Arguments

x

Vector of strings with common prefix

Value

Length of LCP

Count the number of chars of the longest common suffix

Description

Count the number of chars of the longest common suffix

Usage

lcsCount(x)

Arguments

x

Vector of strings with common suffix

Value

Length of LCS

Get the longest common prefix from a set of strings.

Description

Input is converted to character (e.g. from factor) first.

Usage

longestCommonPrefix(strings)

Arguments

strings

Vector of strings

Value

Single string - might be empty ("")

Examples

  longestCommonPrefix(c("CBA.321", "CBA.77654", ""))    ## ""
  longestCommonPrefix(c("CBA.321", "CBA.77654", "CB"))  ## "CB"
  longestCommonPrefix(c("ABC.123", "ABC.456"))          ## "ABC."
  longestCommonPrefix(c("nothing", "in", "common"))     ## ""

Like longestCommonPrefix(), but on the suffix.

Description

Like longestCommonPrefix(), but on the suffix.

Usage

longestCommonSuffix(strings)

Arguments

strings

Vector of strings

Value

Single string - might be empty ("")

Examples


 longestCommonSuffix(c("123.ABC", "45677.ABC", "BC"))  ## "BC"
 longestCommonSuffix(c("123.ABC", "", "BC"))           ## ""
 longestCommonSuffix(c("123.ABC", "45677.ABC"))        ## ".ABC"
 longestCommonSuffix(c("nothing", "in", "common"))     ## ""

Convert list of (mixed)modifications to a frequency table

Description

Convert list of (mixed)modifications to a frequency table

Usage

modsToTable(mod_list)

Arguments

mod_list

A vector with modifications, each for a specific peptide. Multiple mods per entry are allowed, each separated by comma.

Value

A data.frame with 'modification_names' and 'Freq' (0-100)

Examples

modsToTable(c("Ox (M)", 
              "Unmodified",
              "Ox (M),Acetyl (Prot N-term)",
              "2 Ox (M)",
              "Unmodified",
              "Unmodified"))

Convert list of (mixed)modifications to a frequency table

Description

Convert list of (mixed)modifications to a frequency table

Usage

modsToTableByRaw(
  df_evd,
  name_unmod = "Unmodified",
  name_unmod_inverse = "Modified (total)"
)

Arguments

df_evd

data.frame with 'fc.raw.file' and a 'modifications' column, which contains the modifications for each peptide.

name_unmod

String in 'modifications' which represents an unmodified peptide

name_unmod_inverse

If non-empty, then inverse the frequencies of the 'name_unmod' modifications (i.e. 100-x) IFF they are >=50% on average (across Raw files) and rename them to this string

Value

A data.table with 'fc.raw.file', 'modification_names' (factor), and 'Freq' (0-100)

Examples

data = data.frame(fc.raw.file = rep(c("file A", "file B"),
                                each = 3),
       modifications = c("Oxidation (M)",
                        "Unmodified",
                        "Oxidation (M),Acetyl (Protein N-term)",
                        "2 Oxidation (M)", 
                        "Unmodified", "Unmodified"))
modsToTableByRaw(data)

Prepare a Mosaic plot of two columns in long format.

Description

Found at http://stackoverflow.com/questions/19233365/how-to-create-a-marimekko-mosaic-plot-in-ggplot2 Modified (e.g. to pass R check)

Usage

mosaicize(data)

Arguments

data

A data.frame with exactly two columns

Details

Returns a data frame, which can be used for plotting and has the following columns: 'Var1' - marginalized values from 1st input column 'Var2' - marginalized values from 2nd input column 'Freq' - relative frequency of the combination given in [Var1, Var2] 'margin_var1' - frequency of the value given in Var1 'var2_height' - frequency of the value given in Var2, relative to Var1 'var1_center' - X-position when plotting (large sets get a larger share)

Value

Data.frame

Examples

  data = data.frame(raw.file = c(rep('file A', 100), rep('file B', 40)),
                    charge = c(rep(2, 60), rep(3, 30), rep(4, 10),
                               rep(2, 30), rep(3, 7), rep(4, 3)))
  mosaicize(data)

paste with newline as separator

Description

paste with newline as separator

Usage

pasten(...)

Arguments

...

Arguments forwarded to paste()

Value

return value of paste()

Examples

  pasten("newline","separated")
  ## --> "newline\nseparated"

paste with tab as separator

Description

paste with tab as separator

Usage

pastet(...)

Arguments

...

Arguments forwarded to paste()

Value

return value of paste()

Examples

  pastet("tab","separated")
  ## --> "tab\tseparated"

Determine fraction of evidence which causes segmentation, i.e. sibling peaks at different RTs confirmed either by genuine or transferred MS/MS.

Description

Sometimes, MQ splits a feature into 2 or more if the chromatograpic conditions are not optimal and there is a drop in RT intensity. If both features contain successful MS/MS scans, we will find the same peptide twice (with slightly different RT) in the same charge state. This constitutes a natively split peak and is rare (95

Usage

peakSegmentation(df_evd_all)

Arguments

df_evd_all

A data.frame of evidences containing the above columns

Details

If Match-between-runs is used and the RT alignment is not perfect, then a peptide might be inferred at a wrong RT position, even though this Raw file already contains MS/MS evidence of this peptide. Usually the number of peak duplicates rises drastically (e.g. only 75 In most cases, the RT is too far off to be a split peak. It's rather a lucky hit with accidentally the same mass-to-charge, and thus the intensity is random. To find by how much these peak pairs differ in RT, use idTransferCheck() and inMatchWindow().

Required columns are 'is.transferred', 'fc.raw.file', 'modified.sequence', 'charge', 'type'.

Note that this function must be given MS/MS identifications of type "MULTI-MSMS" and "MSMS-MATCH". It will stop() otherwise.

Value

A data.frame with one row per Raw file and three columns: 1) 2) 3)

Discretize RT peak widths by averaging values per time bin.

Description

Should be applied for each Raw file individually.

Usage

peakWidthOverTime(data, RT_bin_width = 2)

Arguments

data

Data.frame with columns 'retention.time' and 'retention.length'

RT_bin_width

Bin size in minutes

Details

Returns a data.frame, where 'bin' gives the index of each bin, 'RT' is the middle of each bin and 'peakWidth' is the averaged peak width per bin.

Value

Data.frame with columns 'bin', 'RT', 'peakWidth'

Examples

  data = data.frame(retention.time = seq(30,200, by=0.001)) ## one MS/MS per 0.1 sec
  data$retention.length = seq(0.3, 0.6, length.out = nrow(data)) + rnorm(nrow(data), 0, 0.1)
  d = peakWidthOverTime(data)
  plot(d$RT, d$peakWidth)

Plot a table with row names and title

Description

Restriction: currently, the footer will be cropped at the table width.

Usage

plotTable(
  data,
  title = "",
  footer = "",
  col_names = colnames(data),
  fill = c("grey90", "grey70"),
  col = "black",
  just = "centre"
)

Arguments

data

A data.frame with columns as described above

title

Table title

footer

Footer text

col_names

Column names for Table

fill

Fill pattern (by row)

col

Text color (by column)

just

(ignored)

Value

gTree object with class 'PTXQC_table'

Examples

  data = data.frame(raw.file = letters[1:4],
                    id.rate = 3:6)
  plotTable(data, 
            title = "Bad files",
            footer = "bottom", 
            col_names = c("first col", "second col"),
            col=c("red", "green"))

Colored table plot.

Description

Code taken from http://stackoverflow.com/questions/23819209/change-text-color-for-cells-using-tablegrob-in-r

Usage

plotTableRaw(data, colours = "black", fill = NA, just = "centre")

Arguments

data

Table as Data.frame

colours

Single or set of colours (col-wise)

fill

Cell fill (row-wise)

just

(ignored)

Value

gTable

Plot bargraph of uncalibrated mass errors for each Raw file.

Description

Boxes are optionally colored to indicate that a MQ bug was detected or if PTXQC detected a too narrow search window.

Usage

plot_CalibratedMSErr(
  data,
  MQBug_raw_files,
  stats,
  y_lim,
  extra_limit = NA,
  title_sub = ""
)

Arguments

data

A data.frame with columns 'fc.raw.file', 'mass.error..ppm.'

MQBug_raw_files

List of Raw files with invalid calibration values

stats

A data.frame with columns 'fc.raw.file', 'outOfCal'

y_lim

Range of y-axis

extra_limit

Position where a v-line is plotted (for visual guidance)

title_sub

Subtitle

Value

GGplot object

Examples

  n = c(150, 1000, 1000, 1000)
  data = data.frame(fc.raw.file = repEach(letters[4:1], n),
                    mass.error..ppm. = c(rnorm(n[1], 1, 2.4),
                                         rnorm(n[2], 0.5, 0.5),
                                         rnorm(n[3], 0.1, 0.7),
                                         rnorm(n[4], 0.3, 0.8)))
  stats = data.frame(fc.raw.file = letters[4:1],
                     sd = c(2.4, 0.5, 0.7, 0.8),
                     outOfCal = c(TRUE, FALSE, FALSE, FALSE))           
  plot_CalibratedMSErr(data, MQBug_raw_files = letters[1], stats, y_lim = c(-20,20), 15, "subtitle")

The plots shows the charge distribution per Raw file. The output of 'mosaicize()' can be used directly.

Description

The input is a data.frame with columns 'Var1' - name of the Raw file 'Var2' - charge (used as fill color) 'Var1_center' - contains X-position of the Raw file 'Var2_height' - relative frequency of the charge 'Margin_var1' - where each row represents one peptide sequence.

Usage

plot_Charge(d_charge)

Arguments

d_charge

A data.frame with columns as described above

Value

GGplot object

Examples

 data = data.frame(raw.file = c(rep('file A', 100), rep('file B', 40)),
                       data = c(rep(2, 60), rep(3, 30), rep(4, 10),
                                rep(2, 30), rep(3, 7), rep(4, 3)))
 plot_Charge(mosaicize(data))

Plot contaminants from evidence.txt, broken down into top5-proteins.

Description

Plot contaminants from evidence.txt, broken down into top5-proteins.

Usage

plot_ContEVD(data, top5)

Arguments

data

A data.frame with columns 'fc.raw.file', 'contaminant', 'pname', 'intensity'

top5

Name of the Top-5 Proteins (by relative intensity or whatever seems relevant)

Value

GGplot object

Examples


 data = data.frame(intensity = 1:12, 
                   pname = rep(letters[1:3], 4), 
                   fc.raw.file = rep(paste("f", 1:4), each=3),
                   contaminant = TRUE)
 ## providing more proteins than present... d,e will be ignored
 plot_ContEVD(data, top5 = letters[1:5])
 ## classify 'c' as 'other'
 plot_ContEVD(data, top5 = letters[1:2])

Plot user-defined contaminants from evidence.txt

Description

Kolmogorov-Smirnoff p-values are plotted on top of each group. High p-values indicate that Andromeda scores for contaminant peptides are equal or higher compared to sample peptide scores, i.e. the probability that sample peptides scores are NOT greater than contaminant peptide scores.

Usage

plot_ContUser(data, name_contaminant, extra_limit, subtitle = NULL)

Arguments

data

A data.frame with columns 'fc.raw.file', 'variable', 'value'

name_contaminant

Name of the contaminant shown in title

extra_limit

Position where a h-line is plotted (for visual guidance)

subtitle

Optional subtitle for plot

Value

GGplot object

Examples


 data = data.frame(fc.raw.file = letters[1:3], 
                   variable = c(rep("spectralCount", 3),
                                 rep("intensity", 3),
                                 rep("above.thresh", 3),
                                 rep("score_KS", 3)),
                   value = c(10, 20, 15, 9, 21, 14, 0, 1, 1, 0.3, 0.01, 0.04))
 plot_ContUser(data, "myco", 5, "subtitle")

Plot Andromeda score distribution of contaminant peptide vs. matrix peptides.

Description

The data is expected to be an ECDF already, x being the Andromeda score, y being the culmulative probability. The Score is the probability of a Kolm.-Smirnoff test that the contaminant scores are larger (i.e. large p-values indicate true contamination). You will only see this plot if the but high-scoring contaminant peptides, which would erroneously give you a large p-value and make you believe your sample is contaminated although that's not the case.

Usage

plot_ContUserScore(data, raw.file, score)

Arguments

data

A data.frame with columns 'x', 'y', 'condition'

raw.file

Name of Raw file for which the data is displayed (will become part of the plot title)

score

Score of how distinct the distributions are (will become part of the title)

Value

GGplot object

Examples


 data = data.frame(x = 10:60,
                   y = c(seq(0,1,length=51), seq(0.1, 1, length=51)), 
                   condition = rep(c("sample","contaminant"), each=51))
 plot_ContUserScore(data, 'test file', 0.96)

Plot contaminants from proteinGroups.txt

Description

Plot contaminants from proteinGroups.txt

Usage

plot_ContsPG(data)

Arguments

data

A data.frame with columns 'group', 'cont_pc', 'logAbdClass'

Value

GGplot object

Examples


 data = data.frame( 'group' = letters[1:10], 'cont_pc' = 2:11, 'logAbdClass' = c("low","high"))
 plot_ContsPG(data)

Plot Protein groups per Raw file

Description

The input is a data.frame with protein/peptide counts, where 'category' designates the origin of information (genuine ID, transferred ID, or both).

Usage

plot_CountData(data, y_max, thresh_line, title)

Arguments

data

A data.frame with columns 'fc.raw.file', 'counts', 'category'

y_max

Plot limit of y-axis

thresh_line

Position of a threshold line, indicating the usual target value

title

Main title, and optional subtitle (if vector of length 2 is provided)

Value

GGplot object

Examples


 data = data.frame(fc.raw.file = rep(c("file A", "file B"), each=3),
                   counts = c(3674, 593, 1120, 2300, 400, 600), 
                   category = c("genuine","genuine+transferred","transferred"))
 plot_CountData(data, 6000, 4000, c("EVD: Protein Groups count", "gain: 23%"))

Plot some count data over time for each Raw file.

Description

The input is a data.frame with columns 'RT' - RT in seconds, representing one bin 'counts' - number of counts at this bin 'fc.raw.file' - name of the Raw file where each row represents one bin in RT.

Usage

plot_DataOverRT(
  data,
  title,
  y_lab,
  x_lim = range(data$RT),
  y_max = max(data$counts)
)

Arguments

data

A data.frame with columns as described above

title

The plot title

y_lab

Label of y-axis

x_lim

Limits of the x-axis (2-tuple)

y_max

Maximum of the y-axis (single value)

Details

At most nine(!) Raw files can be plotted. If more are given, an error is thrown.

Value

GGplot object

Examples

 data = data.frame(fc.raw.file = rep(paste('file', letters[1:3]), each=30),
                            RT = seq(20, 120, length.out = 30),
                        counts = c(rnorm(30, 400, 20), rnorm(30, 250, 15), rnorm(30, 50, 15)))
 plot_DataOverRT(data, "some title", "count data")

Plot percent of identified MS/MS for each Raw file.

Description

Useful for a first overall impression of the data.

Usage

plot_IDRate(data, id_rate_bad, id_rate_great, label_ID)

Arguments

data

A data.frame with columns as described above

id_rate_bad

Number below which the ID rate is considered bad

id_rate_great

Number above which the ID rate is considered great

label_ID

Named vector with colors for the categories given in data$cat

Details

The input is a data.frame with columns 'fc.raw.file' - name of the Raw file 'ms.ms.identified....' - fraction of identified MS/MS spectra in percent 'cat' - identification category as arbitrary string where each row represents one Raw file.

Value

GGplot object

Examples

 id_rate_bad = 20; id_rate_great = 35;
 label_ID = c("bad (<20%)" = "red", "ok (...)" = "blue", "great (>35%)" = "green")
 data = data.frame(fc.raw.file = paste('file', letters[1:3]),
                   ms.ms.identified.... = rnorm(3, 25, 15))
 data$cat = factor(cut(data$ms.ms.identified....,
                       breaks=c(-1, id_rate_bad, id_rate_great, 100),
                       labels=names(label_ID)))                  
 plot_IDRate(data, id_rate_bad, id_rate_great, label_ID)

Plot IDs over time for each Raw file.

Description

Uses plot_DataOverRT() internally.

Usage

plot_IDsOverRT(data, x_lim = range(data$RT), y_max = max(data$counts))

Arguments

data

A data.frame with columns as described above

x_lim

Limits of the x-axis (2-tuple)

y_max

Maximum of the y-axis (single value)

Value

GGplot object

Examples

 data = data.frame(fc.raw.file = rep(paste('file', letters[1:3]), each=30),
                            RT = seq(20, 120, length.out = 30),
                        counts = c(rnorm(30, 400, 20), rnorm(30, 250, 15), rnorm(30, 50, 15)))
 plot_IDsOverRT(data)

Plot line graph of TopN over Retention time.

Description

Number of Raw files must be 6 at most. Function will stop otherwise.

Usage

plot_IonInjectionTimeOverRT(data, stats, extra_limit)

Arguments

data

A data.frame with columns 'fc.raw.file', 'rRT', 'medIIT'

stats

A data.frame with columns 'fc.raw.file', 'mean'

extra_limit

Visual guidance line (maximum acceptable IIT)

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = rep(c("d","a","x"), each=100),
                    rRT = seq(20, 120, length.out = 100),
                    medIIT = c(round(runif(100, min=3, max=5)),
                               round(runif(100, min=5, max=8)),
                               round(runif(100, min=1, max=3)))
                    )
  stats = data.frame(fc.raw.file = c("d","a","x"),
                     mean = c(4, 6.5, 2))
  plot_IonInjectionTimeOverRT(data, stats, 10)

Plot MaxQuant Match-between-runs alignment performance.

Description

The plots shows the correction function applied by MaxQuant, and the residual RT (ideally 0) of each peptide to its reference. Uncalibrated peptides are shown in red, calibrated ones in green. The MaxQuant RT correction which was applied prior is shown in blue. The range of this function can give hints if the allowed RT search window (20min by default) is sufficient or if MaxQuant should be re-run with more tolerant settings.

Usage

plot_MBRAlign(data, y_lim, title_sub, match_tol)

Arguments

data

A data.frame with columns as described above

y_lim

Plot range of y-axis

title_sub

Subtitle

match_tol

Maximal residual RT delta to reference (usually ~1 min)

Details

The input is a data.frame with columns 'calibrated.retention.time' - resulting (hopefully) calibratated RT after MQ-recal (the X-axis of the plot) 'retention.time.calibration' - delta applied by MaxQuant 'rtdiff' - remaining RT diff to reference peptide of the same sequence 'RTdiff_in' - is the feature aligned (within 'match_tol')? 'fc.raw.file_ext' - raw file where each row represents one peptide whose RT was corrected by MaxQuant.

Value

GGplot object

Examples


 data = data.frame(fc.raw.file_ext = "file A", ## more than one would be possible
                   calibrated.retention.time = c(20:100), 
                   retention.time.calibration = 6 + sin((20:100)/10))
 data$rtdiff = rnorm(nrow(data))
 data$RTdiff_in = c("green", "red")[1 + (abs(data$rtdiff) > 0.7)]
 
 plot_MBRAlign(data, c(-10, 10), "fancy subtitle", 0.7)

Plot MaxQuant Match-between-runs id transfer performance.

Description

The plots shows the different categories of peak classes

Usage

plot_MBRIDtransfer(data)

Arguments

data

A data.frame with columns as described above

Details

The input is a data.frame with columns 'fc.raw.file' - raw file name 'single' - fraction of peptides with are represent only once 'multi.inRT' - fraction of peptides with are represent multiple times, but within a certain RT peak width 'multi.outRT' - fraction of peptides with are represent multiple times, with large RT distance 'sample' - raw file where each row represents one peptide sequence.

Value

GGplot object

Examples

 data = data.frame(fc.raw.file = rep(c("file A", "file B"), each = 3),
                   single = c(0.9853628, 0.8323160, 0.9438375,
                              0.9825538, 0.8003763, 0.9329961), 
                   multi.inRT = c(0.002927445, 0.055101018, 0.017593087,
                                  0.005636457, 0.099640044, 0.031870056),
                   multi.outRT = c(0.01170978, 0.11258294, 0.03856946,
                                   0.01180972, 0.09998363, 0.03513386),
                   sample = rep(c("genuine", "transferred", "all"), 2))
 plot_MBRIDtransfer(data)

Plot MaxQuant Match-between-runs id transfer performance as a scatterplot.

Description

Per Raw file, the absolute number of transferred IDs as well as the relative gain in percent.

Usage

plot_MBRgain(data, title_sub = "")

Arguments

data

A data.frame with columns as described above

title_sub

Subtitle text

Details

The input is a data.frame with columns 'fc.raw.file' - raw file name 'abs' - absolute number of transferred ID's 'pc' - gain on top of genuine IDs [ where each row represents one rawfile.

Value

GGplot object

Examples

 data = data.frame(fc.raw.file = paste("file", letters[1:4]),
                   abs = c(5461, 5312, 3618, 502), 
                   pc = c(34, 32, 22, 2))
 plot_MBRgain(data, "MBR gain: 18%")

Plot bargraph of oversampled 3D-peaks.

Description

Per Raw file, at most three n's must be given, i.e. the fraction of 3D-peaks for n=1, n=2 and n=3(or more). The fractions must sum to 1 (=100

Usage

plot_MS2Decal(data)

Arguments

data

A data.frame with columns 'file', 'msErr', 'type'

Value

GGplot object

Examples

  n = c(100, 130, 50)
  data = data.frame(file = repEach(paste(letters[1:3],"\nLTQ [Da]"), n),
                    msErr = c(rnorm(n[1], 0.5), rnorm(n[2], 0.0), rnorm(n[3], -0.5)),
                    type = c("forward", "decoy")[1+(runif(sum(n))>0.95)])
  plot_MS2Decal(data)

Plot bargraph of oversampled 3D-peaks.

Description

Per Raw file, at most three n's must be given, i.e. the fraction of 3D-peaks for n=1, n=2 and n=3(or more). The fractions must sum to 1 (=100

Usage

plot_MS2Oversampling(data)

Arguments

data

A data.frame with columns 'fc.raw.file', 'n', 'fraction'

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = rep(letters[1:3], each=3),
                    n = 1:3,
                    fraction = c(0.8, 0.1, 0.1, 0.6, 0.3, 0.1, 0.7, 0.25, 0.05))
  plot_MS2Oversampling(data)

Plot bargraph of missed cleavages.

Description

Per Raw file, an arbitrary number of missed cleavage classes (one per column) can be given. The total fraction of 3D-peaks must sum to 1 (=100 Columns are ordered by name.

Usage

plot_MissedCleavages(data, title_sub = "")

Arguments

data

A data.frame with columns 'fc.raw.file', '...' (missed cleavage classes)

title_sub

Plot's subtitle

Details

A visual threshold line is drawn at 75

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = letters[1:5],
                    MC0 = c(0.8, 0.5, 0.85, 0.2, 0.9),
                    MC1 = c(0.1, 0.4, 0.05, 0.7, 0.0),
                    "MS2+" =  c(0.1, 0.1, 0.1, 0.1, 0.1),
                    check.names = FALSE)
  plot_MissedCleavages(data, "contaminant inclusion unknown")

Plot RT peak width over time

Description

The input is a data.frame with already averaged counts over binned RT-slices.

Usage

plot_RTPeakWidth(data, x_lim, y_lim)

Arguments

data

A data.frame with columns 'fc.raw.file', 'RT', 'peakWidth'

x_lim

Plot range of x-axis

y_lim

Plot range of y-axis

Value

GGplot object

Examples


 data = data.frame(fc.raw.file = rep(c("file A", "file B", "file C"), each=81),
                   RT = c(20:100), 
                   peakWidth = c(rnorm(81, mean=20), rnorm(81, mean=10), rnorm(81, mean=30)))
 plot_RTPeakWidth(data, c(10, 100), c(0, 40))

Plot ratios of labeled data (e.g. SILAC) from proteinGroups.txt

Description

The 'x' values are expected to be log2() transformed already.

Usage

plot_RatiosPG(df_ratios, d_range, main_title, main_col, legend_title)

Arguments

df_ratios

A data.frame with columns 'x', 'y', 'col', 'ltype'

d_range

X-axis range of plot

main_title

Plot title

main_col

Color of title

legend_title

Legend text

Value

GGplot object

Examples


 x1 = seq(-3, 3, by = 0.1)
 y1 = dnorm(x1)
 x2 = seq(-5, 1, by = 0.1)
 y2 = dnorm(x2, mean = -1)
 data = data.frame( x = c(x1,x2),
                    y = c(y1,y2), 
                    col = c(rep("ok", length(x1)), rep("shifted", length(x2))), 
                    ltype = c(rep("solid", length(x1)), rep("dotted", length(x2))))
 plot_RatiosPG(data, range(data$x), "Ratio plot", "red", "group")

Plot line graph of TopN over Retention time.

Description

Number of Raw files must be 6 at most. Function will stop otherwise.

Usage

plot_ScanIDRate(data)

Arguments

data

A data.frame with columns 'fc.raw.file', 'scan.event.number', 'ratio', 'count'

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = factor(rep(c("d","a","x"), each=10), levels = c("d","a","x")),
                    scan.event.number = 1:10,
                    ratio = seq(40, 20, length.out=10),
                    count = seq(400, 200, length.out=10))
  plot_ScanIDRate(data)

Plot Total Ion Count over time

Description

The input is a data.frame with already averaged counts over binned RT-slices.

Usage

plot_TIC(data, x_lim, y_lim)

Arguments

data

A data.frame with columns 'fc.raw.file', 'RT', 'intensity'

x_lim

Plot range of x-axis

y_lim

Plot range of y-axis

Value

GGplot object

Examples


 data = data.frame(fc.raw.file = rep(c("file A", "file B", "file C"), each=81),
                   RT = c(20:100), 
                   intensity = c(rnorm(81, mean=20), rnorm(81, mean=10), rnorm(81, mean=30)))
 plot_TIC(data, c(10, 100), c(0, 40))

Plot line graph of TopN over Retention time.

Description

Number of Raw files must be 6 at most. Function will stop otherwise.

Usage

plot_TopN(data)

Arguments

data

A data.frame with columns 'fc.raw.file', 'scan.event.number', 'n'

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = rep(c("d","a","x"), each=10),
                    scan.event.number = 1:10,
                    n = 11:20)
  plot_TopN(data)

Plot line graph of TopN over Retention time.

Description

Number of Raw files must be 6 at most. Function will stop otherwise.

Usage

plot_TopNoverRT(data)

Arguments

data

A data.frame with columns 'fc.raw.file', 'rRT', 'topN'

Value

GGplot object

Examples

  data = data.frame(fc.raw.file = rep(letters[1:3], each=100),
                    rRT = seq(20, 120, length.out = 100),
                    topN = c(round(runif(100, min=3, max=5)),
                             round(runif(100, min=5, max=8)),
                             round(runif(100, min=1, max=3)))
                    )
  plot_TopNoverRT(data)

A boxplot of uncalibrated mass errors for each Raw file.

Description

Boxes are optionally colored to indicate that a MQ bug was detected or if PTXQC detected a too narrow search window.

Usage

plot_UncalibratedMSErr(
  data,
  MQBug_raw_files,
  stats,
  y_lim,
  extra_limit,
  title_sub
)

Arguments

data

A data.frame with columns 'fc.raw.file', 'uncalibrated.mass.error..ppm.'

MQBug_raw_files

List of Raw files with invalid calibration values

stats

A data.frame with columns 'fc.raw.file', 'sd', 'outOfCal'

y_lim

Range of y-axis

extra_limit

Position where a v-line is plotted (for visual guidance)

title_sub

Subtitle

Value

GGplot object

Examples

  n = c(150, 1000, 1000, 1000)
  data = data.frame(fc.raw.file = repEach(letters[4:1], n),
                    uncalibrated.mass.error..ppm. = c(rnorm(n[1], 13, 2.4),
                                                      rnorm(n[2], 1, 0.5),
                                                      rnorm(n[3], 3, 0.7),
                                                      rnorm(n[4], 4.5, 0.8)))
  stats = data.frame(fc.raw.file = letters[4:1],
                     sd_uncal = c(2.4, 0.5, 0.7, 0.8),
                     outOfCal = c(TRUE, FALSE, FALSE, FALSE))           
  plot_UncalibratedMSErr(data, MQBug_raw_files = letters[1],
                         stats, y_lim = c(-20,20), 15, "subtitle")

Plot peptide modification frequencies

Description

The input is a data.frame, as obtained from modsToTableByRaw().

Usage

plot_peptideMods(tbl, y_max = NA, show_missing_modification_levels = TRUE)

Arguments

tbl

A data.frame with 'fc.raw.file', 'modification_names' (can be a factor), and 'Freq' (0-100)

y_max

The upper limit of the y-axis's (==Freq); useful for multiple plots with identical limits; if 'NA' the limit is computed from the given 'tbl'

show_missing_modification_levels

If 'tbl$modification_names' is a factor and has more (but missing) levels than actually used, should missing values be dropped or assumed as '0' frequency?

Value

GGplot object

Examples


 data = data.frame(fc.raw.file = rep(c("file A", "file B"), each=3),
                   modifications = c("Oxidation (M)",
                   "Unmodified",
                   "Oxidation (M), Acetyl (Protein N-term)",
                   "2 Oxidation (M)", 
                   "Unmodified", 
                   "Unmodified"))
 tbl = modsToTableByRaw(data)
 plot_peptideMods(tbl,show_missing_modification_levels = TRUE)

Distribute a set of points with fixed y-values on a stretch of the x-axis.

Description

#' Usage: ggplot(...) + geom_X(...) + pointsPutX(...)

Usage

pointsPutX(x_range, x_section, y, col = NA)

Arguments

x_range

[min,max] valid range of x-values

x_section

[min,max] fraction in which to distribute the values (in [0,1] for min,max, e.g. c(0.03,0.08) for 3-8%)

y

Y-values

col

Colour of the points (used as argument to aes(colour=))

Value

ggplot object with new geom_point

helper S3 class, enabling print(some-plot_Table-object)

Description

helper S3 class, enabling print(some-plot_Table-object)

Usage

## S3 method for class 'PTXQC_table'
print(x, ...)

Arguments

x

Some Grid object to plot

...

Further arguments (not used, but required for consistency with other print methods)

Augment a ggplot with footer text

Description

Augment a ggplot with footer text

Usage

printWithFooter(gg_obj, bottom_left = NULL, bottom_right = NULL)

Arguments

gg_obj

ggplot2 object to be printed

bottom_left

Footer text for bottom left side

bottom_right

Footer text for bottom right side

Value

Class which can compute plots and generate mzQC output (usually for a single metric).

Description

Internally calls the workerFcn() , which computes the actual plots metric scores and supporting data (e.g. mzQC metrics) of the derived class; the resulting data is checked and stored in the members of this class

Arguments

df

The expected data, usually a data frame. If empty, this function will return immediately without failure.

...

Additional arguments passed to the workerFcn()

Details

Reference class which is instanciated with a metric description and a worker function (at initialization time, i.e. in the package) and can produce plots and mzQC values (at runtime, when data is provided) using setData().

All derived classes need to implement a 'workerFcn()' function, which returns a list with elements: c("plots", "mzQC", "htmlTable", "qcScores", "title"), where 'plots' is required; all others are optional.

Fields

helpText: Description (lengthy) of the metric and plot elements
workerFcn: Function which generates a result (usually plots). Data is provided using setData().
plots: List of plots (after setData() was called)
htmlTable: A table for display in the HTML report (preferred over a plot in Html mode)
qcScores: Data.frame of scores from a qcMetric (computed within workerFcn())
mzQC: An named list of mzQC MzQCqualityMetric's (named by their fc.raw.file for runQuality or concatenated fc.raw.files for setQualities (e.g. "file 1;file4")) (valid after setData() was called)
qcCat: QC category (LC, MS, or prep)
qcName: Name of the qcScore in the heatmap
orderNr: Column index during heatmap generation and for the general order of plots

Examples


require(ggplot2)
dd = data.frame(x=1:10, y=11:20)
a = qcMetric$new(helpText="small help text", 
                 ## arbitrary arguments, matched during setData()
                 workerFcn=function(.self, data, gtitle)
                 {
                   ## usually some code here to produce ggplots
                   pl = lapply(1:2, function(xx) {
                       ggplot(data) +
                         geom_point(aes(x=x*xx,y=y)) +
                         ggtitle(gtitle)
                     })
                     ## add mzQC metric for count of identified clusters
                     mzqc = lapply(1:3, function(id){
                       out = rmzqc::getQualityMetricTemplate("MS:1002406")
                       out$value = id;
                       return(out) })
                     names(mzqc) = paste0("file", 1:3);
                   return(list(plots = pl, mzQC = mzqc))
                 }, 
                 qcCat="LC", 
                 qcName="MS/MS Peak shape", 
                 orderNr = 30)
## test some output
a$setData(dd, "my title")
a$plots  ## the raw plots
a$getPlots(TRUE) ## same as above
a$getPlots(FALSE) ## plots without title
a$getTitles()  ## get the titles of the all plots
a$helpText
a$qcName
a$mzQC

Metric for msmsscans.txt, showing TopN over RT.

Description

Metric for msmsscans.txt, showing TopN over RT.

From a list of vectors, compute all vs. all Kolmogorov-Smirnoff distance statistics (D)

Description

... and report the row of the matrix which has maximum sum (i.e the best "reference" distribution). The returned data.frame has as many rows as distributions given and two columns. The first column 'name' gives the name of the list element, the second column 'ks_best' gives '1-statistic' of the Kolmogorov-Smirnoff test to the "reference" distribution (which was picked by maximising the sum of 'ks_best'). Thus, the row with a 'ks_best' of 1 is the reference distribution.

Usage

qualBestKS(x)

Arguments

x

List of vectors, where each vector holds a distribution

Value

A data.frame with ks-test values of the "reference" to all other distributions (see Details)

Quality metric for 'centeredness' of a distribution around zero.

Description

Ranges between 0 (worst score) and 1 (best score). A median of zero gives the best score of 1. The closer the median is to the most extreme value of the distribution, the smaller the score (until reaching 0). Can be used for calibrated mass errors, as a measure of how well they are centered around 0. E.g. if the median is 0.1, while the range is [-0.5,0.5], the score will be 0.8 (punishing the 20 If the range of data is asymmetric, e.g. [-1.5,-0.5] and does not include zero, the score cannot reach 1, since the median can never be zero.

Usage

qualCentered(x)

Arguments

x

Numeric values (e.g. ppm errors)

Value

Value between [0, 1]

Quality metric for 'centeredness' of a distribution around zero with a user-supplied range threshold.

Description

Ranges between 0 (worst score) and 1 (best score). The best score is achieved when the median of 'x' is close to the center of the interval [-tol, tol]. If median of 'x' is close to the border (on either side), the score decreases linearly to zero. Can be used for uncalibrated mass errors, as a measure of how well they are centered around 0.

Usage

qualCenteredRef(x, tol)

Arguments

x

Vector of values (hopefully in interval [-tol, tol])

tol

Border of interval (must be positive)

Details

NA's are removed for all computations.

Value

Value between [0, 1]

Compute probability of Gaussian (mu=m, sd=s) at a position 0, with reference to the max obtainable probability of that Gaussian at its center.

Description

Measure for centeredness around 0. Highest score is 1, worst score is 0.

Usage

qualGaussDev(mu, sd)

Arguments

mu

Center of Gaussian

sd

SD of Gaussian

Value

quality, ranging from 0 (bad agreement) to 1 (perfect, i.e. centered at 0)

Score an empirical density distribution of values, where the best possible distribution is right-skewed.

Description

The score is computed according to

Usage

qualHighest(x, N)

Arguments

x

Vector of numeric values (e.g. height of histogram bins)

N

Length of x (just a precaution currently)

Details

q = ((N-1) - sum_i( ((N-i-1)*x_i) ) / (N-1)

Scores range from 0 (worst), to 1 (best). E.g. c(0,0,0,16) would yield a score of 1. c(16,0,0,0,0) gives a score of 0.

Value

Quality score in the range of [0,1]

Examples

 qualHighest(c(0,0,0,16), 4)   ## 1
 qualHighest(c(16,0,0,0), 4)   ## 0 
 qualHighest(c(1,1,1,1), 4)    ## 0.5
 qualHighest(c(0,16,0,0), 4)   ## 1/3

Quality metric with linear response to input, reaching the maximum score at the given threshold.

Description

Ranges between 0 (worst score) and 1 (best score). Useful for performance measures where reaching a certain reference threshold 't' will be enough to reach 100%. The input range from [0, t] is scored from 0-100%.

Usage

qualLinThresh(x, t = 1)

Arguments

x

Numeric value(s) between [0, inf]

t

Threshold value, which indicates 100%

Value

Value between [0, 1]

Quality metric which measures the absolute distance from median.

Description

Ranges between 0 (worst score) and 1 (best score). Input must be between [0,1]. Deviations from the median of the sample represent the score for each sample point.

Usage

qualMedianDist(x)

Arguments

x

A vector numeric values between [0,1]

Value

A vector of the same size as x, with quality values between [0, 1]

Compute deviation from uniform distribution

Description

The score ranges between 0 (worst score) and 1 (best score). Input 'x' is a vector of counts (or probabilities) for equally spaced bins in a histogram. A uniform distribution (e.g. c(3,3,3) will get a score of 1. The worst possible case (e.g. c(4,0,0)), will get a score of 0, and a linear increasing function (e.g. c(1,2,3)) will get something in between (0.585 here)

Usage

qualUniform(x, weight = vector())

Arguments

x

Vector of numeric intensity/count values (e.g. ID's per RT bin); bins are assumed to have equal widths

weight

Vector of weights for values in 'x' (same length as 'x').

Details

In addition, bin values can be weighted (e.g. by their confidence). The total sum of weights is normalized to 1 internally.

The distance function used is the square root of the absolute difference between a uniform distribution and the input 'x' (summed for each element of 'x'). This distance is normalized to the worst possible input (e.g. one bin with 100

Value

Value between [0, 1]

Examples

 stopifnot(qualUniform(c(3,3,3))==1)
 stopifnot(qualUniform(c(4,0,0))==0)         

 ## how 'uniform' is a vector where only a single index has weight?-- answer: very
 stopifnot(qualUniform(c(4,0,0), c(1,0,0))==1)   
 stopifnot(qualUniform(c(4,0,0), c(0,1,0))==1)     
 stopifnot(qualUniform(c(0,4,0))==0)              
 stopifnot(abs(qualUniform(c(3,2,1))-0.58578) < 0.0001)
 stopifnot(abs(qualUniform(c(1,2,3))-0.58578) < 0.0001)
 stopifnot(qualUniform(c(1,2,3), c(0,1,0))==1)   
 stopifnot(abs(qualUniform(c(1,2,3))-0.58578) < 0.0001)
 stopifnot(abs(qualUniform(c(1,2,3), c(0,1,1))- 0.590316) < 0.0001)
 stopifnot(abs(qualUniform(c(2,3), c(1,1))-0.552786) < 0.0001)
 stopifnot(abs(qualUniform(1:120)-0.38661) < 0.0001)

Convenience wrapper for MQDataReader when only a single MQ file should be read and file mapping need not be stored.

Description

For params, see MQDataReader::readMQ().

Usage

read.MQ(
  file,
  filter = "",
  type = "pg",
  col_subset = NA,
  add_fs_col = 10,
  LFQ_action = FALSE,
  ...
)

Arguments

file

see MQDataReader::readMQ()

filter

see MQDataReader::readMQ()

type

see MQDataReader::readMQ()

col_subset

see MQDataReader::readMQ()

add_fs_col

see MQDataReader::readMQ()

LFQ_action

see MQDataReader::readMQ()

...

see MQDataReader::readMQ()

Value

see MQDataReader::readMQ()

Given a vector of (short/long) filenames, translate to the (long/short) version

Description

Given a vector of (short/long) filenames, translate to the (long/short) version

Usage

renameFile(f_names, mapping)

Arguments

f_names

Vector of filenames

mapping

A data.frame with from,to columns

Value

A vector of translated file names as factor (ordered by mapping!)

Repeat each element x_i in X, n_i times.

Description

Repeat each element x_i in X, n_i times.

Usage

repEach(x, n)

Arguments

x

Values to be repeated

n

Number of repeat for each x_i (same length as x)

Value

Vector with values from x, n times

Examples


  repEach(1:3, 1:3)  ## 1, 2, 2, 3, 3, 3

Scales a vector of values linearly to [0, 1] If all input values are equal, returned values are all 0

Description

Scales a vector of values linearly to [0, 1] If all input values are equal, returned values are all 0

Usage

scale01linear(X)

Arguments

X

Vector of values

Value

Scaled vector

Inverse the order of items on the x-axis (for discrete scales)

Description

Inverse the order of items on the x-axis (for discrete scales)

Usage

scale_x_discrete_reverse(values, ...)

Arguments

values

The vector of values as given to the x aestetic

...

Other arguments forwarded to 'scale_y_discrete()'

Value

ggplot object, concatenatable with '+'

Inverse the order of items on the y-axis (for discrete scales)

Description

Inverse the order of items on the y-axis (for discrete scales)

Usage

scale_y_discrete_reverse(values, ...)

Arguments

values

The vector of values as given to the y aestetic

...

Other arguments forwarded to 'scale_y_discrete()'

Value

ggplot object, concatenatable with '+'

Shorten a string to a maximum length and indicate shorting by appending '..'

Description

Some axis labels are sometimes just too long and printing them will either squeeze the actual plot (ggplot) or make the labels disappear beyond the margins (graphics::plot) One ad-hoc way of avoiding this is to shorten the names, hoping they are still meaningful to the viewer.

Usage

shortenStrings(x, max_len = 20, verbose = TRUE, allow_duplicates = FALSE)

Arguments

x

Vector of input strings

max_len

Maximum length allowed

verbose

Print which strings were shortened

allow_duplicates

If shortened strings are not discernible any longer, consider the short version valid (not the default), otherwise (default) return the full string (–> no-op)

Details

This function should be applied AFTER you tried more gentle methods, such as delLCP or simplifyNames.

Value

A vector of shortened strings

Examples

r = shortenStrings(c("gamg_101", "gamg_101230100451", "jurkat_06_100731121305", "jurkat_06_1"))
all(r == c("gamg_101", "gamg_101230100..", "jurkat_06_1007..", "jurkat_06_1"))

Removes common substrings (infixes) in a set of strings.

Description

Usually handy for plots, where condition names should be as concise as possible. E.g. you do not want names like 'TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw' and 'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw' but rather 'TK.._010_I.._E3_R1.raw' and 'TK.._026_I.._E7_R2.raw'

If multiple such substrings exist, the algorithm will remove the longest first and iterate a number of times (two by default) to find the second/third etc longest common substring. Each substring must fulfill a minimum length requirement - if its shorter, its not considered worth removing and the iteration is aborted.

Usage

simplifyNames(
  strings,
  infix_iterations = 2,
  min_LCS_length = 7,
  min_out_length = 7
)

Arguments

strings

A vector of strings which are to be shortened

infix_iterations

Number of successive rounds of substring removal

min_LCS_length

Minimum length of the longest common substring (default:7, minimum: 6)

min_out_length

Minimum length of shortest element of output (no shortening will be done which causes output to be shorter than this threshold)

Value

A list of shortened strings, with the same length as the input

Examples

#library(PTXQC)
simplifyNames(c('TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw',
                'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw'), infix_iterations = 2)
# --> "TK.._010_I.._E3_R1.raw","TK.._026_I.._E7_R2.raw"

try(simplifyNames(c("bla", "foo"), min_LCS_length=5))
# --> error, since min_LCS_length must be >=6

Compute shortest prefix length which makes all strings in a vector uniquely identifyable.

Description

If there is no unique prefix (e.g. if a string is contained twice), then the length of the longest string is returned, i.e. if the return value is used in a call to substr, nothing happens e.g. substr(x, 1, supCount(x)) == x

Usage

supCount(x, prefix_l = 1)

Arguments

x

Vector of strings

prefix_l

Starting prefix length, which is incremented in steps of 1 until all prefixes are unique (or maximum string length is reached)

Value

Integer with minimal prefix length required

Examples

  supCount(c("abcde...", "abcd...", "abc..."))  ## 5

  x = c("doubled", "doubled", "aLongDummyString")
  all( substr(x, 1, supCount(x)) == x )   
  ## TRUE (no unique prefix due to duplicated entries)

A blank theme (similar to the deprecated theme_blank())

Description

A blank theme (similar to the deprecated theme_blank())

Usage

theme_blank()

Value

A ggplot2 object, representing an empty theme

Thin out a data.frame by removing rows with similar numerical values in a certain column.

Description

All values in the numerical column 'filterColname' are assigned to bins of width 'binsize'. Only one value per bin is retained. All other rows are removed and the reduced data frame will all its columns is returned.

Usage

thinOut(data, filterColname, binsize)

Arguments

data

The data.frame to be filtered

filterColname

Name of the filter column as string

binsize

Width of a bin

Value

Data.frame with reduced rows, but identical input columns

Apply 'thinOut' on all subsets of a data.frame, split by a batch column

Description

The binsize is computed from the global data range of the filter column by dividing the range into binCount bins.

Usage

thinOutBatch(data, filterColname, batchColname, binCount = 1000)

Arguments

data

The data.frame to be split and filtered(thinned)

filterColname

Name of the filter column as string

batchColname

Name of the split column as string

binCount

Number of bins in the 'filterColname' dimension.

Value

Data.frame with reduced rows, but identical input columns

Check if a file is writable and blocks an interactive session, waiting for user input.

Description

This functions gives the user a chance to make the output file writeable before a write attempt is actually made by R to avoid having run the whole program again upon write failure.

Usage

wait_for_writable(
  filename,
  prompt_text = paste0("The file '", filename,
    "' is not writable. Please close all applications using this file. Press '",
    abort_answer, "' to abort!"),
  abort_answer = "n"
)

Arguments

filename

The file to test for writable

prompt_text

If not writable, show this prompt text to the user

abort_answer

If the user enters this string into the prompt, this function will stop()

Details

Note: The file will not be overwritten or changed by this function.

Value

TRUE if writable, FALSE if aborted by user or (not-writeable and non-interactive)

PTXQC: A package for computing Quality Control (QC) metrics for Proteomics (PTX)

Description

Input

Important functions

Configuration

Output

Author(s)

See Also

A string concatenation function, more readable than 'paste()'.

Description

Usage

Arguments

Value

Coefficient of variation (CV)

Description

Usage

Arguments

Value

Make sure to call $readMappingFile(some_file) if you want to support a user-defined file mapping. Otherwise, calls to $getShortNames() will create/augment the mapping for filenames.

Description

Fields

Methods

Return Value

Examples

Compute longest common substring of two strings.

Description

Usage

Arguments

Value

Find longest common substring from 'n' strings.

Description

Usage

Arguments

Value

Examples

S5-RefClass to read MaxQuant .txt files

Description

Arguments

Details

Value

Methods

Class to read an mzTab file and store the tables internally.

Description

Fields

Methods

Quality control for proteomics data

Description

Define a Singleton class which holds the full raw filenames (+path) and their PSI-MS CV terms for usage in the mzQC metadata

Description

Super class

Public fields

Methods

Public methods

Method clone()

Usage

Arguments

Relative standard deviation (RSD)

Description

Usage

Arguments

Value

Return a tree plot with a possible alignment tree.

Description

Usage

Arguments

Details

Value

Compute the fraction of features per Raw file which have an acceptable RT difference after alignment

Description

Usage

Arguments

Value

Query a YAML object for a certain parameter.

Description

Fields

Methods

Examples

Verify an alignment by checking the retention time differences of identical peptides across Raw files

Description

Usage

Method `clone()`

Same as `byX`, but with more flexible group size, to avoid that the last group has only a few entries (<50% of desired size).