Baldur is a hierarchical Bayesian model for the analysis of proteomics data. By leveraging empirical Bayes methods, Baldur estimates hyperparameters for variance and measurement-specific uncertainty. It then computes the posterior difference in means between conditions for each peptide, protein, or PTM, and integrates the posterior to estimate error probabilities.
Install the stable release from CRAN:
install.packages('baldur')
Or install the development version from GitHub (after installing rstan):
Follow the instructions for installing rstan: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
Then:
devtools::install_github('PhilipBerg/baldur', build_vignettes = TRUE)
Note:
- On Ubuntu, pandoc may be needed to build vignettes.
- On Windows, sometimes the development version of rstan is required.
For detailed examples, see the package vignettes:
vignette('baldur_yeast_tutorial')
vignette('baldur_ups_tutorial')
Baldur implements a hierarchical Bayesian framework for label-free proteomics quantification, designed to robustly estimate differential abundance while accounting for the mean-variance relationship in mass spectrometry data. For exact details, please see the original paper.
For each feature (peptide, protein, or PTM) \(i\) in sample \(j\), the observed intensity \(y_{ij}\) is modeled as:
\[y_{ij}\sim\text{Normal}(\mu_{j},\sigma u_{ij})\] \[\mu_{j}\sim\text{Normal}(\mu_{0j}+\eta_j\sigma,\sigma)\]
The measurement standard deviation \(s_{j}\) is not constant but depends on the mean intensity. This relationship is modeled with gamma regression:
\[s_{j} \sim \Gamma\left(\alpha, \frac{\alpha}{\beta(\bar{y}_j)}\right)\]
where:
- \(\alpha\): shape parameter (estimated empirically)
- \(\beta(\bar{y}_j)\): expected standard deviation (the mean of the gamma distribution) as a function of the peptide/protein mean intensity; the rate parameter is \(\alpha/\beta(\bar{y}_j)\)
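As a rough illustration of the gamma regression idea, the sketch below simulates standard deviations whose expected value decreases exponentially with the mean intensity and then recovers that trend with base R's `glm()` and a log link. It uses the stats package rather than Baldur's own fitting routines. The exponential trend, the simulated values, and the names `y_bar`, `s`, and `fit` are all assumptions made for the example.

```r
# Illustrative only: simulate a decreasing mean-variance trend and fit it
# with a gamma GLM (log link) from base R, not with baldur itself.
set.seed(1)

n      <- 2000
alpha  <- 5      # assumed gamma shape
I_true <- 1      # assumed intercept
S_true <- 0.2    # assumed slope (sd shrinks as intensity grows)

y_bar <- runif(n, min = 10, max = 20)             # hypothetical mean intensities
beta  <- exp(I_true - S_true * y_bar)             # expected sd at each mean
s     <- rgamma(n, shape = alpha, rate = alpha / beta)  # simulated observed sds

# With a log link, log(E[s]) = b0 + b1 * y_bar,
# so b0 estimates I_true and b1 estimates -S_true.
fit <- glm(s ~ y_bar, family = Gamma(link = "log"))
coef(fit)
```

The fitted intercept and slope should land close to `I_true` and `-S_true`. The spread of the points around the fitted curve is governed by the shape `alpha`.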
For each observation, the expected mean-variance relationship is modeled as:
\[\beta_i=\kappa\cdot\exp(\theta_i\cdot(I_L-S_Lx_i))+\exp(I-S\bar{y}_i)\]
where:
- \(S, S_L\): slope parameters (common and latent)
- \(I, I_L\): intercepts (common and latent)
- \(\bar{y}_i\): mean intensity of feature \(i\)
- \(\theta_i\): feature-specific mixture parameter
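The trend formula can be evaluated directly, which helps show how the mixture parameter switches the latent term on and off. The function below is a minimal sketch, not part of the baldur API: `lgmr_trend` and its argument names are hypothetical, the parameter values are made up, and \(x_i\) is assumed to be the same mean intensity as \(\bar{y}_i\).

```r
# Sketch of the trend formula as written above; all names are hypothetical
# and x_i is assumed to equal the mean intensity y_bar.
lgmr_trend <- function(y_bar, theta, I, S, I_L, S_L, kappa) {
  common <- exp(I - S * y_bar)                        # shared trend
  latent <- kappa * exp(theta * (I_L - S_L * y_bar))  # feature-specific term
  latent + common
}

y_bar <- c(12, 15, 18)

# theta = 0: the latent exponent vanishes, so the latent term is the constant kappa
lgmr_trend(y_bar, theta = 0, I = 1, S = 0.2, I_L = 2, S_L = 0.1, kappa = 0.01)

# theta = 1: the latent term follows its own intercept and slope
lgmr_trend(y_bar, theta = 1, I = 1, S = 0.2, I_L = 2, S_L = 0.1, kappa = 0.01)
```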
Given the expected mean-variance, the observed standard deviation \(\sigma_i\) is modeled as:
\[\sigma_i\sim\Gamma\left(\alpha,\frac{\alpha}{\beta_i}\right)\]
where:
- \(\alpha\): gamma shape parameter
- \(\beta_i\): expected mean-variance for observation \(i\)
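Assuming the second argument of \(\Gamma\) is a rate, the first two moments follow from the standard gamma identities (mean = shape / rate, variance = shape / rate\(^2\)). This short derivation is added for orientation and is not quoted from the reference publication:

\[\mathrm{E}[\sigma_i]=\frac{\alpha}{\alpha/\beta_i}=\beta_i,\qquad \mathrm{Var}[\sigma_i]=\frac{\alpha}{(\alpha/\beta_i)^2}=\frac{\beta_i^2}{\alpha},\qquad \frac{\sqrt{\mathrm{Var}[\sigma_i]}}{\mathrm{E}[\sigma_i]}=\frac{1}{\sqrt{\alpha}}\]

Under this parameterization, \(\beta_i\) sets the expected standard deviation, while \(\alpha\) alone controls the relative spread around the trend.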
The normalized root-mean-square error (NRMSE) is calculated for model diagnostics.
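The text does not say which normalization is used, so the sketch below picks one common convention, the RMSE divided by the mean of the observed values. That choice and the helper name `nrmse` are assumptions for illustration. The package may normalize differently.

```r
# Minimal sketch: RMSE divided by the mean of the observed values.
# The normalization used by baldur may differ; this choice is an assumption.
nrmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  rmse <- sqrt(mean((observed - predicted)^2))
  rmse / mean(observed)
}

# RMSE is 1 and the observed mean is 2, so the result is 0.5
nrmse(observed = c(1, 3), predicted = c(2, 2))
```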
For differential analysis, Baldur estimates the posterior distribution of the difference in means between conditions: \[\boldsymbol{D}\sim\mathcal{N}(\boldsymbol{\mu}^\text{T}\boldsymbol{K},\sigma\boldsymbol{\xi}),\quad \xi_{m}=\sqrt{\sum_{i=1}^{C}\frac{|k_{im}|}{n_i}}\]
where:
- \(\boldsymbol{K}\): contrast matrix
- \(k_{im}\): contrast coefficient for condition \(i\) in contrast \(m\)
- \(n_i\): number of samples in condition \(i\)
- \(\boldsymbol{\xi}\): scaling factor for each contrast
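The contrast quantities above are easy to compute by hand for a small design. The following base R sketch uses a made-up design with three conditions and two contrasts. The sample sizes, condition means, and column names are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical design: three conditions with 3, 3, and 4 samples.
n  <- c(3, 3, 4)
mu <- c(20.0, 21.5, 20.2)   # illustrative condition means

# Contrast matrix K: rows = conditions, columns = contrasts.
K <- cbind(
  c1_vs_c2 = c(1, -1,  0),
  c1_vs_c3 = c(1,  0, -1)
)

# xi_m = sqrt(sum_i |k_im| / n_i); n is recycled down each column of K.
xi <- sqrt(colSums(abs(K) / n))

# Mean of D for each contrast: mu^T K.
mu_D <- drop(crossprod(mu, K))

xi
mu_D
```

Here the first contrast has \(\xi = \sqrt{1/3 + 1/3}\) and the second \(\xi = \sqrt{1/3 + 1/4}\), so the contrast involving the larger group gets a slightly smaller scaling factor.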
The probability of error for contrast \(c\) is then:
\[P(\mathrm{error}) = 2\Phi(-|\mu_{D_c} - \mu_{h_0}| \odot \tau_{D_c})\]
where:
- \(\Phi\): cumulative distribution function (CDF) of the standard normal
- \(\mu_{D_c}\): posterior mean of the difference for contrast \(c\)
- \(\mu_{h_0}\): null hypothesis mean (often zero)
- \(\tau_{D_c}\): precision (inverse standard deviation) for contrast \(c\)
- \(\odot\): element-wise multiplication
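For summary values, the error probability formula reduces to a single call to `pnorm()`. The helper below is a sketch with hypothetical names (`error_probability`, `mu_D`, `sd_D`). It takes the mean and standard deviation of each contrast as given and does not reproduce how Baldur obtains them from the posterior.

```r
# Sketch of the error-probability formula for summary values;
# function and argument names are hypothetical.
error_probability <- function(mu_D, sd_D, mu_h0 = 0) {
  tau_D <- 1 / sd_D                      # precision as inverse standard deviation
  2 * pnorm(-abs(mu_D - mu_h0) * tau_D)  # two-sided tail mass, element-wise
}

# Three illustrative contrasts with the same spread but different mean differences
error_probability(mu_D = c(0, 1.5, -3), sd_D = c(0.8, 0.8, 0.8))
```

A difference that sits exactly on the null value gives an error probability of 1. The probability shrinks as the difference grows relative to its standard deviation, regardless of sign.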
Summary:
Baldur combines hierarchical modeling, mean-variance trend estimation
via gamma regression, and empirical Bayes to robustly quantify
differential abundance and propagate uncertainty from individual
measurements to protein/PTM level, outputting interpretable error
probabilities for each feature.
For full details, see the reference publication.
Berg, Philip, and George Popescu.
“Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with
Gamma Regressing Mean-Variance Trends.”
Molecular & Cellular Proteomics (2023): 2023-12.
https://doi.org/10.1016/j.mcpro.2023.100658