Maintainer: | Toby Dylan Hocking <toby.hocking@r-project.org> |
Author: | Toby Dylan Hocking, Guillem Rigaill |
Version: | 2024.1.24 |
License: | GPL-3 |
URL: | https://github.com/tdhock/PeakSegDP |
BugReports: | https://github.com/tdhock/PeakSegDP/issues |
Title: | Dynamic Programming Algorithm for Peak Detection in ChIP-Seq Data |
Description: | A quadratic time dynamic programming algorithm can be used to compute an approximate solution to the problem of finding the most likely changepoints with respect to the Poisson likelihood, subject to a constraint on the number of segments, and the changes which must alternate: up, down, up, down, etc. For more info read http://proceedings.mlr.press/v37/hocking15.html "PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data" by TD Hocking et al, proceedings of ICML2015. |
Suggests: | ggplot2 (≥ 2.0), testthat, penaltyLearning |
Depends: | R (≥ 2.10) |
NeedsCompilation: | yes |
Packaged: | 2024-01-24 18:18:03 UTC; th798 |
Repository: | CRAN |
Date/Publication: | 2024-01-24 19:00:05 UTC |
GeomTallRect
Description
ggproto object for geom_tallrect
Usage
"GeomTallRect"
Several ChIP-seq profiles, some of which have few data points
Description
These data are used to test the PeakSegDP algorithm, to make sure it gives sensible results, even when there are few data.
Usage
data("H3K36me3.AM.immune.19")
Format
Named list of 21 data.frames, each with columns chromStart, chromEnd, count.
Source
http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ data set H3K36me3_AM_immune, chunk id 19
8 profiles of H3K36me3 data
Description
these data caused a bug in multiSampleSegHeuristic.
Usage
data("H3K36me3.TDH.other.chunk3.cluster4")
Format
A data frame with 36914 observations on the following 4 variables.
sample.id
a factor with 8 levels
chromStart
integer vector
chromEnd
integer vector
count
integer vector
Source
http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ data set H3K36me3_TDH_other chunk 3.
Histone ChIP-seq data, 26 samples, chr1 subset
Description
26 samples, each with the same overlapping peak(s).
Usage
data("H3K4me3.TDH.immune.chunk12.cluster4")
Format
A data frame.
Source
http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db H3K4me3_TDH_immune data set, chunk.id=12.
PeakSegDP
Description
Compute the PeakSeg model on a data.frame of compressed
sequence
reads.
Usage
PeakSegDP(compressed,
maxPeaks)
Arguments
compressed |
data.frame with columns chromStart, chromEnd, count. |
maxPeaks |
maximum number of peaks to consider. |
Author(s)
Toby Dylan Hocking, Guillem Rigaill
Examples
library(PeakSegDP)
data(chr11ChIPseq, envir=environment())
one <- subset(chr11ChIPseq$coverage, sample.id=="McGill0002")[10000:12000,]
fit <- PeakSegDP(one, 3L)
if(interactive() && require(ggplot2)){
ggplot()+
geom_step(aes(chromStart/1e3, count), data=one)+
geom_segment(aes(chromStart/1e3, mean,
xend=chromEnd/1e3, yend=mean),
data=fit$segments, color="green")+
geom_segment(aes(chromStart/1e3, 0,
xend=chromEnd/1e3, yend=0),
data=subset(fit$segments, status=="peak"),
size=3, color="deepskyblue")+
theme_bw()+
theme(panel.margin=grid::unit(0, "cm"))+
facet_grid(peaks ~ ., scales="free", labeller=function(df){
s <- ifelse(df$peaks==1, "", "s")
df$peaks <- paste0(df$peaks, " peak", s)
df
})
}
PoissonLoss
Description
Compute the weighted Poisson loss function, which is seg.mean
-
count
* log(seg.mean
). The edge case is when the mean is zero, in
which case the probability mass function takes a value of 1 when
the data is 0 (and 0 otherwise). Thus the log-likelihood of a
maximum likelihood segment with mean zero must be zero.
Usage
PoissonLoss(count, seg.mean,
weight = 1)
Arguments
count |
count |
seg.mean |
seg.mean |
weight |
weight |
Author(s)
Toby Dylan Hocking, Guillem Rigaill
Examples
PoissonLoss(1, 1)
PoissonLoss(0, 0)
PoissonLoss(1, 0)
PoissonLoss(0, 1)
cDPA
Description
A constrained dynamic programming algorithm (cDPA) can be used to compute the best segmentation with respect to the Poisson likelihood, subject to a constraint on the number of segments, and the changes which must alternate: up, down, up, down, ...
Usage
cDPA(count, weight = rep(1,
length(count)), maxSegments)
Arguments
count |
Integer vector of |
weight |
Data weights (normally this is the number of base pairs). |
maxSegments |
Maximum number of segments to consider. |
Author(s)
Toby Dylan Hocking, Guillem Rigaill
Examples
fit <- cDPA(c(0, 10, 11, 1), maxSegments=3)
stopifnot(fit$ends[3,4] == 3)
stopifnot(fit$ends[2,3] == 1)
calc grad list
Description
List of calc.grad functions: x, features, limits -> gradient.
Usage
"calc.grad.list"
calc loss from lp list
Description
if we have already calculated the linear predictor using fit$predict, this function can be useful.
Usage
"calc.loss.from.lp.list"
calc loss list
Description
List of interval regression loss functions: x, feat, lim => numeric.
Usage
"calc.loss.list"
ChIP-seq aligned read coverage for 4 samples on a subset of chr11
Description
A ChIP-seq experiment was performed to locate the genomic positions of a histone (H3K4me3) in 2 B cell samples (McGill0091, McGill0322) and 2 T cell samples (McGill0002, McGill0004). The short sequence reads (about 100 base pairs each) were aligned to the hg19 reference genome, and the "coverage" in this data set contains the total count of aligned reads at each base pair. It also contains annotated regions determined by an expert who examined scatterplots of the coverage profiles.
Usage
data("chr11ChIPseq")
Format
A named list of 2 data.frames: regions contains annotations about which regions contain or do not contain peaks, and coverage contains the noisy signal.
Source
H3K4me3_TDH_immune chunk 5 in http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ which in turn comes from http://epigenomesportal.ca/
Examples
data(chr11ChIPseq)
library(ggplot2)
ann.colors <-
c(noPeaks="#f6f4bf",
peakStart="#ffafaf",
peakEnd="#ff4c4c",
peaks="#a445ee")
if(interactive() && require(ggplot2)){
ggplot()+
scale_fill_manual("annotation", values=ann.colors,
breaks=names(ann.colors))+
penaltyLearning::geom_tallrect(aes(xmin=chromStart/1e3, xmax=chromEnd/1e3,
fill=annotation),
data=chr11ChIPseq$regions, alpha=1/2)+
theme_bw()+
theme(panel.margin=grid::unit(0, "cm"))+
facet_grid(sample.id ~ ., scales="free")+
geom_step(aes(chromStart/1e3, count), data=chr11ChIPseq$coverage)+
xlab("position on chr11 (kilo base pairs)")
}
Counts of first base of aligned reads
Description
For 4 samples on chr11 (hg19), this data set counts the first base pair of aligned reads at each genomic position. In contrast, chr11ChIPseq counts every base pair in each read (and each read is about 100bp, so that means there is some auto-correlation in chr11ChIPseq, but not in chr11first).
Usage
data("chr11first")
Format
A data frame with 23252 observations on the following 4 variables.
sample.id
a factor with levels for each of 4 samples
chromStart
integer vector: base before, on chr11
chromEnd
integer vector: last base on chr11
count
integer: aligned first base read counts
Source
H3K4me3_TDH_immune chunk 5 in http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ which in turn comes from http://epigenomesportal.ca/
Examples
data(chr11ChIPseq)
data(chr11first)
library(ggplot2)
ann.colors <-
c(noPeaks="#f6f4bf",
peakStart="#ffafaf",
peakEnd="#ff4c4c",
peaks="#a445ee")
both <- list(coverage=chr11ChIPseq$coverage, first=chr11first)
representations <- NULL
one.sample <- "McGill0322"
for(data.type in names(both)){
one <- subset(both[[data.type]], sample.id==one.sample)
representations <- rbind(representations, data.frame(data.type, one))
}
one.sample.regions <- subset(
chr11ChIPseq$regions, sample.id==one.sample)
if(interactive() && require(ggplot2)){
ggplot()+
scale_fill_manual("annotation", values=ann.colors,
breaks=names(ann.colors))+
penaltyLearning::geom_tallrect(aes(xmin=chromStart/1e3, xmax=chromEnd/1e3,
fill=annotation),
data=one.sample.regions, alpha=1/2)+
theme_bw()+
theme(panel.margin=grid::unit(0, "cm"))+
facet_grid(data.type ~ ., scales="free")+
geom_step(aes(chromStart/1e3, count), data=representations)+
xlab("position on chr11 (kilo base pairs)")
}
derivs
Description
List of functions, each a derivative of a phi loss.
Usage
"derivs"
getPath
Description
Extract endpoint matrix from cDPA
result.
Usage
getPath(A)
Arguments
A |
A |
Author(s)
Toby Dylan Hocking, Guillem Rigaill
phi list
Description
List of functions, each a phi loss.
Usage
"phi.list"
regression funs
Description
List of regression functions: features, limits -> list.
Usage
"regression.funs"