Version: 1.0.13
Date: 2023-9-14
Title: Distance Based Cell Lineage Reconstruction
Author: Il-Youp Kwak [aut, cre], Wuming Gong [aut]
Maintainer: Il-Youp Kwak <ikwak2@cau.ac.kr>
License: GPL-3
VignetteBuilder: knitr
LinkingTo: Rcpp, RcppArmadillo
RoxygenNote: 7.2.3
Depends: R (≥ 4.1.0), tensorflow(≥ 2.2.0)
Encoding: UTF-8
Imports: BiocParallel, dplyr, Matrix, matrixStats, ape, phangorn, Rcpp, igraph, methods, purrr, stringr, tidyr, rBayesianOptimization, rlang, BiocGenerics
Suggests: knitr, rmarkdown, markdown
Description: R codes for distance based cell lineage reconstruction. Our methods won both sub-challenges 2 and 3 of the Allen Institute Cell Lineage Reconstruction DREAM Challenge in 2020. References: Gong et al. (2021) <doi:10.1016/j.cels.2021.05.008>, Gong et al. (2022) <doi:10.1186/s12859-022-04633-x>.
URL: https://github.com/ikwak2/DCLEAR
NeedsCompilation: yes
Packaged: 2023-09-14 07:01:28 UTC; ikwak2
Repository: CRAN
Date/Publication: 2023-09-14 07:32:35 UTC

DCLEAR: A package for DCLEAR: Distance based Cell LinEAge Reconstruction

Description

Distance based methods for inferring lineage treess from single cell data


WH

Description

implementation of weighted hamming algorithm

Usage

WH(x, InfoW, dropout = FALSE)

Arguments

x

Sequence object of 'phyDat' type.

InfoW

Weight vector for the calculation of weighted hamming distance

dropout

Different weighting strategy is taken to consider interval dropout with dropout = 'TRUE'. Default is, dropout = 'FALSE'.

Value

Calculated distance matrix of input sequences. The result is a 'dist' class object.

Author(s)

Il-Youp Kwak

Examples


set.seed(1)
library(phangorn)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
        d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )

## RF score with hamming distance
D_h = dist.hamming(sD$seqs)
tree_h= NJ(D_h)
RF.dist(tree_h, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5

D_wh = WH(sD$seqs, InfoW)
tree_wh= NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3

D_wh2 = WH(sD$seqs, InfoW, dropout=TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)


Train weights for WH

Description

Train weights for WH and output weight vector

Usage

WH_train(X, loc0 = 2, locDropout = 1, locMissing = FALSE)

Arguments

X

a list of k number of input data, X[[1]] ... X[[k]]. The ith data have sequence information as phyDat format in X[[i]][[1]], and tree information in X[[i]][[2]] as phylo format.

loc0

weight location of initial state

locDropout

weight location of dropout state

locMissing

weight location of missing state, FALSE if there is no missing values

Value

a weight vector

Author(s)

Il-Youp Kwak (ikwak2@cau.ac.kr)


Train weights for WH, and output distance object

Description

Train weights for WH using the given data, and fit the distance matrix for a input sequence.

Usage

WH_train_fit(x, X)

Arguments

x

input data in phyDat format

X

a list of k number of input data, X[[1]] ... X[[k]]. The ith data have sequence information as phyDat format in X[[i]][[1]], and tree information in X[[i]][[2]] as phylo format.

Value

a dist object

Author(s)

Il-Youp Kwak (ikwak2@cau.ac.kr)


add_deletion

Description

Add deletion

Usage

add_deletion(x, tree, mutation_site, config)

Arguments

x

a character matrix

tree

a matrix representing the lineage tree

mutation_site

a binary matrix for mutation site

config

a lineage_tree_config object

Value

a character matrix with deletions


add_dropout

Description

Add dropout events

Usage

add_dropout(x, config)

Arguments

x

a character matrix

config

a lineage_tree_config object

Value

a character matrix with dropout events


Generic function for as_igraph

Description

Generic function for as_igraph

Usage

as_igraph(x, ...)

Arguments

x

a phylo object

...

additional parameters


as_igraph

Description

Convert an phylo object to an igraph object, while keeping the weight (in contrast to igraph::as.igraph)

Usage

## S4 method for signature 'data.frame'
as_igraph(x, config)

Arguments

x

a phylo object

config

a 'lineage_tree_config' object

Value

an igraph object


as_igraph

Description

Convert an phylo object to an igraph object, while keeping the weight (in contrast to igraph::as.igraph)

Usage

## S4 method for signature 'phylo'
as_igraph(x)

Arguments

x

a phylo object

Value

an igraph object


Generic function for as_lineage_tree

Description

Generic function for as_lineage_tree

Usage

as_lineage_tree(x, y, config, ...)

Arguments

x

a phyDat object

y

a phylo object

config

a lineage_tree_config object

...

additional parameters


as_lineage_tree

Description

Convert a phylo object and a phyDat object to a lineage_tree object

Usage

## S4 method for signature 'phyDat,phylo,lineage_tree_config'
as_lineage_tree(x, y, config, ...)

Arguments

x

a phyDat object

y

a phylo object

config

a lineage_tree_config object

...

additional parameters

Value

a lineage_tree object


Generic function for as_phylo

Description

Generic function for as_phylo

Usage

as_phylo(x, ...)

Arguments

x

a graph object

...

additional parameters


as_phylo

Description

Convert an igraph object to a phylo object

Usage

## S4 method for signature 'igraph'
as_phylo(x)

Arguments

x

an igraph object

Value

a phylo object or a igraph object


Core function of computing kmer replacement distance

Description

Compute the sequence distance matrix using inferred kmer replacement matrix

Usage

dist_kmer_replacement_inference(x, kmer_summary, k = 2)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mers (default k=2)

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for dist_replacement

Description

Generic function for dist_replacement

Usage

dist_replacement(x, kmer_summary, k, ...)

Arguments

x

a sequence object

kmer_summary

a kmer_summary object

k

k-mer length

...

additional parameters


Compute the kmer replacement distance

Description

Compute the kmer replacement distance between sequences

Usage

## S4 method for signature 'phyDat,kmer_summary,integer'
dist_replacement(x, kmer_summary, k = 2, ...)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mer length

...

other arguments passed to substr_kmer

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)


Compute the kmer replacement distance

Description

Compute the kmer replacement distance between sequences

Usage

## S4 method for signature 'phyDat,missing,integer'
dist_replacement(x, kmer_summary, k = 2L, ...)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mer length

...

other arguments passed to substr_kmer

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for dist_weighted_hamming

Description

Generic function for dist_weighted_hamming

Usage

dist_weighted_hamming(x, wVec, ...)

Arguments

x

a sequence object

wVec

weight vector

...

additional parameters


dist_weighted_hamming

Description

implementation of weighted hamming algorithm

Usage

## S4 method for signature 'phyDat,numeric'
dist_weighted_hamming(x, wVec, dropout = FALSE)

Arguments

x

Sequence object of 'phyDat' type.

wVec

Weight vector for the calculation of weighted hamming distance

dropout

Different weighting strategy is taken to consider interval dropout with dropout = 'TRUE'. Default is, dropout = 'FALSE'.

Value

Calculated distance matrix of input sequences. The result is a 'dist' class object.

Author(s)

Il-Youp Kwak

Examples


library(DCLEAR)
library(phangorn)
library(ape)

set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
      d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout = FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)



Generic function for downsample

Description

Generic function for downsample

Usage

downsample(x, ...)

Arguments

x

a data object

...

additional parameters


downsample

Description

Sample a lineage tree

Usage

## S4 method for signature 'igraph'
downsample(x, n = 10L, ...)

Arguments

x

a igraph object

n

number of leaves (tips) in the down-sampled tree

...

additional parameters

Value

a phylo object


downsample

Description

Sample a lineage tree

Usage

## S4 method for signature 'lineage_tree'
downsample(x, n = 10L, ...)

Arguments

x

a lineage_tree object

n

number of leaves (tips) in the down-sampled tree

...

additional parameters

Value

a lineage_tree object


get_distance_prior

Description

prior distribution of distance

Usage

get_distance_prior(x)

Arguments

x

a kmer_summary object

Value

a probabilistic vector of the distribution of nodal distances

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for get_leaves

Description

Generic function for get_leaves

Usage

get_leaves(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


get_leaves

Description

Get the leaf sequences

Usage

## S4 method for signature 'lineage_tree'
get_leaves(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters

Value

a phyDat object


get_node_names

Description

Convenient function for get node names

Usage

get_node_names(x)

Arguments

x

node id

Value

node names

Author(s)

Wuming Gong (gongx030@umn.edu)


get_replacement_probability

Description

Compute p(A,B|d), the conditional probability of seeing a replacement of from kmer A to B or vice versa

Usage

get_replacement_probability(x)

Arguments

x

a kmer_summary object

Value

an 3D probabilistic array (kmers by kmers by distances)

Author(s)

Wuming Gong (gongx030@umn.edu)


get_sequence

Description

Get sequencees

Usage

get_sequence(x, tree, outcome, config)

Arguments

x

a character matrix

tree

a matrix representing the lineage tree

outcome

a character matrix

config

a lineage_tree_config object

Value

a character matrix


get_transition_probability

Description

Compute p(A,X|B,Y,d), the conditional probability of seeing a replacement from A to B given the previous replacement B from Y at nodal distance d

Usage

get_transition_probability(x)

Arguments

x

a kmer_summary object

Value

an 3D probabilistic array (kmers by kmers by distances)

Author(s)

Wuming Gong (gongx030@umn.edu)


Lineage data

Description

Lineage data

Usage

data(lineages)

Format

An object of class list of length 100.

Examples

data(lineages)

positional_mutation_prob

Description

Convenient function for get node names

Usage

positional_mutation_prob(x, config)

Arguments

x

a phyDat object

config

a lineage_tree_config object

Value

a positional mutation probability matrix


Generic function for process_sequence

Description

Generic function for process_sequence

Usage

process_sequence(x, ...)

Arguments

x

a sequence object

...

additional parameters


Process sequences

Description

Process sequences

Usage

## S4 method for signature 'phyDat'
process_sequence(
  x,
  division = 16L,
  dropout_character = "*",
  default_character = "0",
  deletion_character = "-"
)

Arguments

x

input data in phyDat format

division

cell division

dropout_character

Dropout character (default: '*')

default_character

Default character (default: '0')

deletion_character

Deletion character (default: '-')

Value

a 'lineage_tree_config' object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for prune

Description

Generic function for prune

Usage

prune(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


prune

Description

Trim a full lineage tree into phylogenetic tree

Usage

## S4 method for signature 'igraph'
prune(x, weighted = TRUE, ...)

Arguments

x

an igraph object

weighted

whether or not keep the edge weight (default: TRUE)

...

additional parameters

Value

an igraph object


prune

Description

Trim a full lineage tree into phylogenetic tree

Usage

## S4 method for signature 'lineage_tree'
prune(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters passed to as_phylo()

Value

a lineage_tree object


random_tree

Description

Simulate a random lineage tree

Usage

random_tree(n_samples, division = 16L)

Arguments

n_samples

number of samples to simulate

division

number of cell division

Value

a data frame

Author(s)

Wuming Gong (gongx030@umn.edu)


rbind

Description

Concatenate multiple phyDat objects

Usage

## S4 method for signature 'phyDat'
rbind(..., deparse.level = 1)

Arguments

...

a list of phyDat objects

deparse.level

see definition in generic rbind

Value

a phyDat object


sample_mutation_outcome

Description

Sample mutation outcome

Usage

sample_mutation_outcome(x, mp = NULL, config)

Arguments

x

an igraph object

mp

a mutation site matrix

config

a lineage_tree_config object

Value

a outcome matrix


sample_mutation_site

Description

Sample mutation site

Usage

sample_mutation_site(tree, config)

Arguments

tree

a data frame

config

a lineage_tree_config object

Value

a mutation site matrix


sample_outcome_prob

Description

Sampling outcome probability based on a gamma distribution

Usage

sample_outcome_prob(config, num_states = 20L, shape = 0.1, scale = 2)

Arguments

config

a lineage_tree_config object

num_states

number of states used in simulation.

shape

shape parameter in gamma distribution

scale

scale parameter in gamma distribution

Value

a probability vector for each alphabet

Author(s)

Wuming Gong (gongx030@umn.edu)


score_simulation

Description

Compare two sets of sequences

Usage

score_simulation(x, y, config)

Arguments

x

a character matrix

y

a character matrix

config

a lineage_tree_config object

Value

numeric scores


sim_seqdata

Description

Generate singe cell barcode data set with tree shaped lineage information

Usage

sim_seqdata(
  sim_n = 200,
  m = 200,
  mu_d = 0.03,
  d = 15,
  n_s = 23,
  outcome_prob = NULL,
  p_d = 0.003
)

Arguments

sim_n

Number of cell samples to simulate.

m

Number of targets.

mu_d

Mutation rate. (a scalar or a vector)

d

Number of cell divisions.

n_s

Number of possible outcome states

outcome_prob

Outcome probability vector (default is NULL)

p_d

Dropout probability

Value

The result is a list containing two objects, 'seqs' and 'tree'. The 'seqs' is 'phyDat' object of 'sim_n' number of simulated barcodes corresponding to each cell, and The 'tree' is a 'phylo' object, a ground truth tree structure for the simulated data.

Author(s)

Il-Youp Kwak

Examples


library(DCLEAR)
library(phangorn)
library(ape)

set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
        d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout=FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, cosidering dropout situation
nfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)


Generic function for simulate

Description

Generic function for simulate

Usage

simulate(config, x, ...)

Arguments

config

a lineage_tree_config object

x

a sequence object

...

additional parameters


simulate

Description

Simulate a cell lineage tree Adoped from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m

Usage

## S4 method for signature 'lineage_tree_config,missing'
simulate(config, x, n_samples = 200, ...)

Arguments

config

simulation configuration; a lineage_tree_config object

x

missing

n_samples

number of samples to simulate

...

additional parameters

Value

a lineage_tree object

Author(s)

Wuming Gong (gongx030@umn.edu)


simulate

Description

Simulate a cell lineage tree based on a set of sequences

Usage

## S4 method for signature 'lineage_tree_config,phyDat'
simulate(config, x, n_samples = 200L, k = 50, greedy = TRUE, ...)

Arguments

config

simulation configuration; a lineage_tree_config object

x

a sequence object

n_samples

number of samples to simulate

k

Number of trials

greedy

Whether ot not use a greedy search

...

additional parameters

Value

a lineage_tree object

Author(s)

Wuming Gong (gongx030@umn.edu)


simulate_core

Description

Simulate a cell lineage tree Adoped from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m

Usage

simulate_core(config, tree, mutation_site, outcome)

Arguments

config

simulation configuration; a lineage_tree_config object

tree

a matrix representing the lineage tree

mutation_site

a binary matrix indicating the mutation sites

outcome

a character matrix

Value

a 'lineage_tree' object


Generic function for substr_kmer

Description

Generic function for substr_kmer

Usage

substr_kmer(x, ...)

Arguments

x

a kmer object

...

additional parameters


Subseting a kmer_summary object

Description

Summarize the short k-mer summary from the long k-mer summary

Usage

## S4 method for signature 'kmer_summary'
substr_kmer(x, k = 2)

Arguments

x

a kmer_summary object

k

k-mer length(default: 2)

Value

a new kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for subtract

Description

Generic function for subtract

Usage

subtract(x, y, ...)

Arguments

x

a lineage_tree object

y

a lineage_tree object

...

additional parameters


subtract

Description

Subtract a subtree from a large tree

Usage

## S4 method for signature 'lineage_tree,lineage_tree'
subtract(x, y, ...)

Arguments

x

a lineage_tree object

y

a lineage_tree object

...

additional parameters

Value

a lineage_tree object


Generic function for subtree

Description

Generic function for subtree

Usage

subtree(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


subtree

Description

Extract a subtree with specific leaves

Usage

## S4 method for signature 'lineage_tree'
subtree(x, leaves = NULL, ...)

Arguments

x

a lineage_tree object

leaves

leaves of the extracted tree

...

additional parameters

Value

a lineage_tree object


subtree

Description

Extract a subtree with specific leaves

Usage

## S4 method for signature 'phylo'
subtree(x, leaves = NULL, ...)

Arguments

x

a phylo object

leaves

leaves of the extracted tree

...

additional parameters

Value

a pylo object


Generic function for summarize_kmer

Description

Generic function for summarize_kmer

Usage

summarize_kmer(x, ...)

Arguments

x

a sequence object

...

additional parameters


summarize_kmer

Description

Summarize kmer distributions with input sequences

Usage

## S4 method for signature 'phyDat'
summarize_kmer(
  x,
  division = 16L,
  k = 2,
  reps = 20L,
  n_samples = 200L,
  n_nodes = 100L,
  n_targets
)

Arguments

x

input data as a phyDat object

division

number of cell division

k

k-mer (default = 2)

reps

number of simulated trees

n_samples

number of samples to simulate

n_nodes

number of nodes to sample (including both leaves and internval nodes)

n_targets

sequence length. If this argument is missing, the length of the input sequences will be used.

Value

a kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)


summarize_kmer_core

Description

Summarize kmer distributions (core function)

Usage

summarize_kmer_core(
  k = 2,
  reps = 20L,
  n_samples = 200L,
  n_nodes = 100L,
  config = NULL
)

Arguments

k

k-mer (default = 2)

reps

number of simulated trees

n_samples

number of samples to simulate

n_nodes

number of nodes to sample (including both leaves and internval nodes)

config

lineage tree configuration (a lineage_tree_config object)

Value

a kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)