Type: Package
Title: Multisource Graph Synthesis with EHR Data
Version: 0.1.0
Description: We develop Multi-source Graph Synthesis (MUGS), an algorithm designed to create embeddings for pediatric Electronic Health Record (EHR) codes by leveraging graphical information from three distinct sources: (1) pediatric EHR data, (2) EHR data from the general patient population, and (3) existing hierarchical medical ontology knowledge shared across different patient populations. See Li et al. (2024) <doi:10.1038/s41746-024-01320-4> for details.
License: GPL-3
Encoding: UTF-8
LazyData: true
LazyDataCompression: xz
RoxygenNote: 7.3.2
URL: https://github.com/celehs/MUGS, https://celehs.github.io/MUGS/, https://doi.org/10.1038/s41746-024-01320-4
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Imports: MASS, Matrix, fastDummies, doSNOW, dplyr, grplasso, foreach, glmnet, grpreg, inline, mvtnorm, pROC, parallel, RcppArmadillo, rsvd, methods
Depends: R (≥ 3.5.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-05-15 03:56:15 UTC; User1
Author: Mengyan Li [cre, aut], Thomas Charlon [ctb] (ORCID: 0000-0001-7497-0470), Xiaoou Li [aut], Tianxi Cai [aut], PARSE LTD [aut], CELEHS Team [aut]
Maintainer: Mengyan Li <mengyanli@bentley.edu>
Repository: CRAN
Date/Publication: 2025-05-19 13:40:09 UTC

Function Used To Estimate Code Effects

Description

This function estimates code effects using left and right embeddings from source and target sites.

Usage

CodeEff_Matrix(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  common_codes,
  zeta.int,
  lambda,
  p
)

Arguments

S.1

SPPMI from the source site.

S.2

SPPMI from the target site.

n1

The number of codes from the source site.

n2

The number of codes from the target site.

U.1

The left embeddings (left singular vectors times the square root of the singular values) from the source site.

U.2

The left embeddings (left singular vectors times the square root of the singular values) from the target site.

V.1

The right embeddings (right singular vectors times the square root of the singular values) from the source site.

V.2

The right embeddings (right singular vectors times the square root of the singular values) from the target site.

common_codes

The list of overlapping codes.

zeta.int

The initial estimator for the code effects.

lambda

The tuning parameter that controls the intensity of penalization on the code effects.

p

The length of an embedding.

Value

A list with the following elements:

zeta

The estimated code effects.

dif_F

The Frobenius norm difference between the updated and initial estimators.

V.1.new

Updated right embeddings for the source site.

V.2.new

Updated right embeddings for the target site.
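
The dif_F element is described as a Frobenius norm difference, which base R computes with norm(..., type = "F"). The sketch below uses made-up matrices to show that quantity and one way a tolerance could be compared against it. The names zeta_init and zeta_new are hypothetical, and the comparison against tol is an assumption; the exact stopping rule used by the package is not stated in this manual.

```r
# Hypothetical 3 x 2 initial and updated estimators (illustrative values only).
zeta_init <- matrix(0, nrow = 3, ncol = 2)
zeta_new  <- matrix(c(3, 0, 0, 4, 0, 0), nrow = 3, ncol = 2)

# Frobenius norm of the difference: square root of the sum of squared entries.
dif_F <- norm(zeta_new - zeta_init, type = "F")

# Assumed usage: treat the update as converged when the change is below tol.
tol <- 1
converged <- dif_F < tol
```

Here the only nonzero differences are 3 and 4, so the norm is sqrt(9 + 16).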


Function Used To Estimate Code-Site Effects in Parallel

Description

This function estimates code-site effects in parallel, using left and right embeddings from the source and target sites.

Usage

CodeSiteEff_l2_par(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  delta.int,
  lambda.delta,
  p,
  common_codes,
  n.common,
  n.core
)

Arguments

S.1

SPPMI from the source site

S.2

SPPMI from the target site

n1

the number of codes from the source site

n2

the number of codes from the target site

U.1

the left embeddings (left singular vectors times the square root of the singular values) from the source site

U.2

the left embeddings (left singular vectors times the square root of the singular values) from the target site

V.1

the right embeddings (right singular vectors times the square root of the singular values) from the source site

V.2

the right embeddings (right singular vectors times the square root of the singular values) from the target site

delta.int

the initial estimator for the code-site effect

lambda.delta

the tuning parameter that controls the intensity of penalization on the code-site effects

p

the length of an embedding

common_codes

the list of overlapping codes

n.common

the number of overlapping codes

n.core

the number of cores used for parallel computation

Value

The output for the estimation of code-site effects


Function Used To Generate Input Data (Simulations Only)

Description

This function generates input data and is used only for simulations. It generates SPPMIs, dummy matrices based on prior group structures, and code-code pairs for tuning and evaluation.

Usage

DataGen_rare_group(
  seed = NULL,
  p,
  n1,
  n2,
  n.common,
  n.group,
  sigma.eps.1,
  sigma.eps.2,
  ratio.delta,
  network.k,
  rho.beta,
  rho.U0,
  rho.delta,
  sigma.rare,
  n.rare,
  group.size
)

Arguments

seed

the random seed, for reproducibility

p

the length of an embedding

n1

the number of codes in site 1

n2

the number of codes in site 2

n.common

the number of overlapping codes

n.group

the number of groups

sigma.eps.1

the sd of error in site 1

sigma.eps.2

the sd of error in site 2

ratio.delta

the proportion of codes in each site that have site-specific effects applied to them

network.k

the number of distinct blocks within each site for which unique inter-code correlations are modeled

rho.beta

AR parameter for the group effects covariance matrix

rho.U0

AR parameter for the code effects covariance matrix

rho.delta

AR parameter for the code-site effects covariance matrix

sigma.rare

the sd of error for rare codes (usually larger than sigma.eps.1 and sigma.eps.2)

n.rare

the number of rare codes

group.size

the size of each group

Value

Returns input data, SPPMIs, dummy matrices based on prior group structures and code-code pairs for tuning and evaluation
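
The rho.beta, rho.U0, and rho.delta arguments are described as AR parameters for covariance matrices. The sketch below assumes this means the common AR(1) structure, where entry (i, j) equals rho^|i - j|, and draws a few correlated vectors from it using base R only. The helper ar1_cov is hypothetical and is not a function of the package; how DataGen_rare_group builds its covariances internally is not stated here.

```r
# AR(1) covariance (assumed structure): entry (i, j) equals rho^|i - j|.
ar1_cov <- function(p, rho) {
  idx <- seq_len(p)
  rho^abs(outer(idx, idx, "-"))
}

set.seed(1)                      # for reproducibility
p <- 4                           # embedding length (illustrative)
Sigma <- ar1_cov(p, rho = 0.5)

# Draw 3 rows from N(0, Sigma): chol() returns R with t(R) %*% R == Sigma,
# so rows of Z %*% R have covariance Sigma when rows of Z are standard normal.
Z <- matrix(rnorm(3 * p), nrow = 3, ncol = p)
draws <- Z %*% chol(Sigma)
```

A larger rho makes neighboring coordinates of each drawn vector more strongly correlated.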


Function Used To Estimate Group Effects in Parallel

Description

This function estimates group effects in parallel, using left and right embeddings and the group dummy matrices from the source and target sites.

Usage

GroupEff_par(
  S.MGB,
  S.BCH,
  n.MGB,
  n.BCH,
  U.MGB,
  U.BCH,
  V.MGB,
  V.BCH,
  X.MGB.group,
  X.BCH.group,
  n.group,
  name.list,
  beta.int,
  lambda = 0,
  p,
  n.core
)

Arguments

S.MGB

SPPMI from the source site

S.BCH

SPPMI from the target site

n.MGB

the number of codes from the source site

n.BCH

the number of codes from the target site

U.MGB

the left embeddings (left singular vectors times the square root of the singular values) from the source site

U.BCH

the left embeddings (left singular vectors times the square root of the singular values) from the target site

V.MGB

the right embeddings (right singular vectors times the square root of the singular values) from the source site

V.BCH

the right embeddings (right singular vectors times the square root of the singular values) from the target site

X.MGB.group

the dummy matrix based on prior group structures at the source site

X.BCH.group

the dummy matrix based on prior group structures at the target site

n.group

the number of groups

name.list

the full list of code names from the source site and the target site with repeated names of overlapping codes

beta.int

the initial estimator for the group effects

lambda

the tuning parameter that controls the intensity of penalization on the group effects; by default it is set to 0

p

the length of an embedding

n.core

the number of cores used for parallel computation

Value

The output of estimating group effects in parallel


Main function for MUGS algorithm

Description

Main function for MUGS algorithm

Usage

MUGS(
  TUNE = FALSE,
  Eva = TRUE,
  Lambda = c(10),
  Lambda.delta = c(1000),
  n.core = 4,
  tol = 1,
  seed = NULL,
  S.1 = NULL,
  S.2 = NULL,
  X.group.source = NULL,
  X.group.target = NULL,
  pairs.rel.CV = NULL,
  pairs.rel.EV = NULL,
  p = 100,
  n.group = 400,
  outdir = NULL
)

Arguments

TUNE

Logical value indicating whether the function should tune parameters (TRUE) or use predefined parameters (FALSE).

Eva

Logical value indicating whether to perform evaluation (TRUE) or skip it (FALSE).

Lambda

The candidate values for the tuning parameter controlling the intensity of penalization on the code effects.

Lambda.delta

The candidate values for the tuning parameter controlling the intensity of penalization on the code-site effects.

n.core

Integer specifying the number of cores to use for parallel processing.

tol

Numeric value representing the tolerance level for convergence in the algorithm.

seed

Integer used to set the seed for random number generation, ensuring reproducibility. Set to NULL to disable.

S.1

The SPPMI matrix from site 1.

S.2

The SPPMI matrix from site 2.

X.group.source

The dummy matrix representing the group structure of codes at site 1.

X.group.target

The dummy matrix representing the group structure of codes at site 2.

pairs.rel.CV

Code-code pairs used for tuning via cross-validation.

pairs.rel.EV

Code-code pairs used for evaluation.

p

Integer indicating the length of embeddings.

n.group

The number of groups.

outdir

Optional directory to write output files. Defaults to a temporary directory.

Value

A list or saved files containing the embedding matrices, similarity matrices, and site-heterogeneous code analysis.


S.1 Dataset

Description

A matrix containing SPPMI data from the source site. This dataset is used as input for analysis in the package.

Usage

S.1

Format

A matrix with 2000 rows and 10 columns:

Row Names

Unique identifiers for each row.

Columns

Numeric values representing SPPMI data.
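
For readers unfamiliar with SPPMI (shifted positive pointwise mutual information), the sketch below computes it from a small co-occurrence count matrix using the standard definition max(PMI - log(k), 0). The helper sppmi, the shift k, and the toy counts are all assumptions for illustration; this manual does not state how the packaged S.1 matrix was constructed.

```r
# Standard SPPMI from a co-occurrence count matrix C (hypothetical helper).
sppmi <- function(C, k = 1) {
  total <- sum(C)
  expected <- outer(rowSums(C), colSums(C)) / total  # counts under independence
  pmi <- log(C / expected)                           # -Inf where C == 0
  out <- pmax(pmi - log(k), 0)                       # shift by log(k), clip at 0
  out[!is.finite(out)] <- 0                          # guard against 0/0 cells
  out
}

# Toy symmetric co-occurrence counts for three made-up codes.
codes <- c("code_a", "code_b", "code_c")
C <- matrix(c(4, 0, 0,
              0, 2, 2,
              0, 2, 2), nrow = 3, byrow = TRUE,
            dimnames = list(codes, codes))

S <- sppmi(C, k = 1)
```

Pairs that never co-occur get an SPPMI of 0, while pairs that co-occur more often than expected under independence get a positive value.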


S.2 Dataset

Description

A matrix containing SPPMI data from the target site. This dataset is used as input for analysis in the package.

Usage

S.2

Format

A matrix with 2000 rows and 10 columns:

Row Names

Unique identifiers for each row.

Columns

Numeric values representing SPPMI data.


U.1 Dataset

Description

A matrix containing left embeddings from the source site. These embeddings are used for embedding-based computations.

Usage

U.1

Format

A matrix with 2000 rows and 10 columns:

Row Names

Unique identifiers for each row.

Columns

Numeric values representing embeddings.


U.2 Dataset

Description

A matrix containing left embeddings from the target site. These embeddings are used for embedding-based computations.

Usage

U.2

Format

A matrix with 2000 rows and 10 columns:

Row Names

Unique identifiers for each row.

Columns

Numeric values representing embeddings.


X.group.source Dataset

Description

A matrix containing group structures at the source site. It represents binary group membership of entities at the source.

Usage

X.group.source

Format

A matrix with 2000 rows and 50 columns:

Rows

Entities at the source site.

Columns

Binary values (0 or 1) indicating group membership.
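
A dummy matrix of this shape can be built from a vector of group labels. The sketch below uses base R's model.matrix with made-up code and group names; the package imports fastDummies, so its own construction may differ, and nothing here is taken from the packaged data.

```r
# Made-up codes and their prior group labels (illustrative only).
codes  <- c("code_a", "code_b", "code_c", "code_d")
groups <- c("g1", "g1", "g2", "g3")

# One column per group, no intercept: entry is 1 if the code belongs to the group.
X <- model.matrix(~ 0 + factor(groups))
rownames(X) <- codes
colnames(X) <- levels(factor(groups))
```

Each row contains a single 1, because every code in this toy example belongs to exactly one group.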


X.group.target Dataset

Description

A matrix containing group structures at the target site. It represents binary group membership of entities at the target.

Usage

X.group.target

Format

A matrix with 2000 rows and 50 columns:

Rows

Entities at the target site.

Columns

Binary values (0 or 1) indicating group membership.


Download and Load Example Data from Zenodo

Description

This function downloads an example .Rdata file from Zenodo and loads it.

Usage

download_example_data(file, destdir = tempdir())

Arguments

file

Name of the .Rdata file to download (e.g., "S.1.Rdata").

destdir

Directory to store the downloaded data. Defaults to a temporary directory.

Value

A list containing the loaded dataset.
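
To show what "a list containing the loaded dataset" can look like, the sketch below saves a small object to a temporary .Rdata file and loads it back into a named list using base R. It does not contact Zenodo and is not the package's implementation; the file name demo.Rdata and the object S.demo are hypothetical.

```r
# Write a small object to a temporary .Rdata file (stands in for a download).
demo_path <- file.path(tempdir(), "demo.Rdata")
S.demo <- diag(2)
save(S.demo, file = demo_path)

# Load into a fresh environment so nothing in the workspace is overwritten,
# then convert the environment to a named list.
env <- new.env()
load(demo_path, envir = env)
loaded <- as.list(env)
```

The list element is named after the object stored in the file, so it is retrieved here as loaded$S.demo.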


Function Used For Tuning And Evaluation

Description

Function Used For Tuning And Evaluation

Usage

evaluation.sim(pairs.rel, U, seed = NULL)

Arguments

pairs.rel

the known code-code pairs

U

the code embedding matrix

seed

Optional integer for reproducibility of sampling.

Value

The output of tuning and evaluation
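
One common way to score embeddings against known code-code pairs is to compare the cosine similarity of known pairs with that of unrelated pairs and summarize the separation with an AUC. The sketch below does this in base R on a toy embedding matrix. It is an assumption about the general approach, not a description of what evaluation.sim computes; the helpers cosine_pairs and auc_rank are hypothetical, and the unrelated pairs are hand-picked here rather than sampled.

```r
# Cosine similarity between rows of U indexed by `row` and `col`.
cosine_pairs <- function(U, row, col) {
  A <- U[row, , drop = FALSE]
  B <- U[col, , drop = FALSE]
  rowSums(A * B) / (sqrt(rowSums(A^2)) * sqrt(rowSums(B^2)))
}

# Rank-based (Mann-Whitney) AUC: chance that a positive scores above a negative.
auc_rank <- function(pos, neg) {
  r <- rank(c(pos, neg))
  n_pos <- length(pos)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * length(neg))
}

# Toy 4 x 2 embedding matrix: rows 1-2 are similar, rows 3-4 are similar.
U <- rbind(c(1, 0), c(0.9, 0.1), c(0, 1), c(0.1, 0.9))
known     <- data.frame(row = c(1, 3), col = c(2, 4))  # related pairs
unrelated <- data.frame(row = c(1, 2), col = c(3, 4))  # hand-picked negatives

pos <- cosine_pairs(U, known$row, known$col)
neg <- cosine_pairs(U, unrelated$row, unrelated$col)
auc <- auc_rank(pos, neg)
```

In this toy case every known pair has a higher cosine similarity than every unrelated pair, so the AUC reaches its maximum.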


Function For Getting Embedding From SVD

Description

Function For Getting Embedding From SVD

Usage

get_embed(mysvd, d = 2000, normalize = TRUE)

Arguments

mysvd

the SVD result, augmented with an additional 'names' element

d

the dimension of the final embedding

normalize

whether the output embeddings are normalized to have l2 norm equal to 1

Value

The embedding from SVD
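
Elsewhere in this manual, embeddings are described as singular vectors times the square root of the singular values. The sketch below implements that description with base R's svd on a toy matrix, including optional row normalization to unit l2 norm. The helper embed_from_svd is hypothetical and is not get_embed; the package imports rsvd, so its actual SVD routine and the structure of mysvd may differ.

```r
# Embeddings from an SVD: singular vectors scaled by sqrt(singular values).
embed_from_svd <- function(S, d, normalize = TRUE) {
  sv <- svd(S, nu = d, nv = d)
  scale <- diag(sqrt(sv$d[seq_len(d)]), nrow = d)
  U <- sv$u %*% scale            # left embeddings
  V <- sv$v %*% scale            # right embeddings
  if (normalize) {
    # Divide each row by its l2 norm (all-zero rows would give NaN).
    U <- U / sqrt(rowSums(U^2))
    V <- V / sqrt(rowSums(V^2))
  }
  rownames(U) <- rownames(S)
  rownames(V) <- colnames(S)
  list(U = U, V = V)
}

# Toy symmetric matrix standing in for an SPPMI matrix.
S <- matrix(c(2, 1, 0,
              1, 2, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

E_full <- embed_from_svd(S, d = 3, normalize = FALSE)  # exact factorization
E_unit <- embed_from_svd(S, d = 2, normalize = TRUE)   # rank-2, unit-norm rows
```

Without normalization and with full rank, the product of the left embeddings and the transposed right embeddings recovers S; with normalization, only the directions of the rows are kept.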


pairs.rel.CV Dataset

Description

A data frame containing cross-validation pairs for relative comparisons.

Usage

pairs.rel.CV

Format

A data frame with multiple columns:

col

Integer representing the column index of a pair.

row

Integer representing the row index of a pair.

type

Character string indicating the type of data (e.g., "train", "test").


pairs.rel.EV Dataset

Description

A data frame containing evaluation pairs for relative comparisons.

Usage

pairs.rel.EV

Format

A data frame with multiple columns:

col

Integer representing the column index of a pair.

row

Integer representing the row index of a pair.

type

Character string indicating the type of data (e.g., "validation").