Help for package TCRconvertR

Type:

Package

Title:

Convert TCR Gene Names

Description:

Convert T Cell Receptor (TCR) gene names between the 10X Genomics, Adaptive Biotechnologies, and ImMunoGeneTics (IMGT) nomenclatures.

Version:

1.0

License:

MIT + file LICENSE

URL:

https://github.com/seshadrilab/tcrconvertr, https://seshadrilab.github.io/tcrconvertr/

BugReports:

https://github.com/seshadrilab/tcrconvertr/issues

Encoding:

UTF-8

Imports:

stats, utils, rappdirs

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, roxyglobals, testthat (≥ 3.0.0), mockery

Config/testthat/edition:

Config/roxyglobals/filename:

globals.R

Config/roxyglobals/unique:

FALSE

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-04-09 17:05:56 UTC; emmabishop

Author:

Emma Bishop [aut, cre, cph]

Maintainer:

Emma Bishop <emmab5@uw.edu>

Repository:

CRAN

Date/Publication:

2025-04-17 07:20:08 UTC

Add `-01` to gene names lacking gene-level info

Description

Some gene names carry only the IMGT subgroup (e.g. TRBV2) and allele (e.g. *01) designations. The Adaptive format always includes an IMGT gene designation (e.g. -01), with "-01" as the apparent default. add_dash_one() adds this default gene-level designation if it is missing.

Usage

add_dash_one(gene_str)

Arguments

gene_str

A string, the gene name.

Value

A string, the updated gene name.

Examples

add_dash_one("TRBV2*01")
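The rule described above can be sketched in base R. This is a minimal illustration and not the package's implementation: add_dash_one_sketch() is a hypothetical name, and the sketch assumes that a gene-level designation is present exactly when the name contains a hyphen.

```r
# Hypothetical stand-in for add_dash_one(); assumes a gene-level
# designation is present exactly when the name contains "-".
add_dash_one_sketch <- function(gene_str) {
  if (grepl("-", gene_str, fixed = TRUE)) {
    return(gene_str)
  }
  if (grepl("*", gene_str, fixed = TRUE)) {
    # Insert "-01" just before the allele separator "*"
    return(sub("*", "-01*", gene_str, fixed = TRUE))
  }
  # No allele present: append the default gene designation
  paste0(gene_str, "-01")
}

add_dash_one_sketch("TRBV2*01")     # "TRBV2-01*01"
add_dash_one_sketch("TRBV29-1*01")  # unchanged
```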

Create lookup tables

Description

build_lookup_from_fastas() processes IMGT reference FASTA files in a given folder to generate the lookup tables used for gene name conversions. It extracts all gene names and transforms them into 10X and Adaptive formats following predefined conversion rules. It creates the following files:

lookup.csv: IMGT gene names and their 10X and Adaptive equivalents.
lookup_from_tenx.csv: Gene names aggregated by their 10X identifiers, with one representative allele (*01) for each.
lookup_from_adaptive.csv: Adaptive gene names, with or without alleles and gene designations, and their IMGT and 10X equivalents.

The files are stored in a given subfolder (species) within the appropriate application folder via rappdirs. For example:

macOS: ~/Library/Application Support/<AppName>
Windows: C:\Documents and Settings\<User>\Application Data\Local Settings\<AppAuthor>\<AppName>
Linux: ~/.local/share/<AppName>

If a folder named species already exists in that location, it will be replaced.
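The replace-on-rebuild behavior can be sketched in base R as follows. This is only an illustration under stated assumptions: fresh_species_dir() is a hypothetical helper, and tempdir() stands in for the application folder that the package obtains via rappdirs.

```r
# Hypothetical helper: (re)create a species folder under a base directory.
# tempdir() stands in for the rappdirs application folder here.
fresh_species_dir <- function(base_dir, species) {
  target <- file.path(base_dir, species)
  if (dir.exists(target)) {
    unlink(target, recursive = TRUE)  # drop the old folder and its contents
  }
  dir.create(target, recursive = TRUE)
  target
}

base_dir <- file.path(tempdir(), "TCRconvertR_sketch")
old <- fresh_species_dir(base_dir, "rabbit")
writeLines("stale", file.path(old, "lookup.csv"))

# Building again for the same species starts from an empty folder
new <- fresh_species_dir(base_dir, "rabbit")
leftover <- list.files(new)  # character(0): the earlier file is gone

unlink(base_dir, recursive = TRUE)
```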

Usage

build_lookup_from_fastas(data_dir, species)

Arguments

data_dir

A string, the directory containing FASTA files.

species

A string, the name of the species that will be used when running TCRconvert with these lookup tables.

Details

Key transformations from IMGT:

10X:
- Remove allele information (e.g., *01) and modify /DV occurrences.
Adaptive:
- Apply renaming rules, such as adding gene-level designations and zero-padding single-digit numbers.
- Convert constant genes to "NoData" (Adaptive only captures VDJ genes), which becomes NA after the merge in convert_gene().
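Two of these steps, removing the allele for 10X names and keeping one representative *01 allele per 10X name, can be sketched in base R. The gene names are toy input, the column names imgt and tenx are assumptions, and the /DV adjustment and the Adaptive renaming rules are deliberately left out because their exact form is not spelled out here.

```r
# Toy IMGT names, for illustration only
imgt <- c("TRAV1-1*01", "TRAV1-1*02", "TRAV1-2*01", "TRAC*01")

# 10X-style names: drop everything from the allele separator "*" onward.
# (The "/DV" adjustment mentioned above is not reproduced here.)
tenx <- sub("\\*.*$", "", imgt)

lookup <- data.frame(imgt = imgt, tenx = tenx, stringsAsFactors = FALSE)

# One row per 10X name, keeping the *01 allele as the representative
from_tenx <- lookup[grepl("\\*01$", lookup$imgt), ]
from_tenx <- from_tenx[!duplicated(from_tenx$tenx), ]
from_tenx
```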

Value

A string, the path to the new lookup directory.

Examples

# For the example, create and use a temporary folder
fastadir <- file.path(tempdir(), "TCRconvertR_tmp")
dir.create(fastadir, showWarnings = FALSE, recursive = TRUE)
trav <- get_example_path("fasta_dir/test_trav.fa")
trbv <- get_example_path("fasta_dir/test_trbv.fa")
file.copy(c(trav, trbv), fastadir)

# Build lookup tables
build_lookup_from_fastas(fastadir, "rabbit")

# Clean up temporary folder
unlink(fastadir, recursive = TRUE)

Choose lookup table

Description

choose_lookup() determines which CSV lookup table to use based on the input format (frm) and returns the path to that file.

Usage

choose_lookup(frm, to, species = "human", verbose = TRUE)

Arguments

frm

A string, the input format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

to

A string, the output format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

species

A string, the species. Optional; defaults to "human".

verbose

A boolean, whether to show messages. Optional; defaults to TRUE.

Value

A string, the path to the correct lookup table.

Examples

choose_lookup("imgt", "adaptive")
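One plausible way to pick a table from frm is sketched below. The mapping is an assumption inferred from the three file names produced by build_lookup_from_fastas(), and lookup_file_for() is a hypothetical helper that returns only a file name, not the full path that choose_lookup() returns.

```r
# Hypothetical mapping from input format to lookup file name.
# The pairing of formats and files is an assumption, not confirmed behavior.
lookup_file_for <- function(frm) {
  switch(frm,
    tenx = "lookup_from_tenx.csv",
    adaptive = ,  # falls through to the next value
    adaptivev2 = "lookup_from_adaptive.csv",
    imgt = "lookup.csv",
    stop("frm must be one of 'tenx', 'adaptive', 'adaptivev2', or 'imgt'")
  )
}

lookup_file_for("imgt")
lookup_file_for("adaptivev2")
```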

Convert gene names

Description

convert_gene() converts T-cell receptor (TCR) gene names between the IMGT, 10X, and Adaptive formats. It determines the columns to convert based on the input format (frm) unless specified by the user (frm_cols). It returns a modified version of the input data frame with converted gene names while preserving row order.

Usage

convert_gene(df, frm, to, species = "human", frm_cols = NULL, verbose = TRUE)

Arguments

df

A dataframe containing TCR gene names.

frm

A string, the input format of TCR data. Must be one of "imgt", "tenx", "adaptive", or "adaptivev2".

to

A string, the output format of TCR data. Must be one of "imgt", "tenx", "adaptive", or "adaptivev2".

species

A string, the species. Optional; defaults to "human".

frm_cols

A character vector of custom gene column names. Optional; defaults to NULL.

verbose

A boolean, whether to display messages. Optional; defaults to TRUE.

Details

Gene names are converted by performing a merge between the relevant input columns and a species-specific lookup table containing IMGT reference genes in all three formats.

Behavioral Notes

If a gene name cannot be mapped, it is replaced with NA and a warning is raised.
If frm is 'imgt' and frm_cols is not provided, 10X column names are assumed.
Constant (C) genes are set to NA when converting to Adaptive formats, as Adaptive does not capture constant regions.
The input does not need to include all gene types; partial inputs (e.g., only V genes) are supported.
If no values in a custom column can be mapped (e.g. a CDR3 column) it is skipped and a warning is raised.
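The first and third notes can be illustrated with a small base R sketch. It uses match() against a toy lookup table rather than the merge that convert_gene() performs, so it shows the behavior and not the package's code. The lookup rows, its column names, and convert_column_sketch() are all assumptions made for the example.

```r
# Toy lookup table; column names and rows are illustrative assumptions.
# The constant gene has no Adaptive equivalent, so it maps to NA.
lookup <- data.frame(
  tenx = c("TRAV1-2", "TRBV2", "TRBC1"),
  adaptive = c("TCRAV01-02*01", "TCRBV02-01*01", NA),
  stringsAsFactors = FALSE
)

convert_column_sketch <- function(values, lookup, frm, to) {
  # match() returns NA for values missing from the lookup and keeps
  # the result aligned with the input, so row order is preserved
  idx <- match(values, lookup[[frm]])
  out <- lookup[[to]][idx]
  unmapped <- unique(values[is.na(idx)])
  if (length(unmapped) > 0) {
    warning("Could not map: ", paste(unmapped, collapse = ", "))
  }
  out
}

df <- data.frame(
  v_gene = c("TRBV2", "TRAV1-2", "NotAGene"),
  stringsAsFactors = FALSE
)
# The unmapped name triggers a warning, silenced here for the demo
df$v_gene <- suppressWarnings(
  convert_column_sketch(df$v_gene, lookup, "tenx", "adaptive")
)
df
```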

Standard Column Names

If frm_cols is not provided, these column names will be used if present:

IMGT: "v_gene", "d_gene", "j_gene", "c_gene"
10X: "v_gene", "d_gene", "j_gene", "c_gene"
Adaptive: "v_resolved", "d_resolved", "j_resolved"
Adaptive v2: "vMaxResolved", "dMaxResolved", "jMaxResolved"
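The column selection described here can be sketched in base R. pick_cols_sketch() is a hypothetical helper and the data frame holds toy values. The sketch assumes that custom columns, when given, take precedence and that only columns present in the data frame are returned.

```r
# Standard column names per format, as listed above
standard_cols <- list(
  imgt = c("v_gene", "d_gene", "j_gene", "c_gene"),
  tenx = c("v_gene", "d_gene", "j_gene", "c_gene"),
  adaptive = c("v_resolved", "d_resolved", "j_resolved"),
  adaptivev2 = c("vMaxResolved", "dMaxResolved", "jMaxResolved")
)

# Hypothetical helper: custom columns win; otherwise use the standard
# names for the format, restricted to columns present in the data frame
pick_cols_sketch <- function(df, frm, frm_cols = NULL) {
  wanted <- if (is.null(frm_cols)) standard_cols[[frm]] else frm_cols
  intersect(wanted, names(df))
}

# Toy input with only V and J gene columns
df <- data.frame(
  barcode = "AAAC", v_gene = "TRBV2", j_gene = "TRBJ1-1", cdr3 = "CASS",
  stringsAsFactors = FALSE
)
pick_cols_sketch(df, "tenx")  # "v_gene" "j_gene"
```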

Value

A dataframe with converted TCR gene names.

Examples

tcr_file <- get_example_path("tenx.csv")
df <- read.csv(tcr_file)[c("barcode", "v_gene", "j_gene", "cdr3")]
df
convert_gene(df, "tenx", "adaptive", verbose = FALSE)

Extract all gene names from a folder of FASTAs

Description

extract_imgt_genes() first runs parse_imgt_fasta() on all FASTA files in a given folder to pull out the gene names. Then it returns those names in an alphabetically sorted dataframe.

Usage

extract_imgt_genes(data_dir)

Arguments

data_dir

A string, the path to the directory containing FASTA files.

Value

A dataframe of gene names.

Examples

# Given a folder with FASTA files containing these headers:
#   >SomeText|TRAC*01|MoreText|
#   >SomeText|TRAV1-1*01|MoreText|
#   >SomeText|TRAV1-1*02|MoreText|
#   >SomeText|TRAV1-2*01|MoreText|
#   >SomeText|TRAV14/DV4*01|MoreText|
#   >SomeText|TRAV38-1*01|MoreText|
#   >SomeText|TRAV38-2/DV8*01|MoreText|
#   >SomeText|TRBV29-1*01|MoreText|
#   >SomeText|TRBV29-1*02|MoreText|
#   >SomeText|TRBV29/OR9-2*01|MoreText|

fastadir <- get_example_path("fasta_dir/")
extract_imgt_genes(fastadir)
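For readers without the example data, the same idea can be sketched in base R on files written to a temporary folder. extract_genes_sketch() is a hypothetical helper, the FASTA content is toy data, and the output column name imgt is an assumption. The gene name is taken as the second "|"-delimited field of each header line.

```r
# Write two tiny FASTA files to a temporary folder (toy content)
fasta_dir <- file.path(tempdir(), "fasta_sketch")
dir.create(fasta_dir, showWarnings = FALSE, recursive = TRUE)
writeLines(c(">SomeText|TRBV29-1*01|MoreText|", "acgt"),
           file.path(fasta_dir, "b.fa"))
writeLines(c(">SomeText|TRAV1-1*01|MoreText|", "acgt",
             ">SomeText|TRAC*01|MoreText|", "acgt"),
           file.path(fasta_dir, "a.fa"))

# Hypothetical helper: collect header gene names from every file
extract_genes_sketch <- function(data_dir) {
  files <- list.files(data_dir, full.names = TRUE)
  genes <- unlist(lapply(files, function(f) {
    headers <- grep("^>", readLines(f), value = TRUE)
    # The gene name is the second "|"-delimited field of each header
    vapply(strsplit(headers, "|", fixed = TRUE), `[`, character(1), 2)
  }))
  # method = "radix" sorts in the C locale, so the order is reproducible
  data.frame(imgt = sort(genes, method = "radix"), stringsAsFactors = FALSE)
}

genes_df <- extract_genes_sketch(fasta_dir)
genes_df

unlink(fasta_dir, recursive = TRUE)
```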

Get full path to an example file or directory

Description

get_example_path() takes a file or folder name that is expected to be located under the TCRconvertR examples directory and gets the full path to that item.

Usage

get_example_path(file_name)

Arguments

file_name

A string, the name of the example file or directory.

Value

A string, the path to the example file or directory.

Examples

# Will probably be in a temp folder for the function example
get_example_path("tenx.csv")

Add a `0` to single-digit gene-level designation

Description

pad_single_digit() takes a gene name and ensures that any single-digit number following a sequence of letters is padded with a leading zero. This is to match the Adaptive format.

Usage

pad_single_digit(gene_str)

Arguments

gene_str

A string, the gene name.

Value

A string, the updated gene name.

Examples

pad_single_digit("TCRBV1-2")
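A regular-expression sketch of this padding rule is shown below. It is an illustration and not the package's implementation: pad_single_digit_sketch() is a hypothetical name, and the pattern assumes that "single-digit number following a sequence of letters" means a digit directly preceded by a letter and not followed by another digit.

```r
# Hypothetical stand-in for pad_single_digit(): pad a lone digit that
# directly follows a letter and is not followed by another digit
pad_single_digit_sketch <- function(gene_str) {
  gsub("(?<=[A-Za-z])(\\d)(?!\\d)", "0\\1", gene_str, perl = TRUE)
}

pad_single_digit_sketch("TCRBV1-2")   # "TCRBV01-2"
pad_single_digit_sketch("TCRBV12-3")  # unchanged
```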

Extract gene names from a reference FASTA

Description

parse_imgt_fasta() extracts the second element from a "|"-delimited FASTA header, which will be the gene name for IMGT reference FASTAs.

Usage

parse_imgt_fasta(infile)

Arguments

infile

A string, the path to the FASTA file.

Value

A character vector of gene names.

Examples

# Given a FASTA file containing these headers:
#   >SomeText|TRBV29-1*01|MoreText|
#   >SomeText|TRBV29-1*02|MoreText|
#   >SomeText|TRBV29/OR9-2*01|MoreText|

fasta <- get_example_path("fasta_dir/test_trbv.fa")
parse_imgt_fasta(fasta)

Save a lookup table to a CSV file

Description

save_lookup() saves a data frame as a CSV file (without row names) in the specified directory.

Usage

save_lookup(df, savedir, name)

Arguments

df

A data frame containing the lookup table data.

savedir

A string, the path to the save directory.

name

A string, the file name (should end in .csv).

Value

Nothing

Examples

# Create a temp save directory and load an example
save_dir <- file.path(tempdir(), "TCRconvertR_tmp")
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)
dat <- read.csv(get_example_path("fasta_dir/lookup.csv"))

save_lookup(dat, save_dir, "newlookup.csv")

# Clean up temporary folder
unlink(save_dir, recursive = TRUE)
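The behavior described above amounts to a write.csv() call with row names turned off. The sketch below uses a hypothetical save_lookup_sketch() and a toy one-row table, then reads the file back to show that no row-name column is written.

```r
# Hypothetical stand-in for save_lookup(): write a data frame to
# <savedir>/<name> without row names
save_lookup_sketch <- function(df, savedir, name) {
  write.csv(df, file.path(savedir, name), row.names = FALSE)
  invisible(NULL)
}

save_dir <- file.path(tempdir(), "save_sketch")
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)
toy <- data.frame(imgt = "TRAC*01", tenx = "TRAC", stringsAsFactors = FALSE)
save_lookup_sketch(toy, save_dir, "newlookup.csv")

# Reading the file back gives the same columns, with no row-name column
round_trip <- read.csv(file.path(save_dir, "newlookup.csv"),
                       stringsAsFactors = FALSE)
names(round_trip)

unlink(save_dir, recursive = TRUE)
```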

Determine input columns to use

Description

which_frm_cols() determines the columns that are expected to hold gene name information in the input file based on the input format (frm). It returns a vector of those column names.

Usage

which_frm_cols(df, frm, frm_cols = NULL, verbose = TRUE)

Arguments

df

A dataframe containing TCR gene names.

frm

A string, the input format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

frm_cols

A character vector, the custom column names to use.

verbose

A boolean, whether to show messages. Optional; defaults to TRUE.

Value

A character vector, the column names to use.

Examples

tcr_file <- get_example_path("tenx.csv")
df <- read.csv(tcr_file)
which_frm_cols(df, "tenx")
