Type: Package
Title: Convert TCR Gene Names
Description: Convert T Cell Receptor (TCR) gene names between the 10X Genomics, Adaptive Biotechnologies, and ImMunoGeneTics (IMGT) nomenclatures.
Version: 1.0
License: MIT + file LICENSE
URL: https://github.com/seshadrilab/tcrconvertr, https://seshadrilab.github.io/tcrconvertr/
BugReports: https://github.com/seshadrilab/tcrconvertr/issues
Encoding: UTF-8
Imports: stats, utils, rappdirs
RoxygenNote: 7.3.2
Suggests: knitr, rmarkdown, roxyglobals, testthat (>= 3.0.0), mockery
Config/testthat/edition: 3
Config/roxyglobals/filename: globals.R
Config/roxyglobals/unique: FALSE
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-04-09 17:05:56 UTC; emmabishop
Author: Emma Bishop [aut, cre, cph]
Maintainer: Emma Bishop <emmab5@uw.edu>
Repository: CRAN
Date/Publication: 2025-04-17 07:20:08 UTC

Add -01 to gene names lacking gene-level info

Description

Some gene names carry only the IMGT subgroup (e.g. TRBV2) and allele (e.g. *01) designations, with no gene-level designation. The Adaptive format always includes an IMGT gene-level designation (e.g. -01), with "-01" as the apparent default. add_dash_one() adds this default gene-level designation if it is missing.

Usage

add_dash_one(gene_str)

Arguments

gene_str

A string, the gene name.

Value

A string, the updated gene name.

Examples

add_dash_one("TRBV2*01")
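
The rule described above can be illustrated with a short base R sketch. The name sketch_add_dash_one is hypothetical and this is not the package's implementation; it assumes that a gene name without a "-" lacks a gene-level designation.

```r
# Hypothetical helper, not part of TCRconvertR: insert "-01" before the allele
# when the gene name contains no "-" (assumed to mean no gene-level info).
sketch_add_dash_one <- function(gene_str) {
  if (grepl("-", gene_str, fixed = TRUE)) {
    return(gene_str)
  }
  # Replace the first literal "*" with "-01*"
  sub("*", "-01*", gene_str, fixed = TRUE)
}

sketch_add_dash_one("TRBV2*01")     # "TRBV2-01*01"
sketch_add_dash_one("TRBV29-1*01")  # returned unchanged
```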

Create lookup tables

Description

build_lookup_from_fastas() processes IMGT reference FASTA files in a given folder to generate lookup tables used for making gene name conversions. It extracts all gene names and transforms them into 10X and Adaptive formats following predefined conversion rules. The resulting files are created:

The files are stored in a given subfolder (species) within the appropriate application folder via rappdirs. For example:

If a folder named species already exists in that location, it will be replaced.

Usage

build_lookup_from_fastas(data_dir, species)

Arguments

data_dir

A string, the directory containing FASTA files.

species

A string, the name of the species that will be used when running TCRconvert with these lookup tables.

Details

Key transformations from IMGT:

Value

A string, the path to the new lookup directory.

Examples

# For the example, create and use a temporary folder
fastadir <- file.path(tempdir(), "TCRconvertR_tmp")
dir.create(fastadir, showWarnings = FALSE, recursive = TRUE)
trav <- get_example_path("fasta_dir/test_trav.fa")
trbv <- get_example_path("fasta_dir/test_trbv.fa")
file.copy(c(trav, trbv), fastadir)

# Build lookup tables
build_lookup_from_fastas(fastadir, "rabbit")

# Clean up temporary folder
unlink(fastadir, recursive = TRUE)
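
The replace-if-exists behavior of the species folder can be pictured with the base R sketch below. It uses a temporary directory as a stand-in for the application folder (the real location is chosen via rappdirs and is not reproduced here), and the folder and file names are illustrative assumptions.

```r
# Stand-in for the application data folder (assumption for this sketch)
base_dir <- file.path(tempdir(), "lookup_sketch")
species <- "rabbit"
species_dir <- file.path(base_dir, species)

# Simulate a species folder left over from an earlier run
dir.create(species_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("old", file.path(species_dir, "stale.txt"))

# Replace it: remove the old folder, then create a fresh empty one
unlink(species_dir, recursive = TRUE)
dir.create(species_dir, recursive = TRUE)

remaining <- list.files(species_dir)
remaining  # character(0): the stale file is gone

# Clean up the sketch folder
unlink(base_dir, recursive = TRUE)
```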

Choose lookup table

Description

choose_lookup() determines which CSV lookup table to use based on the input format (frm) and returns the path to that file.

Usage

choose_lookup(frm, to, species = "human", verbose = TRUE)

Arguments

frm

A string, the input format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

to

A string, the output format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

species

A string, the species. Optional; defaults to "human".

verbose

A boolean, whether to show messages. Optional; defaults to TRUE.

Value

A string, the path to the correct lookup table.

Examples

choose_lookup("imgt", "adaptive")
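
One way to picture the selection step is a mapping from input format to file name. The sketch below is not the package's code: sketch_choose_lookup, the lookup_dir argument, and the file names are all hypothetical placeholders, and it assumes both Adaptive formats share one table.

```r
# Hypothetical sketch: validate the formats, then map frm to a file name.
# File names are placeholders, not confirmed TCRconvertR file names.
sketch_choose_lookup <- function(frm, to, lookup_dir) {
  formats <- c("tenx", "adaptive", "adaptivev2", "imgt")
  if (!frm %in% formats || !to %in% formats) {
    stop("frm and to must each be one of: ", paste(formats, collapse = ", "))
  }
  file_for <- c(
    imgt = "from_imgt.csv",
    tenx = "from_tenx.csv",
    adaptive = "from_adaptive.csv",
    adaptivev2 = "from_adaptive.csv"
  )
  file.path(lookup_dir, file_for[[frm]])
}

sketch_choose_lookup("imgt", "adaptive", tempdir())
```

An unsupported format name raises an error instead of silently falling back to a default table.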

Convert gene names

Description

convert_gene() converts T-cell receptor (TCR) gene names between the IMGT, 10X, and Adaptive formats. It determines the columns to convert based on the input format (frm) unless specified by the user (frm_cols). It returns a modified version of the input data frame with converted gene names while preserving row order.

Usage

convert_gene(df, frm, to, species = "human", frm_cols = NULL, verbose = TRUE)

Arguments

df

A dataframe containing TCR gene names.

frm

A string, the input format of TCR data. Must be one of "imgt", "tenx", "adaptive", or "adaptivev2".

to

A string, the output format of TCR data. Must be one of "imgt", "tenx", "adaptive", or "adaptivev2".

species

A string, the species. Optional; defaults to "human".

frm_cols

A character vector of custom gene column names. Optional; defaults to NULL.

verbose

A boolean, whether to display messages. Optional; defaults to TRUE.

Details

Gene names are converted by performing a merge between the relevant input columns and a species-specific lookup table containing IMGT reference genes in all three formats.

Behavioral Notes

Standard Column Names

If frm_cols is not provided, these column names will be used if present:

Value

A dataframe with converted TCR gene names.

Examples

tcr_file <- get_example_path("tenx.csv")
df <- read.csv(tcr_file)[c("barcode", "v_gene", "j_gene", "cdr3")]
df
convert_gene(df, "tenx", "adaptive", verbose = FALSE)
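
The merge-based approach described under Details can be sketched in base R with a toy lookup table. This is an illustration only, not the package's implementation: the lookup column names (tenx, adaptive), the row_id helper column, and the gene values are assumptions chosen for the example.

```r
# Toy input in 10X style; values are illustrative only
df <- data.frame(
  barcode = c("cell1", "cell2", "cell3"),
  v_gene = c("TRBV29-1", "TRAV1-2", "TRBV29-1"),
  stringsAsFactors = FALSE
)

# Toy lookup table; column names "tenx" and "adaptive" are assumptions
lookup <- data.frame(
  tenx = c("TRAV1-2", "TRBV29-1"),
  adaptive = c("TCRAV01-02*01", "TCRBV29-01*01"),
  stringsAsFactors = FALSE
)

# merge() may reorder rows, so record the original position first
df$row_id <- seq_len(nrow(df))
merged <- merge(df, lookup, by.x = "v_gene", by.y = "tenx",
                all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$row_id), ]

# Write the converted names back in the original row order
converted <- df[c("barcode", "v_gene")]
converted$v_gene <- merged$adaptive
converted
```

Sorting by the recorded row index after the merge is what keeps the output aligned with the input rows in this sketch.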

Extract all gene names from a folder of FASTAs

Description

extract_imgt_genes() first runs parse_imgt_fasta() on all FASTA files in a given folder to pull out the gene names. Then it returns those names in an alphabetically sorted dataframe.

Usage

extract_imgt_genes(data_dir)

Arguments

data_dir

A string, the path to the directory containing FASTA files.

Value

A dataframe of gene names.

Examples

# Given a folder with FASTA files containing these headers:
#   >SomeText|TRAC*01|MoreText|
#   >SomeText|TRAV1-1*01|MoreText|
#   >SomeText|TRAV1-1*02|MoreText|
#   >SomeText|TRAV1-2*01|MoreText|
#   >SomeText|TRAV14/DV4*01|MoreText|
#   >SomeText|TRAV38-1*01|MoreText|
#   >SomeText|TRAV38-2/DV8*01|MoreText|
#   >SomeText|TRBV29-1*01|MoreText|
#   >SomeText|TRBV29-1*02|MoreText|
#   >SomeText|TRBV29/OR9-2*01|MoreText|

fastadir <- get_example_path("fasta_dir/")
extract_imgt_genes(fastadir)
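
The two steps described above (parse every FASTA file, then sort the names) can be sketched in base R as follows. This is a minimal illustration, not the package's code; the ".fa" file extension, the toy file contents, and the column name imgt are assumptions.

```r
# Write two toy FASTA files with IMGT-style headers
fasta_dir <- file.path(tempdir(), "extract_sketch")
dir.create(fasta_dir, showWarnings = FALSE, recursive = TRUE)
writeLines(c(">SomeText|TRBV29-1*01|MoreText|", "acgt"),
           file.path(fasta_dir, "trbv.fa"))
writeLines(c(">SomeText|TRAV1-2*01|MoreText|", "acgt",
             ">SomeText|TRAC*01|MoreText|", "acgt"),
           file.path(fasta_dir, "trav.fa"))

# Collect the second "|"-delimited field of every header line in every file
files <- list.files(fasta_dir, pattern = "\\.fa$", full.names = TRUE)
genes <- unlist(lapply(files, function(f) {
  headers <- grep("^>", readLines(f), value = TRUE)
  vapply(strsplit(headers, "|", fixed = TRUE), `[`, character(1), 2)
}))

# Return the names as an alphabetically sorted data frame
gene_df <- data.frame(imgt = sort(genes), stringsAsFactors = FALSE)
gene_df

# Clean up the toy files
unlink(fasta_dir, recursive = TRUE)
```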

Get full path to an example file or directory

Description

get_example_path() takes a file or folder name that is expected to be located under the TCRconvertR examples directory and gets the full path to that item.

Usage

get_example_path(file_name)

Arguments

file_name

A string, the name of the example file or directory.

Value

A string, the path to the example file or directory.

Examples

# Will probably be in a temp folder for the function example
get_example_path("tenx.csv")

Add a 0 to single-digit gene-level designation

Description

pad_single_digit() takes a gene name and ensures that any single-digit number following a sequence of letters is padded with a leading zero. This is to match the Adaptive format.

Usage

pad_single_digit(gene_str)

Arguments

gene_str

A string, the gene name.

Value

A string, the updated gene name.

Examples

pad_single_digit("TCRBV1-2")
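
The padding rule can be expressed as a single regular expression. The sketch below is one possible reading of the description, not the package's implementation, and the name sketch_pad_single_digit is hypothetical. It matches a letter followed by exactly one digit and inserts a zero between them.

```r
# Hypothetical helper: pad a lone digit that directly follows a letter.
# "(?!\\d)" ensures the digit is not the start of a multi-digit number.
# In the replacement, "\\1" is the letter, "0" is literal, "\\2" is the digit.
sketch_pad_single_digit <- function(gene_str) {
  gsub("([A-Za-z])(\\d)(?!\\d)", "\\10\\2", gene_str, perl = TRUE)
}

sketch_pad_single_digit("TCRBV1-2")   # "TCRBV01-2"
sketch_pad_single_digit("TCRBV29-1")  # returned unchanged
```

Digits that follow a "-" are left alone here because the pattern requires a preceding letter.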

Extract gene names from a reference FASTA

Description

parse_imgt_fasta() extracts the second element from a "|"-delimited FASTA header, which will be the gene name for IMGT reference FASTAs.

Usage

parse_imgt_fasta(infile)

Arguments

infile

A string, the path to the FASTA file.

Value

A character vector of gene names.

Examples

# Given a FASTA file containing these headers:
#   >SomeText|TRBV29-1*01|MoreText|
#   >SomeText|TRBV29-1*02|MoreText|
#   >SomeText|TRBV29/OR9-2*01|MoreText|

fasta <- get_example_path("fasta_dir/test_trbv.fa")
parse_imgt_fasta(fasta)
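
The header-splitting step can be sketched in base R as shown here. The function name sketch_parse_fasta and the toy file contents are assumptions for illustration; this is not the package's implementation.

```r
# Hypothetical helper: keep header lines, split on "|", take the second field
sketch_parse_fasta <- function(infile) {
  lines <- readLines(infile)
  headers <- lines[startsWith(lines, ">")]
  fields <- strsplit(headers, "|", fixed = TRUE)
  vapply(fields, function(x) x[2], character(1))
}

# Toy FASTA file with two IMGT-style records
tmp <- tempfile(fileext = ".fa")
writeLines(c(">SomeText|TRBV29-1*01|MoreText|", "acgtacgt",
             ">SomeText|TRBV29/OR9-2*01|MoreText|", "ttgacc"), tmp)

parsed <- sketch_parse_fasta(tmp)
parsed  # "TRBV29-1*01" "TRBV29/OR9-2*01"

unlink(tmp)
```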

Save a lookup table to a CSV file

Description

save_lookup() saves a data frame as a CSV file (without row names) in the specified directory.

Usage

save_lookup(df, savedir, name)

Arguments

df

A data frame containing the lookup table data.

savedir

A string, the path to the save directory.

name

A string, the file name (should end in .csv).

Value

Nothing

Examples

# Create a temp save directory and load an example
save_dir <- file.path(tempdir(), "TCRconvertR_tmp")
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)
dat <- read.csv(get_example_path("fasta_dir/lookup.csv"))

save_lookup(dat, save_dir, "newlookup.csv")

# Clean up temporary folder
unlink(save_dir, recursive = TRUE)
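
Writing a data frame to CSV without row names is a one-line call in base R. The sketch below shows the idea with a toy table; sketch_save_lookup and the column names are hypothetical, and this is not the package's implementation.

```r
# Hypothetical helper: write df to <savedir>/<name> without row names
sketch_save_lookup <- function(df, savedir, name) {
  utils::write.csv(df, file.path(savedir, name), row.names = FALSE)
  invisible(NULL)
}

# Toy lookup table; column names and values are illustrative only
toy <- data.frame(imgt = "TRBV29-1*01", tenx = "TRBV29-1",
                  stringsAsFactors = FALSE)
out_dir <- file.path(tempdir(), "save_sketch")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

sketch_save_lookup(toy, out_dir, "toy_lookup.csv")

# Read the file back to confirm no row-name column was written
round_trip <- read.csv(file.path(out_dir, "toy_lookup.csv"),
                       stringsAsFactors = FALSE)
names(round_trip)  # "imgt" "tenx"

unlink(out_dir, recursive = TRUE)
```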

Determine input columns to use

Description

which_frm_cols() determines the columns that are expected to hold gene name information in the input file based on the input format (frm). It returns a vector of those column names.

Usage

which_frm_cols(df, frm, frm_cols = NULL, verbose = TRUE)

Arguments

df

A dataframe containing TCR gene names.

frm

A string, the input format of TCR data. Must be one of "tenx", "adaptive", "adaptivev2", or "imgt".

frm_cols

A character vector, the custom column names to use. Optional; defaults to NULL.

verbose

A boolean, whether to show messages. Optional; defaults to TRUE.

Value

A character vector, the column names to use.

Examples

tcr_file <- get_example_path("tenx.csv")
df <- read.csv(tcr_file)
which_frm_cols(df, "tenx")
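
The column-selection logic can be sketched as "use the custom columns if given, otherwise fall back to format defaults that are present in the data". The sketch below is illustrative only: sketch_which_cols is a hypothetical name, the default 10X column names are an assumption, and defaults for the other formats are omitted.

```r
# Hypothetical helper: pick gene columns from custom names or format defaults
sketch_which_cols <- function(df, frm, frm_cols = NULL) {
  if (is.null(frm_cols)) {
    # Assumed default columns for 10X data; other formats are left out here
    defaults <- list(tenx = c("v_gene", "d_gene", "j_gene", "c_gene"))
    if (is.null(defaults[[frm]])) {
      stop("No default columns defined in this sketch for format: ", frm)
    }
    frm_cols <- defaults[[frm]]
  }
  # Keep only the columns that actually exist in the data frame
  intersect(frm_cols, names(df))
}

# Toy data frame; values are illustrative only
df <- data.frame(barcode = "cell1", v_gene = "TRAV1-2",
                 j_gene = "TRAJ33", cdr3 = "CAV",
                 stringsAsFactors = FALSE)

sketch_which_cols(df, "tenx")                         # "v_gene" "j_gene"
sketch_which_cols(df, "tenx", frm_cols = "v_gene")    # "v_gene"
```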