Type: | Package |
Title: | Brings Seurat to the Tidyverse |
Version: | 0.8.0 |
Description: | Creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse. |
License: | GPL-3 |
Depends: | R (≥ 4.1.0), ttservice (≥ 0.3.8), SeuratObject |
Imports: | Seurat (≥ 4.3.0), tibble, dplyr, magrittr, tidyr (≥ 1.2.0), ggplot2, rlang, purrr, lifecycle, methods, plotly, tidyselect, utils, ellipsis, vctrs, pillar, stringr, cli, fansi, Matrix |
Suggests: | testthat, knitr, GGally, markdown, SingleR |
VignetteBuilder: | knitr |
RdMacros: | lifecycle |
Biarch: | true |
biocViews: | AssayDomain, Infrastructure, RNASeq, DifferentialExpression, GeneExpression, Normalization, Clustering, QualityControl, Sequencing, Transcription, Transcriptomics |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
URL: | https://github.com/stemangiola/tidyseurat |
BugReports: | https://github.com/stemangiola/tidyseurat/issues |
NeedsCompilation: | no |
Packaged: | 2024-01-09 23:19:30 UTC; mangiola.s |
Author: | Stefano Mangiola [aut, cre], Maria Doyle [ctb] |
Maintainer: | Stefano Mangiola <mangiolastefano@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-01-10 04:50:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Value
void
Examples
data(pbmc_small)
pbmc_small %>% print()
Add class to object
Description
Add class to object
Usage
add_class(var, name)
Arguments
var |
A tibble |
name |
A character name of the attribute |
Value
A tibble with an additional attribute
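The body of this helper is not shown in this manual. As a minimal sketch of the idea, assuming the helper simply prepends a name to the object's class vector, a stand-in (hypothetically named add_class_sketch, not part of the package) could look like this in base R:

```r
# Hypothetical stand-in for add_class(); not the package's actual code.
add_class_sketch <- function(var, name) {
  # Prepend the new class name, skipping it if it is already present
  class(var) <- unique(c(name, class(var)))
  var
}

df <- data.frame(x = 1:3)
df2 <- add_class_sketch(df, "my_class")
class(df2)
```

The original classes are kept, so methods defined for the underlying object still dispatch after the new class has been tried.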
Aggregate cells
Description
Combine cells into groups based on shared variables and aggregate feature counts.
Usage
## S4 method for signature 'Seurat'
aggregate_cells(
.data,
.sample = NULL,
slot = "data",
assays = NULL,
aggregation_function = Matrix::rowSums,
...
)
Arguments
.data |
A tidyseurat object |
.sample |
A vector of variables by which cells are aggregated |
slot |
The slot to which the function is applied |
assays |
The assay to which the function is applied |
aggregation_function |
The method of cell-feature value aggregation |
... |
Reserved for future extensibility |
Value
A tibble object
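The aggregation described above can be illustrated outside Seurat. The following is a sketch under stated assumptions: a toy feature-by-cell matrix, a made-up grouping vector sample_id playing the role of .sample, and base rowSums() standing in for the default Matrix::rowSums. It returns a plain matrix rather than the tibble that aggregate_cells() is documented to return.

```r
# Toy feature-by-cell count matrix: 3 features, 4 cells (filled column-wise)
counts <- matrix(
  c(1, 0, 2,
    3, 1, 0,
    0, 4, 1,
    2, 2, 2),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# One group label per cell (hypothetical; plays the role of .sample)
sample_id <- c("s1", "s1", "s2", "s2")

# Sum feature counts across the cells of each group
pseudo_bulk <- sapply(
  split(seq_len(ncol(counts)), sample_id),
  function(idx) rowSums(counts[, idx, drop = FALSE])
)
pseudo_bulk
```

Swapping rowSums for another row-wise summary (for example rowMeans) mirrors what the aggregation_function argument is described as controlling.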
Examples
data(pbmc_small)
pbmc_small_pseudo_bulk <- pbmc_small |>
  aggregate_cells(c(groups, letter.idents), assays="RNA")
Order rows using column values
Description
arrange() orders the rows of a data frame by the values of selected columns.
Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
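As a small illustration on a plain grouped tibble (not a Seurat object; the column names g and x are made up for this sketch), the following shows arrange() ignoring the grouping unless .by_group = TRUE is set:

```r
library(dplyr)

df <- tibble::tibble(g = c("b", "a", "b", "a"), x = c(1, 4, 3, 2)) |>
  group_by(g)

# Grouping is ignored: rows are ordered by x alone
ignored <- df |> arrange(x)

# Grouping is honoured: rows are ordered by g first, then by x
by_group <- df |> arrange(x, .by_group = TRUE)
```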
Usage
## S3 method for class 'Seurat'
arrange(.data, ..., .by_group = FALSE)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
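... |
<data-masking> Variables, or functions of variables. Use desc() to sort a variable in descending order. |
.by_group |
If TRUE, will sort first by grouping variable. Applies to grouped data frames only. |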
Details
Missing values
Unlike base sorting with sort(), NA are:
- always sorted to the end for local data, even when wrapped with desc().
- treated differently for remote data, depending on the backend.
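The local-data behaviour can be checked on a plain tibble. This is a sketch with a made-up column x, not a Seurat object:

```r
library(dplyr)

df <- tibble::tibble(x = c(2, NA, 1, 3))

# Base sort() drops NA by default
sorted_base <- sort(df$x)

# arrange() keeps the NA row and places it last ...
asc <- df |> arrange(x)

# ... even when sorting in descending order with desc()
dsc <- df |> arrange(desc(x))
```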
Value
An object of the same type as .data. The output has the following properties:
- All rows appear in the output, but (usually) in a different place.
- Columns are not modified.
- Groups are not modified.
- Data frame attributes are preserved.
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
See Also
Other single table verbs: mutate(), rename(), slice(), summarise()
Examples
data(pbmc_small)
pbmc_small |>
arrange(nFeature_RNA)
Coerce lists, matrices, and more to data frames
Description
as_tibble() turns an existing object, such as a data frame or matrix, into a so-called tibble, a data frame with class tbl_df. This is in contrast with tibble(), which builds a tibble from individual columns. as_tibble() is to tibble() as base::as.data.frame() is to base::data.frame().
as_tibble() is an S3 generic, with methods for:
- data.frame: Thin wrapper around the list method that implements tibble's treatment of rownames.
- Default: Other inputs are first coerced with base::as.data.frame().
as_tibble_row() converts a vector to a tibble with one row. If the input is a list, all elements must have size one.
as_tibble_col() converts a vector to a tibble with one column.
Usage
## S3 method for class 'Seurat'
as_tibble(
x,
...,
.name_repair = c("check_unique", "unique", "universal", "minimal"),
rownames = NULL
)
Arguments
x |
A data frame, list, matrix, or other object that could reasonably be coerced to a tibble. |
... |
Unused, for extensibility. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
rownames |
How to treat existing row names of a data frame or matrix:
Read more in rownames. |
Value
'tibble'
Row names
The default behavior is to silently remove row names.
New code should explicitly convert row names to a new column using the rownames argument.
For existing code that relies on the retention of row names, call pkgconfig::set_config("tibble::rownames" = NA) in your script or in your package's .onLoad() function.
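The row-name handling described here can be seen on an ordinary data frame. The sketch below uses a made-up two-row data frame and the hypothetical column name "cell"; it illustrates the data.frame method, not the Seurat method documented on this page:

```r
library(tibble)

df <- data.frame(value = c(10, 20), row.names = c("cell_a", "cell_b"))

# Default: row names are silently dropped
dropped <- as_tibble(df)

# Explicit: row names are moved into a new column called "cell"
kept <- as_tibble(df, rownames = "cell")
```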
Life cycle
Using as_tibble() for vectors is superseded as of version 3.0.0, prefer the more expressive as_tibble_row() and as_tibble_col() variants for new code.
See Also
tibble() constructs a tibble from individual columns. enframe() converts a named vector to a tibble with a column of names and column of values. Name repair is implemented using vctrs::vec_as_names().
Examples
data(pbmc_small)
pbmc_small |> as_tibble()
Efficiently bind multiple data frames by row and column
Description
This is an efficient implementation of the common pattern of 'do.call(rbind, dfs)' or 'do.call(cbind, dfs)' for binding many data frames into one.
Usage
## S3 method for class 'Seurat'
bind_rows(..., .id = NULL, add.cell.ids = NULL)
## S3 method for class 'Seurat'
bind_cols(..., .id = NULL)
Arguments
... |
Data frames to combine. Each argument can either be a data frame, a list that could be a data frame, or a list of data frames. When row-binding, columns are matched by name, and any missing columns will be filled with NA. When column-binding, rows are matched by position, so all data frames must have the same number of rows. To match by value, not position, see mutate-joins. |
.id |
Data frame identifier. When '.id' is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to 'bind_rows()'. When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead. |
add.cell.ids |
From Seurat 3.0: a character vector of length(x = c(x, y)). Appends the corresponding values to the start of each object's cell names. |
Details
The output of 'bind_rows()' will contain a column if that column appears in any of the inputs.
Value
'bind_rows()' and 'bind_cols()' return the same type as the first input, either a data frame, 'tbl_df', or 'grouped_df'.
Examples
data(pbmc_small)
tt <- pbmc_small
ttservice::bind_rows(tt, tt)
tt_bind <- tt |> select(nCount_RNA, nFeature_RNA)
tt |> ttservice::bind_cols(tt_bind)
Cell types of 80 PBMC single cells
Description
A dataset containing the barcodes and cell types of 80 PBMC single cells.
Usage
data(cell_type_df)
Format
A tibble containing 80 rows and 2 columns. Cells are a subsample of the Peripheral Blood Mononuclear Cells (PBMC) dataset of 2,700 single cells. Cell types were identified with SingleR.
- cell: cell identifier, barcode
- first.labels: cell type
Value
'tibble'
Source
https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html
Count the observations in each group
Description
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).
add_count() and add_tally() are equivalents to count() and tally() but use mutate() instead of summarise() so that they add a new column with group-wise counts.
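The equivalences above can be checked on a plain tibble. This is a sketch with made-up columns a and w, not a Seurat object:

```r
library(dplyr)

df <- tibble::tibble(a = c("x", "x", "y"), w = c(1, 2, 5))

# count() versus the explicit group_by() + summarise() form
counted <- df |> count(a)
manual <- df |> group_by(a) |> summarise(n = n())

# Weighted count: n becomes sum(w) within each value of a
weighted <- df |> count(a, wt = w)

# add_count() keeps every row and appends the group-wise count
added <- df |> add_count(a)
```

Here counted and manual hold the same values, while added has the same number of rows as df.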
Usage
## S3 method for class 'Seurat'
count(
x,
...,
wt = NULL,
sort = FALSE,
name = NULL,
.drop = group_by_drop_default(x)
)
## S3 method for class 'Seurat'
add_count(
x,
...,
wt = NULL,
sort = FALSE,
name = NULL,
.drop = group_by_drop_default(x)
)
Arguments
x |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... |
< |
wt |
<
|
sort |
If |
name |
The name of the new column in the output. If omitted, it will default to |
.drop |
Handling of factor levels that don't appear in the data, passed
on to For
|
Value
An object of the same type as .data. count() and add_count() group transiently, so the output has the same groups as the input.
Examples
data(pbmc_small)
pbmc_small |> count(groups)
Keep distinct/unique rows
Description
Keep only unique/distinct rows from a data frame. This is similar to unique.data.frame() but considerably faster.
Usage
## S3 method for class 'Seurat'
distinct(.data, ..., .keep_all = FALSE)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
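... |
<data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables in the data frame. |
.keep_all |
If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. |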
Value
An object of the same type as .data. The output has the following properties:
- Rows are a subset of the input but appear in the same order.
- Columns are not modified if ... is empty or .keep_all is TRUE. Otherwise, distinct() first calls mutate() to create new columns.
- Groups are not modified.
- Data frame attributes are preserved.
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
Examples
data("pbmc_small")
pbmc_small |> distinct(groups)
Remove class from object
Description
Remove class from object
Usage
drop_class(var, name)
Arguments
var |
A tibble |
name |
A character name of the class |
Value
A tibble with the class removed
Extract a character column into multiple columns using regular expression groups
Description
extract() has been superseded in favour of separate_wider_regex() because it has a more polished API and better handling of problems. Superseded functions will not go away, but will only receive critical bug fixes.
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.
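The matching rules can be illustrated with tidyr::extract() on a plain tibble. This is a sketch with a made-up groups column that reuses the regular expression from the example further down, not a Seurat object:

```r
library(tidyr)

df <- tibble::tibble(groups = c("g1", "g2", NA, "other"))

# One capturing group, so one new column; convert = TRUE turns "1" into 1L
out <- df |>
  tidyr::extract(groups, into = "g", regex = "g([0-9])", convert = TRUE)
```

The NA input and the non-matching value "other" both become NA in the new column g, and the source column is removed because remove defaults to TRUE.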
Usage
## S3 method for class 'Seurat'
extract(
data,
col,
into,
regex = "([[:alnum:]]+)",
remove = TRUE,
convert = FALSE,
...
)
Arguments
data |
A data frame. |
col |
< |
into |
Names of new variables to create as character vector.
Use |
regex |
A string representing a regular expression used to extract the
desired values. There should be one group (defined by |
remove |
If |
convert |
If NB: this will cause string |
... |
Additional arguments passed on to methods. |
Value
'tidyseurat'
See Also
separate() to split up by a separator.
Examples
data(pbmc_small)
pbmc_small |>
  extract(groups,
    into="g",
    regex="g([0-9])",
    convert=TRUE)
Keep rows that match a condition
Description
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.
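The difference in NA handling can be seen on a base data frame. This is a sketch with a made-up column x, not a Seurat object:

```r
library(dplyr)

df <- data.frame(x = c(1, NA, 3))

# filter() drops the row where the condition evaluates to NA
kept <- df |> filter(x > 1)

# Base subsetting with [ keeps that row, filled with NA
base_kept <- df[df$x > 1, , drop = FALSE]
```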
Usage
## S3 method for class 'Seurat'
filter(.data, ..., .preserve = FALSE)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
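... |
<data-masking> Expressions that return a logical value, and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept. |
.preserve |
Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is. |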
Details
The filter() function is used to subset the rows of .data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data (see group_by() and ungroup()). However, dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.
Value
An object of the same type as .data. The output has the following properties:
- Rows are a subset of the input, but appear in the same order.
- Columns are not modified.
- The number of groups may be reduced (if .preserve is not TRUE).
- Data frame attributes are preserved.
Useful filter functions
There are many functions and operators that are useful when constructing the expressions used to filter the data:
Grouped tibbles
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
In the ungrouped version, filter() compares the value of mass in each row to the global average (taken over the whole data set), keeping only the rows with mass greater than this global average. In contrast, the grouped version calculates the average mass separately for each gender group, and keeps rows with mass greater than the relevant within-gender average.
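The same contrast can be reproduced with numbers small enough to check by hand. This is a sketch on a made-up four-row tibble (columns g and mass are hypothetical), not the starwars data:

```r
library(dplyr)

df <- tibble::tibble(g = c("a", "a", "b", "b"), mass = c(1, 3, 10, 20))

# Global mean is 8.5, so only the two rows of group "b" pass
ungrouped <- df |> filter(mass > mean(mass))

# Group means are 2 ("a") and 15 ("b"), so one row per group passes
grouped <- df |> group_by(g) |> filter(mass > mean(mass))
```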
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
See Also
Other single table verbs: arrange(), mutate(), reframe(), rename(), select(), slice(), summarise()
Examples
data("pbmc_small")
pbmc_small |> filter(groups == "g1")
# Learn more in ?dplyr_eval
Printing tibbles
Description
One of the main features of the tbl_df class is the printing:
- Tibbles only print as many rows and columns as fit on one screen, supplemented by a summary of the remaining rows and columns.
- Tibble reveals the type of each column, which keeps the user informed about whether a variable is, e.g., <chr> or <fct> (character versus factor). See vignette("types") for an overview of common type abbreviations.
Printing can be tweaked for a one-off call by calling print() explicitly and setting arguments like n and width. More persistent control is available by setting the options described in pillar::pillar_options. See also vignette("digits") for a comparison to base options, and vignette("numbers") that showcases num() and char() for creating columns with custom formatting options.
As of tibble 3.1.0, printing is handled entirely by the pillar package. If you implement a package that extends tibble, the printed output can be customized in various ways. See vignette("extending", package = "pillar") for details, and pillar::pillar_options for options that control the display in the console.
Usage
## S3 method for class 'Seurat'
print(x, ..., n = NULL, width = NULL, n_extra = NULL)
Arguments
x |
Object to format or print. |
... |
Passed on to |
n |
Number of rows to show. If |
width |
Width of text output to generate. This defaults to |
n_extra |
Number of extra columns to print abbreviated information for, if the width is too small for the entire tibble. If 'NULL', the default, will print information about at most 'tibble.max_extra_cols' extra columns. |
Value
Prints a message to the console describing the contents of the 'tidyseurat'.
Examples
data(pbmc_small)
print(pbmc_small)
Mutating joins
Description
Mutating joins add columns from y to x, matching observations based on the keys. There are four mutating joins: the inner join, and the three outer joins.
Inner join
An inner_join() only keeps observations from x that have a matching key in y.
The most important property of an inner join is that unmatched rows in either input are not included in the result. This means that generally inner joins are not appropriate in most analyses, because it is too easy to lose observations.
Outer joins
The three outer joins keep observations that appear in at least one of the data frames:
- A left_join() keeps all observations in x.
- A right_join() keeps all observations in y.
- A full_join() keeps all observations in x and y.
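The four join types can be compared on two small tibbles. This is a sketch with made-up tables x and y that share the keys 2 and 3; it uses plain tibbles rather than Seurat objects:

```r
library(dplyr)

x <- tibble::tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble::tibble(key = c(2, 3, 4), val_y = c("y2", "y3", "y4"))

inner <- inner_join(x, y, by = "key")  # only keys present in both
left  <- left_join(x, y, by = "key")   # every row of x
right <- right_join(x, y, by = "key")  # every row of y
full  <- full_join(x, y, by = "key")   # every row of x and of y
```

Key 1 (only in x) and key 4 (only in y) are the observations an inner join silently drops.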
Usage
## S3 method for class 'Seurat'
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
Arguments
x , y |
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If |
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
Value
An object of the same type as x (including the same groups). The order of the rows and columns of x is preserved as much as possible. The output has the following properties:
- The rows are affected by the join type.
  - inner_join() returns matched x rows.
  - left_join() returns all x rows.
  - right_join() returns matched x rows, followed by unmatched y rows.
  - full_join() returns all x rows, followed by unmatched y rows.
- Output columns include all columns from x and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
- If non-key columns in x and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in x and y have the same name, suffixes are added to disambiguate these as well.
- If keep = FALSE, output columns included in by are coerced to their common type between x and y.
Many-to-many relationships
By default, dplyr guards against many-to-many relationships in equality joins by throwing a warning. These occur when both of the following are true:
- A row in x matches multiple rows in y.
- A row in y matches multiple rows in x.
This is typically surprising, as most joins involve a relationship of one-to-one, one-to-many, or many-to-one, and is often the result of an improperly specified join. Many-to-many relationships are particularly problematic because they can result in a Cartesian explosion of the number of rows returned from the join.
If a many-to-many relationship is expected, silence this warning by explicitly setting relationship = "many-to-many".
In production code, it is best to preemptively set relationship to whatever relationship you expect to exist between the keys of x and y, as this forces an error to occur immediately if the data doesn't align with your expectations.
Inequality joins typically result in many-to-many relationships by nature, so they don't warn on them by default, but you should still take extra care when specifying an inequality join, because they also have the capability to return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many rolling joins follow a many-to-one relationship, so it is often useful to set relationship = "many-to-one" to enforce this.
Note that in SQL, most database providers won't let you specify a many-to-many relationship between two tables, instead requiring that you create a third junction table that results in two one-to-many relationships instead.
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- inner_join(): no methods found.
- left_join(): no methods found.
- right_join(): no methods found.
- full_join(): no methods found.
See Also
Other joins: cross_join(), filter-joins, nest_join()
Examples
data(pbmc_small)
tt <- pbmc_small
tt |> full_join(tibble::tibble(groups="g1", other=1:4))
get abundance long
Description
get abundance long
Usage
get_abundance_sc_long(
.data,
features = NULL,
all = FALSE,
exclude_zeros = FALSE,
assay = Assays(.data),
slot = "data"
)
Arguments
.data |
A tidyseurat |
features |
A character |
all |
A boolean |
exclude_zeros |
A boolean |
assay |
assay name to extract feature abundance |
slot |
slot in the assay, e.g. 'data' and 'scale.data' |
Value
A Seurat object
Examples
data(pbmc_small)
pbmc_small %>%
get_abundance_sc_long(features=c("HLA-DRA", "LYZ"))
get abundance wide
Description
get abundance wide
Usage
get_abundance_sc_wide(
.data,
features = NULL,
all = FALSE,
assay = .data@active.assay,
slot = "data",
prefix = ""
)
Arguments
.data |
A tidyseurat |
features |
A character |
all |
A boolean |
assay |
assay name to extract feature abundance |
slot |
slot in the assay, e.g. 'data' and 'scale.data' |
prefix |
prefix for the feature names |
Value
A Seurat object
Examples
data(pbmc_small)
pbmc_small %>%
get_abundance_sc_wide(features=c("HLA-DRA", "LYZ"))
Create a new ggplot from a tidyseurat
Description
ggplot() initializes a ggplot object. It can be used to declare the input data frame for a graphic and to specify the set of plot aesthetics intended to be common throughout all subsequent layers unless specifically overridden.
Usage
## S3 method for class 'Seurat'
ggplot(data = NULL, mapping = aes(), ..., environment = parent.frame())
Arguments
data |
Default dataset to use for plot. If not already a data.frame,
will be converted to one by |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
... |
Other arguments passed on to methods. Not currently used. |
environment |
Details
ggplot() is used to construct the initial plot object, and is almost always followed by a plus sign (+) to add components to the plot.
There are three common patterns used to invoke ggplot():
- ggplot(data = df, mapping = aes(x, y, other aesthetics))
- ggplot(data = df)
- ggplot()
The first pattern is recommended if all layers use the same data and the same set of aesthetics, although this method can also be used when adding a layer using data from another data frame.
The second pattern specifies the default data frame to use for the plot, but no aesthetics are defined up front. This is useful when one data frame is used predominantly for the plot, but the aesthetics vary from one layer to another.
The third pattern initializes a skeleton ggplot object, which is fleshed out as layers are added. This is useful when multiple data frames are used to produce different layers, as is often the case in complex graphics.
The data = and mapping = specifications in the arguments are optional (and are often omitted in practice), so long as the data and the mapping values are passed into the function in the right order. In the examples below, however, they are left in place for clarity.
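The three invocation patterns can be sketched side by side. The data frame df and its columns x and y are made up for illustration, and a plain data frame is used in place of a tidyseurat object:

```r
library(ggplot2)

# Toy data frame used only for illustration
df <- data.frame(x = 1:3, y = c(2, 4, 6))

# Pattern 1: data and aesthetics declared up front
p1 <- ggplot(data = df, mapping = aes(x, y)) + geom_point()

# Pattern 2: data up front, aesthetics supplied per layer
p2 <- ggplot(data = df) + geom_point(mapping = aes(x, y))

# Pattern 3: empty skeleton, data and aesthetics supplied per layer
p3 <- ggplot() + geom_point(data = df, mapping = aes(x, y))
```

All three objects describe the same scatter plot; they differ only in where the data and the mapping are declared.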
Value
'ggplot'
Examples
library(ggplot2)
data(pbmc_small)
pbmc_small |>
  ggplot(aes(groups, nCount_RNA)) +
  geom_boxplot()
Get a glimpse of your data
Description
glimpse() is like a transposed version of print(): columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It's a little like str() applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.)
See format_glimpse() for details on the formatting.
Usage
## S3 method for class 'tidyseurat'
glimpse(x, width = NULL, ...)
Arguments
x |
An object to glimpse at. |
width |
Width of output: defaults to the setting of the
|
... |
Unused, for extensibility. |
Value
The original x is (invisibly) returned, allowing glimpse() to be used within a data pipeline.
S3 methods
glimpse is an S3 generic with a customised method for tbls and data.frames, and a default method that calls str().
Examples
data(pbmc_small)
pbmc_small |> glimpse()
Group by one or more variables
Description
Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
Usage
## S3 method for class 'Seurat'
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
In |
.add |
When This argument was previously called |
.drop |
Drop groups formed by factor levels that don't appear in the
data? The default is |
Value
A grouped data frame with class grouped_df, unless the combination of ... and .add yields an empty set of grouping columns, in which case a tibble will be returned.
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- group_by(): no methods found.
- ungroup(): no methods found.
Ordering
Currently, group_by() internally orders the groups in ascending order. This results in ordered output from functions that aggregate groups, such as summarise().
When used as grouping columns, character vectors are ordered in the C locale for performance and reproducibility across R sessions. If the resulting ordering of your grouped operation matters and is dependent on the locale, you should follow up the grouped operation with an explicit call to arrange() and set the .locale argument. For example:
data %>% group_by(chr) %>% summarise(avg = mean(x)) %>% arrange(chr, .locale = "en")
This is often useful as a preliminary step before generating content intended for humans, such as an HTML table.
Legacy behavior
Prior to dplyr 1.1.0, character vector grouping columns were ordered in the system locale. If you need to temporarily revert to this behavior, you can set the global option dplyr.legacy_locale to TRUE, but this should be used sparingly and you should expect this option to be removed in a future version of dplyr. It is better to update existing code to explicitly call arrange(.locale = ) instead. Note that setting dplyr.legacy_locale will also force calls to arrange() to use the system locale.
See Also
Other grouping functions: group_map(), group_nest(), group_split(), group_trim()
Examples
data("pbmc_small")
pbmc_small |> group_by(groups)
Split data frame by groups
Description
group_split() works like base::split() but:
- It uses the grouping structure from group_by() and therefore is subject to the data mask.
- It does not name the elements of the list based on the grouping as this only works well for a single character grouping variable. Instead, use group_keys() to access a data frame that defines the groups.
group_split() is primarily designed to work with grouped data frames. You can pass ... to group and split an ungrouped data frame, but this is generally not very useful as you won't have easy access to the group metadata.
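The two points above (unnamed list elements, keys available through group_keys()) can be seen on a plain grouped tibble. This is a sketch with made-up columns g and x, not a Seurat object:

```r
library(dplyr)

df <- tibble::tibble(g = c("a", "b", "a"), x = 1:3) |>
  group_by(g)

# One tibble per group, returned as an unnamed list
pieces <- group_split(df)

# The group labels are recovered separately, in the same order as the pieces
keys <- group_keys(df)
```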
Usage
## S3 method for class 'Seurat'
group_split(.tbl, ..., .keep = TRUE)
Arguments
.tbl |
A tbl. |
... |
If |
.keep |
Should the grouping columns be kept? |
Value
A list of tibbles. Each tibble contains the rows of .tbl for the associated group and all the columns, including the grouping variables. Note that this returns a list_of which is slightly stricter than a simple list but is useful for representing lists where every element has the same type.
Lifecycle
group_split() is not stable because you can achieve very similar results by manipulating the nested column returned from tidyr::nest(.by =). That also retains the group keys all within a single data structure. group_split() may be deprecated in the future.
See Also
Other grouping functions: group_by(), group_map(), group_nest(), group_trim()
Examples
data(pbmc_small)
pbmc_small |> group_split(groups)
Mutating joins
Description
Mutating joins add columns from y to x, matching observations based on the keys. There are four mutating joins: the inner join, and the three outer joins.
Inner join
An inner_join() only keeps observations from x that have a matching key in y.
The most important property of an inner join is that unmatched rows in either input are not included in the result. This means that generally inner joins are not appropriate in most analyses, because it is too easy to lose observations.
Outer joins
The three outer joins keep observations that appear in at least one of the data frames:
- A left_join() keeps all observations in x.
- A right_join() keeps all observations in y.
- A full_join() keeps all observations in x and y.
Usage
## S3 method for class 'Seurat'
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
Arguments
x , y |
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If |
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
Value
An object of the same type as x (including the same groups). The order of the rows and columns of x is preserved as much as possible. The output has the following properties:
- The rows are affected by the join type.
  - inner_join() returns matched x rows.
  - left_join() returns all x rows.
  - right_join() returns matched x rows, followed by unmatched y rows.
  - full_join() returns all x rows, followed by unmatched y rows.
- Output columns include all columns from x and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
- If non-key columns in x and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in x and y have the same name, suffixes are added to disambiguate these as well.
- If keep = FALSE, output columns included in by are coerced to their common type between x and y.
Many-to-many relationships
By default, dplyr guards against many-to-many relationships in equality joins by throwing a warning. These occur when both of the following are true:
- A row in x matches multiple rows in y.
- A row in y matches multiple rows in x.
This is typically surprising, as most joins involve a relationship of one-to-one, one-to-many, or many-to-one, and is often the result of an improperly specified join. Many-to-many relationships are particularly problematic because they can result in a Cartesian explosion of the number of rows returned from the join.
If a many-to-many relationship is expected, silence this warning by explicitly setting relationship = "many-to-many".
In production code, it is best to preemptively set relationship to whatever relationship you expect to exist between the keys of x and y, as this forces an error to occur immediately if the data doesn't align with your expectations.
Inequality joins typically result in many-to-many relationships by nature, so they don't warn on them by default, but you should still take extra care when specifying an inequality join, because they also have the capability to return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many rolling joins follow a many-to-one relationship, so it is often useful to set relationship = "many-to-one" to enforce this.
Note that in SQL, most database providers won't let you specify a many-to-many relationship between two tables, instead requiring that you create a third junction table that results in two one-to-many relationships instead.
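The effect of declaring the expected relationship can be sketched on plain tibbles. This is a minimal illustration that assumes dplyr 1.1.0 or later (where the relationship argument exists); the tables orders and customers and their columns are hypothetical, and the same behaviour on a Seurat object is not verified here.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables, not part of tidyseurat
orders    <- tibble(customer_id = c(1, 1, 2), item = c("a", "b", "c"))
customers <- tibble(customer_id = c(1, 2), name = c("Ann", "Bob"))

# Each order matches at most one customer, so the declared relationship holds
ok <- left_join(orders, customers, by = "customer_id",
                relationship = "many-to-one")

# A duplicated key in y violates many-to-one, so the join fails immediately
dup_customers <- bind_rows(customers, tibble(customer_id = 1, name = "Ann2"))
res <- tryCatch(
  left_join(orders, dup_customers, by = "customer_id",
            relationship = "many-to-one"),
  error = function(e) "error"
)
```

Declaring the relationship turns a silent row explosion into an early failure, which is usually easier to debug than an unexpectedly large result.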
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- inner_join(): no methods found.
- left_join(): no methods found.
- right_join(): no methods found.
- full_join(): no methods found.
See Also
Other joins:
cross_join(), filter-joins, nest_join()
Examples
data(pbmc_small)
tt <- pbmc_small
tt |> inner_join(tt |>
distinct(groups) |>
mutate(new_column=1:2) |>
slice(1))
join_features
Description
join_features() extracts and joins information for specific features
Usage
## S4 method for signature 'Seurat'
join_features(
.data,
features = NULL,
all = FALSE,
exclude_zeros = FALSE,
shape = "long",
assay = NULL,
slot = "data",
...
)
Arguments
.data |
A tidyseurat object |
features |
A vector of feature identifiers to join |
all |
If TRUE return all |
exclude_zeros |
If TRUE exclude zero values |
shape |
Format of the returned table "long" or "wide" |
assay |
assay name to extract feature abundance |
slot |
slot name to extract feature abundance |
... |
Parameters to pass to join wide, i.e. assay name to extract feature abundance from and gene prefix, for shape="wide" |
Details
This function extracts information for specified features and returns the information in either long or wide format.
Value
A 'tidyseurat' object containing information for the specified features.
Examples
data(pbmc_small)
pbmc_small %>% join_features(
features=c("HLA-DRA", "LYZ"))
(DEPRECATED) Extract and join information for transcripts.
Description
join_transcripts() extracts and joins information for specified transcripts
Usage
join_transcripts(
.data,
transcripts = NULL,
all = FALSE,
exclude_zeros = FALSE,
shape = "long",
...
)
Arguments
.data |
A tidyseurat object |
transcripts |
A vector of transcript identifiers to join |
all |
If TRUE return all |
exclude_zeros |
If TRUE exclude zero values |
shape |
Format of the returned table "long" or "wide" |
... |
Parameters to pass to join wide, i.e. assay name to extract transcript abundance from |
Details
DEPRECATED, please use join_features()
Value
A 'tbl' containing the information for the specified transcripts.
Examples
print("DEPRECATED")
Mutating joins
Description
Mutating joins add columns from y to x, matching observations based on the keys. There are four mutating joins: the inner join, and the three outer joins.
Inner join
An inner_join() only keeps observations from x that have a matching key in y.
The most important property of an inner join is that unmatched rows in either input are not included in the result. This means that generally inner joins are not appropriate in most analyses, because it is too easy to lose observations.
Outer joins
The three outer joins keep observations that appear in at least one of the data frames:
- A left_join() keeps all observations in x.
- A right_join() keeps all observations in y.
- A full_join() keeps all observations in x and y.
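The difference between the four mutating joins is easiest to see by counting rows. The following is a small sketch on plain tibbles; x and y here are hypothetical toy tables that share keys 2 and 3 only.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables: keys 2 and 3 appear in both
x <- tibble(key = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y <- tibble(key = c(2, 3, 4), y_val = c("y2", "y3", "y4"))

n_inner <- nrow(inner_join(x, y, by = "key"))  # keys 2, 3
n_left  <- nrow(left_join(x, y, by = "key"))   # keys 1, 2, 3
n_right <- nrow(right_join(x, y, by = "key"))  # keys 2, 3, 4
n_full  <- nrow(full_join(x, y, by = "key"))   # keys 1, 2, 3, 4
```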
Usage
## S3 method for class 'Seurat'
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
Arguments
x , y |
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
by |
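A join specification created with join_by(), or a character vector of variables to join by.
If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.
To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d.
For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").
To perform a cross-join, generating all combinations of x and y, see cross_join(). |
copy |
If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it. |
suffix |
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. |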
... |
Other parameters passed onto methods. |
Value
An object of the same type as x (including the same groups). The order of the rows and columns of x is preserved as much as possible. The output has the following properties:
- The rows are affected by the join type.
  - inner_join() returns matched x rows.
  - left_join() returns all x rows.
  - right_join() returns matched x rows, followed by unmatched y rows.
  - full_join() returns all x rows, followed by unmatched y rows.
- Output columns include all columns from x and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
- If non-key columns in x and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in x and y have the same name, suffixes are added to disambiguate these as well.
- If keep = FALSE, output columns included in by are coerced to their common type between x and y.
Many-to-many relationships
By default, dplyr guards against many-to-many relationships in equality joins by throwing a warning. These occur when both of the following are true:
- A row in x matches multiple rows in y.
- A row in y matches multiple rows in x.
This is typically surprising, as most joins involve a relationship of one-to-one, one-to-many, or many-to-one, and is often the result of an improperly specified join. Many-to-many relationships are particularly problematic because they can result in a Cartesian explosion of the number of rows returned from the join.
If a many-to-many relationship is expected, silence this warning by explicitly setting relationship = "many-to-many".
In production code, it is best to preemptively set relationship to whatever relationship you expect to exist between the keys of x and y, as this forces an error to occur immediately if the data doesn't align with your expectations.
Inequality joins typically result in many-to-many relationships by nature, so they don't warn on them by default, but you should still take extra care when specifying an inequality join, because they also have the capability to return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many rolling joins follow a many-to-one relationship, so it is often useful to set relationship = "many-to-one" to enforce this.
Note that in SQL, most database providers won't let you specify a many-to-many relationship between two tables, instead requiring that you create a third junction table that results in two one-to-many relationships instead.
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- inner_join(): no methods found.
- left_join(): no methods found.
- right_join(): no methods found.
- full_join(): no methods found.
See Also
Other joins:
cross_join(), filter-joins, nest_join()
Examples
data(pbmc_small)
tt <- pbmc_small
tt |> left_join(tt |>
distinct(groups) |>
mutate(new_column=1:2))
Create, modify, and delete columns
Description
mutate() creates new columns that are functions of existing variables. It can also modify (if the name is the same as an existing column) and delete columns (by setting their value to NULL).
Usage
## S3 method for class 'Seurat'
mutate(.data, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
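<data-masking> Name-value pairs. The name gives the name of the column in the output. The value can be:
- A vector of length 1, which will be recycled to the correct length.
- A vector the same length as the current group (or the whole data frame if ungrouped).
- NULL, to remove the column.
- A data frame or tibble, to create multiple columns in the output. |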
Value
An object of the same type as .data. The output has the following properties:
- Columns from .data will be preserved according to the .keep argument.
- Existing columns that are modified by ... will always be returned in their original location.
- New columns created through ... will be placed according to the .before and .after arguments.
- The number of rows is not affected.
- Columns given the value NULL will be removed.
- Groups will be recomputed if a grouping variable is mutated.
- Data frame attributes are preserved.
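The column-placement rules listed above can be checked on a plain tibble. This is only a sketch: df and its columns are hypothetical, and the Seurat method documented here exposes only .data and ..., so whether .before is honoured on a Seurat object is an assumption that is not tested below.

```r
library(dplyr)
library(tibble)

# Hypothetical toy table
df <- tibble(a = 1:3, b = c(10, 20, 30))

out <- df |>
  mutate(
    b = b / 10,  # modified column stays in its original position
    c = a + b,   # new column is appended at the end by default
    a = NULL     # a column set to NULL is removed
  )

# .before places the new column ahead of an existing one
out_before <- df |> mutate(c = a * 2, .before = a)
```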
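Useful mutate functions
- +, -, log(), etc., for their usual mathematical meanings
- lead(), lag()
- dense_rank(), min_rank(), percent_rank(), row_number(), cume_dist(), ntile()
- cumsum(), cummean(), cummin(), cummax(), cumany(), cumall()
- na_if(), coalesce()
- if_else(), recode(), case_when()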
Grouped tibbles
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
The former normalises mass by the global average whereas the latter normalises by the averages within species levels.
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages: no methods found.
See Also
Other single table verbs:
arrange(), rename(), slice(), summarise()
Examples
data(pbmc_small)
pbmc_small |> mutate(nFeature_RNA=1)
Nest rows into a list-column of data frames
Description
Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
Learn more in vignette("nest").
Usage
## S3 method for class 'Seurat'
nest(.data, ..., .names_sep = NULL)
Arguments
.data |
A data frame. |
... |
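<tidy-select> Columns to nest; these will appear in the inner data frames. Specified using name-variable pairs of the form new_col = c(col1, col2, col3). The right hand side can be any valid tidyselect expression.
If not supplied, then ... is derived as all columns not selected by .by, and will use the column name from .key. |
.names_sep |
If NULL, the default, the inner names will come from the former outer names. If a string, the new inner names will use the outer names with names_sep automatically stripped. This makes names_sep roughly symmetric between nesting and unnesting. |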
Details
If neither ... nor .by are supplied, nest() will nest all variables, and will use the column name supplied through .key.
Value
'tidyseurat_nested'
New syntax
tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you'll receive) but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
Grouped data frames
df %>% nest(data = c(x, y)) specifies the columns to be nested; i.e. the columns that will appear in the inner data frame. df %>% nest(.by = c(x, y)) specifies the columns to nest by; i.e. the columns that will remain in the outer data frame. An alternative way to achieve the latter is to nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested. The result preserves the grouping of the input.
Variables supplied to nest() will override grouping variables so that df %>% group_by(x, y) %>% nest(data = !z) will be equivalent to df %>% nest(data = !z).
You can't supply .by with a grouped data frame, as the groups already represent what you are nesting by.
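The three routes to the same nesting described above can be compared side by side on a plain tibble. This sketch assumes tidyr 1.3.0 or later for the .by argument; df and its columns g, x and y are hypothetical, and the Seurat method shown in Usage does not list .by, so only the tibble behaviour is illustrated.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical toy table with one grouping column
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = c(0.1, 0.2, 0.3))

by_cols  <- df |> nest(data = c(x, y))      # name the columns to nest
by_outer <- df |> nest(.by = g)             # name the columns to keep outside
grouped  <- df |> group_by(g) |> nest()     # let the grouping decide
```

All three give one row per value of g with a list-column named data; only the grouped variant keeps the grouping on the result.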
Examples
data(pbmc_small)
pbmc_small |>
nest(data=-groups) |>
unnest(data)
Intercellular ligand-receptor interactions for 38 ligands from a single cell RNA-seq cluster.
Description
A dataset containing ligand-receptor interactions within a sample. There are 38 ligands from a single cell cluster versus 35 receptors in 6 other clusters.
Usage
data(pbmc_small_nested_interactions)
Format
A 'tibble' containing 100 rows and 9 columns. Cells are a subsample of the PBMC dataset of 2,700 single cells. Cell interactions were identified with 'SingleCellSignalR'.
- sample
sample identifier
- ligand
cluster and ligand identifier
- receptor
cluster and receptor identifier
- ligand.name
ligand name
- receptor.name
receptor name
- origin
cluster containing ligand
- destination
cluster containing receptor
- interaction.type
type of interaction, paracrine or autocrine
- LRscore
interaction score
Value
'tibble'
Source
https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html
Pivot data from wide to long
Description
pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider().
Learn more in vignette("pivot").
Usage
## S3 method for class 'Seurat'
pivot_longer(
data,
cols,
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
names_repair = "check_unique",
values_to = "value",
values_drop_na = FALSE,
values_ptypes = NULL,
values_transform = NULL,
...
)
Arguments
data |
A data frame to pivot. |
cols |
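<tidy-select> Columns to pivot into longer format. |
names_to |
A character vector specifying the new column or columns to create from the information stored in the column names of data specified by cols.
- If length 0, or if NULL is supplied, no columns will be created.
- If length 1, a single column will be created which will contain the column names specified by cols.
- If length >1, multiple columns will be created. In this case, one of names_sep or names_pattern must be supplied to specify how the column names should be split. There are also two additional character values you can take advantage of: NA will discard the corresponding component of the column name, and ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely. |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep , names_pattern |
If names_to contains multiple values, these arguments control how the column name is broken up.
names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).
names_pattern takes the same specification as extract(), a regular expression containing matching groups (()).
If these arguments do not give you enough control, use pivot_longer_spec() to create a spec object and process manually as needed. |
names_ptypes , values_ptypes |
Optionally, a list of column name-prototype pairs. Alternatively, a single empty prototype can be supplied, which will be applied to all columns. A prototype (or ptype for short) is a zero-length vector (like integer() or numeric()) that defines the type, class, and attributes of a vector. Use these arguments if you want to confirm that the created columns are the types that you expect. Note that if you want to change (instead of confirm) the types of specific columns, you should use names_transform or values_transform instead. |
names_transform , values_transform |
Optionally, a list of column name-function pairs. Alternatively, a single function can be supplied, which will be applied to all columns. Use these arguments if you need to change the types of specific columns. For example, names_transform = list(week = as.integer) would convert a character variable called week to an integer.
If not specified, the type of the columns generated from names_to will be character, and the type of the variables generated from values_to will be the common type of the input columns used to generate them. |
names_repair |
What happens if the output has invalid column names? The default, "check_unique" is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate by adding numeric suffixes. See vctrs::vec_as_names() for more options. |
values_to |
A string specifying the name of the column to create from the data stored in cell values. If names_to is a character containing the special .value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names. |
values_drop_na |
If TRUE, will drop rows that contain only NAs in the values_to column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in data were created by its structure. |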
... |
Additional arguments passed on to methods. |
Details
pivot_longer() is an updated approach to gather(), designed to be both simpler to use and to handle more use cases. We recommend you use pivot_longer() for new code; gather() isn't going away but is no longer under active development.
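As a small illustration of how names_to and names_sep work together, the sketch below splits column names into two new columns. It uses a plain tibble; the table wide and its column names (an assay prefix followed by a feature name) are hypothetical and chosen only to resemble per-cell abundance data.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per cell, one column per assay_feature pair
wide <- tibble(
  cell     = c("c1", "c2"),
  RNA_LYZ  = c(1.5, 0),
  RNA_CD3E = c(0, 2.1)
)

# Split each pivoted column name at "_" into assay and feature
long <- wide |>
  pivot_longer(
    cols      = !cell,
    names_to  = c("assay", "feature"),
    names_sep = "_",
    values_to = "abundance"
  )
```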
Value
'tidyseurat'
Examples
data(pbmc_small)
pbmc_small |> pivot_longer(
cols=c(orig.ident, groups),
names_to="name", values_to="value")
Initiate a plotly visualization
Description
This function maps R objects to plotly.js, an (MIT licensed) web-based interactive charting library. It provides abstractions for doing common things (e.g. mapping data values to fill colors (via color) or creating animations (via frame)) and sets some different defaults to make the interface feel more 'R-like' (i.e., closer to plot() and ggplot2::qplot()).
Usage
plot_ly(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
## S3 method for class 'tbl_df'
plot_ly(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
## S3 method for class 'Seurat'
plot_ly(
data = data.frame(),
...,
type = NULL,
name = NULL,
color = NULL,
colors = NULL,
alpha = NULL,
stroke = NULL,
strokes = NULL,
alpha_stroke = 1,
size = NULL,
sizes = c(10, 100),
span = NULL,
spans = c(1, 20),
symbol = NULL,
symbols = NULL,
linetype = NULL,
linetypes = NULL,
split = NULL,
frame = NULL,
width = NULL,
height = NULL,
source = "A"
)
Arguments
data |
A data frame (optional) or crosstalk::SharedData object. |
... |
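Arguments (i.e., attributes) passed along to the trace type. See schema() for a list of acceptable attributes for a given trace type (by going to traces -> type -> attributes). Note that attributes provided at this level may override other arguments (e.g. plot_ly(x = 1:10, y = 1:10, color = I("red"), marker = list(color = "blue"))). |
type |
A character string specifying the trace type (e.g. "scatter", "bar", "box", etc). |
name |
Values mapped to the trace's name attribute. Since a trace can only have one name, this argument acts very much like split in that it creates one trace for every unique value. |
color |
Values mapped to relevant 'fill-color' attribute(s) (e.g. fillcolor, marker.color, textfont.color, etc.). The mapping from data values to color codes may be controlled using colors and alpha, or avoided altogether via I() (e.g., color = I("red")). Any color understood by grDevices::col2rgb() may be used in this way. |
colors |
Either a colorbrewer2.org palette name (e.g. "YlOrRd" or "Blues"), or a vector of colors to interpolate in hexadecimal "#RRGGBB" format, or a color interpolation function like colorRamp(). |
alpha |
A number between 0 and 1 specifying the alpha channel applied to color. Defaults to 0.5 when mapping to fillcolor and 1 otherwise. |
stroke |
Similar to color, but values are mapped to relevant 'stroke-color' attribute(s) (e.g., marker.line.color and line.color for filled polygons). If not specified, stroke inherits from color. |
strokes |
Similar to colors, but controls the stroke mapping. |
alpha_stroke |
Similar to alpha, but applied to stroke. |
size |
(Numeric) values mapped to relevant 'fill-size' attribute(s) (e.g., marker.size, textfont.size, and error_x.width). The mapping from data values to symbols may be controlled using sizes, or avoided altogether via I() (e.g., size = I(30)). |
sizes |
A numeric vector of length 2 used to scale size to pixels. |
span |
(Numeric) values mapped to relevant 'stroke-size' attribute(s) (e.g., marker.line.width, line.width for filled polygons, and error_x.thickness). The mapping from data values to symbols may be controlled using spans, or avoided altogether via I() (e.g., span = I(30)). |
spans |
A numeric vector of length 2 used to scale span to pixels. |
symbol |
(Discrete) values mapped to marker.symbol. The mapping from data values to symbols may be controlled using symbols, or avoided altogether via I() (e.g., symbol = I("pentagon")). |
symbols |
A character vector of pch values or symbol names. |
linetype |
(Discrete) values mapped to line.dash. The mapping from data values to symbols may be controlled using linetypes, or avoided altogether via I() (e.g., linetype = I("dash")). |
linetypes |
A character vector of lty values or dash names. |
split |
(Discrete) values used to create multiple traces (one trace per value). |
frame |
(Discrete) values used to create animation frames. |
width |
Width in pixels (optional, defaults to automatic sizing). |
height |
Height in pixels (optional, defaults to automatic sizing). |
source |
A character string of length 1. Match the value of this string with the source argument in event_data() to retrieve the event data corresponding to a specific plot (shiny apps can have multiple plots). |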
Details
Unless type is specified, this function just initiates a plotly object with 'global' attributes that are passed onto downstream uses of add_trace() (or similar). A formula must always be used when referencing column name(s) in data (e.g. plot_ly(mtcars, x = ~wt)). Formulas are optional when supplying values directly, but they do help inform default axis/scale titles (e.g., plot_ly(x = mtcars$wt) vs plot_ly(x = ~mtcars$wt)).
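The two ways of supplying values mentioned above can be sketched with the built-in mtcars data. This uses a plain data frame rather than a Seurat object, and the choice of a scatter trace with markers is only for illustration.

```r
library(plotly)

# Column names in `data` must be referenced through a formula
p_formula <- plot_ly(mtcars, x = ~wt, y = ~mpg,
                     type = "scatter", mode = "markers")

# Values can also be passed directly, without `data` and without a formula
p_direct <- plot_ly(x = mtcars$wt, y = mtcars$mpg,
                    type = "scatter", mode = "markers")
```

Both calls return a plotly object; the formula version is generally preferable when the values live in a data frame, because the column names can be used for default axis titles.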
Value
'plotly'
Author(s)
Carson Sievert
References
https://plotly-r.com/overview.html
See Also
- For initializing a plotly-geo object: plot_geo()
- For initializing a plotly-mapbox object: plot_mapbox()
- For translating a ggplot2 object to a plotly object: ggplotly()
- For modifying any plotly object: layout(), add_trace(), style()
- For linked brushing: highlight()
- For arranging multiple plots: subplot(), crosstalk::bscols()
- For inspecting plotly objects: plotly_json()
- For quick, accurate, and searchable plotly.js reference: schema()
Examples
data(pbmc_small)
plot_ly(pbmc_small)
Extract a single column
Description
pull() is similar to $. It's mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output.
Usage
## S3 method for class 'Seurat'
pull(.data, var = -1, name = NULL, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
var |
A variable specified as:
- a literal variable name
- a positive integer, giving the position counting from the left
- a negative integer, giving the position counting from the right.
The default returns the last column (on the assumption that's the column you've created most recently). This argument is taken by expression and supports quasiquotation (you can unquote column names and column locations). |
name |
An optional parameter that specifies the column to be used as names for a named vector. Specified in a similar manner as var. |
... |
For use by methods. |
Value
A vector the same size as .data.
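A minimal sketch of the var and name arguments, using a plain tibble rather than a Seurat object; the table df and its columns are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical per-cell table
df <- tibble(
  cell   = c("c1", "c2", "c3"),
  groups = c("g1", "g2", "g1"),
  score  = c(0.2, 0.5, 0.9)
)

by_name <- pull(df, groups)              # a literal variable name
last    <- pull(df)                      # default var = -1: the last column
named   <- pull(df, score, name = cell)  # values named by another column
```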
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
Examples
data(pbmc_small)
pbmc_small |> pull(groups)
Convert array of quosure (e.g. c(col_a, col_b)) into character vector
Description
Convert array of quosure (e.g. c(col_a, col_b)) into character vector
Usage
quo_names(v)
Arguments
v |
An array of quosures (e.g. c(col_a, col_b)) |
Value
A character vector
Rename columns
Description
rename() changes the names of individual variables using new_name = old_name syntax; rename_with() renames columns using a function.
Usage
## S3 method for class 'Seurat'
rename(.data, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
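For rename(): <tidy-select> Use new_name = old_name to rename selected variables.
For rename_with(): additional arguments passed onto .fn. |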
Value
An object of the same type as .data. The output has the following properties:
- Rows are not affected.
- Column names are changed; column order is preserved.
- Data frame attributes are preserved.
- Groups are updated to reflect new names.
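The two renaming styles can be compared on a plain tibble. This is a sketch only: df is a hypothetical table, and rename_with() is shown as dplyr provides it for data frames, since only a rename() method for Seurat is documented in this section.

```r
library(dplyr)
library(tibble)

# Hypothetical toy table
df <- tibble(nFeature_RNA = c(10L, 20L), groups = c("g1", "g2"))

# new_name = old_name
renamed <- rename(df, n_features = nFeature_RNA)

# Apply a function to the selected column names
upper <- rename_with(df, toupper, .cols = starts_with("n"))
```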
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
See Also
Other single table verbs:
arrange(), mutate(), slice(), summarise()
Examples
data(pbmc_small)
pbmc_small |> rename(s_score=nFeature_RNA)
returns variables from an expression
Description
returns variables from an expression
Usage
return_arguments_of(expression)
Arguments
expression |
an expression |
Value
list of symbols
Mutating joins
Description
Mutating joins add columns from y to x, matching observations based on the keys. There are four mutating joins: the inner join, and the three outer joins.
Inner join
An inner_join() only keeps observations from x that have a matching key in y.
The most important property of an inner join is that unmatched rows in either input are not included in the result. This means that generally inner joins are not appropriate in most analyses, because it is too easy to lose observations.
Outer joins
The three outer joins keep observations that appear in at least one of the data frames:
- A left_join() keeps all observations in x.
- A right_join() keeps all observations in y.
- A full_join() keeps all observations in x and y.
Usage
## S3 method for class 'Seurat'
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
Arguments
x , y |
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
by |
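A join specification created with join_by(), or a character vector of variables to join by.
If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.
To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d.
For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").
To perform a cross-join, generating all combinations of x and y, see cross_join(). |
copy |
If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it. |
suffix |
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. |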
... |
Other parameters passed onto methods. |
Value
An object of the same type as x (including the same groups). The order of the rows and columns of x is preserved as much as possible. The output has the following properties:
- The rows are affected by the join type.
  - inner_join() returns matched x rows.
  - left_join() returns all x rows.
  - right_join() returns matched x rows, followed by unmatched y rows.
  - full_join() returns all x rows, followed by unmatched y rows.
- Output columns include all columns from x and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
- If non-key columns in x and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in x and y have the same name, suffixes are added to disambiguate these as well.
- If keep = FALSE, output columns included in by are coerced to their common type between x and y.
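The column rules for suffix and keep can be checked on plain tibbles. In this sketch x and y are hypothetical tables that share the key id and a non-key column value; the keep argument is the one from dplyr's data frame methods, and whether the Seurat method forwards it is not verified here.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables sharing a key and a non-key column name
x <- tibble(id = 1:2, value = c("a", "b"))
y <- tibble(id = 2:3, value = c("B", "C"))

# Non-key duplicates get suffixes; matched x rows come first, then unmatched y
joined <- right_join(x, y, by = "id", suffix = c("_x", "_y"))

# keep = TRUE retains both key columns, which are then suffixed as well
kept <- right_join(x, y, by = "id", suffix = c("_x", "_y"), keep = TRUE)
```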
Many-to-many relationships
By default, dplyr guards against many-to-many relationships in equality joins by throwing a warning. These occur when both of the following are true:
- A row in x matches multiple rows in y.
- A row in y matches multiple rows in x.
This is typically surprising, as most joins involve a relationship of one-to-one, one-to-many, or many-to-one, and is often the result of an improperly specified join. Many-to-many relationships are particularly problematic because they can result in a Cartesian explosion of the number of rows returned from the join.
If a many-to-many relationship is expected, silence this warning by explicitly setting relationship = "many-to-many".
In production code, it is best to preemptively set relationship to whatever relationship you expect to exist between the keys of x and y, as this forces an error to occur immediately if the data doesn't align with your expectations.
Inequality joins typically result in many-to-many relationships by nature, so they don't warn on them by default, but you should still take extra care when specifying an inequality join, because they also have the capability to return a large number of rows.
Rolling joins don't warn on many-to-many relationships either, but many rolling joins follow a many-to-one relationship, so it is often useful to set relationship = "many-to-one" to enforce this.
Note that in SQL, most database providers won't let you specify a many-to-many relationship between two tables, instead requiring that you create a third junction table that results in two one-to-many relationships instead.
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- inner_join(): no methods found.
- left_join(): no methods found.
- right_join(): no methods found.
- full_join(): no methods found.
See Also
Other joins:
cross_join(), filter-joins, nest_join()
Examples
data(pbmc_small)
tt <- pbmc_small
tt |> right_join(tt |>
distinct(groups) |>
mutate(new_column=1:2) |>
slice(1))
Group input by rows
Description
rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist.
Most dplyr verbs preserve row-wise grouping. The exception is summarise(), which returns a grouped_df. You can explicitly ungroup with ungroup() or as_tibble(), or convert to a grouped_df with group_by().
Usage
## S3 method for class 'Seurat'
rowwise(data, ...)
Arguments
data |
Input data frame. |
... |
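<tidy-select> Variables to be preserved when calling summarise(). This is typically a set of variables whose combination uniquely identify each row.
NB: unlike group_by() you can not create new variables here but instead you can select multiple variables with (e.g.) everything(). |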
Value
A row-wise data frame with class rowwise_df. Note that a rowwise_df is implicitly grouped by row, but is not a grouped_df.
List-columns
Because a rowwise has exactly one row per group it offers a small convenience for working with list-columns. Normally, summarise() and mutate() extract a group's worth of data with [. But when you index a list in this way, you get back another list. When you're working with a rowwise tibble, then dplyr will use [[ instead of [ to make your life a little easier.
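The list-column convenience described above can be sketched on a plain tibble. The table df and its columns are hypothetical; the behaviour on a Seurat object is not shown.

```r
library(dplyr)
library(tibble)

# Hypothetical table with a list-column of numeric vectors
df <- tibble(id = 1:2, values = list(c(1, 2, 3), c(10, 20)))

# Without rowwise(), `values` is the whole list, so length() is 2 for every row
plain <- df |> mutate(n = length(values))

# With rowwise(), each row sees its own element, extracted with [[
rw <- df |>
  rowwise() |>
  mutate(total = sum(values), n = length(values)) |>
  ungroup()
```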
See Also
nest_by() for a convenient way of creating rowwise data frames with nested data.
Examples
# TODO
Sample n rows from a table
Description
sample_n() and sample_frac() have been superseded in favour of slice_sample(). While they will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternative.
These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac():
- The connection to slice() was not obvious.
- The name of the first argument, tbl, is inconsistent with other single table verbs which use .data.
- The size argument uses tidy evaluation, which is surprising and undocumented.
- It was easier to remove the deprecated .env argument.
- ... was in a suboptimal position.
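For comparison, the superseding slice_sample() takes the two mutually exclusive arguments n and prop. The sketch below uses a plain tibble; df is hypothetical, and whether a slice_sample() method for Seurat objects is available depends on the installed tidyseurat version.

```r
library(dplyr)
library(tibble)

set.seed(1)  # make the random draw reproducible
df <- tibble(id = 1:100)  # hypothetical table of 100 rows

by_n    <- slice_sample(df, n = 10)      # counterpart of sample_n(df, 10)
by_prop <- slice_sample(df, prop = 0.1)  # counterpart of sample_frac(df, 0.1)
```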
Usage
## S3 method for class 'Seurat'
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)
## S3 method for class 'Seurat'
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)
Arguments
tbl |
A data.frame. |
size |
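<tidy-select> For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group. |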
replace |
Sample with or without replacement? |
weight |
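<tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1. |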
.env |
DEPRECATED. |
... |
ignored |
Examples
data(pbmc_small)
pbmc_small |> sample_n(50)
pbmc_small |> sample_frac(0.1)
Keep or drop columns using their names and types
Description
Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right) or type (e.g. where(is.numeric) selects all numeric columns).
Overview of selection features
Tidyverse selections implement a dialect of R where operators make it easy to select variables:
- : for selecting a range of consecutive variables.
- ! for taking the complement of a set of variables.
- & and | for selecting the intersection or the union of two sets of variables.
- c() for combining selections.
In addition, you can use selection helpers. Some helpers select specific columns:
- everything(): Matches all variables.
- last_col(): Select last variable, possibly with an offset.
- group_cols(): Select all grouping columns.
Other helpers select variables by matching patterns in their names:
- starts_with(): Starts with a prefix.
- ends_with(): Ends with a suffix.
- contains(): Contains a literal string.
- matches(): Matches a regular expression.
- num_range(): Matches a numerical range like x01, x02, x03.
Or from variables stored in a character vector:
- all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
- any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
Or using a predicate function:
- where(): Applies a function to all variables and selects those for which the function returns TRUE.
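The difference between all_of() and any_of() matters when a character vector may contain names that are absent from the data. A minimal sketch on a plain tibble, with hypothetical column names:

```r
library(dplyr)
library(tibble)

df <- tibble(a = 1, b = 2, c = 3)  # hypothetical toy table
wanted <- c("a", "z")              # "z" does not exist in df

# any_of() silently skips the missing name
lenient <- select(df, any_of(wanted))

# all_of() requires every name to be present, so this call errors
strict <- tryCatch(select(df, all_of(wanted)), error = function(e) "error")

# ! takes the complement of a selection
complement <- select(df, !any_of("b"))
```

any_of() is often the safer choice in reusable code where optional columns may or may not have been computed, while all_of() is useful when a missing column should stop the analysis.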
Usage
## S3 method for class 'Seurat'
select(.data, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
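<tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables. |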
Value
An object of the same type as .data. The output has the following properties:
- Rows are not affected.
- Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if new_name = old_name form is used.
- Data frame attributes are preserved.
- Groups are maintained; you can't select off grouping variables.
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
Examples
Here we show the usage for the basic selection operators. See the
specific help pages to learn about helpers like starts_with()
.
The selection language can be used in functions like
dplyr::select()
or tidyr::pivot_longer()
. Let's first attach
the tidyverse:
library(tidyverse) # For better printing iris <- as_tibble(iris)
Select variables by name:
starwars %>% select(height) #> # A tibble: 87 x 1 #> height #> <int> #> 1 172 #> 2 167 #> 3 96 #> 4 202 #> # i 83 more rows iris %>% pivot_longer(Sepal.Length) #> # A tibble: 150 x 6 #> Sepal.Width Petal.Length Petal.Width Species name value #> <dbl> <dbl> <dbl> <fct> <chr> <dbl> #> 1 3.5 1.4 0.2 setosa Sepal.Length 5.1 #> 2 3 1.4 0.2 setosa Sepal.Length 4.9 #> 3 3.2 1.3 0.2 setosa Sepal.Length 4.7 #> 4 3.1 1.5 0.2 setosa Sepal.Length 4.6 #> # i 146 more rows
Select multiple variables by separating them with commas. Note how the order of columns is determined by the order of inputs:
starwars %>% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # i 83 more rows
Functions like tidyr::pivot_longer()
don't take variables with
dots. In this case use c()
to select multiple variables:
iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # i 296 more rows
Operators:
The :
operator selects a range of consecutive variables:
starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # i 83 more rows
The !
operator negates a selection:
starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_color  eye_color birth_year sex   gender    homeworld species
#>   <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>
#> 1 blond      fair        blue            19   male  masculine Tatooine  Human
#> 2 <NA>       gold        yellow         112   none  masculine Tatooine  Droid
#> 3 <NA>       white, blue red             33   none  masculine Naboo     Droid
#> 4 none       white       yellow          41.9 male  masculine Tatooine  Human
#> # i 83 more rows
#> # i 3 more variables: films <list>, vehicles <list>, starships <list>

iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>
#> 1         3.5         0.2 setosa
#> 2         3           0.2 setosa
#> 3         3.2         0.2 setosa
#> 4         3.1         0.2 setosa
#> # i 146 more rows

iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>
#> 1          5.1          1.4 setosa
#> 2          4.9          1.4 setosa
#> 3          4.7          1.3 setosa
#> 4          4.6          1.5 setosa
#> # i 146 more rows
&
and |
take the intersection or the union of two selections:
iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # i 146 more rows

iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # i 146 more rows
To take the difference between two selections, combine the &
and
!
operators:
iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # i 146 more rows
See Also
Other single table verbs:
arrange()
,
filter()
,
mutate()
,
reframe()
,
rename()
,
slice()
,
summarise()
Examples
data(pbmc_small)
pbmc_small |> select(cell, orig.ident)
Separate a character column into multiple columns with a regular expression or numeric locations
Description
separate()
has been superseded in favour of separate_wider_position()
and separate_wider_delim()
because the two functions make the two uses
more obvious, the API is more polished, and the handling of problems is
better. Superseded functions will not go away, but will only receive
critical bug fixes.
Given either a regular expression or a vector of character positions,
separate()
turns a single character column into multiple columns.
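To make the relationship between the superseded function and its replacement concrete, the sketch below splits the same column both ways on a plain tibble. It assumes tidyr 1.3.0 or later (where `separate_wider_delim()` is available) and a hypothetical `id` column; whether a 'Seurat' method exists for the newer function is not stated here, so only the data frame behaviour is shown.

```r
library(tidyr)

# Toy data frame; the `id` column is an illustrative assumption
df <- tibble::tibble(id = c("s1_g1", "s2_g2"))

# Superseded interface: the separator is given as a regular expression
old <- df |> separate(id, into = c("sample", "group"), sep = "_")

# Newer interface: the delimiter and output names are stated explicitly
new <- df |> separate_wider_delim(id, delim = "_", names = c("sample", "group"))
```

Both calls yield the columns `sample` and `group`; the newer function simply makes the delimiter-based use explicit in its name and arguments.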
Usage
## S3 method for class 'Seurat'
separate(
data,
col,
into,
sep = "[^[:alnum:]]+",
remove = TRUE,
convert = FALSE,
extra = "warn",
fill = "warn",
...
)
Arguments
data |
A data frame. |
col |
< |
into |
Names of new variables to create as character vector.
Use |
sep |
Separator between columns. If character, If numeric, |
remove |
If |
convert |
If NB: this will cause string |
extra |
If
|
fill |
If
|
... |
Additional arguments passed on to methods. |
Value
'tidyseurat'
See Also
unite()
, the complement, extract()
which uses regular
expression capturing groups.
Examples
data(pbmc_small)
un <- pbmc_small |> unite("new_col", c(orig.ident, groups))
un |> separate(new_col, c("orig.ident", "groups"))
Subset rows using their positions
Description
slice()
lets you index rows by their (integer) locations. It allows you
to select, remove, and duplicate rows. It is accompanied by a number of
helpers for common use cases:
- slice_head() and slice_tail() select the first or last rows.
- slice_sample() randomly selects rows.
- slice_min() and slice_max() select rows with the lowest or highest values of a variable.
If .data
is a grouped_df, the operation will be performed on each group,
so that (e.g.) slice_head(df, n = 5)
will select the first five rows in
each group.
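The group-wise behaviour can be sketched on a plain tibble. The column names are hypothetical, and the `by` argument in the second call assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data frame; the column names are illustrative assumptions
df <- tibble::tibble(
  groups = c("g1", "g1", "g1", "g2", "g2"),
  value  = c(5, 3, 9, 1, 7)
)

# On a grouped data frame, slice_head() takes the first n rows of each group
first_two <- df |> group_by(groups) |> slice_head(n = 2)

# The same idea without a persistent grouping, using `by`
smallest <- df |> slice_min(value, n = 1, by = groups)
```

`first_two` has four rows (two per group), and `smallest` has one row per group holding that group's minimum `value`.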
Usage
## S3 method for class 'Seurat'
slice(.data, ..., .by = NULL, .preserve = FALSE)
## S3 method for class 'Seurat'
slice_sample(
.data,
...,
n = NULL,
prop = NULL,
by = NULL,
weight_by = NULL,
replace = FALSE
)
## S3 method for class 'Seurat'
slice_head(.data, ..., n, prop, by = NULL)
## S3 method for class 'Seurat'
slice_tail(.data, ..., n, prop, by = NULL)
## S3 method for class 'Seurat'
slice_min(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE
)
## S3 method for class 'Seurat'
slice_max(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE
)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
For Provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative. Indices beyond the number of rows in the input are silently ignored. For |
.by , by |
< |
.preserve |
Relevant when the |
n , prop |
Provide either A negative value of |
weight_by |
< |
replace |
Should sampling be performed with ( |
order_by |
< |
with_ties |
Should ties be kept together? The default, |
na_rm |
Should missing values in |
Details
Slice does not work with relational databases because they have no
intrinsic notion of row order. If you want to perform the equivalent
operation, use filter()
and row_number()
.
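As a rough illustration of that equivalence, the sketch below runs both forms on an in-memory tibble, where row order is well defined. The column `x` is hypothetical; on a database backend only the `filter()` form would be expected to apply, and it would normally need an explicit ordering.

```r
library(dplyr)

# Toy data frame; the column name is an illustrative assumption
df <- tibble::tibble(x = c(10, 20, 30, 40))

# Positional subsetting
by_position <- df |> slice(2:3)

# Equivalent formulation with filter() and row_number()
by_filter <- df |> filter(row_number() %in% 2:3)
```

Both results contain the second and third rows of `df`.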
Value
An object of the same type as .data
. The output has the following
properties:
Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
Methods
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
- slice(): no methods found.
- slice_head(): no methods found.
- slice_tail(): no methods found.
- slice_min(): no methods found.
- slice_max(): no methods found.
- slice_sample(): no methods found.
See Also
Other single table verbs:
arrange()
,
mutate()
,
rename()
,
summarise()
Examples
data(pbmc_small)
pbmc_small |> slice(1)
# Slice group-wise using .by
pbmc_small |> slice(1:2, .by=groups)
# slice_sample() allows you to randomly select with or without replacement
pbmc_small |> slice_sample(n=5)
# if using replacement, and duplicate cells are returned, a tibble will be
# returned because duplicate cells cannot exist in Seurat objects
pbmc_small |> slice_sample(n=1, replace=TRUE) # returns Seurat
pbmc_small |> slice_sample(n=100, replace=TRUE) # returns tibble
# weight by a variable
pbmc_small |> slice_sample(n=5, weight_by=nCount_RNA)
# sample by group
pbmc_small |> slice_sample(n=5, by=groups)
# sample using proportions
pbmc_small |> slice_sample(prop=0.10)
# First rows based on existing order
pbmc_small |> slice_head(n=5)
# Last rows based on existing order
pbmc_small |> slice_tail(n=5)
# Rows with minimum values of a metadata variable
pbmc_small |> slice_min(nFeature_RNA, n=5)
# slice_min() and slice_max() may return more rows than requested
# in the presence of ties.
pbmc_small |> slice_min(nFeature_RNA, n=2)
# Use with_ties=FALSE to return exactly n matches
pbmc_small |> slice_min(nFeature_RNA, n=2, with_ties=FALSE)
# Or use additional variables to break the tie:
pbmc_small |> slice_min(tibble::tibble(nFeature_RNA, nCount_RNA), n=2)
# Use by for group-wise operations
pbmc_small |> slice_min(nFeature_RNA, n=5, by=groups)
# Rows with maximum values of a metadata variable
pbmc_small |> slice_max(nFeature_RNA, n=5)
Summarise each group down to one row
Description
summarise()
creates a new data frame. It returns one row for each
combination of grouping variables; if there are no grouping variables, the
output will have a single row summarising all observations in the input. It
will contain one column for each grouping variable and one column for each of
the summary statistics that you have specified.
summarise()
and summarize()
are synonyms.
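The row-per-group behaviour described above can be sketched on a plain tibble with hypothetical columns. This shows the dplyr data frame semantics; it is not a transcript of the 'Seurat' method's output.

```r
library(dplyr)

# Toy data frame; the column names are illustrative assumptions
df <- tibble::tibble(
  groups = c("g1", "g1", "g2"),
  nCount = c(10, 20, 60)
)

# No grouping variables: a single row summarising all observations
overall <- df |> summarise(mean_count = mean(nCount))

# One row per group, with one column per grouping variable and per summary
per_group <- df |>
  group_by(groups) |>
  summarise(mean_count = mean(nCount), n_cells = n())
```

`overall` has one row, while `per_group` has one row for each value of `groups` and the columns `groups`, `mean_count` and `n_cells`.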
Usage
## S3 method for class 'Seurat'
summarise(.data, ...)
## S3 method for class 'Seurat'
summarize(.data, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< The value can be:
|
Value
An object usually of the same type as .data
.
The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups= argument, the output may be another grouped_df, a tibble or a rowwise data frame.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.
Useful functions
Count: n(), n_distinct()
Backend variations
The data frame backend supports creating a variable and using it in the
same summary. This means that previously created summary variables can be
further transformed or combined within the summary, as in mutate()
.
However, it also means that summary variables with the same names as previous
variables overwrite them, making those variables unavailable to later summary
variables.
This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
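The reuse and overwrite effects are easy to miss, so here is a minimal sketch on a plain tibble with a hypothetical column `x`. It shows the data frame backend only.

```r
library(dplyr)

# Toy data frame; the column name is an illustrative assumption
df <- tibble::tibble(x = c(1, 2, 3, 6))

# mean_x is created first and then reused within the same call
reused <- df |> summarise(mean_x = mean(x), centred_max = max(x) - mean_x)

# Here x is overwritten by its mean, so sd(x) sees a single value and gives NA
overwritten <- df |> summarise(x = mean(x), sd_x = sd(x))

# Using a new name keeps the original column available to later summaries
safe <- df |> summarise(mean_x = mean(x), sd_x = sd(x))
```

In `overwritten`, `sd_x` is `NA` because the original `x` is no longer visible to `sd()`; in `safe`, it is the standard deviation of the original values.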
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
See Also
Other single table verbs:
arrange()
,
mutate()
,
rename()
,
slice()
Examples
data(pbmc_small)
pbmc_small |> summarise(mean(nCount_RNA))
Format the header of a tibble
Description
For easier customization, the formatting of a tibble is split
into three components: header, body, and footer.
The tbl_format_header()
method is responsible for formatting the header
of a tibble.
Override this method if you need to change the appearance
of the entire header.
If you only need to change or extend the components shown in the header,
override or extend tbl_sum()
for your class which is called by the
default method.
Usage
## S3 method for class 'tidySeurat'
tbl_format_header(x, setup, ...)
Arguments
x |
A tibble-like object. |
setup |
A setup object returned from |
... |
These dots are for future extensions and must be empty. |
Value
A character vector.
Examples
# TODO
tidy for 'Seurat'
Description
tidy for 'Seurat'
Usage
tidy(object)
## S3 method for class 'Seurat'
tidy(object)
Arguments
object |
A 'Seurat' object. |
Value
A 'tidyseurat' object.
Examples
data(pbmc_small)
pbmc_small
Unite multiple columns into one by pasting strings together
Description
Convenience function to paste together multiple columns into one.
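As a small sketch of the pasting behaviour, the example below unites two hypothetical columns of a plain tibble, once with the default removal of the inputs and once keeping them.

```r
library(tidyr)

# Toy data frame; the column names are illustrative assumptions
df <- tibble::tibble(
  orig.ident = c("s1", "s2"),
  groups     = c("g1", "g2"),
  n          = c(1, 2)
)

# Paste the two columns together; the inputs are dropped by default
united <- df |> unite("new_col", orig.ident, groups, sep = "_")

# remove = FALSE keeps the input columns next to the new one
kept <- df |> unite("new_col", orig.ident, groups, remove = FALSE)
```

`united$new_col` holds `"s1_g1"` and `"s2_g2"`, while `kept` retains `orig.ident` and `groups` in addition to `new_col`.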
Usage
## S3 method for class 'Seurat'
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
Arguments
data |
A data frame. |
col |
The name of the new column, as a string or symbol. This argument is passed by expression and supports
quasiquotation (you can unquote strings
and symbols). The name is captured from the expression with
|
... |
< |
sep |
Separator to use between values. |
remove |
If |
na.rm |
If |
Value
'tidyseurat'
See Also
separate()
, the complement.
Examples
data(pbmc_small)
pbmc_small |> unite(
col="new_col",
c("orig.ident", "groups"))
Unnest a list-column of data frames into rows and columns
Description
Unnest expands a list-column containing data frames into rows and columns.
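The expansion is the inverse of nesting, which the following sketch shows on a plain tibble with hypothetical columns. It illustrates tidyr's data frame behaviour rather than the 'tidyseurat_nested' method itself.

```r
library(tidyr)

# Toy data frame; the column names are illustrative assumptions
df <- tibble::tibble(
  groups = c("g1", "g1", "g2"),
  cell   = c("a", "b", "c"),
  nCount = c(10, 20, 30)
)

# Nest everything except `groups` into a list-column of data frames
nested <- df |> nest(data = -groups)

# Unnest expands the list-column back into rows and columns
flat <- nested |> unnest(data)
```

`nested` has one row per group with a list-column `data`, and `flat` recovers the three original rows and columns.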
Usage
## S3 method for class 'tidyseurat_nested'
unnest(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
names_sep = NULL,
names_repair = "check_unique",
.drop,
.id,
.sep,
.preserve
)
unnest_seurat(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
names_sep = NULL,
names_repair = "check_unique",
.drop,
.id,
.sep,
.preserve
)
Arguments
data |
A data frame. |
cols |
< When selecting multiple columns, values from the same row will be recycled to their common size. |
... |
|
keep_empty |
By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
size-0 element (like |
ptype |
Optionally, a named list of column name-prototype pairs to
coerce |
names_sep |
If |
names_repair |
Used to check that output data frame has valid names. Must be one of the following options:
See |
.drop , .preserve |
|
.id |
|
.sep |
Value
'tidyseurat'
New syntax
tidyr 1.0.0 introduced a new syntax for nest()
and unnest()
that's
designed to be more similar to other functions. Converting to the new syntax
should be straightforward (guided by the message you'll receive) but if
you just need to run an old analysis, you can easily revert to the previous
behaviour using nest_legacy()
and unnest_legacy()
as follows:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
See Also
Other rectangling:
hoist()
,
unnest_longer()
,
unnest_wider()
Examples
data(pbmc_small)
pbmc_small |>
nest(data=-groups) |>
unnest(data)