% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr_methods.R
\name{filter}
\alias{filter}
\alias{filter.Seurat}
\title{Keep or drop rows that match a condition}
\usage{
\method{filter}{Seurat}(.data, ..., .preserve = FALSE)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Expressions that
return a logical vector, defined in terms of the variables in \code{.data}. If
multiple expressions are included, they are combined with the \code{&} operator.
To combine expressions using \code{|} instead, wrap them in \code{\link[dplyr:when_any]{when_any()}}. Only
rows for which all expressions evaluate to \code{TRUE} are kept (for \code{filter()})
or dropped (for \code{filter_out()}).}

\item{.preserve}{Relevant when the \code{.data} input is grouped. If \code{.preserve = FALSE} (the default), the grouping structure is recalculated based on the
resulting data, otherwise the grouping is kept as is.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are a subset of the input, but appear in the same order.
\item Columns are not modified.
\item The number of groups may be reduced (if \code{.preserve} is not \code{TRUE}).
\item Data frame attributes are preserved.
}
}
\description{
These functions are used to subset a data frame, applying the expressions in
\code{...} to determine which rows should be kept (for \code{filter()}) or dropped
(for \code{filter_out()}).

Multiple conditions can be supplied separated by a comma. These will be
combined with the \code{&} operator. To combine comma separated conditions using
\code{|} instead, wrap them in \code{\link[dplyr:when_any]{when_any()}}.

Both \code{filter()} and \code{filter_out()} treat \code{NA} like \code{FALSE}. This subtle
behavior can impact how you write your conditions when missing values are
involved. See the section on \verb{Missing values} for important details and
examples.
}
\section{Missing values}{



Both \code{filter()} and \code{filter_out()} treat \code{NA} like \code{FALSE}. This results in
the following behavior:
\itemize{
\item \code{filter()} \emph{drops} both \code{NA} and \code{FALSE}.
\item \code{filter_out()} \emph{keeps} both \code{NA} and \code{FALSE}.
}

This means that \verb{filter(data, <conditions>) + filter_out(data, <conditions>)}
captures every row within \code{data} exactly once.
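
As a quick check of this property, the sketch below counts the rows returned
by each variant on a small made-up tibble. The names \code{df}, \code{kept}, and \code{rest}
are purely illustrative, and the code assumes a dplyr version that provides
\code{filter_out()}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)

df <- tibble(x = c(1, NA, 3))

kept <- df |> filter(x > 1)      # rows where the condition is TRUE
rest <- df |> filter_out(x > 1)  # rows where the condition is FALSE or NA

# Every input row lands in exactly one of the two results
nrow(kept) + nrow(rest) == nrow(df)
#> [1] TRUE
}\if{html}{\out{</div>}}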

The \code{NA} handling of these functions has been designed to match your
\emph{intent}. When your intent is to keep rows, use \code{filter()}. When your intent
is to drop rows, use \code{filter_out()}.

For example, if your goal with this \code{cars} data is to "drop rows where the
\code{class} is suv", then you might write this in one of two ways:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars <- tibble(class = c("suv", NA, "coupe"))
cars
#> # A tibble: 3 x 1
#>   class
#>   <chr>
#> 1 suv  
#> 2 <NA> 
#> 3 coupe
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars |> filter(class != "suv")
#> # A tibble: 1 x 1
#>   class
#>   <chr>
#> 1 coupe
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars |> filter_out(class == "suv")
#> # A tibble: 2 x 1
#>   class
#>   <chr>
#> 1 <NA> 
#> 2 coupe
}\if{html}{\out{</div>}}

Note how \code{filter()} drops the \code{NA} rows even though our goal was only to drop
\code{"suv"} rows, but \code{filter_out()} matches our intuition.

To generate the correct result with \code{filter()}, you'd need to use:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars |> filter(class != "suv" | is.na(class))
#> # A tibble: 2 x 1
#>   class
#>   <chr>
#> 1 <NA> 
#> 2 coupe
}\if{html}{\out{</div>}}

This quickly gets unwieldy when multiple conditions are involved.
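
To make this concrete, here is a sketch with two conditions. The tibble
\code{cars2} and its \code{mpg} values are made up for illustration, and the goal is
assumed to be "drop rows where the \code{class} is suv and \code{mpg} is below 15":

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)

cars2 <- tibble(
  class = c("suv", "suv", NA, "coupe"),
  mpg = c(10, NA, 12, 14)
)

# With filter_out(), the conditions read like the goal
a <- cars2 |> filter_out(class == "suv", mpg < 15)

# With filter(), each condition is negated and needs its own NA handling
b <- cars2 |>
  filter(class != "suv" | mpg >= 15 | is.na(class) | is.na(mpg))

# Both approaches are expected to return the same rows
identical(a, b)
}\if{html}{\out{</div>}}

Only the first row satisfies both conditions, so it is the only row that
either call removes.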

In general, if you find yourself:
\itemize{
\item Using "negative" operators like \code{!=} or \code{!}
\item Adding in \code{NA} handling like \verb{| is.na(col)} or \verb{& !is.na(col)}
}

then you should consider whether swapping to the other filtering variant would
make your conditions simpler.
\subsection{Comparison to base subsetting}{

Base subsetting with \code{[} doesn't treat \code{NA} like \code{TRUE} or \code{FALSE}. Instead,
it generates a fully missing row, which is different from how both \code{filter()}
and \code{filter_out()} work.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars <- tibble(class = c("suv", NA, "coupe"), mpg = c(10, 12, 14))
cars
#> # A tibble: 3 x 2
#>   class   mpg
#>   <chr> <dbl>
#> 1 suv      10
#> 2 <NA>     12
#> 3 coupe    14
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{cars[cars$class == "suv",]
#> # A tibble: 2 x 2
#>   class   mpg
#>   <chr> <dbl>
#> 1 suv      10
#> 2 <NA>     NA

cars |> filter(class == "suv")
#> # A tibble: 1 x 2
#>   class   mpg
#>   <chr> <dbl>
#> 1 suv      10
}\if{html}{\out{</div>}}
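
If you want base subsetting to behave like \code{filter()} here, one common base R
idiom (not part of dplyr) is to wrap the condition in \code{which()}, which returns
only the positions where the condition is \code{TRUE}. A minimal sketch using the
same \code{cars} data:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)

cars <- tibble(class = c("suv", NA, "coupe"), mpg = c(10, 12, 14))

# which() discards NA and FALSE, so no fully missing row is produced
cars[which(cars$class == "suv"), ]
#> # A tibble: 1 x 2
#>   class   mpg
#>   <chr> <dbl>
#> 1 suv      10
}\if{html}{\out{</div>}}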
}

}

\section{Useful filter functions}{



There are many functions and operators that are useful when constructing the
expressions used to filter the data:
\itemize{
\item \code{\link{==}}, \code{\link{>}}, \code{\link{>=}}, etc.
\item \code{\link{&}}, \code{\link{|}}, \code{\link{!}}, \code{\link[=xor]{xor()}}
\item \code{\link[=is.na]{is.na()}}
\item \code{\link[dplyr:between]{between()}}, \code{\link[dplyr:near]{near()}}
\item \code{\link[dplyr:when_any]{when_any()}}, \code{\link[dplyr:when_all]{when_all()}}
}
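
As a small sketch of how these helpers slot into a filtering expression, the
example below uses a made-up tibble named \code{cars3}. It assumes that
\code{when_any()} accepts the conditions as separate arguments, in line with the
description of \code{...} above:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)

cars3 <- tibble(class = c("suv", "coupe", "sedan"), mpg = c(10, 14, 20))

# Comma-separated conditions are combined with &
both <- cars3 |> filter(class != "suv", between(mpg, 12, 16))

# when_any() combines its conditions with | instead
either <- cars3 |> filter(when_any(class == "suv", mpg > 15))
}\if{html}{\out{</div>}}

Here \code{both} should contain only the coupe row, while \code{either} should contain
the suv and sedan rows.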

}

\section{Grouped tibbles}{



Because filtering expressions are computed within groups, they may yield
different results on grouped tibbles. This will be the case as soon as an
aggregating, lagging, or ranking function is involved. Compare this ungrouped
filtering:

\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars |> filter(mass > mean(mass, na.rm = TRUE))
}\if{html}{\out{</div>}}

with the grouped equivalent:

\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars |> filter(mass > mean(mass, na.rm = TRUE), .by = gender)
}\if{html}{\out{</div>}}

In the ungrouped version, \code{filter()} compares the value of \code{mass} in each row
to the global average (taken over the whole data set), keeping only the rows
with \code{mass} greater than this global average. In contrast, the grouped
version calculates the average mass separately for each \code{gender} group, and
keeps rows with \code{mass} greater than the relevant within-gender average.
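
The same per-group comparison can also be written with \code{group_by()}. The
sketch below uses the \code{starwars} data that ships with dplyr; the name
\code{heavy} is illustrative. Unlike \code{.by}, this form returns a grouped tibble,
so \code{ungroup()} is added to get a comparable result:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)

heavy <- starwars |>
  group_by(gender) |>
  filter(mass > mean(mass, na.rm = TRUE)) |>  # mean is computed per gender
  ungroup()                                   # drop the grouping afterwards
}\if{html}{\out{</div>}}

The \code{.preserve} argument is only relevant to this \code{group_by()} form, since it
applies when the input is already grouped.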

}

\section{Methods}{



This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("filter")}.

}

\examples{
data("pbmc_small")
pbmc_small |> filter(groups == "g1")

# Learn more in ?dplyr_eval

}
\seealso{
Other single table verbs: 
\code{\link[dplyr]{arrange}()},
\code{\link[dplyr]{mutate}()},
\code{\link[dplyr]{reframe}()},
\code{\link[dplyr]{rename}()},
\code{\link[dplyr]{select}()},
\code{\link[dplyr]{slice}()},
\code{\link[dplyr]{summarise}()}
}
