https://github.com/hadley/dplyr
Tip revision: 369dc1f2624f4ba57c0615739cda3c4ec7fd8dcf authored by Mara Averick on 11 March 2020, 11:27:42 UTC
Fix misplaced `)` programming vignette
filter.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filter.R
\name{filter}
\alias{filter}
\title{Subset rows using column values}
\usage{
filter(.data, ..., .preserve = FALSE)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[=dplyr_data_masking]{data-masking}}> Expressions that return a
logical value, and are defined in terms of the variables in \code{.data}.
If multiple expressions are included, they are combined with the \code{&} operator.
Only rows for which all conditions evaluate to \code{TRUE} are kept.}

\item{.preserve}{Relevant when the \code{.data} input is grouped.
If \code{.preserve = FALSE} (the default), the grouping structure
is recalculated based on the resulting data; otherwise the grouping is kept as is.}
}
\value{
An object of the same type as \code{.data}. The output has the following properties:
\itemize{
\item Rows are a subset of the input, but appear in the same order.
\item Columns are not modified.
\item The number of groups may be reduced (if \code{.preserve} is not \code{TRUE}).
\item Data frame attributes are preserved.
}
}
\description{
The \code{filter()} function is used to subset a data frame,
retaining all rows that satisfy your conditions.
To be retained, the row must produce a value of \code{TRUE} for all conditions.
Note that when a condition evaluates to \code{NA}
the row will be dropped, unlike base subsetting with \code{[}.
}
\details{
The \code{filter()} function is used to subset the rows of
\code{.data}, applying the expressions in \code{...} to the column values to determine which
rows should be retained. It can be applied to both grouped and ungrouped data (see \code{\link[=group_by]{group_by()}} and
\code{\link[=ungroup]{ungroup()}}). However, dplyr does not yet optimise filtering
on grouped datasets when the expressions do not need grouped calculations. For this
reason, filtering is often considerably faster on ungrouped data.
}
\section{Useful filter functions}{


There are many functions and operators that are useful when constructing the
expressions used to filter the data:
\itemize{
\item \code{\link{==}}, \code{\link{>}}, \code{\link{>=}} etc.
\item \code{\link{&}}, \code{\link{|}}, \code{\link{!}}, \code{\link[=xor]{xor()}}
\item \code{\link[=is.na]{is.na()}}
\item \code{\link[=between]{between()}}, \code{\link[=near]{near()}}
}
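
As a rough sketch of how some of these helpers can be combined with
\code{filter()} (the columns come from the \code{starwars} data used in the
examples; the cut-off values are arbitrary and chosen only for illustration):\preformatted{# Rows with height in the closed interval [150, 180]
filter(starwars, between(height, 150, 180))

# Rows where hair_color is missing
filter(starwars, is.na(hair_color))

# Floating point comparison with a tolerance instead of ==
filter(starwars, near(mass, 77))

# Rows matching exactly one of the two conditions
filter(starwars, xor(hair_color == "none", eye_color == "black"))
}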
}

\section{Grouped tibbles}{


Because filtering expressions are computed within groups, they may
yield different results on grouped tibbles. This will be the case
as soon as an aggregating, lagging, or ranking function is
involved. Compare this ungrouped filtering:\preformatted{starwars \%>\% filter(mass > mean(mass, na.rm = TRUE))
}

With the grouped equivalent:\preformatted{starwars \%>\% group_by(gender) \%>\% filter(mass > mean(mass, na.rm = TRUE))
}

In the ungrouped version, \code{filter()} compares the value of \code{mass} in each row to
the global average (taken over the whole data set), keeping only the rows with
\code{mass} greater than this global average. In contrast, the grouped version calculates
the average mass separately for each \code{gender} group, and keeps rows with \code{mass} greater
than the relevant within-gender average.
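
The same applies to ranking functions. As an illustrative sketch (the choice
of \code{species} and \code{height} is an assumption made for this example only),
the following keeps the tallest character or characters within each species,
rather than the tallest overall:\preformatted{starwars \%>\%
  group_by(species) \%>\%
  filter(min_rank(desc(height)) == 1)
}

Rows with a missing \code{height} receive a rank of \code{NA} and are therefore dropped.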
}

\section{Methods}{

This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("filter")}.
}

\examples{
# Filtering by one criterion
filter(starwars, species == "Human")
filter(starwars, mass > 1000)

# Filtering by multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
filter(starwars, hair_color == "none" | eye_color == "black")

# When multiple expressions are used, they are combined using &
filter(starwars, hair_color == "none", eye_color == "black")
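
# A condition that evaluates to NA drops the row, unlike base `[`.
# Minimal sketch: `df` is a made-up data frame used only for illustration.
df <- data.frame(x = c(1, NA, 3))
# filter() keeps only the row where the condition is TRUE (x = 3):
filter(df, x > 1)
# Base subsetting also returns an all-NA row for the NA condition:
df[df$x > 1, , drop = FALSE]
# To keep rows with missing values, ask for them explicitly:
filter(df, x > 1 | is.na(x))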


# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following keeps rows where `mass` is greater than the
# global average:
starwars \%>\% filter(mass > mean(mass, na.rm = TRUE))

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars \%>\% group_by(gender) \%>\% filter(mass > mean(mass, na.rm = TRUE))
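
# `.preserve` only matters for grouped input. A small sketch using a
# made-up tibble (`df`, `g` and `x` are illustrative, not part of dplyr):
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 10))
by_g <- group_by(df, g)
# By default the grouping is recalculated, so the emptied group "a" is dropped:
n_groups(filter(by_g, x > 5))
# With `.preserve = TRUE` the original grouping structure is kept:
n_groups(filter(by_g, x > 5, .preserve = TRUE))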


# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars \%>\%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
# Learn more in ?dplyr_data_masking
}
\seealso{
Other single table verbs: 
\code{\link{arrange}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{select}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}