https://github.com/hadley/dplyr
Tip revision: 8fa749473e408ca2ee316e9fa263a36d4a695074 authored by Romain Francois on 11 May 2020, 09:50:22 UTC
Move deprecation down to method, not generic slice_()
select.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/select.R
\name{select}
\alias{select}
\title{Subset columns using their names and types}
\usage{
select(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> One or more unquoted
expressions separated by commas. Variable names can be used as if they
were positions in the data frame, so expressions like \code{x:y} can
be used to select a range of variables.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are not affected.
\item Output columns are a subset of input columns, potentially with a different
order. Columns are renamed if the \code{new_name = old_name} form is used.
\item Data frame attributes are preserved.
\item Groups are maintained; grouping variables cannot be dropped by the selection.
}
}
\description{
Select (and optionally rename) variables in a data frame, using a concise
mini-language that makes it easy to refer to variables based on their name
(e.g. \code{a:f} selects all columns from \code{a} on the left to \code{f} on the
right). You can also use predicate functions like \link{is.numeric} to select
variables based on their properties.
}
\section{Useful functions}{

As well as using existing functions like \code{:} and \code{c()}, there are
a number of special functions that only work inside \code{select()}:
\itemize{
\item \code{\link[=any_of]{any_of()}}, \code{\link[=all_of]{all_of()}}.
\item \code{\link[=starts_with]{starts_with()}}, \code{\link[=ends_with]{ends_with()}}, \code{\link[=contains]{contains()}}, \code{\link[=matches]{matches()}}.
\item \code{\link[=num_range]{num_range()}}.
\item \code{\link[=group_cols]{group_cols()}}, \code{\link[=last_col]{last_col()}}.
\item \code{\link[=everything]{everything()}}.
}

You can also use predicate functions (functions that return a single \code{TRUE}
or \code{FALSE}) like \code{is.numeric}, \code{is.character}, and \code{is.factor}
to select variables of specific types.

Selections can be combined using Boolean algebra:
\itemize{
\item \code{starts_with("a") & ends_with("x")}: variables with names that start with "a" and end with "x"
\item \code{starts_with("a") | starts_with("b")}: variables with names that start with "a" or "b"
\item \code{!starts_with("a")}: variables with names that do not start with "a"
}

To remove variables from a selection, use \code{&} and \code{!}:
\itemize{
\item \code{starts_with("a") & !ends_with("x")}: variables with names that start with "a" and do not end with "x"
\item \code{is.numeric & !c(a, b, c)}: numeric variables, except for \code{a}, \code{b}, and \code{c}.
}

See \link[tidyselect:select_helpers]{select helpers} for more details and
examples.

Note that except for \code{:}, \code{-} and \code{c()}, all complex expressions
are evaluated outside the data frame context. This is to prevent
accidental matching of data frame variables when you refer to
objects in your environment.
}

\section{Methods}{

This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("select")}.
}

\examples{
select(starwars, starts_with("h"))
select(starwars, ends_with("color"))
select(starwars, !contains("s"))
select(starwars, starts_with("h") & ends_with("color"))
select(starwars, is.numeric)
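
# A minimal sketch of the Boolean combinations described under
# "Useful functions"; the column choices are illustrative only.
select(starwars, starts_with("h") | ends_with("color"))
select(starwars, starts_with("h") & !ends_with("color"))
select(starwars, ends_with("color") & !c(hair_color))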

# Optionally, rename individual variables as they are selected,
# in the format `new_name = old_name`
select(starwars, character_name = name, character_height = height)

# Use num_range() to select variables with numeric suffixes
df <- as.data.frame(matrix(runif(100), nrow = 10))
select(df, V4:V6) # Specify variable names explicitly
select(df, num_range(prefix = "V", range = 4:6)) # Or, specify the prefix used on a numeric range
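
# A sketch of selecting with a character vector stored in the environment.
# `wanted` is a hypothetical vector made up for this illustration.
wanted <- c("name", "height", "not_a_column")
select(starwars, any_of(wanted)) # any_of() ignores names that are absent
select(starwars, all_of(c("name", "height"))) # all_of() requires every name to exist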

# Select the existing grouping variables:
starwars \%>\% group_by(gender, eye_color) \%>\% select(group_cols())
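
# A sketch of the positional helper last_col() listed under "Useful functions";
# the offset value is chosen only for illustration.
select(starwars, last_col()) # The last column only
select(starwars, last_col(offset = 2):last_col()) # The last three columns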

# Using select() semantics in across()
starwars \%>\% summarise(across(.cols = height:mass, .fns = ~mean(.x, na.rm = TRUE)))

# Use `{{ }}` inside functions to tunnel data-variables through
# function arguments. See ?dplyr_tidy_eval for more information.
averages <- function(data, vars) {
  data \%>\%
    select({{ vars }}) \%>\%
    lapply(mean, na.rm = TRUE)
}
starwars \%>\% averages(height)
starwars \%>\% averages(c(height, mass))


# Modifying the order of variables --------------------------
# As of dplyr 1.0.0, use relocate(), not select():
starwars \%>\% select(name:birth_year) \%>\% relocate(birth_year, .before = 1)
starwars \%>\% select(name:birth_year) \%>\% relocate(name, .after = last_col())
}
\seealso{
Other single table verbs: 
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}