% Source: https://github.com/hadley/dplyr (select.Rd)
% Tip revision: 5e3f3ec7dd464d9251d2ec8eb9a3c31624130580 authored by Hadley Wickham on 28 May 2020, 23:52:31 UTC (Increment version number)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/select.R
\name{select}
\alias{select}
\title{Subset columns using their names and types}
\usage{
select(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> One or more unquoted
expressions separated by commas. Variable names can be used as if they
were positions in the data frame, so expressions like \code{x:y} can
be used to select a range of variables.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are not affected.
\item Output columns are a subset of input columns, potentially with a different
order. Columns will be renamed if the \code{new_name = old_name} form is used.
\item Data frame attributes are preserved.
\item Groups are maintained; grouping variables cannot be dropped and are
added back to the selection if they are omitted.
}
}
\description{
Select (and optionally rename) variables in a data frame, using a concise
mini-language that makes it easy to refer to variables based on their name
(e.g. \code{a:f} selects all columns from \code{a} on the left to \code{f} on the
right). You can also use predicate functions like \link{is.numeric} to select
variables based on their properties.
\subsection{Overview of selection features}{

Tidyverse selections implement a dialect of R where operators make
it easy to select variables:
\itemize{
\item \code{:} for selecting a range of consecutive variables.
\item \code{!} for taking the complement of a set of variables.
\item \code{&} and \code{|} for selecting the intersection or the union of two
sets of variables.
\item \code{c()} for combining selections.
}

In addition, you can use \strong{selection helpers}. Some helpers select specific
columns:
\itemize{
\item \code{\link[tidyselect:everything]{everything()}}: Matches all variables.
\item \code{\link[tidyselect:last_col]{last_col()}}: Selects the last variable, possibly with an offset.
}
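
As a brief illustrative sketch (not part of the generated examples, and assuming only that dplyr is attached), these two helpers can be tried on the built-in \code{iris} data:\if{html}{\out{<div class="r">}}\preformatted{library(dplyr)

# everything() matches all columns; listing Species first moves it to the
# front and keeps the remaining columns in their original order
names(select(iris, Species, everything()))

# last_col() selects the final column
names(select(iris, last_col()))
#> [1] "Species"

# an offset counts back from the end: offset 1 is the second-to-last column
names(select(iris, last_col(1)))
#> [1] "Petal.Width"
}\if{html}{\out{</div>}}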

These helpers select variables by matching patterns in their names:
\itemize{
\item \code{\link[tidyselect:starts_with]{starts_with()}}: Starts with a prefix.
\item \code{\link[tidyselect:ends_with]{ends_with()}}: Ends with a suffix.
\item \code{\link[tidyselect:contains]{contains()}}: Contains a literal string.
\item \code{\link[tidyselect:matches]{matches()}}: Matches a regular expression.
\item \code{\link[tidyselect:num_range]{num_range()}}: Matches a numerical range like x01, x02, x03.
}
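
A minimal sketch of the name-matching helpers, assuming dplyr is attached. The small tibble \code{df} is hypothetical and exists only to illustrate \code{num_range()}:\if{html}{\out{<div class="r">}}\preformatted{library(dplyr)

# prefix and suffix matching
names(select(iris, starts_with("Sepal")))  # Sepal.Length, Sepal.Width
names(select(iris, ends_with("Length")))   # Sepal.Length, Petal.Length

# contains() treats its argument as a literal string, so "." matches
# only the four measurement columns whose names contain a dot
names(select(iris, contains(".")))

# matches() treats its argument as a regular expression, where "."
# matches any character, so all five columns are selected
names(select(iris, matches(".")))

# num_range() combines a prefix with a numeric range
df <- tibble(x1 = 1, x2 = 2, x3 = 3, y = 4)  # hypothetical data
names(select(df, num_range("x", 1:2)))       # x1, x2
}\if{html}{\out{</div>}}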

These helpers select variables from a character vector:
\itemize{
\item \code{\link[tidyselect:all_of]{all_of()}}: Matches variable names in a character vector. All
names must be present, otherwise an out-of-bounds error is
thrown.
\item \code{\link[tidyselect:any_of]{any_of()}}: Same as \code{all_of()}, except that no error is thrown
for names that don't exist.
}
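
The difference between the two helpers can be sketched as follows, assuming dplyr is attached. The name \code{"not_a_column"} is a deliberately missing, made-up column used only for illustration:\if{html}{\out{<div class="r">}}\preformatted{library(dplyr)

vars <- c("Sepal.Length", "Petal.Length")
names(select(iris, all_of(vars)))  # Sepal.Length, Petal.Length

# a character vector that includes a name not present in iris
wanted <- c("Species", "not_a_column")

# any_of() silently skips the missing name
names(select(iris, any_of(wanted)))
#> [1] "Species"

# all_of() raises an error instead; try() captures it so the script continues
res <- try(select(iris, all_of(wanted)), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE
}\if{html}{\out{</div>}}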

This helper selects variables with a function:
\itemize{
\item \code{\link[tidyselect:where]{where()}}: Applies a function to all variables and selects those
for which the function returns \code{TRUE}.
}
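
A short sketch of predicate-based selection, assuming dplyr is attached. The helper function \code{big_mean} is a hypothetical name introduced here for illustration only:\if{html}{\out{<div class="r">}}\preformatted{library(dplyr)

# base predicates
names(select(iris, where(is.numeric)))  # the four measurement columns
names(select(iris, where(is.factor)))
#> [1] "Species"

# a custom predicate: numeric columns whose mean exceeds 3.5
# (&& short-circuits, so mean() is never called on the factor column)
big_mean <- function(x) is.numeric(x) && mean(x) > 3.5
names(select(iris, where(big_mean)))    # Sepal.Length, Petal.Length

# where() composes with the operators listed above
names(select(iris, where(is.numeric) & starts_with("Petal")))
}\if{html}{\out{</div>}}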
}
}
\section{Methods}{

This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("select")}.
}

\section{Examples}{


Here we show the usage for the basic selection operators. See the
specific help pages to learn about helpers like \code{\link[=starts_with]{starts_with()}}.

The selection language can be used in functions like
\code{dplyr::select()} or \code{tidyr::pivot_longer()}. Let's first attach
the tidyverse:\if{html}{\out{<div class="r">}}\preformatted{library(tidyverse)

# For better printing
iris <- as_tibble(iris)
}\if{html}{\out{</div>}}

Select variables by name:\if{html}{\out{<div class="r">}}\preformatted{starwars \%>\% select(height)
#> # A tibble: 87 x 1
#>   height
#>    <int>
#> 1    172
#> 2    167
#> 3     96
#> 4    202
#> # ... with 83 more rows

iris \%>\% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#>   Sepal.Width Petal.Length Petal.Width Species name         value
#>         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
#> 2         3            1.4         0.2 setosa  Sepal.Length   4.9
#> 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
#> 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
#> # ... with 146 more rows
}\if{html}{\out{</div>}}

Select multiple variables by separating them with commas. Note how
the order of columns is determined by the order of inputs:\if{html}{\out{<div class="r">}}\preformatted{starwars \%>\% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # ... with 83 more rows
}\if{html}{\out{</div>}}

Functions like \code{tidyr::pivot_longer()} take the selection in a single
argument rather than through \code{...} (dots). In this case use \code{c()} to select multiple variables:\if{html}{\out{<div class="r">}}\preformatted{iris \%>\% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # ... with 296 more rows
}\if{html}{\out{</div>}}
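
The \emph{Value} section notes that columns are renamed when the \code{new_name = old_name} form is used and that groups are maintained. A brief sketch of both behaviours, assuming dplyr is attached and using the built-in \code{mtcars} data for the grouped case:\if{html}{\out{<div class="r">}}\preformatted{library(dplyr)

# rename while selecting: the first output column is called "character"
names(select(starwars, character = name, height))  # character, height

# grouping variables are kept even when not requested; dplyr adds
# `cyl` back to the result (and prints a message saying so)
by_cyl <- group_by(mtcars, cyl)
names(select(by_cyl, mpg))  # cyl, mpg
}\if{html}{\out{</div>}}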
\subsection{Operators:}{

The \code{:} operator selects a range of consecutive variables:\if{html}{\out{<div class="r">}}\preformatted{starwars \%>\% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # ... with 83 more rows
}\if{html}{\out{</div>}}

The \code{!} operator negates a selection:\if{html}{\out{<div class="r">}}\preformatted{starwars \%>\% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_color eye_color birth_year sex   gender homeworld species
#>   <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>  
#> 1 blond      fair       blue            19   male  mascu~ Tatooine  Human  
#> 2 <NA>       gold       yellow         112   none  mascu~ Tatooine  Droid  
#> 3 <NA>       white, bl~ red             33   none  mascu~ Naboo     Droid  
#> 4 none       white      yellow          41.9 male  mascu~ Tatooine  Human  
#> # ... with 83 more rows, and 3 more variables: films <list>, vehicles <list>,
#> #   starships <list>

iris \%>\% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>  
#> 1         3.5         0.2 setosa 
#> 2         3           0.2 setosa 
#> 3         3.2         0.2 setosa 
#> 4         3.1         0.2 setosa 
#> # ... with 146 more rows

iris \%>\% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>  
#> 1          5.1          1.4 setosa 
#> 2          4.9          1.4 setosa 
#> 3          4.7          1.3 setosa 
#> 4          4.6          1.5 setosa 
#> # ... with 146 more rows
}\if{html}{\out{</div>}}

\code{&} and \code{|} take the intersection or the union of two selections:\if{html}{\out{<div class="r">}}\preformatted{iris \%>\% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # ... with 146 more rows

iris \%>\% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3  
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # ... with 146 more rows
}\if{html}{\out{</div>}}

To take the difference between two selections, combine the \code{&} and
\code{!} operators:\if{html}{\out{<div class="r">}}\preformatted{iris \%>\% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # ... with 146 more rows
}\if{html}{\out{</div>}}
}
}

\seealso{
Other single table verbs: 
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}