https://github.com/hadley/dplyr
Tip revision: d9794ec07566c23089c0493ce3b59029d8681ce5 authored by Romain Francois on 19 March 2019, 14:40:27 UTC
version bump [ci skip]
group_by.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group-by.r
\name{group_by}
\alias{group_by}
\alias{ungroup}
\title{Group by one or more variables}
\usage{
group_by(.data, ..., add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)
}
\arguments{
\item{.data}{a tbl}

\item{...}{Variables to group by. All tbls accept variable names.
Some tbls will accept functions of variables. Duplicated groups
will be silently dropped.}

\item{add}{When \code{add = FALSE}, the default, \code{group_by()} will
override existing groups. To add to the existing groups, use
\code{add = TRUE}.}

\item{.drop}{When \code{.drop = TRUE}, empty groups are dropped. See \code{\link[=group_by_drop_default]{group_by_drop_default()}} for
how the default value of this argument is determined.}

\item{x}{A \code{\link[=tbl]{tbl()}}}
}
\value{
A \link[=grouped_df]{grouped data frame}, unless the combination of \code{...} and \code{add}
yields an empty set of grouping columns, in which case a regular (ungrouped)
data frame is returned.
}
\description{
Most data operations are done on groups defined by variables.
\code{group_by()} takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". \code{ungroup()} removes grouping.
}
\section{Tbl types}{


\code{group_by()} is an S3 generic with methods for the built-in tbl
types listed below. See the help for the corresponding classes and their
manip methods for more details:

\itemize{
\item data.frame: \link{grouped_df}
\item data.table: \link[dtplyr:grouped_dt]{dtplyr::grouped_dt}
\item SQLite: \code{\link[=src_sqlite]{src_sqlite()}}
\item PostgreSQL: \code{\link[=src_postgres]{src_postgres()}}
\item MySQL: \code{\link[=src_mysql]{src_mysql()}}
}
}

\section{Scoped grouping}{


The three \link{scoped} variants (\code{\link[=group_by_all]{group_by_all()}}, \code{\link[=group_by_if]{group_by_if()}} and
\code{\link[=group_by_at]{group_by_at()}}) make it easy to group a dataset by a selection of
variables.
}

\examples{
by_cyl <- mtcars \%>\% group_by(cyl)

# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl

# It changes how it acts with the other dplyr verbs:
by_cyl \%>\% summarise(
  disp = mean(disp),
  hp = mean(hp)
)
by_cyl \%>\% filter(disp == max(disp))

# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars \%>\% group_by(vs, am)
by_vs <- by_vs_am \%>\% summarise(n = n())
by_vs
by_vs \%>\% summarise(n = sum(n))

# To remove grouping, use ungroup()
by_vs \%>\%
  ungroup() \%>\%
  summarise(n = sum(n))

# You can group by expressions: this is just short-hand for
# a mutate/rename followed by a simple group_by
mtcars \%>\% group_by(vsam = vs + am)

# By default, group_by overrides existing grouping
by_cyl \%>\%
  group_by(vs, am) \%>\%
  group_vars()

# Use add = TRUE to append to the existing groups instead
by_cyl \%>\%
  group_by(vs, am, add = TRUE) \%>\%
  group_vars()
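
# The scoped variants select grouping variables programmatically.
# A brief sketch, not an exhaustive tour: group_by_at() takes a vars()
# selection, and group_by_if() takes a predicate applied to each column
group_vars(group_by_at(mtcars, vars(vs, am)))
group_vars(group_by_if(iris, is.factor))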

# when factors are involved, groups can be empty
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)
tbl \%>\%
  group_by(y) \%>\%
  group_rows()
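
# A sketch of keeping empty groups, reusing the tbl defined above and
# assuming the default resolves to .drop = TRUE for this tibble (so the
# unused level "b" is dropped in the call above): .drop = FALSE retains
# "b" as a group with zero rows
by_y <- group_by(tbl, y, .drop = FALSE)
group_keys(by_y)
group_rows(by_y)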

}
\seealso{
Other grouping functions: \code{\link{group_by_all}},
  \code{\link{group_indices}}, \code{\link{group_keys}},
  \code{\link{group_map}}, \code{\link{group_nest}},
  \code{\link{group_rows}}, \code{\link{group_size}},
  \code{\link{group_trim}}, \code{\link{groups}}
}
\concept{grouping functions}