https://github.com/hadley/dplyr
Tip revision: 9c51db5bf7dde29d282f65b320f0bafdc5434572 authored by Romain Francois on 28 June 2019, 17:02:58 UTC
link to pkgdown rather than github
do.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/do.r
\name{do}
\alias{do}
\title{Do anything}
\usage{
do(.data, ...)
}
\arguments{
\item{.data}{a tbl}

\item{...}{Expressions to apply to each group. If named, results will be
stored in a new column. If unnamed, each expression should return a data
frame. You can use \code{.} to refer to the current group. You cannot mix
named and unnamed arguments.}
}
\value{
\code{do()} always returns a data frame. The first columns in the data frame
will be the labels, the others will be computed from \code{...}. Named
arguments become list-columns, with one element for each group; unnamed
elements must be data frames and labels will be duplicated accordingly.

Groups are preserved for a single unnamed input. This differs from
\code{\link[=summarise]{summarise()}} because \code{do()} generally does not reduce the
complexity of the data; it just expresses it in a different form. For
multiple named inputs, the output is grouped by row with
\code{\link[=rowwise]{rowwise()}}. This allows other verbs to work in an intuitive
way.
}
\description{
This is a general purpose complement to the specialised
manipulation functions \code{\link[=filter]{filter()}}, \code{\link[=select]{select()}}, \code{\link[=mutate]{mutate()}},
\code{\link[=summarise]{summarise()}} and \code{\link[=arrange]{arrange()}}. You can use \code{do()}
to perform arbitrary computation, returning either a data frame or
arbitrary objects which will be stored in a list. This is particularly
useful when working with models: you can fit models per group with
\code{do()} and then flexibly extract components with either another
\code{do()} or \code{summarise()}.

For an empty data frame, the expressions will be evaluated once, even in the
presence of a grouping.  This makes sure that the format of the resulting
data frame is the same for both empty and non-empty input.
}
\details{
\Sexpr[results=rd, stage=render]{dplyr:::lifecycle("questioning")}
}
\section{Alternative}{


\code{do()} is marked as questioning as of dplyr 0.8.0, and can often be
replaced by \code{\link[=group_map]{group_map()}}.
}

\section{Connection to plyr}{


If you're familiar with plyr, \code{do()} with named arguments is basically
equivalent to \code{\link[plyr:dlply]{plyr::dlply()}}, and \code{do()} with a single unnamed argument
is basically equivalent to \code{\link[plyr:ldply]{plyr::ldply()}}. However, instead of storing
labels in a separate attribute, the result is always a data frame. This
means that \code{summarise()} applied to the result of \code{do()} can
act like \code{ldply()}.
}

\examples{
by_cyl <- group_by(mtcars, cyl)
do(by_cyl, head(., 2))
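
# A sketch of the empty-input behaviour noted in the description, assuming
# a zero-row slice of mtcars (an illustrative input, not part of the
# original examples): the expression is still evaluated once, so the
# result should keep the same columns but contain no rows
empty <- group_by(mtcars[0, ], cyl)
do(empty, head(., 2))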

models <- by_cyl \%>\% do(mod = lm(mpg ~ disp, data = .))
models

summarise(models, rsq = summary(mod)$r.squared)
models \%>\% do(data.frame(coef = coef(.$mod)))
models \%>\% do(data.frame(
  var = names(coef(.$mod)),
  coef(summary(.$mod))
))

models <- by_cyl \%>\% do(
  mod_linear = lm(mpg ~ disp, data = .),
  mod_quad = lm(mpg ~ poly(disp, 2), data = .)
)
models
compare <- models \%>\% do(aov = anova(.$mod_linear, .$mod_quad))
# compare \%>\% summarise(p.value = aov$`Pr(>F)`)
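
# do() is marked as questioning. As a hedged sketch of the alternative,
# assuming dplyr >= 0.8.1 (where group_modify() returns a data frame and
# group_map() returns a list), the first examples might be written as:
group_modify(group_by(mtcars, cyl), ~ head(.x, 2))
mods_list <- group_map(group_by(mtcars, cyl), ~ lm(mpg ~ disp, data = .x))
length(mods_list)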

if (require("nycflights13")) {
# You can use it to do any arbitrary computation, like fitting a linear
# model. Let's explore how arrival delays vary with departure time for
# each carrier
carriers <- group_by(flights, carrier)
group_size(carriers)

mods <- do(carriers, mod = lm(arr_delay ~ dep_time, data = .))
mods \%>\% do(as.data.frame(coef(.$mod)))
mods \%>\% summarise(rsq = summary(mod)$r.squared)

\dontrun{
# This longer example shows the progress bar in action
by_dest <- flights \%>\% group_by(dest) \%>\% filter(n() > 100)
library(mgcv)
by_dest \%>\% do(smooth = gam(arr_delay ~ s(dep_time) + month, data = .))
}
}
}