https://github.com/hadley/dplyr
Tip revision: 98b8a0f5de25e238ac97514da24ec228610c8701 authored by Lionel Henry on 19 January 2021, 09:23:23 UTC
Merge pull request #5686 from lionel-/fix-warning-overhead
sample_n.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sample.R
\name{sample_n}
\alias{sample_n}
\alias{sample_frac}
\title{Sample n rows from a table}
\usage{
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = NULL, ...)

sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = NULL, ...)
}
\arguments{
\item{tbl}{A data.frame.}

\item{size}{<\code{\link[=dplyr_tidy_select]{tidy-select}}>
For \code{sample_n()}, the number of rows to select.
For \code{sample_frac()}, the fraction of rows to select.
If \code{tbl} is grouped, \code{size} applies to each group.}

\item{replace}{Sample with or without replacement?}

\item{weight}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> Sampling weights.
This must evaluate to a vector of non-negative numbers the same length as
the input. Weights are automatically standardised to sum to 1.}

\item{.env}{DEPRECATED.}

\item{...}{ignored}
}
\description{
\Sexpr[results=rd, stage=render]{lifecycle::badge("superseded")}
\code{sample_n()} and \code{sample_frac()} have been superseded in favour of
\code{\link[=slice_sample]{slice_sample()}}. While they will not be deprecated in the near future,
their superseded status means that we will only perform critical bug fixes, so we recommend
moving to the newer alternative.

These functions were superseded because we realised it was more convenient to
have two mutually exclusive arguments to one function, rather than two
separate functions. This also made it possible to clean up a few other smaller
design issues with \code{sample_n()}/\code{sample_frac()}:
\itemize{
\item The connection to \code{slice()} was not obvious.
\item The name of the first argument, \code{tbl}, is inconsistent with other
single table verbs which use \code{.data}.
\item The \code{size} argument uses tidy evaluation, which is surprising and
undocumented.
\item It was easier to remove the deprecated \code{.env} argument.
\item \code{...} was in a suboptimal position.
}
}
\examples{
by_cyl <- mtcars \%>\% group_by(cyl)

# sample_n() -> slice_sample() ----------------------------------------------
sample_n(mtcars, 10)
sample_n(mtcars, 50, replace = TRUE)
sample_n(mtcars, 10, weight = mpg)

# Changes:
# * explicitly name the `n` argument,
# * the `weight` argument is now `weight_by`.

slice_sample(mtcars, n = 10)
slice_sample(mtcars, n = 50, replace = TRUE)
slice_sample(mtcars, n = 10, weight_by = mpg)

# Note that sample_n() would error if n was bigger than the group size;
# slice_sample() will just use the available rows, for consistency with
# the other slice helpers like slice_head().
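
# A minimal sketch of that difference, using the grouped data defined at the
# top of these examples (mtcars has 11, 7, and 14 rows for cyl = 4, 6, and 8).
# try() is used so that the expected error does not stop the examples.
try(sample_n(by_cyl, 10))
capped <- slice_sample(by_cyl, n = 10)
# The cyl = 6 group contributes only its 7 available rows:
count(capped)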

# sample_frac() -> slice_sample() -------------------------------------------
sample_frac(mtcars)
sample_frac(mtcars, replace = TRUE)

# Changes:
# * use prop = 1 to randomly sample all rows

slice_sample(mtcars, prop = 1)
slice_sample(mtcars, prop = 1, replace = TRUE)
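
# For a fraction other than 1, pass it as `prop`. This is a sketch: the value
# 0.5 is chosen only for illustration and selects half of the 32 rows.
sample_frac(mtcars, 0.5)
slice_sample(mtcars, prop = 0.5)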

}
\keyword{internal}