https://github.com/hadley/dplyr
Tip revision: 98b8a0f5de25e238ac97514da24ec228610c8701 authored by Lionel Henry on 19 January 2021, 09:23:23 UTC
Merge pull request #5686 from lionel-/fix-warning-overhead
distinct.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/distinct.R
\name{distinct}
\alias{distinct}
\title{Subset distinct/unique rows}
\usage{
distinct(.data, ..., .keep_all = FALSE)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[=dplyr_data_masking]{data-masking}}> Optional variables to use
when determining uniqueness. If there are multiple rows for a given
combination of inputs, only the first row will be preserved. If omitted,
will use all variables.}

\item{.keep_all}{If \code{TRUE}, keep all variables in \code{.data}.
If a combination of \code{...} is not distinct, this keeps the
first row of values.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Rows are a subset of the input but appear in the same order.
\item Columns are not modified if \code{...} is empty or \code{.keep_all} is \code{TRUE}.
Otherwise, \code{distinct()} first calls \code{mutate()} to create new columns.
\item Groups are not modified.
\item Data frame attributes are preserved.
}
}
\description{
Select only unique/distinct rows from a data frame. This is similar
to \code{\link[=unique.data.frame]{unique.data.frame()}} but considerably faster.
}
\section{Methods}{

This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("distinct")}.
}

\examples{
df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
nrow(distinct(df))
nrow(distinct(df, x, y))

distinct(df, x)
distinct(df, y)

# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
distinct(df, y, .keep_all = TRUE)
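
# A small illustrative sketch (df_first is made-up data, not part of
# the original examples): when several rows share the same value of x,
# only the first of those rows is kept
df_first <- tibble(x = c(1, 1, 2), y = c("a", "b", "c"))
distinct(df_first, x, .keep_all = TRUE)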

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))

# Use across() to access select()-style semantics
distinct(starwars, across(contains("color")))

# Grouping -------------------------------------------------
# The same behaviour applies for grouped data frames,
# except that the grouping variables are always included
df <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) \%>\% group_by(g)
df \%>\% distinct(x)
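
# Illustrative sketch with made-up data (df_g is a hypothetical name):
# the grouping variable g is kept in the output, and the result is
# still grouped by g
df_g <- group_by(tibble(g = c(1, 1, 2, 2), x = c(1, 1, 2, 1)), g)
names(distinct(df_g, x))
group_vars(distinct(df_g, x))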

}