https://github.com/hadley/dplyr
Raw File
Tip revision: 39ee11bfe78c4a301070b00d3b92127217786ba7 authored by Hadley Wickham on 23 January 2020, 16:18:08 UTC
Preserve drop attr
Tip revision: 39ee11b
src_dbi.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/src_dbi.R
\name{src_dbi}
\alias{src_dbi}
\alias{src_mysql}
\alias{src_postgres}
\alias{src_sqlite}
\title{Source for database backends}
\usage{
src_mysql(
  dbname,
  host = NULL,
  port = 0L,
  username = "root",
  password = "",
  ...
)

src_postgres(
  dbname = NULL,
  host = NULL,
  port = NULL,
  user = NULL,
  password = NULL,
  ...
)

src_sqlite(path, create = FALSE)
}
\arguments{
\item{dbname}{Database name}

\item{host, port}{Host name and port number of database}

\item{...}{for the src, other arguments passed on to the underlying
database connector, \code{\link[DBI:dbConnect]{DBI::dbConnect()}}. For the tbl, included for
compatibility with the generic, but otherwise ignored.}

\item{user, username, password}{User name and password.

Generally, you should avoid saving username and password in your
scripts as it is easy to accidentally expose valuable credentials.
Instead, retrieve them from environment variables, or use database
specific credential scores. For example, with MySQL you can set up \code{my.cnf}
as described in \code{\link[RMySQL:MySQL]{RMySQL::MySQL()}}.}

\item{path}{Path to SQLite database. You can use the special path
":memory:" to create a temporary in memory database.}

\item{create}{if \code{FALSE}, \code{path} must already exist. If
\code{TRUE}, will create a new SQLite3 database at \code{path} if
\code{path} does not exist and connect to the existing database if
\code{path} does exist.}
}
\value{
An S3 object with class \code{src_dbi}, \code{src_sql}, \code{src}.
}
\description{
For backward compatibility dplyr provides three srcs for popular
open source databases:
\itemize{
\item \code{src_mysql()} connects to a MySQL or MariaDB database using \code{\link[RMySQL:MySQL]{RMySQL::MySQL()}}.
\item \code{src_postgres()} connects to PostgreSQL using \code{\link[RPostgreSQL:PostgreSQL]{RPostgreSQL::PostgreSQL()}}
\item \code{src_sqlite()} to connect to a SQLite database using \code{\link[RSQLite:SQLite]{RSQLite::SQLite()}}.
}

However, modern best practice is to use \code{\link[=tbl]{tbl()}} directly on an \code{DBIConnection}.
}
\details{
All data manipulation on SQL tbls are lazy: they will not actually
run the query or retrieve the data unless you ask for it: they all return
a new \code{tbl_dbi} object. Use \code{\link[=compute]{compute()}} to run the query and save the
results in a temporary in the database, or use \code{\link[=collect]{collect()}} to retrieve the
results to R. You can see the query with \code{\link[=show_query]{show_query()}}.

For best performance, the database should have an index on the variables
that you are grouping by. Use \code{\link[=explain]{explain()}} to check that the database is using
the indexes that you expect.

There is one exception: \code{\link[=do]{do()}} is not lazy since it must pull the data
into R.
}
\examples{
# Basic connection using DBI -------------------------------------------
if (require(dbplyr, quietly = TRUE)) {

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)

DBI::dbListTables(con)

# To retrieve a single table from a source, use `tbl()`
con \%>\% tbl("mtcars")

# You can also use pass raw SQL if you want a more sophisticated query
con \%>\% tbl(sql("SELECT * FROM mtcars WHERE cyl == 8"))

# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.
lahman_p <- lahman_sqlite()
batting <- lahman_p \%>\% tbl("Batting")
batting

# Basic data manipulation verbs work in the same way as with a tibble
batting \%>\% filter(yearID > 2005, G > 130)
batting \%>\% select(playerID:lgID)
batting \%>\% arrange(playerID, desc(yearID))
batting \%>\% summarise(G = mean(G), n = n())

# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting \%>\%
  select(playerID:lgID, AB, R, G) \%>\%
  mutate(
   R_per_game1 = R / G,
   R_per_game2 = R * 1.0 / G
 )

# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))

# You can see the query that dplyr creates with show_query()
batting \%>\%
  filter(G > 0) \%>\%
  group_by(playerID) \%>\%
  summarise(n = n()) \%>\%
  show_query()
}
}
back to top