% Source: https://github.com/cran/metafor (escalc.Rd), version 1.8-0, revision 22b3661ee3371fc7acc64b8d53f9d211ed1ee023, authored by Wolfgang Viechtbauer on 11 April 2013
\name{escalc}
\alias{escalc}
\alias{escalc.default}
\alias{escalc.formula}
\alias{[.escalc}
\alias{cbind.escalc}
\title{Calculate Effect Size and Outcome Measures}
\description{
   The function can be used to calculate various effect size or outcome measures (and the corresponding sampling variances) that are commonly used in meta-analyses.
}
\usage{
escalc(measure, formula, \dots)

\method{escalc}{default}(measure, formula, ai, bi, ci, di, n1i, n2i, x1i, x2i, t1i, t2i,
       m1i, m2i, sd1i, sd2i, xi, mi, ri, ti, sdi, ni, data, slab, subset,
       add=1/2, to="only0", drop00=FALSE, vtype="LS",
       var.names=c("yi","vi"), append=TRUE, replace=TRUE, digits=4, \dots)

\method{escalc}{formula}(measure, formula, weights, data,
       add=1/2, to="only0", drop00=FALSE, vtype="LS",
       var.names=c("yi","vi"), digits=4, \dots)
}
\arguments{
   \item{measure}{a character string indicating which effect size or outcome measure should be calculated. See \sQuote{Details} for possible options and how the data should be specified.}
   \item{formula}{when using the formula interface of the function (see \sQuote{Details} below), a model formula specifying the data structure should be specified via this argument. When not using the formula interface, this argument should be ignored and the data required to calculate the effect sizes or outcomes are then passed to the function via the following set of arguments. See \sQuote{Details}.}
   \item{weights}{vector of weights to specify the group sizes or cell frequencies (only needed when using the formula interface). See \sQuote{Details}.}
   \item{ai}{vector to specify the 2x2 table frequencies (upper left cell).}
   \item{bi}{vector to specify the 2x2 table frequencies (upper right cell).}
   \item{ci}{vector to specify the 2x2 table frequencies (lower left cell).}
   \item{di}{vector to specify the 2x2 table frequencies (lower right cell).}
   \item{n1i}{vector to specify the group sizes or row totals (first group/row).}
   \item{n2i}{vector to specify the group sizes or row totals (second group/row).}
   \item{x1i}{vector to specify the number of events (first group).}
   \item{x2i}{vector to specify the number of events (second group).}
   \item{t1i}{vector to specify the total person-times (first group).}
   \item{t2i}{vector to specify the total person-times (second group).}
   \item{m1i}{vector to specify the means (first group or time point).}
   \item{m2i}{vector to specify the means (second group or time point).}
   \item{sd1i}{vector to specify the standard deviations (first group or time point).}
   \item{sd2i}{vector to specify the standard deviations (second group or time point).}
   \item{xi}{vector to specify the frequencies of the event of interest.}
   \item{mi}{vector to specify the frequencies of the complement of the event of interest or the group means.}
   \item{ri}{vector to specify the raw correlation coefficients.}
   \item{ti}{vector to specify the total person-times.}
   \item{sdi}{vector to specify the standard deviations.}
   \item{ni}{vector to specify the sample/group sizes.}
   \item{data}{optional data frame containing the variables given to the arguments above.}
   \item{slab}{optional vector with unique labels for the studies.}
   \item{subset}{optional vector indicating the subset of studies that should be used. This can be a logical vector or a numeric vector indicating the indices of the studies to include.}
   \item{add}{a non-negative number indicating the amount to add to zero cells, counts, or frequencies. See \sQuote{Details}.}
   \item{to}{a character string indicating when the values under \code{add} should be added (either \code{"all"}, \code{"only0"}, \code{"if0all"}, or \code{"none"}). See \sQuote{Details}.}
   \item{drop00}{logical indicating whether studies with no cases/events (or only cases) in both groups should be dropped when calculating the observed outcomes of the individual studies. See \sQuote{Details}.}
   \item{vtype}{a character string indicating the type of sampling variances to calculate (either \code{"LS"}, \code{"UB"}, \code{"ST"}, or \code{"CS"}). See \sQuote{Details}.}
   \item{var.names}{a character vector with two elements, specifying the name of the variable for the observed outcomes and the name of the variable for the corresponding sampling variances (default is \code{"yi"} and \code{"vi"}).}
   \item{append}{logical indicating whether the data frame specified via the \code{data} argument (if one has been specified) should be returned together with the observed outcomes and corresponding sampling variances (default is \code{TRUE}).}
   \item{replace}{logical indicating whether existing values for \code{yi} and \code{vi} in the data frame should be replaced or not. Only relevant when \code{append=TRUE} and the data frame already contains the \code{yi} and \code{vi} variables. If \code{replace=TRUE} (the default), all of the existing values will be overwritten. If \code{replace=FALSE}, only \code{NA} values will be replaced. See \sQuote{Value} section below for more details.}
   \item{digits}{integer specifying the number of decimal places to which the printed results should be rounded (default is 4). Note that the values are stored without rounding in the returned object.}
   \item{\dots}{other arguments.}
}
\details{
   Before a meta-analysis can be conducted, the relevant results from each study must be quantified in such a way that the resulting values can be further aggregated and compared. Depending on (a) the goals of the meta-analysis, (b) the design and types of studies included, and (c) the information provided therein, one of the various effect size or outcome measures described below may be appropriate for the meta-analysis and can be computed with the \code{escalc} function.

   The \code{measure} argument is a character string specifying which outcome measure should be calculated (see below for the various options), arguments \code{ai} through \code{ni} are then used to specify the information needed to calculate the various measures (depending on the chosen outcome measure, different arguments need to be specified), and \code{data} can be used to specify a data frame containing the variables given to the previous arguments. The \code{add}, \code{to}, and \code{drop00} arguments may be needed when dealing with frequency or count data that may need special handling when some of the frequencies or counts are equal to zero (see below for details). Finally, the \code{vtype} argument is used to specify how to estimate the sampling variances (again, see below for details).

   To provide a structure to the various effect size or outcome measures that can be calculated with the \code{escalc} function, we can distinguish between measures that are used to:
   \itemize{
   \item contrast two (either experimentally created or naturally occurring) groups,
   \item describe the direction and strength of the association between two variables,
   \item summarize some characteristic or attribute of individual groups.
   } Furthermore, where appropriate, we can further distinguish between measures that are applicable when the characteristic, response, or dependent variable assessed in the individual studies is:
   \itemize{
   \item a dichotomous (binary) variable (e.g., remission versus no remission),
   \item a count of events per time unit (e.g., number of migraines per year),
   \item a quantitative variable (e.g., amount of depression as assessed by a rating scale).
   }

   \subsection{Outcome Measures for Two-Group Comparisons}{

      In many meta-analyses, the goal is to synthesize the results from studies that compare or contrast two groups. The groups may be experimentally defined (e.g., a treatment and a control group created via random assignment) or may naturally occur (e.g., men and women, employees working under high- versus low-stress conditions, people exposed to some environmental risk factor versus those not exposed).

      \subsection{Measures for Dichotomous Variables}{

         In various fields (such as the health and medical sciences), the response variable measured is often dichotomous (binary), so that the data from a study comparing two different groups can be expressed in terms of a 2x2 table, such as:
         \tabular{lccc}{
                 \tab outcome 1 \tab outcome 2 \tab total      \cr
         group 1 \tab \code{ai} \tab \code{bi} \tab \code{n1i} \cr
         group 2 \tab \code{ci} \tab \code{di} \tab \code{n2i}
         } where \code{ai}, \code{bi}, \code{ci}, and \code{di} denote the cell frequencies (i.e., the number of people falling into a particular category) and \code{n1i} and \code{n2i} the row totals (i.e., the group sizes).

         For example, in a set of randomized clinical trials, group 1 and group 2 may refer to the treatment and placebo/control group, respectively, with outcome 1 denoting some event of interest (e.g., death, complications, failure to improve under the treatment) and outcome 2 its complement. Similarly, in a set of cohort studies, group 1 and group 2 may denote those who engage in and those who do not engage in a potentially harmful behavior (e.g., smoking), with outcome 1 denoting the development of a particular disease (e.g., lung cancer) during the follow-up period. Finally, in a set of case-control studies, group 1 and group 2 may refer to those with the disease (i.e., cases) and those free of the disease (i.e., controls), with outcome 1 denoting, for example, exposure to some environmental risk factor and outcome 2 non-exposure. Note that in all of these examples, the stratified sampling scheme fixes the row totals (i.e., the group sizes) by design.

         A meta-analysis of studies reporting results in terms of 2x2 tables can be based on one of several different outcome measures, including the relative risk (risk ratio), the odds ratio, the risk difference, and the arcsine transformed risk difference (e.g., Fleiss & Berlin, 2009; Ruecker et al., 2009). For any of these outcome measures, one needs to specify the cell frequencies via the \code{ai}, \code{bi}, \code{ci}, and \code{di} arguments (or alternatively, one can use the \code{ai}, \code{ci}, \code{n1i}, and \code{n2i} arguments).

         The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"RR"} for the \emph{log relative risk}.
         \item \code{"OR"} for the \emph{log odds ratio}.
         \item \code{"RD"} for the \emph{risk difference}.
         \item \code{"AS"} for the \emph{arcsine transformed risk difference} (Ruecker et al., 2009).
         \item \code{"PETO"} for the \emph{log odds ratio} estimated with Peto's method (Yusuf et al., 1985).
         } Note that the log is taken of the relative risk and the odds ratio, which makes these outcome measures symmetric around 0 and yields corresponding sampling distributions that are closer to normality.
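
         As a sketch of what the first three measures quantify, the conventional large-sample expressions are given below. These are the textbook formulas, stated here as an assumption about the computations rather than quoted from the function itself. Here, \eqn{y_i} and \eqn{v_i} denote the observed outcome and its sampling variance, the remaining symbols correspond to the arguments of the same name, and \eqn{p_{1i} = a_i/n_{1i}}{p_1i = a_i/n_1i} and \eqn{p_{2i} = c_i/n_{2i}}{p_2i = c_i/n_2i} are the observed risks in the two groups. For the log relative risk: \deqn{y_i = \ln\left(\frac{p_{1i}}{p_{2i}}\right), \qquad v_i = \frac{1}{a_i} - \frac{1}{n_{1i}} + \frac{1}{c_i} - \frac{1}{n_{2i}}.}{y_i = log(p_1i / p_2i),   v_i = 1/a_i - 1/n_1i + 1/c_i - 1/n_2i.} For the log odds ratio: \deqn{y_i = \ln\left(\frac{a_i d_i}{b_i c_i}\right), \qquad v_i = \frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}.}{y_i = log((a_i d_i) / (b_i c_i)),   v_i = 1/a_i + 1/b_i + 1/c_i + 1/d_i.} For the risk difference: \deqn{y_i = p_{1i} - p_{2i}, \qquad v_i = \frac{p_{1i}(1-p_{1i})}{n_{1i}} + \frac{p_{2i}(1-p_{2i})}{n_{2i}}.}{y_i = p_1i - p_2i,   v_i = p_1i (1 - p_1i) / n_1i + p_2i (1 - p_2i) / n_2i.} The reciprocal terms in the first two variances show why cells with a zero count require special handling.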

         Cell entries with a zero count can be problematic, especially for the relative risk and the odds ratio. Adding a small constant to the cells of the 2x2 tables is a common solution to this problem. When \code{to="only0"} (the default), the value of \code{add} (the default is 1/2) is added to each cell of those 2x2 tables with at least one cell equal to 0. When \code{to="all"}, the value of \code{add} is added to each cell of all 2x2 tables. When \code{to="if0all"}, the value of \code{add} is added to each cell of all 2x2 tables, but only when there is at least one 2x2 table with a zero cell. Setting \code{to="none"} or \code{add=0} has the same effect: No adjustment to the observed table frequencies is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting value is recoded to \code{NA}). Also, studies where \code{ai=ci=0} or \code{bi=di=0} may be considered to be uninformative about the size of the effect and dropping such studies has sometimes been recommended (Higgins & Green, 2008). This can be done by setting \code{drop00=TRUE}. The counts for such studies will then be set to \code{NA}.
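
         To illustrate the default adjustment with a purely hypothetical table, suppose a study reports \code{ai=0}, \code{bi=10}, \code{ci=3}, and \code{di=7}. With \code{add=1/2} and \code{to="only0"}, the cells become 0.5, 10.5, 3.5, and 7.5 before the outcome is computed. Assuming the usual formula for the log odds ratio (the log of the cross-product ratio, with the sum of the reciprocal cell counts as the sampling variance), this gives: \deqn{y_i = \ln\left(\frac{0.5 \times 7.5}{10.5 \times 3.5}\right) = \ln\left(\frac{3.75}{36.75}\right) \approx -2.28, \qquad v_i = \frac{1}{0.5} + \frac{1}{10.5} + \frac{1}{3.5} + \frac{1}{7.5} \approx 2.51.}{y_i = log((0.5 * 7.5) / (10.5 * 3.5)) = log(3.75 / 36.75) = -2.28 (approx.),   v_i = 1/0.5 + 1/10.5 + 1/3.5 + 1/7.5 = 2.51 (approx.).} Without the adjustment, the odds ratio for this table would be 0, so that its logarithm and the term \eqn{1/a_i} in the variance would be undefined.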

         A dataset corresponding to data of this type is provided in \code{\link{dat.bcg}}.

         Assuming that the dichotomous outcome is actually a dichotomized version of the responses on an underlying quantitative scale, it is also possible to estimate the standardized mean difference based on 2x2 table data, using either the probit transformed risk difference or a transformation of the odds ratio (e.g., Chinn, 2000; Hasselblad & Hedges, 1995; Sanchez-Meca et al., 2003). The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"PBIT"} for the \emph{probit transformed risk difference} as an estimate of the standardized mean difference.
         \item \code{"OR2D"} for the \emph{transformed odds ratio} as an estimate of the standardized mean difference.
         } The probit transformation assumes that the responses on the underlying quantitative scale are normally distributed, while the odds ratio transformation assumes that the responses follow logistic distributions within each group.

      }

      \subsection{Measures for Event Counts}{

         In medical and epidemiological studies comparing two different groups (e.g., treated versus untreated patients, exposed versus unexposed individuals), results are sometimes reported in terms of event counts (i.e., the number of events, such as strokes or myocardial infarctions) over a certain period of time. In particular, assume that the studies report data in the form:
         \tabular{lcc}{
                 \tab number of events \tab total person-time \cr
         group 1 \tab \code{x1i}       \tab \code{t1i} \cr
         group 2 \tab \code{x2i}       \tab \code{t2i}
         } where \code{x1i} and \code{x2i} denote the total number of events in the first and the second group, respectively, and \code{t1i} and \code{t2i} the corresponding total person-times at risk. Often, the person-time is measured in years, so that \code{t1i} and \code{t2i} denote the total number of follow-up years in the two groups.

         Note that this form of data is fundamentally different from that described in the previous section, since the total follow-up time may differ even for groups of the same size and the individuals studied may experience the event of interest multiple times. Hence, different outcome measures than the ones described in the previous section must be considered when data are reported in this format. These include the incidence rate ratio, the incidence rate difference, and the square-root transformed incidence rate difference (Bagos & Nikolopoulos, 2009). For any of these outcome measures, one needs to specify the total number of events via the \code{x1i} and \code{x2i} arguments and the corresponding total person-times via the \code{t1i} and \code{t2i} arguments.

         The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"IRR"} for the \emph{log incidence rate ratio}.
         \item \code{"IRD"} for the \emph{incidence rate difference}.
         \item \code{"IRSD"} for the \emph{square-root transformed incidence rate difference}.
         } Note that the log is taken of the incidence rate ratio, which makes this outcome measure symmetric around 0 and yields a corresponding sampling distribution that is closer to normality.
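
         As a sketch, the conventional large-sample expressions for these three measures are given below. These are textbook formulas based on treating the event counts as Poisson distributed, stated here as an assumption about the computations rather than quoted from the function itself. Here, \eqn{y_i} and \eqn{v_i} denote the observed outcome and its sampling variance, and the remaining symbols correspond to the arguments of the same name. For the log incidence rate ratio: \deqn{y_i = \ln\left(\frac{x_{1i}/t_{1i}}{x_{2i}/t_{2i}}\right), \qquad v_i = \frac{1}{x_{1i}} + \frac{1}{x_{2i}}.}{y_i = log((x_1i/t_1i) / (x_2i/t_2i)),   v_i = 1/x_1i + 1/x_2i.} For the incidence rate difference: \deqn{y_i = \frac{x_{1i}}{t_{1i}} - \frac{x_{2i}}{t_{2i}}, \qquad v_i = \frac{x_{1i}}{t_{1i}^2} + \frac{x_{2i}}{t_{2i}^2}.}{y_i = x_1i/t_1i - x_2i/t_2i,   v_i = x_1i/t_1i^2 + x_2i/t_2i^2.} For the square-root transformed incidence rate difference: \deqn{y_i = \sqrt{\frac{x_{1i}}{t_{1i}}} - \sqrt{\frac{x_{2i}}{t_{2i}}}, \qquad v_i = \frac{1}{4 t_{1i}} + \frac{1}{4 t_{2i}}.}{y_i = sqrt(x_1i/t_1i) - sqrt(x_2i/t_2i),   v_i = 1/(4 t_1i) + 1/(4 t_2i).} Note that the variance of the square-root transformed measure does not depend on the event counts, which reflects the variance-stabilizing property of the transformation.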

         Studies with zero events in one or both groups can be problematic, especially for the incidence rate ratio. Adding a small constant to the number of events is a common solution to this problem. When \code{to="only0"} (the default), the value of \code{add} (the default is 1/2) is added to \code{x1i} and \code{x2i} only in the studies that have zero events in one or both groups. When \code{to="all"}, the value of \code{add} is added to \code{x1i} and \code{x2i} in all studies. When \code{to="if0all"}, the value of \code{add} is added to \code{x1i} and \code{x2i} in all studies, but only when there is at least one study with zero events in one or both groups. Setting \code{to="none"} or \code{add=0} has the same effect: No adjustment to the observed number of events is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting value is recoded to \code{NA}). As for 2x2 table data, studies where \code{x1i=x2i=0} may be considered to be uninformative about the size of the effect and dropping such studies has sometimes been recommended. This can be done by setting \code{drop00=TRUE}. The counts for such studies will then be set to \code{NA}.

         A dataset corresponding to data of this type is provided in \code{\link{dat.hart1999}}.

      }

      \subsection{Measures for Quantitative Variables}{

         When the response or dependent variable assessed in the individual studies is measured on some quantitative scale, it is customary to report certain summary statistics, such as the mean and standard deviation of the scores. The data layout for a study comparing two groups with respect to such a variable is then of the form:
         \tabular{lccc}{
                 \tab mean       \tab standard deviation \tab group size \cr
         group 1 \tab \code{m1i} \tab \code{sd1i}        \tab \code{n1i} \cr
         group 2 \tab \code{m2i} \tab \code{sd2i}        \tab \code{n2i}
         } where \code{m1i} and \code{m2i} are the observed means of the two groups, \code{sd1i} and \code{sd2i} the observed standard deviations, and \code{n1i} and \code{n2i} the number of individuals in each group. Again, the two groups may be experimentally created (e.g., a treatment and control group based on random assignment) or naturally occurring (e.g., men and women). In either case, the raw mean difference, the standardized mean difference, and the ratio of means (also called response ratio) are useful outcome measures when meta-analyzing studies of this type (e.g., Borenstein, 2009). In addition, the (log) odds ratio can be estimated based on data of this type, using a simple transformation of the standardized mean difference (e.g., Chinn, 2000; Hasselblad & Hedges, 1995).

         The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"MD"} for the \emph{raw mean difference}.
         \item \code{"SMD"} for the \emph{standardized mean difference}.
         \item \code{"SMDH"} for the \emph{standardized mean difference} without assuming equal population variances in the two groups (Bonett, 2008, 2009).
         \item \code{"ROM"} for the \emph{log transformed ratio of means} (Hedges et al., 1999).
         \item \code{"D2OR"} for the \emph{transformed standardized mean difference} as an estimate of the log odds ratio.
         } Note that the log is taken of the ratio of means, which makes this outcome measure symmetric around 0 and yields a corresponding sampling distribution that is closer to normality (however, note that if \code{m1i} and \code{m2i} have opposite signs, this outcome measure cannot be computed).

         The positive bias in the standardized mean difference is automatically corrected for within the function, yielding Hedges' g for \code{measure="SMD"} (Hedges, 1981). Similarly, the same bias correction is applied for \code{measure="SMDH"} (Bonett, 2009). Finally, for \code{measure="SMD"}, one can choose between \code{vtype="LS"} (the default) and \code{vtype="UB"}. The former uses a large sample approximation to compute the sampling variances. The latter provides unbiased estimates of the sampling variances.
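
         As a sketch of the bias correction described by Hedges (1981), let \eqn{m_i = n_{1i} + n_{2i} - 2}{m_i = n_1i + n_2i - 2} and let \eqn{s_{pi}}{s_pi} denote the pooled standard deviation of the two groups. The notation is introduced here for illustration only, and the large-sample variance shown is the commonly used expression, given as an assumption rather than quoted from the function itself: \deqn{s_{pi} = \sqrt{\frac{(n_{1i}-1) \, sd_{1i}^2 + (n_{2i}-1) \, sd_{2i}^2}{m_i}}, \qquad d_i = \frac{m_{1i} - m_{2i}}{s_{pi}}, \qquad g_i = J(m_i) \, d_i,}{s_pi = sqrt(((n_1i - 1) sd_1i^2 + (n_2i - 1) sd_2i^2) / m_i),   d_i = (m_1i - m_2i) / s_pi,   g_i = J(m_i) d_i,} where the correction factor is \deqn{J(m) = \frac{\Gamma(m/2)}{\sqrt{m/2} \; \Gamma((m-1)/2)} \approx 1 - \frac{3}{4m - 1}.}{J(m) = Gamma(m/2) / (sqrt(m/2) Gamma((m-1)/2)), which is approximately 1 - 3/(4m - 1).} For example, with 10 individuals per group, \eqn{m = 18} and the approximation gives \eqn{J \approx 1 - 3/71 \approx 0.958}{J = 1 - 3/71 = 0.958 (approx.)}, so the uncorrected value is shrunk by roughly 4 percent. A large-sample variance of the corrected estimate is then \deqn{v_i = \frac{1}{n_{1i}} + \frac{1}{n_{2i}} + \frac{g_i^2}{2 (n_{1i} + n_{2i})}.}{v_i = 1/n_1i + 1/n_2i + g_i^2 / (2 (n_1i + n_2i)).}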

         A dataset corresponding to data of this type is provided in \code{\link{dat.normand1999}} (for mean differences and standardized mean differences). A dataset showing the use of the ratio of means measure is provided in \code{\link{dat.curtis1998}}.

      }

   }

   \subsection{Outcome Measures for Variable Association}{

      Meta-analyses are often used to synthesize studies that examine the direction and strength of the association between two variables measured concurrently and/or without manipulation by experimenters. In this section, a variety of outcome measures will be discussed that may be suitable for a meta-analysis with this purpose. We can distinguish between measures that are applicable when both variables are measured on quantitative scales, when both variables measured are dichotomous, and when the two variables are of mixed types.

      \subsection{Measures for Two Quantitative Variables}{

         The (Pearson or product moment) correlation coefficient quantifies the direction and strength of the (linear) relationship between two quantitative variables and is therefore frequently used as the outcome measure for meta-analyses (e.g., Borenstein, 2009). Two alternative measures are a bias-corrected version of the correlation coefficient and Fisher's r-to-z transformed coefficient.

         For these measures, one needs to specify \code{ri}, the vector with the raw correlation coefficients, and \code{ni}, the corresponding sample sizes. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"COR"} for the \emph{raw correlation coefficient}.
         \item \code{"UCOR"} for the \emph{raw correlation coefficient} corrected for its slight negative bias (based on equation 2.7 in Olkin & Pratt, 1958).
         \item \code{"ZCOR"} for \emph{Fisher's r-to-z transformed correlation coefficient} (Fisher, 1921).
         } For \code{measure="COR"} and \code{measure="UCOR"}, one can choose between \code{vtype="LS"} (the default) and \code{vtype="UB"}. The former uses a large sample approximation to compute the sampling variances. The latter provides approximately unbiased estimates of the sampling variances (see Hedges, 1989).
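
         As a sketch, Fisher's r-to-z transformation and its usual large-sample variance are: \deqn{z_i = \frac{1}{2} \ln\left(\frac{1 + r_i}{1 - r_i}\right), \qquad v_i = \frac{1}{n_i - 3}.}{z_i = (1/2) log((1 + r_i) / (1 - r_i)),   v_i = 1 / (n_i - 3).} For example, a hypothetical study with \eqn{r_i = 0.5} and \eqn{n_i = 28} yields \eqn{z_i = \ln(3)/2 \approx 0.549}{z_i = log(3)/2 = 0.549 (approx.)} and \eqn{v_i = 1/25 = 0.04}. The variance of the transformed value does not depend on the unknown correlation, which is the main reason this transformation is often preferred. In contrast, the commonly used large-sample variance of the raw correlation coefficient, \eqn{(1 - r_i^2)^2 / (n_i - 1)}{(1 - r_i^2)^2 / (n_i - 1)}, depends on the observed correlation itself. Both expressions are the textbook formulas and are stated here as an assumption about the computations rather than quoted from the function itself.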

         A dataset corresponding to data of this type is provided in \code{\link{dat.mcdaniel1994}}.

      }

      \subsection{Measures for Two Dichotomous Variables}{

         When the goal of a meta-analysis is to examine the relationship between two dichotomous variables, the data for each study can again be presented in the form of a 2x2 table, except that there may not be a clear distinction between the group (i.e., the row) and the outcome (i.e., the column) variable. Moreover, the table may be a result of cross-sectional (i.e., multinomial) sampling, where none of the table margins (except the total sample size) is fixed by the study design.

         The phi coefficient and the odds ratio are commonly used measures of association for 2x2 table data (e.g., Fleiss & Berlin, 2009). The latter is particularly advantageous, as it is directly comparable to values obtained from stratified sampling (as described earlier). Yule's Q and Yule's Y (Yule, 1912) are additional measures of association for 2x2 table data (although they are not typically used in meta-analyses). Finally, assuming that the two dichotomous variables are actually dichotomized versions of the responses on two underlying quantitative scales (and assuming that the two variables follow a bivariate normal distribution), it is also possible to estimate the correlation between the two variables using the tetrachoric correlation coefficient (Pearson, 1900; Kirk, 1973).

         For any of these outcome measures, one needs to specify the cell frequencies via the \code{ai}, \code{bi}, \code{ci}, and \code{di} arguments. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"OR"} for the \emph{log odds ratio}.
         \item \code{"PHI"} for the \emph{phi coefficient}.
         \item \code{"YUQ"} for \emph{Yule's Q} (Yule, 1912).
         \item \code{"YUY"} for \emph{Yule's Y} (Yule, 1912).
         \item \code{"RTET"} for the \emph{tetrachoric correlation}.
         } Tables with one or two zero counts are handled as described earlier.

      }

      \subsection{Measures for Mixed Variable Types}{

         Finally, we will consider outcome measures that can be used to describe the relationship between two variables, where one variable is dichotomous and the other variable measures some quantitative characteristic. In that case, it is likely that study authors again report summary statistics, such as the mean and standard deviation of the scores within the two groups (defined by the dichotomous variable). One can then compute the point-biserial correlation (Tate, 1954) as a measure of association between the two variables. If the dichotomous variable is actually a dichotomized version of the responses on an underlying quantitative scale (and assuming that the two variables follow a bivariate normal distribution), it is also possible to estimate the correlation between the two variables using the biserial correlation coefficient (Pearson, 1909; Soper, 1914).

         Here, one again needs to specify \code{m1i} and \code{m2i} for the observed means of the two groups, \code{sd1i} and \code{sd2i} for the observed standard deviations, and \code{n1i} and \code{n2i} for the number of individuals in each group. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"RPB"} for the \emph{point-biserial correlation}.
         \item \code{"RBIS"} for the \emph{biserial correlation}.
         } For \code{measure="RPB"}, one must indicate via \code{vtype="ST"} or \code{vtype="CS"} whether the data for the studies were obtained using stratified or cross-sectional (i.e., multinomial) sampling, respectively (it is also possible to specify an entire vector for the \code{vtype} argument in case the sampling schemes differed for the various studies).

      }

   }

   \subsection{Outcome Measures for Individual Groups}{

      In this section, outcome measures will be described which may be useful when the goal of a meta-analysis is to synthesize studies that characterize some property of individual groups. We will again distinguish between measures that are applicable when the characteristic of interest is a dichotomous variable, when the characteristic represents an event count, or when the characteristic assessed is a quantitative variable.

      \subsection{Measures for Dichotomous Variables}{

         A meta-analysis may be conducted to aggregate studies that provide data for individual groups with respect to a dichotomous dependent variable. Here, one needs to specify \code{xi} and \code{ni}, denoting the number of individuals experiencing the event of interest and the total number of individuals, respectively. Instead of specifying \code{ni}, one can use \code{mi} to specify the number of individuals that do not experience the event of interest. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"PR"} for the \emph{raw proportion}.
         \item \code{"PLN"} for the \emph{log transformed proportion}.
         \item \code{"PLO"} for the \emph{logit transformed proportion} (i.e., log odds).
         \item \code{"PAS"} for the \emph{arcsine transformed proportion}.
         \item \code{"PFT"} for the \emph{Freeman-Tukey double arcsine transformed proportion} (Freeman & Tukey, 1950).
         } Zero cell entries can be problematic for certain outcome measures. When \code{to="only0"} (the default), the value of \code{add} (the default is 1/2) is added to \code{xi} and \code{mi} only for studies where \code{xi} or \code{mi} is equal to 0. When \code{to="all"}, the value of \code{add} is added to \code{xi} and \code{mi} in all studies. When \code{to="if0all"}, the value of \code{add} is added in all studies, but only when there is at least one study with a zero value for \code{xi} or \code{mi}. Setting \code{to="none"} or \code{add=0} has the same effect: No adjustment to the observed values is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting value is recoded to \code{NA}).
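
         As a sketch, let \eqn{p_i = x_i / n_i} denote the observed proportion. The conventional large-sample expressions for three of these measures are then as follows (textbook formulas, stated here as an assumption about the computations rather than quoted from the function itself). For the raw proportion: \deqn{y_i = p_i, \qquad v_i = \frac{p_i (1 - p_i)}{n_i}.}{y_i = p_i,   v_i = p_i (1 - p_i) / n_i.} For the logit transformed proportion: \deqn{y_i = \ln\left(\frac{x_i}{n_i - x_i}\right), \qquad v_i = \frac{1}{x_i} + \frac{1}{n_i - x_i}.}{y_i = log(x_i / (n_i - x_i)),   v_i = 1/x_i + 1/(n_i - x_i).} For the arcsine transformed proportion: \deqn{y_i = \arcsin\left(\sqrt{p_i}\right), \qquad v_i = \frac{1}{4 n_i}.}{y_i = arcsin(sqrt(p_i)),   v_i = 1 / (4 n_i).} To illustrate the default adjustment with a hypothetical study, suppose \code{xi=0} and \code{mi=20}. With \code{add=1/2} and \code{to="only0"}, the values become 0.5 and 20.5, so that the logit transformed proportion is \eqn{\ln(0.5/20.5) \approx -3.71}{log(0.5/20.5) = -3.71 (approx.)}, whereas it would be undefined without the adjustment.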

         A dataset corresponding to data of this type is provided in \code{\link{dat.pritz1997}}.

      }

      \subsection{Measures for Event Counts}{

         Various measures can be used to characterize individual groups when the dependent variable assessed is an event count. Here, one needs to specify \code{xi} and \code{ti}, denoting the total number of events that occurred and the total person-time at risk, respectively. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"IR"} for the \emph{raw incidence rate}.
         \item \code{"IRLN"} for the \emph{log transformed incidence rate}.
         \item \code{"IRS"} for the \emph{square-root transformed incidence rate}.
         \item \code{"IRFT"} for the \emph{Freeman-Tukey transformed incidence rate} (Freeman & Tukey, 1950).
         } Studies with zero events can be problematic, especially for the log transformed incidence rate. Adding a small constant to the number of events is a common solution to this problem. When \code{to="only0"} (the default), the value of \code{add} (the default is 1/2) is added to \code{xi} only in the studies that have zero events. When \code{to="all"}, the value of \code{add} is added to \code{xi} in all studies. When \code{to="if0all"}, the value of \code{add} is added to \code{xi} in all studies, but only when there is at least one study with zero events. Setting \code{to="none"} or \code{add=0} has the same effect: No adjustment to the observed number of events is made. Depending on the outcome measure and the data, this may lead to division by zero inside of the function (when this occurs, the resulting value is recoded to \code{NA}).
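
         As a sketch, the conventional large-sample expressions for the first three measures are given below. These are textbook formulas based on treating the event counts as Poisson distributed, stated here as an assumption about the computations rather than quoted from the function itself. For the raw incidence rate: \deqn{y_i = \frac{x_i}{t_i}, \qquad v_i = \frac{x_i}{t_i^2}.}{y_i = x_i / t_i,   v_i = x_i / t_i^2.} For the log transformed incidence rate: \deqn{y_i = \ln\left(\frac{x_i}{t_i}\right), \qquad v_i = \frac{1}{x_i}.}{y_i = log(x_i / t_i),   v_i = 1 / x_i.} For the square-root transformed incidence rate: \deqn{y_i = \sqrt{\frac{x_i}{t_i}}, \qquad v_i = \frac{1}{4 t_i}.}{y_i = sqrt(x_i / t_i),   v_i = 1 / (4 t_i).} Under these expressions, a study with zero events yields an undefined outcome and variance for the log transformed incidence rate, which is why an adjustment to \code{xi} is needed in that case.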

      }

      \subsection{Measures for Quantitative Variables}{

         The goal of a meta-analysis may also be to characterize individual groups, where the response, characteristic, or dependent variable assessed in the individual studies is measured on some quantitative scale. In the simplest case, the raw mean for the quantitative variable is reported for each group, which then becomes the observed outcome for the meta-analysis. Here, one needs to specify \code{mi}, \code{sdi}, and \code{ni} for the observed means, the observed standard deviations, and the sample sizes, respectively. The only option for the \code{measure} argument is then:
         \itemize{
         \item \code{"MN"} for the \emph{raw mean}.
         } Note that \code{sdi} is used to specify the standard deviations of the observed values of the response, characteristic, or dependent variable and not the standard errors of the means.

         A more complicated situation arises when the purpose of the meta-analysis is to assess the amount of change within individual groups. In that case, either the raw mean change or standardized versions thereof can be used as outcome measures (Becker, 1988; Gibbons et al., 1993; Morris, 2000). Here, one needs to specify \code{m1i} and \code{m2i}, the observed means at the two measurement occasions, \code{sd1i} and \code{sd2i} for the corresponding observed standard deviations, \code{ri} for the correlation between the scores observed at the two measurement occasions, and \code{ni} for the sample size. The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"MC"} for the \emph{raw mean change}.
         \item \code{"SMCC"} for the \emph{standardized mean change} using change score standardization.
         \item \code{"SMCR"} for the \emph{standardized mean change} using raw score standardization.
         } See also Morris and DeShon (2002) for a thorough discussion of the difference between the change score measures.

         A few notes about the change score measures are in order. In practice, one often has a mix of information available from the individual studies to compute these measures. In particular, if \code{m1i} and \code{m2i} are unknown, but the raw mean change is directly reported in a particular study, then you can set \code{m1i} to that value and \code{m2i} to 0 (making sure that the raw mean change was computed as \code{m1i-m2i} within that study and not the other way around). Also, for the raw mean change (\code{"MC"}) or the standardized mean change using change score standardization (\code{"SMCC"}), if \code{sd1i}, \code{sd2i}, and \code{ri} are unknown, but the standard deviation of the change scores is directly reported, then you can set \code{sd1i} to that value and both \code{sd2i} and \code{ri} to 0. Finally, for the standardized mean change using raw score standardization (\code{"SMCR"}), argument \code{sd2i} is actually not needed, as the standardization is only based on \code{sd1i} (Becker, 1988; Morris, 2000), which is usually the pre-test standard deviation (if the post-test standard deviation should be used, then set \code{sd1i} to that).
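
         The reasoning behind the second shortcut can be sketched as follows. The standard deviation of the change scores, denoted here by \eqn{sd_{Di}}{sd_Di} (notation introduced for illustration only), follows from the two standard deviations and the correlation between the measurement occasions: \deqn{sd_{Di} = \sqrt{sd_{1i}^2 + sd_{2i}^2 - 2 \, r_i \, sd_{1i} \, sd_{2i}}.}{sd_Di = sqrt(sd_1i^2 + sd_2i^2 - 2 r_i sd_1i sd_2i).} The raw mean change and its usual sampling variance are then \deqn{y_i = m_{1i} - m_{2i}, \qquad v_i = \frac{sd_{Di}^2}{n_i}.}{y_i = m_1i - m_2i,   v_i = sd_Di^2 / n_i.} Setting \eqn{sd_{2i} = 0}{sd_2i = 0} and \eqn{r_i = 0} reduces the first expression to \eqn{sd_{Di} = sd_{1i}}{sd_Di = sd_1i}, so that a directly reported standard deviation of the change scores passed via \code{sd1i} enters the computations unchanged. These are the conventional expressions and are stated here as an assumption about the computations rather than quoted from the function itself.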

      }

   }

   \subsection{Other Outcome Measures for Meta-Analyses}{

      Other outcome measures are sometimes used for meta-analyses that do not directly fall into the categories above. These are described in this section.

      \subsection{Cronbach's alpha and Transformations Thereof}{

         Meta-analytic methods can also be used to aggregate Cronbach's alpha values. This is usually referred to as a \sQuote{reliability generalization meta-analysis} (Vacha-Haase, 1998). Here, one needs to specify \code{ai}, \code{mi}, and \code{ni} for the observed alpha values, the number of items/replications/parts of the measurement instrument, and the sample sizes, respectively. One can either directly analyze the raw Cronbach's alpha values or transformations thereof (Bonett, 2002, 2010; Hakstian & Whalen, 1976). The options for the \code{measure} argument are then:
         \itemize{
         \item \code{"ARAW"} for \emph{raw alpha} values.
         \item \code{"AHW"} for \emph{transformed alpha values} (Hakstian & Whalen, 1976).
         \item \code{"ABT"} for \emph{transformed alpha values} (Bonett, 2002).
         } Note that the transformations implemented here are slightly different from the ones described by Hakstian and Whalen (1976) and Bonett (2002). In particular, for \code{"AHW"}, the transformation \eqn{1-(1-\alpha)^{1/3}}{1-(1-alpha)^(1/3)} is used, while for \code{"ABT"}, the transformation \eqn{-\ln(1-\alpha)}{-ln(1-alpha)} is used. This ensures that the transformed values are monotonically increasing functions of alpha.

         A dataset corresponding to data of this type is provided in \code{\link{dat.bonett2010}}.
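
         The two transformations can be sketched as follows. The alpha values, numbers of items, and sample sizes are hypothetical and serve only to show how the arguments line up.
\preformatted{library(metafor)

### hypothetical alpha values, numbers of items, and sample sizes
alpha <- c(0.70, 0.80, 0.90)
items <- c(10, 10, 20)
n     <- c(50, 120, 200)

### the two transformations as given above, computed by hand
ahw <- 1 - (1 - alpha)^(1/3)
abt <- -log(1 - alpha)

### both transformed values increase as alpha increases
all(diff(ahw) > 0) && all(diff(abt) > 0)

### corresponding calls to escalc()
res.ahw <- escalc(measure="AHW", ai=alpha, mi=items, ni=n)
res.abt <- escalc(measure="ABT", ai=alpha, mi=items, ni=n)
res.ahw
res.abt
}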

      }

   }

   \subsection{Formula Interface}{

      There are two general ways of specifying the data for computing the various effect size or outcome measures when using the \code{escalc} function, the default and a formula interface. When using the default interface, which is described above, the information needed to compute the various outcome measures is passed to the function via the various arguments outlined above (i.e., arguments \code{ai} through \code{ni}).

      The formula interface works as follows. As above, the argument \code{measure} is a character string specifying which outcome measure should be calculated. The \code{formula} argument is then used to specify the data structure as a multipart formula. The \code{data} argument can be used to specify a data frame containing the variables in the formula. The \code{add}, \code{to}, and \code{vtype} arguments work as described above.

      \subsection{Outcome Measures for Two-Group Comparisons}{

         \subsection{Measures for Dichotomous Variables}{

            For 2x2 table data, the \code{formula} argument takes the form \code{outcome ~ group | study}, where \code{group} is a two-level factor specifying the rows of the tables, \code{outcome} is a two-level factor specifying the columns of the tables (the two possible outcomes), and \code{study} is a factor specifying the study factor. The \code{weights} argument is used to specify the frequencies in the various cells.

         }

         \subsection{Measures for Event Counts}{

            For two-group comparisons with event counts, the \code{formula} argument takes the form \code{events/times ~ group | study}, where \code{group} is a two-level factor specifying the group factor and \code{study} is a factor specifying the study factor. The left-hand side of the formula is composed of two parts, with the first variable for the number of events and the second variable for the person-time at risk.

         }

         \subsection{Measures for Quantitative Variables}{

            For two-group comparisons with quantitative variables, the \code{formula} argument takes the form \code{means/sds ~ group | study}, where \code{group} is a two-level factor specifying the group factor and \code{study} is a factor specifying the study factor. The left-hand side of the formula is composed of two parts, with the first variable for the means and the second variable for the standard deviations. The \code{weights} argument is used to specify the sample sizes in the groups.
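
            The long-format layout implied by this formula (one row per group within each study) can be sketched as follows. All variable names and values are hypothetical, and the ordering of the factor levels (first level for the first group) is an assumption that mirrors the 2x2 table example in the \sQuote{Examples} section.
\preformatted{### hypothetical wide-format summary data (one row per study)
dat.wide <- data.frame(study=1:2,
                       m1=c(24.1, 30.5), sd1=c(5.2, 7.1), n1=c(40, 25),
                       m2=c(21.3, 29.8), sd2=c(4.9, 6.8), n2=c(42, 25))

### rearrange into long format (one row per group within each study)
k <- nrow(dat.wide)
dat.long       <- data.frame(study=factor(rep(dat.wide$study, each=2)))
dat.long$group <- factor(rep(c("T","C"), k), levels=c("T","C"))
dat.long$ybar  <- with(dat.wide, c(rbind(m1, m2)))
dat.long$sdev  <- with(dat.wide, c(rbind(sd1, sd2)))
dat.long$n     <- with(dat.wide, c(rbind(n1, n2)))
dat.long
}
            Given the form described above, such a data frame would then be passed to the function as \code{escalc(ybar/sdev ~ group | study, weights=n, data=dat.long, measure="SMD")}.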

         }

      }

      \subsection{Outcome Measures for Variable Association}{

         \subsection{Measures for Two Quantitative Variables}{

            For these outcome measures, the \code{formula} argument takes the form \code{outcome ~ 1 | study}, where \code{outcome} is used to specify the observed correlations and \code{study} is a factor specifying the study factor. The \code{weights} argument is used to specify the sample sizes.

         }

         \subsection{Measures for Two Dichotomous Variables}{

            Here, the data layout is assumed to be the same as for two-group comparisons with dichotomous variables. Hence, the \code{formula} argument is specified in the same manner.

         }

         \subsection{Measures for Mixed Variable Types}{

            Here, the data layout is assumed to be the same as for two-group comparisons with quantitative variables. Hence, the \code{formula} argument is specified in the same manner.

         }

      }

      \subsection{Outcome Measures for Individual Groups}{

         \subsection{Measures for Dichotomous Variables}{

            For these outcome measures, the \code{formula} argument takes the form \code{outcome ~ 1 | study}, where \code{outcome} is a two-level factor specifying the columns of the tables (the two possible outcomes) and \code{study} is a factor specifying the study factor. The \code{weights} argument is used to specify the frequencies in the various cells.

         }

         \subsection{Measures for Event Counts}{

            For these outcome measures, the \code{formula} argument takes the form \code{events/times ~ 1 | study}, where \code{study} is a factor specifying the study factor. The left-hand side of the formula is composed of two parts, with the first variable for the number of events and the second variable for the person-time at risk.

         }

         \subsection{Measures for Quantitative Variables}{

            For these outcome measures, the \code{formula} argument takes the form \code{means/sds ~ 1 | study}, where \code{study} is a factor specifying the study factor. The left-hand side of the formula is composed of two parts, with the first variable for the means and the second variable for the standard deviations. The \code{weights} argument is used to specify the sample sizes.

            Note: The formula interface is (currently) not implemented for the raw mean change and the standardized mean change measures.

         }

      }

      \subsection{Other Outcome Measures for Meta-Analyses}{

         \subsection{Cronbach's alpha and Transformations Thereof}{

            For these outcome measures, the \code{formula} argument takes the form \code{alpha/items ~ 1 | study}, where \code{study} is a factor specifying the study factor. The left-hand side of the formula is composed of two parts, with the first variable for the Cronbach's alpha values and the second variable for the number of items.

         }

      }

   }

}
\value{
   An object of class \code{c("escalc","data.frame")}. The object is a data frame containing the following components:
   \item{yi}{observed outcomes or effect size estimates.}
   \item{vi}{corresponding (estimated) sampling variances.}

   If \code{append=TRUE} and a data frame was specified via the \code{data} argument, then \code{yi} and \code{vi} are appended to this data frame. Note that the \code{var.names} argument actually specifies the names of these two variables.

   If the data frame already contains two variables with names as specified by the \code{var.names} argument, the values for these two variables will be overwritten when \code{replace=TRUE} (which is the default). By setting \code{replace=FALSE}, only values that are \code{NA} will be replaced.

   The object is formatted and printed with the \code{\link{print.escalc}} function. The \code{\link{summary.escalc}} function can be used to obtain confidence intervals for the individual outcomes.
}
\note{
   The variable names specified under \code{var.names} should be syntactically valid variable names. If necessary, they are adjusted so that they are.

   For standard meta-analyses using the typical (wide-format) data layout (i.e., one row in the dataset per study), the default interface is typically easier to use. The advantage of the formula interface is that it can, in principle, handle more complicated data structures (e.g., studies with more than two treatment groups or more than two outcomes). Such functionality is currently not implemented, but it may be added in the future.
}
\author{
   Wolfgang Viechtbauer \email{wvb@metafor-project.org} \cr
   package homepage: \url{http://www.metafor-project.org/} \cr
   author homepage: \url{http://www.wvbauer.com/}
}
\references{
   Bagos, P. G., & Nikolopoulos, G. K. (2009). Mixed-effects Poisson regression models for meta-analysis of follow-up studies with constant or varying durations. \emph{The International Journal of Biostatistics}, \bold{5}(1), article 21.

   Becker, B. J. (1988). Synthesizing standardized mean-change measures. \emph{British Journal of Mathematical and Statistical Psychology}, \bold{41}, 257--278.

   Bonett, D. G. (2002). Sample size requirements for testing and estimating coefficient alpha. \emph{Journal of Educational and Behavioral Statistics}, \bold{27}, 335--340.

   Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. \emph{Psychological Methods}, \bold{13}, 99--109.

   Bonett, D. G. (2009). Meta-analytic interval estimation for standardized and unstandardized mean differences. \emph{Psychological Methods}, \bold{14}, 225--238.

   Bonett, D. G. (2010). Varying coefficient meta-analytic methods for alpha reliability. \emph{Psychological Methods}, \bold{15}, 368--385.

   Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), \emph{The handbook of research synthesis and meta-analysis} (2nd ed., pp. 221--235). New York: Russell Sage Foundation.

   Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. \emph{Statistics in Medicine}, \bold{19}, 3127--3131.

   Fisher, R. A. (1921). On the \dQuote{probable error} of a coefficient of correlation deduced from a small sample. \emph{Metron}, \bold{1}, 1--32.

   Fleiss, J. L., & Berlin, J. (2009). Effect sizes for dichotomous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), \emph{The handbook of research synthesis and meta-analysis} (2nd ed., pp. 237--253). New York: Russell Sage Foundation.

   Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and the square root. \emph{Annals of Mathematical Statistics}, \bold{21}, 607--611.

   Gibbons, R. D., Hedeker, D. R., & Davis, J. M. (1993). Estimation of effect size from a series of experiments involving paired comparisons. \emph{Journal of Educational Statistics}, \bold{18}, 271--279.

   Hakstian, A. R., & Whalen, T. E. (1976). A k-sample significance test for independent alpha coefficients. \emph{Psychometrika}, \bold{41}, 219--231.

   Hasselblad, V., & Hedges, L. V. (1995). Meta-analysis of screening and diagnostic tests. \emph{Psychological Bulletin}, \bold{117}(1), 167--178.

   Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. \emph{Journal of Educational Statistics}, \bold{6}, 107--128.

   Hedges, L. V. (1989). An unbiased correction for sampling error in validity generalization studies. \emph{Journal of Applied Psychology}, \bold{74}, 469--477.

   Hedges, L. V., Gurevitch, J., & Curtis, P. S. (1999). The meta-analysis of response ratios in experimental ecology. \emph{Ecology}, \bold{80}, 1150--1156.

   Higgins, J. P. T., & Green, S. (Eds.) (2008). \emph{Cochrane handbook for systematic reviews of interventions}. Chichester, England: Wiley.

   Kirk, D. B. (1973). On the numerical approximation of the bivariate normal (tetrachoric) correlation coefficient. \emph{Psychometrika}, \bold{38}, 259--268.

   Morris, S. B. (2000). Distribution of the standardized mean change effect size for meta-analysis on repeated measures. \emph{British Journal of Mathematical and Statistical Psychology}, \bold{53}, 17--29.

   Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. \emph{Psychological Methods}, \bold{7}, 105--125.

   Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. \emph{Annals of Mathematical Statistics}, \bold{29}, 201--211.

   Pearson, K. (1900). Mathematical contribution to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. \emph{Philosophical Transactions of the Royal Society of London, Series A}, \bold{195}, 1--47.

   Pearson, K. (1909). On a new method of determining correlation between a measured character A, and a character B, of which only the percentage of cases wherein B exceeds (or falls short of) a given intensity is recorded for each grade of A. \emph{Biometrika}, \bold{7}, 96--105.

   Ruecker, G., Schwarzer, G., Carpenter, J., & Olkin, I. (2009). Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. \emph{Statistics in Medicine}, \bold{28}, 721--738.

   Sanchez-Meca, J., Marin-Martinez, F., & Chacon-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. \emph{Psychological Methods}, \bold{8}, 448--467.

   Soper, H. E. (1914). On the probable error of the bi-serial expression for the correlation coefficient. \emph{Biometrika}, \bold{10}, 384--390.

   Tate, R. F. (1954). Correlation between a discrete and a continuous variable: Point-biserial correlation. \emph{Annals of Mathematical Statistics}, \bold{25}, 603--607.

   Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. \emph{Educational and Psychological Measurement}, \bold{58}, 6--20.

   Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. \emph{Journal of Statistical Software}, \bold{36}(3), 1--48. \url{http://www.jstatsoft.org/v36/i03/}.

   Yule, G. U. (1912). On the methods of measuring association between two attributes. \emph{Journal of the Royal Statistical Society}, \bold{75}, 579--652.

   Yusuf, S., Peto, R., Lewis, J., Collins, R., & Sleight, P. (1985). Beta blockade during and after myocardial infarction: An overview of the randomized trials. \emph{Progress in Cardiovascular Diseases}, \bold{27}, 335--371.
}
\seealso{
   \code{\link{print.escalc}}, \code{\link{summary.escalc}}, \code{\link{rma.uni}}, \code{\link{rma.mh}}, \code{\link{rma.peto}}, \code{\link{rma.glmm}}
}
\examples{
### load BCG vaccine data
data(dat.bcg)

### calculate log relative risks and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat

### suppose that for a particular study, yi and vi are known (i.e., have
### already been calculated) but the 2x2 table counts are not known; with
### replace=FALSE, the yi and vi values for that study are not replaced
dat[1:12,10:11] <- NA
dat[13,4:7] <- NA
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat, replace=FALSE)
dat

### using formula interface (first rearrange data into required format)
k <- length(dat.bcg$trial)
dat.fm      <- data.frame(study=factor(rep(1:k, each=4)))
dat.fm$grp  <- factor(rep(c("T","T","C","C"), k), levels=c("T","C"))
dat.fm$out  <- factor(rep(c("+","-","+","-"), k), levels=c("+","-"))
dat.fm$freq <- with(dat.bcg, c(rbind(tpos, tneg, cpos, cneg)))
dat.fm
escalc(out ~ grp | study, weights=freq, data=dat.fm, measure="RR")
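
### a brief additional sketch: recompute the log relative risks from the
### original data and use summary() to obtain confidence intervals for the
### individual outcomes (see the summary.escalc function)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
summary(dat)

### store the outcomes and sampling variances under different names via the
### var.names argument; the names "lnrr" and "lnrr.var" are arbitrary
### choices made for this illustration
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg,
              var.names=c("lnrr","lnrr.var"))
names(dat)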
}
\keyword{datagen}