https://github.com/cran/sn
Tip revision: bc33612e6cc33fcf28f50655cab5f1931985ccde authored by Adelchi Azzalini on 04 April 2023, 17:10:02 UTC
version 2.1.1
% file sn/man/fitdistr.grouped.Rd
% This file is a component of the package 'sn' for R
% copyright (C) 2022 Adelchi Azzalini
%
\name{fitdistr.grouped}
\alias{fitdistr.grouped}
\title{
Maximum-likelihood fitting of a univariate distribution from grouped data
}
\description{
Maximum-likelihood fitting of a univariate distribution when the data are
represented by a set of frequencies pertaining to a given set of contiguous
intervals.
}
\usage{
fitdistr.grouped(breaks, counts, family, weights, trace = FALSE, wpar = NULL)
}
\arguments{
\item{breaks}{A numeric vector of strictly increasing values which identify
a set of contiguous intervals on the real line.
See \sQuote{Details} for additional information.}
\item{counts}{A vector of non-negative integers representing the number
of observations falling in the intervals specified by \code{breaks};
it is therefore required that \code{length(counts)+1 == length(breaks)}. }
\item{family}{A character string specifying the parametric family of
distributions to be used for fitting.
Admissible names are: \code{"normal"}, \code{"logistic"}, \code{"t"},
\code{"Cauchy"}, \code{"SN"}, \code{"ST"}, \code{"SC"}, \code{"gamma"},
\code{"Weibull"};
the names \code{"gaussian"} and \code{"Gaussian"} are also allowed,
and are converted to \code{"normal"}.}
\item{weights}{An alias for \code{counts}, allowed for analogy with the
\code{selm} function.}
\item{trace}{A logical value which indicates whether intermediate evaluations
of the optimization process are printed (default: \code{FALSE}).}
\item{wpar}{
An optional vector with initial values of the \sQuote{working parameters}
for starting the maximization of the log-likelihood function;
see \sQuote{Details} for their description. }
}
\details{
The original motivation of this function was fitting a univariate \acronym{SN},
\acronym{ST} or \acronym{SC} distribution from grouped data;
its scope was later extended to include some other continuous distributions.
The adopted name of the function reflects the broad similarity of its purpose
with that of \code{\link[MASS]{fitdistr}}, but there are substantial
differences in the actual working of the two functions.
The parameter set of a given \code{family} is the same as that appearing in the
corresponding \code{d<basename>} function, with the exception of the \code{"t"}
distribution, for which a location and a scale parameter are included,
besides \code{df}.
The range of \code{breaks} does not need to span the whole support of the
chosen \code{family} of distributions, that is,
\code{(0, Inf)} for \code{"Weibull"} and \code{"gamma"} families,
\code{(-Inf, Inf)} for the other families.
In fact, for the purpose of post-fitting plotting, an infinite
\code{range(breaks)} represents a complication, requiring an \emph{ad hoc}
handling; so it is sensible to avoid it.
However, at the maximum-likelihood fitting stage, the full support of
the probability distribution is considered, with the following implications.
If \code{max(breaks)=xR}, say, and \code{xR<Inf}, then an additional
interval \code{(xR, Inf)} is introduced, with value \code{counts=0} assigned.
A similar action is taken at the lower end: if \code{min(breaks)=xL} is
larger than the infimum of the support of the distribution,
an extra 0-\code{counts} interval is introduced as \code{(0, xL)}
or \code{(-Inf, xL)}, depending on the support of the \code{family}.
Maximum likelihood fitting is obtained by maximizing the pertaining
multinomial log-likelihood using the \code{\link[stats]{optim}} function
with \code{method="Nelder-Mead"}. For numerical convenience, the
search is performed using \sQuote{working parameters} in place of the original
ones, with reverse conversion at the end. The working parameters coincide
with the original distribution parameters when they have unbounded range,
while they are log-transformed in case of intrinsically positive parameters.
This transformation applies to the parameters of the positive-valued
distributions (\code{"gamma"} and \code{"Weibull"}), to all scale parameters
and to \code{df} of the \code{"t"} distribution.
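
The padding of \code{breaks} and the use of working parameters described
above can be illustrated, for the \code{"normal"} family only, by the
following sketch in base \R. This is not the code used inside the function:
the objects \code{full.breaks}, \code{full.counts} and \code{negloglik} are
hypothetical names introduced here, the data are made up for illustration,
and the guard against zero probabilities is an assumption made for
numerical safety.

\preformatted{
## illustrative grouped data: finite breaks inside the support (-Inf, Inf)
breaks <- c(-1, 0, 1, 2)
counts <- c(12, 30, 8)

## pad with zero-count intervals so that the breaks span the whole support
full.breaks <- c(-Inf, breaks, Inf)
full.counts <- c(0, counts, 0)

## negative multinomial log-likelihood, up to an additive constant;
## the working parameters are (mean, log(sd))
negloglik <- function(wpar) {
  p <- diff(pnorm(full.breaks, mean = wpar[1], sd = exp(wpar[2])))
  -sum(full.counts * log(pmax(p, 1e-300)))
}

opt <- optim(c(0, 0), negloglik, method = "Nelder-Mead")

## reverse conversion from working to original parameters
c(mean = opt$par[1], sd = exp(opt$par[2]))
}

Working on \code{log(sd)} lets the search move over the whole real line
without ever proposing a non-positive scale value.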
}
\value{
An object of class \code{fitdistr.grouped}, whose components are described in
\code{\link{fitdistr.grouped-class}}.
}
%% \references{multinomial likelihood?}
\author{Adelchi Azzalini }
%%\note{%% ~~further notes~~}
%% ~Make other sections like Warning with \section{Warning }{....} ~
\seealso{For methods pertaining to this class of objects, see
\code{\link{fitdistr.grouped-class}} and
\code{\link{plot.fitdistr.grouped}}; see also
\code{\link{dsn}}, \code{\link{dst}}, \code{\link{dsc}},
\code{\link[stats]{Distributions}}, \code{\link[stats]{dmultinom}};
see also \code{\link{selm}} for fitting ungrouped data; the example below
uses the same data as an example in its documentation.
}
\examples{
data(barolo)
attach(barolo)
A75 <- (reseller=="A" & volume==75)
logPrice <- log(price[A75], 10) # used in documentation of 'selm'; see its fitting
detach(barolo)
breaks <- seq(1, 3, by=0.25)
f <- cut(logPrice, breaks = breaks)
counts <- tabulate(f, length(levels(f)))
logPrice.grouped <- fitdistr.grouped(breaks, counts, family='ST')
summary(logPrice.grouped) # compare this fit with the ungrouped data fitting
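# A possible way to carry out the comparison mentioned above: fit the same
# family to the ungrouped data with 'selm' and inspect both fits.
# This is only a sketch; the object name 'logPrice.selm' is introduced here,
# and it is assumed that the direct parameters ("DP") are the ones to compare.
logPrice.selm <- selm(logPrice ~ 1, family="ST")
summary(logPrice.selm, "DP")
plot(logPrice.grouped)  # plot method for 'fitdistr.grouped' objects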
}
\keyword{distribution}
\keyword{univar}
\concept{grouped data}
\concept{multinomial distribution}