#' Non Graphical Cattel's Scree Test #' #' The \code{nScree} function returns an analysis of the number of component or #' factors to retain in an exploratory principal component or factor analysis. #' The function also returns information about the number of components/factors #' to retain with the Kaiser rule and the parallel analysis. #' #' The \code{nScree} function returns an analysis of the number of #' components/factors to retain in an exploratory principal component or factor #' analysis. Different solutions are given. The classical ones are the Kaiser #' rule, the parallel analysis, and the usual scree test #' (\code{\link{plotuScree}}). Non graphical solutions to the Cattell #' subjective scree test are also proposed: an acceleration factor (\emph{af}) #' and the optimal coordinates index \emph{oc}. The acceleration factor #' indicates where the elbow of the scree plot appears. It corresponds to the #' acceleration of the curve, i.e. the second derivative. The optimal #' coordinates are the extrapolated coordinates of the previous eigenvalue that #' allow the observed eigenvalue to go beyond this extrapolation. The #' extrapolation is made by a linear regression using the last eigenvalue #' coordinates and the \eqn{k+1} eigenvalue coordinates. There are \eqn{k-2} #' regression lines like this. The Kaiser rule or a parallel analysis #' criterion (\code{\link{parallel}}) must also be simultaneously satisfied to #' retain the components/factors, whether for the acceleration factor, or for #' the optimal coordinates. #' #' If \eqn{\lambda_i} is the \eqn{i^{th}} eigenvalue, and \eqn{LS_i} is a #' location statistics like the mean or a centile (generally the followings: #' \eqn{1^{st}, \ 5^{th}, \ 95^{th}, \ or \ 99^{th}}). #' #' The Kaiser rule is computed as: \deqn{ n_{Kaiser} = \sum_{i} (\lambda_{i} #' \ge \bar{\lambda}).} Note that \eqn{\bar{\lambda}} is equal to 1 when a #' correlation matrix is used. #' #' The parallel analysis is computed as: \deqn{n_{parallel} = \sum_{i} #' (\lambda_{i} \ge LS_i).} #' #' The acceleration factor (\eqn{AF}) corresponds to a numerical solution to #' the elbow of the scree plot: \deqn{n_{AF} \equiv \ If \ \left[ (\lambda_{i} #' \ge LS_i) \ and \ max(AF_i) \right].} #' #' The optimal coordinates (\eqn{OC}) corresponds to an extrapolation of the #' preceeding eigenvalue by a regression line between the eigenvalue #' coordinates and the last eigenvalue coordinates: \deqn{n_{OC} = \sum_i #' \left[(\lambda_i \ge LS_i) \cap (\lambda_i \ge (\lambda_{i \ predicted}) #' \right].} #' #' #' @param eig depreciated parameter (use x instead): eigenvalues to analyse #' @param x numeric: a \code{vector} of eigenvalues, a \code{matrix} of #' correlations or of covariances or a \code{data.frame} of data #' @param aparallel numeric: results of a parallel analysis. Defaults #' eigenvalues fixed at \eqn{\lambda >= \bar{\lambda}} (Kaiser and related #' rule) or \eqn{\lambda >= 0} (CFA analysis) #' @param cor logical: if \code{TRUE} computes eigenvalues from a correlation #' matrix, else from a covariance matrix #' @param model character: \code{"components"} or \code{"factors"} #' @param criteria numeric: by default fixed at \eqn{\bar{\lambda}}. When the #' \eqn{\lambda}s are computed from a principal component analysis on a #' correlation matrix, it corresponds to the usual Kaiser \eqn{\lambda >= 1} #' rule. On a covariance matrix or from a factor analysis, it is simply the #' mean. To apply \eqn{\lambda >= 0}, sometimes used with factor analysis, fix #' the criteria to \eqn{0}. #' @param ... variabe: additionnal parameters to give to the \code{cor} or #' \code{cov} functions #' #' #' @return #' \item{Components }{ Data frame for the number of components/factors #' according to different rules } \item{Components$noc }{ Number of #' components/factors to retain according to optimal coordinates \emph{oc}} #' \item{Components$naf }{ Number of components/factors to retain according to #' the acceleration factor \emph{af}} \item{Components$npar.analysis }{Number #' of components/factors to retain according to parallel analysis } #' \item{Components$nkaiser }{ Number of components/factors to retain according #' to the Kaiser rule } \item{Analysis }{ Data frame of vectors linked to the #' different rules } \item{Analysis$Eigenvalues }{ Eigenvalues } #' \item{Analysis$Prop }{ Proportion of variance accounted by eigenvalues } #' \item{Analysis$Cumu }{ Cumulative proportion of variance accounted by #' eigenvalues } \item{Analysis$Par.Analysis }{ Centiles of the random #' eigenvalues generated by the parallel analysis. } \item{Analysis$Pred.eig }{ #' Predicted eigenvalues by each optimal coordinate regression line } #' \item{Analysis$OC}{ Critical optimal coordinates \emph{oc}} #' \item{Analysis$Acc.factor }{ Acceleration factor \emph{af}} #' \item{Analysis$AF}{ Critical acceleration factor \emph{af}} Otherwise, #' returns a summary of the analysis. #' #' @author Gilles Raiche \cr Centre sur les Applications des Modeles de #' Reponses aux Items (CAMRI) \cr Universite du Quebec a Montreal\cr #' \email{raiche.gilles@@uqam.ca} #' #' @seealso \code{\link{plotuScree}}, \code{\link{plotnScree}}, #' \code{\link{parallel}}, \code{\link{plotParallel}}, #' #' @references #' Cattell, R. B. (1966). The scree test for the number of factors. #' \emph{Multivariate Behavioral Research, 1}, 245-276. #' #' Dinno, A. (2009). \emph{Gently clarifying the application of Horn's parallel #' analysis to principal component analysis versus factor analysis}. Portland, #' Oregon: Portland Sate University. #' #' Guttman, L. (1954). Some necessary conditions for common factor analysis. #' \emph{Psychometrika, 19, 149-162}. #' #' Horn, J. L. (1965). A rationale for the number of factors in factor #' analysis. \emph{Psychometrika, 30}, 179-185. #' #' Kaiser, H. F. (1960). The application of electronic computer to factor #' analysis. \emph{Educational and Psychological Measurement, 20}, 141-151. #' #' Raiche, G., Walls, T. A., Magis, D., Riopel, M. and Blais, J.-G. (2013). Non-graphical solutions #' for Cattell's scree test. Methodology, 9(1), 23-29. #' # #' @family nScree #' @export #' @importFrom stats lm coef #' @keywords multivariate #' @examples #' #' ## INITIALISATION #' data(dFactors) # Load the nFactors dataset #' attach(dFactors) #' vect <- Raiche # Uses the example from Raiche #' eigenvalues <- vect$eigenvalues # Extracts the observed eigenvalues #' nsubjects <- vect$nsubjects # Extracts the number of subjects #' variables <- length(eigenvalues) # Computes the number of variables #' rep <- 100 # Number of replications for PA analysis #' cent <- 0.95 # Centile value of PA analysis #' #' ## PARALLEL ANALYSIS (qevpea for the centile criterion, mevpea for the #' ## mean criterion) #' aparallel <- parallel(var = variables, #' subject = nsubjects, #' rep = rep, #' cent = cent #' )$eigen$qevpea # The 95 centile #' #' ## NUMBER OF FACTORS RETAINED ACCORDING TO DIFFERENT RULES #' results <- nScree(x=eigenvalues, aparallel=aparallel) #' results #' summary(results) #' #' ## PLOT ACCORDING TO THE nScree CLASS #' plotnScree(results) #' "nScree" <- function(eig=NULL, x=eig, aparallel = NULL, cor=TRUE, model="components", criteria=NULL, ...) { # Initialisation eig <- eigenComputes(x, cor=cor, model=model, ...) if (is.null(aparallel)) aparallel <- rep(1,length(eig)) # default to 1 in the diagonal nk <- length(eig) k <- 1:nk proportion <- eig/sum(eig) cumulative <- proportion if (is.null(criteria)) criteria <- mean(eig) for (i in 2:nk) cumulative[i] = cumulative[i-1] + proportion[i] proportion[proportion < 0] <- 0# To constraint negative proportions to be zero cond1 <- TRUE; cond2 <- TRUE; i <- 0; pred.eig <- af <- rep(NA,nk) while ((cond1 == TRUE) && (cond2 == TRUE) && (i < nk)) { i <- i + 1 ind <- k[c(i+1,nk)] #### Optimal coordinate based on the next eigenvalue regression (scree) vp.p <- lm(eig[c(i+1,nk)] ~ ind) vp.prec <- pred.eig[i] <- sum(c(1,i)* coef(vp.p)) cond1 <- (eig[i] >= vp.prec) cond2 <- (eig[i] >= aparallel[i]) nc <- i-1 } # Second derivative at the i eigenvalue (acceleration factor, elbow) # See Yakowitz and Szidarovszky (1986, p. 84) tag <- 1 for (j in 2:(nk-1)) { if (eig[j-1] >= aparallel[j-1]) { af[j] <- (eig[j+1] -2* eig[j]) + eig[j-1] } } if (model == "components") p.vec <- which(eig >= aparallel,TRUE) else p.vec <- which((eig-aparallel)>=0 & eig >= criteria) ###if (model == "components") p.vec <- which(eig >= aparallel,TRUE) else p.vec <- which((eig-aparallel)>=0 & eig > 0) npar <- sum(p.vec == (1:length(p.vec))) nkaiser <- sum(eig >= rep(criteria,nk)) #### if (model == "components") nkaiser <- sum(eig >= rep(criteria,nk)) else nkaiser <- sum(eig >= rep(0,nk)) #### if (model == "components") nkaiser <- sum(eig >= rep(1,nk)) else nkaiser <- sum(eig >= rep(mean(eig),nk)) naf <- which(af == max(af,na.rm=TRUE),TRUE) - 1 # Assure that all the optimal coordinates will be computed for (i in (nc+1):(nk-2)) { ind <- k[c(i+1,nk)] vp.p <- lm(eig[c(i+1,nk)] ~ ind) vp.prec <- pred.eig[i] <- sum(c(1,i)* coef(vp.p)) } # Assure that all the acceleration factors will be computed for (j in 2:(nk-1)) af[j] <- (eig[j+1] - 2 * eig[j]) + eig[j-1] # Return values by the function coc <- rep("",nk); coc[nc] = "(< OC)" caf <- rep("",nk); caf[naf] = "(< AF)" result <- (list(Components = data.frame(noc = nc, naf = naf, nparallel = npar, nkaiser = nkaiser), Analysis = data.frame(Eigenvalues = eig, Prop = proportion, Cumu = cumulative, Par.Analysis = aparallel, Pred.eig = pred.eig, OC = coc, Acc.factor = af, AF = caf), Model = model)) class(result) <- 'nScree' return(result) }