Summary

EFA

#Preparation ## Loading Required Packages

library(png)
library(psych)
library(EFA.dimensions)
library(imager) #install XQuartz
## Loading required package: magrittr
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
library(corrplot)
## corrplot 0.92 loaded
library(knitr)
library(kableExtra) #choose “no” when installing
library(xtable)
## 
## Attaching package: 'xtable'
## The following objects are masked from 'package:imager':
## 
##     display, label
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tibble)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

Loading Required Files

participantResponseFiles <- list.files(path= "./03_EFA/data",pattern = "\\.csv$") #names correspond to images, one participant per row, one word per column
imageFiles <- list.files(path= "./03_EFA/images",pattern = "\\.png$")

Helper Methods

#Clean the column names
cleanColnames <- function(data){
  newNames <- gsub("^.+?\\.(.+?)\\..*$", "\\1", colnames(data))
  return(newNames)
}

Methods Used in the Analysis

Methods to Test the Appropriateness of the Data for EFA

Correlation

“A subjective method is to examine the correlation matrix. A sizable number of correlations should exceed ±.30 or EFA may be inappropriate”

correlation <- function(num,data){
  return(cor(data))
}

Bartlett’s test of sphericity

An objective test of the factorability of the correlation matrix is Bartlett’s (1954) test of sphericity, which statistically tests the hypothesis that the correlation matrix contains ones on the diagonal and zeros on the off-diagonals. Hence, that it was generated by random data. This test should produce a statistically significant chi-square value to justify the application of EFA.

If the p-value from Bartlett’s Test of Sphericity is lower than our chosen significance level (common choices are 0.10, 0.05, and 0.01), then our dataset is suitable for a data reduction technique. (https://www.statology.org/bartletts-test-of-sphericity/)

bartlettTest <- function(num,data){
  bart <- cortest.bartlett(correlation(num,data), n = nrow(data))
  if(bart[2]>0.05) cat("WARNING the p value is above 0.05") else cat("The p value is below 0.05. We are good to continue.")
  cat("\n\n")
  print(bart)
  return(bart)
}

KMO

Large sample sizes make the Bartlett test sensitive to even trivial deviations from randomness, so its results should be supplemented with a measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO; Kaiser, 1974) measure of sampling adequacy is the ratio of correlations and partial correlations that reflects the extent to which correlations are a function of the variance shared across all variables rather than the variance shared by particular pairs of variables. KMO values range from 0.00 to 1.00 and can be computed for the total correlation matrix as well as for each measured variable.

  • KMO values ≥.70 are desired
  • KMO values ≤.50 are generally considered unacceptable
KMOTest <- function(num,data){
  kmo <- KMO(data)
  cat(paste("The overall measure of sampling adequacy is: ",kmo[1]))
   cat("\n\n")
  if(kmo[1]<.7) cat("WARNING the sampling adequacy has dropped below 0.7") else cat("The sampling adequacy is above 0.7. We are generally good.")
  cat("\n\n")
  return(kmo)
}

The Number of Factors to Retain

Measurement specialists have conducted simulation studies and concluded that parallel analysis and MAP are the most accurate empirical estimates of the number of factors to retain and that scree is a useful subjective adjunct to the empirical estimates. Unfortunately, no method has been found to be correct in all situations, so it is necessary to employ multiple methods and carefully judge each plausible solution to identify the most appropriate factor solution.

parallelAnalysis <- function(num,data){
  
  cairo_pdf(paste(paste("results/ScreePlot-Image_",num,sep=""),'.pdf'), width=8, height=4)
  parallel <- fa.parallel(correlation(num,data), n.obs=nrow(data), fa="fa", n.iter=100, main="Scree plots with parallel analysis")
  nfact <- parallel$nfact
  
  dev.off() 
  cat("\n\n")
  return(nfact)
}

EFA

Model of Factor Analysis

  • two models: PCA, common factor analysis
  • When the goal of research is to identify latent constructs for theory building or to create measurement instruments in which the researcher wishes to make the case that the resulting measurement instrument reflects a meaningful underlying construct, we argue that common factor analysis (EFA) procedures are usually preferable.
  • this distinction may make little difference when there are ≥40 measured variables

Estimation Method

  • two estimation methods: ML and PA
  • Statistical simulations have found that PA outperforms ML when the relationships between measured variables and factors are relatively weak (≤.40), sample size is relatively small (≤300), multivariate normality is violated, or when the number of factors underlying the measured variables is misspecified
EFA <- function(num, factor, rotation,data){
  efa <- fa(correlation(num,data), nfactors = factor, rotate = rotation, fm = "pa")
  
  #print(xtable(unclass(efa$Structure)),type="html")
  print(efa,sort=TRUE)
  #fa.diagram(efa,cut=.4,digits=2) #I don't fint this diagram particularly useful
  return(efa)
}

Analyzing Participant Responses Per Image

analyze_image <-function(num){
  #First we plot the image that we are analyzing first

  image <- load.image(paste("03_EFA/images/",imageFiles[[num]],sep=""))
  plot(image)
  cat("\n\n")
  
  data <- read.csv(paste("03_EFA/data/",participantResponseFiles[[num]],sep=""), encoding="UTF-8")
  colnames(data) <- cleanColnames(data)

   #Then we go through the analysis steps. These are explained in detail above
   #1. Correlation
  # cat("### Correlation\n")
  # corr <- correlation(num,data)
  # pdf(paste(paste("generatedPlots-EFA/CorrelationMatrix-Image_",num,sep=""),'.pdf'), width=8, height=4)
  #   corrplot(corr, method="square",tl.col="black",title=paste("Correlation for Image ",num),number.cex = 0.5)
  # dev.off()
  # cat("\n\n")
  # 
  # cat("### Bartlett’s test of sphericity\n")
  # bartlettTest(num,data)
  # cat("\n\n")
  # 
  # cat("### KMO\n")
  # KMOTest(num,data)
  # cat("\n\n")
 
  cat("## Scree Plot and Parallel Analysis\n")
  nfact <- parallelAnalysis(num,data) #number of factors
  cat("\n\n")

  cat("## Exploratory Factor Analysis - 1 Factor - No Rotation\n")
  efa_1factor <- EFA(num, 1, "none",data)
  cat("\n\n")

  #Exploratory Analyses below here
  cat("## Exploratory Factor Analysis - 2 Factors - Varimax Rotation(Orthogonal rotation)\n")
  efa_2factors_varimax <- EFA(num, 2, "varimax",data )
  cat("\n\n")

  cat("## Exploratory Factor Analysis - 2 Factors - Promax Rotation(Pblique rotation)\n")
  efa_2factors_promax <- EFA(num, 2, "promax",data )
  cat("\n\n")
  
  efa <- list(efa_1factor, efa_2factors_varimax, efa_2factors_promax, nfact)
  
  return(efa)
}

EFA Results for all images

imageCount <- length(participantResponseFiles)
# For debugging we can set the imageCount to whatever we want
#imageCount <- 1

df <- NULL

#number of factors suggested by parallel analysis
list_nfactor <- NULL
df_nfactor <- data.frame(matrix(ncol = 15, nrow = 0))
list_nfactor_column_name <- NULL

for (i in 1:imageCount){
  list_df_nfactor_column_name <- c(list_nfactor_column_name, paste("image", i))
  
   cat(paste(paste("## Image ",i),"\n"))
   
   #efa_1factor
   efa_1factor <- analyze_image(i)[[1]] #we want to create a big table with all the factor loadings so we'll save the efa results here
   data <- NULL
   loadings <- as.data.frame(unclass(efa_1factor$loadings))
   h2 <- efa_1factor$communality
   u2 <- efa_1factor$uniquenesses
   com <- efa_1factor$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_1factor_image",i,".tsv",sep=""),row.names=FALSE,sep='\t') #create factor loading table
   
   #efa_2factors_varimax
   efa_2factors_varimax <- analyze_image(i)[[2]]
    data <- NULL
   loadings <- as.data.frame(unclass(efa_2factors_varimax$loadings))
   h2 <- efa_2factors_varimax$communality
   u2 <- efa_2factors_varimax$uniquenesses
   com <- efa_2factors_varimax$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_2factors_varimax_image",i,".tsv",sep=""),row.names=FALSE,sep='\t')
   
   #efa_2factors_promax
   efa_2factors_promax <- analyze_image(i)[[3]]
    data <- NULL
   loadings <- as.data.frame(unclass(efa_2factors_promax$loadings))
   h2 <- efa_2factors_promax$communality
   u2 <- efa_2factors_promax$uniquenesses
   com <- efa_2factors_promax$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_2factors_promax_image",i,".tsv",sep=""),row.names=FALSE,sep='\t')
   
   #number of factors
   nfact <- analyze_image(i)[[4]]
   list_nfactor <- c(list_nfactor, nfact)
   
   
   efa_1factor <- analyze_image(i)[[1]]
   if(i == 1){
      df <- as.data.frame(unclass(efa_1factor$loadings))
      colnames(df) <- c(paste("PA1 Image ",i))
      df <- tibble::rownames_to_column(df,"terms")
      
   }
   else{
      dftemp <- as.data.frame(unclass(efa_1factor$loadings))
      colnames(dftemp) <- c(paste("PA1 Image ",i))
      dftemp <- tibble::rownames_to_column(dftemp,"terms")
      df <- merge(df,dftemp,by="terms")
   }
}
## ## Image  1
## 
## 
## ## Scree Plot and Parallel Analysis
## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA 
## 
## 
## 
## 
## ## Exploratory Factor Analysis - 1 Factor - No Rotation
## Factor Analysis using method =  pa
## Call: fa(r = correlation(num, data), nfactors = factor, rotate = rotation, 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                  V  PA1    h2   u2 com
## likable         19 0.91 0.820 0.18   1
## nice            22 0.90 0.818 0.18   1
## enjoyable       13 0.87 0.764 0.24   1
## delightful      10 0.86 0.731 0.27   1
## pleasing        24 0.85 0.721 0.28   1
## appealing        1 0.85 0.719 0.28   1
## pretty          25 0.85 0.716 0.28   1
## lovely          20 0.85 0.716 0.28   1
## beautiful        5 0.84 0.707 0.29   1
## attractive       3 0.84 0.707 0.29   1
## elegant         11 0.83 0.696 0.30   1
## inviting        18 0.83 0.694 0.31   1
## exciting        14 0.79 0.625 0.38   1
## engaging        12 0.79 0.624 0.38   1
## harmonious      16 0.79 0.621 0.38   1
## tasteful        30 0.78 0.615 0.38   1
## satisfying      28 0.77 0.597 0.40   1
## wellDesigned    31 0.76 0.578 0.42   1
## motivating      21 0.74 0.549 0.45   1
## clean            6 0.73 0.527 0.47   1
## interesting     17 0.70 0.495 0.51   1
## balanced         4 0.69 0.480 0.52   1
## sophisticated   29 0.68 0.467 0.53   1
## fascinating     15 0.68 0.458 0.54   1
## colorHarmonious  8 0.65 0.427 0.57   1
## professional    26 0.63 0.400 0.60   1
## organized       23 0.59 0.348 0.65   1
## creative         9 0.53 0.284 0.72   1
## artistic         2 0.52 0.268 0.73   1
## cluttered        7 0.30 0.093 0.91   1
## provoking       27 0.17 0.029 0.97   1
## 
##                  PA1
## SS loadings    17.29
## Proportion Var  0.56
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  465  and the objective function was  29.91
## The degrees of freedom for the model are 434  and the objective function was  5.64 
## 
## The root mean square of the residuals (RMSR) is  0.06 
## The df corrected root mean square of the residuals is  0.06 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    PA1
## Correlation of (regression) scores with factors   0.99
## Multiple R square of scores with factors          0.98
## Minimum correlation of possible factor scores     0.96
## 
## 
## ## Exploratory Factor Analysis - 2 Factors - Varimax Rotation(Orthogonal rotation)
## Factor Analysis using method =  pa
## Call: fa(r = correlation(num, data), nfactors = factor, rotate = rotation, 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 item   PA1  PA2    h2   u2 com
## clean              6  0.79 0.20 0.671 0.33 1.1
## balanced           4  0.78 0.17 0.630 0.37 1.1
## organized         23  0.72 0.07 0.530 0.47 1.0
## wellDesigned      31  0.69 0.37 0.609 0.39 1.5
## harmonious        16  0.69 0.41 0.640 0.36 1.6
## nice              22  0.68 0.59 0.817 0.18 2.0
## elegant           11  0.68 0.48 0.702 0.30 1.8
## inviting          18  0.68 0.48 0.700 0.30 1.8
## professional      26  0.67 0.20 0.485 0.52 1.2
## likable           19  0.66 0.62 0.820 0.18 2.0
## delightful        10  0.64 0.56 0.730 0.27 2.0
## lovely            20  0.63 0.57 0.714 0.29 2.0
## tasteful          30  0.61 0.49 0.615 0.39 1.9
## appealing          1  0.61 0.59 0.720 0.28 2.0
## pleasing          24  0.61 0.60 0.723 0.28 2.0
## motivating        21  0.60 0.43 0.553 0.45 1.8
## beautiful          5  0.60 0.59 0.710 0.29 2.0
## attractive         3  0.60 0.59 0.710 0.29 2.0
## engaging          12  0.58 0.54 0.624 0.38 2.0
## sophisticated     29  0.55 0.41 0.468 0.53 1.9
## colorHarmonious    8  0.52 0.40 0.428 0.57 1.9
## cluttered          7  0.35 0.05 0.129 0.87 1.0
## interesting       17  0.28 0.76 0.655 0.34 1.3
## fascinating       15  0.30 0.69 0.568 0.43 1.4
## exciting          14  0.47 0.67 0.669 0.33 1.8
## enjoyable         13  0.61 0.63 0.769 0.23 2.0
## creative           9  0.17 0.62 0.416 0.58 1.2
## artistic           2  0.16 0.61 0.399 0.60 1.1
## pretty            25  0.60 0.60 0.720 0.28 2.0
## satisfying        28  0.54 0.55 0.600 0.40 2.0
## provoking         27 -0.01 0.27 0.073 0.93 1.0
## 
##                         PA1  PA2
## SS loadings           10.49 8.10
## Proportion Var         0.34 0.26
## Cumulative Var         0.34 0.60
## Proportion Explained   0.56 0.44
## Cumulative Proportion  0.56 1.00
## 
## Mean item complexity =  1.7
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  465  and the objective function was  29.91
## The degrees of freedom for the model are 404  and the objective function was  4.46 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.04 
## 
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2
## Correlation of (regression) scores with factors   0.94 0.92
## Multiple R square of scores with factors          0.88 0.85
## Minimum correlation of possible factor scores     0.77 0.70
## 
## 
## ## Exploratory Factor Analysis - 2 Factors - Promax Rotation(Pblique rotation)
## Loading required namespace: GPArotation
## Factor Analysis using method =  pa
## Call: fa(r = correlation(num, data), nfactors = factor, rotate = rotation, 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 item   PA1   PA2    h2   u2 com
## clean              6  0.95 -0.19 0.671 0.33 1.1
## balanced           4  0.94 -0.22 0.630 0.37 1.1
## organized         23  0.93 -0.32 0.530 0.47 1.2
## professional      26  0.78 -0.12 0.485 0.52 1.0
## wellDesigned      31  0.71  0.09 0.609 0.39 1.0
## harmonious        16  0.68  0.15 0.640 0.36 1.1
## elegant           11  0.64  0.25 0.702 0.30 1.3
## inviting          18  0.63  0.25 0.700 0.30 1.3
## nice              22  0.58  0.39 0.817 0.18 1.8
## motivating        21  0.56  0.23 0.553 0.45 1.3
## delightful        10  0.53  0.38 0.730 0.27 1.8
## tasteful          30  0.53  0.31 0.615 0.39 1.6
## likable           19  0.53  0.44 0.820 0.18 1.9
## lovely            20  0.51  0.39 0.714 0.29 1.9
## sophisticated     29  0.49  0.24 0.468 0.53 1.4
## appealing          1  0.47  0.44 0.720 0.28 2.0
## pleasing          24  0.47  0.45 0.723 0.28 2.0
## colorHarmonious    8  0.47  0.23 0.428 0.57 1.5
## engaging          12  0.46  0.39 0.624 0.38 1.9
## beautiful          5  0.46  0.45 0.710 0.29 2.0
## attractive         3  0.46  0.45 0.710 0.29 2.0
## cluttered          7  0.44 -0.13 0.129 0.87 1.2
## interesting       17 -0.06  0.85 0.655 0.34 1.0
## fascinating       15  0.01  0.74 0.568 0.43 1.0
## creative           9 -0.12  0.73 0.416 0.58 1.1
## artistic           2 -0.13  0.72 0.399 0.60 1.1
## exciting          14  0.24  0.63 0.669 0.33 1.3
## enjoyable         13  0.46  0.49 0.769 0.23 2.0
## pretty            25  0.45  0.46 0.720 0.28 2.0
## satisfying        28  0.41  0.42 0.600 0.40 2.0
## provoking         27 -0.17  0.37 0.073 0.93 1.4
## 
##                         PA1  PA2
## SS loadings           11.05 7.54
## Proportion Var         0.36 0.24
## Cumulative Var         0.36 0.60
## Proportion Explained   0.59 0.41
## Cumulative Proportion  0.59 1.00
## 
##  With factor correlations of 
##      PA1  PA2
## PA1 1.00 0.73
## PA2 0.73 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  465  and the objective function was  29.91
## The degrees of freedom for the model are 404  and the objective function was  4.46 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.04 
## 
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2
## Correlation of (regression) scores with factors   0.98 0.97
## Multiple R square of scores with factors          0.96 0.94
## Minimum correlation of possible factor scores     0.92 0.87
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.