Content - 63a4ecd8cfff3c27d6d4575481a013128b2501a6 - 81f2681/oneclick.Rmd

visit type:
Tip revision: 1765709ea126a8b54d672fc10ddad29b3cbce06c authored by Tobias Isenberg on 12 July 2024, 08:03:01 UTC
Update Readme.md
Tip revision: 1765709
oneclick.Rmd
---
title: "oneclick"
author: "Tingying He, Petra Isenberg"
output: html_document
date: '2022-07-07'
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Summary


## EFA

#Preparation
## Loading Required Packages
```{r}
library(png)
library(psych)
library(EFA.dimensions)
library(imager) #install XQuartz
library(corrplot)
library(knitr)
library(kableExtra) #choose “no” when installing
library(xtable)
library(dplyr)
library(tibble)
library(ggplot2)
```

## Loading Required Files
```{r}
participantResponseFiles <- list.files(path= "./03_EFA/data",pattern = "\\.csv$") #names correspond to images, one participant per row, one word per column
imageFiles <- list.files(path= "./03_EFA/images",pattern = "\\.png$")
```

# Helper Methods
```{r}
#Clean the column names
cleanColnames <- function(data){
  newNames <- gsub("^.+?\\.(.+?)\\..*$", "\\1", colnames(data))
  return(newNames)
}
```

# Methods Used in the Analysis
## Methods to Test the Appropriateness of the Data for EFA

### Correlation
"A subjective method is to examine the correlation matrix. A sizable number of correlations should exceed ±.30 or EFA may be inappropriate"

```{r}
correlation <- function(num,data){
  return(cor(data))
}
```

### Bartlett’s test of sphericity
An objective test of the factorability of the correlation matrix is Bartlett’s (1954) test of sphericity, which statistically tests the hypothesis that the correlation matrix contains ones on the diagonal and zeros on the off-diagonals. Hence, that it was generated by random data. This test should produce a statistically significant chi-square value to justify the application of EFA.

If the p-value from Bartlett’s Test of Sphericity is lower than our chosen significance level (common choices are 0.10, 0.05, and 0.01), then our dataset is suitable for a data reduction technique. (https://www.statology.org/bartletts-test-of-sphericity/)


```{r}
bartlettTest <- function(num,data){
  bart <- cortest.bartlett(correlation(num,data), n = nrow(data))
  if(bart[2]>0.05) cat("WARNING the p value is above 0.05") else cat("The p value is below 0.05. We are good to continue.")
  cat("\n\n")
  print(bart)
  return(bart)
}
```

### KMO
Large sample sizes make the Bartlett test sensitive to even trivial deviations from randomness, so its results should be supplemented with a measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO; Kaiser, 1974) measure of sampling adequacy is the ratio of correlations and partial correlations that reflects the extent to which correlations are a function of the variance shared across all variables rather than the variance shared by particular pairs of variables. KMO values range from 0.00 to 1.00 and can be computed for the total correlation matrix as well as for each measured variable. 

* KMO values ≥.70 are desired
* KMO values ≤.50 are generally considered unacceptable
```{r}
KMOTest <- function(num,data){
  kmo <- KMO(data)
  cat(paste("The overall measure of sampling adequacy is: ",kmo[1]))
   cat("\n\n")
  if(kmo[1]<.7) cat("WARNING the sampling adequacy has dropped below 0.7") else cat("The sampling adequacy is above 0.7. We are generally good.")
  cat("\n\n")
  return(kmo)
}
```

### The Number of Factors to Retain
Measurement specialists have conducted simulation studies and concluded that parallel analysis and MAP are the most accurate empirical estimates of the number of factors to retain and that scree is a useful subjective adjunct to the empirical estimates. Unfortunately, no method has been found to be correct in all situations, so it is necessary to employ multiple methods and carefully judge each plausible solution to identify the most appropriate factor solution. 

```{r}
parallelAnalysis <- function(num,data){
  
  cairo_pdf(paste(paste("results/ScreePlot-Image_",num,sep=""),'.pdf'), width=8, height=4)
  parallel <- fa.parallel(correlation(num,data), n.obs=nrow(data), fa="fa", n.iter=100, main="Scree plots with parallel analysis")
  nfact <- parallel$nfact
  
  dev.off() 
  cat("\n\n")
  return(nfact)
}
```

### EFA

#### Model of Factor Analysis
* two models: PCA, common factor analysis
* When the goal of research is to identify latent constructs for theory building or to create measurement instruments in which the researcher wishes to make the case that the resulting measurement instrument reflects a meaningful underlying construct, we argue that common factor analysis (EFA) procedures are usually preferable.
* this distinction may make little difference when there are ≥40 measured variables

#### Estimation Method
* two estimation methods: ML and PA
* Statistical simulations have found that PA outperforms ML when the relationships between measured variables and factors are relatively weak (≤.40), sample size is relatively small (≤300), multivariate normality is violated, or when the number of factors underlying the measured variables is misspecified

```{r}
EFA <- function(num, factor, rotation,data){
  efa <- fa(correlation(num,data), nfactors = factor, rotate = rotation, fm = "pa")
  
  #print(xtable(unclass(efa$Structure)),type="html")
  print(efa,sort=TRUE)
  #fa.diagram(efa,cut=.4,digits=2) #I don't fint this diagram particularly useful
  return(efa)
}
```

# Analyzing Participant Responses Per Image


```{r,results='asis'}
analyze_image <-function(num){
  #First we plot the image that we are analyzing first

  image <- load.image(paste("03_EFA/images/",imageFiles[[num]],sep=""))
  plot(image)
  cat("\n\n")
  
  data <- read.csv(paste("03_EFA/data/",participantResponseFiles[[num]],sep=""), encoding="UTF-8")
  colnames(data) <- cleanColnames(data)

   #Then we go through the analysis steps. These are explained in detail above
   #1. Correlation
  # cat("### Correlation\n")
  # corr <- correlation(num,data)
  # pdf(paste(paste("generatedPlots-EFA/CorrelationMatrix-Image_",num,sep=""),'.pdf'), width=8, height=4)
  #   corrplot(corr, method="square",tl.col="black",title=paste("Correlation for Image ",num),number.cex = 0.5)
  # dev.off()
  # cat("\n\n")
  # 
  # cat("### Bartlett’s test of sphericity\n")
  # bartlettTest(num,data)
  # cat("\n\n")
  # 
  # cat("### KMO\n")
  # KMOTest(num,data)
  # cat("\n\n")
 
  cat("## Scree Plot and Parallel Analysis\n")
  nfact <- parallelAnalysis(num,data) #number of factors
  cat("\n\n")

  cat("## Exploratory Factor Analysis - 1 Factor - No Rotation\n")
  efa_1factor <- EFA(num, 1, "none",data)
  cat("\n\n")

  #Exploratory Analyses below here
  cat("## Exploratory Factor Analysis - 2 Factors - Varimax Rotation(Orthogonal rotation)\n")
  efa_2factors_varimax <- EFA(num, 2, "varimax",data )
  cat("\n\n")

  cat("## Exploratory Factor Analysis - 2 Factors - Promax Rotation(Pblique rotation)\n")
  efa_2factors_promax <- EFA(num, 2, "promax",data )
  cat("\n\n")
  
  efa <- list(efa_1factor, efa_2factors_varimax, efa_2factors_promax, nfact)
  
  return(efa)
}

```

### EFA Results for all images

```{r,results='markup',fig.keep='all'}
imageCount <- length(participantResponseFiles)
# For debugging we can set the imageCount to whatever we want
#imageCount <- 1

df <- NULL

#number of factors suggested by parallel analysis
list_nfactor <- NULL
df_nfactor <- data.frame(matrix(ncol = 15, nrow = 0))
list_nfactor_column_name <- NULL

for (i in 1:imageCount){
  list_df_nfactor_column_name <- c(list_nfactor_column_name, paste("image", i))
  
   cat(paste(paste("## Image ",i),"\n"))
   
   #efa_1factor
   efa_1factor <- analyze_image(i)[[1]] #we want to create a big table with all the factor loadings so we'll save the efa results here
   data <- NULL
   loadings <- as.data.frame(unclass(efa_1factor$loadings))
   h2 <- efa_1factor$communality
   u2 <- efa_1factor$uniquenesses
   com <- efa_1factor$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_1factor_image",i,".tsv",sep=""),row.names=FALSE,sep='\t') #create factor loading table
   
   #efa_2factors_varimax
   efa_2factors_varimax <- analyze_image(i)[[2]]
    data <- NULL
   loadings <- as.data.frame(unclass(efa_2factors_varimax$loadings))
   h2 <- efa_2factors_varimax$communality
   u2 <- efa_2factors_varimax$uniquenesses
   com <- efa_2factors_varimax$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_2factors_varimax_image",i,".tsv",sep=""),row.names=FALSE,sep='\t')
   
   #efa_2factors_promax
   efa_2factors_promax <- analyze_image(i)[[3]]
    data <- NULL
   loadings <- as.data.frame(unclass(efa_2factors_promax$loadings))
   h2 <- efa_2factors_promax$communality
   u2 <- efa_2factors_promax$uniquenesses
   com <- efa_2factors_promax$complexity
   data <- cbind(loadings, h2, u2, com)
   data <- tibble::rownames_to_column(data,"terms")
   data <- data %>% 
 mutate_if(is.numeric, round, digits=2)
   data <- data %>% mutate_at(vars(com), funs(round(., 1)))

   write.table(data, paste("results/efa_2factors_promax_image",i,".tsv",sep=""),row.names=FALSE,sep='\t')
   
   #number of factors
   nfact <- analyze_image(i)[[4]]
   list_nfactor <- c(list_nfactor, nfact)
   
   
   efa_1factor <- analyze_image(i)[[1]]
   if(i == 1){
      df <- as.data.frame(unclass(efa_1factor$loadings))
      colnames(df) <- c(paste("PA1 Image ",i))
      df <- tibble::rownames_to_column(df,"terms")
      
   }
   else{
      dftemp <- as.data.frame(unclass(efa_1factor$loadings))
      colnames(dftemp) <- c(paste("PA1 Image ",i))
      dftemp <- tibble::rownames_to_column(dftemp,"terms")
      df <- merge(df,dftemp,by="terms")
   }
}
```


### Table for number of factors for all images
```{r}
colnames(df_nfactor) <- list_nfactor_column_name
df_nfactor[nrow(df_nfactor) + 1,] <- list_nfactor
row.names(df_nfactor) <- "Factors"
write.table(df_nfactor, paste("results/factor_numbers.tsv",sep=""),row.names=FALSE,sep='\t')
```

### Factor Loadings for all images from the chosen FA method
```{r,results='asis'}
  write.table(df,"results/factorLoadings_all_images.tsv",row.names=FALSE,sep='\t')
  print(xtable(df),type="html")
```

### Terms for which all factor loadings are >.7
```{r,results='asis'}
  remainingTerms <- df %>% filter_all(all_vars(. > 0.7))
  write.table(remainingTerms,"results/factorLoadingsAbove_.7_all_images.tsv",row.names=FALSE,sep='\t')
  print(xtable(remainingTerms),type="html")
```




## Reduce Items - Calculate Alpha

This code calculates the average ratings each image received from participants and saves them as plots with CIs. 
This analysis is partly exploratory as we check what would have happened if participants had used some of our reduced scale.

# Setup

```{r}
source("03_EFA/CI-Functions.R")
participantResponseFiles <- list.files(path= "03_EFA/data",pattern = "\\.csv$") #names correspond to images, one participant per row, one word per
```

```{r}
print(participantResponseFiles)
```

# Functions
This one cleans up the column names for each image's responses
```{r}
cleanColnames <- function(data){
  newNames <- gsub("^.+?\\.(.+?)\\..*$", "\\1", colnames(data))
  return(newNames)
}
```

This functions draws a bar chart with confidence intervals
```{r}
barChart <- function(resultTable, techniques, nbTechs = -1, ymin, ymax, xAxisLabel = "I am the X axis", yAxisLabel = "I am the Y Label",plotTitle){
  #tr <- t(resultTable)
  if(nbTechs <= 0){
    stop('Please give a positive number of Techniques, nbTechs');
  }
  
  tr <- as.data.frame(resultTable)
  nbTechs <- nbTechs - 1 ; # seq will generate nb+1
  
  #now need to calculate one number for the width of the interval
  tr$CI2 <- tr$upperBound_CI - tr$mean
  tr$CI1 <- tr$mean - tr$lowerBound_CI
  
  #add a technique column
  tr$technique <- factor(seq.int(0, nbTechs, 1));
  
  breaks <- c(as.character(tr$technique));
  print(tr)
  g <- ggplot(tr, aes(x=technique, y=mean)) + 
    #   geom_bar(stat="identity",fill = I("#CCCCCC")) +
    geom_errorbar(aes(ymin=mean-CI1, ymax=mean+CI2),
                  width=0,                    # Width of the error bars
                  size = 1.1
    ) +
    #labs(title="Overall time per technique") +
    labs(x = xAxisLabel, y = yAxisLabel) + 
    scale_y_continuous(limits = c(ymin,ymax),breaks=1:7) +
    scale_x_discrete(name="",breaks,techniques)+
    coord_flip() +
    ggtitle(plotTitle) +
    theme(panel.background = element_rect(fill = 'white', colour = 'white'),axis.title=element_text(size = rel(1.2), colour = "black"),axis.text=element_text(size = rel(1.2), colour = "black"),panel.grid.major = element_line(colour = "#DDDDDD"),panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())+
    geom_point(size=2, colour="black")         # dots
  
  print(g)
}
```

This next function calculates the CIs of each image's responses depending on the scale items (terms) given. 
```{r}
calculateDrawResponseCIs <- function(scaleItems){


  imageCount <- length(participantResponseFiles)
  pointEstimateVector = c()
  lowerBoundVector = c()
  upperBoundVector = c()
  imageVector = c()
  
  for (image in 1:imageCount){
  
    data <- read.csv(paste("03_EFA/data/",participantResponseFiles[[image]],sep=""), encoding="UTF-8")
    terms <- cleanColnames(data) 
    colnames(data) <- terms
    
    #exploratory trying to see what would happen if we had had a lot fewer participants
    #data <- data[sample(nrow(data), 24), ]
    
    data <- data[scaleItems]
  
    means <- rowMeans(data)
    
    imageVector <- append(imageVector,image)
    ci <- bootstrapMeanCI(means)
    pointEstimateVector <- append(pointEstimateVector,ci[1])
    upperBoundVector <- append(upperBoundVector,ci[3])
    lowerBoundVector <- append(lowerBoundVector,ci[2])
  }
  
  df <- data.frame(image=imageVector,mean=pointEstimateVector,lowerBound_CI=lowerBoundVector,upperBound_CI=upperBoundVector)
  plotTitle <- paste(paste("Average Rating for the",length(scaleItems)),"item scale")
  barChart(df,df$image ,nbTechs = 15, ymin = 1, ymax = 7, "Image", "Average Ratings",plotTitle)
  ggsave(paste("./results/",paste(plotTitle,".pdf",sep=""),sep=""), width=8, height=4, device=cairo_pdf)
  
  print(df)
  
  return(df)
}
```

As the ratings per image were done by different participant pools we don't actually want to compare the ratings of each image.
```{r}
generatePerImageTables <-function(dfs,titles){
  imageCounts <- length(participantResponseFiles)

  for(i in 1:imageCounts){
    pointEstimateVector = c()
    lowerBoundVector = c()
    upperBoundVector = c()
    scaleVector = c()
    
    dflength <- length(dfs)
    for(d in 1:dflength){
      scaleVector <- append(scaleVector,titles[d])
      df <- dfs[[d]]
      
      pointEstimateVector <- append(pointEstimateVector,df$mean[df$image==i])
      upperBoundVector <- append(upperBoundVector,df$upperBound_CI[df$image==i])
      lowerBoundVector <- append(lowerBoundVector,df$lowerBound_CI[df$image==i])
    }
    
    df <- data.frame(scale=scaleVector,mean=pointEstimateVector,lowerBound_CI=lowerBoundVector,upperBound_CI=upperBoundVector)
    plotTitle <- paste(paste("Image",i)," Ratings Per Scale")
    barChart(df,df$scale ,nbTechs = dflength, ymin = 1, ymax = 7, "Image", "Average Ratings",plotTitle)
    path<-paste("./results/",paste(plotTitle,".pdf",sep=""),sep="")
    print(path)
    ggsave(path, width=8, height=4,device=cairo_pdf)
  }
}
```

And now we run our tests

## Average Ratings for the Test Scale

```{r}
data <- read.csv(paste("03_EFA/data/",participantResponseFiles[[1]],sep=""), encoding="UTF-8")
scaleItems <- cleanColnames(data)
df31 <- calculateDrawResponseCIs(scaleItems)
```

## Average Ratings for the 3-Item Scale
```{r}
scaleItems = c("enjoyable","likable","pleasing")
df3 <- calculateDrawResponseCIs(scaleItems)
```

## Average Ratings for the 4-Item Scale
```{r}
scaleItems = c("enjoyable","likable","pleasing","nice")
df4 <- calculateDrawResponseCIs(scaleItems)
```

## Average Ratings for the 5-Item Scale
```{r}
scaleItems = c("enjoyable","likable","pleasing","nice","appealing")
df5 <- calculateDrawResponseCIs(scaleItems)
```

```{r}
dfs <- list(df3,df4,df5,df31)
generatePerImageTables(list(df3,df4,df5,df31),c("3-Item","4-Item","5-Item","31-Item"))
```

## CFA

```{r}
# import library
library(lavaan) #CFA
library(ltm) # Cronbach's alpha
```

```{r}
# import data
data <- read.csv("04_CFA/validation results + demographics - valid - PID removed.csv" , encoding="UTF-8")
data
```

```{r}
# split data
# results for Sunburst - our scale
data_sun <- data[, c('sunburst.enjoyable.', 'sunburst.likable.', 'sunburst.pleasing.', 'sunburst.nice.', 'sunburst.appealing.')]
# results for Sunburst - classic aesthetic for website scale
data_sun_ca <- data[, c('sunburst.aesthetic.', 'sunburst.pleasant.', 'sunburst.clear.', 'sunburst.clean.', 'sunburst.symmetric.')]

# results for Beamtree - our scale
data_beam <- data[, c('beamtree.enjoyable.', 'beamtree.likable.', 'beamtree.pleasing.', 'beamtree.nice.', 'beamtree.appealing.')]
# results for Beamtree - classic aesthetic for website scale
data_beam_ca <- data[, c('beamtree.aesthetic.', 'beamtree.pleasant.', 'beamtree.clear.', 'beamtree.clean.', 'beamtree.symmetric.')]

# results for Startree - our scale
data_star <- data[, c('startree.enjoyable.', 'startree.likable.', 'startree.pleasing.', 'startree.nice.', 'startree.appealing.')]
# results for Startree - classic aesthetic for website scale
data_star_ca <- data[, c('startree.aesthetic.', 'startree.pleasant.', 'startree.clear.', 'startree.clean.', 'startree.symmetric.')]
```


```{r}
#Clean the column name function
cleanColnames <- function(data, visName){
  # delete visName and . from column name of dataframe, only keep the terms as column name
  names(data) <- sub(paste(visName, ".", sep =""), "", names(data))
  names(data) <- sub("\\.", "", names(data))
  return(data)
}
```

```{r}
# Clean column names for all data
data_sun <- cleanColnames(data_sun, "sunburst")
data_sun_ca <- cleanColnames(data_sun_ca, "sunburst")
data_beam <- cleanColnames(data_beam, "beamtree")
data_beam_ca <- cleanColnames(data_beam_ca, "beamtree")
data_star <- cleanColnames(data_star, "startree")
data_star_ca <- cleanColnames(data_star_ca, "startree")
```

## Confirmatory Factor Analysis

```{r}
CFA <- function(data){
  APV.model ='aesthetic_pleasure =~ enjoyable + likable + pleasing + nice + appealing'
  fit <- cfa(APV.model,data = data,std.lv = TRUE)
  
  pvalue <- fitMeasures(fit, "pvalue")
  tli <- fitMeasures(fit, "tli")
  cfi <- fitMeasures(fit, "cfi")
  srmr <- fitMeasures(fit, "srmr")
  rmsea <- fitMeasures(fit, "rmsea")
  
  good_fit_list <- NULL
  good_fit_list <- c(pvalue, tli, cfi, srmr, rmsea)
  
  
  summary(fit, fit.measures=T, standardized=TRUE)
  
  return(good_fit_list)
}
```

```{r}
#CFA factor numbers
APV.model ='aesthetic_pleasure =~ enjoyable + likable + pleasing + nice + appealing'
fit1 <- cfa(APV.model,data = data_sun,std.lv = TRUE)


l1 <- lavInspect(fit1, what = "std.all", add.labels = FALSE, add.class = TRUE,
           list.by.group = TRUE,
           drop.list.single.group = TRUE)[[1]]

fit2 <- cfa(APV.model,data = data_star,std.lv = TRUE)
l2 <- lavInspect(fit2, what = "std.all", add.labels = FALSE, add.class = TRUE,
           list.by.group = TRUE,
           drop.list.single.group = TRUE)[[1]]

fit3 <- cfa(APV.model,data = data_beam,std.lv = TRUE)
l3 <- lavInspect(fit3, what = "std.all", add.labels = FALSE, add.class = TRUE,
           list.by.group = TRUE,
           drop.list.single.group = TRUE)[[1]]

df_factor_loading_cfa <- cbind(l1, l2, l3)


colnames(df_factor_loading_cfa) <- c("SunBurst", "StarTree", "BeamTree")
rownames(df_factor_loading_cfa) <- c("enjoyable", "likable", "pleasing", "nice","appealing")

df_factor_loading_cfa <- as.data.frame(df_factor_loading_cfa)
df_factor_loading_cfa <- mutate(df_factor_loading_cfa, across(where(is.numeric), round, 3))

df_factor_loading_cfa
write.table(df_factor_loading_cfa, paste("results/cfa_factor_loading.tsv",sep=""),row.names=TRUE,col.names=NA,sep='\t')
```


```{r}
good_fit_sun <- CFA(data_sun)
good_fit_star <- CFA(data_star)
good_fit_beam <- CFA(data_beam)

good_fit_table <- cbind(good_fit_sun, good_fit_star, good_fit_beam)

good_fit_table <- as.data.frame(good_fit_table)
good_fit_table <- mutate(good_fit_table, across(where(is.numeric), round, 3))

good_fit_table 
write.table(good_fit_table, paste("results/cfa_good_fit.tsv",sep=""),row.names=TRUE,col.names=NA,sep='\t')
```

## Convergent Validity
```{r}
con_vali <- function(data1, data2){ # results of two scales you want to calculate correlation
  cor.test(rowMeans(data1), rowMeans(data2), method = "pearson")
}
```

## Discrinminant Validity
```{r}
dis_vali <- function(data1){ # result of your scale
  cor.test(rowMeans(data1), data$age , method = "pearson")
}
```


# Sunburst
```{r}
CFA(data_sun)
```

```{r}
con_vali(data_sun, data_sun_ca)
```

```{r}
dis_vali(data_sun)
```

```{r}
cronbach.alpha(data_sun)
```

# Startree
```{r}
CFA(data_star)
```

```{r}
con_vali(data_star, data_star_ca)
```

```{r}
dis_vali(data_star)
```


```{r}
cronbach.alpha(data_star)
```

# Beamtree
```{r}
CFA(data_beam)
```

```{r}
con_vali(data_beam, data_beam_ca)
```

```{r}
dis_vali(data_beam)
```

```{r}
cronbach.alpha(data_beam)
```


```{r}
#Cronbach's Alpha for CFA
df_alpha <- data.frame(matrix(ncol = 3, nrow = 0))
list_alpha <- c(cronbach.alpha(data_sun)$alpha, cronbach.alpha(data_star)$alpha,cronbach.alpha(data_beam)$alpha)
colnames(df_alpha) <- c("SunBurst", "StarTree", "BeamTree")
df_alpha[nrow(df_alpha) + 1,] <- list_alpha
row.names(df_alpha) <- "Cronbach’s Alpha"
df_alpha <- df_alpha %>% 
 mutate_if(is.numeric, round, digits=2)
df_alpha
write.table(df_alpha, paste("results/cfa_alpha.tsv",sep=""),row.names=TRUE,col.names=NA,sep='\t')
```
```{r}
# Pearson correlation
df_pearson <- data.frame(matrix(ncol = 3, nrow = 0))
list_pearson_con <- c(con_vali(data_sun, data_sun_ca)$estimate,con_vali(data_star, data_star_ca)$estimate,con_vali(data_beam, data_beam_ca)$estimate)
list_pearson_dis <- c(dis_vali(data_sun)$estimate,dis_vali(data_star)$estimate,dis_vali(data_beam)$estimate)
colnames(df_pearson) <- c("SunBurst", "StarTree", "BeamTree")
df_pearson[nrow(df_pearson) + 1,] <- list_pearson_con
df_pearson[nrow(df_pearson) + 1,] <- list_pearson_dis
row.names(df_pearson) <- c("Classic Aesthetic", "Age")
df_pearson <- df_pearson %>% 
 mutate_if(is.numeric, round, digits=2)
df_pearson
write.table(df_pearson, paste("results/cfa_pearson.tsv",sep=""),row.names=TRUE,col.names=NA,sep='\t')
```

### Rating Results

This code calculates the average ratings each image received from participants and saves them as plots with CIs. 

# Setup

```{r}
source("04_CFA/CI-Functions.R")
participantResponseFiles <- list.files(path= "04_CFA/data",pattern = "\\.csv$") #names correspond to images, one participant per row, one word per
```

# Functions
This one cleans up the column names for each image's responses
```{r}
cleanColnames <- function(data){
  newNames <- gsub("^.+?\\.(.+?)\\..*$", "\\1", colnames(data))
  return(newNames)
}
```

This functions draws a bar chart with confidence intervals
```{r}
barChart <- function(resultTable, techniques, nbTechs = -1, ymin, ymax, xAxisLabel = "I am the X axis", yAxisLabel = "I am the Y Label",plotTitle){
  #tr <- t(resultTable)
  if(nbTechs <= 0){
    stop('Please give a positive number of Techniques, nbTechs');
  }
  
  tr <- as.data.frame(resultTable)
  nbTechs <- nbTechs - 1 ; # seq will generate nb+1
  
  #now need to calculate one number for the width of the interval
  tr$CI2 <- tr$upperBound_CI - tr$mean
  tr$CI1 <- tr$mean - tr$lowerBound_CI
  
  #add a technique column
  tr$technique <- factor(seq.int(0, nbTechs, 1));
  
  breaks <- c(as.character(tr$technique));
  print(tr)
  g <- ggplot(tr, aes(x=technique, y=mean)) + 
    #   geom_bar(stat="identity",fill = I("#CCCCCC")) +
    geom_errorbar(aes(ymin=mean-CI1, ymax=mean+CI2),
                  width=0,                    # Width of the error bars
                  size = 1.1
    ) +
    #labs(title="Overall time per technique") +
    labs(x = xAxisLabel, y = yAxisLabel) + 
    scale_y_continuous(limits = c(ymin,ymax),breaks=1:7) +
    scale_x_discrete(name="",breaks,techniques)+
    coord_flip() +
    ggtitle(plotTitle) +
    theme(panel.background = element_rect(fill = 'white', colour = 'white'),axis.title=element_text(size = rel(1.2), colour = "black"),axis.text=element_text(size = rel(1.2), colour = "black"),panel.grid.major = element_line(colour = "#DDDDDD"),panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())+
    geom_point(size=2, colour="black")         # dots
  
  print(g)
}
```

This next function calculates the CIs of each image's responses depending on the scale items (terms) given. 
```{r}
calculateDrawResponseCIs <- function(scaleItems){


  imageCount <- length(participantResponseFiles)
  pointEstimateVector = c()
  lowerBoundVector = c()
  upperBoundVector = c()
  imageVector = c()
  
  for (image in 1:imageCount){
  
    data <- read.csv(paste("04_CFA/data/",participantResponseFiles[[image]],sep=""), encoding="UTF-8")
    #terms <- cleanColnames(data) 
    #colnames(data) <- terms
    
    #exploratory trying to see what would happen if we had had a lot fewer participants
    #data <- data[sample(nrow(data), 24), ]
    
    data <- data[scaleItems]
  
    means <- rowMeans(data)
    
    imageVector <- append(imageVector,image)
    ci <- bootstrapMeanCI(means)
    pointEstimateVector <- append(pointEstimateVector,ci[1])
    upperBoundVector <- append(upperBoundVector,ci[3])
    lowerBoundVector <- append(lowerBoundVector,ci[2])
  }
  
  df <- data.frame(image=imageVector,mean=pointEstimateVector,lowerBound_CI=lowerBoundVector,upperBound_CI=upperBoundVector)
  plotTitle <- paste(paste("Average Rating for the",length(scaleItems)),"item scale")
  barChart(df,df$image ,nbTechs = imageCount, ymin = 1, ymax = 7, "Image", "Average Ratings",plotTitle)
  ggsave(paste("results/",paste(plotTitle,".pdf",sep=""),sep=""), width=8, height=4,device=cairo_pdf)
  
  print(df)
  
  return(df)
}
```

As the ratings per image were done by different participant pools we don't actually want to compare the ratings of each image.
```{r}
generatePerImageTables <-function(dfs,titles){
  imageCounts <- length(participantResponseFiles)

  for(i in 1:imageCounts){
    pointEstimateVector = c()
    lowerBoundVector = c()
    upperBoundVector = c()
    scaleVector = c()
    
    dflength <- length(dfs)
    for(d in 1:dflength){
      scaleVector <- append(scaleVector,titles[d])
      df <- dfs[[d]]
      
      pointEstimateVector <- append(pointEstimateVector,df$mean[df$image==i])
      upperBoundVector <- append(upperBoundVector,df$upperBound_CI[df$image==i])
      lowerBoundVector <- append(lowerBoundVector,df$lowerBound_CI[df$image==i])
    }
    
    df <- data.frame(scale=scaleVector,mean=pointEstimateVector,lowerBound_CI=lowerBoundVector,upperBound_CI=upperBoundVector)
    plotTitle <- paste(paste("Image",i)," Ratings Per Scale")
    barChart(df,df$scale ,nbTechs = dflength, ymin = 1, ymax = 7, "Image", "Average Ratings",plotTitle)
    path<-paste("results/",paste(plotTitle,".pdf",sep=""),sep="")
    print(path)
    ggsave(path, width=8, height=4,device=cairo_pdf)
  }
}
```

And now we run our tests

## Average Ratings for the Test Scale

```{r}
data <- read.csv(paste("04_CFA/data/",participantResponseFiles[[1]],sep=""), encoding="UTF-8")
scaleItems <- colnames(data)
scaleItems
df31 <- calculateDrawResponseCIs(scaleItems)
```
Browse the archive

https://github.com/tingying-he/beauvis