Content - 062b8d73b5d7af0c2781c327f75aa34f016315ff - a24b54a/Supereffectors_PART1_3_Initial_analysis.Rmd

visit type:
https://github.com/Lab-of-Adaptive-Immunity/Supereffectors_scRNAseq

02 February 2023, 13:36:01 UTC
Tip revision: 7b0dec9507dd45ab4bb0619912b240f512c7798f authored by JurMich on 26 November 2022, 12:08:33 UTC
Update README.md
Tip revision: 7b0dec9
Supereffectors_PART1_3_Initial_analysis.Rmd
---
title: "Super-Effectors PART1"
author: "Juraj Michalik"
date: "20/01/2021"
output: html_document
---

# Treg Pre-processing

This script realizes the pre-analysis of T-reg data-set. For this script to work, you have to have count matrix generated by 10X Genomics Cell Ranger 5.0.0 software downloaded from our source (see README for more information on how to get them). These data should be located in *Raw_data* (SUBJECT TO CHANGE) directory. The lists of cells marked by given hashtag is also required, these should be in directory *Hashtags*.

## Initialization

Here we load all needed packages and functions. Packages:

```{r}
knitr::opts_chunk$set(include = TRUE, warning = FALSE, message = FALSE, 
                      error = TRUE, cache = TRUE)
suppressMessages(require(Seurat))
suppressMessages(require(ggplot2))
suppressMessages(require(cowplot))
suppressMessages(require(data.table))
suppressMessages(require(pracma))
suppressMessages(require(dplyr))
suppressMessages(require(RColorBrewer))
suppressMessages(require(grDevices))
suppressMessages(require(gplots))
suppressMessages(require(kableExtra))
suppressMessages(require(knitr))
```

Functions:

```{r}
# for import of hashtag lists
# - dir: dir where hashtags are located
# - hashtag.list: filenames of all hashtag lists to import
import.hashtags <- function(dir, hashtag.list){
  # imports all hashtags here in order from hashtag.list
  hashtag.obj = list()
  for(i in 1:length(hashtag.list)){
    hashtag.obj[[i]] <- read.csv(paste0(dir, hashtag.list[[i]]), h=F)[,1] 
  }
  return(hashtag.obj)
}

# creates vector for meta-data using hashtags
build.metavector <- function(seurat.obj, hashtag.obj, hash.groups, descriptors, reformat = TRUE){
  metavector <- rep(descriptors[[1]], nrow(seurat.obj@meta.data))
  if(length(hash.groups)>1){
    for(i in 2:length(hash.groups)){
      cells <- unlist(hashtag.obj[hash.groups[[i]]])
      if(reformat){
        cells <- paste0(cells, '-1')
      }
      metavector[colnames(seurat.obj) %in% cells] <- descriptors[[i]]
    }
  }
  return(metavector)
}
```

Load hashtags from files:

```{r hashtags}
Treg_hashtags <- import.hashtags('',
                                 c('exp07_W3_Mouse_Hashtag_5.csv', 'exp07_W3_Mouse_Hashtag_6.csv',
                                   'exp07_W3_Mouse_Hashtag_7.csv', 'exp07_W3_Mouse_Hashtag_8.csv',
                                   'exp07_W3_Mouse_Hashtag_9.csv', 'exp07_W3_Mouse_Hashtag_10.csv'))
```

Other parameters, notably regarding filtering (removal of obvious outliers and speeding up the downstream analysis):

```{r parameters additional}
min.features = 200
min.mt = 15
```

## Setting up initial data-set

Here we set up data set for downstream analysis. We begin by loading the data from transcripts and antibodies and setting up the initial Seurat Object. Notice that the V(D)J genes of TRA and TRB chains are removed.

```{r init,  include=F, results=F}
GEX.data <- Read10X('Raw_data/raw_feature_bc_matrix')
GEX.data$`Gene Expression` <- GEX.data$`Gene Expression`[!(grepl('^Tr(a|b)(v|d|j)', rownames(GEX.data$`Gene Expression`))),]
Treg_seu <- CreateSeuratObject(GEX.data$`Gene Expression`, project = 'T_regs', min.cells = 3, min.features = min.features)

cells <- colnames(GetAssayData(Treg_seu, slot = 'counts'))
cells <- GEX.data$`Antibody Capture`[,colnames(GEX.data$`Antibody Capture`) %in% cells]
Treg_seu[['Antibodies']] <- CreateAssayObject(counts = cells)

Treg_seu <- subset(Treg_seu, cells = paste0(unlist(Treg_hashtags), '-1'))
```

Quick look at the quality of the data. At the end we filter out the cells with 15+ % of transcripts mapping to mitochondrial genes.

```{r}
Treg_seu[["percent.mt"]] <- PercentageFeatureSet(Treg_seu, pattern = "^(M|m)(T|t)-")
Treg_seu[["percent.rt"]] <- PercentageFeatureSet(Treg_seu, pattern = "^(R|r)p[sl]")
print(VlnPlot(Treg_seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3))
  
plot.mt <- FeatureScatter(Treg_seu, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot.nFeat <- FeatureScatter(Treg_seu, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
print(CombinePlots(plots = list(plot.mt, plot.nFeat)))

Treg_seu <- subset(Treg_seu, percent.mt < min.mt)
```

Now adding some meta data based on hashtags.

```{r iteration1, echo=FALSE, warning=FALSE, message=FALSE, fig.align = "center"}
Treg_seu$hashtags <- build.metavector(Treg_seu, Treg_hashtags, list(1,2,3,4,5,6), 
                                        paste('Mouse_Hashtag', list(5,6,7,8,9,10), sep='_'))
Treg_seu$description <- build.metavector(Treg_seu, Treg_hashtags, list(1,2,3,4,5,6),
                                           c('OT-I DC-OVA day 3 DEREG- sample 1','OT-I DC-OVA day 3 DEREG- sample 3',
                                             'OT-I DC-OVA day 3 DEREG- sample 5','OT-I DC-OVA day 3 DEREG+ sample 2',
                                             'OT-I DC-OVA day 3 DEREG+ sample 4','OT-I DC-OVA day 3 DEREG+ sample 6'))
Treg_seu$class <- build.metavector(Treg_seu, Treg_hashtags, list(c(1,2,3), c(4,5,6)), c('OT-I 3 DEREG-', 'OT-I 3 DEREG+'))
Treg_seu$time <- build.metavector(Treg_seu, Treg_hashtags, list(seq(1,6,1)), 'Day 3')
```

# Pre-analysis

Perform pre-clustering. This encompasses log-normalization, searching for variable features, scaling, dimensional reductions (PCA and UMAP) and finally establishing clusters via kNN method. This is rather crude clustering that serves only for establishing **bad** clusters that will be removed afterwards

```{r, include=F, results=F}
Treg_seu <- NormalizeData(Treg_seu)
Treg_seu <- FindVariableFeatures(Treg_seu, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(Treg_seu)
Treg_seu <- ScaleData(Treg_seu, features = all.genes)
Treg_seu <- RunPCA(Treg_seu, features = VariableFeatures(object = Treg_seu))
Treg_seu <- RunUMAP(Treg_seu, dims = 1:15, n.neighbors = 40)

Treg_seu <- FindNeighbors(Treg_seu, dims = 1:15, nn.method = 'rann')
Treg_seu <- FindClusters(Treg_seu, resolution = 0.8, random.seed = 42)
```

Plot clustering and VlnPlots for quality of the separate clusters. These will show you clusters that should be eliminated (see following part for the proper removal).

```{r}
print(DimPlot(Treg_seu) + ggtitle('Initial clustering'))
print(VlnPlot(Treg_seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3))
```

The pre-processing is finished. Now we can save the data.

```{r}
saveRDS(Treg_seu, 'data/E07_W3_510_data_init.rds')
```