Tip revision: ad83786947a94089e624e4e0256a5bd91b8c0373 authored by Joshua Bloom on 16 August 2021, 21:16:39 UTC
Update README

# Genetics of trans-regulatory variation in gene expression
### code and data for 1000 BYxRM haploid yeast segregant eQTL mapping
#### Joshua S Bloom and Frank W Albert

see [eQTL_BYxRM1000_stranded.R](code/eQTL_BYxRM1000_stranded.R) for main analysis script
___

genotype and raw transcript count data can be found [here](RData/counts.RData) as an RData object

```r
# After cloning the git repo, cd to the local directory, start R, and then
R> load('RData/counts.RData')
# Structure of phenotype data (counts$pheno)
# 6,713 transcripts (rows) x 1,012 haploid segregants (columns)
#
# Structure of genotype data (counts$gdata)
# 1,012 haploid segregants (rows) x 42,052 markers (columns)
# -1 indicates BY allele, +1 indicates RM allele
# column names indicate chromosome:position_BYvariant/RMvariant
# positions are based on the S. cerevisiae sacCer3 genome build
R> str(counts)
List of 2
 $ pheno: num [1:6713, 1:1012] 550 209 141 442 35 640 89 110 307 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6713] "YHR055C" "YPR161C" "YOL138C" "YDR395W" ...
  .. ..$ : chr [1:1012] "A01_01-A01-A1-BYxRM_eQTL_10-H6" "A01_02-A01-A2-BYxRM_eQTL_11-F3" "A01_03-A01-A3-BYxRM_eQTL_11-C6" "A01_04-A01-A4-BYxRM_eQTL_03-G2" ...
 $ gdata: num [1:1012, 1:42052] 1 -1 1 1 -1 -1 -1 -1 -1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1012] "A01_01" "A01_02" "A01_03" "A01_04" ...
  .. ..$ : chr [1:42052] "chrI:33040_A/G" "chrI:33048_A/C" "chrI:33070_A/T" "chrI:33077_G/A" ...
```
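
the marker names in `counts$gdata` pack chromosome, position and both alleles into one string. A minimal sketch of how they could be split into columns follows; `parse_markers` is a hypothetical helper that is not part of this repository, and it assumes every column name follows the `chromosome:position_BYvariant/RMvariant` pattern described above.

```r
# Hypothetical helper (not in the repository): split marker names of the
# form "chrI:33040_A/G" into chromosome, position, BY allele and RM allele.
parse_markers <- function(marker_names) {
  pattern <- "^(chr[^:]+):([0-9]+)_([^/]+)/(.+)$"
  ok <- grepl(pattern, marker_names)
  # Fail loudly instead of guessing if a name does not fit the assumed pattern
  if (!all(ok)) stop("unexpected marker name: ", marker_names[!ok][1])
  data.frame(
    marker    = marker_names,
    chr       = sub(pattern, "\\1", marker_names),
    pos       = as.integer(sub(pattern, "\\2", marker_names)),
    by_allele = sub(pattern, "\\3", marker_names),
    rm_allele = sub(pattern, "\\4", marker_names),
    stringsAsFactors = FALSE
  )
}

# Marker names copied from the str() output above
markers <- parse_markers(c("chrI:33040_A/G", "chrI:33048_A/C",
                           "chrI:33070_A/T", "chrI:33077_G/A"))
print(markers)

# With the real data loaded, the call would be:
# markers <- parse_markers(colnames(counts$gdata))
```
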
processed and filtered log2(TPM+0.5) values per transcript can be found [here](RData/log2_t.tpm.matrix.RData) as an RData object

```r
R> load('RData/log2_t.tpm.matrix.RData')
R> str(t.tpm.matrix)
 num [1:1012, 1:5720] 4.87 6.09 5.64 1.88 6.27 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1012] "A01_01-A01-A1-BYxRM_eQTL_10-H6" "A01_02-A01-A2-BYxRM_eQTL_11-F3" "A01_03-A01-A3-BYxRM_eQTL_11-C6" "A01_04-A01-A4-BYxRM_eQTL_03-G2" ...
  ..$ : chr [1:5720] "YAL062W" "YAL061W" "YAL060W" "YAL059W" ...
```
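
the expression matrices use long sample names (e.g. `A01_01-A01-A1-BYxRM_eQTL_10-H6`) while the genotype matrix uses short ones (e.g. `A01_01`). A minimal sketch of one way to line the two up is shown below. It assumes the short name is always the part before the first `-`, which is inferred from the `str()` output and not confirmed by the repository; `short_id` and `align_genotypes` are hypothetical helpers, and the toy matrices contain made-up values.

```r
# Hypothetical helper: keep everything before the first "-" so that
# "A01_01-A01-A1-BYxRM_eQTL_10-H6" becomes "A01_01".
short_id <- function(long_names) sub("-.*$", "", long_names)

# Hypothetical helper: reorder genotype rows to follow the expression rows.
align_genotypes <- function(expr, gdata) {
  idx <- match(short_id(rownames(expr)), rownames(gdata))
  if (anyNA(idx)) stop("some expression samples have no genotype row")
  gdata[idx, , drop = FALSE]
}

# Toy stand-ins with made-up values; only the names mimic the real objects.
expr <- matrix(1:4, nrow = 2,
               dimnames = list(c("A01_01-A01-A1-BYxRM_eQTL_10-H6",
                                 "A01_02-A01-A2-BYxRM_eQTL_11-F3"),
                               c("YAL062W", "YAL061W")))
geno <- matrix(c(-1, 1, 1, -1), nrow = 2,
               dimnames = list(c("A01_02", "A01_01"),  # deliberately shuffled
                               c("chrI:33040_A/G", "chrI:33048_A/C")))

geno_aligned <- align_genotypes(expr, geno)
print(rownames(geno_aligned))

# With the real data loaded, the call would be:
# gdata_aligned <- align_genotypes(t.tpm.matrix, counts$gdata)
```
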

additional batch and growth covariates can be found [here](RData/covariates.OD.RData)
```r
R> load('RData/covariates.OD.RData')
R> str(covariates.OD)
 num [1:1012, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1012] "A01_01-A01-A1-BYxRM_eQTL_10-H6" "A01_02-A01-A2-BYxRM_eQTL_11-F3" "A01_03-A01-A3-BYxRM_eQTL_11-C6" "A01_04-A01-A4-BYxRM_eQTL_03-G2" ...
  ..$ : chr [1:14] "(Intercept)" "gbatch.factBYxRM_eQTL_02" "gbatch.factBYxRM_eQTL_03" "gbatch.factBYxRM_eQTL_04" ...
 - attr(*, "assign")= int [1:14] 0 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "contrasts")=List of 1
  ..$ gbatch.fact: chr "contr.treatment"
```
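
`covariates.OD` is a design matrix that already includes an `(Intercept)` column, so it can be passed directly to a least-squares fit. The sketch below shows one common way to remove covariate effects from expression values by taking residuals. It is an illustration under stated assumptions and not necessarily how the main analysis script handles covariates; `residualize` is a hypothetical helper, and the demo runs on simulated data.

```r
# Hypothetical helper: regress every transcript (column of `expr`) on the
# covariates and return the residuals. `covariates` must already contain
# an intercept column, as covariates.OD does.
residualize <- function(expr, covariates) {
  stopifnot(identical(rownames(expr), rownames(covariates)))
  fit <- lm.fit(x = covariates, y = expr)
  res <- fit$residuals
  dimnames(res) <- dimnames(expr)
  res
}

# Simulated stand-in data: 20 samples, 2 batches, 3 transcripts
set.seed(1)
n <- 20
samples <- sprintf("S%02d", seq_len(n))
batch <- factor(rep(c("b1", "b2"), each = n / 2))
covs <- model.matrix(~ batch)          # intercept + treatment-coded batch
rownames(covs) <- samples
toy_expr <- matrix(rnorm(n * 3), nrow = n,
                   dimnames = list(samples, c("g1", "g2", "g3")))
toy_expr[, "g1"] <- toy_expr[, "g1"] + 2 * (batch == "b2")  # add a batch shift

expr_resid <- residualize(toy_expr, covs)

# With the real data loaded, the call would be:
# expr_resid <- residualize(t.tpm.matrix, covariates.OD)
```

after residualizing, the mean of each transcript within each batch is zero up to numerical precision, so the simulated batch shift in `g1` is gone.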

see [process_fastq](code/process_fastq) for scripts to turn gzipped fastq files from SRA into transcript abundances

https://github.com/joshsbloom/eQTL_BYxRM