https://github.com/AlexanderDilthey/MHC-PRG
Tip revision: e59943adb8855532573a6c276651efad1e18a6b1 authored by Alexander Dilthey on 18 December 2018, 10:20:48 UTC
Update HLA-PRG.md
PaperData.md
# Paper-related data
## PRG input data
The MHC-PRG data package (http://www.well.ox.ac.uk/MHC-PRG.tar.gz - c. 213GB) contains a subfolder 'paper' with files relevant to our publication:
- The 8-haplotype alignment used as a scaffold for the PRG used in the paper:
alignment_with_MANN_with_APD.zip
- The 1000G Phase 1 VCF file with the SNPs that went into the PRG:
ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.xMHC.zip
- Downloaded genomic HLA allele sequences from IMGT:
IMGT.zip
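
Because the archive is large, it can help to list the contents of the 'paper' subfolder before extracting anything. The following is a minimal sketch, assuming the archive has been downloaded locally and that 'paper' appears as a directory name somewhere inside it; the function name `iter_paper_members` and the exact internal layout are assumptions, not something this document specifies.

```python
import tarfile
from typing import Iterator


def iter_paper_members(archive_path: str, marker: str = "paper") -> Iterator[str]:
    """Yield names of archive members that have `marker` as a path component.

    Assumption: the 'paper' subfolder is a directory inside the archive;
    its exact parent path is not stated in the README.
    """
    # "r|*" reads the archive as a forward-only stream and detects the
    # compression automatically, so nothing is written to disk.
    with tarfile.open(archive_path, mode="r|*") as archive:
        for member in archive:
            if marker in member.name.split("/"):
                yield member.name
```

Calling `list(iter_paper_members("MHC-PRG.tar.gz"))` would then return the matching member names; the scan still has to read through the whole compressed stream, so it is not fast on an archive of this size.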
## Other data utilized in the paper
### NA12878 Platinum read data
Next-generation sequencing data for NA12878 from the Illumina Platinum Genomes project (www.illumina.com/platinumgenomes/) were downloaded from the EBI (www.ebi.ac.uk/ena/data/view/ERP001775).
### NA12878 Moleculo data
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20131209_na12878_moleculo/
### GSK samples
Subjects CS1-6 were from the following 4 GSK-sponsored clinical studies: EGF100151, EGF30008, EGF105485 and EGF106708. Anonymized patient-level data underlying this study will be made available to independent researchers, following review by an independent panel and execution of a data sharing agreement. To submit a request or enquiry, please visit www.clinicalstudydatarequest.com.
## PRG output data
- VCFs generated by MHC-PRG (PRG-Viterbi and PRG-Mapped):
VCFs.zip
- Chromotypes generated by MHC-PRG:
Within the data package directory structure:
MHC-PRG/tmp/kMerCount__GS_nextGen_varigraph3_AA02O9Q_Z2_31_required.binaryCount.*
- Moleculo alignments
Alignments of Moleculo reads to the NA12878 chromotypes can be found (within the data package directory structure) in:
MHC-PRG/tmp/alignedContigs/_GS_nextGen_varigraph3_AA02O9Q_Z2_31/contigs_xMHC_fasta
(see the four subfolders for alignments to the PRG chromotypes [toViterbiChromotypes, toAmendedChromotypes], the Platypus-VCF-based chromotype [toVCF] and the reference-based chromotype [toReference]).
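
After extracting the data package, the output locations listed above can be checked with a short script. This is a sketch only: the paths and subfolder names are copied from this section, while `package_root` (the directory that contains the extracted `MHC-PRG` folder) and the two helper function names are assumptions made for illustration.

```python
from pathlib import Path

# Paths as given in this section, relative to the data package root.
CHROMOTYPE_PATTERN = (
    "MHC-PRG/tmp/"
    "kMerCount__GS_nextGen_varigraph3_AA02O9Q_Z2_31_required.binaryCount.*"
)
ALIGNMENT_DIR = (
    "MHC-PRG/tmp/alignedContigs/"
    "_GS_nextGen_varigraph3_AA02O9Q_Z2_31/contigs_xMHC_fasta"
)
ALIGNMENT_SUBFOLDERS = (
    "toViterbiChromotypes",
    "toAmendedChromotypes",
    "toVCF",
    "toReference",
)


def find_chromotype_files(package_root):
    """Return the sorted chromotype files matching the documented pattern."""
    return sorted(Path(package_root).glob(CHROMOTYPE_PATTERN))


def missing_alignment_subfolders(package_root):
    """Return the documented alignment subfolders that are not present."""
    base = Path(package_root) / ALIGNMENT_DIR
    return [name for name in ALIGNMENT_SUBFOLDERS if not (base / name).is_dir()]
```

An empty list from `missing_alignment_subfolders` indicates that all four alignment subfolders were found under the given root.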
## Wiggle files for UCSC
We provide the results of our genome-wide VCF evaluation in bigWig format (computed in 200bp windows) for use in the UCSC genome browser. Add the following URL as a custom track:
http://oxfordhla.well.ox.ac.uk/VCF.bw
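
Pasting the bare URL is usually enough, but the UCSC browser also accepts a `track` line that sets a display name alongside the `bigDataUrl`. The helper below is a small sketch that builds such a line; the function name, track name and description are hypothetical labels chosen for illustration, and only the URL comes from this document.

```python
def bigwig_track_line(url: str, name: str, description: str) -> str:
    """Build a UCSC custom-track definition line for a remote bigWig file."""
    # name and description are free-text labels shown in the browser;
    # bigDataUrl points at the hosted bigWig file.
    return (
        f'track type=bigWig name="{name}" '
        f'description="{description}" bigDataUrl={url}'
    )


# Hypothetical labels; the URL is the one given above.
print(
    bigwig_track_line(
        "http://oxfordhla.well.ox.ac.uk/VCF.bw",
        "MHC-PRG VCF evaluation",
        "Genome-wide VCF evaluation, 200bp windows",
    )
)
```

The printed line can be pasted into the custom track text box in place of the bare URL.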