https://github.com/AlexanderDilthey/MHC-PRG
Tip revision: e59943adb8855532573a6c276651efad1e18a6b1 authored by Alexander Dilthey on 18 December 2018, 10:20:48 UTC
Update HLA-PRG.md
PaperData.md
# Paper-related data

## PRG input data

The MHC-PRG data package (http://www.well.ox.ac.uk/MHC-PRG.tar.gz - c. 213GB) contains a subfolder 'paper' with files relevant to our publication:

- The 8-haplotype alignment used as the scaffold for the PRG:

  alignment_with_MANN_with_APD.zip

- The 1000G Phase 1 VCF file with the SNPs that went into the PRG:

  ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.xMHC.zip

- Downloaded genomic HLA allele sequences from IMGT:

  IMGT.zip
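
Because the full package is about 213GB, it may be convenient to pull out only the 'paper' files. The following is a minimal Python sketch, assuming the archive is a gzip-compressed tar file and that the files sit under a directory literally named `paper` somewhere inside it. The helper names `in_subfolder` and `extract_subfolder` are hypothetical and not part of MHC-PRG.

```python
import os
import shutil
import tarfile


def in_subfolder(member_name, subfolder="paper"):
    """True if the member path has a parent directory named `subfolder`."""
    parts = member_name.strip("/").split("/")
    return subfolder in parts[:-1]


def extract_subfolder(archive_path, dest_dir, subfolder="paper"):
    """Stream through a .tar.gz and write out only files under `subfolder`."""
    dest_root = os.path.realpath(dest_dir)
    written = []
    # "r|gz" reads the archive as a forward-only stream, so members that
    # do not match are skipped without being unpacked to disk.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not (member.isfile() and in_subfolder(member.name, subfolder)):
                continue
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse member paths that would escape dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(member.name)
    return written
```

Calling `extract_subfolder("MHC-PRG.tar.gz", "paper_only")` on a downloaded copy would still read through the whole archive once, but only the matching files are written.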

## Other data utilized in the paper

### NA12878 Platinum read data

Next-generation sequencing data for NA12878 from the Illumina Platinum Genomes project (www.illumina.com/platinumgenomes/) were downloaded from the EBI (www.ebi.ac.uk/ena/data/view/ERP001775).

### Moleculo data NA12878

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20131209_na12878_moleculo/

### GSK samples

Subjects CS1-6 were from the following four GSK-sponsored clinical studies: EGF100151, EGF30008, EGF105485 and EGF106708. Access to anonymized patient-level data underlying this study will be made available to independent researchers, following review by an independent panel and execution of a data sharing agreement. To submit a request or enquiry, please visit www.clinicalstudydatarequest.com.

## PRG output data

- VCFs generated by MHC-PRG (PRG-Viterbi and PRG-Mapped):

  VCFs.zip
  
- Chromotypes generated by MHC-PRG:

  Within the data package directory structure:
  
  MHC-PRG/tmp/kMerCount__GS_nextGen_varigraph3_AA02O9Q_Z2_31_required.binaryCount.*

- Moleculo alignments

  Alignments of Moleculo reads to the NA12878 chromotypes can be found (within the data package directory structure) in:
  
  MHC-PRG/tmp/alignedContigs/_GS_nextGen_varigraph3_AA02O9Q_Z2_31/contigs_xMHC_fasta
  
  (see the four subfolders for alignments to the PRG chromotypes [toViterbiChromotypes, toAmendedChromotypes], the Platypus-VCF-based chromotype [toVCF] and the reference-based chromotype [toReference]).
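
To check that an unpacked copy of the data package contains the output files listed above, a small Python sketch along the following lines may help. The path fragments are copied from this section, while the function name `locate_outputs` and the assumption that you pass the path of the unpacked `MHC-PRG` directory are illustrative only.

```python
from pathlib import Path

# Paths relative to the unpacked MHC-PRG directory, as listed in this section.
CHROMOTYPE_GLOB = (
    "tmp/kMerCount__GS_nextGen_varigraph3_AA02O9Q_Z2_31_required.binaryCount.*"
)
ALIGNMENT_DIR = (
    "tmp/alignedContigs/_GS_nextGen_varigraph3_AA02O9Q_Z2_31/contigs_xMHC_fasta"
)
ALIGNMENT_SUBFOLDERS = [
    "toViterbiChromotypes",
    "toAmendedChromotypes",
    "toVCF",
    "toReference",
]


def locate_outputs(mhc_prg_root):
    """Return chromotype files, alignment folder paths, and missing folder names."""
    root = Path(mhc_prg_root)
    chromotypes = sorted(root.glob(CHROMOTYPE_GLOB))
    alignments = {name: root / ALIGNMENT_DIR / name for name in ALIGNMENT_SUBFOLDERS}
    missing = [name for name, path in alignments.items() if not path.is_dir()]
    return chromotypes, alignments, missing
```

An empty `missing` list and a non-empty `chromotypes` list would suggest the package was unpacked completely.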

## Wiggle files for UCSC

We provide the results from our genome-wide VCF evaluation in bigWig format (computed in 200bp windows) for use in the UCSC genome browser. Add the following URL as a custom track:

http://oxfordhla.well.ox.ac.uk/VCF.bw
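
The bare URL is normally enough for the browser's custom track form. If a labelled track is preferred, a track definition line can be pasted instead. The sketch below assembles one in Python using the standard UCSC `track type=bigWig ... bigDataUrl=...` syntax. The track name and description are placeholders chosen for illustration, not values from the paper.

```python
def bigwig_track_line(url, name, description):
    """Build a UCSC custom-track definition line for a remote bigWig file."""
    return (
        f'track type=bigWig name="{name}" '
        f'description="{description}" bigDataUrl={url}'
    )


# Name and description are hypothetical labels; only the URL comes from this README.
line = bigwig_track_line(
    "http://oxfordhla.well.ox.ac.uk/VCF.bw",
    "MHC-PRG VCF evaluation",
    "Genome-wide VCF evaluation (200bp windows)",
)
print(line)
```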