https://github.com/galaxyproject/galaxy
Tip revision: 48510a2cc12dd76e661f1bac3c14268926a0b3c1 authored by John Chilton on 12 December 2019, 16:37:38 UTC
Merge pull request #9089 from nsoranzo/release_16.04_extra-index-url
maf_to_fasta.xml
<tool id="MAF_To_Fasta1" name="MAF to FASTA" version="1.0.1">
<description>Converts a MAF formatted file to FASTA format</description>
<macros>
<import>macros.xml</import>
</macros>
<command interpreter="python">
#if $fasta_target_type.fasta_type == "multiple" #maf_to_fasta_multiple_sets.py $input1 $out_file1 $fasta_target_type.species $fasta_target_type.complete_blocks
#else #maf_to_fasta_concat.py $fasta_target_type.species $input1 $out_file1
#end if#
</command>
<inputs>
<param format="maf" name="input1" type="data" label="MAF file to convert"/>
<conditional name="fasta_target_type">
<param name="fasta_type" type="select" label="Type of FASTA Output">
<option value="multiple" selected="true">Multiple Blocks</option>
<option value="concatenated">One Sequence per Species</option>
</param>
<when value="multiple">
<param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="checked taxa will be included in the output">
<options>
<filter type="data_meta" ref="input1" key="species" />
</options>
</param>
<param name="complete_blocks" type="select" label="Choose to">
<option value="partial_allowed">include blocks with missing species</option>
<option value="partial_disallowed">exclude blocks with missing species</option>
</param>
</when>
<when value="concatenated">
<param name="species" type="select" label="Species to extract" display="checkboxes" multiple="true">
<options>
<filter type="data_meta" ref="input1" key="species" />
</options>
</param>
</when>
</conditional>
</inputs>
<outputs>
<data format="fasta" name="out_file1" />
</outputs>
<tests>
<test>
<param name="input1" value="3.maf" ftype="maf"/>
<param name="fasta_type" value="concatenated"/>
<param name="species" value="canFam1"/>
<output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta"/>
</test>
<test>
<param name="input1" value="4.maf" ftype="maf"/>
<param name="fasta_type" value="multiple"/>
<param name="species" value="hg17,panTro1,rheMac2,rn3,mm7,canFam2,bosTau2,dasNov1"/>
<param name="complete_blocks" value="partial_allowed"/>
<output name="out_file1" file="cf_maf2fasta_new.dat" ftype="fasta"/>
</test>
</tests>
<help>
**Types of MAF to FASTA conversion**
* **Multiple Blocks** converts each MAF block to its own FASTA block. For example, if you have 6 MAF blocks, they will be converted to 6 FASTA blocks.
* **One Sequence per Species** converts MAF blocks to a single aggregated FASTA block. For example, if you have 6 MAF blocks, they will be converted and concatenated into a single FASTA block.
-------
**What it does**
This tool converts MAF blocks to FASTA format. It either concatenates them into a single FASTA block or outputs multiple FASTA blocks separated by empty lines.
The interface for this tool contains two pages (steps):
* **Step 1 of 2**. Choose the multiple alignment (MAF) dataset from your history to be converted to FASTA format.
* **Step 2 of 2**. Choose the type of output as well as the species from the alignment to be included in the output.
Multiple Block output has additional options:
* **Select species** - the tool reads the alignment provided during Step 1 and generates a list of species contained within that alignment. Using checkboxes, you can specify the taxa to be included in the output (all species are selected by default).
* **Choose to include/exclude blocks with missing species** - if an alignment block is missing at least one of the species you selected in the **Select species** menu and this option is set to **exclude blocks with missing species**, then that block **will not** be included in the output (see **Example 2b** below). For example, if you want to extract human, mouse, and rat from a series of alignments and one of the blocks does not contain mouse sequence, then this block will not be converted to FASTA and will not be returned.
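The include/exclude rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of the conversion script: the function name ``keep_block`` and its arguments are hypothetical, and only the option values ``partial_allowed`` and ``partial_disallowed`` come from this tool.

```python
def keep_block(block_species, selected_species, complete_blocks):
    """Decide whether an alignment block is written to the output.

    block_species: species names present in the block (hypothetical input).
    selected_species: species checked in the tool form.
    complete_blocks: "partial_allowed" or "partial_disallowed".
    """
    if complete_blocks == "partial_allowed":
        # Blocks with missing species are kept as they are.
        return True
    # Exclude the block when at least one selected species is missing.
    missing = set(selected_species) - set(block_species)
    return not missing


# A block without mouse (mm8) sequence is dropped when hg18 and mm8 are selected.
print(keep_block(["hg18", "panTro2", "rheMac2"], ["hg18", "mm8"], "partial_disallowed"))
```

With ``partial_allowed`` the same call returns ``True``, so the block would be kept.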
-----
**Example 1**:
In the concatenated approach, the following alignment::
##maf version=1
a score=68686.000000
s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
will be converted to (**note** that because mm8 (mouse) and canFam2 (dog) are absent from the second block, they are replaced with gaps after concatenation)::
>canFam2
CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C-------------------------------------
>hg18
GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
>mm8
AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC--------------------------------------------
>panTro2
GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
>rheMac2
GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
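The gap padding shown above can be sketched in Python as follows. This is only an illustration of the idea, not the tool's implementation: the function name ``concatenate_blocks`` and the dictionary form of the blocks are assumptions, and the sequences are shortened toy rows rather than real alignment data.

```python
def concatenate_blocks(blocks, species):
    """Join per-block aligned text into one sequence per species.

    blocks: list of dicts mapping species name to aligned text
            (hypothetical in-memory form of parsed MAF blocks).
    species: species names to include in the output.
    """
    sequences = {name: [] for name in species}
    for block in blocks:
        # All rows of one block have the same alignment width.
        width = len(next(iter(block.values())))
        for name in species:
            # A species absent from the block contributes only gaps.
            sequences[name].append(block.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in sequences.items()}


# Toy blocks: mm8 is absent from the second block.
blocks = [
    {"hg18": "GACA-", "mm8": "AGAAG"},
    {"hg18": "ATG"},
]
for name, seq in sorted(concatenate_blocks(blocks, ["hg18", "mm8"]).items()):
    print(">" + name)
    print(seq)
```

Every species ends up with a sequence of the same total length, which keeps the concatenated output usable as an alignment.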
------
**Example 2a**: Multiple Block Approach **Include all species** and **include blocks with missing species**:
The following alignment::
##maf version=1
a score=68686.000000
s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
will be converted to::
>hg18.chr20(+):56827368-56827443|hg18_0
GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
>panTro2.chr20(+):56528685-56528760|panTro2_0
GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
>rheMac2.chr10(-):89144112-89144181|rheMac2_0
GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
>mm8.chr2(+):173910832-173910893|mm8_0
AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
>canFam2.chr24(+):46551822-46551889|canFam2_0
CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C
>hg18.chr20(+):56827443-56827480|hg18_1
ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
>panTro2.chr20(+):56528760-56528797|panTro2_1
ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
>rheMac2.chr10(-):89144181-89144218|rheMac2_1
ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
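The FASTA headers in this example appear to follow the pattern ``source(strand):start-end|species_index``, where the end is the MAF start plus the MAF size and the index is the number of the block. A minimal Python sketch of that pattern is given here; it is inferred from the example output rather than taken from the tool's source, and the function name ``fasta_header`` is hypothetical.

```python
def fasta_header(s_line, block_index):
    """Build a FASTA header from one MAF "s" line.

    The layout is inferred from the example output above; it is an
    assumption, not the tool's actual code.
    """
    # s-line fields: "s", source, start, size, strand, source size, text
    _, src, start, size, strand, _src_size, _text = s_line.split()
    species = src.split(".")[0]
    end = int(start) + int(size)
    return ">%s(%s):%s-%d|%s_%d" % (src, strand, start, end, species, block_index)


line = "s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"
print(fasta_header(line, 1))
```

For this line the sketch reproduces the header of the hg18 sequence from the second block of the example.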
-----
**Example 2b**: Multiple Block Approach **Include hg18 and mm8** and **exclude blocks with missing species**:
The following alignment::
##maf version=1
a score=68686.000000
s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
will be converted to (**note** that the second MAF block, which does not have mm8, is not included in the output)::
>hg18.chr20(+):56827368-56827443|hg18_0
GACAGGGTGCATCTGGGAGGGCCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC
>mm8.chr2(+):173910832-173910893|mm8_0
AGAAGGATCCACCT---------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------
------
.. class:: infomark
**About formats**
**MAF format** (multiple alignment format) stores multiple alignments at the DNA level between entire genomes.
- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.
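The line types described above can be read with a few lines of Python. The sketch below is a simplified reader for illustration only, assuming whitespace-separated fields; the function name ``read_maf_blocks`` is hypothetical, and a real MAF parser such as the one the tool relies on handles more cases.

```python
def read_maf_blocks(lines):
    """Group MAF lines into blocks of "s" lines.

    Returns a list of blocks; each block is a list of
    (source, aligned_text) tuples. Lines of type "i" and "e" are skipped.
    """
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current block
            continue
        if line.startswith("#"):
            continue  # comment line
        fields = line.split()
        if fields[0] == "a":
            # An "a" line starts a new alignment block.
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            # s-line fields: "s", source, start, size, strand, source size, text
            current.append((fields[1], fields[6]))
    return blocks


maf_text = """##maf version=1
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG

"""
blocks = read_maf_blocks(maf_text.splitlines())
print(len(blocks), [src for src, _ in blocks[0]])
```

Because a new "a" line also starts a new block, the sketch tolerates files in which the blank separator line is missing.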
@HELP_CITATIONS@
</help>
<expand macro="citations" />
</tool>