https://github.com/galaxyproject/galaxy
Tip revision: ce362e60abbafdffa65acaf68f9492cec7196a9d authored by Martin Cech on 23 May 2018, 18:44:54 UTC
Update version to 18.05
add_scores.xml
<tool id="hgv_add_scores" name="phyloP" version="1.0.0">
  <description>interspecies conservation scores</description>
  <requirements>
    <requirement type="package">add_scores</requirement>
  </requirements>
  <command>
    python '$__tool_directory__/add_scores.py' '$input1' '$out_file1' '${GALAXY_DATA_INDEX_DIR}/add_scores.loc' '${input1.metadata.dbkey}' '${input1.metadata.chromCol}' '${input1.metadata.startCol}'
  </command>

  <inputs>
    <param format="interval" name="input1" type="data" label="Dataset">
      <validator type="unspecified_build"/>
      <validator type="dataset_metadata_in_file" filename="add_scores.loc" metadata_name="dbkey" metadata_column="0" message="Data is currently not available for the specified build."/>
    </param>
  </inputs>

  <outputs>
    <data format_source="input1" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="add_scores_input1.interval" ftype="interval" dbkey="hg18" />
      <output name="out_file1" file="add_scores_output1.interval" />
    </test>
    <test>
      <param name="input1" value="add_scores_input2.bed" ftype="interval" dbkey="hg18" />
      <output name="out_file1" file="add_scores_output2.interval" />
    </test>
  </tests>

  <help>
.. class:: warningmark

This currently works only for builds hg18 and hg19.

-----

**Dataset formats**

The input can be any interval_ format dataset.  The output is also in interval format.
(`Dataset missing?`_)

.. _interval: ${static_path}/formatHelp.html#interval
.. _Dataset missing?: ${static_path}/formatHelp.html

-----

**What it does**

This tool adds a column that measures interspecies conservation at each SNP
position, using conservation scores for primates pre-computed by the
phyloP program.  phyloP performs an exact P-value computation under a
continuous-time Markov substitution model.

Scores are looked up by chromosome and start position only, so if an input
interval spans more than one nucleotide, only the score for its first
nucleotide is returned.

-----

**Example**

- input file, with SNPs::

    chr22  14440426  14440427  C/T
    chr22  14494851  14494852  A/G
    chr22  14494911  14494912  A/T
    chr22  14550435  14550436  A/G
    chr22  14611956  14611957  G/T
    chr22  14612076  14612077  A/G
    chr22  14668537  14668538  C
    chr22  14668703  14668704  A/T
    chr22  14668775  14668776  G
    chr22  14680074  14680075  A/T
    etc.

- output file, showing conservation scores for primates::

    chr22  14440426  14440427  C/T  0.509
    chr22  14494851  14494852  A/G  0.427
    chr22  14494911  14494912  A/T  NA
    chr22  14550435  14550436  A/G  NA
    chr22  14611956  14611957  G/T  -2.142
    chr22  14612076  14612077  A/G  0.369
    chr22  14668537  14668538  C    0.419
    chr22  14668703  14668704  A/T  -1.462
    chr22  14668775  14668776  G    0.470
    chr22  14680074  14680075  A/T  0.303
    etc.

  "NA" means that the phyloP score was not available.
  </help>
  <citations>
    <citation type="doi">10.1007/11732990_17</citation>
  </citations>
</tool>
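
The help text above says that scores are looked up by chromosome and start position, and that "NA" is written when no score is available. The following is a minimal sketch of that behavior, not the code in `add_scores.py`: the `add_scores` function, the in-memory `scores` dictionary, and the 0-based column indices are all assumptions made for illustration, and the third input row uses a made-up end coordinate to show a wider interval.

```python
def add_scores(rows, scores, chrom_col=0, start_col=1):
    """Append a phyloP-style score, or "NA", to each interval row.

    rows      -- list of rows, each a list of string fields
    scores    -- hypothetical dict mapping (chrom, start) to a float score
    chrom_col -- 0-based index of the chromosome column (assumption)
    start_col -- 0-based index of the start column (assumption)
    """
    annotated = []
    for row in rows:
        # Only the chromosome and start position form the lookup key,
        # so the end coordinate never influences the result.
        key = (row[chrom_col], int(row[start_col]))
        score = scores.get(key)
        annotated.append(row + ["NA" if score is None else f"{score:.3f}"])
    return annotated


# Hypothetical score table; the two values are taken from the example output.
scores = {
    ("chr22", 14440426): 0.509,
    ("chr22", 14611956): -2.142,
}

rows = [
    ["chr22", "14440426", "14440427", "C/T"],
    ["chr22", "14494911", "14494912", "A/T"],  # no score stored for this start
    ["chr22", "14611956", "14611990", "G/T"],  # wider interval, made-up end
]

for row in add_scores(rows, scores):
    print("\t".join(row))
```

In this sketch the wider third interval still receives only the score stored for its first nucleotide, which mirrors the limitation described in the help text.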