https://github.com/kobe-liudong/miPlantPreMat
Tip revision: f0297a96c558c1987dc59962e2bbbe58e4d1c97c authored by kobe-liudong on 29 January 2015, 10:24:03 UTC
README
miPlantPreMat Version 1.0
------------------------
The miPlantPreMat program classifies plant pre-miRNAs and their miRNAs, separating them from pseudo hairpins and other ncRNAs.
---------------------------------------------------------------------------------------------
Authors: Jun Meng and Chao Sun, Computing Laboratory, Dalian University of Technology.
---------------------------------------------------------------------------------------------

System requirements:
---------------------
- The current version of the miPlantPreMat predictor has been developed for the Linux platform (Ubuntu).
- Perl and a Java VM should be pre-installed.


How to execute miPlantPreMat:
-----------------------------
1. Copy the input file containing the sequences to be classified into the 'miPlantPreMat/data' directory.
   The input file should contain the sequences in FASTA format. Do not leave empty lines between the sequences
   or after the last sequence of the input file; empty lines cause problems when running the RNAfold program (Hofacker, 2003)
   used in the miPlantPreMat package. (For a correctly formatted input FASTA file, see miPlantPreMat/data/seq.fasta.)
 
2. Execute the Perl script 'miPlantPreMat.pl', which is in the 'miPlantPreMat/progs' directory, as follows:
   miPlantPreMat/progs> perl miPlantPreMat.pl [input_file]
   Ex: miPlantPreMat/progs> perl miPlantPreMat.pl seq.fasta

   miPlantPreMat.pl calls the corresponding scripts for the following tasks:
    - to extract all the features.
    - to isolate the best features.
    - to format the features into SVM input format.
    - to scale the feature values.
    - to predict whether each sequence is a pre-miRNA, using the best SVM model developed in our research.
    - to predict whether each sequence is a miRNA, using the best SVM model developed in our research.
   The LIBSVM package (Chang and Lin, 2001) is used as the SVM predictor.
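
Step 1 above asks for a FASTA input with no empty lines. The following is a minimal Python sketch of one way to clean a file before copying it into 'miPlantPreMat/data'. The function names strip_blank_lines and count_records are hypothetical helpers and are not part of the miPlantPreMat package.

```python
def strip_blank_lines(text):
    """Return FASTA text with empty or whitespace-only lines removed."""
    kept = [line.rstrip() for line in text.splitlines() if line.strip()]
    # End the last sequence with a single newline and nothing after it.
    return ("\n".join(kept) + "\n") if kept else ""


def count_records(text):
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


# Example with an in-memory string; for a real file, read it with open(...).read().
raw = ">seq1\nACGUACGU\n\n>seq2\nGGCCAAUU\n\n"
cleaned = strip_blank_lines(raw)
print(count_records(cleaned), "sequences kept")
```

The cleaned text can then be written to a file in 'miPlantPreMat/data' and passed to miPlantPreMat.pl as described in step 2.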


Important:
---------------------------
- To calculate the zG, zD, zQ, zP and zF features, 10 random sequences are generated for each original sequence following the method
  described in (Loong and Mishra, 2007). Generating these random sequences, folding them, and calculating the z-features
  can take a long time, depending on the number of original sequences in the input file.
  The estimated time for this operation is about 30 minutes for 10 sequences. We therefore recommend using an input file
  with a small number of sequences (at most 100) in a single run.
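
A larger input can be split into runs of at most 100 sequences, as recommended above. The following Python sketch shows one possible way to do this. The helpers read_fasta, split_batches and estimated_minutes are hypothetical and not part of the package, and the time estimate assumes that run time grows linearly with the number of sequences, which this README does not state.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_batches(records, batch_size=100):
    """Split records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def estimated_minutes(n_sequences, minutes_per_10=30):
    """Rough run time, ASSUMING time scales linearly with sequence count."""
    return n_sequences * minutes_per_10 / 10


records = [("seq%d" % i, "ACGU") for i in range(250)]
for batch in split_batches(records):
    print(len(batch), "sequences, about", estimated_minutes(len(batch)), "minutes")
```

Under the linear assumption, a full batch of 100 sequences would take roughly 300 minutes, so each batch is best written to its own FASTA file and run separately.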



Results:
-----------
- All the features extracted from the sequences will be written to 'miPlantPreMat/data/all.input_file.-152.features'.
  These features are in the order[dP,MFEI5,zP,dH,%AA,Tm,|A-U|/L,%CU,dS,zG,zF,MFEI3,C(((_S,G..._S,MFEI4,%CC,G..(_S,%GA,NEFE,%CG,Tm/L,MFEI2,%GU,C..._S,MFEI6,%UC,%AU,%GC,%G+C,G(((_S,dQ,dG,Diversity,G..._S_end,%AC,dD,Freq,%GG,%CA,MFEI1,A(.(_S_end,mis_num_begin,|G-C|/L,%(G-C)/n_stems,%(A-U)/n_stems,A(((_S,U..._S_end]. 
- The results of the pre-miRNA predictions will be written to 'miPlantPreMat/SVMPre/input_file.predictions'.
  [Here +1 indicates that the corresponding sequence has been predicted as a pre-miRNA candidate.
       -1 indicates that the corresponding sequence has been predicted as a negative candidate.]
  These features are in the order[dD,MFEI8,%AA,dP,mis_num_begin,MFEI7,MFEI4,|G-C|/L,A..._S_end,G..._S_begin,MFEI6,U..._S,C(((_S_end,avg_mis_num,G(((_S_end,G(((_S,MFEI5,Tm,Tm/L,%CA,%UG,%UA,%GA,%UC,%UU,Diversity,MFEI9,dQ,mis_num_end,A(((_S_end,U(((_S_end,MFEI3,%(A-U)/n_stems,U..._S_end,C(.._S_end,A(.._S_end,C.((_S,G((._S,C((._S_end,G((._S_begin,NEFE,A(((_S,G(.(_S,U(.._S_end,A(.._S_begin,C..._S,C(((_S,C..(_S,G..(_S,C(((_S_begin,U.((_S_begin,G..._S,U..(_S_end,A..(_S_begin,A(.._S,G(((_S_begin,C((._S,G(.._S_end,U.(._S_end,zP,dH,|A-U|/L,%CU]. 
- The results of the miRNA predictions will be written to 'miPlantPreMat/SVMMat/input_file.predictions'.
  [Here +1 indicates that the corresponding sequence has been predicted as a miRNA candidate.
       -1 indicates that the corresponding sequence has been predicted as a negative candidate.]
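
The prediction files can be summarized with a short script. The sketch below assumes that each '.predictions' file holds one label per line, as LIBSVM's svm-predict normally writes; the exact layout produced by miPlantPreMat is not documented here, so treat this as an assumption. The helper names parse_predictions and summarize are hypothetical.

```python
def parse_predictions(text):
    """Return a list of +1/-1 labels, one per non-empty line."""
    labels = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "labels":
            continue  # skip blanks and the header written in probability mode
        # Only the first token (the predicted label) is used.
        labels.append(1 if float(tokens[0]) > 0 else -1)
    return labels


def summarize(labels):
    """Count positive (+1) and negative (-1) predictions."""
    return {"positive": labels.count(1), "negative": labels.count(-1)}


# Example with an in-memory string standing in for a '.predictions' file.
example = "1\n-1\n1\n"
print(summarize(parse_predictions(example)))
```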

----------------------------------------------------------------------------------------------------------------------------
         
Additional details about the scripts used to assemble the miPlantPreMat program:
--------------------------------------------------------------------------
- The 'miPlantPreMat/progs' directory contains all the scripts used to calculate the features and to format them.
    - The 'miPlantPreMat/progs/miPred' directory contains the scripts for calculating the miPred features (Loong and Mishra, 2007). These scripts were written for
      (Loong and Mishra, 2007) and are available at http://web.bii.a-star.edu.sg/~stanley/Publications/Supp_materials/06-002-supp.html
    - 'miPlantPreMat/progs/parameters.pl' and 'miPlantPreMat/progs/parameters_mat.pl' for calculating all the features of each sequence (written by us).
    - 'miPlantPreMat/progs/combine.pl' for combining the features (written by us).
    - 'miPlantPreMat/progs/selectattribute.pl' for filtering all the extracted features (written by us).
    - 'miPlantPreMat/progs/RNAfoldfilter.java' for filtering the RNAfold-related features (Batuwita and Palade, 2009).
    - 'miPlantPreMat/progs/melt.pl' for calculating the mfold-related features; taken from the UNAFold package (Markham and Zuker, 2008).
    - 'miPlantPreMat/progs/Mfoldfilter.java' for filtering the mfold-related features (Batuwita and Palade, 2009).
    - 'miPlantPreMat/progs/filter_all.java' for filtering all the extracted features (Batuwita and Palade, 2009).
    - 'miPlantPreMat/progs/result.pl' for writing out the results (written by us).
    
- The 'miPlantPreMat/SVMPre' and 'miPlantPreMat/SVMMat' directories contain the programs related to the SVM predictor.
    - 'svm-scale' for scaling the feature values into [-1,+1] (Chang and Lin, 2001).
    - 'svm-predict' for predicting whether sequences are pre-miRNAs (SVMPre) or miRNAs (SVMMat) (Chang and Lin, 2001).

- The 'miPlantPreMat/progs/miPlantPreMat.pl' script glues all these programs together as the main program.
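
To illustrate the scaling and formatting steps listed above, the following Python sketch shows the linear mapping into [-1,+1] and the sparse 'label index:value' line layout used by LIBSVM. It is an illustration only, not the code used by the package: the helper names scale_value and to_libsvm_line are hypothetical, the handling of constant features is a simplification made for this sketch, and in LIBSVM the minimum and maximum of each feature normally come from the training data.

```python
def scale_value(x, lo, hi, lower=-1.0, upper=1.0):
    """Linearly map x from the feature range [lo, hi] to [lower, upper]."""
    if hi == lo:
        return 0.0  # simplification for constant features (this sketch only)
    return lower + (upper - lower) * (x - lo) / (hi - lo)


def to_libsvm_line(label, values):
    """Format one sample as 'label 1:v1 2:v2 ...' with 1-based indices."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(values, start=1))
    return f"{label:+d} {pairs}"


# Two made-up feature values with made-up ranges, for illustration only.
raw_values = [5.0, 10.0]
ranges = [(0.0, 10.0), (0.0, 10.0)]
scaled = [scale_value(x, lo, hi) for x, (lo, hi) in zip(raw_values, ranges)]
print(to_libsvm_line(1, scaled))
```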


References:
------------
- Xuan, P., Guo, M. et al. (2011) PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs. Bioinformatics, 27, 1368-1376.
- Batuwita, R. and Palade, V. (2009) microPred: effective classification of pre-miRNAs for human miRNA gene prediction. Bioinformatics, 25, 989-995.
- Loong, K. and Mishra, S. (2007) De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures.
  Bioinformatics, 23(11), 1321-1330.
- Hofacker, I. L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429-3431.
- Markham, N. R. and Zuker, M. (2005) DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res., 33, W577-W581.
- Markham, N. R. and Zuker, M. (2008) UNAFold: software for nucleic acid folding and hybridization. In Keith, J. M., editor, Bioinformatics,
  Volume II. Structure, Functions and Applications, number 453 in Methods in Molecular Biology, chapter 1, pages 3-31. Humana Press, Totowa, NJ. ISBN 978-1-60327-428-9.
- Chang, C. and Lin, C.-J. (2001) LIBSVM: a library for support vector machines. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

   
  



