Tip revision: 39a091d1fa569a7fc717ac73c4b3de07f0a1204d authored by fjruizruano on 03 August 2023, 11:48:27 UTC
adding gfa2fas.py and extract_gfa.py
ngs-protocols
=============

#### Scripts to run several protocols for processing and analyzing Next-Generation Sequencing data

* FastA.split.pl: Split FASTA files into several subfiles.
* FastQ.split.pl: Split FASTQ files into several subfiles.
* alignment_copy_paste.py: Cut an alignment from the left up to a given position and paste it on the right.
* annot_to_rexp.py: Annotate RepeatExplorer's contigs following a list.
* bam_consensus.py: Get majority consensus sequences for BAM files.
* bam_coverage_join.py: Generate a table of coverage along the contigs in several BAM files.
* bam_var_join.py: Generate a variation table using several BAM files.
* bg_count.py: Generate a table with nucleotide counts from BAM files.
* blat_recursive.py: Parallelize a BLAT run in several threads.
* blat_recursive_hard.py: Parallelize a BLAT run in several threads with hard options.
* bowtie2_recursive.py: Map using Bowtie2 with several libraries consecutively.
* bwa_protocol.py: Map using BWA in multiple libraries.
* bwa_mem_protocol.py: Map using BWA-MEM in multiple libraries.
* cd_hit_filter_size.py: Filter out sequences from small CD-HIT clusters.
* count_acgtn.py: Count number of A, C, G, T and N in a multifasta file.
* count_bases_fastq.py: Count number of nucleotides in one or several FASTQ(.GZ) files.
* count_kmer.py: Count occurrences from a list of kmers using Jellyfish.
* count_reads_bam.py: Generate a table with mapped reads counts in several BAM files.
* coverage_graphics.py: Generate graphics using the output from bam_coverage_join.py and a samples file.
* coverage_graphics_coord.py: A complex version of coverage_graphics.py.
* coverage_seq_bed.py: Count number of mapped nucleotides per reference sequence in BED files.
* coverage_window.py: Count number of mapped nucleotides in a sliding window of defined size.
* cut_seq_unequal.py: Trim sequences from a FASTA file into subsequences of the defined size.
* deconseq_run.py: Run DeconSeq automatically and with several threads.
* dimerator.py: Convert a monomer FASTA file into a dimer FASTA file.
* divnuc_bam.py: Calculate nucleotide diversity per site from BAM files.
* divnuc_plot.py: Calculate nucleotide diversity per window from the output of divnuc_bam.py.
* divsum_ab.py: Used with satminer quantification.
* divsum_count.py: Count the number of nucleotides per elements in a RepeatMasker's divsum file.
* divsum_stats.py: Generate interesting stats from repeat landscapes from a list of divsum files.
* divsum_to_rl.py: Generate satDNA repeat landscapes using satMiner's criteria.
* dnapipete_createdb.py: Generate a database compatible with RepeatMasker from the dnaPipeTe
* extract_gfa.py: Extract sequences based on their kmer content.
* extract_member_reads_rexp.py: Extract reads in a specific cluster of RepeatExplorer.
* extract_no_seq.py: Extract sequences from a FASTA file absent in a list.
* extract_reads_blat.py: Extract matching reads in a PSL output from BLAT.
* extract_reads_rm.py: Extract matching reads in an OUT output from RepeatMasker.
* extract_regions_bam.py: Extract reads from a BAM only in the indicated regions.
* extract_seq.py: Extract sequences from a FASTA file present in a list conserving the order.
* extract_seq_regions.py: Extract specific regions of sequences from a FASTA file present in a list conserving the order.
* fasta_filter_by_length.py: Filter out sequences from a FASTA file with a size lower than a threshold.
* fasta_sequence_len.py: Generate a table with the length of each sequence in a FASTA file.
* fastq-combine-pe.py: Extract paired reads by ID from two FASTQ files.
* fastq-pe-random.py: Random selection of paired reads from two FASTQ files.
* fastq_edit_ids.py: Edit the ID from FASTQ files to end with the format "@ID/1".
* fastq_edit_ids_sra.py: Edit the ID from FASTQ files to end with the format "@ID/1" from SRA files.
* fastq_paired_combine_id: Extract paired reads by matching their IDs.
* find_exclusive_kmers.py: Extract exclusive kmers of a library in comparison with other using Jellyfish.
* gatk_protocol.py: Run GATK in a list of FASTQ files with the same reference.
* get coordinates.py: Get coordinate file for coverage_graphics.py.
* get_no_blat.py: Extract sequences from a FASTA file absent in a PSL output of BLAT.
* get_shared_pw_shared_sunks.py: Get shared SUNKS for each pair combination in a set of reads.
* gfa2fas.py: Convert from GFA to FASTA.
* gff_creator.py: Generate a GFF file for htseq-count from a FASTA file.
* id_rmasker.py: Edit IDs from a FASTA file with a format compatible with RepeatMasker.
* id_rmasker_rexp.py: Edit IDs from a FASTA file of RepeatExplorer contigs compatible with RepeatMasker.
* join_multiple_lists.py: Join the results of two or more lists.
* join_multiple_lists_var.py: Join the results of two or more lists for bam_var_join.py.
* join_rm_list.py: Join two files with RepeatMasker nucleotide counts.
* kimura_window.py: Calculate Kimura divergence per window using the RepeatMasker's script.
* kmer_to_fasta.py: Generate a FASTA file from a list of kmers.
* longranger_prepare_reference.py: Prepare FASTA reference for longranger.
* mapping_blat_gs.py: Extract matching reads with BLAT and optionally launch Newbler, RepeatMasker or SSAHA2.
* mapping_blat_gs_hard.py: Extract matching reads with hard options of BLAT and optionally launch Newbler, RepeatMasker or SSAHA2.
* mapping_blat_gs_saver.py: Version of mapping_blat_gs.py for big libraries.
* mapping_blat_gs_single_end.py: Version of mapping_blat_gs.py for single-end libraries.
* mapping_blat_gs_nonormal.py: Version of mapping_blat_gs.py for reads without a normal header.
* massive_phylogeny.py: Run RAxML for each gene using a single FASTA file and a gene list.
* massive_phylogenies_figure.py: Generate pdf phylogenies using a list of Newick files. 
* massive_phylogeny_raxml_support.py: Support script for massive_phylogeny.py.
* mitobim_run.py: Run MITObim with several protocols.
* mreps_extract.py: Generate a FASTA file with tandem sequences using a MREPS output.
* novoplasty_run.py: Run Novoplasty recursively.
* peru_protocol.py: Protocol to estimate number of external repeat_units in satellite DNA sequences.
* raxml_protocol.py: RAxML protocol.
* reduce_bam.py: Filter out unmapped paired reads from a BAM file.
* remove_ns.py: Remove reads with Ns after a masking.
* replace_patterns: Replace elements in a file.
* repeat_landscape_decimal.py: Generates a repeat landscape table with divergence values adjusted to one decimal (0.1%) from an ALIGN file. 
* repeat_landscape_decimal_050.py: Generates a repeat landscape table with divergence values adjusted to 0.5% from an ALIGN file. 
* repeat_masker_run.py: Run RepeatMasker alignment for small FASTA files.
* repeat_masker_run_big.py: Run RepeatMasker alignment for several big FASTA files.
* rexp_get_cluster.py: Get FASTA file concatenating all the contigs assembled with RepeatExplorer.
* rexp_prepare.py: Generate a FASTA file ready for RepeatExplorer from two FASTQ files.
* rexp_prepare_deconseq: Generate a FASTA file ready for RepeatExplorer from two FASTQ files filtered with DeconSeq.
* rexp_prepare_normaltag: Generate a FASTA file ready for RepeatExplorer from two FASTQ files with normal tags (IDs ending in /1 or /2).
* rexp_select_contigs: Select the contigs with the highest coverage in a RepeatExplorer output.
* rm_clas_seq.py: Classify reads aligning or not using a RepeatMasker's output.
* rm_clas_seq_names: Classify reads coinciding with an annotation and aligning or not using a RepeatMasker's output.
* rm_cluster_external.py: Select non-homologous reads, group them by the annotation of their read pair and cluster them.
* rm_getseq.py: Extract sequences of the matching regions in a RepeatMasker's output.
* rm_getseq_annot.py: Extract sequences of the matching regions in a RepeatMasker's output and annotate the sequences of the FASTA.
* rm_getseq_split.py: Extract sequences of the matching regions in a RepeatMasker's output, annotate them and split the sequences into different FASTA files.
* rm_getseq_stats.py: Extract sequences of the matching regions in a RepeatMasker's output and generate stats.
* rm_join_out.py: Concatenate OUT files from several RepeatMasker runs.
* rm_join_tbl.py: Join TBL files from several RepeatMasker runs.
* rm_homology.py: Find homologies searching with RepeatMasker sequence by sequence.
* run_abyss.py: Run ABySS assembler with a range of kmers.
* sat_cross_libraries.py: Generate FASTA files to assemble satellites with RepeatExplorer.
* sat_cutter.py: Cut satellites in a FASTA alignment to align homologous regions.
* sat_subfam2fam.py: Edit ALIGN file from RepeatMasker to calculate Kimura divergence by family instead of subfamily.
* satminer_quant.py: satMiner quantification protocol.
* search_issr_1nt.py: Count the number of occurrences of each nucleotide before an SSR region to design ISSR primers.
* search_issr_2nt.py: Count the number of occurrences of each dinucleotide before an SSR region to design ISSR primers.
* sequence_ref_alt.py: Get sequences with REF and ALT variants after a SNP calling.
* snp_calling_bchr.py: SNP calling for B chromosomes.
* snp_calling_bchr_z10.py: SNP calling for B chromosomes. Alt<10 in ZB.
* snp_calling_dn_ds: Perform a SNP calling to calculate the dN/dS from a BAM file.
* split_illumina.py: Split FASTQ files from Illumina sequencing in several files.
* sra_download.py: Download SRA files using a list of SRA accession numbers.
* ssaha2_run.py: Run SSAHA2 mapping in several libraries.
* ssaha2_run_multi.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads.
* ssaha2_run_multi_pe_se.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads, with paired and unpaired reads.
* ssaha2_run_multi_se.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads, using single-end libraries.
* stampy_protocol.py: Run Stampy mapping.
* subsampler.py: Subsample sequences from FASTA and FASTQ files.
* taxonomy_retrieve.py: Retrieve taxonomy using a Species list.
* trinity_extract_longest.py: Extract the longest contig for each gene in a Trinity assembly.
* trinotate_auto.py: Run Trinotate.
* unshuffle.py: Unshuffle a list of FASTQ files in _1.fastq and _2.fastq.
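
Several entries above describe simple FASTA operations (counting A, C, G, T and N, tabulating sequence lengths, filtering by a length threshold). The following is a minimal Python sketch of what those operations amount to. It is not the code of count_acgtn.py, fasta_sequence_len.py or fasta_filter_by_length.py: the function names, the in-memory example and the choice to keep sequences whose length equals the threshold are assumptions made for illustration only.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_acgtn(records):
    """Count A, C, G, T and N over all records (case-insensitive)."""
    counts = Counter()
    for _, seq in records:
        counts.update(seq.upper())
    return {base: counts[base] for base in "ACGTN"}


def sequence_lengths(records):
    """Return a list of (identifier, length) rows."""
    return [(name, len(seq)) for name, seq in records]


def filter_by_length(records, min_len):
    """Drop records shorter than min_len (assumed inclusive threshold)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


if __name__ == "__main__":
    # Hypothetical two-record multi-FASTA, held in memory for the demo.
    example = ">seq1 first record\nACGTN\nacgt\n>seq2\nGGC\n"
    records = list(read_fasta(StringIO(example)))
    print(count_acgtn(records))
    print(sequence_lengths(records))
    print(filter_by_length(records, 5))
```

For real files, the same functions can be fed an open file handle instead of `StringIO`; the scripts in this repository may differ in input handling and output format.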

https://github.com/fjruizruano/ngs-protocols