https://bitbucket.org/eulerian-ext/cpm2022/
Tip revision: d7eb1ac39fc78b1266f9ffddf7707c0d474c44c8 authored by huiping chen on 04 April 2022, 18:34:55 UTC
Source code for the paper 'Making de Bruijn Graphs Eulerian'
by G. Bernardini, H. Chen, G. Loukides, S. P. Pissis, L. Stougie, and M. Sweering, published at CPM 2022:

33rd Annual Symposium on Combinatorial Pattern Matching (CPM), 2022.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.

COMPILATION AND EXAMPLE
Running ./compile.sh compiles the implementation of our approach that is used in the experiments and runs it on a small example.

Before compiling, please install the Boost library and change the <path for Boost> in compile.sh to the location of your Boost installation.


INFORMATION ABOUT THE INPUT AND OUTPUT

1. CAB
Input parameters:
	toygraph: This is the input graph. In the file, each line represents an edge and its multiplicity in the graph, e.g. 'ACGG CGGT 10'.
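
The edge-list format described above can be read with a few lines of code. The following is only a minimal Python sketch and is not part of the distributed implementation. It assumes whitespace-separated fields, that "distinct edges" means the number of distinct (source, target) pairs, that "total edges" means the sum of the multiplicities, and that a node is imbalanced when its out-degree and in-degree differ. The function names are hypothetical.

```python
from collections import defaultdict


def parse_edge_list(lines):
    """Parse lines of the form '<source> <target> <multiplicity>'.

    Returns a dict mapping (source, target) to the summed multiplicity.
    """
    edges = defaultdict(int)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError("expected 3 fields, got: %r" % line)
        src, dst, mult = parts[0], parts[1], int(parts[2])
        edges[(src, dst)] += mult
    return dict(edges)


def imbalanced_nodes(edges):
    """Return {node: out_degree - in_degree} for nodes where they differ."""
    balance = defaultdict(int)
    for (src, dst), mult in edges.items():
        balance[src] += mult  # one outgoing edge per unit of multiplicity
        balance[dst] -= mult  # one incoming edge per unit of multiplicity
    return {node: b for node, b in balance.items() if b != 0}


# Small made-up sample in the documented format (assumption: not real data).
sample = ["ACGG CGGT 10", "CGGT GGTA 3"]
edges = parse_edge_list(sample)
distinct_edges = len(edges)
total_edges = sum(edges.values())
unbalanced = imbalanced_nodes(edges)
```

To apply the sketch to a file, pass an open file handle, e.g. parse_edge_list(open('toygraph')).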

Output:

Size of Distinct Edges = 1347
Total Edges = 1400
--------- Find #.Components ---------------
Number of vertices: 1319
Total number of components: 2
Runtime find all components: 20 milliseconds
------------get the information of graph-------------
#. Imbalanced Nodes  = 100
Runtime for create AC Machine : 2 milliseconds
Runtime for Phase 1 : 20 milliseconds
Cost for Component Connection :  2
Runtime Phase 2: 0 milliseconds
Runtime for all: 22 milliseconds
Extension Cost  = 251
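
The 'Total number of components' line in the output above can be illustrated with a small union-find sketch in Python. This is an assumption-laden illustration, not the code used by CAB: it assumes that components are the weakly connected components of the graph (edge directions ignored), and the function name count_components is hypothetical.

```python
def count_components(edge_pairs):
    """Count weakly connected components among nodes touched by an edge."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then follow parents
        # to the root, halving the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edge_pairs:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst  # merge the two components

    return len({find(node) for node in list(parent)})


# Made-up example: two chains that share no node, hence two components.
example = [("ACGG", "CGGT"), ("CGGT", "GGTA"), ("TTTT", "TTTA")]
n_components = count_components(example)
```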



2. SAB
Input parameters:
	toygraph: This is the input graph. In the file, each line represents an edge and its multiplicity in the graph, e.g. 'ACGG CGGT 10'.
	toygraph.superstring : This is a shortest common superstring (SCS), which is used to connect the graph. We use 'toygraph.fna' as input to the greedy algorithm [1] to get the 'toygraph.superstring' file. The algorithm can be found at https://github.com/tsnorri/compact-superstring.

Output:

Size of Distinct Edges = 1347
Total Edges = 1400
Runtime for create find edge to connect component : 1 milliseconds
Cost for connect component = 1
#. Imbalanced Nodes  = 100
Runtime for create AC Machine : 0 milliseconds
Runtime Phase 2: 0 milliseconds
Extension Cost  = 252



3. MGR
Input parameters:
	toygraph: This is the input graph. In the file, each line represents an edge and its multiplicity in the graph, e.g. 'ACGG CGGT 10'.
	overlap.txt : This is a file containing the all-pairs overlaps of the distinct edges in the toygraph. We use 'edges.txt' as input to the algorithm [2] to get the 'overlap.txt' file. The algorithm can be found at https://github.com/felipelouza/apsp.
	k-1 : This is the value of k-1, the length of the (k-1)-mers in the paper.
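
To make the notion of an all-pairs overlap concrete, here is a naive Python sketch that computes, for every ordered pair of distinct strings, the length of the longest suffix of the first that is a prefix of the second. It is quadratic and is not the algorithm of [2]; it also makes no claim about the actual layout of 'overlap.txt'. The function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_pairs_overlap(strings):
    """Return {(i, j): overlap(strings[i], strings[j])} for all i != j."""
    result = {}
    for i, a in enumerate(strings):
        for j, b in enumerate(strings):
            if i != j:
                result[(i, j)] = overlap(a, b)
    return result


# Made-up example strings (assumption: not taken from edges.txt).
table = all_pairs_overlap(["ACGGT", "GGTAC"])
```

In this example the suffix 'GGT' of the first string is a prefix of the second, and the suffix 'AC' of the second is a prefix of the first.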

Output:
	
Size of Distinct Edges = 1347
Number of  Vertex = 1347
---------------Greedy--------------------
Final Cost  = 260

Time taken by Greedy: 1599 miilliseconds


REFERENCES


[1] Jarno Alanko and Tuukka Norri. Greedy shortest common superstring approximation in compact space. In 24th SPIRE, volume 10508 of Lecture Notes in Computer Science, pages 1–13. Springer, 2017.

[2] Tustumi, W. H. A., Gog, S., Telles, G. P., Louza, F. A. (2016). An improved algorithm for the all-pairs suffix-prefix problem. Journal of Discrete Algorithms, 47, 34-43, http://www.sciencedirect.com/science/article/pii/S1570866716300053.
