4ac2b4b | Erik Garrison | 13 March 2016, 10:57:11 UTC | update xg to try to work around OSX build failure | 13 March 2016, 10:57:11 UTC |
637e5a4 | Erik Garrison | 12 March 2016, 20:48:11 UTC | fix xg.o build | 12 March 2016, 20:48:11 UTC |
d11a193 | Erik Garrison | 12 March 2016, 15:42:49 UTC | update xg to version which allocates gPBWT dynamic int vector on heap This resolves segfaults with xg, and hopefully does here as well. | 12 March 2016, 15:42:49 UTC |
ec84137 | Erik Garrison | 11 March 2016, 17:25:33 UTC | updates paper | 11 March 2016, 17:25:33 UTC |
11ad1ec | Erik Garrison | 11 March 2016, 17:14:34 UTC | add paper draft | 11 March 2016, 17:14:34 UTC |
db48279 | Erik Garrison | 11 March 2016, 14:34:51 UTC | move xg to point to vgteam | 11 March 2016, 14:34:51 UTC |
862bcb9 | Erik Garrison | 11 March 2016, 14:19:04 UTC | adds DYNAMIC submodule to deps | 11 March 2016, 14:19:04 UTC |
6cf2864 | Erik Garrison | 11 March 2016, 11:15:17 UTC | Merge pull request #253 from JervenBolleman/raptorinmake Move raptor deps into getdeps in make file for ubuntu builds | 11 March 2016, 11:15:17 UTC |
b5f0dd7 | Erik Garrison | 10 March 2016, 13:19:09 UTC | Merge branch 'master' of https://github.com/ekg/vg | 10 March 2016, 13:19:09 UTC |
049d652 | Jerven Bolleman | 10 March 2016, 12:45:42 UTC | Move raptor deps into getdeps in make file | 10 March 2016, 12:45:42 UTC |
d265694 | Erik Garrison | 10 March 2016, 11:59:01 UTC | Merge pull request #249 from adamnovak/vcf-paths Vcf paths | 10 March 2016, 11:59:01 UTC |
47c121c | Erik Garrison | 10 March 2016, 11:43:54 UTC | redland-utils* | 10 March 2016, 11:43:54 UTC |
db4977a | Erik Garrison | 10 March 2016, 11:37:18 UTC | fixed typo (librdf-dev*) | 10 March 2016, 11:37:18 UTC |
c059040 | Erik Garrison | 10 March 2016, 11:22:36 UTC | Merge branch 'master' of https://github.com/ekg/vg | 10 March 2016, 11:22:36 UTC |
8069f20 | Adam Novak | 09 March 2016, 18:21:49 UTC | Don't use size() when we can use empty() | 09 March 2016, 18:55:58 UTC |
8eb57dc | Adam Novak | 08 March 2016, 01:02:51 UTC | Remove some validation calls that don't do what I thought they did | 09 March 2016, 18:55:56 UTC |
9d29fde | Adam Novak | 08 March 2016, 01:00:26 UTC | Only keep the non-phase paths in the graph; otherwise we can make really large protobuf messages for graph chunks that we can't read back in | 09 March 2016, 18:55:54 UTC |
1e9036e | Adam Novak | 08 March 2016, 00:24:54 UTC | Don't place multiple mappings for a phase path on a node; this compensates for overlapping ref alleles which may all be visited. Contradictory calls will still produce contradictory phase paths | 09 March 2016, 18:55:50 UTC |
ec2520a | Adam Novak | 07 March 2016, 23:26:23 UTC | Make phase path splitting acceptably fast | 09 March 2016, 18:55:48 UTC |
cfc97d0 | Adam Novak | 19 November 2015, 20:58:23 UTC | Revise path rank handling Now instead of path_mapping_order storing an order number for each mapping, we just use the rank field on the mapping. We also try and set reasonable ranks when mappings are appended, and don't allow duplicate rank values to slip past when recalculating the ranks based on the actual list ordering. We also add a mappings_by_rank map that can get the mapping with a given path name and rank efficiently. This eliminates the O(n^2) factor when adding many mappings to the same node to a path, as we only have to check if a mapping with the same rank has already been added (by an efficient lookup in mappings_by_rank), rather than having to scan the set of all mappings to that node for one that matches in rank and orientation. The Paths::has_mapping method has been changed so you can no longer pass in a Mapping&. It may need more overloads, but the semantics of querying by a Mapping reference weren't entirely clear (for example, the actual edits were never checked), so I've decided to switch over to overloads that query on particular combinations of Mapping properties instead. This commit includes some fixes to make it work correctly with the MSGA path inclusion, which the original commit on the bake-off branch did not have. | 09 March 2016, 18:55:45 UTC |
7f2468b | Adam Novak | 07 March 2016, 20:18:42 UTC | Run through phase paths and break them where some material was not visited | 09 March 2016, 18:55:43 UTC |
7d0ba06 | Adam Novak | 07 March 2016, 19:37:50 UTC | Don't break on a reference-only node if it isn't visited | 09 March 2016, 18:55:40 UTC |
9158514 | Adam Novak | 07 March 2016, 18:50:18 UTC | Allow middle and alt nodes to be runs of nodes to properly handle ref alleles of different lengths all needing to be chopped out starting at the same position | 09 March 2016, 18:55:38 UTC |
c6eb6b5 | Adam Novak | 05 March 2016, 00:18:31 UTC | Found a bug relating to breaking out ref alleles when the ref allele may overlap with another broken out ref allele | 09 March 2016, 18:55:36 UTC |
b7b422d | Adam Novak | 04 March 2016, 23:33:35 UTC | Handle missing intervening sequence and keep paths around | 09 March 2016, 18:55:34 UTC |
04587f5 | Adam Novak | 04 March 2016, 22:31:26 UTC | Can now go through normal from_alleles process with ref alleles alternative to alts | 09 March 2016, 18:55:30 UTC |
d1a1e70 | Adam Novak | 04 March 2016, 20:53:36 UTC | Bringing phase visits through into plans | 09 March 2016, 18:55:28 UTC |
1c13a5d | Adam Novak | 04 March 2016, 20:31:40 UTC | Don't use new if we can help it | 09 March 2016, 18:55:26 UTC |
5cec7b9 | Adam Novak | 04 March 2016, 20:25:54 UTC | Replace pointer in the Plans with a move operation | 09 March 2016, 18:55:24 UTC |
d3ee974 | Adam Novak | 04 March 2016, 20:16:41 UTC | Fix bit vector merge-in | 09 March 2016, 18:55:21 UTC |
1569bd2 | Adam Novak | 04 March 2016, 00:43:08 UTC | Still doesn't work, but no longer breaks when not running the sample phasing code | 09 March 2016, 18:55:19 UTC |
ac20113 | Adam Novak | 03 March 2016, 00:43:35 UTC | Added flag-based encoding of phase sets; doesn't work | 09 March 2016, 18:55:16 UTC |
cf4a96a | Adam Novak | 02 March 2016, 23:41:49 UTC | Throw out old phase block representation | 09 March 2016, 18:55:14 UTC |
3580f4f | Adam Novak | 02 March 2016, 23:36:43 UTC | Save what I have; it looks like treating phase blocks as lists or vectors of allele indexes won't work. | 09 March 2016, 18:55:11 UTC |
f73f0a7 | Adam Novak | 02 March 2016, 22:55:37 UTC | Don't bother calling slice_alleles at all; just make big nodes and dice them | 09 March 2016, 18:55:09 UTC |
90d3c14 | Adam Novak | 02 March 2016, 22:52:41 UTC | Document what is up with slice_alleles | 09 March 2016, 18:55:07 UTC |
9b2e78c | Adam Novak | 02 March 2016, 22:09:54 UTC | Switch to vectors instead of sets to hold variants since we want to point to them. | 09 March 2016, 18:55:04 UTC |
64a23f8 | Adam Novak | 02 March 2016, 21:54:56 UTC | Implement semi-continuous variant parsing, and add unimplemented phase block path generation option | 09 March 2016, 18:55:02 UTC |
8547acf | Erik Garrison | 09 March 2016, 17:18:17 UTC | debug-only exhaustive alignment checking This calls some single-threaded code in xg which is thread-blocking. That's a pain. We really only need it when debugging. A proper fix is forthcoming. | 09 March 2016, 17:18:17 UTC |
558aa83 | Erik Garrison | 09 March 2016, 16:47:25 UTC | add "greedy accept" mode to exact match mapper | 09 March 2016, 16:47:25 UTC |
f4f91f2 | Erik Garrison | 09 March 2016, 16:21:08 UTC | avoid picking up the paths from the xg index in the mapper This blocks due to poor design of xg path handling (there is a single threaded compressed suffix array that's used to store the path names, which should be changed). In any case, we can avoid getting the paths by specifying the call to expand_context correctly, which resolves the basic issue of thread blocking. | 09 March 2016, 16:21:08 UTC |
9a53c51 | Erik Garrison | 09 March 2016, 15:18:09 UTC | incomplete development of an index-driven mapping algorithm In principle, we should be able to use the series of exact matches provided by the forward MEM method to determine likely alignment positions. We shouldn't need to align these segments using GSSW, because we already know they match. Here, I've attempted to develop a method that provides a single "optimal" alignment using the forward MEM list for a given read. We convert the exact matches into alignments. Then, the regions between the forward-MEMs get aligned in the regions of the graph which are near the exact matches that flank them. The resulting vector of multi-alignments can be resolved into a single alignment by passing over it with resolve_banded_multi. This is all nice, but it does not seem to work well in the presence of high error rates, which trigger many spurious forward MEM matches. I will come back to this when we have true MEM functionality in GCSA2. | 09 March 2016, 15:18:09 UTC |
ecb8a39 | Erik Garrison | 07 March 2016, 17:26:47 UTC | stub functions for converting exact matches into alignment fragments | 07 March 2016, 17:26:47 UTC |
64d2bc8 | Erik Garrison | 06 March 2016, 10:10:09 UTC | removes corruption-inducing flip of node reassignment in simplify to sibs | 06 March 2016, 10:10:09 UTC |
675b077 | Erik Garrison | 05 March 2016, 10:51:11 UTC | Merge pull request #248 from JervenBolleman/245_uri_escape url encode path names, which can have pipes etc... #245 | 05 March 2016, 10:51:11 UTC |
7a405eb | Jerven Bolleman | 04 March 2016, 09:45:22 UTC | url encode path names, which can have pipes etc... #245 | 04 March 2016, 09:45:22 UTC |
410985b | Erik Garrison | 03 March 2016, 17:13:33 UTC | don't touch the dead mapping! Getting the is_reverse flag from the dead mapping seems to work almost all the time, but electric fence doesn't like it one bit. | 03 March 2016, 17:13:33 UTC |
eb2551b | Erik Garrison | 02 March 2016, 17:07:54 UTC | try to ensure that we don't corrupt paths by re-compacting ranks after removing null nodes | 02 March 2016, 17:07:54 UTC |
36d8cdf | Erik Garrison | 02 March 2016, 13:03:38 UTC | Merge pull request #244 from JervenBolleman/220_import_turtle.3 220 import turtle.3 | 02 March 2016, 13:03:38 UTC |
77533ea | Jerven Bolleman | 02 March 2016, 11:03:14 UTC | brew install rasqal | 02 March 2016, 11:03:14 UTC |
d4c940c | Jerven Bolleman | 02 March 2016, 08:47:28 UTC | Add sparql test case #220 | 02 March 2016, 09:35:27 UTC |
65f57ec | Jerven Bolleman | 01 March 2016, 13:40:27 UTC | More tests for the RDF including roundtrip | 02 March 2016, 09:35:27 UTC |
311825f | Jerven Bolleman | 24 February 2016, 15:08:10 UTC | Reuse -C option to make turtle output smaller | 02 March 2016, 09:35:27 UTC |
e085113 | Jerven Bolleman | 24 February 2016, 08:32:52 UTC | Clean up rdf parsing #220 Memory leak in the way edges were instantiated in the from_turtle parser view option code aligned for readability, also turtle in selection aligned Added inline comments and improved error messages on RDF parse. #220 Reduce code duplication in from_turtle method | 02 March 2016, 09:35:26 UTC |
853ae90 | Jerven Bolleman | 23 February 2016, 02:59:32 UTC | semi colon missing in brew install line Trying to fix travis build raptor2.h issues #220 Fix raptor include dir Fix missing space #220 Fix #220 | 02 March 2016, 09:35:26 UTC |
dea365a | Jerven Bolleman | 23 February 2016, 03:03:34 UTC | Add roundtrip test #220 | 02 March 2016, 09:35:26 UTC |
59cd940 | Jerven Bolleman | 21 February 2016, 16:24:23 UTC | Able to convert raptor_term to std::string #220 Managing to be able to talk to VG compile wise in the turtle_parse #220 We have a good start on the turtle parser #220 Nodes written into RDF are parsed out again #220 Sequences attached to nodes parsed out and saved #220 Parsing and adding edges #220 Working on matching the paths into the vg graph. #220 however, mapping not constructed correctly. Working rdf import #220, assumes rdf was generated by vg in the first place. | 02 March 2016, 09:35:26 UTC |
1b50dfa | Jerven Bolleman | 21 February 2016, 16:55:51 UTC | Try to set the include path to find raptor #220 | 02 March 2016, 09:35:26 UTC |
406c442 | Jerven Bolleman | 21 February 2016, 16:05:02 UTC | Do not put a hash in a prefixed uri, escape the hash symbol | 02 March 2016, 09:35:25 UTC |
1fb3c5f | Jerven Bolleman | 21 February 2016, 15:50:23 UTC | Redland parsing works #220 | 02 March 2016, 09:35:25 UTC |
ea0b601 | Jerven Bolleman | 21 February 2016, 14:05:59 UTC | librdf-dev is needed for compile on linux #220 | 02 March 2016, 09:35:25 UTC |
1114b43 | Jerven Bolleman | 21 February 2016, 14:03:20 UTC | Allow input from stdin for turtle parsing without crashing #220 | 02 March 2016, 09:35:25 UTC |
3598e30 | Jerven Bolleman | 21 February 2016, 13:49:05 UTC | rapper integration compiles #220 | 02 March 2016, 09:35:24 UTC |
d0e9195 | Jerven Bolleman | 21 February 2016, 09:30:18 UTC | #220 Start parsing CLI options to turn on N-Triples parsing | 02 March 2016, 09:35:24 UTC |
edd4fa4 | Erik Garrison | 02 March 2016, 09:16:54 UTC | Merge pull request #243 from JervenBolleman/only_trusty Build on trusty | 02 March 2016, 09:16:54 UTC |
e67c7c8 | Jerven Bolleman | 02 March 2016, 08:09:30 UTC | Build on trusty | 02 March 2016, 08:09:30 UTC |
fe432eb | Erik Garrison | 28 February 2016, 08:21:04 UTC | rebuild mapping ranks to avoid invalid memory access and undefined behavior | 28 February 2016, 08:21:04 UTC |
6c88c26 | Erik Garrison | 27 February 2016, 18:11:18 UTC | hoping that the OSX test failed because of a grep incompatibility | 27 February 2016, 18:11:18 UTC |
e952184 | Erik Garrison | 27 February 2016, 16:56:51 UTC | switch back node forwarding to previous model, resolve bugs in tests | 27 February 2016, 16:56:51 UTC |
d94f496 | Erik Garrison | 27 February 2016, 15:57:51 UTC | avoid attempting to resolve softclips on unmapped reads | 27 February 2016, 15:57:51 UTC |
fe7968a | Erik Garrison | 27 February 2016, 15:41:54 UTC | sorting the graph will do some weird things Rather, avoid cycles by checking if we've linked to the node before at the current level in the unroll. | 27 February 2016, 15:41:54 UTC |
8ed0180 | Erik Garrison | 27 February 2016, 14:21:46 UTC | use greedy cycle breaking to decrease the steps required to dagify/unroll This also resolves a problem with self linking nodes I ran into. Test included. | 27 February 2016, 14:21:46 UTC |
4a577bf | Erik Garrison | 27 February 2016, 11:51:33 UTC | hook up MEM mapping in msga This works. But, it has exposed many bugs due to the change in the way the alignments work. The MEM approach appears to have a higher intrinsic mapping rate, and as a result we are getting new kinds of weird structures in the graph that break normalization routines. To resolve this I then restructured the remove_node_forwarding_edges code to use NodeTraversals to enumerate the edges to forward. I worry that there was not a bug here, and I haven't run tests to see if this is introducing a problem, so it may be reverted. I'm checkpointing here because the rabbit hole of bugs has just gone three levels deep and it's getting a little too real to continue without rigging up some climbing gear to get out if it gets even more intense. In rewriting remove_node_forwarding_edges, I generated graphs with many self loops on nodes. These are not handled properly in the unfold/dagify unrolling, it seems; will check. There is a lot of debugging code floating around now because I'm not sure which bugs have been resolved yet. | 27 February 2016, 11:51:33 UTC |
9bf2861 | Erik Garrison | 25 February 2016, 17:42:04 UTC | add a single test for mem mapping In the future I hope we move to it everywhere, but for now it's just a new thing. | 25 February 2016, 17:42:04 UTC |
eaa22e4 | Erik Garrison | 25 February 2016, 17:36:28 UTC | Merge pull request #226 from buske/patch-1 Homebrew README improvements | 25 February 2016, 17:36:28 UTC |
c385c7c | Erik Garrison | 25 February 2016, 17:35:39 UTC | set the kmer size when we are using align_kmers on a rocksdb | 25 February 2016, 17:35:39 UTC |
fffb018 | Erik Garrison | 25 February 2016, 17:12:07 UTC | wire up MEM mapper, resolve MEM resolution problem If we exhaust the range, we need to step back to the last range, which wasn't being done with the result being a failure to get anything but the first MEM in the read. Also wires MEM mapping into the CLI. For good performance, the neighborhood expansion parameter to the mapper needs to be increased. I'm finding `-n 5` to be good. The problem is that we get many soft clips or many subgraphs if we don't manage to expand the context out to the full subgraph between successive MEMs that are separated by a mismatch. | 25 February 2016, 17:12:07 UTC |
5dca235 | Erik Garrison | 25 February 2016, 14:29:20 UTC | avoid corruption when constructing mem sequence | 25 February 2016, 14:29:20 UTC |
4bb9aa2 | Erik Garrison | 25 February 2016, 13:53:06 UTC | avoid path corruption in path extension Don't overwrite the last element of one path while we're trying to figure out how to add onto it. | 25 February 2016, 13:53:06 UTC |
d0732f0 | Erik Garrison | 24 February 2016, 14:33:07 UTC | wires mem mapper up to CLI; use by not giving a kmer and passing gcsa index | 24 February 2016, 14:33:07 UTC |
8c57498 | Erik Garrison | 24 February 2016, 13:59:39 UTC | remove unused mapping function (align_simple) | 24 February 2016, 13:59:39 UTC |
c48fbd4 | Erik Garrison | 24 February 2016, 13:56:42 UTC | implementation of mem mapper | 24 February 2016, 13:56:42 UTC |
b9d5685 | Erik Garrison | 24 February 2016, 12:09:08 UTC | ensure that alignments are flipped in sim | 24 February 2016, 12:09:08 UTC |
25bf083 | Erik Garrison | 24 February 2016, 12:00:49 UTC | provide CLI API for MEMs in vg find + test | 24 February 2016, 12:00:49 UTC |
1bd21c3 | Erik Garrison | 24 February 2016, 10:53:39 UTC | remove unused variable from mapper main | 24 February 2016, 10:53:39 UTC |
793a62e | Erik Garrison | 24 February 2016, 10:52:44 UTC | add MEM interface, stub alignment function Untested, but it looks cool. | 24 February 2016, 10:52:44 UTC |
f3ae844 | Erik Garrison | 24 February 2016, 09:29:24 UTC | serial alignment merge with less copying The parallel merge appears to become disordered with high parallelism. Additionally, it saves little time due to the huge number of alignment copies which must be made. In the interest of simplifying things and resolving this issue, I've moved the merge back to a serial process that builds up a single alignment. | 24 February 2016, 09:29:24 UTC |
84e7fbd | Erik Garrison | 24 February 2016, 09:29:08 UTC | make a new mars based viz in vg2pdf | 24 February 2016, 09:29:08 UTC |
3222c52 | Erik Garrison | 24 February 2016, 09:28:42 UTC | remove excessive checks for graph validity in msga | 24 February 2016, 09:28:42 UTC |
77c2a89 | Erik Garrison | 23 February 2016, 10:14:31 UTC | Merge pull request #231 from buske/issue-230 Issue #230: input format should always default to VG | 23 February 2016, 10:14:31 UTC |
4cebf34 | Orion Buske | 23 February 2016, 09:30:14 UTC | [test] Update tests for issue #230 | 23 February 2016, 09:30:14 UTC |
f8daaa8 | Orion Buske | 23 February 2016, 09:29:17 UTC | Issue #230: make input always default to VG | 23 February 2016, 09:29:17 UTC |
bbc2b93 | Erik Garrison | 23 February 2016, 09:04:51 UTC | vgteam FTW! | 23 February 2016, 09:04:51 UTC |
c22842b | Erik Garrison | 23 February 2016, 06:58:41 UTC | adds handy neato2browser script | 23 February 2016, 06:58:41 UTC |
dcdd7fd | Erik Garrison | 23 February 2016, 04:08:30 UTC | actually complete the merge of DAGify | 23 February 2016, 04:08:30 UTC |
ccf57ee | Erik Garrison | 23 February 2016, 03:15:08 UTC | Merge branch 'dagify' | 23 February 2016, 03:15:08 UTC |
217cfe4 | Erik Garrison | 23 February 2016, 03:14:31 UTC | Merge branch 'master' into dagify | 23 February 2016, 03:14:31 UTC |
e830424 | Erik Garrison | 23 February 2016, 03:11:28 UTC | Merge branch 'master' of github.com:ekg/vg | 23 February 2016, 03:11:28 UTC |
fc88ab1 | Erik Garrison | 23 February 2016, 02:52:38 UTC | Merge branch 'steponforwardinpredicate' of https://github.com/JervenBolleman/vg | 23 February 2016, 02:52:38 UTC |
c6ac02e | Erik Garrison | 23 February 2016, 02:41:28 UTC | dagify: count the minimum length from the first edge after the root Without this we fail to contain all the paths in the graph less than our target length in the dagified version. | 23 February 2016, 02:41:28 UTC |