Branches - origin: https://github.com/brettc/partitionfinder

06 April 2024, 01:02:24 UTC

Name	Target	Message	Date
HEAD	4201050	remove download stats for reasons I don't understand, it doesn't work.	10 March 2021, 21:10:50 UTC
refs/heads/buffer2	c92c486	Don't use stderr in ExternalProgramError because it’s empty now.	19 April 2017, 09:13:03 UTC
refs/heads/combined-speedup	e0c2f2d	remove pre-made task lists no evidence that they speed things up from empirical tests	14 October 2016, 01:33:49 UTC
refs/heads/develop	47d2328	fix bug in rclusterf this bug meant that if the median change was zero, we got stuck in an infinite loop.	03 December 2015, 22:24:34 UTC
refs/heads/feature-krmeans	3378c1c	implement krmeans this is an idea for an algorithm in which the zero entropy sites are reassigned the entropy of their nearest physical non-zero entropy site in the alignment. Works fine so far.	05 May 2016, 06:29:00 UTC
refs/heads/feature/1kite-bugfix2	ceaef5d	fixed bug in rcluster This is really fixed now. The issue was that the previous bug fix wasn’t bulletproof. It left the door open for a second bug, in which a single subset had an identical improvement score with >1 other subset. The new fix addresses this bug, as well as making sure that the original bug is fixed.	25 August 2015, 00:14:20 UTC
refs/heads/feature/DBSCAN	7d0e1a2	proto_DBSCAN	19 August 2015, 13:10:55 UTC
refs/heads/feature/complete_alignments	5ba17e0	improved user output for kmeans	15 September 2015, 05:19:47 UTC
refs/heads/feature/fabricated_subsets	0eb9964	fixed fabricated subset dealings at the end of the kmeans algorithm	27 February 2015, 03:32:50 UTC
refs/heads/feature/fastercluster	e394e8e	add two spaces	13 November 2015, 02:46:55 UTC
refs/heads/feature/fasttree	1f628e8	added write_fasta alignment function * FastTree requires interleaved phylip or fasta alignments. It is probably easier to write a fasta alignment so this function does that.	02 September 2014, 15:29:48 UTC
refs/heads/feature/fix-tests-pf2	55493fb	add init to make tests run	26 February 2015, 06:45:24 UTC
refs/heads/feature/garli_output	f9836c0	Merge branch 'develop' into feature/garli_output	30 April 2013, 01:14:54 UTC
refs/heads/feature/greedy-speedy	57715e4	new version of greedy algorithm that borrows from the cluster algorithm, and is now a whole lot quicker and more efficient.	07 September 2015, 22:25:35 UTC
refs/heads/feature/importcheck	a0ea4df	some very minor changes	18 September 2016, 23:35:19 UTC
refs/heads/feature/iqtree	77a3347	first attempt at a whole bunch of IQtree model commandlines including R4-R8, R10, R12, R15, R20. will require some empirical tests to see which of the R’s are really needed. Ultimately, a progressive algorithm like that in IQtree (keep adding R cats until the AICc starts dropping) would be better.	21 March 2017, 00:00:52 UTC
refs/heads/feature/kmeans-manyparts	a4a5718	make RAxML fall back on standard raxml with one CPU preparation for making the ML tree the default option	25 July 2016, 22:54:19 UTC
refs/heads/feature/krmeans2	6f760ad	new krmeans algorithm the previous version was naive. I reassigned invariant sites at every step, which just got the algorithm stuck early on. This version waits until the end of the kmeans algorithm to reassign sites, which is a much better idea. It appears to work well (in terms of AICc scores) on empirical datasets.	12 May 2016, 07:58:51 UTC
refs/heads/feature/lie-markov	d9ae785	include category for lie markov models in models.csv These models have the attractive and possibly important property that you can multiply them together along branches and still have lie markov models. I don’t know of any evidence that inferences go wrong if you don’t use these models, but it’s possible. For a full description see e.g.: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468350/	13 November 2015, 03:00:40 UTC
refs/heads/feature/merge-little-subsets	5851b59	change scheme name when cleaning schemes so that it’s very obvious in the best_scheme.txt that you are using a cleaned scheme.	11 November 2015, 22:02:54 UTC
refs/heads/feature/model_csv	7bc895f	implemented models.csv file, which is working in principle.	26 February 2015, 06:38:55 UTC
refs/heads/feature/morph_tiger	0fd45b4	Experimental morpho tiger rates This is a preliminary implementation to estimate tiger rates from a morphology alignment.	25 November 2015, 19:22:30 UTC
refs/heads/feature/morph_tiger_rates	28a0cf1	cleaned up print statements	13 June 2016, 15:27:58 UTC
refs/heads/feature/morphology	e0d9e80	added dummy morphology models for phyml	01 July 2013, 19:58:00 UTC
refs/heads/feature/morphology2	dcc88c3	clean up model list checking still needs more work, but this is a good start	21 November 2013, 21:31:59 UTC
refs/heads/feature/new_clustering	a172fb3	new relaxed clustering algorithm complete contains some much more efficient routines, including only making schemes once per step, and keeping a more efficient running tally of subset improvements.	14 December 2013, 07:36:42 UTC
refs/heads/feature/no-sleep	50dd23a	remove sleep condition I suspect this is slowing us down a lot…	29 September 2016, 04:12:00 UTC
refs/heads/feature/phyml-external	27137d2	Update saved results files for latest phyml	07 August 2015, 04:46:46 UTC
refs/heads/feature/profiling	2739914	PEP8: Newline at end of file	04 December 2013, 19:29:37 UTC
refs/heads/feature/pytables	61804e4	Merge branch 'develop' into feature/pytables	10 December 2013, 22:42:19 UTC
refs/heads/feature/raxml-external	de2d12a	Fix raxml build	07 August 2015, 07:41:11 UTC
refs/heads/feature/test-tiger-arrays	d40f4d3	Fix silly bugs after merge.	09 March 2015, 06:47:49 UTC
refs/heads/gh-pages	0dfbde4	github generated gh-pages branch	04 July 2011, 10:14:21 UTC
refs/heads/gui_test	c957068	basic gui working	30 June 2012, 03:57:23 UTC
refs/heads/h5-bug	a874845	ignore pyenv cruft	03 October 2017, 19:55:50 UTC
refs/heads/master	4201050	remove download stats for reasons I don't understand, it doesn't work.	10 March 2021, 21:10:50 UTC
refs/heads/paul_develop	d0cf8a8	removed confusing log statement *log.info() statement read the number of sites from each codon position being split. This was for testing and, in reality, doesn’t work for most datasets.	16 February 2015, 19:39:57 UTC
refs/heads/release/1.1.0	27309ff	Handle expected failure of DNA_Clustering3	16 May 2013, 03:25:16 UTC
refs/heads/speedup-threadpool	3c5bfd0	Create list of correct size for all tasks Should speed things up a little	13 October 2016, 22:59:29 UTC
refs/tags/h5-bugfix-1	a874845	ignore pyenv cruft	03 October 2017, 19:55:50 UTC
refs/tags/v0.9.1	eccc508	Use md5 to generate consistent length names	07 March 2012, 11:43:38 UTC
refs/tags/v2.0-pre1	a554d7f	Better user output for kmeans	14 August 2015, 11:13:37 UTC
refs/tags/v2.0-pre2	dba0794	Merge pull request #63 from brettc/feature/1kitebugfix1 Feature/1kitebugfix1	16 August 2015, 02:09:03 UTC
refs/tags/v2.0.0	41a5ef0	update PF2 citation the ppr is now accepted	22 November 2016, 04:50:43 UTC
refs/tags/v2.0.0-pre10	b6fcd69	Merge pull request #80 from brettc/feature/fastercluster Feature/fastercluster we now have the search option rclusterf, which is a faster version of the rcluster algorithm. I do not yet know exactly how well it compares to rcluster, though it should be quite a bit faster in certain situations (especially where the number of models is much smaller than the number of processors you have).	13 November 2015, 05:14:57 UTC
refs/tags/v2.0.0-pre11	47d2328	fix bug in rclusterf this bug meant that if the median change was zero, we got stuck in an infinite loop.	03 December 2015, 22:24:34 UTC
refs/tags/v2.0.0-pre12	2d28e48	remove unused test	14 March 2016, 04:59:39 UTC
refs/tags/v2.0.0-pre13	a307ab8	remove old debugging statement Embarrassing. Thanks to Ben Anderson for pointing this out. https://groups.google.com/forum/#!topic/partitionfinder/MSdcgxJ415w	18 March 2016, 20:29:32 UTC
refs/tags/v2.0.0-pre14	acb84f8	Merge pull request #104 from brettc/feature/morph Feature/morph	31 May 2016, 22:39:13 UTC
refs/tags/v2.0.0-pre15	a06b857	updated citation for PF2	18 September 2016, 23:41:24 UTC
refs/tags/v2.0.0-pre16	7f70beb	fix windows bug reported here: https://groups.google.com/forum/#!topic/partitionfinder/4pAkDOHB5FM the bug was a hangover from the TIGER days.	21 September 2016, 05:55:35 UTC
refs/tags/v2.0.0-pre17	e561bfa	update raxml version to https://github.com/stamatak/standard-RAxML/commit/5d9558ac18ddb2c69dd75a9dc971bcf541bbfeb2	22 September 2016, 06:29:18 UTC
refs/tags/v2.0.0-pre3	e7529ea	updated gitignore	04 May 2015, 07:21:07 UTC
refs/tags/v2.0.0-pre4	97b68ef	updated manual contents	25 August 2015, 03:55:24 UTC
refs/tags/v2.0.0-pre5	8bd784c	changed user output for cluster	25 August 2015, 05:17:09 UTC
refs/tags/v2.0.0-pre6	fb4fcd3	remove -U option for RAxML it might be causing issues, and won’t work with morphology data.	28 August 2015, 23:52:38 UTC
refs/tags/v2.0.0-pre7	b106624	update kmeans test since we now disallow multiple subsets as input	12 September 2015, 07:55:45 UTC
refs/tags/v2.0.0-pre8	83be0bd	Merge pull request #70 from brettc/feature/complete_alignments Feature/complete alignments	15 September 2015, 05:22:05 UTC
refs/tags/v2.0.0-pre9	a2d3b33	updated manual added in --all-states and --min-subset-size	02 October 2015, 04:17:43 UTC
refs/tags/v2.1.0	19d7fe4	Disable k-means for all but morphology # Why? A paper came out yesterday (http://www.sciencedirect.com/science/article/pii/S1055790316302780) that raises some serious concerns about the k-means algorithm, suggesting that it might lead to bad inferences on empirical datasets. I had spoken to the authors of the paper when they were revising it, but wasn’t aware until yesterday of the details of the problems they’d uncovered. Given how odd the inferences from k-means look, we decided to disable the method for all but morphological analyses (see below). # But there was a warning before, why disable it now? Our previous concerns came from our own realisation about one aspect of the method (that it lumps together all invariant sites) and some concerns raised by folks in Brian Moore’s lab this year. Specifically, we put in the warning when we learned that some simulated datasets that were analysed with k-means partitioning schemes led to bad inferences. I was hopeful that these simulations would be corner cases, and/or that one aspect of the simulations where k-means was misleading (that you got implausibly long trees) would mean that it would be trivial to diagnose cases in which there were issues. In addition, we had tried the method on lots of empirical datasets, and never seen any issues. Indeed, on at least one dataset the k-means tree seemed much more reasonable than trees we were getting from other methods. (I note that Brian Moore and co were less optimistic, and suggested from the start that we should consider disabling the method.) The empirical results in the recent paper suggest otherwise, and suggest that the best we can say of k-means for now is: ‘you should try other methods too, and if the methods disagree, we’d suggest ignoring the k-means tree’. On this basis, there seems little point keeping k-means as an available method: if you can’t trust the results, why bother? # I liked/used it, what should I do? Use standard methods, e.g. partitioning by codon position and locus, instead. Even better (if your dataset is small enough) use the automatic partitioning solutions in BEAST2 and/or MrBayes (google AutoParts). If you have used k-means to make an inference, it would be worthwhile to check that the inference is robust when you use a standard partitioning scheme too. # What’s the problem? We don’t know for sure, but it’s likely to be related to the fact that k-means separates out all invariant sites into a single subset. I presented on this at SMBE in July this year, but this has a couple of downstream effects. First, it makes AIC/AICc/BIC scores look really great, because when you have all the invariant sites together, you can estimate a rate of zero and get likelihoods of 1 for all of those sites. That’s a bit silly, and something I wish we’d realised earlier. Second, and more seriously for inference, putting all the invariant sites into one subset means that the other subsets have NO invariant sites. If you then analyse these without a model that accounts for this (e.g. with some kind of ascertainment bias) this is likely to mess with estimates of rates, branch lengths, and topologies. It’s not totally obvious yet how common the problem is, but now we’ve seen it in simulated and empirical datasets, it seems wise to can the method until we completely understand the problem and can fix it. # Are you going to fix it? We’re working on it, but it will take a while. Apart from anything else, we are going to be exceptionally cautious in proposing more new methods related to this one. # But why is it still available for morphology? We’ve kept it in there for morphological datasets as an experimental method, and provide lots of warnings when you run the code and in the output that it’s experimental, untested etc. We did this because morphological datasets are different: they tend to have no invariant sites, and people tend to use models that correct for ascertainment bias. Because of that, it seems worthwhile to leave it in. We are working on testing it as exhaustively as possible for these datasets. # I want to use it anyway If you want to use it for empirical inferences, just don’t. But if you want to use it to try and figure out why it doesn’t work, and how you might improve it, then all you need to do is edit out the line that raises the error. # I have questions… Post on the google group or raise an issue on GitHub.	02 December 2016, 05:08:47 UTC
refs/tags/v2.1.1	63d5af1	bump version number	06 December 2016, 01:45:13 UTC