https://github.com/ekg/freebayes

Revision Author Date Message Commit Date
698098e Setting Release-Version v0.9.14 03 March 2014, 14:43:15 UTC
c04d877 resolve bug in implementation of Ewens' Sampling Formula Thanks to Severine Catreux for the catch. This should have the largest effect on sites that are multiallelic. The quality of these will be diminished slightly, in line with the bug. This may reduce the overall power to detect variants at multiallelic loci. It is possible that this is not the most-correct way to utilize the ESF in freebayes, as the assumption of a constant mutation rate for each locus in the genome is inadequate. At some loci the mutation rate is orders of magnitude higher than elsewhere. Problematically, these are the same places where Illumina data (and any PCR-based library) tends to present artifacts. 03 March 2014, 14:35:11 UTC
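For readers unfamiliar with the Ewens' Sampling Formula named in this commit, the following is a minimal sketch of the textbook formula under a single constant mutation parameter. It is not freebayes' implementation; the function name `ewens_sampling_probability` and the `theta` values are illustrative assumptions.

```python
from math import factorial, prod


def ewens_sampling_probability(partition, theta):
    """Probability of an allelic partition under the Ewens Sampling Formula.

    partition[j - 1] is a_j, the number of distinct alleles observed exactly
    j times, so the sample size is n = sum(j * a_j). theta is the (assumed
    constant) scaled mutation rate.
    """
    n = sum(j * a for j, a in enumerate(partition, start=1))
    # Rising factorial: theta * (theta + 1) * ... * (theta + n - 1)
    rising = prod(theta + i for i in range(n))
    term = 1.0
    for j, a in enumerate(partition, start=1):
        term *= theta ** a / (j ** a * factorial(a))
    return factorial(n) / rising * term


# All partitions of a sample of three chromosomes:
# three singletons, one singleton plus one doubleton, one tripleton.
partitions_n3 = [(3, 0, 0), (1, 1, 0), (0, 0, 1)]
total = sum(ewens_sampling_probability(p, 0.001) for p in partitions_n3)
```

Because the formula is a probability distribution over partitions, `total` sums to 1 for any positive `theta`. The single `theta` argument is exactly the constant-mutation-rate assumption the commit message calls inadequate.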
a830efd fix targeting issue in vcflib, add some more info to allele obs debugging 19 February 2014, 17:35:48 UTC
0e8c2f2 add warning that region cannot be set on haplotype basis file 19 February 2014, 14:41:27 UTC
dc0f063 Merge pull request #66 from andersje/git_to_http changed git:// to https:// 19 February 2014, 00:37:29 UTC
4e21366 changed git:// to https:// 18 February 2014, 22:10:54 UTC
c807ef8 another update to version Make sure it sticks. 14 February 2014, 22:00:52 UTC
f2efa6d Setting Release-Version v0.9.13 14 February 2014, 21:57:37 UTC
c5c8aa8 add check that submodules are downloaded to makefile 14 February 2014, 21:56:19 UTC
d298b4e updated version 14 February 2014, 21:40:37 UTC
21a2951 Setting Release-Version v9.9.13 14 February 2014, 21:39:33 UTC
dfddc43 Merge pull request #62 from pmarks/master fix crasher in homogenizeAllele. use map.rbegin() to the entry with the... 24 January 2014, 17:12:04 UTC
e1203fc resolve #63 by update to documentation 24 January 2014, 17:05:52 UTC
dda84cb fix crasher in homogenizeAllele. use map.rbegin() to the entry with the largest key 23 January 2014, 06:00:35 UTC
a97dbf8 Setting Release-Version v9.9.11 16 January 2014, 23:29:42 UTC
e8e862c Setting Release-Version v9.9.11 16 January 2014, 23:29:18 UTC
53e147a handle = and X in alignment cigars The handling at present isn't intelligent, but by treating this in the same way as 'M' we can properly parse CIGARs which have them. Better would be to skip some of the comparison logic during parsing. 16 January 2014, 23:28:16 UTC
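The approach described in this commit, treating `=` and `X` the same way as `M`, can be sketched as follows. This is a hypothetical helper, not the freebayes parser; merging adjacent operations after folding is an added convenience assumed here, not something the commit states.

```python
import re

# One CIGAR element: a length followed by a single operation character.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs, folding '=' and 'X' into 'M'."""
    ops = []
    for length, op in CIGAR_ELEMENT.findall(cigar):
        if op in "=X":
            op = "M"  # sequence match/mismatch handled like an alignment match
        if ops and ops[-1][1] == op:
            # Merge neighbours of the same type, e.g. 10=1X5= becomes 16M.
            ops[-1] = (ops[-1][0] + int(length), op)
        else:
            ops.append((int(length), op))
    return ops


ops = parse_cigar("10=1X5=2I3=")
```

Here `ops` is `[(16, 'M'), (2, 'I'), (3, 'M')]`, so downstream code that only knows about `M` needs no changes.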
d15f1e6 Setting Release-Version v9.9.11 16 January 2014, 23:27:47 UTC
90b7027 output ALT='.' in --report-monomorphic 15 January 2014, 22:48:14 UTC
ffbc611 resolve inconsistencies in vcflib from previous commit 15 January 2014, 22:42:01 UTC
103d814 resolve #59 Set an internal reference sample name to avoid polluting the CNV map table with the reference sequence names, which can sometimes intersect with actual sequence names (e.g. 1, 2, 3... can both be sequence names and sample names). 15 January 2014, 22:32:06 UTC
ff98393 ... done 25 December 2013, 21:37:49 UTC
0e8e881 completes previous commit 25 December 2013, 21:37:13 UTC
350ab0b removed horizontal lines from README.md github already has these! Cool. 25 December 2013, 21:36:20 UTC
294c1f0 extensive updates to documentation/manual (README.md) freebayes now has a much better manual! Please provide feedback and keep the questions coming! Happy holidays to all. 25 December 2013, 21:33:39 UTC
3c54afe remove errant warning message "What is this?" ... shouldn't happen. 18 December 2013, 23:19:58 UTC
47a713e change behavior of --report-genotype-likelihood-max This does not affect typical operation vs. the previous commit. 18 December 2013, 20:17:53 UTC
6aa6591 avoid and resolve #6 Some aligners will place reads past the ends of short reference sequences that aren't flanked by Ns. Before this commit, freebayes would choke on this kind of issue, and say it was "Unable to read reference sequence base past end of current reference sequence." Now, it just registers the error and continues, as it probably always should have. 18 December 2013, 16:11:45 UTC
9ed353c version 0.9.10 This begins the v1 release candidate series. The change in version numbering style recognizes the need to move the version past 0.9.9! However, version 1.0 is meant to coincide with paper publication or submission. Soon! A large number of small changes land in this version, including changes to the genotype likelihood calculations: * Support for correct genotyping of large deletions. This fix involved correcting handling of "partial" observations of the reference allele. * Default use of mapping quality in genotype likelihood calculations (this is disabled via the --standard-gls flag). Mapping qualities are incorporated via the Effective Base Depth metric from snpTools (Baylor). * Exclusion of mapping quality 0 mappings. (This can be disabled with --min-mapping-quality -1 --standard-gls). The change in genotype likelihood calculations means these are not considered meaningful. These three changes improved the ROC-AUC for indels in simulation by about 1%, which is substantial given that it now stands at 0.95. No difference was recorded for SNPs. The commit also includes a change relevant to the output of variant records. * Removal of variant parsing routines meant to standardize VCF records by removing redundant information from the ALT/REF pairs. These manipulation routines would, for example, remove portions of the REF and ALTs which were always matching. The canonical case would be a haplotype call in which only a SNP was ultimately called. The call would occur over several bp of reference. The manipulation is complex and caused errors in many instances, which leads to many maintenance issues and sad users. At present, VCF normalization is best-handled by external utilities, such as vcfallelicprimitives from vcflib. Retaining the haplotype information in the call in this way clarifies which calls were attempted haplotype calls and which weren't, regardless of output state. 10 December 2013, 00:25:15 UTC
9755e43 Setting Release-Version v0.9.10 10 December 2013, 00:24:08 UTC
5d5b8ac updated vcflib, fix ordering of flags to linker 15 November 2013, 20:56:30 UTC
8a98f11 ensure allele homogenization works when homogenizing to reference allele 13 November 2013, 22:24:14 UTC
4f2fe5e SAR and SAF should be Number=A in the header Thanks to Rajgopal Srinivasan for catching this. 03 November 2013, 17:02:17 UTC
78714b8 ignore un-flanked deletions at the beginning and end of alignments Some aligners will report deletions at the beginning of alignments. It is not clear how to interpret these cases because, without flanking sequence in the read, the indication of a relative deletion in the read is meaningless. For instance, a cigar for a 70bp read could be 1D70M. What base in the read has been deleted? This resolves some bugs which generate errors like this: deletion... alt is empty 31 October 2013, 23:32:02 UTC
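A minimal sketch of ignoring un-flanked deletions, using the `1D70M` case from this commit. The function name, the treatment of soft and hard clips as non-flanking, and the returned reference shift are assumptions made for illustration; the commit does not say how freebayes adjusts the alignment start.

```python
CLIPS = {"S", "H"}


def drop_unflanked_deletions(ops):
    """Remove deletions with no aligned read bases on one side.

    ops is a list of (length, op) CIGAR pairs. Returns (trimmed_ops, ref_shift),
    where ref_shift counts reference bases covered by removed leading deletions.
    """
    ops = list(ops)
    ref_shift = 0

    # Leading side: skip clips, then drop any deletions that follow them.
    i = 0
    while i < len(ops) and ops[i][1] in CLIPS:
        i += 1
    while i < len(ops) and ops[i][1] == "D":
        ref_shift += ops[i][0]
        del ops[i]

    # Trailing side: skip clips from the end, then drop deletions before them.
    j = len(ops) - 1
    while j >= 0 and ops[j][1] in CLIPS:
        j -= 1
    while j >= 0 and ops[j][1] == "D":
        del ops[j]
        j -= 1

    return ops, ref_shift


trimmed, shift = drop_unflanked_deletions([(1, "D"), (70, "M")])
```

Deletions with aligned bases on both sides, such as the one in `30M2D40M`, are left untouched.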
7e198dc avoid mixing true full and partial observations Resolves common error mode causing "ref is same as alt" bug. 16 October 2013, 19:37:49 UTC
c283d6d proper positional progression, resolves issue #52 A flag which indicated change in target region was not respected by the previous commit. 10 October 2013, 13:15:04 UTC
e315866 further improvements to indel (and SNP) detection A further 1% improvement in AUC for indels, and 0.5% improvement in AUC for SNPs. Changes in this commit are not yet well-optimized, and users should be aware that runtime performance here will be slower than in previous commits. (3x slower for 100 10X samples.) Subsequent commits will focus on reducing runtime. 09 October 2013, 02:26:40 UTC
60622b5 checkpoint, resolution of haplotype breakage issue (a->empty() errors) There are still some outstanding concerns with this commit, but a checkpoint is necessary as this code state has excellent performance against indels, another 1% better than the previous commit. 08 October 2013, 12:57:46 UTC
011561f handling of large variants Maintain the registered alleles set correctly. 03 October 2013, 20:08:37 UTC
10ac8d4 resolve #51 Slicing and dicing haplotype observations could lead to (erroneous) divided indels. Avoid these using a suitable guard in the haplotype observation generation process (fithaplotype). 30 September 2013, 23:18:45 UTC
8e44a20 indel genotype likelihoods This commit resolves a number of issues with indel genotype likelihoods. First, although haplotype detection has proceeded across repeat structures in the reference, the same extension was not applied to repeats which were not represented in the reference, and only in reads. This allowed reads supporting the alternate to appear as if they supported the reference, leading to false heterozygote calls for homozygous alternates. Second, in some cases, it may be desirable to call haplotypes across regions of low complexity. --min-repeat-entropy provides this facility, requiring a given number of bits per base of any reference-relative haplotype window built around a repeat structure. Thirdly, improvements in performance were provided by careful handling of partial observations, specifically checking if apparent full-length haplotype observations in fact support alternate haplotypes. Fourthly, "null" observations (portions of reads which are N, variants that are not specified in --haplotype-basis-alleles), are now correctly treated as non-observations. Previously, these were incorrectly coerced to be reference. In all, these changes yield a 2% improvement in the area under the curve for detection in 100 simulated 10x genomes (0.917 -> 0.937). They also eliminate a number of pathological errors in 1000 Genomes. Also! This commit includes bugfixes for invalid memory access errors detected by valgrind. 28 September 2013, 06:03:14 UTC
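To illustrate what "bits per base" can mean for --min-repeat-entropy, here is a sketch using the Shannon entropy of the base composition of a window. How freebayes actually computes the entropy is not stated in the commit, so both function names and the composition-based measure are assumptions.

```python
from collections import Counter
from math import log2


def entropy_bits_per_base(seq):
    """Shannon entropy of the base composition of seq, in bits per base."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return 0.0 - sum(c / n * log2(c / n) for c in counts.values())


def window_is_complex_enough(seq, min_entropy):
    """True if a candidate haplotype window meets the entropy threshold."""
    return entropy_bits_per_base(seq) >= min_entropy


homopolymer = entropy_bits_per_base("AAAAAAAA")
dinucleotide = entropy_bits_per_base("ACACACAC")
mixed = entropy_bits_per_base("ACGTACGT")
```

Under this measure a homopolymer scores 0 bits, a two-base repeat 1 bit, and a uniform mix of four bases 2 bits, so a threshold of 1 would force windows over homopolymers to be extended until they include more varied sequence.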
a290229 strand observation counts In response to popular demand, the raw counts of forward and reverse observation count return to the freebayes output. SAF: forward alternate observations SAR: reverse alternate observations SRF: forward reference observations SRR: reverse reference observations A number of infrequently-used variables have been removed (X* mismatch variables, CPG, discrete HWE sampling prob). 13 September 2013, 22:02:47 UTC
c2ea1aa fix reference handling, correctly switch targets Resolves failing updates to cached reference sequence. These generated "alt is the same as the ref" type errors, and monomorphic calls in the VCF. They also could generate spurious haplotype calls. Also affected by this change were "subsequence of zero length or negative offset" errors. Bounds checking is now integrated into the routines in Fasta.cpp, and out-of-bounds returns the null string instead of blowing up. When processing entire BAM files, rather than targets or specified regions, freebayes would wait for the condition that there were no more alignments in order to switch targets. This would cause problems specifically when 1) an alignment reached the end of a sequence, 2) there were still more alignments, 3) no targets were specified. A clarification of the position and target switching logic resolves this issue. 13 September 2013, 21:12:15 UTC
3e07445 resolve "...is out of order! expected after..." bug This error message was generated when attempting to gather partial support for haplotypes. If an alignment did not span the haplotype (which is unlikely, because the haplotype calls tend to be short, but happens with reads that are heavily soft-clipped), then this warning would be triggered and the read would be ignored. The adjustment allows for the use of such alignments when generating partial support. 29 August 2013, 23:54:15 UTC
ad718f9 fix default reporting of (some) monomorphic loci If you want to report monomorphic loci, use --report-monomorphic. 23 August 2013, 20:00:20 UTC
c3e485d correctly retain observations during partial observation generation, allele balance fix A coding error resulted in observation loss after assembling partial observations. Limit the "probe" length required of all observations when approximating correct allele balance for indels (specifically insertions). This limits the probe length to 50bp, not-configurable. It will be updated to a measure based on the average read length for future-correctness. 05 August 2013, 14:17:03 UTC
cbe555b --report-monomorphic Optionally report all loci for which there are any observations. Also report failing considered alternates with AC=0. 01 August 2013, 18:06:42 UTC
576bc70 pooled variant detection improvements * enable use-best-n-alleles The --use-best-n-alleles parameter was previously disabled for non-SNP variation, but this prevents its use as a bound on computational complexity. This occurs readily in the case of multiple --ploidy 20 pools and --pooled-discrete, leading to a combinatorial explosion in possible genotype states. A common result of --pooled-diploid would be the exhaustion of system memory. Users can now safely use --pooled-discrete provided they also use a suitable setting for --use-best-n-alleles. In practice, setting to 5 or lower should be sufficient to prevent memory blowup in most situations. For the time being, I suggest testing with progressively lower settings, or simply setting it as low as you think reasonable. (For pooled experiments focused on SNPs, this would be 2.) This update includes two other fixes: * uppercase reference sequence Uppercasing the reference allele properly resolves an error with FastaReference::getSubSequence (negative length). * no hwe priors for pooled samples The HWE component of the mappability estimate should be turned off in the case of pooled-discrete runs, so it is now turned off when --pooled-discrete is specified. 22 July 2013, 16:17:04 UTC
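The combinatorial explosion mentioned here can be made concrete. The number of unordered genotypes for a sample of a given ploidy is the number of multisets of that size drawn from the candidate alleles. The sketch below counts them; the joint count over pools is an upper bound on the naive state space and is not a description of how freebayes searches it. Function names are illustrative.

```python
from math import comb


def genotype_count(ploidy, n_alleles):
    """Unordered genotypes: multisets of size `ploidy` over `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, ploidy)


def joint_state_count(ploidy, n_alleles, n_pools):
    """Upper bound on joint genotype assignments across independent pools."""
    return genotype_count(ploidy, n_alleles) ** n_pools


# Genotypes per --ploidy 20 pool as the number of candidate alleles grows.
per_pool = {k: genotype_count(20, k) for k in (2, 3, 5, 8)}
```

With 2 alleles a ploidy-20 pool has 21 genotypes, but with 5 it has 10626, which is why lowering --use-best-n-alleles bounds memory use so effectively.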
fbf46fc fix header, set new GL calculations as default "Int" should be "Integer" in the VCF header. The new GL calculations take mapping quality into account by default, so observation probability is given by (1-MQ)*(1-BQ). 11 July 2013, 09:38:39 UTC
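One reading of the expression (1-MQ)*(1-BQ) is that MQ and BQ stand for the error probabilities implied by the Phred-scaled mapping and base qualities. The sketch below follows that reading, which is an assumption; the function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def observation_probability(mapping_quality, base_quality):
    """Probability that an observation is both correctly mapped and correctly called."""
    p_mismapped = phred_to_error(mapping_quality)
    p_miscalled = phred_to_error(base_quality)
    return (1 - p_mismapped) * (1 - p_miscalled)


# A base with Q30 base quality on a read with mapping quality 20.
p = observation_probability(20, 30)
```

For this example the result is close to 0.99 * 0.999, so the mapping quality dominates the uncertainty even though the base call itself is very confident.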
64f41db quick fix to previous commit Variable casting issue. 09 July 2013, 17:19:50 UTC
9cd5548 LUT for factorials, tweaks to input filtering, QUAL < 0 bugfix And generally, better performance than previous commit. 09 July 2013, 17:18:10 UTC
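A lookup table for factorials can be sketched as below. Whether freebayes tabulates factorials or their logarithms, and how large its table is, are not stated in the commit; this version assumes log-factorials with a fallback to `lgamma` beyond the table, and the class name is hypothetical.

```python
from math import lgamma, log


class LogFactorialTable:
    """Precomputed ln(n!) for small n, with an lgamma fallback for larger n."""

    def __init__(self, size):
        self.table = [0.0] * size
        for n in range(1, size):
            # ln(n!) = ln((n - 1)!) + ln(n)
            self.table[n] = self.table[n - 1] + log(n)

    def __call__(self, n):
        if n < len(self.table):
            return self.table[n]  # O(1) lookup for the common case
        return lgamma(n + 1)      # ln(n!) via the gamma function


log_factorial = LogFactorialTable(1000)


def log_binomial_coefficient(n, k):
    """ln C(n, k) using three table lookups."""
    return log_factorial(n) - log_factorial(k) - log_factorial(n - k)
```

Multinomial and binomial terms in genotype likelihoods call factorials repeatedly with small arguments (read counts, ploidy), which is where a table like this saves time.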
dce0cb0 fix bounding on genotyping iterations 07 July 2013, 17:24:08 UTC
c0cca0b help text update 06 July 2013, 12:02:29 UTC
aa5282c reset version to 0.9.9.2 (rather than v9...) 06 July 2013, 11:43:15 UTC
69f09a5 Setting Release-Version v0.9.9.2 06 July 2013, 11:42:45 UTC
0603a6e allow maximum search iterations to be == genotypingMaxIterations 05 July 2013, 15:56:05 UTC
c8bbba1 update version_git.h 04 July 2013, 16:45:15 UTC
be06174 Setting Release-Version v9.9.2 04 July 2013, 16:41:50 UTC
c2483d7 6x performance improvement Remove duplicated genotype search, as the algorithm now always searches deeply. The default --pvar of 0 means "run gradient descent on everything". Fix incorrect (too large type) usage of ttmath. This resolves a major performance bug (30% of runtime) in the previous builds. Remove indel mask vector<bool>, whose aligned copy was occupying a large fraction of runtime (50%). Users employing this method for the removal of artifacts are suggested to look at samtools BAQ, which applies an HMM to incorporate local alignment quality into base quality. Together, these changes increase processing speed in the 1000G release set (2500 samples) by around 6x over the previous commit. 04 July 2013, 16:36:09 UTC
29fa45a remove partial component from RO in VCF output 03 July 2013, 11:02:58 UTC
296164d updated bamtools 03 July 2013, 09:59:01 UTC
67805dd remove spurious N alleles from output Now that haplotype construction occurs at every base, the removal of part-null alleles needs to occur also in the genotypeAlleles routine (which establishes which alleles should be used for genotyping). 03 July 2013, 09:55:24 UTC
7f88f0a parameter changes, --bam-list and --region Allow a bam file list to be provided on the command line. Allow the use of '-' as a region separator in region strings. 02 July 2013, 12:06:38 UTC
3b29e57 updated .gitignore, added CNV BED example 02 July 2013, 09:21:04 UTC
5736bf1 versioning fix (add version_git.h to dependencies) 02 July 2013, 08:55:04 UTC
d291f99 include autoversion in default build path Now, the version will include the git commit id from the repo! 02 July 2013, 08:54:03 UTC
a5a4300 properly include Contamination.* 02 July 2013, 08:51:22 UTC
2873687 add version_git.h 02 July 2013, 08:49:53 UTC
5f36d03 balanced observations for indels (indels) For insertions (currently), require that the reference observations at the haplotype containing the insertion flank the insertion by a combined number of bases that is the same as the length of the insertion. This normalizes the likelihood calculations for insertions. Normalization is (currently) provided implicitly for deletions in that reference bias prevents the mapping of longer deletions. This effect is stronger than the sum of bases effect driving bias problems for insertions. 02 July 2013, 08:45:37 UTC
a994e78 Setting Release-Version v9.9.1 02 July 2013, 08:32:27 UTC
e73c3e5 partial haplotype observations Use all read evidence, even when calling haplotypes, by utilizing equivalencies between partial observations of a haplotype and the putative alleles at the site. Observations which partially support a number of haplotypes have their probability mass divided amongst the alleles they support when calculating genotype likelihoods. This provision resolves sensitivity issues caused by increased spanning coverage required to call small variants when using larger --haplotype-length values. The adjustment to genotype likelihood calculations currently requires the use of the new GL calculation routine provided when enabling --prob-contamination 0, or providing a per-sample contamination estimate file via --contamination-estimates. 01 July 2013, 12:39:16 UTC
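The idea of dividing a partial observation's probability mass amongst the alleles it supports can be sketched as follows. This is a simplification, assuming haplotypes are plain strings and that a fragment is compared at a known offset from the window start; the function name and the equal split are illustrative assumptions, not the freebayes routine.

```python
def partial_support(fragment, offset, haplotypes):
    """Split one observation's mass evenly over the haplotypes it is consistent with.

    fragment is the part of a read overlapping the haplotype window, and
    offset is where the fragment starts relative to the window start.
    """
    supported = [
        h for h in haplotypes
        if h[offset:offset + len(fragment)] == fragment
    ]
    if not supported:
        return {}
    mass = 1.0 / len(supported)
    return {h: mass for h in supported}


haplotypes = ["ACGTA", "ACTTA", "GCGTA"]
# A read covering only the first two bases cannot distinguish ACGTA from ACTTA.
weights = partial_support("AC", 0, haplotypes)
```

A read that spans the whole window supports a single haplotype with mass 1.0, while shorter fragments still contribute evidence instead of being discarded, which is the sensitivity gain the commit describes for larger --haplotype-length values.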
bea983b per-read group contamination estimates This is a checkpoint. This functionality is relatively stable. 31 May 2013, 09:55:36 UTC
d549975 addition of contamination estimates into GLs This is a first pass solution, and should probably not be used in production. This commit is for future reference. 25 April 2013, 18:48:12 UTC
296a0fa resolve read group header parsing bug, empty alignment bug When read groups had colons in them, they were not parsed properly. When reads were aligned as wholly soft-clipped, the allele parsing segfaulted. 23 April 2013, 15:39:50 UTC
b0ee6e1 set correct merge order using new bamtools method 19 April 2013, 23:05:46 UTC
e0f8a94 avoid errors with soft clipped sequence at the beginning of reference This leads to errors typically in chrM, as the circular nature of this chromosome means many reads are mapped with soft clips at position 0. 19 April 2013, 15:32:49 UTC
6989b9e set the haplotype calling window with --haplotype-length This is a synonym for --max-complex-gap, but I wanted to ensure that users were clear on the meaning of this parameter. If you want to call haplotypes, then increase --haplotype-length. (It's 3bp by default.) 15 April 2013, 16:36:24 UTC
cd21fa4 indicate that 0.9.9 is new stable revision 09 April 2013, 10:58:54 UTC
c993c5c allow detection of long indels Long deletions were filtered out by legacy code which considered gaps as mismatches. 23 March 2013, 05:12:04 UTC
d0c1f12 turn off genotype qualities by default Genotype qualities (reported as GQ in the output) are marginal likelihoods of the specific genotype for a specific sample given the data and Bayesian model. They may be helpful for filtering or assessing genotyping accuracy, but they take a lot of time to compute because the current method for estimating them is O(N^2) in the number of samples. For more than 10 low-coverage samples, GQ estimation becomes the dominant use of compute time. For 1000 samples, GQ estimation is 90% of compute time. Prior to this commit it was possible to disable them using --no-marginals. I've removed this cryptically named parameter, set GQ estimation off by default, and added --genotype-qualities, which turns them back on. Users who wish to fill out the GQ field (old behavior) must provide --genotype-qualities. 30 January 2013, 00:03:43 UTC
cc16f93 --no-marginals now means "no-GQ's" 29 January 2013, 20:13:29 UTC
e85de7b version 0.9.9, set as default "mappability" priors freebayes can estimate the probability that the locus in question is accurately mapped using a number of features extracted from read placement and distribution among samples. This framework effectively extends the basic Bayesian formulation from P(genotypes | reads) to P(genotypes, properly-mapped alleles | reads). As such, the QUAL value must be understood to incorporate expectations about mappability derived from observation features such as allele balance, strand bias, and read placement relative to the allele. This commit sets this model on by default. To turn OFF this behavior, use -wVa or: --hwe-priors-off \ --binomial-obs-priors-off \ --allele-balance-priors-off Extensive testing showed that this combination of parameters provided excellent sensitivity and specificity at all levels of genomic coverage and numbers of samples. The largest improvement in performance is for low-coverage resequencing (<5x coverage, >1000 samples) experiments. Higher-coverage experiments, where data tends to overwhelm priors, should not be affected. 28 January 2013, 23:16:01 UTC
d01982a pooled frequency-based calling (and nan guard) Separate --pooled into --pooled-discrete (old behavior) and --pooled-continuous. In the continuous case, allele observation characteristics are reported for all alleles which passed the input filters (default -F 0.2 -C 2). Pooled continuous calling does not modify the Bayesian algorithm, and is effectively orthogonal to other parameters. For instance, --pooled-discrete and --pooled-continuous can be specified together. The called genotypes will be affected by the --ploidy setting and --pooled-discrete flags, but the output will reflect all observed alleles passing input filters at the site. Also, guard against nan's in output (Utility.cpp). 28 January 2013, 22:37:32 UTC
36fe0be use of big number library for improved numerical precision (ttmath) Removes the QUAL limit of 50000, more experimentation may be required to apply the method to the marginal genotype quality calculations. 28 January 2013, 20:53:23 UTC
0595cc5 change to help text to reflect region spec change 28 January 2013, 02:59:19 UTC
8bb6181 resolve targeting issue The last position in a region was being excluded. This ensures that the entire target is processed. The documentation is updated to reflect this change. scripts/fasta_generate_regions.py will now make completely covering regions. 28 January 2013, 02:57:24 UTC
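A minimal sketch of generating regions that completely cover each sequence, in the spirit of the change to scripts/fasta_generate_regions.py. The script's real interface is not shown in this log, so the function signature, the `name:start-end` format, and the 0-based half-open coordinates used here are assumptions.

```python
def generate_regions(seq_lengths, region_size):
    """Yield region strings that tile every sequence with no gaps or overlaps.

    seq_lengths is an iterable of (name, length) pairs, for example as read
    from a FASTA index. Coordinates are assumed 0-based and half-open.
    """
    for name, length in seq_lengths:
        start = 0
        while start < length:
            # Clamp the final region so the last position is included
            # and nothing past the end of the sequence is requested.
            end = min(start + region_size, length)
            yield f"{name}:{start}-{end}"
            start = end


regions = list(generate_regions([("chr1", 250), ("chrM", 80)], 100))
```

Each region begins where the previous one ended, so the lengths of the regions for a sequence sum to the sequence length, which is the "completely covering" property the commit refers to.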
d84ba44 track total genotyping iterations, change iteration defaults 27 January 2013, 18:01:46 UTC
f3e5186 improve scaling of probabilities, resolve #42 When using --allele-balance-priors, --binomial-obs-priors, scale probabilities according to the number of possible observation permutations. Resolve #42 by preventing use of soft-clipped sequence at the beginning of the reference. 06 January 2013, 18:15:29 UTC
8c2bb94 resolve #43, add segfault handler In #43, challisd reports a segfault when using input alleles. This was caused by 0-length allele artifacts generated when parsing the input VCF. Additionally, when compiled in debug mode, the segfault handler will now print a stacktrace. 04 January 2013, 17:12:11 UTC
eda4b69 *actually* set defaults in Parameters.cpp Sets -C 2 -F 0.2 by default. 20 December 2012, 12:08:19 UTC
f8d78ff set default input filters (-C 2 -F 0.2) In testing, these input filters on the minimum support for a given allele have been found to provide a very good balance between sensitivity and specificity, reducing the need for users to place complex filters on their VCF output. We use them by default in our work in the 1000 Genomes project low-coverage (4-6x) data. However, they may not be ideal in polyploid or pooled systems or low-frequency somatic variant detection, so users working in such contexts should set them to a level appropriate for their needs. 19 December 2012, 17:28:13 UTC
48962c8 version 0.9.8 18 December 2012, 13:10:38 UTC
84bf532 add empirical allele observation bias adjustment table By specifying --observation-bias users may provide a table which describes the empirical mapping bias against alleles given the number of bases subtracted or added between the allele and the reference. This is intended to improve genotype likelihood (GL) estimates and downstream imputation and processing of these likelihoods. 18 December 2012, 13:01:23 UTC
61b7bbc use cast to get correct call to max(...) To resolve issue reported by C here: http://blog.gkno.me/post/29962850248/getting-started-with-gkno#disqus_thread 06 December 2012, 15:15:35 UTC
8af4379 guard against soft-clip edge cases Soft clips can occur where there is no reference sequence. When generating the allele do not process the reference sequence. 02 December 2012, 22:25:14 UTC
0e3f75b cache only needed sequence, cleanup repeat detection Indeed, freebayes was holding onto unneeded reference sequence. This closes a long-standing issue. Also, cleans up repeat edge detection issues. 10 October 2012, 15:58:28 UTC
65e689c bump, attempting to fix github state The last commit is not reflected in github, but is possible to obtain by cloning. This is an attempt to resolve the mismatch between github's overview and the repository. 10 October 2012, 12:27:00 UTC
b950138 reference most recent stable revision Users encountering bugs with the development version can revert to the most recent stable version. This version will be updated in the README as development continues. 09 October 2012, 14:29:03 UTC
f16e2bb remove errant debugging messages and force exit 05 October 2012, 15:25:09 UTC
0995bd8 build haplotypes across repeats (version 0.9.7) When an indel is based on underlying repeat structure, record the right boundary of the repeat in the reference (technically, the first base past the repeat) in the indel's Allele structure. When building haplotype alleles during genotyping, assemble across the repeat, requiring, for instance, reference-matching reads to cover the entire repeat sequence. 02 October 2012, 07:33:20 UTC