https://github.com/ekg/freebayes

Revision Message Commit Date
a994e78 Setting Release-Version v9.9.1 02 July 2013, 08:32:27 UTC
e73c3e5 partial haplotype observations Use all read evidence, even when calling haplotypes, by utilizing equivalencies between partial observations of a haplotype and the putative alleles at the site. Observations which partially support a number of haplotypes have their probability mass divided amongst the alleles they support when calculating genotype likelihoods. This provision resolves sensitivity issues caused by increased spanning coverage required to call small variants when using larger --haplotype-length values. The adjustment to genotype likelihood calculations currently requires the use of the new GL calculation routine provided when enabling --prob-contamination 0, or providing a per-sample contamination estimate file via --contamination-estimates. 01 July 2013, 12:39:16 UTC
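The division of probability mass described in e73c3e5 can be illustrated with a minimal sketch. It assumes each observation carries one unit of support and that a partial observation splits that unit equally among the alleles it is compatible with; the commit message does not say how the mass is divided, so the equal split and the name `split_partial_support` are hypothetical.

```python
from collections import defaultdict


def split_partial_support(observations):
    """Sum per-allele support across observations.

    Each observation is the set of alleles it is compatible with. A full
    observation names one allele; a partial one names several and has its
    unit of support split equally among them (an assumption of this sketch).
    """
    support = defaultdict(float)
    for compatible_alleles in observations:
        if not compatible_alleles:
            continue  # an observation supporting nothing contributes nothing
        share = 1.0 / len(compatible_alleles)
        for allele in compatible_alleles:
            support[allele] += share
    return dict(support)


# Two full observations and two partial ones that fit both haplotypes.
observations = [{"ACGT"}, {"ACGT", "ACTT"}, {"ACTT"}, {"ACGT", "ACTT"}]
weights = split_partial_support(observations)
```

Under this scheme no read is discarded just because it fails to span the whole haplotype window; it simply contributes less to each candidate allele.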
bea983b per-read group contamination estimates This is a checkpoint. This functionality is relatively stable. 31 May 2013, 09:55:36 UTC
d549975 addition of contamination estimates into GLs This is a first pass solution, and should probably not be used in production. This commit is for future reference. 25 April 2013, 18:48:12 UTC
296a0fa resolve read group header parsing bug, empty alignment bug When read groups had colons in them, they were not parsed properly. When reads were aligned as wholly soft-clipped, the allele parsing segfaulted. 23 April 2013, 15:39:50 UTC
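The colon problem fixed in 296a0fa is easy to reproduce in a small sketch. SAM header fields are tab-separated `TAG:VALUE` pairs, so a value that itself contains colons survives only if the split happens on the first colon. The function name `parse_rg_line` is hypothetical and this is not the freebayes parser.

```python
def parse_rg_line(line):
    """Parse one SAM @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, so colons inside the value are kept.
        tag, value = field.split(":", 1)
        tags[tag] = value
    return tags


read_group = parse_rg_line("@RG\tID:lane:3:run7\tSM:NA12878")
```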
b0ee6e1 set correct merge order using new bamtools method 19 April 2013, 23:05:46 UTC
e0f8a94 avoid errors with soft clipped sequence at the beginning of reference This leads to errors typically in chrM, as the circular nature of this chromosome means many reads are mapped with soft clips at position 0. 19 April 2013, 15:32:49 UTC
6989b9e set the haplotype calling window with --haplotype-length This is a synonym for --max-complex-gap, but I wanted to ensure that users were clear on the meaning of this parameter. If you want to call haplotypes, then increase --haplotype-length. (It's 3bp by default.) 15 April 2013, 16:36:24 UTC
cd21fa4 indicate that 0.9.9 is new stable revision 09 April 2013, 10:58:54 UTC
c993c5c allow detection of long indels Long deletions were filtered out by legacy code which considered gaps as mismatches. 23 March 2013, 05:12:04 UTC
d0c1f12 turn off genotype qualities by default Genotype qualities (reported as GQ in the output) are marginal likelihoods of the specific genotype for a specific sample given the data and Bayesian model. They may be helpful for filtering or assessing genotyping accuracy, but they take a lot of time to compute because the current method for estimating them is O(N^2) in the number of samples. For more than 10 low-coverage samples, GQ estimation becomes the dominant use of compute time. For 1000 samples, GQ estimation is 90% of compute time. Prior to this commit it was possible to disable them using --no-marginals. I've removed this cryptically named parameter, set GQ estimation off by default, and added --genotype-qualities, which turns them back on. Users who wish to fill out the GQ field (old behavior) must provide --genotype-qualities. 30 January 2013, 00:03:43 UTC
cc16f93 --no-marginals now means "no-GQ's" 29 January 2013, 20:13:29 UTC
e85de7b version 0.9.9, set as default "mappability" priors freebayes can estimate the probability that the locus in question is accurately mapped using a number of features extracted from read placement and distribution among samples. This framework effectively extends the basic Bayesian formulation from P(genotypes | reads) to P(genotypes, properly-mapped alleles | reads). As such, the QUAL value must be understood to incorporate expectations about mappability derived from observation features such as allele balance, strand bias, and read placement relative to the allele. This commit sets this model on by default. To turn OFF this behavior, use -wVa or: --hwe-priors-off \ --binomial-obs-priors-off \ --allele-balance-priors-off Extensive testing showed that this combination of parameters provided excellent sensitivity and specificity at all levels of genomic coverage and numbers of samples. The largest improvement in performance is for low-coverage resequencing (<5x coverage, >1000 samples) experiments. Higher-coverage experiments, where data tends to overwhelm priors, should not be affected. 28 January 2013, 23:16:01 UTC
d01982a pooled frequency-based calling (and nan guard) Separate --pooled into --pooled-discrete (old behavior) and --pooled-continuous. In the continuous case, allele observation characteristics are reported for all alleles which passed the input filters (default -F 0.2 -C 2). Pooled continuous calling does not modify the Bayesian algorithm, and is effectively orthogonal to other parameters. For instance, --pooled-discrete and --pooled-continuous can be specified together. The called genotypes will be affected by the --ploidy setting and --pooled-discrete flags, but the output will reflect all observed alleles passing input filters at the site. Also, guard against nan's in output (Utility.cpp). 28 January 2013, 22:37:32 UTC
36fe0be use of big number library for improved numerical precision (ttmath) Removes the QUAL limit of 50000, more experimentation may be required to apply the method to the marginal genotype quality calculations. 28 January 2013, 20:53:23 UTC
0595cc5 change to help text to reflect region spec change 28 January 2013, 02:59:19 UTC
8bb6181 resolve targeting issue The last position in a region was being excluded. This ensures that the entire target is processed. The documentation is updated to reflect this change. scripts/fasta_generate_regions.py will now make completely covering regions. 28 January 2013, 02:57:24 UTC
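What "completely covering regions" means in 8bb6181 can be shown with a short sketch. It assumes zero-based, half-open `name:start-end` region strings, so consecutive regions share a boundary and the final region ends at the sequence length. The function `covering_regions` is hypothetical and is not the code in scripts/fasta_generate_regions.py.

```python
def covering_regions(seq_name, seq_length, region_size):
    """Tile a sequence with adjacent regions that leave no position out.

    Regions are zero-based and half-open (an assumption of this sketch):
    the end of one region is the start of the next.
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    regions = []
    for start in range(0, seq_length, region_size):
        end = min(start + region_size, seq_length)  # clip the last region
        regions.append(f"{seq_name}:{start}-{end}")
    return regions


regions = covering_regions("chr1", 250, 100)
```

Because the last region is clipped to the sequence length rather than dropped or shortened by one, the final position of the target is included.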
d84ba44 track total genotyping iterations, change iteration defaults 27 January 2013, 18:01:46 UTC
f3e5186 improve scaling of probabilities, resolve #42 When using --allele-balance-priors, --binomial-obs-priors, scale probabilities according to the number of possible observation permutations. Resolve #42 by preventing use of soft-clipped sequence at the beginning of the reference. 06 January 2013, 18:15:29 UTC
8c2bb94 resolve #43, add segfault handler In #43, challisd reports a segfault when using input alleles. This was caused by 0-length allele artifacts generated when parsing the input VCF. Additionally, when compiled in debug mode, the segfault handler will now print a stacktrace. 04 January 2013, 17:12:11 UTC
eda4b69 *actually* set defaults in Parameters.cpp Sets -C 2 -F 0.2 by default. 20 December 2012, 12:08:19 UTC
f8d78ff set default input filters (-C 2 -F 0.2) In testing, these input filters on the minimum support for a given allele have been found to provide a very good balance between sensitivity and specificity, reducing the need for users to place complex filters on their VCF output. We use them by default in our work in the 1000 Genomes project low-coverage (4-6x) data. However, they may not be ideal in polyploid or pooled systems or low-frequency somatic variant detection, so users working in such contexts should set them to a level appropriate for their needs. 19 December 2012, 17:28:13 UTC
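The effect of the -C 2 -F 0.2 defaults can be sketched as a simple predicate. It assumes an alternate allele is evaluated when at least one sample has both the minimum count of supporting observations and the minimum fraction of its depth supporting the allele. That per-sample reading, and the name `passes_input_filters`, are assumptions of this sketch and not taken from the commit.

```python
def passes_input_filters(alt_counts, depths, min_count=2, min_fraction=0.2):
    """Return True if any single sample supports the alternate allele strongly enough.

    alt_counts: {sample: observations supporting the alternate allele}
    depths:     {sample: total observations at the site}
    """
    for sample, depth in depths.items():
        count = alt_counts.get(sample, 0)
        if depth > 0 and count >= min_count and count / depth >= min_fraction:
            return True
    return False


# 3 of 10 reads in s2 support the alternate: enough count and enough fraction.
evaluated = passes_input_filters({"s1": 1, "s2": 3}, {"s1": 5, "s2": 10})
```

A low-frequency somatic variant at 2 of 20 reads would fail the fraction test here, which is why the commit advises lowering the thresholds in pooled or somatic settings.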
48962c8 version 0.9.8 18 December 2012, 13:10:38 UTC
84bf532 add empirical allele observation bias adjustment table By specifying --observation-bias users may provide a table which describes the empirical mapping bias against alleles given the number of bases subtracted or added between the allele and the reference. This is intended to improve genotype likelihood (GL) estimates and downstream imputation and processing of these likelihoods. 18 December 2012, 13:01:23 UTC
61b7bbc use cast to get correct call to max(...) To resolve issue reported by C here: http://blog.gkno.me/post/29962850248/getting-started-with-gkno#disqus_thread 06 December 2012, 15:15:35 UTC
8af4379 guard against soft-clip edge cases Soft clips can occur where there is no reference sequence. When generating the allele do not process the reference sequence. 02 December 2012, 22:25:14 UTC
0e3f75b cache only needed sequence, cleanup repeat detection Indeed, freebayes was holding onto unneeded reference sequence. This closes a long-standing issue. Also, cleans up repeat edge detection issues. 10 October 2012, 15:58:28 UTC
65e689c bump, attempting to fix github state The last commit is not reflected in github, but is possible to obtain by cloning. This is an attempt to resolve the mismatch between github's overview and the repository. 10 October 2012, 12:27:00 UTC
b950138 reference most recent stable revision Users encountering bugs with the development version can revert to the most recent stable version. This version will be updated in the README as development continues. 09 October 2012, 14:29:03 UTC
f16e2bb remove errant debugging messages and force exit 05 October 2012, 15:25:09 UTC
0995bd8 build haplotypes across repeats (version 0.9.7) When an indel is based on underlying repeat structure, record the right boundary of the repeat in the reference (technically, the first base past the repeat) in the indel's Allele structure. When building haplotype alleles during genotyping, assemble across the repeat, requiring, for instance, reference-matching reads to cover the entire repeat sequence. 02 October 2012, 07:33:20 UTC
03cb231 add freebayes parallelization script 26 September 2012, 12:40:20 UTC
39e625c resolution of issues related to directed haplotyping 18 September 2012, 17:15:15 UTC
8351d54 updated vcflib 09 September 2012, 15:54:21 UTC
0f20f17 bugfix for haplotype basis alleles 09 September 2012, 15:48:05 UTC
9608597 example pipeline script This script is suitable for large (>1000 sample) processing jobs that are broken down by genomic region. 20 August 2012, 12:59:48 UTC
7677631 resolve #22 In this case the problem was that the cached reference sequence window is not updated before the first time that the "current reference base" is acquired. This leads to garbage in the allele tags used internally, resulting in an out-of-range error. 15 August 2012, 22:49:55 UTC
20fd465 reference to arXiv:1207.3907 26 July 2012, 20:25:39 UTC
0132d1e remove assertion (and thus exit) on alt == ref However, the warning will still be triggered. 26 July 2012, 20:10:49 UTC
5b17936 ignore (and signal errors) when out-of-order alignments are detected 30 May 2012, 19:10:29 UTC
88afddf resolve #30, multiple alternates with same sequence This appears to be caused by Ns in the read sequence, which were not being parsed properly in some cases. Proper handling of these bases should resolve the issue. 18 May 2012, 01:34:24 UTC
ba4fb65 remove default mapping quality and base quality restrictions The mis-estimation of mapping quality causes a lot of problems for users. Largely, these issues can be resolved by removing the default input filters in freebayes. A better method of incorporating mapping quality into the analysis is to generate genotype likelihoods using the minimum of base quality and mapping quality. This can be enabled by providing the --use-mapping-quality flag on the command line. 16 May 2012, 19:30:12 UTC
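The idea behind --use-mapping-quality in ba4fb65 can be sketched in a few lines: rather than discarding reads below a threshold, each observation is scored with the lower of its two Phred-scaled qualities. The helper names here are hypothetical, and how freebayes folds the result into its likelihood model is not shown.

```python
def phred_to_error(quality):
    """Convert a Phred-scaled quality to an error probability."""
    return 10.0 ** (-quality / 10.0)


def effective_quality(base_quality, mapping_quality):
    """Use the weaker of the two qualities for an observation."""
    return min(base_quality, mapping_quality)


# A confident base call (Q30) in a poorly mapped read (Q20) is treated as Q20.
quality = effective_quality(30, 20)
error_probability = phred_to_error(quality)
```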
9696d0c fix allele misclassification bug 30 April 2012, 19:35:47 UTC
3f0ae56 haplotype basis alleles By specifying a set of haplotype basis alleles, phasing information can be established in long reads even in the presence of high error rates. The haplotype basis allele input is used to select the alleles which will be phased. Other allelic primitives will be ignored by replacement with the reference allele. 27 April 2012, 19:37:23 UTC
a464833 *really* resolve #17 Use eof() to check for string/variable conversion in convert.h instead of tellg(), which behaves correctly according to the C++ spec as of gcc 4.6.2, returning -1 when eof() is set and when there is an error. 27 March 2012, 20:04:22 UTC
81c3ec5 resolve #17 Per http://stackoverflow.com/questions/6552876/file-stream-tellg-tellp-and-gcc-4-6-is-this-a-bug tellg(): (27.6.1.3) After constructing a sentry object, if fail() != false, returns pos_type(-1) to indicate failure. Otherwise, returns rdbuf()->pubseekoff(0, cur, in). 26 March 2012, 18:48:53 UTC
df23b3f resolve bug #25 During non-targeted analysis of an entire reference sequence, freebayes would fail to process positions after the first reference sequence. This resolves the issue. 07 February 2012, 22:39:17 UTC
a6943d9 bamtools update 03 February 2012, 00:25:04 UTC
8c35b17 update documentation to describe variant input behavior 03 February 2012, 00:05:55 UTC
32b9693 remove requirement of PL (sequencing technology) tag 31 January 2012, 22:41:29 UTC
a2db81c Revert "update of bamtools" This reverts commit 3cb41894c3850a863a8d66cca51d3c3da4d4961e. 19 January 2012, 18:21:05 UTC
2cdddfc Revert "update submodules, vcflib and bamtools" This reverts commit a3b707ef2a9ef4174a7b16a61a996b822973219f. Conflicts: bamtools 19 January 2012, 18:20:41 UTC
3cb4189 update of bamtools 18 January 2012, 23:57:52 UTC
a3b707e update submodules, vcflib and bamtools 18 January 2012, 23:05:57 UTC
31cffd8 resolve https://github.com/ekg/freebayes/issues/26 This issue arose due to a split indel allele generated in the haplotype creation step. The issue is resolved by removing such alleles from analysis at a prior stage. 05 January 2012, 23:53:24 UTC
094879e resolves https://github.com/ekg/freebayes/issues/22 This involved errors when producing VCF output with complex alleles. 04 January 2012, 20:47:10 UTC
4fd14bb minor adjustments to handle BAMs produced by Complete Genomics Some CG BAM records are not processable using our system, and so they must be ignored. This commit ensures proper handling of these cases. 15 December 2011, 22:04:34 UTC
5693c69 ignore indel alleles which are not flanked by invariant sequence In its present design, the detection model used by freebayes cannot handle ambiguous alleles. One notable class of these are indels described at the beginning and end of alignments, as it is not guaranteed that these are fully defined. This commit excludes these alleles from analysis. 14 December 2011, 23:15:09 UTC
d149da7 resolve segfault when enumerating polyploid genotype likelihoods For the time being, I am removing the genotype likelihood output for the polyploid model. The ordering of genotypes for polyploid data is not specified in the VCF 4.1 spec. 07 December 2011, 21:32:27 UTC
152bf35 allele frequency input priors 07 December 2011, 15:51:34 UTC
474c9e1 use intervaltree for in-target detection Eventually this will allow the selection of a set of (possibly overlapping) targets when reading data from stdin. 18 November 2011, 16:34:46 UTC
2f4c924 add intervaltree submodule 17 November 2011, 14:25:05 UTC
9b42dc8 output bug with adjusted haplotypes Errant overwriting of refbase caused breakage of downstream GT and GL output functions. 16 November 2011, 19:06:03 UTC
0a912f9 resolves segfault in the context of a read N matching a reference N 16 November 2011, 18:04:08 UTC
64c2a9a retain a flanking base when reporting adjusted biallelic indels This bug produced "SEQ"/"" calls with empty alternate sequences, which violates the VCF spec. 10 November 2011, 21:48:16 UTC
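The flanking-base requirement behind 64c2a9a comes from the VCF convention that an indel's REF and ALT both include the base preceding the event, so neither field is empty. A minimal sketch for a deletion, with the hypothetical helper `deletion_record` and zero-based input coordinates as assumptions:

```python
def deletion_record(reference, del_start, del_length):
    """Build (POS, REF, ALT) for a deletion, anchored on the preceding base.

    reference:  reference sequence as a string
    del_start:  zero-based index of the first deleted base (must be >= 1 here)
    del_length: number of deleted bases
    POS is the one-based position of the anchor base.
    """
    if del_start < 1:
        raise ValueError("this sketch needs a base before the deletion")
    anchor = reference[del_start - 1]
    ref = anchor + reference[del_start:del_start + del_length]
    alt = anchor  # never empty, unlike the "SEQ"/"" form
    pos = del_start  # zero-based (del_start - 1) expressed as one-based
    return pos, ref, alt


# Deleting "TT" from GATTACA is reported as ATT -> A at position 2.
record = deletion_record("GATTACA", 2, 2)
```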
47a4513 remove errant -1 Causes truncation error with haplotype allele printing. 07 November 2011, 01:45:22 UTC
f4207a3 clean up reporting of ref/alt pairs with matching start and end sequence Prevents reporting lots of extra matching sequence on haplotype-based alleles. 27 October 2011, 23:46:39 UTC
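One common way to remove shared sequence from a REF/ALT pair, as f4207a3 describes, is to trim the matching suffix and then the matching prefix while leaving at least one base in each allele. The sketch below shows that general technique; it is not claimed to be the routine used in freebayes, and `trim_ref_alt` is a hypothetical name.

```python
def trim_ref_alt(pos, ref, alt):
    """Trim sequence shared by ref and alt, keeping both non-empty.

    pos is the one-based position of ref's first base and is advanced
    for every base trimmed from the left.
    """
    # Trim the shared suffix first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, shifting the position right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A haplotype-length pair that is really a single-base substitution.
trimmed = trim_ref_alt(100, "ACGTTA", "ACCTTA")
```

Stopping at one base per allele keeps the anchor base of an indel intact, so a pair such as ATT/A is returned unchanged.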
f47c3da version 0.9.4 Haplotype calling cleanup. 27 October 2011, 21:19:18 UTC
6543587 homogenize alternate alleles at haplotype loci Depending on sequence context, a complex allele which is 1M5D1M1D is potentially the same as a deletion allele 2M6D. This equivalence can be established by comparing the alternate sequences for a given reference-relative haplotype. The most-commonly-observed alignment is used to adjust the cigars for identical but differentially described alternate alleles. 27 October 2011, 15:55:29 UTC
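The equivalence described in 6543587 can be checked by applying each CIGAR to the reference and comparing the resulting alternate sequences. The sketch below handles only M and D operations and treats M as an exact match to the reference, both simplifying assumptions; the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

CIGAR_OP = re.compile(r"(\d+)([MD])")


def apply_cigar(reference, cigar):
    """Return the alternate sequence implied by a CIGAR of M and D operations."""
    position, kept = 0, []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            kept.append(reference[position:position + n])
        position += n  # both M and D consume reference bases
    return "".join(kept)


def homogenize(reference, observed_cigars):
    """Replace each CIGAR with the most common one yielding the same alternate."""
    by_alt = defaultdict(Counter)
    for cigar in observed_cigars:
        by_alt[apply_cigar(reference, cigar)][cigar] += 1
    canonical = {alt: counts.most_common(1)[0][0] for alt, counts in by_alt.items()}
    return [canonical[apply_cigar(reference, cigar)] for cigar in observed_cigars]


# In a run of A's, 1M5D1M1D and 2M6D leave the same two bases behind.
reference = "CAAAAAAA"
unified = homogenize(reference, ["2M6D", "1M5D1M1D", "2M6D"])
```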
d404615 fix VCF fields, AA -> AO, RA -> RO AA is reserved for another use. Also, resolves mistake with previous bugfix. 26 October 2011, 23:10:11 UTC
8115c27 inbreeding coefficient calculations in python 26 October 2011, 21:54:38 UTC
7776445 fix haplotype breakage across MNPs 26 October 2011, 21:48:02 UTC
addf5a0 allow unsetting the genotyping max banddepth This is done via "--genotyping-max-banddepth 0". 19 October 2011, 22:31:37 UTC
75ef778 combine homozygous combos across populations This is required for proper normalization of site QUAL, as it depends on the present homozygous genotypings by definition. 19 October 2011, 14:45:48 UTC
3bfe993 fix bugs with population subdivision 18 October 2011, 01:24:10 UTC
f37c427 population subdivisions These changes allow the subdivision of the input samples into sub-populations. The sub-populations are assumed to be inbreeding, selectively neutral, random samples. Provided this, the model is evaluated for each population independently. The sub-population model assumes independence among the populations. At present, mutual information is shared between populations only in the sense that alleles and genotypes evaluated for one population are evaluated for all. Populations may be specified using a file mapping sample names to populations. The command-line flag is --populations. 17 October 2011, 23:08:45 UTC
8ee5c9d minor README update 14 October 2011, 22:28:30 UTC
642d44f fix bugs related to the input of complex alleles And, version 0.9.3! 14 October 2011, 05:07:48 UTC
a3e256f add discrete HWE sampling probability of genotyping to VCF Also, rationalize het sample all observation count, used in some other VCF INFO fields. 12 October 2011, 15:53:50 UTC
dd97e73 fix AB, add MEANALT, fix haplotype generation bug 10 October 2011, 22:38:07 UTC
900d8e5 resolves haplotype allele construction bug Inappropriate amplification. 10 October 2011, 04:49:40 UTC
623efe4 exclude alleles with no reference sequence These are generated in the process of haplotype construction. Call them chaff; the alleles which they have been carved out of could not pass relatively minimal filter cutoffs, and as such are carved up. It's very unlikely that they are significant, and reconstructing them properly would require a lot of code adjustment and probably would not result in better performance. 05 October 2011, 00:01:50 UTC
83f24b6 re-enable DPRA (depth reference alternate ratio) 04 October 2011, 22:43:19 UTC
d5d90a3 version 0.9.2 30 September 2011, 02:43:43 UTC
8e9d51c code cleanup, performance enhancement Add an upper bound for the depth of integration (default 6 best genotypes, sorted by data likelihood) for each sample. This caps the amount of computation at complex multiallelic sites. 30 September 2011, 02:23:48 UTC
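The cap introduced in 8e9d51c amounts to ranking each sample's genotypes by data likelihood and keeping only the best few. A minimal sketch, with the hypothetical helper `top_genotypes` and made-up log-likelihood values used purely for illustration:

```python
def top_genotypes(genotype_log_likelihoods, max_depth=6):
    """Keep the max_depth genotypes with the highest data likelihood."""
    ranked = sorted(
        genotype_log_likelihoods.items(),
        key=lambda item: item[1],
        reverse=True,  # highest log-likelihood first
    )
    return ranked[:max_depth]


# Illustrative values only: eight candidate genotypes at a multiallelic site.
likelihoods = {
    "0/0": -1.2, "0/1": -0.3, "1/1": -7.5, "0/2": -4.0,
    "1/2": -9.1, "2/2": -12.0, "0/3": -15.5, "1/3": -20.0,
}
kept = top_genotypes(likelihoods)
```

With k alleles in a diploid sample there are k(k+1)/2 genotypes, so a fixed cap keeps per-sample work bounded as the number of alleles at a site grows.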
648e845 fix AB calculations for multiallelics 29 September 2011, 18:33:12 UTC
74f7171 remove broken ts/tv tagging 28 September 2011, 05:24:12 UTC
c370968 fix bug with homozygous convergence case 27 September 2011, 23:38:27 UTC
32a7145 use the null allele and genotype when excluding unobserved genotypes Only attempt to add the null allele in the case of --exclude-unobserved-genotypes. 27 September 2011, 16:37:54 UTC
d0a70df performance improvements 1) Introduce the concept of a null allele. This is used in the place of all the other potential alleles at the site when calculating genotype likelihoods for a given sample. If the sample does not have any observations for a given alternate allele, we just ignore it. The likelihoods for such genotypes are then provided by matching the genotype to one in which the missing allele is replaced by a null allele. The benefit of this is that we dramatically reduce the number of potential genotype combinations which we have to evaluate when searching the posterior space for the maximum likelihood solution. This is done without any serious change to the algorithm design, and allows marginals to be calculated without issue. 2) Cache binomial calculations. This provides a 15% speedup when using --binomial-obs-priors. 3) Don't store intermediate genotype combination results. Doing so causes severe memory blowups. (I'm also considering changing the GenotypeCombo to a vector<short>, and including some kind of genotype ptr -> short int mapping for the combo.) 26 September 2011, 23:13:37 UTC
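Point 2 of d0a70df, caching binomial calculations, can be sketched with a bounded memoization cache. This uses Python's `functools.lru_cache` as a stand-in for whatever cache freebayes implements; the function name and the cache size are assumptions.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=100_000)  # bounded, so the cache cannot grow without limit
def binomial_pmf(successes, trials, probability):
    """Probability of exactly `successes` successes in `trials` Bernoulli trials."""
    return (
        comb(trials, successes)
        * probability ** successes
        * (1.0 - probability) ** (trials - successes)
    )


# The same (successes, trials, probability) triple recurs across samples and
# sites, so repeated calls are served from the cache.
first = binomial_pmf(2, 4, 0.5)
second = binomial_pmf(2, 4, 0.5)
```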
61bffa9 limit size of factorial cache 22 September 2011, 15:01:38 UTC
8a0e27b add null alleles to handle N's in reads Additionally, this adjusts the way that some complex alleles are generated, such as those flanking N bases in reads. Also, cleanup of some logic in the AlleleParser::getNextAlleles code. 22 September 2011, 00:12:29 UTC
4957b5c fix haplotype allele generation bug Resolves an off-by-one error in the haplotype generation code. 21 September 2011, 15:29:48 UTC
42eb578 haplotype-based detection This commit enables correct evaluation of variant loci with multi-base alleles by combining variant alleles into dynamically-sized haplotype alleles. These haplotypes, or phased sets of alleles, are tagged as "complex" in the VCF output. Some minor issues remain following this commit: 1) the reported CIGAR strings for SNPs are sometimes incorrect, as when the SNP lies at the first base in a multi-base allele, 2) freebayes cannot yet take complex alleles as --variant-input; they will be broken into their constituent alleles. 20 September 2011, 17:58:56 UTC
0f42d54 ensure proper future use of allLocalGenotypeCombinations This check allows the use of this function in the case that it is used to add to a previous set of genotype combinations. 15 September 2011, 21:29:03 UTC
e55cd26 resolve haploid genotyping bug Due to a recent change in the way that the reference allele is handled, in some cases it was possible that the best genotype combination was not evaluated when calculating marginal genotype likelihoods. As a result, genotypes were frequently mis-called in the case of two haploid samples. This resolves the bug by ensuring that the best genotype combination is added to the set of combinations that are evaluated. 15 September 2011, 21:26:36 UTC
fd4184c resolve ref allele bug 09 September 2011, 00:01:16 UTC
47df665 allow complex alleles to have embedded matching sequence (v0.9.0) With this change, complex alleles are generated for cases where two small variants in the same read occur at most --max-complex-gap bases apart (3bp, by default). This allows for the detection of MNPs (multi-nucleotide polymorphisms) with embedded matching bases. The behavior can be disabled by setting --max-complex-gap 0. Also, this commit adds a new tag to the VCF output, CIGAR, which provides the CIGAR strings of the variants, allowing for post-hoc filtering of certain classes of complex variants. 08 September 2011, 17:18:37 UTC
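The grouping rule behind --max-complex-gap in 47df665 can be sketched as merging variants on the same read whenever the stretch of matching bases between them is short enough. The sketch assumes zero-based, half-open variant intervals and measures the gap as the distance from one variant's end to the next one's start; both are assumptions, and `group_variants` is a hypothetical name.

```python
def group_variants(variants, max_gap=3):
    """Group variant intervals that lie within max_gap bases of each other.

    variants: (start, end) intervals on one read, zero-based and half-open.
    Returns a list of groups; each group would become one complex allele.
    """
    groups = []
    for start, end in sorted(variants):
        previous_end = groups[-1][-1][1] if groups else None
        if previous_end is not None and start - previous_end <= max_gap:
            groups[-1].append((start, end))  # close enough: extend the group
        else:
            groups.append([(start, end)])  # too far: start a new group
    return groups


# Two SNPs separated by 2 matching bases merge; a third, 6 bases on, does not.
grouped = group_variants([(10, 11), (13, 14), (20, 21)])
```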
3d097bf more cleanup for bamtools API integration 30 August 2011, 15:49:54 UTC
4b641f4 add libbamtools.a to build commands Without this, users would need libbamtools.so in their shared object search path. 30 August 2011, 15:29:20 UTC