f408a14 | Rahul Gupta | 22 January 2021, 19:30:50 UTC | More instances of s/blacklist/forbidden. | 22 January 2021, 19:30:50 UTC |
84944a6 | rahul1980 | 22 January 2021, 19:01:18 UTC | s/blacklist/forbidden in phrases.cc | 22 January 2021, 19:01:18 UTC |
090a350 | Anders Thorhauge Sandholm | 26 August 2020, 08:43:18 UTC | Update README.md (#445) Include reference to ringgaard/sling repo. | 26 August 2020, 08:43:18 UTC |
d2b3d4f | Michael Ringgaard | 19 May 2020, 15:03:25 UTC | Python API for document analyzer (#442) | 19 May 2020, 15:03:25 UTC |
af58a2d | Michael Ringgaard | 28 February 2020, 17:41:12 UTC | Silver type annotator and corpus splitter (#439) | 28 February 2020, 17:41:12 UTC |
d95c7ef | Michael Ringgaard | 11 January 2020, 15:42:41 UTC | RNN stacks (#437) | 11 January 2020, 15:42:41 UTC |
551f3f4 | JDzvonik | 20 December 2019, 11:54:32 UTC | Fix closing recordio files from pyapi (#436) | 20 December 2019, 11:54:32 UTC |
3000f6f | Anders Thorhauge Sandholm | 10 December 2019, 10:47:29 UTC | Update train_caspar.py Minor comment fix | 10 December 2019, 10:47:29 UTC |
74bf670 | Anders Thorhauge Sandholm | 04 December 2019, 15:15:24 UTC | Update myelin.md Minor updates and fixes to the myelin documentation. | 04 December 2019, 15:15:24 UTC |
becdd06 | Anders Thorhauge Sandholm | 19 November 2019, 11:02:57 UTC | Introduce resolver topic weight and increase mention weight. (#433) | 19 November 2019, 11:02:57 UTC |
5ed17a4 | Michael Ringgaard | 18 November 2019, 15:59:12 UTC | Rename silver mention annotator (#432) | 18 November 2019, 15:59:12 UTC |
b600fe5 | Michael Ringgaard | 18 November 2019, 13:02:13 UTC | Move annotators from ner to silver directory (#431) | 18 November 2019, 13:02:13 UTC |
75452a3 | Michael Ringgaard | 18 November 2019, 09:41:14 UTC | Rearrange parser training modules (#430) | 18 November 2019, 09:41:14 UTC |
93541e9 | Michael Ringgaard | 15 November 2019, 15:36:28 UTC | Integrate link graph into main wiki pipeline (#429) | 15 November 2019, 15:36:28 UTC |
8d85040 | Michael Ringgaard | 11 November 2019, 13:02:15 UTC | Parallel decompression of Wikidata dumps (#427) | 11 November 2019, 13:02:15 UTC |
f2f9910 | Michael Ringgaard | 09 November 2019, 10:31:12 UTC | Write parser model after training (#425) | 09 November 2019, 10:31:12 UTC |
66d005d | Anders Thorhauge Sandholm | 08 November 2019, 15:27:17 UTC | Set max edit distance. (#426) | 08 November 2019, 15:27:17 UTC |
612c6c1 | Michael Ringgaard | 08 November 2019, 12:10:06 UTC | Transition-based cascaded parser trainer in C++ using Myelin (#424) | 08 November 2019, 12:10:06 UTC |
87cf72a | Michael Ringgaard | 08 November 2019, 11:59:47 UTC | Silver anaphora and relation annotations (#423) | 08 November 2019, 11:59:47 UTC |
5cb3d0c | Michael Ringgaard | 02 November 2019, 10:28:38 UTC | Global gradient function registration (#421) | 02 November 2019, 10:28:38 UTC |
1b6468a | Michael Ringgaard | 14 October 2019, 15:03:12 UTC | Document corpus reader (#418) | 14 October 2019, 15:03:12 UTC |
da9901c | Michael Ringgaard | 14 October 2019, 14:33:57 UTC | Upgrade to Bazel 1.0.0 (#419) | 14 October 2019, 14:33:57 UTC |
2850d72 | Anders Thorhauge Sandholm | 14 October 2019, 12:35:49 UTC | Removing an additional (malformed) occurrence of 'party leader (Q1553195)'. (#417) | 14 October 2019, 12:35:49 UTC |
ce50ad9 | Anders Thorhauge Sandholm | 09 October 2019, 12:36:58 UTC | Add clitics splitting for it and fr. (#416) | 09 October 2019, 12:36:58 UTC |
3134461 | Michael Ringgaard | 09 October 2019, 12:15:35 UTC | Fix a few wiki pipeline issues (#415) | 09 October 2019, 12:15:35 UTC |
d3daa67 | Michael Ringgaard | 07 October 2019, 17:48:49 UTC | Generalize scatter/gather ops (#413) | 07 October 2019, 17:48:49 UTC |
3d44ff3 | Michael Ringgaard | 07 October 2019, 14:43:28 UTC | Symbol table resizing (#414) | 07 October 2019, 14:43:28 UTC |
6fb4936 | Anders Thorhauge Sandholm | 05 September 2019, 10:52:40 UTC | Deleting old template file (#410) | 05 September 2019, 10:52:40 UTC |
62230d6 | rahul1980 | 04 September 2019, 19:52:05 UTC | Transition generation in C++ (#409) Verified by running against the Ontonotes dev corpus (>9.6K sentences) and checking that we get the same actions as the Python implementation. Bonus: Fix bug in frame/object.h, where we were holding a reference to a temporary functor. | 04 September 2019, 19:52:05 UTC |
a32c310 | Michael Ringgaard | 23 August 2019, 08:55:38 UTC | Phrase structure annotations (#408) | 23 August 2019, 08:55:38 UTC |
9adad67 | Michael Ringgaard | 21 August 2019, 11:21:26 UTC | Force Bazel upgrade in setup (#407) | 21 August 2019, 11:21:26 UTC |
8868545 | Michael Ringgaard | 21 August 2019, 08:38:03 UTC | Document annotation pipeline (#406) | 21 August 2019, 08:38:03 UTC |
ec97759 | Michael Ringgaard | 20 August 2019, 08:16:56 UTC | Fast snapshot GC (#405) | 20 August 2019, 08:16:56 UTC |
bd20762 | Anders Thorhauge Sandholm | 12 August 2019, 14:47:12 UTC | Fix 'unknown' overwrite in wikibot (#402) | 12 August 2019, 14:47:12 UTC |
41b0027 | Denny Vrandečić | 12 August 2019, 14:30:07 UTC | Source and claim written in the same edit (#400) By adding the source to the claim before adding the claim to the item we only do one write to Wikidata (thanks to vrandezo) | 12 August 2019, 14:30:07 UTC |
5b9226c | Michael Ringgaard | 09 July 2019, 13:56:15 UTC | Fix bug in Express::AlwaysZero() (#398) | 09 July 2019, 13:56:15 UTC |
2a7f451 | Michael Ringgaard | 09 July 2019, 07:43:12 UTC | Sparse embedding updates (#397) | 09 July 2019, 07:43:12 UTC |
631f219 | Michael Ringgaard | 03 July 2019, 09:03:47 UTC | Plausibility model (#396) | 03 July 2019, 09:03:47 UTC |
64b48ed | Michael Ringgaard | 01 July 2019, 13:05:12 UTC | Split myelin simulator into separate module (#395) | 01 July 2019, 13:05:12 UTC |
f8e2b69 | rahul1980 | 27 June 2019, 19:46:47 UTC | Make frame types optional in the trainer code (#393) | 27 June 2019, 19:46:47 UTC |
5312cf1 | Anders Thorhauge Sandholm | 27 June 2019, 11:05:33 UTC | More fault tolerant wikibot (#394) | 27 June 2019, 11:05:33 UTC |
40be473 | Michael Ringgaard | 27 June 2019, 07:44:37 UTC | Fix dangling op producer (#392) | 27 June 2019, 07:44:37 UTC |
af2ebfe | Michael Ringgaard | 25 June 2019, 14:29:17 UTC | General reduction and transpose (#391) | 25 June 2019, 14:29:17 UTC |
10bb092 | rahul1980 | 19 June 2019, 14:06:50 UTC | Fix copy-paste error in python/task/entity.py | 19 June 2019, 14:06:50 UTC |
2063648 | Michael Ringgaard | 19 June 2019, 11:43:18 UTC | More myelin ops and faster matmuls (#389) | 19 June 2019, 11:43:18 UTC |
5c848c2 | rahul1980 | 14 June 2019, 16:57:58 UTC | Minor fixes in flow reading code in Python (#386) | 14 June 2019, 16:57:58 UTC |
4354ed9 | rahul1980 | 14 June 2019, 03:56:05 UTC | Ensure qid label is a string in the parse signature (#388) | 14 June 2019, 03:56:05 UTC |
bdb4f7b | Alvin Hou | 13 June 2019, 16:53:28 UTC | Fix wget missing URL error while installing Bazel in setup.sh (#387) | 13 June 2019, 16:53:28 UTC |
bb45740 | Michael Ringgaard | 12 June 2019, 09:22:10 UTC | Custom inputs and output for NER labeler (#385) | 12 June 2019, 09:22:10 UTC |
1edcf38 | rahul1980 | 29 May 2019, 16:58:13 UTC | Fix ontonotes conversion (#383) Use id files from conll.cemantix.org/2012/coref instead of the current URL. Shuffle the training corpus using a fixed (but configurable) random seed in the trainer. | 29 May 2019, 16:58:13 UTC |
753ee75 | rahul1980 | 24 May 2019, 16:53:33 UTC | Support for pagination and selection of individual categories in the Wikicat browser (#382) * Add pagination support * Add checkboxes to select individual categories for a signature * Add comments * Increase topk and page size | 24 May 2019, 16:53:33 UTC |
0db5ee3 | Anders Thorhauge Sandholm | 21 May 2019, 22:24:22 UTC | Increase robustness and remove redundant str() calls. (#380) | 21 May 2019, 22:24:22 UTC |
f2f05e2 | Anders Thorhauge Sandholm | 21 May 2019, 13:59:17 UTC | Fix str bug and add check to not upload previously deleted fact. (#379) | 21 May 2019, 13:59:17 UTC |
74f29ce | rahul1980 | 20 May 2019, 20:03:08 UTC | Fix embedding reader code to work with python3 (#377) * Fix embedding reader code to work with python3 * Serialize suffix table as bytes instead of str(bytearray) | 20 May 2019, 20:03:08 UTC |
9565740 | Michael Ringgaard | 20 May 2019, 09:32:01 UTC | One wheel for all Python 3 versions (#378) | 20 May 2019, 09:32:01 UTC |
bc0c473 | rahul1980 | 17 May 2019, 20:18:28 UTC | Show individual span counts in Wikicat browser, and a few other minor fixes (#376) | 17 May 2019, 20:18:28 UTC |
b51776e | rahul1980 | 16 May 2019, 17:08:19 UTC | Cap nesting depth for MARK actions (#374) TODO: Max nesting depth is heuristically set to 5. We need to set this programmatically by looking at the training corpus. | 16 May 2019, 17:08:19 UTC |
d3d9c22 | Anders Thorhauge Sandholm | 16 May 2019, 12:32:52 UTC | Allow both bytes and strings in Wikidata converter. (#375) * Allow both bytes and strings in Wikidata converter. | 16 May 2019, 12:32:52 UTC |
8656dbc | rahul1980 | 15 May 2019, 17:28:00 UTC | Update install.md to refer to the latest PyTorch | 15 May 2019, 17:28:00 UTC |
7b476dc | rahul1980 | 15 May 2019, 17:23:33 UTC | Move trainer code to python3 and pytorch1.1 (#370) Main changes: - Writing byte arrays in a flow's file writer. - Simplify suffix extraction code, since string slices now directly return characters instead of bytes. - Calls to tensor.item() at a couple of places (due to pytorch 1.1) - Usual changes to print, xrange, iteritems. Tested by training a small model on ~100 sentences. | 15 May 2019, 17:23:33 UTC |
0070aec | Anders Thorhauge Sandholm | 15 May 2019, 14:34:28 UTC | Port wikibot to python3. (#373) * Port wikibot to python3. * Address review comment | 15 May 2019, 14:34:28 UTC |
a0841eb | Michael Ringgaard | 15 May 2019, 12:21:01 UTC | Allow both bytes and strings for pyapi methods (#372) | 15 May 2019, 12:21:01 UTC |
99c7ab7 | Michael Ringgaard | 15 May 2019, 09:48:00 UTC | Refactor gradient clipping (#369) | 15 May 2019, 09:48:00 UTC |
c70cbd6 | Michael Ringgaard | 15 May 2019, 08:14:57 UTC | Change __str__ and __repr__ for frames (#368) | 15 May 2019, 08:14:57 UTC |
5f729a8 | rahul1980 | 13 May 2019, 14:35:41 UTC | Port Wikicat to Python3 (#367) - Make 'print' a function - Handle bytes <-> string conversions at a couple of places. Bonus: - Fix a sorting bug in Top Signatures - Show an error message if our selection formula is too restrictive. - Show a help message to clarify how signature scores are computed. | 13 May 2019, 14:35:41 UTC |
2591d18 | Michael Ringgaard | 09 May 2019, 16:09:06 UTC | Python 3 support for SLING API (#366) | 09 May 2019, 16:09:06 UTC |
727b108 | rahul1980 | 09 May 2019, 14:15:47 UTC | Tweaks to the Wikicat browser (#365) - Push parse scoring before dedup so that low scoring dupes are removed (as opposed to the last k). - Some code clean up. - Add an id field to each parse, for future use. | 09 May 2019, 14:15:47 UTC |
e63b4bf | rahul1980 | 07 May 2019, 14:32:10 UTC | Rewrite Wikicat browser in AngularJS (#364) The existing browser was very unwieldy since it was based on form submissions, and was getting harder to extend. This PR rewrites the browser in AngularJS, and highly simplifies it. The main changes are: - AngularJS provides data binding for free (we were doing it ourselves in the old browser). - We remove all kinds of scores, except the various match-type weights. - We remove "coarse-signatures" and instead only keep full span signatures. - We remove a few more settings, simplifying the look. - We introduce a "parse selection formula" which allows the user to enter a python formula (with support for special placeholders) to restrict parses to those that pass the formula. Bonus: Dedup the set of examples for each match-type. | 07 May 2019, 14:32:10 UTC |
967f616 | Michael Ringgaard | 03 May 2019, 13:02:55 UTC | Tracking of entity handles in resolver (#361) | 03 May 2019, 13:02:55 UTC |
fa6b718 | Anders Thorhauge Sandholm | 01 May 2019, 14:21:22 UTC | Enhance wikibot to deal with category parser input. (#359) Enhance wikibot to deal with category parser input. Fix 'disappeared' and 'Minister' bug. | 01 May 2019, 14:21:22 UTC |
fb444b5 | Michael Ringgaard | 05 April 2019, 13:50:07 UTC | Gradient checking (#358) | 05 April 2019, 13:50:07 UTC |
fa023de | Michael Ringgaard | 03 April 2019, 15:52:17 UTC | NER labeling of documents (#356) | 03 April 2019, 15:52:17 UTC |
bea43fd | Michael Ringgaard | 03 April 2019, 08:29:27 UTC | Fix musical notation bug (#357) | 03 April 2019, 08:29:27 UTC |
7daee6d | Michael Ringgaard | 01 April 2019, 13:55:47 UTC | Wikidata musical notation data type (#355) | 01 April 2019, 13:55:47 UTC |
0edabbf | Michael Ringgaard | 29 March 2019, 09:44:59 UTC | Add documentation for Corpus class | 29 March 2019, 09:44:59 UTC |
6096332 | rahul1980 | 26 March 2019, 17:41:12 UTC | Wikicat: Enforce category_contains constraint (#351) If a category has "category_contains" populated, then we restrict its members to belong to that class. e.g. ("american cyclists", "category_contains", "human") and ("Lance Armstrong", "instance of", "human"). The instance_of check is done transitively using subclass-of edges, and if a category doesn't have a category_contains role, then the check is not performed. | 26 March 2019, 17:41:12 UTC |
201a881 | Michael Ringgaard | 26 March 2019, 10:25:04 UTC | Output SLING frames as JSON in Python API (#352) | 26 March 2019, 10:25:04 UTC |
1f88ffc | rahul1980 | 22 March 2019, 21:15:45 UTC | Fix refcounting bug in PyStore::Resolve() (#349) | 22 March 2019, 21:15:45 UTC |
f948bd4 | Michael Ringgaard | 22 March 2019, 15:26:06 UTC | Named entity annotators (#342) | 22 March 2019, 15:26:06 UTC |
68c1e11 | rahul1980 | 22 March 2019, 15:00:06 UTC | Fix bug #347 (#348) The Python API for `FactsFor()` doesn't parse keyword args, and I had mistakenly changed the closure arg from `False` to `closure=False` in #346. | 22 March 2019, 15:00:06 UTC |
6b0cde7 | rahul1980 | 21 March 2019, 17:09:22 UTC | Tests for FactMatcher (#346) - Refactor fact-matching code so that bulk of the logic is in a new match_type method. - Add >100 test cases for match_type across different kinds of properties, including extensive tests of date-valued properties whose values can be dates or ints or qids. - More regression cases or cases where the matcher's output is desired can simply be added to the list of hard-coded test cases. | 21 March 2019, 17:09:22 UTC |
2956799 | Anders Thorhauge Sandholm | 20 March 2019, 21:51:44 UTC | Fix subsumed calculation (#345) * Fix subsumed calculation * Add date special case handling | 20 March 2019, 21:51:44 UTC |
080c05a | rahul1980 | 19 March 2019, 14:30:02 UTC | Minor fixes to the Wikicat browser (#344) - Some cosmetic fixes - Fix a counting bug while generating recordio | 19 March 2019, 14:30:02 UTC |
a6a6775 | rahul1980 | 18 March 2019, 17:29:57 UTC | Enhancements to the Wikicat browser (#343) - Hovering over fact-matching counts now brings up a list of qids that illustrate that count. - While browsing a signature, the user now has an option to generate Wikibot recordio files directly from the browser. - Modify the fact member so that it stores exemplars for each match type. Further, this list is exhaustive for NEW, ADDITIONAL, and SUBSUMED_BY_EXISTING. This is used in both the new features above. - Simplified the browser code a bit, and removed some unnecessary Javascript. | 18 March 2019, 17:29:57 UTC |
64c35cf | Michael Ringgaard | 15 March 2019, 13:31:01 UTC | Fix the problem with women (#341) | 15 March 2019, 13:31:01 UTC |
fce18b2 | Michael Ringgaard | 15 March 2019, 11:49:51 UTC | Infobox aliases and wiki links (#339) | 15 March 2019, 11:49:51 UTC |
b765579 | rahul1980 | 13 March 2019, 17:45:32 UTC | Various Wikicat fixes (#340) - Omit outputting low-frequency (pid, qid) spans. - Move subsumption checking code to Python - We only get the closure from C++ code now - Subsumption code in Python checks for genre, subclass, part_of, parent_org, located_in, and date subsumption. More properties can be added easily. - Fix a minor bug in the browser. - Expose the Resolve() method in the Python API. | 13 March 2019, 17:45:32 UTC |
97746ce | Michael Ringgaard | 13 March 2019, 13:13:27 UTC | Token styles (#338) | 13 March 2019, 13:13:27 UTC |
05373f7 | Michael Ringgaard | 11 March 2019, 09:41:31 UTC | XML frame reader (#337) | 11 March 2019, 09:41:31 UTC |
cdacfc0 | rahul1980 | 26 February 2019, 22:02:44 UTC | Browser for category parses (#336) Allows browsing by category, signature, or top signatures (denoted by 'top'). Supports coarse and fine signatures, and three metrics to sort the parses with. Allows customization of fact-matching scores. | 26 February 2019, 22:02:44 UTC |
7c53dec | Anders Thorhauge Sandholm | 25 February 2019, 10:13:42 UTC | A few fixes to wikibot.py (#332) | 25 February 2019, 10:13:42 UTC |
b0388ed | Michael Ringgaard | 25 February 2019, 10:00:37 UTC | Check HTTP path after unescaping (#335) | 25 February 2019, 10:00:37 UTC |
1c09339 | rahul1980 | 21 February 2019, 04:13:35 UTC | Fact-matching statistics for category parsing (#331) * FactMatcher that computes how proposed facts in a parse match with existing facts for the same property. They are classified as new, or matching existing facts (either exactly or via subsumption), or conflicting (e.g. for unique-valued properties) or additional facts. * A workflow task that attaches this information to each parse. Other changes: * Add methods to FactExtractor for: (a) only reporting facts for specified properties. (b) report facts with or without backoff (aka closure). Previously the only supported mode was with backoff. I have confirmed that this change doesn't affect the runtime of the backoff mode. (c) Add a method to check if one value subsumes another. These methods also come with the corresponding Python API methods. * Skip empty parses in the parse generator. * A performance improvement: we replace a sling.Array with a python list, allowing the sling.Store behind the array to be garbage collected. * Store the list of category members in the category frame. These only cover legit members, e.g. they exclude subcategories. * Add 'type of sport' and 'cause of death' to the custom taxonomy. * Replace prior and member_score values with their geometric means. This makes it fair to compare a parse with many spans vs a parse with only a few spans (since the priors and member_scores are multiplicative across spans). * Also attach the coarse signature to each parse. * Allow load_kb() to also take filename arguments, and make it use a global pool of loaded KBs. This way multiple tasks can share a KB, which saves both memory and runtime. * Add a 'skip_generation' flag to the workflow, so we have an option to not run (and instead use the cached output of) the expensive candidate parse generation stage. | 21 February 2019, 04:13:35 UTC |
87b8666 | Michael Ringgaard | 06 February 2019, 18:34:19 UTC | Handle Unicode strings in Python frame API (#330) | 06 February 2019, 18:34:19 UTC |
797ec43 | Darren Garvey | 01 February 2019, 13:27:17 UTC | Fix benign error in KB UI. (#329) When deleting characters from the search bar the update handler gets a null item. Check it before dereferencing `.ref` on it. For reference, the console error: TypeError: Cannot read property 'ref' of undefined at Object.self.selectedItemChange (kb.js:46) at fn (eval at compile (angular.js:14605), <anonymous>:4:318) at m.d.(:8080/kb/anonymous function) [as itemChange] (http://ajax.googleapis.com/ajax/libs/angularjs/1.5.7/angular.min.js:83:232) at D (angular-material.min.js:13) at N (angular-material.min.js:13) at m.$digest (angular.js:17286) at b.$apply (angular.js:17552) at Pg.$$debounceViewValueCommit (angular.js:27516) at Pg.$setViewValue (angular.js:27488) at HTMLInputElement.l (angular.js:23730) | 01 February 2019, 13:27:17 UTC |
8f0d22d | Anders Thorhauge Sandholm | 28 January 2019, 10:06:26 UTC | Better extraction and upload of birth and death dates. (#328) | 28 January 2019, 10:06:26 UTC |
7ea2aa6 | Jordan Rupprecht | 22 January 2019, 21:39:05 UTC | [cpu] Replace _xgetbv identifier with xgetbv (#327) | 22 January 2019, 21:39:05 UTC |
93c2ab1 | Michael Ringgaard | 22 January 2019, 09:35:43 UTC | Alias transfer (#326) | 22 January 2019, 09:35:43 UTC |
e037430 | Michael Ringgaard | 14 January 2019, 22:53:18 UTC | Update README.md | 14 January 2019, 22:53:18 UTC |
051b8ad | rahul1980 | 14 January 2019, 21:50:13 UTC | Fix NotShiftOrMarkDelegate to work even if there is no MARK action (#324) When the training corpus consists only of single-token spans, MARK is not added to the action table. Therefore actions.mark() will return None and break NotShiftOrMarkDelegate. This PR fixes the delegate's behavior in this corner case. | 14 January 2019, 21:50:13 UTC |