https://github.com/google/sling

Revision Author Date Message Commit Date
f408a14 More instances of s/blacklist/forbidden. 22 January 2021, 19:30:50 UTC
84944a6 s/blacklist/forbidden in phrases.cc 22 January 2021, 19:01:18 UTC
090a350 Update README.md (#445) Include reference to ringgaard/sling repo. 26 August 2020, 08:43:18 UTC
d2b3d4f Python API for document analyzer (#442) 19 May 2020, 15:03:25 UTC
af58a2d Silver type annotator and corpus splitter (#439) 28 February 2020, 17:41:12 UTC
d95c7ef RNN stacks (#437) 11 January 2020, 15:42:41 UTC
551f3f4 Fix closing recordio files from pyapi (#436) 20 December 2019, 11:54:32 UTC
3000f6f Update train_caspar.py Minor comment fix 10 December 2019, 10:47:29 UTC
74bf670 Update myelin.md Minor updates and fixes to the myelin documentation. 04 December 2019, 15:15:24 UTC
becdd06 Introduce resolver topic weight and increase mention weight. (#433) 19 November 2019, 11:02:57 UTC
5ed17a4 Rename silver mention annotator (#432) 18 November 2019, 15:59:12 UTC
b600fe5 Move annotators from ner to silver directory (#431) 18 November 2019, 13:02:13 UTC
75452a3 Rearrange parser training modules (#430) 18 November 2019, 09:41:14 UTC
93541e9 Integrate link graph into main wiki pipeline (#429) 15 November 2019, 15:36:28 UTC
8d85040 Parallel decompression of Wikidata dumps (#427) 11 November 2019, 13:02:15 UTC
f2f9910 Write parser model after training (#425) 09 November 2019, 10:31:12 UTC
66d005d Set max edit distance. (#426) 08 November 2019, 15:27:17 UTC
612c6c1 Transition-based cascaded parser trainer in C++ using Myelin (#424) 08 November 2019, 12:10:06 UTC
87cf72a Silver anaphora and relation annotations (#423) 08 November 2019, 11:59:47 UTC
5cb3d0c Global gradient function registration (#421) 02 November 2019, 10:28:38 UTC
1b6468a Document corpus reader (#418) 14 October 2019, 15:03:12 UTC
da9901c Upgrade to Bazel 1.0.0 (#419) 14 October 2019, 14:33:57 UTC
2850d72 Removing an additional (malformed) occurrence of 'party leader (Q1553195)'. (#417) 14 October 2019, 12:35:49 UTC
ce50ad9 Add clitics splitting for it and fr. (#416) 09 October 2019, 12:36:58 UTC
3134461 Fix a few wiki pipeline issues (#415) 09 October 2019, 12:15:35 UTC
d3daa67 Generalize scatter/gather ops (#413) 07 October 2019, 17:48:49 UTC
3d44ff3 Symbol table resizing (#414) 07 October 2019, 14:43:28 UTC
6fb4936 Deleting old template file (#410) 05 September 2019, 10:52:40 UTC
62230d6 Transition generation in C++ (#409) Verified by running against the Ontonotes dev corpus (>9.6K sentences) and checking that we get the same actions as the Python implementation. Bonus: Fix bug in frame/object.h, where we were holding a reference to a temporary functor. 04 September 2019, 19:52:05 UTC
a32c310 Phrase structure annotations (#408) 23 August 2019, 08:55:38 UTC
9adad67 Force Bazel upgrade in setup (#407) 21 August 2019, 11:21:26 UTC
8868545 Document annotation pipeline (#406) 21 August 2019, 08:38:03 UTC
ec97759 Fast snapshot GC (#405) 20 August 2019, 08:16:56 UTC
bd20762 Fix 'unknown' overwrite in wikibot (#402) 12 August 2019, 14:47:12 UTC
41b0027 Source and claim written in the same edit (#400) By adding the source to the claim before adding the claim to the item we only do one write to Wikidata (thanks to vrandezo) 12 August 2019, 14:30:07 UTC
5b9226c Fix bug in Express::AlwaysZero() (#398) 09 July 2019, 13:56:15 UTC
2a7f451 Sparse embedding updates (#397) 09 July 2019, 07:43:12 UTC
631f219 Plausibility model (#396) 03 July 2019, 09:03:47 UTC
64b48ed Split myelin simulator into separate module (#395) 01 July 2019, 13:05:12 UTC
f8e2b69 Make frame types optional in the trainer code (#393) 27 June 2019, 19:46:47 UTC
5312cf1 More fault tolerant wikibot (#394) 27 June 2019, 11:05:33 UTC
40be473 Fix dangling op producer (#392) 27 June 2019, 07:44:37 UTC
af2ebfe General reduction and transpose (#391) 25 June 2019, 14:29:17 UTC
10bb092 Fix copy-paste error in python/task/entity.py 19 June 2019, 14:06:50 UTC
2063648 More myelin ops and faster matmuls (#389) 19 June 2019, 11:43:18 UTC
5c848c2 Minor fixes in flow reading code in Python (#386) 14 June 2019, 16:57:58 UTC
4354ed9 Ensure qid label is a string in the parse signature (#388) 14 June 2019, 03:56:05 UTC
bdb4f7b Fix wget missing URL error while installing Bazel in setup.sh (#387) 13 June 2019, 16:53:28 UTC
bb45740 Custom inputs and output for NER labeler (#385) 12 June 2019, 09:22:10 UTC
1edcf38 Fix ontonotes conversion (#383) Use id files from conll.cemantix.org/2012/coref instead of the current URL. Shuffle the training corpus using a fixed (but configurable) random seed in the trainer. 29 May 2019, 16:58:13 UTC
753ee75 Support for pagination and selection of individual categories in the Wikicat browser (#382) * Add pagination support * Add checkboxes to select individual categories for a signature * Add comments * Increase topk and page size 24 May 2019, 16:53:33 UTC
0db5ee3 Increase robustness and remove redundant str() calls. (#380) 21 May 2019, 22:24:22 UTC
f2f05e2 Fix str bug and add check to not upload previously deleted fact. (#379) 21 May 2019, 13:59:17 UTC
74f29ce Fix embedding reader code to work with python3 (#377) * Fix embedding reader code to work with python3 * Serialize suffix table as bytes instead of str(bytearray) 20 May 2019, 20:03:08 UTC
9565740 One wheel for all Python 3 versions (#378) 20 May 2019, 09:32:01 UTC
bc0c473 Show individual span counts in Wikicat browser, and a few other minor fixes (#376) 17 May 2019, 20:18:28 UTC
b51776e Cap nesting depth for MARK actions (#374) TODO: Max nesting depth is heuristically set to 5. We need to set this programmatically by looking at the training corpus. 16 May 2019, 17:08:19 UTC
d3d9c22 Allow both bytes and strings in Wikidata converter. (#375) * Allow both bytes and strings in Wikidata converter. 16 May 2019, 12:32:52 UTC
8656dbc Update install.md to refer to the latest PyTorch 15 May 2019, 17:28:00 UTC
7b476dc Move trainer code to python3 and pytorch1.1 (#370) Main changes: - Writing byte arrays in a flow's file writer. - Simplify suffix extraction code, since string slices now directly return characters instead of bytes. - Calls to tensor.item() at a couple of places (due to pytorch 1.1) - Usual changes to print, xrange, iteritems. Tested by training a small model on ~100 sentences. 15 May 2019, 17:23:33 UTC
0070aec Port wikibot to python3. (#373) * Port wikibot to python3. * Address review comment 15 May 2019, 14:34:28 UTC
a0841eb Allow both bytes and strings for pyapi methods (#372) 15 May 2019, 12:21:01 UTC
99c7ab7 Refactor gradient clipping (#369) 15 May 2019, 09:48:00 UTC
c70cbd6 Change __str__ and __repr__ for frames (#368) 15 May 2019, 08:14:57 UTC
5f729a8 Port Wikicat to Python3 (#367) - Make 'print' a function - Handle bytes <-> string conversions at a couple of places. Bonus: - Fix a sorting bug in Top Signatures - Show an error message if our selection formula is too restrictive. - Show a help message to clarify how signature scores are computed. 13 May 2019, 14:35:41 UTC
2591d18 Python 3 support for SLING API (#366) 09 May 2019, 16:09:06 UTC
727b108 Tweaks to the Wikicat browser (#365) - Push parse scoring before dedup so that low scoring dupes are removed (as opposed to the last k). - Some code clean up. - Add an id field to each parse, for future use. 09 May 2019, 14:15:47 UTC
e63b4bf Rewrite Wikicat browser in AngularJS (#364) The existing browser was very unwieldy since it was based on form submissions, and was getting harder to extend. This PR rewrites the browser in AngularJS, and highly simplifies it. The main changes are: - AngularJS provides data binding for free (we were doing it ourselves in the old browser). - We remove all kinds of scores, except the various match-type weights. - We remove "coarse-signatures" and instead only keep full span signatures. - We remove a few more settings, simplifying the look. - We introduce a "parse selection formula" which allows the user to enter a python formula (with support for special placeholders) to restrict parses to those that pass the formula. Bonus: Dedup the set of examples for each match-type. 07 May 2019, 14:32:10 UTC
967f616 Tracking of entity handles in resolver (#361) 03 May 2019, 13:02:55 UTC
fa6b718 Enhance wikibot to deal with category parser input. (#359) Enhance wikibot to deal with category parser input. Fix 'disappeared' and 'Minister' bug. 01 May 2019, 14:21:22 UTC
fb444b5 Gradient checking (#358) 05 April 2019, 13:50:07 UTC
fa023de NER labeling of documents (#356) 03 April 2019, 15:52:17 UTC
bea43fd Fix musical notation bug (#357) 03 April 2019, 08:29:27 UTC
7daee6d Wikidata musical notation data type (#355) 01 April 2019, 13:55:47 UTC
0edabbf Add documentation for Corpus class 29 March 2019, 09:44:59 UTC
6096332 Wikicat: Enforce category_contains constraint (#351) If a category has "category_contains" populated, then we restrict its members to belong to that class. e.g. ("american cyclists", "category_contains", "human") and ("Lance Armstrong", "instance of", "human"). The instance_of check is done transitively using subclass-of edges, and if a category doesn't have a category_contains role, then the check is not performed. 26 March 2019, 17:41:12 UTC
201a881 Output SLING frames as JSON in Python API (#352) 26 March 2019, 10:25:04 UTC
1f88ffc Fix refcounting bug in PyStore::Resolve() (#349) 22 March 2019, 21:15:45 UTC
f948bd4 Named entity annotators (#342) 22 March 2019, 15:26:06 UTC
68c1e11 Fix bug #347 (#348) The Python API for `FactsFor()` doesn't parse keyword args, and I had mistakenly changed the closure arg from `False` to `closure=False` in #346. 22 March 2019, 15:00:06 UTC
6b0cde7 Tests for FactMatcher (#346) - Refactor fact-matching code so that bulk of the logic is in a new match_type method. - Add >100 test cases for match_type across different kinds of properties, including extensive tests of date-valued properties whose values can be dates or ints or qids. - More regression cases or cases where the matcher's output is desired can simply be added to the list of hard-coded test cases. 21 March 2019, 17:09:22 UTC
2956799 Fix subsumed calculation (#345) * Fix subsumed calculation * Add date special case handling 20 March 2019, 21:51:44 UTC
080c05a Minor fixes to the Wikicat browser (#344) - Some cosmetic fixes - Fix a counting bug while generating recordio 19 March 2019, 14:30:02 UTC
a6a6775 Enhancements to the Wikicat browser (#343) - Hovering over fact-matching counts now brings up a list of qids that illustrate that count. - While browsing a signature, the user now has an option to generate Wikibot recordio files directly from the browser. - Modify the fact member so that it stores exemplars for each match type. Further, this list is exhaustive for NEW, ADDITIONAL, and SUBSUMED_BY_EXISTING. This is used in both the new features above. - Simplified the browser code a bit, and removed some unnecessary Javascript. 18 March 2019, 17:29:57 UTC
64c35cf Fix the problem with women (#341) 15 March 2019, 13:31:01 UTC
fce18b2 Infobox aliases and wiki links (#339) 15 March 2019, 11:49:51 UTC
b765579 Various Wikicat fixes (#340) - Omit outputting low-frequency (pid, qid) spans. - Move subsumption checking code to Python - We only get the closure from C++ code now - Subsumption code in Python checks for genre, subclass, part_of, parent_org, located_in, and date subsumption. More properties can be added easily. - Fix a minor bug in the browser. - Expose the Resolve() method in the Python API. 13 March 2019, 17:45:32 UTC
97746ce Token styles (#338) 13 March 2019, 13:13:27 UTC
05373f7 XML frame reader (#337) 11 March 2019, 09:41:31 UTC
cdacfc0 Browser for category parses (#336) Allows browsing by category, signature, or top signatures (denoted by 'top'). Supports coarse and fine signatures, and three metrics to sort the parses with. Allows customization of fact-matching scores. 26 February 2019, 22:02:44 UTC
7c53dec A few fixes to wikibot.py (#332) 25 February 2019, 10:13:42 UTC
b0388ed Check HTTP path after unescaping (#335) 25 February 2019, 10:00:37 UTC
1c09339 Fact-matching statistics for category parsing (#331) * FactMatcher that computes how proposed facts in a parse match with existing facts for the same property. They are classified as new, or matching existing facts (either exactly or via subsumption), or conflicting (e.g. for unique-valued properties) or additional facts. * A workflow task that attaches this information to each parse. Other changes: * Add methods to FactExtractor for: (a) only reporting facts for specified properties. (b) report facts with or without backoff (aka closure). Previously the only supported mode was with backoff. I have confirmed that this change doesn't affect the runtime of the backoff mode. (c) Add a method to check if one value subsumes another. These methods also come with the corresponding Python API methods. * Skip empty parses in the parse generator. * A performance improvement: we replace a sling.Array with a python list, allowing the sling.Store behind the array to be garbage collected. * Store the list of category members in the category frame. These only cover legit members, e.g. they exclude subcategories. * Add 'type of sport' and 'cause of death' to the custom taxonomy. * Replace prior and member_score values with their geometric means. This makes it fair to compare a parse with many spans vs a parse with only a few spans (since the priors and member_scores are multiplicative across spans). * Also attach the coarse signature to each parse. * Allow load_kb() to also take filename arguments, and make it use a global pool of loaded KBs. This way multiple tasks can share a KB, which saves both memory and runtime. * Add a 'skip_generation' flag to the workflow, so we have an option to not run (and instead use the cached output of) the expensive candidate parse generation stage. 21 February 2019, 04:13:35 UTC
87b8666 Handle Unicode strings in Python frame API (#330) 06 February 2019, 18:34:19 UTC
797ec43 Fix benign error in KB UI. (#329) When deleting characters from the search bar the update handler gets a null item. Check it before dereferencing `.ref` on it. For reference, the console error: TypeError: Cannot read property 'ref' of undefined at Object.self.selectedItemChange (kb.js:46) at fn (eval at compile (angular.js:14605), <anonymous>:4:318) at m.d.(:8080/kb/anonymous function) [as itemChange] (http://ajax.googleapis.com/ajax/libs/angularjs/1.5.7/angular.min.js:83:232) at D (angular-material.min.js:13) at N (angular-material.min.js:13) at m.$digest (angular.js:17286) at b.$apply (angular.js:17552) at Pg.$$debounceViewValueCommit (angular.js:27516) at Pg.$setViewValue (angular.js:27488) at HTMLInputElement.l (angular.js:23730) 01 February 2019, 13:27:17 UTC
8f0d22d Better extraction and upload of birth and death dates. (#328) 28 January 2019, 10:06:26 UTC
7ea2aa6 [cpu] Replace _xgetbv identifier with xgetbv (#327) 22 January 2019, 21:39:05 UTC
93c2ab1 Alias transfer (#326) 22 January 2019, 09:35:43 UTC
e037430 Update README.md 14 January 2019, 22:53:18 UTC
051b8ad Fix NotShiftOrMarkDelegate to work even if there is no MARK action (#324) When the training corpus consists only of single-token spans, MARK is not added to the action table. Therefore actions.mark() will return None and break NotShiftOrMarkDelegate. This PR fixes the delegate's behavior in this corner case. 14 January 2019, 21:50:13 UTC