https://github.com/kermitt2/grobid

Revision Message Commit Date
2e70ade set rights in docker image in case it will not run as user-root (e.g. kubernetes) 05 December 2019, 09:43:32 UTC
0b6d1d7 typo 26 November 2019, 16:08:31 UTC
82c0181 a bit more explanation on building local docker image 26 November 2019, 15:54:27 UTC
cfa50af bug fixing for affiliation block fragments 21 November 2019, 15:21:37 UTC
63885c9 remove non TEI-valid attribute in table 20 November 2019, 15:17:52 UTC
e520000 consistent formatting of ref in doc 20 November 2019, 15:17:21 UTC
5d3ad03 typo in ref 16 November 2019, 18:22:34 UTC
4d4b8d8 add one more ref. 15 November 2019, 22:52:23 UTC
7a4fa7d update appinfo TEI output 13 November 2019, 13:57:51 UTC
a9a0c54 add robustness for pdfalto recent serialization problems 10 November 2019, 18:25:26 UTC
b4644c2 last issue with #519 09 November 2019, 23:15:55 UTC
16c432c tackle the case of doi introduced with string Doi (was missed so far) 08 November 2019, 16:44:26 UTC
bf79e8f Merge pull request #519 from kermitt2/fixes-n-fold-evaluation Fixes n fold evaluation 08 November 2019, 07:36:08 UTC
7f0a04f Update/Fix minor doc errors 07 November 2019, 19:50:51 UTC
5a38058 Computing average as the macro average of the micro averages of all the results from each fold #516 06 November 2019, 07:06:23 UTC
3951dc0 Compute best and worst model using micro F1 score #516 04 November 2019, 23:45:21 UTC
9ad861e fixing pdfviewer demo Former-commit-id: 4a7e4f3b93e46caa81e7507e659631551d8e7dd8 17 October 2019, 09:49:45 UTC
a4d35d2 update doc with new version number Former-commit-id: 41ab93618f9515e8638c0237f9b91234f18b7b6b 17 October 2019, 08:22:28 UTC
c0806fa [Gradle Release Plugin] - new version commit: '0.6.0-SNAPSHOT'. Former-commit-id: 00c0dfc2ff970dbb77f0af11d9d34775dbda4f75 16 October 2019, 13:58:40 UTC
4cfebc8 [Gradle Release Plugin] - pre tag commit: '0.5.6'. Former-commit-id: c395489b14dae2df9e4fc0b6dbdb8da1f7e7a5ba 16 October 2019, 13:56:28 UTC
00831b0 Revert "[Gradle Release Plugin] - pre tag commit: '0.5.6'." This reverts commit cc73bcaefd32a7f0ace4d0eae532a92df59e3f21 [formerly 0820ec13889c1a21eda9b0c546535be5df1b1bc2]. Former-commit-id: c12c0002e17960ff11b6f5851a5c97465ff291e0 16 October 2019, 13:50:57 UTC
dd5ffc1 Revert "[Gradle Release Plugin] - new version commit: '0.6.0'." This reverts commit 95580d901760f1cdd971fbcb302a70f8db71aa31 [formerly b41aa84a09d57167cf32a014a733d7615d029410]. Former-commit-id: 9b5fa3643cbbad3b06f7edeb02f02bd6c5184e62 16 October 2019, 13:49:58 UTC
95580d9 [Gradle Release Plugin] - new version commit: '0.6.0'. Former-commit-id: b41aa84a09d57167cf32a014a733d7615d029410 16 October 2019, 13:40:15 UTC
cc73bca [Gradle Release Plugin] - pre tag commit: '0.5.6'. Former-commit-id: 0820ec13889c1a21eda9b0c546535be5df1b1bc2 16 October 2019, 13:38:17 UTC
48cda84 update of linux pdfalto binaries, release benchmark Former-commit-id: 388f888255d132aeb35c619618cc74cc334197b4 16 October 2019, 08:09:48 UTC
bd2b272 Merge pull request #496 from kermitt2/pdfalto_parser_fixes pdfalto binaries update pdfalto parser updates Fix for #509 #152 Doc update Former-commit-id: f85acd5afe094d1959b3faff6705505de0ef47c6 14 October 2019, 18:05:16 UTC
2ea9d7e cleaning Former-commit-id: 0c8863b508ed9343af7d19e663336dcd8c5c8a49 14 October 2019, 17:46:00 UTC
1622c58 fix #509 #152 and state to preserve spaces in xml Former-commit-id: 369de2ceda78238cdf38f41c0d67a31da1bf08ca 14 October 2019, 17:43:06 UTC
e10741b update readme for new release 0.5.6 Former-commit-id: 3eed2d6611d0a9a9a04aba9c7745dc1d0d37c539 14 October 2019, 17:41:15 UTC
60b8535 adding pdfalto for windows Former-commit-id: c5d98478999b03e62a22f3367104c25c2a38e36d 13 October 2019, 14:41:45 UTC
52a80b4 adding pdfalto for mac-64 Former-commit-id: 0fb990455f0daae30401b431bc9796dd437a28fa 29 September 2019, 05:47:13 UTC
cd449d4 fix test, update lin-64 pdfalto for bold/italic capture Former-commit-id: 7066d20119f433bffbf79ce9f37e57e0b12ea55b 28 September 2019, 20:04:12 UTC
7f3e08e Remove the font name test for bold/italic because it is done in pdfalto now Former-commit-id: 26efb67bb9ac323c99c4d369ad15a61c77b9e4ba 28 September 2019, 19:16:39 UTC
b6ac573 Merge branch 'master' into pdfalto_parser_fixes Former-commit-id: 2eaed4516e1e90cb78ce9e8883cd97d5454d0dc5 28 September 2019, 19:13:23 UTC
1adc351 Merge pull request #498 from kermitt2/improved-dehypenisation Improved dehypenisation Former-commit-id: 472324ac14e4b6489972fd6e086360525d6736d4 28 September 2019, 19:06:28 UTC
794297e fix test Former-commit-id: 3212b61d1eb3d99b118dae35074c26003c8d639d 28 September 2019, 17:40:36 UTC
f1ddb17 support unicode strings Former-commit-id: f395e2104f9b3d61157e11e65af1ce85b901deff 28 September 2019, 17:39:22 UTC
845ecbb do not use anymore deprecated dehyphenization methods in grobid core Former-commit-id: 883b3cbf327afced5f3ee88e1454c948e5f07c69 28 September 2019, 17:21:03 UTC
df4e01f Merge branch 'master' into improved-dehypenisation Former-commit-id: 5f4af225ee68a7656172dddd9063e2c49b552cf1 28 September 2019, 14:58:48 UTC
8d5afd2 doc update Former-commit-id: 24e6f0ee4834a3d2e79340747688b69b7003e40e 27 September 2019, 18:49:11 UTC
9d0344b Fix #505 Former-commit-id: ce45a96c3e3318e6a54c234041f2e0a435734498 23 September 2019, 07:08:24 UTC
71e3e17 Do not use the XMP embedded metadata for the moment; cleaning Former-commit-id: bc18bed3e6b963113f24c0fac8b872ccf6423471 14 September 2019, 05:14:21 UTC
bc0d325 Review usage of XMP PDF embedded metadata Former-commit-id: 8086c8cb0f4e3955896a4af28c1c612e1e17ca5e 13 September 2019, 19:07:34 UTC
0ce0cf3 cleaning Former-commit-id: bf7e1de523e2d5a0b5034888dcc28196c89f61d1 13 September 2019, 05:32:39 UTC
1ff82c4 merge with master for benchmark Former-commit-id: c71f8791a24670dd519cb8c5e32c98473f72c07a 13 September 2019, 05:02:01 UTC
10be7af fix merging issue with master Former-commit-id: bc6bd9ad70f4be00f7154ca8237510d9242b89ac 13 September 2019, 05:01:07 UTC
ef8a54e Merge pull request #486 from kermitt2/duplicated-body-parts-476 Avoid duplicated body part in the abstract Former-commit-id: f1845462e4b0e926cab7c8fc229153a259937014 12 September 2019, 19:49:20 UTC
6ece1fb Add a cleaning method for abstract working with layout tokens Former-commit-id: 06da47f7ebebc2b4e5b97b06121d882fc2a8a15a 12 September 2019, 19:32:31 UTC
4fb374f fix #424, fix labeled abstract mapping Former-commit-id: 6a9e16768ff3a794aad0acd005814e9b9bbbfe14 12 September 2019, 12:25:59 UTC
78fb889 review processShort; fix bug for DocumentPiece handling in feature generation Former-commit-id: 345c6ae5c19b97e6df5c7ee2eb71d67d1630a26a 12 September 2019, 07:34:55 UTC
0308066 add model declaration for dataseer Former-commit-id: 1a1653b19e349f6e6a5aa976d268d8cdf02e2497 11 September 2019, 08:30:23 UTC
43ad93e Merge pull request #280 from kermitt2/check-evaluation Improvement in evaluation framework Former-commit-id: 38489ed87ca296f77451a339ceb8ccc8c0ba0bb7 11 September 2019, 08:25:14 UTC
6fc4f4b ignore submodule grobid-keyterm Former-commit-id: 12e392c56d2e9c1898facb711cf82357b2b0936b 11 September 2019, 08:21:18 UTC
3fad113 Merge branch 'master' of https://github.com/kermitt2/grobid Former-commit-id: db33786b3518e17ddeb6ecec36c333692c402b71 07 September 2019, 23:59:42 UTC
2ae6076 avoid that the python.virtualEnv property breaks the modules performing checkProperties Former-commit-id: c534c417caee94eed1a8bf6b6b109c3f026a7321 07 September 2019, 23:58:59 UTC
b7c15e9 Implementing suggestions and move code into methods + adding some unit tests Former-commit-id: 02612ffbd6c00a6d7e3460d08ff221b66f527052 07 September 2019, 09:19:09 UTC
f7f1030 adding subList by Offset for layout tokens Former-commit-id: a22098e68133785975aa9524b9755161ddcfa601 06 September 2019, 08:46:42 UTC
dc2d834 getting instance of GrobidProperties before running tests Former-commit-id: d6b1d0e55a41f0f77ed65f7984608abb72edd022 06 September 2019, 06:49:08 UTC
655d8b1 cosmetics Former-commit-id: 4b14c67a1b74a40d0fe1949887541fc99bc13ce7 06 September 2019, 06:47:22 UTC
5a4e921 avoiding going out of bounds Former-commit-id: f6b243425177751d762e2f124260b9d99ef84f77 06 September 2019, 05:55:28 UTC
0040cf7 improving dehypenisation using coordinates to check breakline Former-commit-id: 3ccfd89e4381cf5488fb7d9d44061c96e77e8ac6 06 September 2019, 05:29:46 UTC
1cd280e cleanup dehypenisation Former-commit-id: fc0cafa787d5d318c00e9bccea91975fd51839dd 06 September 2019, 05:29:32 UTC
74932ca improving naming Former-commit-id: 9e44d1c007d5fa8cad0450796a695692343343f7 04 September 2019, 00:07:41 UTC
6b1cd96 fixing extraction of font styles from ALTO format #495 Former-commit-id: 920113efb4697aa73f04fba1fa0dae04135a5209 03 September 2019, 23:54:34 UTC
79e9869 adding more tests on pdfalto parser and trying to fix issues with bold/italic and subscript/superscript Former-commit-id: 8c2d2d504d7a0c18bbfe4f3bc82eb80e59043d98 03 September 2019, 08:24:44 UTC
a14e427 extra explanations on grobid-home for the batch mode to avoid any confusions Former-commit-id: 64a2a46eff406920b2162f9d9d467b2521d4afc2 31 August 2019, 19:18:35 UTC
a88d851 correct spelling in new doc Former-commit-id: 27cde82e3439428364ab61c375879031af810a65 28 August 2019, 02:13:35 UTC
c7b922a documentation for n-folds evaluation Former-commit-id: 2adac376ee56023a2a4ffef85bee42e2229acf61 28 August 2019, 01:44:55 UTC
ec7cfa2 Implementing review remarks #453 Former-commit-id: 2a6ab0988339c0ecd4e1a382ee72ce844e3eb686 27 August 2019, 23:26:44 UTC
813de2a use previous processShort for all short texts Former-commit-id: 2087e78a0cacc959f93a47c2dc492616ae3e9a47 23 August 2019, 11:48:20 UTC
76c8f01 rollback Former-commit-id: 377ad90264ea41c9919f5bad7cf43f8637d59aa8 22 August 2019, 20:35:03 UTC
b1ee6b1 update processShort for applying the fulltext model to short piece of texts like the abstract Former-commit-id: 626ad60fea3093ae56dc08f8477b7eec0417e9b9 22 August 2019, 15:36:34 UTC
7fa9f4a better PMID and PMC ID recognition, update citation model with some PMID examples Former-commit-id: 25258577dcc3b409b20341f19eea56360191ad3d 22 August 2019, 14:02:31 UTC
08e5064 cleaning remaining bin/ Former-commit-id: 17d83dd87b2299f421d0a80d722dde13bff87ca6 20 August 2019, 19:01:36 UTC
ea612a0 Merge pull request #488 from elifesciences/added-bin-to-gitignore added bin to .gitignore Former-commit-id: 01b92384a2bb3df0a45bf4f59ed5e56901602d87 20 August 2019, 18:59:15 UTC
9b28884 added bin to .gitignore Former-commit-id: 5f71cd663458528834ef0d8f19ce9f2088f5eaae 20 August 2019, 13:22:35 UTC
4316344 create valid DocumentPiece for further structuring abstract Former-commit-id: c6d29308a36c34fe1dcfed7c3db224a0153578c7 20 August 2019, 12:32:41 UTC
df92b5a saved by a test :-) Former-commit-id: 2b0915310596aec9fa5edde8aa1b366a6d412e11 20 August 2019, 09:11:26 UTC
979c1f9 Adding more tests and moving code around Former-commit-id: ce933aa5ad7079deb10cc282e2e2d9ed5a4fa9b4 20 August 2019, 09:06:02 UTC
da41b09 adding more tests for evaluation and fixing small bug on support metrics Former-commit-id: abc94909f21d6a09fa3023b5e8ddd9e2398c3b45 20 August 2019, 08:27:04 UTC
c65199c document optional parameter includeRawCitations for patent processing Former-commit-id: bb8cf62584b57e6ee2f364d1e405bf245abbe915 16 August 2019, 06:22:40 UTC
50e4b01 Merge pull request #468 from elifesciences/fix-label-task-very-special-characters added workaround for setting JEP value with very special characters Former-commit-id: 27a1eed657c8bef6f6202b5678868b1e7ed96011 15 August 2019, 18:47:09 UTC
8df9a42 Merge pull request #483 from kermitt2/option-442 Add optional raw reference string in results, see #442 Former-commit-id: cad7683b97f3f0ce3060fbd00e18723e7f055c2d 15 August 2019, 17:31:40 UTC
4ad19e3 adapt tests for the option to add the raw reference string to the extracted citation parsed results Former-commit-id: 662c814b805b4f93dfdcb068af395b035fe2a045 15 August 2019, 16:09:14 UTC
2fb9d2b documentation about the option to add the raw reference string to the extracted citation parsed results Former-commit-id: dcd1c2fc041267ed21217c01fbecaef255a84f78 15 August 2019, 16:00:37 UTC
17850e3 add option to get the raw reference string in the extracted citation parsed results Former-commit-id: b073e9af2680e5aa7131828ca03657fe58c70750 15 August 2019, 15:14:51 UTC
364d792 Merge pull request #454 from kermitt2/jep_macOs [wip] better integration with Delft via JEP Former-commit-id: 379c77ac22f950c4e5c632ac276145dde7bf34c0 13 August 2019, 18:49:59 UTC
2f68481 revert delft as default sequence labelling Former-commit-id: 5e3a81c619f8b56ef185a73cd61c62c643ca2cdb 13 August 2019, 18:40:04 UTC
c9c0d75 remove useless trace Former-commit-id: d6226a8d2c5cb37553f93ba3b9cb019fa3627a1d 13 August 2019, 18:38:57 UTC
77583cf Remove 10-fold from date trainer - forgot there from testing Former-commit-id: 7fc02a888eda01a7771df33bffee2db9f1268d35 09 August 2019, 09:00:42 UTC
c45a298 fixing test (cherry picked from commit 6bea136d0313d5748b06866632df11c1717fe931 [formerly 67243f42c09aac9f16ef5f7d1b23472058a99f10]) Former-commit-id: 270c83ed5f7a5a14e98865e284c01dac0fca7e1a 08 August 2019, 03:01:36 UTC
6bea136 fixing test Former-commit-id: 67243f42c09aac9f16ef5f7d1b23472058a99f10 08 August 2019, 02:59:52 UTC
440bde4 Merge branch 'master' into check-evaluation Former-commit-id: 24487e3c4a14af6d09f0ebbcf01bf9345a35839a 08 August 2019, 02:47:34 UTC
e00988f minor cosmetics, renaming test on pdf alto to match the main class Former-commit-id: 1c343f7d2eb4878f12baad2e30669427067ffab4 08 August 2019, 02:44:36 UTC
7502dcb Update pdfalto with last fixes Former-commit-id: f4e945d7238793323803ea3af30e8fde0cbeb613 07 August 2019, 23:23:07 UTC
264656c Merge pull request #479 from elifesciences/disable-header-heuristics optionally disable header heuristics Former-commit-id: aefa5df9d0202c61f2ad522160a3c836eca1e2bf 07 August 2019, 21:40:08 UTC
f4ff694 changed header us heuristics default to true Former-commit-id: 60215174217da258ad1541b0000a2f21a46b7f93 07 August 2019, 17:57:37 UTC
2726f65 disable header heuristics by default Former-commit-id: 9395ce1b80cf9040b2be8e7b9e90a5dea32bfbb0 07 August 2019, 16:42:19 UTC
691b467 create training data: log full exception (#471) Logs the full exception rather than just the message. This helped to narrow down #470 Former-commit-id: 4f3a906fb7fc9e4eec060ce64420d3af063986f5 30 July 2019, 12:36:46 UTC
8df7a0a improving documentation Former-commit-id: cb1f5383b755ea88357c7475908b924d1bdce82a 29 July 2019, 06:37:12 UTC