5ea4768 | lopez | 28 April 2021, 20:34:25 UTC | add method signature to ensure api compatibility | 28 April 2021, 20:34:25 UTC |
77f4530 | lopez | 16 April 2021, 23:52:04 UTC | compute MD5 digest of a PDF file processed by the services and add it to the resulting TEI | 16 April 2021, 23:52:04 UTC |
deaed0c | Patrice Lopez | 14 April 2021, 23:33:26 UTC | Merge pull request #729 from kermitt2/lucbouge-patch-1 Fix NullPointer exception in PDFALTOOutlineSaxHandler | 14 April 2021, 23:33:26 UTC |
01e0e4f | Luca Foppiano | 13 April 2021, 00:54:52 UTC | revert modification to the bounding box information mapping | 13 April 2021, 00:54:52 UTC |
7433c89 | Luca Foppiano | 13 April 2021, 00:32:31 UTC | update bounding box calculation for outline and management of nested items | 13 April 2021, 00:32:31 UTC |
6ad419f | Luca Foppiano | 12 April 2021, 00:09:07 UTC | fix bounding box extraction and update test | 12 April 2021, 00:09:07 UTC |
4073084 | lopez | 05 April 2021, 20:52:37 UTC | update pdfalto | 05 April 2021, 20:52:37 UTC |
98793c3 | lopez | 22 March 2021, 05:29:38 UTC | comments for new citation parser methods | 22 March 2021, 05:29:38 UTC |
7cc6da4 | Luca Foppiano | 22 March 2021, 05:00:22 UTC | remove unused delft models from the crf docker image | 22 March 2021, 05:00:22 UTC |
0c880e7 | Patrice Lopez | 20 March 2021, 21:24:00 UTC | document the failing PDF case in biorXiv (too many blocks) | 20 March 2021, 21:24:00 UTC |
c82413e | Patrice Lopez | 20 March 2021, 21:21:17 UTC | add benchmark biorXiv with SciBERT-CRF header model | 20 March 2021, 21:21:17 UTC |
f36a9da | Patrice Lopez | 20 March 2021, 20:33:44 UTC | add benchmark PMC with SciBERT-CRF header model | 20 March 2021, 20:33:44 UTC |
955a0ec | lopez | 20 March 2021, 19:47:05 UTC | minor doc corrections | 20 March 2021, 19:47:05 UTC |
05a8649 | lopez | 20 March 2021, 19:17:42 UTC | dealing with markdown mood | 20 March 2021, 19:17:42 UTC |
34fe823 | lopez | 20 March 2021, 19:00:46 UTC | add new release benchmarking, update doc | 20 March 2021, 19:00:46 UTC |
3e9c5df | lopez | 20 March 2021, 04:59:46 UTC | update badges / demo | 20 March 2021, 04:59:46 UTC |
fa0db49 | lopez | 20 March 2021, 04:50:32 UTC | update badges | 20 March 2021, 04:50:32 UTC |
52f12fb | lopez | 20 March 2021, 04:36:37 UTC | typo | 20 March 2021, 04:36:37 UTC |
c74e3cd | lopez | 20 March 2021, 04:07:29 UTC | update docker version | 20 March 2021, 04:07:29 UTC |
d6040e6 | lopez | 20 March 2021, 04:05:30 UTC | update docker full image infos | 20 March 2021, 04:05:30 UTC |
54eed88 | lopez | 20 March 2021, 01:22:05 UTC | update Grobid versions | 20 March 2021, 01:22:05 UTC |
3c7e1be | lopez | 20 March 2021, 00:23:36 UTC | [Gradle Release Plugin] - new version commit: '0.7.0-SNAPSHOT'. | 20 March 2021, 00:23:36 UTC |
17f2d46 | lopez | 20 March 2021, 00:23:01 UTC | [Gradle Release Plugin] - pre tag commit: '0.6.2'. | 20 March 2021, 00:23:01 UTC |
a9a2aa5 | Patrice Lopez | 19 March 2021, 23:55:21 UTC | Merge pull request #721 from kermitt2/improved-dockerfile-delft Aligned Deep Learning Dockerfile with PR #703 | 19 March 2021, 23:55:21 UTC |
d225477 | lopez | 19 March 2021, 23:35:49 UTC | reduce image similarly as done with the crf-only one | 19 March 2021, 23:35:49 UTC |
a998649 | Patrice Lopez | 19 March 2021, 22:50:13 UTC | Merge pull request #707 from kermitt2/improvement/add-tests-sentence-segmentation Add more tests on sentence segmentation | 19 March 2021, 22:50:13 UTC |
350dfe6 | lopez | 19 March 2021, 22:26:57 UTC | Merge branch 'master' into improved-dockerfile-delft | 19 March 2021, 22:26:57 UTC |
14fe601 | lopez | 19 March 2021, 22:26:02 UTC | slim docker image by removing useless native binaries | 19 March 2021, 22:26:02 UTC |
b9e6f48 | Patrice Lopez | 19 March 2021, 22:23:59 UTC | fix tests (problem with GrobidProperties class... but fixed with PR 687 normally) | 19 March 2021, 22:23:59 UTC |
eb496bb | Patrice Lopez | 19 March 2021, 21:30:38 UTC | Merge branch 'master' into improvement/add-tests-sentence-segmentation | 19 March 2021, 21:30:38 UTC |
e9189e3 | Patrice Lopez | 19 March 2021, 21:30:06 UTC | simple fix (see PR 687 for correct singleton) | 19 March 2021, 21:30:06 UTC |
227dffa | lopez | 19 March 2021, 20:53:11 UTC | use OpenNLP as default sentence segmenter | 19 March 2021, 20:53:11 UTC |
06f98e0 | Patrice Lopez | 19 March 2021, 20:46:46 UTC | fix typo in model name | 19 March 2021, 20:46:46 UTC |
487edab | Patrice Lopez | 19 March 2021, 19:21:59 UTC | Merge pull request #725 from kermitt2/fix_crossref_multi-thread Updates to the CrossRef requests implementation | 19 March 2021, 19:21:59 UTC |
81e959f | Patrice Lopez | 19 March 2021, 19:11:39 UTC | review sleep time and add more comments | 19 March 2021, 19:11:39 UTC |
59acfc9 | Patrice Lopez | 19 March 2021, 16:58:17 UTC | Merge pull request #702 from kermitt2/bugfix/fix-training-figures-tables-generation Generation of training data for tables and figures outputs fewer elements than the input | 19 March 2021, 16:58:17 UTC |
a6469ed | Luca Foppiano | 08 March 2021, 01:39:28 UTC | Merge branch 'master' into improvement/add-tests-sentence-segmentation | 08 March 2021, 01:39:28 UTC |
bb8bd0e | Patrice Lopez | 06 March 2021, 03:45:50 UTC | missing from PR #726 | 06 March 2021, 03:45:50 UTC |
1e4ba99 | Patrice Lopez | 06 March 2021, 03:41:47 UTC | Merge pull request #726 from kermitt2/bugfix/correction-features-fulltext Minor correction on citation features | 06 March 2021, 03:41:47 UTC |
c2fcbd9 | Patrice Lopez | 05 March 2021, 20:47:55 UTC | remove rate division not relevant anymore | 05 March 2021, 20:47:55 UTC |
b0493a4 | Patrice Lopez | 03 March 2021, 20:47:15 UTC | retrained fulltext model with fixed feature | 03 March 2021, 20:47:15 UTC |
dcfe221 | Aazhar | 03 March 2021, 08:58:03 UTC | use getMaxPoolSize. | 03 March 2021, 08:58:03 UTC |
556e8c9 | Aazhar | 03 March 2021, 08:56:10 UTC | check active threads before submitting new ones. | 03 March 2021, 08:56:10 UTC |
300efd3 | Luca Foppiano | 02 March 2021, 05:49:25 UTC | fixing wrong assignment of father id and wrong override of label, adding a test case, rename test to match the class name | 02 March 2021, 05:49:25 UTC |
a71e862 | Luca Foppiano | 01 March 2021, 01:11:48 UTC | Revert "minor corrections" This reverts commit 610426f3 | 01 March 2021, 01:11:48 UTC |
610426f | Luca Foppiano | 24 February 2021, 05:33:27 UTC | minor corrections | 24 February 2021, 05:33:27 UTC |
c7939ec | lopez | 22 February 2021, 00:26:45 UTC | aligned with PR #703 | 22 February 2021, 00:26:45 UTC |
870c305 | Patrice Lopez | 21 February 2021, 22:32:23 UTC | Merge pull request #703 from superdude264/dockerfile-enhancements Dockerfile enhancements | 21 February 2021, 22:32:23 UTC |
2340aa7 | Luc Bougé | 19 February 2021, 18:47:22 UTC | Update PDFALTOOutlineSaxHandler.java I cleaned up the correction as requested. Luc. | 19 February 2021, 18:47:22 UTC |
23fe7fe | Rob Lewis | 19 February 2021, 16:53:16 UTC | Upgrade runtime image to Java 11 | 19 February 2021, 16:53:16 UTC |
62599b4 | Luc Bougé | 16 February 2021, 14:32:11 UTC | Fix NullPointer exception in PDFALTOOutlineSaxHandler I observed the following error. févr. 12, 2021 3:20:44 PM org.grobid.core.document.Document addTokenizedDocument GRAVE: Cannot parse file: my_dir/grobid-0.6.1/grobid-home/tmp/ZW3PpM4E9g.lxml_outline.xml févr. 12, 2021 3:20:44 PM org.grobid.core.document.Document addTokenizedDocument GRAVE: Cannot parse file: my_dir/grobid-0.6.1/grobid-home/tmp/ZW3PpM4E9g.lxml_outline.xml févr. 12, 2021 3:20:46 PM org.grobid.core.engines.ProcessEngine createTraining INFOS: 2 files processed. It originates in an illegal null pointer access in file PDFALTOOutlineSaxHandler.java. I understand that the error is not that "GRAVE". According to @kermitt2, it is simply one of the XML files generated by the PDF parsing that is not valid XML (very frequent). It has no impact because the outline file (containing the table of contents outline, if one is embedded in the PDF) is not exploited by GROBID for the moment; it exists to allow some possible improvements in the future. The error is more a reminder for the developers: the XML parser classifies it as "GRAVE", but for us it would rather be INFO. Unfortunately, a side effect might have been overlooked. If father is null, then father.addChild(currentNode) is still called, which throws a NullPointerException. This exception is caught by catch (Exception e) at line 372 in grobid-core/src/main/java/org/grobid/core/document/Document.java, where the error message "Cannot parse" is misleading. I think an additional else is just missing. | 16 February 2021, 14:32:11 UTC |
e0c52c6 | Luca Foppiano | 05 February 2021, 09:27:35 UTC | add more tests related to sentence segmentation | 05 February 2021, 09:27:35 UTC |
2210323 | Rob Lewis | 28 January 2021, 20:26:38 UTC | remove unused volumes & mkdir commands | 29 January 2021, 16:23:20 UTC |
3f92ee1 | Rob Lewis | 28 January 2021, 20:25:21 UTC | do not copy onejar to runner image This artifact is not needed to run the service. | 28 January 2021, 20:40:15 UTC |
349425d | Rob Lewis | 28 January 2021, 20:24:26 UTC | shrink images created by Dockerfile.crf Maintains all artifacts & volumes from previous version. * Move unzip to builder. * Clear apt-cache in runner. | 28 January 2021, 20:40:15 UTC |
1fc6186 | Luca Foppiano | 27 January 2021, 23:44:45 UTC | simplification | 27 January 2021, 23:44:45 UTC |
bfc10f7 | lopez | 27 January 2021, 13:52:25 UTC | typo in doc | 27 January 2021, 13:52:25 UTC |
f7b1d8d | Luca Foppiano | 27 January 2021, 02:25:01 UTC | fixing possible miss of opened tables when the stream finishes | 27 January 2021, 08:40:41 UTC |
a9e1bd1 | Luca Foppiano | 14 January 2021, 08:22:06 UTC | adding assertion on tests and attempt a fix | 27 January 2021, 08:40:35 UTC |
84f2bef | Luca Foppiano | 14 January 2021, 03:49:11 UTC | add missing figures when only one figure is present | 27 January 2021, 08:40:25 UTC |
bfeed63 | lopez | 22 January 2021, 01:50:43 UTC | update files to ignore | 22 January 2021, 01:50:43 UTC |
5d2d814 | lopez | 11 January 2021, 00:00:24 UTC | use OpenNLP segmenter as default segmenter | 11 January 2021, 00:00:24 UTC |
5877faf | lopez | 10 January 2021, 23:59:21 UTC | update reference year | 10 January 2021, 23:59:21 UTC |
4ff805e | lopez | 07 January 2021, 04:31:22 UTC | language specification for sentence segmentation process | 07 January 2021, 04:31:22 UTC |
aa19f66 | lopez | 07 January 2021, 04:26:54 UTC | add coordinates and sentence segmentation options on the demo console | 07 January 2021, 04:26:54 UTC |
480b080 | lopez | 06 January 2021, 21:24:47 UTC | inject ORCID if available by consolidation | 06 January 2021, 21:24:47 UTC |
e44ff2c | lopez | 06 January 2021, 21:07:21 UTC | update name header models | 06 January 2021, 21:07:21 UTC |
1ca74c8 | Patrice Lopez | 06 January 2021, 18:02:54 UTC | Merge pull request #630 from kermitt2/orcid_author_header_model_annotation training data to improve prediction of authors combined with some ids (orcid) | 06 January 2021, 18:02:54 UTC |
64f54de | Patrice Lopez | 06 January 2021, 02:53:23 UTC | fix doc | 06 January 2021, 02:53:23 UTC |
434c8dc | Patrice Lopez | 06 January 2021, 02:36:01 UTC | Merge pull request #686 from kermitt2/batch-labeling-support [WIP] Exploit batch processing for Deep Learning models | 06 January 2021, 02:36:01 UTC |
79b6d12 | lopez | 31 December 2020, 15:13:12 UTC | clean doc | 31 December 2020, 15:13:12 UTC |
f41eda8 | lopez | 31 December 2020, 13:46:26 UTC | improve end-to-end process, see issue #626 | 31 December 2020, 13:46:26 UTC |
9bf0f4c | Patrice Lopez | 29 December 2020, 23:44:09 UTC | fix bug when no bib refs are extracted | 29 December 2020, 23:44:09 UTC |
ead4928 | Patrice Lopez | 29 December 2020, 13:37:48 UTC | update results | 29 December 2020, 13:37:48 UTC |
5ebdcbb | Patrice Lopez | 29 December 2020, 03:12:46 UTC | preserve sequence separation in case of DL batch labeling | 29 December 2020, 03:12:46 UTC |
6322358 | lopez | 29 December 2020, 02:47:23 UTC | group bibref processing to ensure usage of batch in DL | 29 December 2020, 02:47:23 UTC |
503c572 | lopez | 28 December 2020, 18:58:22 UTC | typos in doc | 28 December 2020, 18:58:22 UTC |
9675224 | lopez | 28 December 2020, 18:35:49 UTC | update doc for docker images | 28 December 2020, 18:35:49 UTC |
cb241e6 | lopez | 27 December 2020, 18:47:15 UTC | complete dockerfile for deep learning models | 27 December 2020, 18:47:15 UTC |
50ec324 | lopez | 26 December 2020, 16:07:58 UTC | add GPU detection to docker image via nvidia container toolkit | 26 December 2020, 16:07:58 UTC |
1d05bcc | Patrice Lopez | 25 December 2020, 00:35:43 UTC | Merge pull request #684 from kermitt2/elifesciences-workaround-fulltext-npe Avoid NPE in FulltextParser when the body text is empty or null | 25 December 2020, 00:35:43 UTC |
5686315 | Luca Foppiano | 24 December 2020, 07:35:22 UTC | avoid processing the fulltext and any downstream models if the body is empty | 24 December 2020, 07:35:22 UTC |
12c87f9 | lopez | 24 December 2020, 06:19:21 UTC | Merge branch 'master' of https://github.com/kermitt2/grobid | 24 December 2020, 06:19:21 UTC |
480da08 | lopez | 24 December 2020, 06:19:00 UTC | dockerfile and script for deep learning models with delft | 24 December 2020, 06:19:00 UTC |
28b5981 | Patrice Lopez | 23 December 2020, 15:37:43 UTC | Merge pull request #682 from kermitt2/fix-crossref-plus-token use Crossref-Plus-API-Token instead of Authorization | 23 December 2020, 15:37:43 UTC |
fbde00e | lopez | 23 December 2020, 07:17:25 UTC | update dockerfile with embeddings download | 23 December 2020, 07:17:25 UTC |
21076e6 | lopez | 21 December 2020, 12:07:11 UTC | a grobid Dockerfile to run deep learning delft models | 21 December 2020, 12:07:11 UTC |
abc8b85 | lopez | 18 December 2020, 09:51:35 UTC | use Crossref-Plus-API-Token instead of Authorization | 18 December 2020, 09:51:35 UTC |
19c3993 | Patrice Lopez | 15 December 2020, 08:42:15 UTC | Merge pull request #681 from elifesciences/fix-dl-data-generation-space-vs-tab replaced space by tab in DeLFTModel.label | 15 December 2020, 08:42:15 UTC |
ba2b767 | lopez | 03 December 2020, 19:28:26 UTC | typo ;) | 03 December 2020, 19:28:26 UTC |
cfe8028 | lopez | 03 December 2020, 18:11:01 UTC | Merge branch 'master' of https://github.com/kermitt2/grobid | 03 December 2020, 18:11:01 UTC |
33b50b7 | lopez | 03 December 2020, 18:10:45 UTC | fix mediabox for PDF display via pdfbox | 03 December 2020, 18:10:45 UTC |
c3418c4 | Patrice Lopez | 03 December 2020, 04:29:49 UTC | add header model with feature channel | 03 December 2020, 04:29:49 UTC |
e71a8b6 | Daniel Ecer | 01 December 2020, 11:37:19 UTC | replaced space by tab in DeLFTModel.label | 01 December 2020, 11:37:19 UTC |
35c46b0 | Patrice Lopez | 30 November 2020, 04:59:04 UTC | update of a citation model | 30 November 2020, 04:59:04 UTC |
f4b8fe6 | Patrice Lopez | 30 November 2020, 04:24:02 UTC | improve support DL feature models; use full model name for DL models | 30 November 2020, 04:24:02 UTC |
6658937 | Patrice Lopez | 29 November 2020, 23:00:48 UTC | Merge branch 'master' of https://github.com/kermitt2/grobid | 29 November 2020, 23:00:48 UTC |
cafabb2 | Patrice Lopez | 29 November 2020, 22:46:53 UTC | skip commit for all bert model files | 29 November 2020, 22:46:53 UTC |
60b06d7 | lopez | 29 November 2020, 22:44:17 UTC | adapt tests for properties | 29 November 2020, 22:44:17 UTC |
820ad26 | lopez | 29 November 2020, 22:19:40 UTC | simplify properties/parameters | 29 November 2020, 22:19:40 UTC |