https://github.com/kermitt2/grobid

Revision Author Date Message Commit Date
5ea4768 add method signature to ensure api compatibility 28 April 2021, 20:34:25 UTC
77f4530 compute MD5 digest of a PDF file processed by the services and add it in the resulting TEI 16 April 2021, 23:52:04 UTC
deaed0c Merge pull request #729 from kermitt2/lucbouge-patch-1 Fix NullPointer exception in PDFALTOOutlineSaxHandler 14 April 2021, 23:33:26 UTC
01e0e4f revert modification to the bounding box information mapping 13 April 2021, 00:54:52 UTC
7433c89 update bounding box calculation for outline and management of nested items 13 April 2021, 00:32:31 UTC
6ad419f fix bounding box extraction and update test 12 April 2021, 00:09:07 UTC
4073084 update pdfalto 05 April 2021, 20:52:37 UTC
98793c3 comments for new citation parser methods 22 March 2021, 05:29:38 UTC
7cc6da4 remove unused delft models from the crf docker image 22 March 2021, 05:00:22 UTC
0c880e7 document the failing PDF case in bioRxiv (too many blocks) 20 March 2021, 21:24:00 UTC
c82413e add benchmark bioRxiv with SciBERT-CRF header model 20 March 2021, 21:21:17 UTC
f36a9da add benchmark PMC with SciBERT-CRF header model 20 March 2021, 20:33:44 UTC
955a0ec minor doc corrections 20 March 2021, 19:47:05 UTC
05a8649 dealing with markdown mood 20 March 2021, 19:17:42 UTC
34fe823 add new release benchmarking, update doc 20 March 2021, 19:00:46 UTC
3e9c5df update badges / demo 20 March 2021, 04:59:46 UTC
fa0db49 update badges 20 March 2021, 04:50:32 UTC
52f12fb typo 20 March 2021, 04:36:37 UTC
c74e3cd update docker version 20 March 2021, 04:07:29 UTC
d6040e6 update docker full image infos 20 March 2021, 04:05:30 UTC
54eed88 update Grobid versions 20 March 2021, 01:22:05 UTC
3c7e1be [Gradle Release Plugin] - new version commit: '0.7.0-SNAPSHOT'. 20 March 2021, 00:23:36 UTC
17f2d46 [Gradle Release Plugin] - pre tag commit: '0.6.2'. 20 March 2021, 00:23:01 UTC
a9a2aa5 Merge pull request #721 from kermitt2/improved-dockerfile-delft Aligned Deep Learning Dockerfile with PR #703 19 March 2021, 23:55:21 UTC
d225477 reduce image similarly as done with the crf-only one 19 March 2021, 23:35:49 UTC
a998649 Merge pull request #707 from kermitt2/improvement/add-tests-sentence-segmentation Add more tests on sentence segmentation 19 March 2021, 22:50:13 UTC
350dfe6 Merge branch 'master' into improved-dockerfile-delft 19 March 2021, 22:26:57 UTC
14fe601 slim docker image by removing useless native binaries 19 March 2021, 22:26:02 UTC
b9e6f48 fix tests (problem with GrobidProperties class... but it should normally be fixed by PR 687) 19 March 2021, 22:23:59 UTC
eb496bb Merge branch 'master' into improvement/add-tests-sentence-segmentation 19 March 2021, 21:30:38 UTC
e9189e3 simple fix (see PR 687 for correct singleton) 19 March 2021, 21:30:06 UTC
227dffa use OpenNLP as default sentence segmenter 19 March 2021, 20:53:11 UTC
06f98e0 fix typo in model name 19 March 2021, 20:46:46 UTC
487edab Merge pull request #725 from kermitt2/fix_crossref_multi-thread Updates to the CrossRef requests implementation 19 March 2021, 19:21:59 UTC
81e959f review sleep time and add more comments 19 March 2021, 19:11:39 UTC
59acfc9 Merge pull request #702 from kermitt2/bugfix/fix-training-figures-tables-generation Generation of training data for tables and figures outputs fewer elements than the input 19 March 2021, 16:58:17 UTC
a6469ed Merge branch 'master' into improvement/add-tests-sentence-segmentation 08 March 2021, 01:39:28 UTC
bb8bd0e missing from PR #726 06 March 2021, 03:45:50 UTC
1e4ba99 Merge pull request #726 from kermitt2/bugfix/correction-features-fulltext Minor correction on citation features 06 March 2021, 03:41:47 UTC
c2fcbd9 remove rate division not relevant anymore 05 March 2021, 20:47:55 UTC
b0493a4 retrained fulltext model with fixed feature 03 March 2021, 20:47:15 UTC
dcfe221 use getMaxPoolSize. 03 March 2021, 08:58:03 UTC
556e8c9 check active threads before submitting new ones. 03 March 2021, 08:56:10 UTC
300efd3 fixing wrong assignment of father id and wrong override of label, adding a test case, rename test to match the class name 02 March 2021, 05:49:25 UTC
a71e862 Revert "minor corrections" This reverts commit 610426f3 01 March 2021, 01:11:48 UTC
610426f minor corrections 24 February 2021, 05:33:27 UTC
c7939ec aligned with PR #703 22 February 2021, 00:26:45 UTC
870c305 Merge pull request #703 from superdude264/dockerfile-enhancements Dockerfile enhancements 21 February 2021, 22:32:23 UTC
2340aa7 Update PDFALTOOutlineSaxHandler.java I cleaned up the correction as requested. Luc. 19 February 2021, 18:47:22 UTC
23fe7fe Upgrade runtime image to Java 11 19 February 2021, 16:53:16 UTC
62599b4 Fix NullPointer exception in PDFALTOOutlineSaxHandler I observed the following error. Feb. 12, 2021 3:20:44 PM org.grobid.core.document.Document addTokenizedDocument SEVERE: Cannot parse file: my_dir/grobid-0.6.1/grobid-home/tmp/ZW3PpM4E9g.lxml_outline.xml Feb. 12, 2021 3:20:44 PM org.grobid.core.document.Document addTokenizedDocument SEVERE: Cannot parse file: my_dir/grobid-0.6.1/grobid-home/tmp/ZW3PpM4E9g.lxml_outline.xml Feb. 12, 2021 3:20:46 PM org.grobid.core.engines.ProcessEngine createTraining INFO: 2 files processed. It originates in an illegal null pointer access in the file PDFALTOOutlineSaxHandler.java. I understand that the error is not that severe. According to @kermitt2, it is simply one of the XML files generated by the PDF parsing that is not valid XML (very frequent). It has no impact because the outline file (containing the table of contents outline, if one is embedded in the PDF) is not exploited by GROBID for the moment; it is there to allow possible improvements in the future. The error is more a reminder for the developers: it is the XML parser that classifies it as SEVERE ("GRAVE" in the French locale), but for us it would rather be INFO. Unfortunately, a side effect might have been overlooked. If father is null, then father.addChild(currentNode) is called and throws a NullPointerException. This exception is caught by catch (Exception e) at line 372 in grobid-core/src/main/java/org/grobid/core/document/Document.java, where the error message "Cannot parse" is misleading. I think an additional else is simply missing. 16 February 2021, 14:32:11 UTC
e0c52c6 add more tests related to sentence segmentation 05 February 2021, 09:27:35 UTC
2210323 remove unused volumes & mkdir commands 29 January 2021, 16:23:20 UTC
3f92ee1 do not copy onejar to runner image This artifact is not needed to run the service. 28 January 2021, 20:40:15 UTC
349425d shrink images created by Dockerfile.crf Maintains all artifacts & volumes from the previous version. Moves unzip to the builder; clears the apt cache in the runner. 28 January 2021, 20:40:15 UTC
1fc6186 simplification 27 January 2021, 23:44:45 UTC
bfc10f7 typo in doc 27 January 2021, 13:52:25 UTC
f7b1d8d fixing possible miss of opened tables when the stream finishes 27 January 2021, 08:40:41 UTC
a9e1bd1 adding assertion on tests and attempt a fix 27 January 2021, 08:40:35 UTC
84f2bef add missing figures when only one figure is present 27 January 2021, 08:40:25 UTC
bfeed63 update files to ignore 22 January 2021, 01:50:43 UTC
5d2d814 use OpenNLP segmenter as default segmenter 11 January 2021, 00:00:24 UTC
5877faf update reference year 10 January 2021, 23:59:21 UTC
4ff805e language specification for sentence segmentation process 07 January 2021, 04:31:22 UTC
aa19f66 add coordinates and sentence segmentation options on the demo console 07 January 2021, 04:26:54 UTC
480b080 inject ORCID if available by consolidation 06 January 2021, 21:24:47 UTC
e44ff2c update name header models 06 January 2021, 21:07:21 UTC
1ca74c8 Merge pull request #630 from kermitt2/orcid_author_header_model_annotation training data to improve prediction of authors combined with some ids (orcid) 06 January 2021, 18:02:54 UTC
64f54de fix doc 06 January 2021, 02:53:23 UTC
434c8dc Merge pull request #686 from kermitt2/batch-labeling-support [WIP] Exploit batch processing for Deep Learning models 06 January 2021, 02:36:01 UTC
79b6d12 clean doc 31 December 2020, 15:13:12 UTC
f41eda8 improve end-to-end process, see issue #626 31 December 2020, 13:46:26 UTC
9bf0f4c fix bug when no bib refs are extracted 29 December 2020, 23:44:09 UTC
ead4928 update results 29 December 2020, 13:37:48 UTC
5ebdcbb preserve sequence separation in case of DL batch labeling 29 December 2020, 03:12:46 UTC
6322358 group bibref processing to ensure usage of batch in DL 29 December 2020, 02:47:23 UTC
503c572 typos in doc 28 December 2020, 18:58:22 UTC
9675224 update doc for docker images 28 December 2020, 18:35:49 UTC
cb241e6 complete dockerfile for deep learning models 27 December 2020, 18:47:15 UTC
50ec324 add GPU detection to docker image via nvidia container toolkit 26 December 2020, 16:07:58 UTC
1d05bcc Merge pull request #684 from kermitt2/elifesciences-workaround-fulltext-npe Avoid NPE in FulltextParser when the body text is empty or null 25 December 2020, 00:35:43 UTC
5686315 avoid processing the fulltext and any downstream models if the body is empty 24 December 2020, 07:35:22 UTC
12c87f9 Merge branch 'master' of https://github.com/kermitt2/grobid 24 December 2020, 06:19:21 UTC
480da08 dockerfile and script for deep learning models with delft 24 December 2020, 06:19:00 UTC
28b5981 Merge pull request #682 from kermitt2/fix-crossref-plus-token use Crossref-Plus-API-Token instead of Authorization 23 December 2020, 15:37:43 UTC
fbde00e update dockerfile with embeddings download 23 December 2020, 07:17:25 UTC
21076e6 a grobid Dockerfile to run deep learning delft models 21 December 2020, 12:07:11 UTC
abc8b85 use Crossref-Plus-API-Token instead of Authorization 18 December 2020, 09:51:35 UTC
19c3993 Merge pull request #681 from elifesciences/fix-dl-data-generation-space-vs-tab replaced space by tab in DeLFTModel.label 15 December 2020, 08:42:15 UTC
ba2b767 typo ;) 03 December 2020, 19:28:26 UTC
cfe8028 Merge branch 'master' of https://github.com/kermitt2/grobid 03 December 2020, 18:11:01 UTC
33b50b7 fix mediabox for PDF display via pdfbox 03 December 2020, 18:10:45 UTC
c3418c4 add header model with feature channel 03 December 2020, 04:29:49 UTC
e71a8b6 replaced space by tab in DeLFTModel.label 01 December 2020, 11:37:19 UTC
35c46b0 update of a citation model 30 November 2020, 04:59:04 UTC
f4b8fe6 improve support for DL feature models; use full model name for DL models 30 November 2020, 04:24:02 UTC
6658937 Merge branch 'master' of https://github.com/kermitt2/grobid 29 November 2020, 23:00:48 UTC
cafabb2 skip commit for all bert model files 29 November 2020, 22:46:53 UTC
60b06d7 adapt tests for properties 29 November 2020, 22:44:17 UTC
820ad26 simplify properties/parameters 29 November 2020, 22:19:40 UTC