https://github.com/kermitt2/grobid

Revision Message Commit Date
01745c0 clean 04 August 2024, 09:11:54 UTC
4c50b79 fix enum instance 04 August 2024, 09:11:35 UTC
82b72f0 fix test 04 August 2024, 08:11:19 UTC
6a799da cleaning 03 August 2024, 17:33:04 UTC
5a453dc services, models 03 August 2024, 17:32:17 UTC
21d4928 use parent model config as fallback, add resources 03 August 2024, 12:01:26 UTC
4112189 Merge branch 'release-0.8.1' into flavor 03 August 2024, 10:32:19 UTC
399a3c1 add flavor training tasks 03 August 2024, 10:32:10 UTC
6e8da87 update constructors 03 August 2024, 09:57:48 UTC
56d351c downgrade to jdk 11 03 July 2024, 08:05:47 UTC
8210d69 remove jdk 17 only constructs 03 July 2024, 07:55:32 UTC
11f8e6a start adding flavors 20 June 2024, 18:18:28 UTC
50860c5 Merge branch 'master' into release-0.8.1 17 June 2024, 21:49:04 UTC
c408076 fix delft command 17 June 2024, 08:28:52 UTC
f1d703c use crossref by default for consolidation 17 June 2024, 08:18:40 UTC
d714650 use crossref by default for consolidation 17 June 2024, 08:17:51 UTC
4675511 update doc, cleaning, support python env>3.9 16 June 2024, 14:25:08 UTC
6afe157 Add specific DL configuration for the full docker image (#1117) * Provide a DL-enabled configuration for the full grobid image * add missing copyright and licence models in the configuration 12 June 2024, 03:02:12 UTC
516926d Merge pull request #1131 from kermitt2/bugfix/sentence-segm-spaces-end Fix issue #1130 12 June 2024, 02:55:11 UTC
c44a755 Merge pull request #1125 from kermitt2/bugfix/fix-links-beginning-sent Adjust circuit-breaker for #1113 12 June 2024, 02:42:26 UTC
d4a973a Merge pull request #1124 from kermitt2/bugfix/notes-same-label fix notes identification 12 June 2024, 02:41:53 UTC
a472cb1 avoid NPE on empty notes 12 June 2024, 02:20:00 UTC
1657b9c apply the conservative approach also when extracting coordinates 11 June 2024, 09:48:25 UTC
d1763e0 avoid sublist on a value greater than the text string itself, fix possible lost of references in certain cases 11 June 2024, 06:14:28 UTC
c5c924e Adjust circuit-breaker that would miss to identify urls at the beginning of the sentence/paragraph 10 June 2024, 05:41:56 UTC
2f9e211 update the search space once a note is found in the text, use the identifier to fetch the specific notes from the map 10 June 2024, 03:54:08 UTC
b6a2a20 update documentation 09 June 2024, 22:56:27 UTC
0a95872 update readme and changelog 09 June 2024, 22:56:19 UTC
694f0ed Merge pull request #1106 from kermitt2/bugfix/sent-seg-ack-fund Add missing sentence segmentation in funding and acknowledgement 09 June 2024, 21:25:47 UTC
bbca7dd Merge branch 'master' into bugfix/sent-seg-ack-fund 09 June 2024, 21:08:49 UTC
cb7118d Merge pull request #1099 from kermitt2/feature/identify-urls Identify URLs and output them in TEI 09 June 2024, 20:55:30 UTC
76fd16f Merge pull request #1097 from kermitt2/feature/preserve-urls Avoid splitting URLs between sentences 09 June 2024, 20:54:13 UTC
4d4c1e3 Fix corner case 27 May 2024, 01:22:59 UTC
10f8465 cosmetics 2 20 May 2024, 07:02:40 UTC
779c575 cosmetics 20 May 2024, 06:56:41 UTC
a710b3e add note about the need to always supply the Accept header for processHeaderDocument 20 May 2024, 06:36:29 UTC
1a8c826 Add the generateIds parameter in the documentation 20 May 2024, 06:35:44 UTC
6370de2 Merge branch 'feature/preserve-urls' into feature/identify-urls 10 May 2024, 05:33:44 UTC
d58633d When the annotations are missing and we capture a single closed parenthesis as last character of the url, we should back off 10 May 2024, 04:46:54 UTC
878d50c More conservative approach 10 May 2024, 03:54:35 UTC
bca302d Sometimes there is a breakline before the URL 10 May 2024, 03:01:46 UTC
cf6fb98 Merge branch 'master' into bugfix/sent-seg-ack-fund 09 May 2024, 23:05:18 UTC
283a262 Merge branch 'master' into feature/identify-urls 09 May 2024, 23:05:04 UTC
5bcb8b1 Merge branch 'master' into feature/preserve-urls 09 May 2024, 22:35:55 UTC
4779385 Add space if there is an actual space at the end of layout tokens 09 May 2024, 22:30:09 UTC
83f2c81 quick fix for #1113 09 May 2024, 21:40:44 UTC
617aa16 Apply url preservation also in tables description and notes 09 May 2024, 08:11:22 UTC
f983f25 Add additional test and fix to the method so that the offsets are correctly matching the real text (dehypenised) 09 May 2024, 03:17:59 UTC
322bf23 Merge branch 'master' into feature/preserve-urls 08 May 2024, 01:24:55 UTC
7d9044e Merge branch 'master' into bugfix/sent-seg-ack-fund 08 May 2024, 00:54:47 UTC
c70d6d3 Fix merging of coordinates to avoid merge when on different pages, add object for annotations with xml nodes 08 May 2024, 00:39:59 UTC
d123efb fix trigger for github action 07 May 2024, 22:25:17 UTC
d2a14f5 split manual workflows 07 May 2024, 22:19:31 UTC
9ad69ee Add additional github actions (#1094) 07 May 2024, 22:03:50 UTC
9199170 fix coordinates merge 05 May 2024, 22:56:02 UTC
6336512 merge sentences whose boundaries are clashing with the annotations from the funding-acknowledgment 05 May 2024, 11:57:05 UTC
664824d Add Kotlin language (#1096) * add kotlin and kotlin-test libraries, update github action's plugins 05 May 2024, 05:15:12 UTC
fb17eec fix lost of the last entity that was sharing boundary with the sentence 05 May 2024, 05:08:41 UTC
ec52f13 get fixes on matchTokenAndString from PR #1099 04 May 2024, 05:07:07 UTC
57f87c4 get fixes on matchTokenAndString from PR #1099 04 May 2024, 05:04:28 UTC
21a0cdd add --open of java.base/java.io (warn from huggingface spaces) 04 May 2024, 04:26:27 UTC
e154167 cleanup 04 May 2024, 04:25:08 UTC
48779a2 Fix another corner case 04 May 2024, 04:23:41 UTC
39892ff Fix wrong Xpath expression 04 May 2024, 02:18:34 UTC
83c7a10 Fix bug in the transformation of the intervals from token-based to character-based when the same tokens occur subsequently and the annotation is composed by a single token 04 May 2024, 01:04:57 UTC
cedee64 Fix bug in the transformation of the intervals from token-based to character-based when the same tokens occur subsequently 03 May 2024, 23:10:10 UTC
097ca93 update xmlunit library 03 May 2024, 23:09:38 UTC
b2873bd enable sentence segmentation in the processing of a text chunk 01 May 2024, 10:06:30 UTC
753a73e report on test failure/success 01 May 2024, 09:42:03 UTC
9dc767f report on test failure/success 01 May 2024, 09:17:44 UTC
364176d Fix incorrect offsets when processing paragraphs and update tests 01 May 2024, 09:12:27 UTC
83416a9 fix test path 01 May 2024, 08:44:27 UTC
7628f40 publish tests results on github actions 01 May 2024, 08:34:14 UTC
4b3a763 fix the funding and acknowledgement parser to preserve the sentence segmentation and the reference markers 01 May 2024, 08:18:08 UTC
9f2edb6 add class to represent the parse of a funding and acknowledgement statement 01 May 2024, 08:07:54 UTC
047af5b add transformation from token to character position 01 May 2024, 08:07:30 UTC
f74466e cosmetics 01 May 2024, 08:07:02 UTC
f5295cd fix bug in token to char position function and add more tests 01 May 2024, 06:04:13 UTC
ea1245a add more tests and add MEXT abbreviation 28 April 2024, 02:58:05 UTC
e0fd3b4 fix missing of last person in the acknowledgment / funding 28 April 2024, 02:35:07 UTC
3900dc2 update test to follow the convention 28 April 2024, 01:54:46 UTC
d4a8261 add tests on the current code 28 April 2024, 01:52:39 UTC
9db8667 cleanup and fix test 28 April 2024, 01:37:18 UTC
2dc07a8 cleanup 28 April 2024, 01:36:06 UTC
c05ec9c Merge branch 'master' into feature/kotlin 28 April 2024, 01:23:12 UTC
7fd4194 fix actions 28 April 2024, 01:06:54 UTC
8443e6d update action's component version 28 April 2024, 01:02:14 UTC
1ebcf3a add kotlin test 28 April 2024, 01:01:22 UTC
84ef802 fix build 28 April 2024, 01:00:33 UTC
d98129f Update general-report.md 24 April 2024, 06:59:35 UTC
6ff15ee keep convention on the token/character calculation 17 April 2024, 01:30:20 UTC
0b5e232 cosmetics 16 April 2024, 12:05:25 UTC
82a6287 cover cases where the PDF annotation is giving us wrong token index (the regex catch few characters more, which are within the pdf annotation) 16 April 2024, 05:16:33 UTC
afc9a76 add tests on corner cases 16 April 2024, 03:51:19 UTC
f904157 correct typos and add some comments 16 April 2024, 03:04:11 UTC
1ff1a73 Avoid including spaces and breaklines at the edges of the URLs 16 April 2024, 01:37:22 UTC
d9c545a Fix spurious characters at the end of the matching URLs when the regex catches too much 15 April 2024, 23:02:53 UTC
abd3e6c improve process 15 April 2024, 12:34:48 UTC
134c048 output URLs to TEI 15 April 2024, 07:17:42 UTC
0f485f7 simplify 15 April 2024, 05:27:49 UTC