Revision | Author | Date | Message | Commit Date |
01745c0 | Patrice Lopez | 04 August 2024, 09:11:54 UTC | clean | 04 August 2024, 09:11:54 UTC |
4c50b79 | Patrice Lopez | 04 August 2024, 09:11:35 UTC | fix enum instance | 04 August 2024, 09:11:35 UTC |
82b72f0 | Patrice Lopez | 04 August 2024, 08:11:19 UTC | fix test | 04 August 2024, 08:11:19 UTC |
6a799da | Patrice Lopez | 03 August 2024, 17:33:04 UTC | cleaning | 03 August 2024, 17:33:04 UTC |
5a453dc | Patrice Lopez | 03 August 2024, 17:32:17 UTC | services, models | 03 August 2024, 17:32:17 UTC |
21d4928 | Patrice Lopez | 03 August 2024, 12:01:26 UTC | use parent model config as fallback, add resources | 03 August 2024, 12:01:26 UTC |
4112189 | Patrice Lopez | 03 August 2024, 10:32:19 UTC | Merge branch 'release-0.8.1' into flavor | 03 August 2024, 10:32:19 UTC |
399a3c1 | Patrice Lopez | 03 August 2024, 10:32:10 UTC | add flavor training tasks | 03 August 2024, 10:32:10 UTC |
6e8da87 | Patrice Lopez | 03 August 2024, 09:57:48 UTC | update constructors | 03 August 2024, 09:57:48 UTC |
56d351c | Luca Foppiano | 03 July 2024, 08:05:47 UTC | downgrade to jdk 11 | 03 July 2024, 08:05:47 UTC |
8210d69 | Luca Foppiano | 03 July 2024, 07:55:32 UTC | remove jdk 17 only constructs | 03 July 2024, 07:55:32 UTC |
11f8e6a | Patrice Lopez | 20 June 2024, 18:18:28 UTC | start adding flavors | 20 June 2024, 18:18:28 UTC |
50860c5 | Luca Foppiano | 17 June 2024, 21:49:04 UTC | Merge branch 'master' into release-0.8.1 | 17 June 2024, 21:49:04 UTC |
c408076 | Luca Foppiano | 17 June 2024, 08:28:47 UTC | fix delft command | 17 June 2024, 08:28:52 UTC |
f1d703c | Patrice Lopez | 17 June 2024, 08:18:40 UTC | use crossref by default for consolidation | 17 June 2024, 08:18:40 UTC |
d714650 | Patrice Lopez | 17 June 2024, 08:17:51 UTC | use crossref by default for consolidation | 17 June 2024, 08:17:51 UTC |
4675511 | Patrice Lopez | 16 June 2024, 14:25:08 UTC | update doc, cleaning, support python env>3.9 | 16 June 2024, 14:25:08 UTC |
6afe157 | Luca Foppiano | 12 June 2024, 03:02:12 UTC | Add specific DL configuration for the full docker image (#1117) * Provide a DL-enabled configuration for the full grobid image * add missing copyright and licence models in the configuration | 12 June 2024, 03:02:12 UTC |
516926d | Luca Foppiano | 12 June 2024, 02:55:11 UTC | Merge pull request #1131 from kermitt2/bugfix/sentence-segm-spaces-end Fix issue #1130 | 12 June 2024, 02:55:11 UTC |
c44a755 | Luca Foppiano | 12 June 2024, 02:42:26 UTC | Merge pull request #1125 from kermitt2/bugfix/fix-links-beginning-sent Adjust circuit-breaker for #1113 | 12 June 2024, 02:42:26 UTC |
d4a973a | Luca Foppiano | 12 June 2024, 02:41:53 UTC | Merge pull request #1124 from kermitt2/bugfix/notes-same-label fix notes identification | 12 June 2024, 02:41:53 UTC |
a472cb1 | Luca Foppiano | 12 June 2024, 02:20:00 UTC | avoid NPE on empty notes | 12 June 2024, 02:20:00 UTC |
1657b9c | Luca Foppiano | 11 June 2024, 09:48:25 UTC | apply the conservative approach also when extracting coordinates | 11 June 2024, 09:48:25 UTC |
d1763e0 | Luca Foppiano | 11 June 2024, 05:04:45 UTC | avoid sublist on a value greater than the text string itself, fix possible lost of references in certain cases | 11 June 2024, 06:14:28 UTC |
c5c924e | Luca Foppiano | 10 June 2024, 05:41:56 UTC | Adjust circuit-breaker that would miss to identify urls at the beginning of the sentence/paragraph | 10 June 2024, 05:41:56 UTC |
2f9e211 | Luca Foppiano | 10 June 2024, 03:54:08 UTC | update the search space once a note is found in the text, use the identifier to fetch the specific notes from the map | 10 June 2024, 03:54:08 UTC |
b6a2a20 | Luca Foppiano | 09 June 2024, 22:56:27 UTC | update documentation | 09 June 2024, 22:56:27 UTC |
0a95872 | Luca Foppiano | 09 June 2024, 22:43:43 UTC | update readme and changelog | 09 June 2024, 22:56:19 UTC |
694f0ed | Luca Foppiano | 09 June 2024, 21:25:47 UTC | Merge pull request #1106 from kermitt2/bugfix/sent-seg-ack-fund Add missing sentence segmentation in funding and acknowledgement | 09 June 2024, 21:25:47 UTC |
bbca7dd | Luca Foppiano | 09 June 2024, 21:08:49 UTC | Merge branch 'master' into bugfix/sent-seg-ack-fund | 09 June 2024, 21:08:49 UTC |
cb7118d | Luca Foppiano | 09 June 2024, 20:55:30 UTC | Merge pull request #1099 from kermitt2/feature/identify-urls Identify URLs and output them in TEI | 09 June 2024, 20:55:30 UTC |
76fd16f | Luca Foppiano | 09 June 2024, 20:54:13 UTC | Merge pull request #1097 from kermitt2/feature/preserve-urls Avoid splitting URLs between sentences | 09 June 2024, 20:54:13 UTC |
4d4c1e3 | Luca Foppiano | 27 May 2024, 01:22:59 UTC | Fix corner case | 27 May 2024, 01:22:59 UTC |
10f8465 | Luca Foppiano | 20 May 2024, 07:02:40 UTC | cosmetics 2 | 20 May 2024, 07:02:40 UTC |
779c575 | Luca Foppiano | 20 May 2024, 06:56:41 UTC | cosmetics | 20 May 2024, 06:56:41 UTC |
a710b3e | Luca Foppiano | 20 May 2024, 06:36:29 UTC | add note about the need to always supply the Accept header for processHeaderDocument | 20 May 2024, 06:36:29 UTC |
1a8c826 | Luca Foppiano | 20 May 2024, 06:35:44 UTC | Add the generateIds parameter in the documentation | 20 May 2024, 06:35:44 UTC |
6370de2 | Luca Foppiano | 10 May 2024, 05:33:44 UTC | Merge branch 'feature/preserve-urls' into feature/identify-urls | 10 May 2024, 05:33:44 UTC |
d58633d | Luca Foppiano | 10 May 2024, 04:46:54 UTC | When the annotations are missing and we capture a single closed parenthesis as last character of the url, we should back off | 10 May 2024, 04:46:54 UTC |
878d50c | Luca Foppiano | 10 May 2024, 03:54:35 UTC | More conservative approach | 10 May 2024, 03:54:35 UTC |
bca302d | Luca Foppiano | 10 May 2024, 03:01:46 UTC | Sometimes there is a breakline before the URL | 10 May 2024, 03:01:46 UTC |
cf6fb98 | Luca Foppiano | 09 May 2024, 23:05:18 UTC | Merge branch 'master' into bugfix/sent-seg-ack-fund | 09 May 2024, 23:05:18 UTC |
283a262 | Luca Foppiano | 09 May 2024, 23:05:04 UTC | Merge branch 'master' into feature/identify-urls | 09 May 2024, 23:05:04 UTC |
5bcb8b1 | Luca Foppiano | 09 May 2024, 22:35:55 UTC | Merge branch 'master' into feature/preserve-urls | 09 May 2024, 22:35:55 UTC |
4779385 | Luca Foppiano | 09 May 2024, 22:30:09 UTC | Add space if there is an actual space at the end of layout tokens | 09 May 2024, 22:30:09 UTC |
83f2c81 | lopez | 09 May 2024, 21:40:44 UTC | quick fix for #1113 | 09 May 2024, 21:40:44 UTC |
617aa16 | Luca Foppiano | 09 May 2024, 08:11:07 UTC | Apply url preservation also in tables description and notes | 09 May 2024, 08:11:22 UTC |
f983f25 | Luca Foppiano | 09 May 2024, 03:17:59 UTC | Add additional test and fix to the method so that the offsets are correctly matching the real text (dehypenised) | 09 May 2024, 03:17:59 UTC |
322bf23 | Luca Foppiano | 08 May 2024, 01:24:55 UTC | Merge branch 'master' into feature/preserve-urls | 08 May 2024, 01:24:55 UTC |
7d9044e | Luca Foppiano | 08 May 2024, 00:54:47 UTC | Merge branch 'master' into bugfix/sent-seg-ack-fund | 08 May 2024, 00:54:47 UTC |
c70d6d3 | Luca Foppiano | 08 May 2024, 00:39:59 UTC | Fix merging of coordinates to avoid merge when on different pages, add object for annotations with xml nodes | 08 May 2024, 00:39:59 UTC |
d123efb | Luca Foppiano | 07 May 2024, 22:25:17 UTC | fix trigger for github action | 07 May 2024, 22:25:17 UTC |
d2a14f5 | Luca Foppiano | 07 May 2024, 22:19:31 UTC | split manual workflows | 07 May 2024, 22:19:31 UTC |
9ad69ee | Luca Foppiano | 07 May 2024, 22:03:50 UTC | Add additional github actions (#1094) | 07 May 2024, 22:03:50 UTC |
9199170 | Luca Foppiano | 05 May 2024, 22:56:02 UTC | fix coordinates merge | 05 May 2024, 22:56:02 UTC |
6336512 | Luca Foppiano | 05 May 2024, 11:57:05 UTC | merge sentences whose boundaries are clashing with the annotations from the funding-acknowledgment | 05 May 2024, 11:57:05 UTC |
664824d | Luca Foppiano | 05 May 2024, 05:15:12 UTC | Add Kotlin language (#1096) * add kotlin and kotlin-test libraries, update github action's plugins | 05 May 2024, 05:15:12 UTC |
fb17eec | Luca Foppiano | 05 May 2024, 05:08:41 UTC | fix lost of the last entity that was sharing boundary with the sentence | 05 May 2024, 05:08:41 UTC |
ec52f13 | Luca Foppiano | 04 May 2024, 03:50:26 UTC | get fixes on matchTokenAndString from PR #1099 | 04 May 2024, 05:07:07 UTC |
57f87c4 | Luca Foppiano | 04 May 2024, 03:50:26 UTC | get fixes on matchTokenAndString from PR #1099 | 04 May 2024, 05:04:28 UTC |
21a0cdd | Luca Foppiano | 04 May 2024, 04:25:58 UTC | add --open of java.base/java.io (warn from huggingface spaces) | 04 May 2024, 04:26:27 UTC |
e154167 | Luca Foppiano | 04 May 2024, 04:25:08 UTC | cleanup | 04 May 2024, 04:25:08 UTC |
48779a2 | Luca Foppiano | 04 May 2024, 03:50:26 UTC | Fix another corner case | 04 May 2024, 04:23:41 UTC |
39892ff | Luca Foppiano | 04 May 2024, 02:18:34 UTC | Fix wrong Xpath expression | 04 May 2024, 02:18:34 UTC |
83c7a10 | Luca Foppiano | 04 May 2024, 01:04:57 UTC | Fix bug in the transformation of the intervals from token-based to character-based when the same tokens occur subsequently and the annotation is composed by a single token | 04 May 2024, 01:04:57 UTC |
cedee64 | Luca Foppiano | 03 May 2024, 23:10:10 UTC | Fix bug in the transformation of the intervals from token-based to character-based when the same tokens occur subsequently | 03 May 2024, 23:10:10 UTC |
097ca93 | Luca Foppiano | 03 May 2024, 23:09:38 UTC | update xmlunit library | 03 May 2024, 23:09:38 UTC |
b2873bd | Luca Foppiano | 01 May 2024, 10:06:30 UTC | enable sentence segmentation in the processing of a text chunk | 01 May 2024, 10:06:30 UTC |
753a73e | Luca Foppiano | 01 May 2024, 09:42:03 UTC | report on test failure/success | 01 May 2024, 09:42:03 UTC |
9dc767f | Luca Foppiano | 01 May 2024, 09:17:44 UTC | report on test failure/success | 01 May 2024, 09:17:44 UTC |
364176d | Luca Foppiano | 01 May 2024, 09:12:27 UTC | Fix incorrect offsets when processing paragraphs and update tests | 01 May 2024, 09:12:27 UTC |
83416a9 | Luca Foppiano | 01 May 2024, 08:44:27 UTC | fix test path | 01 May 2024, 08:44:27 UTC |
7628f40 | Luca Foppiano | 01 May 2024, 08:34:14 UTC | publish tests results on github actions | 01 May 2024, 08:34:14 UTC |
4b3a763 | Luca Foppiano | 01 May 2024, 08:18:08 UTC | fix the funding and acknowledgement parser to preserve the sentence segmentation and the reference markers | 01 May 2024, 08:18:08 UTC |
9f2edb6 | Luca Foppiano | 01 May 2024, 08:07:54 UTC | add class to represent the parse of a funding and acknowledgement statement | 01 May 2024, 08:07:54 UTC |
047af5b | Luca Foppiano | 01 May 2024, 08:07:30 UTC | add transformation from token to character position | 01 May 2024, 08:07:30 UTC |
f74466e | Luca Foppiano | 01 May 2024, 08:07:02 UTC | cosmetics | 01 May 2024, 08:07:02 UTC |
f5295cd | Luca Foppiano | 01 May 2024, 06:04:13 UTC | fix bug in token to char position function and add more tests | 01 May 2024, 06:04:13 UTC |
ea1245a | Luca Foppiano | 28 April 2024, 02:58:05 UTC | add more tests and add MEXT abbreviation | 28 April 2024, 02:58:05 UTC |
e0fd3b4 | Luca Foppiano | 28 April 2024, 02:35:07 UTC | fix missing of last person in the acknowledgment / funding | 28 April 2024, 02:35:07 UTC |
3900dc2 | Luca Foppiano | 28 April 2024, 01:54:46 UTC | update test to follow the convention | 28 April 2024, 01:54:46 UTC |
d4a8261 | Luca Foppiano | 28 April 2024, 01:52:39 UTC | add tests on the current code | 28 April 2024, 01:52:39 UTC |
9db8667 | Luca Foppiano | 28 April 2024, 01:37:18 UTC | cleanup and fix test | 28 April 2024, 01:37:18 UTC |
2dc07a8 | Luca Foppiano | 28 April 2024, 01:36:06 UTC | cleanup | 28 April 2024, 01:36:06 UTC |
c05ec9c | Luca Foppiano | 28 April 2024, 01:23:12 UTC | Merge branch 'master' into feature/kotlin | 28 April 2024, 01:23:12 UTC |
7fd4194 | Luca Foppiano | 28 April 2024, 01:06:54 UTC | fix actions | 28 April 2024, 01:06:54 UTC |
8443e6d | Luca Foppiano | 28 April 2024, 01:02:14 UTC | update action's component version | 28 April 2024, 01:02:14 UTC |
1ebcf3a | Luca Foppiano | 28 April 2024, 01:01:22 UTC | add kotlin test | 28 April 2024, 01:01:22 UTC |
84ef802 | Luca Foppiano | 28 April 2024, 01:00:33 UTC | fix build | 28 April 2024, 01:00:33 UTC |
d98129f | Luca Foppiano | 24 April 2024, 06:59:35 UTC | Update general-report.md | 24 April 2024, 06:59:35 UTC |
6ff15ee | Luca Foppiano | 17 April 2024, 01:30:20 UTC | keep convention on the token/character calculation | 17 April 2024, 01:30:20 UTC |
0b5e232 | Luca Foppiano | 16 April 2024, 12:05:25 UTC | cosmetics | 16 April 2024, 12:05:25 UTC |
82a6287 | Luca Foppiano | 16 April 2024, 05:16:33 UTC | cover cases where the PDF annotation is giving us wrong token index (the regex catch few characters more, which are within the pdf annotation) | 16 April 2024, 05:16:33 UTC |
afc9a76 | Luca Foppiano | 16 April 2024, 03:51:19 UTC | add tests on corner cases | 16 April 2024, 03:51:19 UTC |
f904157 | Luca Foppiano | 16 April 2024, 03:04:11 UTC | correct typos and add some comments | 16 April 2024, 03:04:11 UTC |
1ff1a73 | Luca Foppiano | 16 April 2024, 01:37:22 UTC | Avoid including spaces and breaklines at the edges of the URLs | 16 April 2024, 01:37:22 UTC |
d9c545a | Luca Foppiano | 15 April 2024, 23:02:53 UTC | Fix spurious characters at the end of the matching URLs when the regex catches too much | 15 April 2024, 23:02:53 UTC |
abd3e6c | Luca Foppiano | 15 April 2024, 12:34:48 UTC | improve process | 15 April 2024, 12:34:48 UTC |
134c048 | Luca Foppiano | 15 April 2024, 07:16:57 UTC | output URLs to TEI | 15 April 2024, 07:17:42 UTC |
0f485f7 | Luca Foppiano | 15 April 2024, 05:27:49 UTC | simplify | 15 April 2024, 05:27:49 UTC |