https://github.com/kermitt2/grobid

Revision Message Commit Date
5af9f78 Fix synchronization issue due to a starting block without LayoutTokens This issue can happen when the first block of a segmentation zone contains only bitmap and no LayoutToken Former-commit-id: 1c3dca23405b78f9d8410987779fa4a0127743ac 24 December 2015, 22:03:22 UTC
e3c7e28 Merge new master with reference_marker_matcher branch Former-commit-id: 1163c0370e1a692c6e0b7422e78e9c7c070c16e8 24 December 2015, 19:12:30 UTC
b878d4b Correction for install instructions Former-commit-id: 366bd0616756d39703b0174c5710a5427c208108 23 December 2015, 16:09:31 UTC
36775b3 Merge branch 'master' of https://github.com/grobid/grobid Conflicts: grobid-core/src/main/java/org/grobid/core/document/TEIFormater.java grobid-core/src/main/java/org/grobid/core/engines/AffiliationAddressParser.java grobid-core/src/main/java/org/grobid/core/engines/CitationParser.java grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java grobid-core/src/main/java/org/grobid/core/engines/HeaderParser.java grobid-core/src/main/java/org/grobid/core/engines/ProcessEngine.java grobid-core/src/main/java/org/grobid/core/engines/config/GrobidAnalysisConfig.java grobid-core/src/main/java/org/grobid/core/sax/PDF2XMLSaxParser.java Former-commit-id: 141bc3db76d054767ded33e8c15b7a5ea62c7979 23 December 2015, 15:10:40 UTC
d160616 Improvement of the parsing of full text - new models for figures and tables - addition of training data - bug fixing - better matching of citation - update of documentation Former-commit-id: 102eabf116f7ec76f4ba8954bca5274faca1b881 23 December 2015, 14:43:23 UTC
f9feec0 commenting old code to make sure test coverage does not decrease in CI Former-commit-id: 04b2ff576a0b350ff9de969228515b69d703025b 22 December 2015, 17:51:48 UTC
aba5b55 commenting out not passing PDF Former-commit-id: bb253fcd2fcf828901bab40de68ad055c9975ebf 22 December 2015, 17:36:06 UTC
92a62b9 new lucene-based reference marker matcher synchronization of Block tokens and Document.tokenizations LayoutToken dehyphenization More assertions for fulltext parser Former-commit-id: ea2bf5f0125ec85b440bf7ac3d9b742a8beb96dd 22 December 2015, 17:23:58 UTC
ffc34c6 preserving information about new line in the token fixing a bug where citation marker was empty and the loop were causing OOM Former-commit-id: cc4024130c2b69272d1bbf04ce8a30ca066efa4c 17 December 2015, 14:25:31 UTC
4832fd2 Merge pull request #84 from kermitt2/tokenLayouts20151026 Token layouts20151026 Former-commit-id: b2b0491f873fe1d97b905bfbcd8bc30a9e2b691c 20 November 2015, 05:56:52 UTC
cf69202 [maven-release-plugin] prepare for next development iteration Former-commit-id: ff350054de0ef5b3918a54eb2d6aa654511d80c2 20 November 2015, 02:02:00 UTC
7d24382 [maven-release-plugin] prepare release grobid-parent-0.3.9 Former-commit-id: 756bcdbbf3bc08277127368fc8d5340cbffc1668 20 November 2015, 02:01:12 UTC
2253c2e fixing NPE when LayoutToken is null Former-commit-id: a5636151120eddd0569a7bed6e28bacfbe6efaab 29 October 2015, 20:25:32 UTC
f06cf21 bounding box polishing Former-commit-id: 2098204ebbdc67750733bbe4bc880b33a8c1b4fb 28 October 2015, 15:13:51 UTC
d1af768 using LayoutTokens whenever possible Former-commit-id: 3f4201c357b42bc0b83e9f5280cbffa3fed8c61c 26 October 2015, 17:34:58 UTC
c2b01eb bounding box fix Former-commit-id: 309869f791c70a53ac7ab6fb281ce881f8684f38 15 October 2015, 16:33:31 UTC
e88b638 Merge pull request #75 from dtkaczyk/master Wapiti for Linux 32-bit Former-commit-id: 1c6f034a356400f2eeb09d6334cc95bec1c9f145 07 September 2015, 23:35:14 UTC
2ed7fea Wapiti for Linux 32-bit Former-commit-id: 91d57a78350da24f2eebfb3d5a51051d770450f2 07 September 2015, 12:56:09 UTC
0c8faca Fix a small bug in the reference bracket markup Former-commit-id: e6a54e960ea27c35543450d4c64ce9e5d71195ec 26 August 2015, 16:08:09 UTC
5947655 Merge pull request #72 from kermitt2/grobid_config_20150825 refactoring to use GrobidAnalysisConfig Former-commit-id: 961ecfb0e728455000d147638e7903ecbfe5d502 26 August 2015, 15:21:17 UTC
444d818 refactoring to use GrobidAnalysisConfig Former-commit-id: d1d374502e2dba3825e119563cf9e1bfd2926dcd 25 August 2015, 14:04:12 UTC
ae89d9f Try to fix ReadTheDocs Former-commit-id: 04a50d2bda3e48d5f7938e9e7e7db04dd561f5ca 21 August 2015, 00:28:31 UTC
06b14d9 Add recursive batch process of directories of PDFs Address issue #3 Former-commit-id: 8d1c038a342e5406d161a874993da5ffbd344c03 21 August 2015, 00:00:12 UTC
01d1430 [maven-release-plugin] prepare for next development iteration Former-commit-id: 7abbe69791672746900835c7c0d216c315233fde 20 August 2015, 13:28:14 UTC
4904f42 [maven-release-plugin] prepare release grobid-parent-0.3.8 Former-commit-id: 165f9601a9c48c2036b1685d6570301e0163d5d0 20 August 2015, 13:28:08 UTC
2ce77e1 preparing for release Former-commit-id: ab01f5e8c2ca9567d1ac813f1782651c909c6f6b 20 August 2015, 13:23:57 UTC
59e3d5d [maven-release-plugin] rollback the release of grobid-parent-0.3.7 Former-commit-id: dbb154226e1bc6fcaf72a6a2df528bc6eb0bf94c 20 August 2015, 13:23:19 UTC
f1b35a1 [maven-release-plugin] prepare release grobid-parent-0.3.7 Former-commit-id: c9fec76c9feb548b031f974430e97a8af142793c 20 August 2015, 13:22:16 UTC
e9c7169 [maven-release-plugin] rollback the release of grobid-parent-0.3.7 Former-commit-id: 2fc86c3050ed062329457790c2e53b18a660fde0 20 August 2015, 13:18:05 UTC
bb72f08 [maven-release-plugin] prepare release grobid-parent-0.3.7 Former-commit-id: 57189881f7eb2f21eec652da8b7269f42909eacc 20 August 2015, 13:16:55 UTC
8b15dcd trying to fix release plugin Former-commit-id: 6929ba51952955016630e6c98a6c11f7dcb10f11 20 August 2015, 12:24:20 UTC
d1fa258 preparing for release Former-commit-id: 9fb0eb23ad881c096b478e6901badbe7f51e9071 20 August 2015, 12:09:50 UTC
a67cee9 fixing maven release plugin version regenerated ant files Former-commit-id: 90d60b374373b2bf4bd08013432e104660e3055d 20 August 2015, 11:52:42 UTC
f5209f5 Fix for issue #65 Former-commit-id: 4582d9c5a02705e239cbeeed7d201e0fa22634bc 20 August 2015, 10:57:59 UTC
ded4887 pom/xml Former-commit-id: 900cba03f03a3fa2842be0c9530e5026d3773841 19 August 2015, 18:30:55 UTC
3829166 Complete commit bae6cea (oops) Former-commit-id: 775777e997b9e5f78f93a3748c7e091271cdb27b 19 August 2015, 14:23:46 UTC
e9b144d Two parallel commits I committed offline in the train! Former-commit-id: 6680788de703a7f2fabbe035ddb5d72cac30b576 19 August 2015, 13:53:55 UTC
9f98811 Careful introduction of a class LayoutTokenization This tokenization in particular propagates the layout information when producing the full text TEI and allows to indicate the coordinates of some structural elements in the original PDF (e.g. the reference markers) Former-commit-id: c919e1de2ee330bfac3499ad89af6a68d532ece8 19 August 2015, 13:37:35 UTC
f89b18c Test for issue #65 Former-commit-id: 8b066bbfb66a193d376ea8d2b64836a9f7f21f45 19 August 2015, 13:37:35 UTC
d320abd Default sha1 for console admin password It seems the default one was not anymore admin! Former-commit-id: d28ed8279747fdb1470bc8f01c3c8e11cb40e6e7 19 August 2015, 13:37:35 UTC
97529db Add back isalive REST call Former-commit-id: e1e205cec26628dfd07c6a9253bd821e807de746 19 August 2015, 13:37:35 UTC
f4f058d Correct extension for temporary pdf files Former-commit-id: 0d6c9c774b16b98ab6161dc1e71df60df5fbaca2 19 August 2015, 13:37:34 UTC
ef954ab Case where tabulations are used as usual space separator in the PDF Former-commit-id: f1bda574f0e1c97382e363ea07ce13e8b6a8c91a 19 August 2015, 13:37:34 UTC
88979ae Merge branch 'master' of https://github.com/kermitt2/grobid Former-commit-id: bd565d2eea4e23009e7fa1310caad0d799f899c8 19 August 2015, 09:47:13 UTC
7cc4ae2 adjusting epsilon for BoundingBoxCalculator + counting pages from 1 Former-commit-id: 8d5ff512fa5a6ddcb37c894b63f4dff24e9fed1a 19 August 2015, 09:45:50 UTC
b9005f8 Add missing isalive REST path declaration #67 Former-commit-id: 124fad7dfa9323501c05e771bcd93cf9183c33eb 19 August 2015, 08:21:19 UTC
21b28bf a small bug Former-commit-id: f56d27b5047455960710af851d24d724e1d5109b 18 August 2015, 18:19:58 UTC
1b5d83d Merge branch 'master' of https://github.com/kermitt2/grobid Former-commit-id: 9d93fee2775f1d81e6567865347ba33f2d68034c 18 August 2015, 16:05:55 UTC
efc5573 bounding box detection for coordinates Former-commit-id: d6729619019c4296a9c2017632ae31c00683fb67 18 August 2015, 16:05:48 UTC
c48c7eb Show failure information from RESTful interface. Former-commit-id: bae6ceaee6816878d5bc58c6725cf877e25d44fa 18 August 2015, 16:03:21 UTC
a302935 cosmetic Former-commit-id: 80b7e97dce761bd0977b6fb4135f0c89e6d45942 18 August 2015, 14:59:59 UTC
87a0a69 Merge pull request #66 from kermitt2/element_pdf_coordinates_20150603 Element pdf coordinates 20150603 Former-commit-id: 80fea0129a41a34cff6bb8be1de2fc889a1e087d 18 August 2015, 11:27:43 UTC
5f20101 merging master branch Former-commit-id: 6874c1e2b7a03ca928a2c52e712bc58903c82349 18 August 2015, 10:45:07 UTC
eab6c1c Merge pull request #64 from chrismattmann/allow-logging-jetty Allow specification of logging in Jetty Grobid Service Former-commit-id: 2c5ca8b86363f849629d71e880865a8790da8bbc 17 August 2015, 22:35:56 UTC
a3e2aa1 Allow specification of log4j logging in Jetty. Former-commit-id: 6772ebb6e580e81353debcedde3a47ac4c628aa7 17 August 2015, 16:14:17 UTC
e1677ca Merge pull request #63 from fmux/master improve handling of HTML entities Former-commit-id: b471ef82e89ac1db8ac870f365cb99295872a627 10 August 2015, 12:08:21 UTC
f578b06 fix error preventing double encoding of " entities (also tidy up some whitespace) Former-commit-id: 6f9e0943a0a0db83fcd3ee237258a3b167603599 10 August 2015, 11:40:17 UTC
f6ef235 make sure xml:id attribute is encoded properly Former-commit-id: a3d12e3e2c68a161e1b8df3a03f163c0a5a0e6e0 10 August 2015, 11:39:36 UTC
9efac05 Merge pull request #62 from sujen1412/GROBID-59 Fix for Grobid #59 - Publishing to Maven Central Former-commit-id: 999c43ac09e14ae483de6a4848da44ad4152ab7d 07 August 2015, 20:52:40 UTC
928bdbc Removed connection url from maven release plugin Former-commit-id: 7b8dd9e98f5f9a2c6cd8ccac8131ee978a3517c8 05 August 2015, 20:24:21 UTC
b4d941e Added release profile for Maven Central in pom.xml Former-commit-id: ebff4ad905b28f7d2d5db7d07a3e42b117b78c7d 05 August 2015, 18:10:17 UTC
49e0694 Some additional robustness related to pdf2xml Former-commit-id: 3399c8c85624b4ede3fb3f8ca859a3ef8d9ae216 03 August 2015, 00:47:27 UTC
db9d449 Generate additional files with training raw texts Additional files with raw text of citations are created when executing createTrainingSegmentation or createTrainingReferenceSegmentation Former-commit-id: 6a658bff5bbf77b74f1fef4626bbe316a881cdcb 06 July 2015, 17:30:41 UTC
5a52b38 Fix issue #57 and some minor typos Former-commit-id: da8843f2757f1f52e1b0c8d8eba82c130558aa28 06 July 2015, 15:57:39 UTC
b184890 preserving coordinates for reference markers Former-commit-id: 90ce22da7b340b12a6214d1fb4ca47e414b4374f 03 June 2015, 12:27:12 UTC
94aa3bb Person name suffix outputted under <genName> in TEI Former-commit-id: 82e23b2f5d1f4fb8874a070400df94b3d930658d 20 May 2015, 22:40:02 UTC
e8a2215 In the batch mode, correcting incorrect TEI file names when the PDF file has a .PDF extension Former-commit-id: 9370901d02899090143011a8bef8984277a60a90 16 May 2015, 06:17:21 UTC
4052371 Simplify the code for patent processing Former-commit-id: 60e12dbda0b6510e01e2ce4fa0b0474fc568b27a 14 May 2015, 08:20:04 UTC
79dde17 Typo Former-commit-id: 03969f415ea7fe628d5762ca93f49b5f3490b74b 13 May 2015, 01:06:48 UTC
ac9d32b Refer to the GROBID ant example project and some updates. Former-commit-id: 933b85c8b2dd9f7484920c4e80f81becdc33a391 13 May 2015, 01:03:02 UTC
d3f13da Update ant build files Former-commit-id: f5187a1b9693639c5d23c016517cac4c0294545c 12 May 2015, 23:46:27 UTC
2600e4c Improve handling of extracted keywords in header Former-commit-id: baefe86a8bdf8247d6f5c4f05695dc3845ee33fb 12 May 2015, 03:44:00 UTC
c61c15f Make documentation notations more consistent Former-commit-id: 93f8f3bf4f20da0e725c3f82e394939b14685548 12 May 2015, 03:41:51 UTC
9337dec Context window parameter for patent training data Former-commit-id: ed5b8e10a25a0f7a2da59925a4f130ea1cf9feff 12 May 2015, 03:41:14 UTC
a6d834a Integrate the new documentation Former-commit-id: 8449d8279d5907c9fbc8bf34ee65952f2af02d30 09 May 2015, 20:59:16 UTC
f92dc7a Move all the doc to mkdocs Former-commit-id: 012e82dde7ea2293ae2295087044e21d9deca9a9 09 May 2015, 20:43:17 UTC
2a6a98f It seems that ReadTheDocs does not use the latest version of mkdocs It does not help ;) Former-commit-id: bbee116dae2f97610f56fccf54768303afb128e6 09 May 2015, 04:34:42 UTC
01411fb Try to solve the section issue observed with ReadTheDocs mkdocs locally works fine Former-commit-id: 8488c381dea712e7723dfd3a261f0e0381759de5 09 May 2015, 04:29:06 UTC
a22ca18 Try to get the doc section correctly Former-commit-id: d162566eaeabd54ecf66f5d39ed405d252e55541 09 May 2015, 04:20:23 UTC
5a9b0c8 New attempt to have the doc building via readthedocs.org Former-commit-id: 998971043d2a0e9a0bf763b3db5e69a2b0ec5d1d 09 May 2015, 04:09:03 UTC
eb462ef Using mkdocs and ReadTheDocs for the doc... Former-commit-id: fce8472d28a92062c78095bc7ffafca18f07e47e 09 May 2015, 03:50:26 UTC
d309a6b New attempt to get the doc built Former-commit-id: 61d67e14bb8a392b8a4a6f1c8bd65d669794cb8a 09 May 2015, 01:04:58 UTC
92f11ef (Re)try to config the doc Former-commit-id: afedce032a42831762da072b2a39c796e27a3a14 09 May 2015, 01:01:26 UTC
6f8cb70 Test doc config Former-commit-id: 45d4d0615b12b4c5f055c9b6bb188131f6010548 09 May 2015, 00:18:37 UTC
ea744b8 Try more serious documentations with mkDocs Former-commit-id: 0d757130a8e0ac7244745bf3daa2d57eaf73146a 08 May 2015, 23:33:43 UTC
7b5ae60 Additional patent training data Former-commit-id: 74f109cab6c88411cce7dee9d177f8b098b6f9b2 06 May 2015, 03:02:35 UTC
2fb6ce5 Move GrobidTimer Former-commit-id: 2a10557374639bf551f1b145b54843b4192b84d5 03 May 2015, 18:01:04 UTC
acdd951 Yet another attempt to make the coveralls maven plugin working Former-commit-id: d2087558c898509d93056f7d0901c5e61038be93 03 May 2015, 00:50:28 UTC
dcf8f7a Still testing the coveralls maven plugin Former-commit-id: 8501c75d36d0dcbcb5e9116f2b3945d6c2ba5b65 02 May 2015, 21:41:08 UTC
1fdfa58 New try for getting coveralls reports Former-commit-id: 9d2c2a375cdfbcba17fba3942f685c972f055c58 02 May 2015, 20:24:33 UTC
cf091ea Still trying coveralls maven plugin Former-commit-id: 36ec55e11f713f4a92dc1137ec4f420a88429607 02 May 2015, 20:02:32 UTC
ac1134a Testing coveralls maven plugin Former-commit-id: bcbd586ae5ddfcbfbef915af6d5075b7b7a1b413 02 May 2015, 19:18:54 UTC
6a723ad Trying https://coveralls.io service Former-commit-id: 7e4893afb92dfaf0a560b36208ce49b23f8646f4 02 May 2015, 18:54:46 UTC
54fbc97 Goodbye Jenkins and welcome travis-ci Former-commit-id: 0e0358a8db22f98a52b4d600de6323ab3334bf11 02 May 2015, 17:57:04 UTC
14c8898 Right version of the sax parser Former-commit-id: 2d858ee11824a38cc017786e9fcfbd2d1926b153 02 May 2015, 17:48:23 UTC
faf2821 Add training data for patents Former-commit-id: 6fab24848a576d7ed580c14bed6bc9d7febd781e 02 May 2015, 17:18:21 UTC
596cbac Try to get back the build status icon... Former-commit-id: 1f22d212c8c55be428c9242e705e009df1389a1e 30 April 2015, 15:08:23 UTC
a15e762 Fix a problem with language code Former-commit-id: 9f88c36253f66e2d399f2c6f4b3bc0206460eeef 30 April 2015, 13:20:19 UTC
500525c Extraction of the kind codes and output of the original patent number in addition to epodoc format Former-commit-id: aa4cf4fd8f4b0b035bd299765dc8ecba620e8f9e 30 April 2015, 01:31:42 UTC
72f7570 Add tests for CJK patent processing Former-commit-id: b5570a1d859b4f79aeba7516f033676aee02b286 26 April 2015, 21:43:56 UTC