https://github.com/grangier/python-goose

Revision Message Commit Date
066a3c0 Merge branch 'release/1.0.23' 30 December 2014, 07:33:58 UTC
4eda345 bump version 30 December 2014, 07:33:45 UTC
57b1534 #185 - movies info 30 December 2014, 07:31:00 UTC
ce6d8a1 Merge branch 'feature/extract-183' into develop 30 December 2014, 07:06:04 UTC
22ded4b #183 - use article tag for a top node 30 December 2014, 07:04:47 UTC
fe5f5e9 #183 - pep8 30 December 2014, 06:47:52 UTC
4632df7 #182 - rename soup parser 30 December 2014, 06:36:15 UTC
0e6201d #81 - use correct language for stopwords file 30 December 2014, 06:30:47 UTC
413037f Merge branch 'KillaW0lf04-develop' into develop 30 December 2014, 06:19:41 UTC
8d18a8e Merge branch 'develop' of https://github.com/KillaW0lf04/python-goose into KillaW0lf04-develop Conflicts: goose/crawler.py 30 December 2014, 06:16:03 UTC
b70075a Merge branch 'feature/extract-115' into develop 30 December 2014, 06:09:52 UTC
c7ec678 #115 - use known content tags to be article main body 30 December 2014, 06:09:02 UTC
b5ddaf1 #115 - add issue 115 test files 30 December 2014, 06:02:42 UTC
e404f1b #115 - remove businessinsider tests case due to no valid html 30 December 2014, 06:02:06 UTC
4fd94de Merge branch 'feature/title-extraction-137' into develop 30 December 2014, 03:41:13 UTC
148ce9b #137 - more explicit error message 30 December 2014, 03:40:36 UTC
bd96c94 #137 - refactor title extraction based on opengraph, meta headling and title element 30 December 2014, 03:39:57 UTC
66b63fc #137 - fetch opengraph before title 30 December 2014, 03:39:16 UTC
0e370dc #137 - corrected title 30 December 2014, 03:38:05 UTC
d31112b #137 - corrected title 30 December 2014, 03:37:43 UTC
3ff269e #137 - use og:title in test case 30 December 2014, 03:37:16 UTC
655aca6 #137 - test separator 30 December 2014, 03:36:44 UTC
b04f1e9 #137 - opengraph title test case 30 December 2014, 02:09:47 UTC
502053d Merge branch 'feature/empty-content-129' into develop 30 December 2014, 01:42:22 UTC
a36b5a8 #129 - force articleBody to be the document root if found 30 December 2014, 01:41:44 UTC
e452c23 #129 - add issue test case 30 December 2014, 01:41:16 UTC
6f00464 Merge branch 'feature/article-infos-177' into develop 30 December 2014, 00:51:10 UTC
37e24b2 #177 - add top image to returned dict 30 December 2014, 00:49:55 UTC
206f6e2 #177 - info method return article data as dict 30 December 2014, 00:48:11 UTC
6338f68 #177 - title is empty string by default 30 December 2014, 00:47:41 UTC
96caa3c #177 - tags are a list 30 December 2014, 00:47:11 UTC
989ab24 #175 - links extraction tests 30 December 2014, 00:11:52 UTC
321fb86 #173 - authors extraction test case 30 December 2014, 00:05:04 UTC
848acf8 #172 - tweet extraction tests 29 December 2014, 23:53:02 UTC
21a67ca Merge branch 'pistolero-develop' into develop 29 December 2014, 23:33:17 UTC
90b041d #171 - do not increment version yet 29 December 2014, 23:32:47 UTC
90b3cac Replaced bare except with except Exception 29 December 2014, 12:32:02 UTC
27709a4 Merge branch 'feature/extract-twitter-169' into develop 29 December 2014, 04:03:49 UTC
af49328 #169 - extract tweets 29 December 2014, 04:02:28 UTC
675c077 #169 - extract tweets 29 December 2014, 04:01:32 UTC
cda2ef6 #142 - extract authors 29 December 2014, 02:48:55 UTC
8a4ecf2 Merge branch 'feature/links-139' into develop 29 December 2014, 02:27:31 UTC
0a3303e #139 - article links extract method 29 December 2014, 02:26:33 UTC
4adf4bc #139 - extract article links 29 December 2014, 02:26:06 UTC
124371e #139 - article links property 29 December 2014, 02:25:40 UTC
517eb67 Merge branch 'feature/opengraph-165' into develop 29 December 2014, 01:55:14 UTC
6bbe2db #165 - rename article body extraction test 29 December 2014, 01:54:33 UTC
c2eb34e #165 - opengraph extraction test 29 December 2014, 01:53:09 UTC
eb1274b #165 - rename dict 29 December 2014, 01:52:37 UTC
101e69c #165 - opengraph extractor 29 December 2014, 01:43:31 UTC
a27cfff #165 - extract opengraph data 29 December 2014, 01:42:18 UTC
f8fc13d #165 - add opengraph property to article 29 December 2014, 01:41:56 UTC
31cf754 Merge branch 'feature/publish-date-163' into develop 29 December 2014, 01:12:20 UTC
5910f39 #163 - do not use only meta for publication date 29 December 2014, 01:11:36 UTC
2498065 #163 - add schema published date parsing test 29 December 2014, 01:11:01 UTC
f6647fc Merge pull request #5 from cronycle/feature/4-publish-date feature(extractors/publish_date): Extract publish date from meta tags. Conflicts: tests/extractors.py 29 December 2014, 00:54:52 UTC
4a430c8 Merge branch 'feature/parser-fallback-161' into develop 29 December 2014, 00:25:30 UTC
eaaa60a #161 - parser fallback 29 December 2014, 00:25:05 UTC
048bcdb #161 - add parser list variable 29 December 2014, 00:24:42 UTC
ced075f #160 - fail silently for unknown images 29 December 2014, 00:02:21 UTC
f7ccb2e Merge branch 'feature/open-graph-157' into develop 28 December 2014, 23:31:24 UTC
f28a6e7 #157 - refactor 28 December 2014, 23:26:54 UTC
9379cd8 #157 - corrected content with microdata 28 December 2014, 23:17:30 UTC
b8991df #157 - remove print 28 December 2014, 22:35:40 UTC
6215ffa #157 - add test case 28 December 2014, 22:34:28 UTC
71f1dec #157 - hanbdle schema.org microdata 28 December 2014, 22:32:42 UTC
5ac4a32 #157 - remove childnode one by one to keep parent node 28 December 2014, 22:31:26 UTC
c31c9c4 #157 - add test case files 28 December 2014, 22:30:42 UTC
f5dc260 Merge branch 'release/1.0.22' 14 September 2014, 16:30:17 UTC
55c88c1 bumped version 14 September 2014, 16:29:57 UTC
f4495b0 Merge branch 'feature/timeout-138' into develop 14 September 2014, 16:29:30 UTC
d3e3693 Merge branch 'feature/typo-145' into develop 14 September 2014, 16:28:31 UTC
1366806 Merge branch 'polyrabbit-develop' into feature/typo-145 14 September 2014, 16:27:19 UTC
ef1ed89 Fixed a typo 14 September 2014, 16:27:02 UTC
fba20fd Merge branch 'release/1.0.21' 14 September 2014, 16:23:30 UTC
b7ebaf0 bump version 14 September 2014, 16:23:12 UTC
f95c10f Merge branch 'feature/zh-146' into develop 14 September 2014, 16:20:24 UTC
bc928d1 Merge branch 'poying-patch-1' into feature/zh-146 14 September 2014, 16:18:36 UTC
e3e6ec0 Merge branch 'patch-1' of https://github.com/poying/python-goose into poying-patch-1 14 September 2014, 16:18:03 UTC
f2f4fe1 #138 - add default timeout 14 September 2014, 16:17:02 UTC
96db620 add chinese traditional stopwords 14 September 2014, 05:32:47 UTC
b933004 Fix minor spelling error "handling" 03 August 2014, 18:09:08 UTC
2916126 Do not fail when stopword list is not available for a certain language 01 August 2014, 20:15:30 UTC
b6a54f9 Use PEP8 convention for boolean statements 01 August 2014, 19:44:23 UTC
a7898ac Fix minor spelling mistake 01 August 2014, 19:44:08 UTC
a275c45 Merge branch 'master' of github.com:grangier/python-goose 14 July 2014, 13:34:06 UTC
7ba243a Merge branch 'master' of github.com:grangier/python-goose into develop 14 July 2014, 13:33:30 UTC
a881193 Merge branch 'release/1.0.20' into develop 14 July 2014, 13:31:36 UTC
93e8239 Merge branch 'release/1.0.20' 14 July 2014, 13:31:32 UTC
c19accb bumped version 14 July 2014, 13:31:18 UTC
bad1b21 Updated bad classes regex * Sites with *links* in a class previously failed. This should likely be an exact match style (^links$) * Test included. 02 July 2014, 22:10:40 UTC
a4013c1 Merge branch 'release/v1.0.19' into develop 29 June 2014, 09:33:39 UTC
8e2f875 Merge branch 'release/v1.0.19' 29 June 2014, 09:33:33 UTC
0865e77 bumped version 29 June 2014, 09:33:21 UTC
f47b6de Merge branch 'feature/fdopen-110' into develop 29 June 2014, 09:31:56 UTC
4d67d06 Merge branch 'develop' of https://github.com/selvamarcadu/python-goose into selvamarcadu-develop 29 June 2014, 09:30:55 UTC
5177b0c Merge branch 'feature/isnot-114' into develop 29 June 2014, 09:30:19 UTC
1cfc498 Merge branch 'develop' of https://github.com/ankushshah89/python-goose into ankushshah89-develop 29 June 2014, 09:27:42 UTC
06f9ae3 Minor README tweaks 25 June 2014, 16:14:50 UTC
d75bbfb BugFix: wrong equality comparision 29 May 2014, 15:02:17 UTC