https://github.com/grangier/python-goose

Revision Message Commit Date
05e66b4 Merge branch 'release/v1.0.3' 13 October 2013, 18:57:43 UTC
31e51e8 #50 - add italian stopwords 13 October 2013, 18:42:37 UTC
1bb368e Handle missing src attribute values for known image elements. Calls to UpgradedImageIExtractor.get_image in check_known_elements were not checking the validity of the extracted src before trying to use it to build the image. This caused an exception when build_image tried to parse the src URL. 10 October 2013, 20:43:22 UTC
050b11f Merge pull request #38 from grangier/imageextractor known image css work at element level 20 August 2013, 06:57:49 UTC
67f5ab4 known image css work at element level 19 August 2013, 19:21:50 UTC
e008ac1 add test for known_image_css, known_image_name, opengraph_tag 18 August 2013, 23:41:59 UTC
692a80d move assert_top_image to image test file 18 August 2013, 23:40:43 UTC
2b1ac2a return image even if local_image is None 18 August 2013, 23:39:23 UTC
0c0a3bf pep8 18 August 2013, 22:30:03 UTC
9cd3417 no need for OSX setup info 18 August 2013, 21:08:44 UTC
cf9af14 updated README 18 August 2013, 20:53:49 UTC
22391da Add cookie handeling exemple in known issues 18 August 2013, 18:29:20 UTC
40a1993 add gizmodo tests 18 August 2013, 17:54:39 UTC
6f77d7e add clean_article_tags method, clean classes, ids and name attribute from article tags 18 August 2013, 17:54:09 UTC
68100e4 add delAttribute method to parser 18 August 2013, 17:37:29 UTC
7c989ba better handeling of object videos 14 August 2013, 12:54:43 UTC
2bd252c Update README.rst 13 August 2013, 10:18:02 UTC
79ecfac Merge pull request #33 from grangier/video video extraction of youtube, vimeo, dailymotion and kewego 12 August 2013, 23:56:09 UTC
505f3b1 imports first 12 August 2013, 23:52:12 UTC
7be02d1 bump to version 1.0.2 12 August 2013, 23:49:05 UTC
c1fe968 video exemple 12 August 2013, 23:47:41 UTC
e8bfbca video iframe and embed tests 12 August 2013, 23:42:43 UTC
de01b00 check if video candidate has a src attribute 12 August 2013, 21:36:56 UTC
a8e242c basic video extraction 12 August 2013, 21:28:12 UTC
00dfcaa useless functions 11 August 2013, 19:58:45 UTC
1ddbf2d use loadConfig instead of getArticle in images tests 11 August 2013, 19:33:51 UTC
c88994a use correct version for browser_user_agent configuration 10 August 2013, 17:16:49 UTC
9e2b2be more image detail/utils tests 10 August 2013, 17:11:01 UTC
61041d3 basic ImageUtilsTests test cases 10 August 2013, 17:01:23 UTC
8a9c563 rename ImageTests to ImageExtractionTests 10 August 2013, 16:13:38 UTC
f3161e0 remove useless imports 10 August 2013, 15:04:03 UTC
f611627 remove useless comment 10 August 2013, 15:02:22 UTC
a88448f ignore ._* 10 August 2013, 14:55:42 UTC
190d38b ignore ._.DS_Store files 10 August 2013, 14:54:44 UTC
9ed832b link contributors to github repository 10 August 2013, 12:09:05 UTC
2bafd86 add a TODO notice 10 August 2013, 11:03:16 UTC
3484135 uncomment callback class 10 August 2013, 11:03:04 UTC
0f91505 tags tests are back to extractors 10 August 2013, 10:53:06 UTC
23c0a5f mocked tags tests 10 August 2013, 10:51:03 UTC
5fdbc82 rename tags tests files to reflect testcase name 10 August 2013, 10:11:04 UTC
7e78bef rename article_tags folder to tags 10 August 2013, 10:07:25 UTC
2adfb5e remove useless print 09 August 2013, 21:59:23 UTC
3435aeb extractors tests now used mocked urllib2 09 August 2013, 21:49:34 UTC
4d1ccaf use urlib2.Request instrand of HTTPhandler 09 August 2013, 20:28:26 UTC
f6eb9e2 directories don't existes anymore 09 August 2013, 20:27:15 UTC
afb201f remove useless comments 08 August 2013, 21:18:00 UTC
aa35144 basic mocked handler for images 08 August 2013, 21:09:06 UTC
7f17c0b basic mocked handler and response 08 August 2013, 21:07:35 UTC
2dff006 use ReStructuredText for README instead of markdown 05 August 2013, 07:09:00 UTC
32f4dae fall back for long description 04 August 2013, 23:19:40 UTC
0032233 bump version 1.0.1 04 August 2013, 23:13:39 UTC
1f36c95 package name is goose-extractor since Goose is already a package name in pypi 04 August 2013, 23:04:53 UTC
a36192c add classifiers and long description to setup file 04 August 2013, 22:58:37 UTC
807d992 Update README.md 04 August 2013, 16:12:07 UTC
f31d786 Merge pull request #30 from grangier/testsplit Split tests in sub modules 04 August 2013, 16:09:26 UTC
d002a95 missing blank line 04 August 2013, 16:03:48 UTC
7cd0633 remove tests.py file 04 August 2013, 16:02:51 UTC
5d4dc6d useless imports 04 August 2013, 16:00:53 UTC
5615ade remove print 04 August 2013, 15:58:21 UTC
6a0a034 split test module in subs modules 04 August 2013, 15:55:33 UTC
1aa6e83 correct test case name 04 August 2013, 15:25:28 UTC
65811d7 correct test case name 04 August 2013, 15:24:11 UTC
f8521e5 basic testcase for arabic content 04 August 2013, 15:23:09 UTC
737f09e typo 04 August 2013, 15:17:19 UTC
fae9d53 missing nltk in requirements.txt 03 August 2013, 23:04:46 UTC
39f90f5 Merge pull request #29 from grangier/isri #24 Goose now handles Arabic content 03 August 2013, 23:02:14 UTC
f846577 How to use Goose in Arabic 03 August 2013, 22:57:11 UTC
7026e6d basic arabic handeling 03 August 2013, 22:47:35 UTC
bbfc682 Merge pull request #28 from litso/topic_tags Add support for tags in a /topic/ link 03 August 2013, 22:13:26 UTC
7879711 Add support for tags in a /topic/ link 03 August 2013, 22:04:02 UTC
100e661 typo 01 August 2013, 01:53:31 UTC
89ff5b1 StopWord class refactor 01 August 2013, 01:52:07 UTC
4790fc7 StopWord class refactor 01 August 2013, 01:51:10 UTC
a24cf18 useless code in StopWordsChinese class 01 August 2013, 01:18:10 UTC
9ba4b5b Merge pull request #27 from psilva261/handle_br let br tags create newlines; fixes issue #25 in grangier/python-goose 01 August 2013, 00:47:23 UTC
a7f17a7 let br tags create newlines; fixes issue #25 in grangier/python-goose code mostly from https://github.com/ChorHizzle/python-goose/commit/e6b41bc267efaa9c79a1a214278bc56a44deeb7b 31 July 2013, 21:51:16 UTC
37ed4dc Merge pull request #26 from litso/improve_tags Support for CNET style tags 31 July 2013, 20:11:50 UTC
44dfecc Support for CNET style tags 31 July 2013, 19:30:13 UTC
d2bf2ba fix tests for python 2.6 26 July 2013, 21:54:13 UTC
f14a364 #23 Don't rely on /tmp/goosetmp existing or being writable any more 26 July 2013, 21:30:12 UTC
4d2897f Don't rely on /tmp/goosetmp existing or being writable any more. Use Python to determine an appropriate temporary directory. 26 July 2013, 20:49:32 UTC
458cfe0 Correct typo 24 July 2013, 20:02:09 UTC
e5e3069 we use BS3 not BS4 30 June 2013, 08:06:27 UTC
571a538 Merge branch 'multiparser' Conflicts: goose/extractors.py setup.py 30 June 2013, 08:00:22 UTC
da269bd Corrected URL in setup 30 June 2013, 07:43:20 UTC
5253768 Merge pull request #21 from muckrack/master Use Pillow instead of PIL 27 June 2013, 19:32:26 UTC
c63edba useless self.language 26 June 2013, 17:59:29 UTC
ed58462 Update repository URL 26 June 2013, 17:52:26 UTC
a1ee4e6 Update to work with Pillow instead of PIL 25 June 2013, 19:27:06 UTC
383eb8e Merge pull request #20 from litso/improved_article_tags Improved article tag extraction 16 June 2013, 08:23:01 UTC
fa99a77 Improved tag extraction Including the ability to match tags that have either /tag/ or /tags/ in the URL. 14 June 2013, 01:05:36 UTC
370679f Ability to pass parser_class to configuration object 27 May 2013, 06:49:15 UTC
8c295c1 Goose requires bs3 instead of bs4 27 May 2013, 06:45:20 UTC
65f454b Merge branch 'multiparser' of github.com:xgdlm/python-goose 27 May 2013, 06:38:02 UTC
23861ea Rename TestParserBS4 TestParserSoup 18 May 2013, 08:28:51 UTC
51afcf1 lxml html parser or souparser can be use 18 May 2013, 08:26:45 UTC
3a9142c fix #15 - remove class from cleaner regex 18 May 2013, 08:21:36 UTC
50a787c be ready for bs4 09 May 2013, 14:51:37 UTC
2273703 get parser class from config (something went wrong with previous merge) 09 May 2013, 14:48:43 UTC
9e8c268 Merge branch 'multiparser' of github.com:xgdlm/python-goose into multiparser 09 May 2013, 14:42:30 UTC