Revision history - refs/heads/poincare_model_gradients - origin: https://github.com/RaRe-Technologies/gensim

Revision	Author	Date	Message	Commit Date
90ac214	Jayant Jain	31 October 2017, 20:54:21 UTC	Implements loss and gradients with modified objective	01 November 2017, 10:25:50 UTC
f22d9b2	Jayant Jain	31 October 2017, 11:31:30 UTC	Prints average loss every few iterations instead of current loss	31 October 2017, 11:31:30 UTC
9c51609	Jayant Jain	30 October 2017, 17:38:50 UTC	Fixes typo in clip_vectors	30 October 2017, 17:38:50 UTC
0c57aa1	Jayant Jain	30 October 2017, 09:56:25 UTC	Merge branch 'poincare' into poincare_model	30 October 2017, 09:56:25 UTC
2a5a7fb	Jayant Jain	30 October 2017, 09:47:20 UTC	Minor correction in clipping	30 October 2017, 09:47:20 UTC
3ea3730	Jan Pomikálek	30 October 2017, 09:46:38 UTC	Merge pull request #1643 from RaRe-Technologies/poincare_eval Evaluation of existing Poincaré embedding implementations	30 October 2017, 09:46:38 UTC
71f61d1	Jayant Jain	27 October 2017, 21:07:58 UTC	Adds batch-wise implementation of training and gradient computations	27 October 2017, 21:11:09 UTC
ba82d42	Jayant Jain	27 October 2017, 13:08:22 UTC	Simply sets nan gradients to zero instead of nan_to_num	27 October 2017, 13:08:22 UTC
7d68aae	Jayant Jain	27 October 2017, 13:00:40 UTC	Only calls nan_to_num when gamma has at least one value equal to 1	27 October 2017, 13:00:40 UTC
3b2a383	Jayant Jain	27 October 2017, 08:37:48 UTC	Avoids creating copies of numpy vectors	27 October 2017, 10:51:46 UTC
e1ed24d	Jayant Jain	27 October 2017, 07:54:58 UTC	Avoids doing some numpy computations twice	27 October 2017, 10:51:46 UTC
d439501	Jayant Jain	27 October 2017, 06:15:51 UTC	Compares computed gradients to autograd gradients every few iterations	27 October 2017, 10:51:38 UTC
d72cb10	Jayant Jain	27 October 2017, 06:14:52 UTC	Renames PoincareDistance to PoincareExample for clarity	27 October 2017, 06:14:52 UTC
2e9e31c	Jayant Jain	26 October 2017, 22:17:54 UTC	Better messages while training	26 October 2017, 22:18:58 UTC
99a2270	Jayant Jain	26 October 2017, 22:09:42 UTC	Fixes error in gradient computation	26 October 2017, 22:09:42 UTC
3e28e8b	Jayant Jain	26 October 2017, 14:36:50 UTC	Correct implementation of clipping of updated vectors	26 October 2017, 20:13:26 UTC
e286a0b	Jayant Jain	26 October 2017, 06:18:22 UTC	Adds calculation of gradients for poincare model	26 October 2017, 20:13:26 UTC
1e6aee1	Jayant Jain	25 October 2017, 10:47:32 UTC	minor changes to batch poincare distance computation	26 October 2017, 20:13:26 UTC
b727523	Jayant Jain	25 October 2017, 06:40:02 UTC	batched gradient descent initial implementation	26 October 2017, 20:13:26 UTC
98f94a7	Jayant Jain	25 October 2017, 06:39:23 UTC	allows poincare dist function to be differentiable by autograd	26 October 2017, 20:13:26 UTC
6bd0d4b	Jayant Jain	25 October 2017, 06:38:03 UTC	faster negative sampling, bugfix in vector updates	26 October 2017, 20:13:26 UTC
a804006	Jayant Jain	23 October 2017, 14:38:40 UTC	Initial implementation of training using autograd	26 October 2017, 20:13:26 UTC
6afdd22	Jayant Jain	23 October 2017, 12:53:40 UTC	Initial classes and loading data for poincare model	26 October 2017, 20:13:26 UTC
99089a5	Jayant Jain	26 October 2017, 12:46:18 UTC	Doesnt load all models into memory at once	26 October 2017, 19:31:18 UTC
1e7ddd8	Jayant Jain	26 October 2017, 12:05:26 UTC	Adds poincare nb requirements, moves imports to beginning	26 October 2017, 19:31:18 UTC
e80a834	Jayant Jain	26 October 2017, 07:50:37 UTC	Minor fixes to poincare eval notebook	26 October 2017, 19:31:18 UTC
17390ac	Jayant Jain	25 October 2017, 17:28:33 UTC	Adds results of numpy poincare embeddings on link prediction, minor improvements	26 October 2017, 19:31:18 UTC
c07d582	Jayant Jain	25 October 2017, 09:03:15 UTC	Adds cleaner setup	26 October 2017, 19:31:18 UTC
7cbd6b9	Jayant Jain	25 October 2017, 06:43:37 UTC	Adds patch for external numply implementation to repo	26 October 2017, 19:31:18 UTC
5d72642	Jayant Jain	25 October 2017, 05:59:34 UTC	Adds code for training and loading external numpy models, results	26 October 2017, 19:31:18 UTC
53fcf23	Jayant Jain	24 October 2017, 16:19:08 UTC	More readable results	26 October 2017, 19:31:18 UTC
d7f1840	Jayant Jain	24 October 2017, 11:26:51 UTC	Corrects implementation of MAP, updated results	26 October 2017, 19:31:18 UTC
0d593f0	Jayant Jain	24 October 2017, 08:46:39 UTC	Adds initial implementation of MAP and MAP scores	26 October 2017, 19:31:18 UTC
a415c65	Jayant Jain	24 October 2017, 06:07:06 UTC	Adds results for all models on link prediction	26 October 2017, 19:31:18 UTC
1f15aeb	Jayant Jain	23 October 2017, 11:04:49 UTC	Minor fixes to poincare nb - change in variable name, relative path, misaligned header	26 October 2017, 19:31:18 UTC
c062814	Jayant Jain	18 October 2017, 21:25:30 UTC	Adds patch file to setup and all results	26 October 2017, 19:31:18 UTC
efd1fe0	Jayant Jain	18 October 2017, 11:12:19 UTC	Adds patch for C++ poincare implementation	26 October 2017, 19:31:18 UTC
c17846f	Jayant Jain	17 October 2017, 20:34:05 UTC	Implements link prediction task for poincare and adds results	26 October 2017, 19:31:18 UTC
69b4d61	Jayant Jain	17 October 2017, 07:53:53 UTC	Adds setup and training steps to notebook, tabulated results	26 October 2017, 19:31:18 UTC
fd86c32	Jayant Jain	12 October 2017, 11:13:49 UTC	Adds complete optimized evaluation of lexical entailment to notebook	26 October 2017, 19:31:18 UTC
0a06fd5	Jayant Jain	12 October 2017, 07:10:19 UTC	Adds initial evaluation for lexical entailment on HyperLex	26 October 2017, 19:31:18 UTC
9a511a7	Jayant Jain	11 October 2017, 18:32:22 UTC	More efficient computation of mean rank for graph reconstruction	26 October 2017, 19:31:18 UTC
51bd7ab	Jayant Jain	11 October 2017, 14:09:10 UTC	Adds initial poincare evaluation notebook	26 October 2017, 19:31:18 UTC
a068cbe	Frank Luan	26 October 2017, 12:00:43 UTC	Fix deprecation warnings for regex string literals. Fix #1646 (#1649) * Fix deprecation warnings for regex string literals. Fix #1646 Add raw flag before all Regex strings so Python 3 can stop complaining. * Fix two more occurrences of unescaped Regex strings	26 October 2017, 12:00:43 UTC
00192a8	Alexander Ankudinov	26 October 2017, 11:08:40 UTC	Fix pagerank algorithm. Fix #805 (#1653) * added a regression test for summarization.keywords() * handled case with graph smaller than 3 nodes * removed TODO about complex eigenvectors * added more comments	26 October 2017, 11:08:40 UTC
b912203	Menshikh Ivan	26 October 2017, 05:53:41 UTC	Drop Win x32 support & add 'rolling builds' (#1652) * disable x32 builds * add rolling build	26 October 2017, 05:53:41 UTC
67d9634	Menshikh Ivan	25 October 2017, 18:34:27 UTC	Fix code/docstring style (#1650) * replace open->smart_open in annoy tutorial * style fixes for lda model diff * fix for #1390 * fix for #1423 * fix doc in Phrases	25 October 2017, 18:34:27 UTC
9481915	Bruno Marques	24 October 2017, 13:25:11 UTC	Improve error message for supervised fastText. Fix #1498 (#1645)	24 October 2017, 13:25:11 UTC
a5872fa	Michael W. Sherman	24 October 2017, 12:22:54 UTC	Fix scoring function in Phrases. Fix #1533, #1635 (#1573) * initial commit of fixes in comments of #1423 * removed unnecessary space in logger * added support for custom Phrases scorers * fixed Phrases.__getitem__ to support pluggable scoring #1533 * travisCI style fixes * fixed __next__() to next() for python 3 compatibilyt * misc fixes * spacing fixes for style * custom scorer support in sklearn api * Phrases scikit interface tests for pluggable scoring * missing line breaks * style, clarity, and robustness fixes requested by @piskvorky * check in Phrases init to make sure scorer is pickleable * backwards scoring compatibility when loading a Phrases class * removal of pickle testing objects in Phrases init * switched to six for python 2/3 compatibility * fix docstring	24 October 2017, 12:22:54 UTC
7f23a2c	Marius Cobzarenco	24 October 2017, 11:22:14 UTC	Fix FastText inconsistent dtype. Fix #1637 (#1638)	24 October 2017, 11:22:14 UTC
8097cad	Marius Cobzarenco	24 October 2017, 11:10:09 UTC	Add configuration for flake8 to setup.cfg (#1636)	24 October 2017, 11:10:09 UTC
9266aba	Nehal J Wani	24 October 2017, 06:27:38 UTC	Fix test_filename_filtering test (#1647) CI tests fail with: ====================================================================== FAIL: test_filename_filtering (gensim.test.test_corpora.TestTextDirectoryCorpus) ---------------------------------------------------------------------- Traceback (most recent call last): File ".../lib/python3.6/site-packages/gensim/test/test_corpora.py", line 462, in test_filename_filtering self.assertEqual(expected, filenames) AssertionError: Lists differ: ['/tmp/tmp0j1tou_7/test1.log', '/tmp/tmp0j1tou_7/test2.log'] != ['/tmp/tmp0j1tou_7/test2.log', '/tmp/tmp0j1tou_7/test1.log'] It's not a real failure, since the files are correct, only their order of comparison is not same	24 October 2017, 06:27:38 UTC
58b30d7	Dmitry Petukhov	21 October 2017, 10:59:18 UTC	Add "DOI badge" to README page. Fix #1610 (#1639) * Add "DOI badge" to gensim #1610 * reorder badges	21 October 2017, 10:59:18 UTC
047ab12	Karamax	21 October 2017, 10:54:16 UTC	Remove duplicate notebook. Fix #1415 (#1640)	21 October 2017, 10:54:16 UTC
e92b45d	jodevak	19 October 2017, 06:37:41 UTC	Add build_vocab_from_freq to Word2Vec, speedup scan_vocab (#1599) * fix build vocab speed issue, and new function to build vocab from previously provided word frequencies table * fix build vocab speed issue, function build vocab from previously provided word frequencies table * fix build vocab speed issue, function build vocab from previously provided word frequencies table * fix build vocab speed issue, function build vocab from previously provided word frequencies table * Removing the extra blank lines, documentation in numpy-style to build_vocab_from_freq, and hanging indents in build_vocab * Fixing Indentation * Fixing gensim/models/word2vec.py:697:1: W293 blank line contains whitespace * Remove trailing white spaces * Adding test * fix spaces	19 October 2017, 06:37:41 UTC
1a1fc44	horpto	18 October 2017, 07:05:40 UTC	Fix duplication and wrong markup in docs (#1633) * Fixed build of docs: - duplication of the citates from word2vec and doc2vec, - wrong markup of lists in the scripts, - some typos. * Add missing 'tensor' word	18 October 2017, 07:05:40 UTC
2690289	Jack Wu	17 October 2017, 05:51:34 UTC	Add "most_similar_to_given" method for KeyedVectors (#1582) * finished adding 2 new functions * imported argmax to word2vec * reformatted * remove `most_similar_to_given` from w2v class * Fix PEP8	17 October 2017, 05:51:34 UTC
1c7e72f	Parul Sethi	16 October 2017, 11:18:40 UTC	Refactor dendrogram & topic network notebooks (#1571) * remove plotly's dendrogram code in notebooks * pin plotly, re-run notebooks	16 October 2017, 11:18:40 UTC
e9bbcf3	Filip Stefanak	16 October 2017, 07:01:46 UTC	Remove unnecessary assert blocking direct usage of CSC for LSI (#1622)	16 October 2017, 07:01:46 UTC
9166de2	Menshikh Ivan	16 October 2017, 06:38:00 UTC	Fix release badge (#1631)	16 October 2017, 06:38:00 UTC
16b812c	Filip Stefanak	13 October 2017, 07:35:59 UTC	Add dtype support for LSI (#1620) * Enable float32 for LSI - stochastic SVD * Fix PEP8 issue * - Add testTransformFloat32 - fix float32 for one-pass LSI	13 October 2017, 07:35:59 UTC
44b0403	Filip Stefanak	13 October 2017, 06:19:39 UTC	Add __getitem__ method to Sparse2Corpus to allow direct queries (#1621) * Add __getitem__ method to Sparse2Corpus to allow direct queries * Fix PEP8 * Add docstring for Sparse2Corpus.__getitem__	13 October 2017, 06:19:39 UTC
9a6d78c	ivan	12 October 2017, 08:45:34 UTC	Merge branch 'master' into develop	12 October 2017, 08:45:34 UTC
86e0618	ivan	12 October 2017, 08:44:33 UTC	Merge branch 'release-3.0.1'	12 October 2017, 08:44:33 UTC
1c26225	ivan	12 October 2017, 08:43:38 UTC	update changelog to 3.0.1	12 October 2017, 08:43:38 UTC
90e9a43	ivan	12 October 2017, 05:53:28 UTC	bump version to 3.0.1	12 October 2017, 05:53:28 UTC
b0f80a6	Jan Berkel	11 October 2017, 18:24:07 UTC	Fix spelling (#1625)	11 October 2017, 18:24:07 UTC
c220166	Menshikh Ivan	06 October 2017, 10:26:06 UTC	Fix Keras import, speedup importing time. Fix #1614 (#1615) * Move Keras import to get_embedding_layer. * rename `get_embedding_layer ` as `get_keras_embedding`	06 October 2017, 10:26:06 UTC
0ef1ece	Timofey Efimov	05 October 2017, 11:37:31 UTC	Fix sphinx warnings and retrieve all missing .rst (#1612) * Fix typo * Make `save_corpus` private * Annotate `bleicorpus.py` * Make __save_corpus weakly private * Fix _save_corpus in tests * Fix _save_corpus[2] * Fix relativly obvious sphinx warnings * Fix sphinx warnings * Revert "Fix sphinx warnings" This reverts commit 8c00de8fc4a09bc9e3597eb6d8a95363543634f8. * Revert "Fix relativly obvious sphinx warnings" This reverts commit 7fbdf5db550be685d9b498a4da6a83e3c1e9b23d. * Revert "Fix _save_corpus[2]" This reverts commit b65a69a4b0313a7670b620a28411478ed8715cca. * Revert "Fix _save_corpus in tests" This reverts commit 69fc7e04a1c82cc7b72be231bbd3df207f50fe0b. * Revert "Make __save_corpus weakly private" This reverts commit 342811371b368315786ac8097a90e6612bba9e45. * Revert "Annotate `bleicorpus.py`" This reverts commit 981ebbbbabcf95ae7e2629266bcfb7d9931b7694. * Revert "Make `save_corpus` private" This reverts commit 36d98d11eb464ed74f7e6c22b45adbec7e5618e0. * Revert "Fix typo" This reverts commit b260d4b07114b1c449292cda492a0842b19445ce. * Revert "Revert "Fix relativly obvious sphinx warnings"" This reverts commit b4dddb3ca491a6d18ff470437e05d36adcd0c185. * Revert "Revert "Fix sphinx warnings"" This reverts commit ca3d216844b4818d74cf4be1e9878f006eb957c4. * fix PEP8 * fix last sphinx warnings * add missing submodules to reference * add missing .rst fix new warnings * add [docs] deps for building + remove [wmd] * add doc build to travis * fix PEP8	05 October 2017, 11:37:31 UTC
96d230a	Luiza Orosanu	02 October 2017, 08:58:39 UTC	Fix logger message in lsi_dispatcher (#1603) * Fix logger message typo in lsi_dispatcher * small fix	02 October 2017, 08:58:39 UTC
36a5cb9	ivan	27 September 2017, 08:58:31 UTC	Merge branch 'release-3.0.0' into develop	27 September 2017, 08:58:31 UTC
351bdef	ivan	27 September 2017, 08:57:25 UTC	Merge branch 'release-3.0.0'	27 September 2017, 08:57:25 UTC
af646c4	ivan	27 September 2017, 08:51:30 UTC	update changelog to 3.0.0	27 September 2017, 08:51:30 UTC
aab74b7	ivan	27 September 2017, 08:31:23 UTC	regenerated C files with Cython	27 September 2017, 08:31:23 UTC
c9d1e88	ivan	27 September 2017, 08:30:11 UTC	bump version to 3.0.0	27 September 2017, 08:30:11 UTC
0a2c05d	Xiaohong	26 September 2017, 14:46:23 UTC	Fix typo in translation_matrix notebook (#1598)	26 September 2017, 14:46:23 UTC
33a3ef2	Xiaohong	25 September 2017, 07:18:00 UTC	Fix Translation Matrix (#1594) * fix the comments * remove print function * update the notebook * fix the train method * remove some words for sample * fix the tense * add warning for the translation matrix revist part	25 September 2017, 07:18:00 UTC
09fddf5	Gordon Mohr	20 September 2017, 18:02:21 UTC	correct PathLineSentences comment	20 September 2017, 18:02:21 UTC
6e51156	Chinmaya Pancholi	19 September 2017, 08:17:54 UTC	Add unsupervised FastText to Gensim (#1525) * added initial code for CBOW * updated unit tests for fasttext * corrected use of matrix and precomputed ngrams for vocab words * added EOS token in 'LineSentence' class * added skipgram training code * updated unit tests for fasttext * seeded 'np.random' with 'self.seed' * added test for persistence * updated seeding numpy obj * updated (unclean) fasttext code for review * updated fasttext tutorial notebook * added 'save' and 'load_fasttext_format' functions * updated unit tests for fasttext * cleaned main fasttext code * updated unittests * removed EOS token from LineSentence * fixed flake8 errors * [WIP] added online learning * added tests for online learning * flake8 fixes * refactored code to remove redundancy * reusing 'word_vec' from 'FastTextKeyedVectors' * flake8 fixes * split 'syn0_all' into 'syn0_vocab' and 'syn0_ngrams' * removed 'init_wv' param from Word2Vec * updated unittests * flake8 errors fixed * fixed oov word_vec * updated test_training unittest * Fix broken merge * useless change (need to re-run Appveyour) * Add skipIf for Appveyor x32 (avoid memory error)	19 September 2017, 08:17:54 UTC
5a49a79	Adrian Englhardt	19 September 2017, 05:07:59 UTC	Fix doctag unicode problem. Fix 1543 (#1544) * Fix doctag unicode * Add test for unicode doctags. * Fix doc2vec unicode title test. * Make the unicode tag cast less hidden.	19 September 2017, 05:07:59 UTC
2e58a1c	Roopal Garg	18 September 2017, 15:17:09 UTC	Update WikiCorpus tokenization. Fix #1534 (#1537) * code to better handle tokenization Adding the ability to define: 1. Define min and max token length 2. Define min number of tokens for valid articles 3. Call a custom function to handle tokenization with the configured parameter on the class instance 4. Control if lowercase is desired * adding another test case adding a test case to check "lower" parameter with the custom tokenizer * cleaning up code * clean up code for formatting * cleaning up indentation * missing backtick	18 September 2017, 15:17:09 UTC
02ba343	Federico Barrios	18 September 2017, 10:35:56 UTC	Add verification when summarize_corpus returns null. Fix #1531. (#1570) * Avoid "NoneType is not iterable..." error for few documents in corpus. * Fix comment. * Adding relevant test. * Fixed return types on summarization border cases: - Returns empty list on border case of summarize_corpus. - Returns empty string or empty list on border case of summarize. - Fixed test accordingly. - Removed some test code repetition. * Replace `is` to `==`	18 September 2017, 10:35:56 UTC
4c0737a	Mack	18 September 2017, 08:36:29 UTC	Add word2vec-based coherence (#1530) * #1380: Initial implementation of coherence using word2vec similarity. * #1380: Add the `keyed_vectors` kwarg to the `CoherenceModel` to allow passing in pre-trained, pre-loaded word embeddings, and adjust the similarity measure to handle missing terms in the vocabulary. Add a `with_std` option to all confirmation measures that allows the caller to get the standard deviation between the topic segment sets as well as the means. * #1380: Add tests for `with_std` option for confirmation measures, and add test case to sanity check `word2vec_similarity`. * #1380: Add a `get_topics` method to all topic models, add test coverage for this, and update the `CoherenceModel` to use this for getting topics from models. * #1380: Require topics returned from `get_topics` to be probability distributions for the probabilistic topic models. * #1380: Clean up flake8 warnings. * #1380: Make `topn` a property so setting it to higher values will uncache the accumulator and the topics will be shrunk/expanded accordingly. * #1380: Pass through `with_std` argument for all coherence measures. * #1380: Initial implementation of coherence using word2vec similarity. * #1380: Add the `keyed_vectors` kwarg to the `CoherenceModel` to allow passing in pre-trained, pre-loaded word embeddings, and adjust the similarity measure to handle missing terms in the vocabulary. Add a `with_std` option to all confirmation measures that allows the caller to get the standard deviation between the topic segment sets as well as the means. * #1380: Add tests for `with_std` option for confirmation measures, and add test case to sanity check `word2vec_similarity`. * #1380: Add a `get_topics` method to all topic models, add test coverage for this, and update the `CoherenceModel` to use this for getting topics from models. 
* #1380: Require topics returned from `get_topics` to be probability distributions for the probabilistic topic models. * #1380: Clean up flake8 warnings. * #1380: Make `topn` a property so setting it to higher values will uncache the accumulator and the topics will be shrunk/expanded accordingly. * #1380: Pass through `with_std` argument for all coherence measures. * Update `test_coherencemodel` to skip Mallet and Vowpal Wabbit tests if the executables are not installed, instead of passing them inappropriately. * Fix trailing whitespace. * Add `get_topics` method to `BaseTopicModel` and update notebook for new Word2Vec-based coherence metric "c_w2v". * Add several helper methods to the `CoherenceModel` for comparing a set of models or top-N lists efficiently. Update the notebook to use the helper methods. Add `TextDirectoryCorpus` import in `corpora.__init__` so it can be imported from package level. Update notebook to use `corpora.TextDirectoryCorpus` instead of redefining it. * fix flake8 whitespace issues * fix order of imports in `corpora.__init__` * fix corpora.__init__ import order * push fix for setting `topn` in `CoherenceModel.for_topics` * Use `dict.pop` in place of checking and optionally getting and deleting topn in `CoherenceModel.for_topics`. * fix non-deterministic test failure in `test_coherencemodel` * Update coherence model selection notebook to use sklearn dataset loader to get 20 newsgroups corpus. Add `with_support` option to the confirmation measures to determine how many words were ignored during calculation. Add `flatten` function to `utils` that recursively flattens an iterable into a list. Improve the robustness of coherence model comparison by using nanmean and mean value imputation when looping over the grid of top-N values to compute coherence for a model. Fix too-long logging statement lines in `text_analysis`.	18 September 2017, 08:36:29 UTC
6b8f1c0	Paul O'Leary McCann	18 September 2017, 08:30:13 UTC	Add comment explaining lack of multistream support (#1515) * Add comment explaining lack of multistream support See #1496, looks like this has confused some people. -POLM * Add file patterns to documentation	18 September 2017, 08:30:13 UTC
e667069	Kimmo Kärkkäinen	14 September 2017, 11:01:37 UTC	Fix incorrect initialization ShardedCorpus with a generator. Fix #1511 (#1512) Fix incorrect initialization ShardedCorpus with a generator. Fix #1511.	14 September 2017, 11:01:37 UTC
1c0098c	Xiaohong	13 September 2017, 12:38:25 UTC	Add TranslationMatrix model (for word2vec and paragraph2vec) (#1434) [MRG] Implement 'Translation Matrix'	13 September 2017, 12:38:25 UTC
224566c	Eric Lind	11 September 2017, 10:01:35 UTC	Improve speed of FastTextKeyedVectors.__contains__ (#1499) * Improve speed of FastTextKeyedVectors __contains__ The current implementation of __contains__ in FastTextKeyedVectors is `O(nm)` where `n` is the number of character ngrams in the query word and `m` is the size of the vocabulary. This is very slow for large corpora. The new implementation is O(n). any() was unnecessary. * Update variable name and docstring to improve clarity	11 September 2017, 10:01:35 UTC
db9e230	Menshikh Ivan	08 September 2017, 19:10:24 UTC	Refactor code with PEP8 and additional limitations. Fix #1521 (#1569) * Replace map(..) to comprehensions * Fix logging (remove '%'/'.format' + longer lines) * style-check[1] * Small fix for bash scripts * style-check[2] (corpora) * flake8 check * Fix shared_corpus API + resolve comment from review * Remove legacy "endclass" from corpora * style-check[3] * style-check[4] * Rename test_base_tm to basetmtest (for preventing direct running with nose) + small changes for models * style-check[5] * Replace LOG -> logger * Return broad exception to dictionary * Replace "dict((" -> dict comprehension * Replace "print(e)" -> "logger.exception(e)" * Fix quotation * Reduce long lines * missed PEP8 * style-check[6]	08 September 2017, 19:10:24 UTC
6d6f5dc	AhnDW	05 September 2017, 14:58:07 UTC	Refactor all python code by PEP8. Partially fix #1521 (#1550) * gensim dir PEP8 fixes * corpora dir PEP8 fixes * example dir PEP8 fixes * model/wrapper dir PEP8 fixes * models dir PEP8 fixes * parsing dir PEP8 fixes * scripts dir PEP8 fixes * similarities dir PEP8 fixes * summarization and topic_coherence dir PEP8 fixes * test dir PEP8 fixes * PEP8 E722 error fixes * PEP8 fixes * list slice whitespace PEP8 fixes * disassemble import * * Fix symlink * fix symlink * fix make_wiki_lemma file * Replace relative import to absolute * fix typo * fix E203 error	05 September 2017, 14:58:07 UTC
13578d4	Menshikh Ivan	05 September 2017, 10:40:34 UTC	Add AppVeyor for all builds (#1565) * init run * rm dup * remove buggy test	05 September 2017, 10:40:34 UTC
32e0257	AhnDW	04 September 2017, 14:16:35 UTC	Fix mutable args in methods definition (#1562) * Change empty list args * Change empty dict args * Fix spaces	04 September 2017, 14:16:35 UTC
ed0b03e	Menshikh Ivan	04 September 2017, 10:03:57 UTC	Add style-checking for notebooks & refactor Travis config (#1522) * add installation script for env * Add run script for test/codestyle * modify travis file * fix misprints * add pytest * Add basic version checking * try to fix FAST_VERSION==-1 * try to fix FAST_VERSION==-1[2] * remove debug info * fix flake8 problems with ignore list * continue with flake (break pep8 in matutils) * fix regexp for grep * restore matutils * echo files for flake8 * Add ipynb checking * special mistakes in .ipynb for testing purposes * Distinct file checking for ipynb * remove mistakes from notebooks	04 September 2017, 10:03:57 UTC
3d2227d	AhnDW	01 September 2017, 15:22:54 UTC	Set trainable flag in get_embedding_layer. Fix 1557 (#1558)	01 September 2017, 15:22:54 UTC
9caf055	Menshikh Ivan	01 September 2017, 13:36:00 UTC	Fix Mallet wrapper and tests for HDPTransform (#1555) * fix type in mallet wrapper * fix tests for sklearn wrapper * debug commit for test * fix seeding and precision * fix pep8 & try to fix unreproducable error * debug unreproduced error * fix test * remove debug output	01 September 2017, 13:36:00 UTC
26b285e	yardos	01 September 2017, 09:28:03 UTC	Add the Google Tag Manager (TGM) (#1556) * Update layout.html - removed old Google Analytics code - added two snippets for Google Tag Manager (GTM), one in head, the other in body * Update layout.html - removed old Google Analytics code (Urchin) - added code for Google Tag Manager - one in head, the other in body * Ignore *.html for flake8	01 September 2017, 09:28:03 UTC
26cd87a	Daniel Zou	31 August 2017, 07:49:51 UTC	Update Doc2vec-IDMB notebook (#1476) * Added introduction, motivation, etc. and cleaned up Doc2vec-IDMB notebook * Fixed a syntax error	31 August 2017, 07:49:51 UTC
ae31c0c	Parul Sethi	30 August 2017, 11:09:51 UTC	Add callback metrics interface for LdaModel and integration with Visdom (#1399) * save log params in a dict * remove redundant line * add diff log * remove diff log * write params to log directory * add convergence, remove alpha * calculate perplexity/diff instead of using log function * add docstrings and comments * add coherence/diff labels in graphs * optional measures for viz * add coherence params to lda init * added Lda Visom viz notebook * add option to specify env * made requested changes * add generic callback API * modified Notebook for new API * fix flake8 * correct lee corpus division * added docstrings * fix flake8 * add shell example * fix queue import for both py2/py3 * store metrics in model instance * add nb example for getting metrics after train * made rquested changes * use dict for saving metrics * use str method for metric classes * correct a notebook description * remove child-classes str method * made requested changes * add visdom screenshot	30 August 2017, 11:09:51 UTC
1a73e4f	Menshikh Ivan	28 August 2017, 12:01:22 UTC	Add Capital One to Adopters page (#1552)	28 August 2017, 12:01:22 UTC
1764e69	Nick Lindberg	25 August 2017, 11:04:39 UTC	Replace viewitems() to iteritems(). Fix 1495 (#1508)	25 August 2017, 11:04:39 UTC
5cefaef	Ilya Vorontsov	25 August 2017, 10:56:50 UTC	Remove extra filter_token from tutorial (#1502)	25 August 2017, 10:56:50 UTC
