2003303 | Frank Seide | 08 September 2020, 19:38:44 UTC | some updates, Marian-related | 08 September 2020, 19:38:44 UTC |
66dba1e | Frank Seide | 13 March 2018, 22:29:47 UTC | implemented marian.h dot(), and added an error check to affine() | 13 March 2018, 22:29:47 UTC |
32c6fe1 | Frank Seide | 13 March 2018, 22:06:55 UTC | updated marian.h | 13 March 2018, 22:06:55 UTC |
da7bdaa | Frank Seide | 03 March 2018, 21:07:46 UTC | implemented emulation of DeviceId, and fake impl of Expr::debug() | 03 March 2018, 21:07:46 UTC |
81abfa4 | Frank Seide | 03 March 2018, 20:22:24 UTC | updated marian ref | 03 March 2018, 20:22:24 UTC |
b2bbda2 | Frank Seide | 03 March 2018, 20:21:27 UTC | moved setting of insertBOS to model-function creation; now logs CUDA_LAUNCH_BLOCKING | 03 March 2018, 20:21:27 UTC |
82229f4 | Frank Seide | 23 February 2018, 03:54:06 UTC | removed the Reshape from the cost, but it was not the problem (can put it back); now supports Philly heartbeats; bug workaround: avoid calling Value() twice for now, which has some rare non-deterministic failures for unknown reasons; bug workaround: no longer uses "tee" inside Philly, as it seems to cause instabilities | 23 February 2018, 03:54:06 UTC |
c79e863 | Frank Seide | 22 February 2018, 02:03:36 UTC | (commented out a debug message) | 22 February 2018, 02:03:36 UTC |
c674c67 | Frank Seide | 22 February 2018, 02:02:57 UTC | (added a missing newline) | 22 February 2018, 02:02:57 UTC |
9cf6b9e | Frank Seide | 22 February 2018, 00:55:57 UTC | bug fix: GetSubSample() should not skip partials with <numWorkers sequences. Instead, allow empty partials initially, and delete them in the final result (risking slight load-imbalance). This way, #samples is identical across different #workers | 22 February 2018, 00:55:57 UTC |
86264b8 | Frank Seide | 22 February 2018, 00:52:26 UTC | bug fix: NcclComm should identify other GPU cards by their UUID instead of (hostname, deviceId), which fails if >1 container land on the same machine. Thanks to Ke Deng for the outline | 22 February 2018, 00:52:26 UTC |
205a28b | Frank Seide | 21 February 2018, 02:35:02 UTC | AutoBatch bg process now logs a stack trace | 21 February 2018, 02:35:02 UTC |
0318782 | Frank Seide | 20 February 2018, 05:30:58 UTC | MT.cpp now uses GetMemoryUsage() to log mem usage stats | 20 February 2018, 05:30:58 UTC |
cd1162c | Frank Seide | 20 February 2018, 04:45:25 UTC | new method InternalVariable::GetMemoryUsage() | 20 February 2018, 04:45:25 UTC |
4531d28 | Frank Seide | 20 February 2018, 00:44:04 UTC | bug fix: GetSubBatches_CreatePartialMinibatches() should skip tiny final batch even if there are only 2 batches | 20 February 2018, 00:44:04 UTC |
c6ec759 | Frank Seide | 19 February 2018, 23:14:34 UTC | separation of loss and metric now working | 19 February 2018, 23:14:34 UTC |
c59511b | Frank Seide | 19 February 2018, 23:01:52 UTC | fixed a shape | 19 February 2018, 23:01:52 UTC |
333fe46 | Frank Seide | 19 February 2018, 22:55:54 UTC | changed to also track a metric, not working yet; bug fix: GetSubBatches_CreatePartialMinibatches() not reliable for left-overs, just dropping them for now | 19 February 2018, 22:55:54 UTC |
f474f32 | Frank Seide | 19 February 2018, 21:59:36 UTC | bug fix: GetSubBatches_CreatePartialMinibatches() must limit max size separately for each worker, to avoid overflow for outliers | 19 February 2018, 21:59:36 UTC |
7828a63 | Frank Seide | 19 February 2018, 21:20:22 UTC | logging now includes a time stamp | 19 February 2018, 21:20:22 UTC |
a09a655 | Frank Seide | 19 February 2018, 20:53:44 UTC | (minor logging changes) | 19 February 2018, 20:53:44 UTC |
e7584e3 | Frank Seide | 19 February 2018, 20:22:48 UTC | GetSubBatches_CreatePartialMinibatches() now ensures there are no empty partial minibatches for some workers | 19 February 2018, 20:22:48 UTC |
a566512 | Frank Seide | 19 February 2018, 19:17:11 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/cntk into fseide/dynamite | 19 February 2018, 19:17:11 UTC |
482091c | Frank Seide | 19 February 2018, 19:16:41 UTC | bug fix: Serializer::Write() should check for error from the Google code, which does not throw | 19 February 2018, 19:16:41 UTC |
382f162 | Frank Seide | 19 February 2018, 18:56:22 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/CNTK into fseide/dynamite | 19 February 2018, 18:56:22 UTC |
8fedcad | Frank Seide | 19 February 2018, 18:56:19 UTC | LR and MB size adjustments now done at right place | 19 February 2018, 18:56:19 UTC |
09bfeec | Frank Seide | 19 February 2018, 18:54:36 UTC | GetSubBatches() now splits data over workers by itself, for better balancing after bucketing | 19 February 2018, 18:54:36 UTC |
196ebb9 | Frank Seide | 19 February 2018, 18:42:24 UTC | disabled detailed stats in AutoBatch, as it does not help presently | 19 February 2018, 18:42:24 UTC |
482d419 | Frank Seide | 18 February 2018, 18:48:44 UTC | MT.cpp now sets Marian tied-embedding options based on corpus (word-based cannot tie src and tgt); MB size scaling now clipped to 300k tokens; tee command now uses option to fail on error on Linux; now sets the GPU after redirecting the log file, so we can catch Philly errors | 18 February 2018, 18:48:44 UTC |
5561573 | Frank Seide | 18 February 2018, 18:45:59 UTC | bug fix: NewDense() should not access iter after erasing it | 18 February 2018, 18:45:59 UTC |
2baa0de | Frank Seide | 17 February 2018, 20:55:09 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/CNTK into fseide/dynamite | 17 February 2018, 20:55:09 UTC |
20a3631 | Frank Seide | 17 February 2018, 20:54:55 UTC | updated submodule ref | 17 February 2018, 20:54:55 UTC |
1554889 | Frank Seide | 17 February 2018, 20:50:36 UTC | AutoBatch allocator now uses 2 GB chunks instead of 1 | 17 February 2018, 20:50:36 UTC |
529df81 | Frank Seide | 17 February 2018, 20:49:01 UTC | updated Marian submodule ref | 17 February 2018, 20:49:01 UTC |
04cd6d8 | Frank Seide | 17 February 2018, 00:16:50 UTC | added a test whether learner->TotalNumberOfSamplesSeen() can replace our own sample counting | 17 February 2018, 00:16:50 UTC |
e20aac1 | Frank Seide | 16 February 2018, 16:01:11 UTC | adjusted to last master update | 16 February 2018, 16:01:11 UTC |
d412b2e | Frank Seide | 16 February 2018, 15:52:47 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/cntk into fseide/dynamite | 16 February 2018, 15:52:47 UTC |
02e602d | Frank Seide | 16 February 2018, 15:52:38 UTC | towards using learner->TotalNumberOfSamplesSeen() | 16 February 2018, 15:52:38 UTC |
19b0b96 | Frank Seide | 15 February 2018, 19:16:48 UTC | updated marian-dev submodule reference | 15 February 2018, 19:16:48 UTC |
aadb509 | Frank Seide | 15 February 2018, 19:16:15 UTC | now creates the symlink as a relative link, so that it will work when viewed via /hdfs | 15 February 2018, 19:16:15 UTC |
da54965 | Frank Seide | 15 February 2018, 06:24:10 UTC | removed a log message | 15 February 2018, 06:24:10 UTC |
868da71 | Frank Seide | 15 February 2018, 06:04:22 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/CNTK into fseide/dynamite | 15 February 2018, 06:04:22 UTC |
461d0ed | Frank Seide | 15 February 2018, 06:03:57 UTC | bug fix: --numBits 32 should not use the quantized distributed learner | 15 February 2018, 06:03:57 UTC |
845d59c | Frank Seide | 15 February 2018, 06:03:17 UTC | added missing attOps() | 15 February 2018, 06:03:17 UTC |
66c53ed | Frank Seide | 15 February 2018, 05:44:07 UTC | added dummy for missing op attOps() | 15 February 2018, 05:44:07 UTC |
0bfe022 | Frank Seide | 15 February 2018, 03:23:08 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/CNTK into fseide/dynamite | 15 February 2018, 03:23:08 UTC |
f60d4fa | Frank Seide | 15 February 2018, 03:22:20 UTC | new methods NDArrayView::get() and set() to emulate Marian's corresponding object | 15 February 2018, 03:22:20 UTC |
c3bc51a | Frank Seide | 15 February 2018, 01:59:29 UTC | updated submodule ref | 15 February 2018, 01:59:29 UTC |
a1eff23 | Frank Seide | 15 February 2018, 01:58:03 UTC | Marian s2s model now compiles (but RNN ops not implemented yet) | 15 February 2018, 01:58:03 UTC |
c21893c | Frank Seide | 15 February 2018, 01:31:58 UTC | reenabled building for Kepler (needed in Philly); added marian-dev submodule to Makefile; added fake Config::get(); deleted left-over copies of a few Marian sources | 15 February 2018, 01:31:58 UTC |
6defd83 | Frank Seide | 15 February 2018, 01:12:32 UTC | updated to newer marian-dev submodule | 15 February 2018, 01:12:32 UTC |
d9c6440 | Frank Seide | 15 February 2018, 01:10:59 UTC | abstracted the result of SubBatch::Indices(), so that rows() can accept both Marian and CNTK format | 15 February 2018, 01:10:59 UTC |
1b85734 | Frank Seide | 14 February 2018, 21:22:26 UTC | adjusted to CNTK-compat changes in marian-dev | 14 February 2018, 21:22:26 UTC |
8c7c88b | Frank Seide | 14 February 2018, 03:16:40 UTC | renamed #define MOCKUP to CNTK_BACKEND | 14 February 2018, 03:16:40 UTC |
12f1a93 | Frank Seide | 14 February 2018, 02:56:53 UTC | now compiling transform from marian-dev directly | 14 February 2018, 02:56:53 UTC |
444eb5f | Frank Seide | 14 February 2018, 01:46:13 UTC | updated submodule version | 14 February 2018, 01:46:13 UTC |
2121d95 | Frank Seide | 14 February 2018, 01:44:41 UTC | redirected headers to marian-dev submodule | 14 February 2018, 01:44:41 UTC |
1f899de | Frank Seide | 14 February 2018, 01:44:22 UTC | added marian-dev as a submodule; redirected some marian headers to the marian-dev submodule | 14 February 2018, 01:44:22 UTC |
8068a8f | Frank Seide | 14 February 2018, 01:13:16 UTC | added marian-dev as a submodule to Dynamite | 14 February 2018, 01:13:16 UTC |
5692cb5 | Frank Seide | 14 February 2018, 00:08:33 UTC | clean-up after last commit | 14 February 2018, 00:08:33 UTC |
40f30bf | Frank Seide | 14 February 2018, 00:02:53 UTC | gradient memory reuse seems to work now; created a new condition that m_gradients only ever gets set in a physical location | 14 February 2018, 00:02:53 UTC |
96ee5c0 | Frank Seide | 13 February 2018, 08:58:44 UTC | refactored backprop of gradients w.r.t. releasing them as early as possible to save GPU RAM. New function ConsumeOutputGradient() | 13 February 2018, 08:58:44 UTC |
90d09d9 | Frank Seide | 13 February 2018, 03:42:30 UTC | added option to scale the MB size with 1/LR decay | 13 February 2018, 03:42:30 UTC |
6a3032c | Frank Seide | 13 February 2018, 03:34:25 UTC | made GetSubBatches() robust to edge case of minibatchSize > maxibatchSize | 13 February 2018, 03:34:25 UTC |
156fc8d | Frank Seide | 13 February 2018, 01:15:54 UTC | bug fix: json format missed commas; removed mt/experiments from path (bake it into the environment vars instead) | 13 February 2018, 01:15:54 UTC |
6150e0e | Frank Seide | 13 February 2018, 01:12:55 UTC | MT.cpp now maintains a checkpointer counter of how many labels have been seen by the model; bug fix: checkpointing should use the consistent #labels counter across workers, to avoid hangs; updated the progress.json, maybe it gets recognized now... | 13 February 2018, 01:12:55 UTC |
468425c | Frank Seide | 12 February 2018, 23:33:09 UTC | reduced maxibatchSize, since 30M was too large for CPU RAM (?); now writing Philly json file | 12 February 2018, 23:33:09 UTC |
87c1cfe | Frank Seide | 12 February 2018, 21:51:10 UTC | now specifying bucketing in absolute samples rather than bucketingFactor | 12 February 2018, 21:51:10 UTC |
97b1a7e | Frank Seide | 12 February 2018, 21:39:56 UTC | changed the way bucketing is counted, in prep for non-uniform #buckets | 12 February 2018, 21:39:56 UTC |
41529bb | Frank Seide | 12 February 2018, 19:58:18 UTC | cleaned up GetSubBatches() from last fix | 12 February 2018, 19:58:18 UTC |
a3ed4be | Frank Seide | 12 February 2018, 19:57:35 UTC | MT.cpp now logs the hostname; also minor other logging improvements | 12 February 2018, 19:57:35 UTC |
1f9c1d8 | Frank Seide | 12 February 2018, 19:56:44 UTC | (comment) | 12 February 2018, 19:56:44 UTC |
7eb2101 | Frank Seide | 12 February 2018, 19:53:46 UTC | GetSubBatches() no longer tries to read in small pieces, now that the underlying issue is fixed | 12 February 2018, 19:53:46 UTC |
bd50768 | Frank Seide | 12 February 2018, 19:53:07 UTC | cherry-picked fix 6556e1, for NcclComm initialization | 12 February 2018, 19:53:07 UTC |
84391fe | Frank Seide | 12 February 2018, 19:52:23 UTC | bug fix: SaveCheckpoint() should not create two communicators | 12 February 2018, 19:52:23 UTC |
e8f989c | Frank Seide | 12 February 2018, 17:19:57 UTC | (spelling error in a message) | 12 February 2018, 17:19:57 UTC |
238d0d9 | Frank Seide | 12 February 2018, 16:42:23 UTC | yet more missing template instantiations | 12 February 2018, 16:42:23 UTC |
5aa28a9 | Frank Seide | 11 February 2018, 07:34:03 UTC | made MSVC happy; undid the fake implementation of AssignValuesOf<int>(), instead resolved all needed templates | 11 February 2018, 07:34:03 UTC |
e7a3733 | Frank Seide | 11 February 2018, 06:14:45 UTC | implemented an int version of Unpack()'s underlying functionality, avoiding float overflows for large minibatches. Not working yet, need to move to Windows to debug. Involved changes to template definitions throughout the matrix stack, but no real code change | 11 February 2018, 06:14:45 UTC |
b5a8adc | Frank Seide | 10 February 2018, 03:31:03 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/cntk into fseide/dynamite | 10 February 2018, 03:31:03 UTC |
b7663f9 | Frank Seide | 10 February 2018, 03:30:20 UTC | implemented learning-rate decay; exposed learning-rate warm-up as a parameter | 10 February 2018, 03:30:20 UTC |
8a6d806 | Frank Seide | 10 February 2018, 03:10:12 UTC | minor fix in tag and logging | 10 February 2018, 03:10:12 UTC |
9307403 | Frank Seide | 10 February 2018, 02:54:45 UTC | changed log output to show position relative to epoch; position tag changed to percentage | 10 February 2018, 02:54:45 UTC |
b222d64 | Frank Seide | 10 February 2018, 02:51:23 UTC | GetSubBatches() now breaks loads into small pieces to avoid the float32 overflow for very large batches; removed L=0 from partial log | 10 February 2018, 02:51:23 UTC |
3296d0b | Frank Seide | 10 February 2018, 02:15:10 UTC | bug fix: overflow check for very large minibatches did not typecast correctly. | 10 February 2018, 02:15:10 UTC |
8326dfa | Frank Seide | 10 February 2018, 01:49:02 UTC | bug fix: relPosition should be relative to epochSize | 10 February 2018, 01:49:02 UTC |
ffc34c2 | Frank Seide | 10 February 2018, 01:20:45 UTC | new option --fromLatest, to allow automatic restart from checkpoint in Philly retries | 10 February 2018, 01:20:45 UTC |
489aa6f | Frank Seide | 10 February 2018, 01:07:45 UTC | checkpointing now saves a latest tag | 10 February 2018, 01:07:45 UTC |
6537f78 | Frank Seide | 10 February 2018, 00:50:52 UTC | changed checkpointing to fractions of epochs | 10 February 2018, 00:50:52 UTC |
dc6bfaa | Frank Seide | 09 February 2018, 23:42:06 UTC | added code to reader API to report current sample position; towards an epochSize-based scheduling, checkpointing, etc. (removed a debug-leftover sleep()) | 09 February 2018, 23:42:06 UTC |
6ed5349 | Frank Seide | 09 February 2018, 22:07:59 UTC | added a comment | 09 February 2018, 22:07:59 UTC |
c78abf7 | Frank Seide | 09 February 2018, 21:48:35 UTC | Merge branch 'fseide/dynamite' of https://github.com/Microsoft/cntk into fseide/dynamite | 09 February 2018, 21:48:35 UTC |
d681d42 | Frank Seide | 09 February 2018, 21:46:41 UTC | added an overflow check to Unpack() for the cast of time index into a float32 (which can overflow for large minibatches) | 09 February 2018, 21:46:41 UTC |
70c5eb6 | Frank Seide | 09 February 2018, 21:44:01 UTC | changed pathnames to have and interpolate with environment variables, for running on Philly | 09 February 2018, 21:44:01 UTC |
442c033 | Frank Seide | 09 February 2018, 21:42:44 UTC | (comment) | 09 February 2018, 21:42:44 UTC |
c0383c4 | Frank Seide | 09 February 2018, 21:42:28 UTC | uncommented some checks | 09 February 2018, 21:42:28 UTC |
bfec707 | Frank Seide | 09 February 2018, 21:40:23 UTC | cleaned up checking code in AutoBatch | 09 February 2018, 21:40:23 UTC |
c609923 | Frank Seide | 08 February 2018, 17:58:25 UTC | bug fix: Makefile lib order should pick up explicit boost lib before generic system path | 08 February 2018, 17:58:25 UTC |
adfc1ea | Frank Seide | 07 February 2018, 05:32:15 UTC | increased bucketingFactor and loss-reporting smoothing time constant; turned dumping model parameters into a new command dump_model (and for that, split out the Marian calls into separate functions) | 07 February 2018, 05:32:15 UTC |
f03a669 | Frank Seide | 03 February 2018, 07:20:34 UTC | (added a little Python script to convert Dynamite parameter dumps to a Marian model readable by marian-decoder) | 03 February 2018, 07:20:34 UTC |