Revision e1467a79dc6580ae009d827b5e6f274faff3b339 authored by liqunfu on 27 March 2020, 21:42:04 UTC, committed by GitHub on 27 March 2020, 21:42:04 UTC
# CNTK example: Text
## License
The CNTK distribution contains a subset of the data of The Penn Treebank Project (https://www.cis.upenn.edu/~treebank/):
Marcus, Mitchell, Beatrice Santorini, and Mary Ann Marcinkiewicz. Treebank-2 LDC95T7. Web Download. Philadelphia: Linguistic Data Consortium, 1995.
See License.md in the root level folder of the CNTK repository for full license information.
## Overview
|Data |The Penn Treebank Project (https://www.cis.upenn.edu/~treebank/) annotates naturally occurring text for linguistic structure.
|:---------|:---|
|Purpose |Showcase how to train a recurrent network for text data.
|Network |SimpleNetworkBuilder for recurrent network with two hidden layers.
|Training |Stochastic gradient descent with adjusted learning rate.
|Comments |The provided configuration file performs class-based RNN training.
## Running the example
### Getting the data
The data for this example is already contained in the folder PennTreebank/Data/.
### Setup
Compile the sources to generate the cntk executable (not required if you downloaded the binaries).
__Windows:__ Add the folder of the cntk executable to your path
(e.g. `set PATH=%PATH%;c:/src/cntk/x64/Debug/;`)
or prefix the call to the cntk executable with the corresponding folder.
__Linux:__ Add the folder of the cntk executable to your path
(e.g. `export PATH=$PATH:$HOME/src/cntk/build/debug/bin/`)
or prefix the call to the cntk executable with the corresponding folder.
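
To confirm that the path change took effect on Linux, a small check like the following can help. This is a minimal sketch: `check_cntk` is a hypothetical helper name introduced here for illustration and is not part of CNTK.

```shell
# Hypothetical helper (not part of CNTK): report whether a cntk
# executable is reachable through the current PATH.
check_cntk() {
    if command -v cntk >/dev/null 2>&1; then
        echo "cntk found at: $(command -v cntk)"
    else
        echo "cntk not found on PATH" >&2
        return 1
    fi
}
```

Calling `check_cntk` after the `export PATH=...` line should print the location of the executable; a non-zero exit status suggests the folder was not added correctly.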
### Run
Run the example from the Text/Data folder using:
`cntk configFile=../Config/rnn.cntk`
or run from any folder and specify the Data folder as the `currentDirectory`,
e.g. running from the Text folder using:
`cntk configFile=Config/rnn.cntk currentDirectory=Data`
The output folder will be created inside Text/.
## Details
### Config files
The config files define a `RootDir` variable and several other variables for directories.
The `ConfigDir` and `ModelDir` variables define the folders for additional config files and for model files.
These variables will be overwritten when running on the Philly cluster.
__It is therefore recommended to use `ConfigDir` and `ModelDir` consistently in all config files.__
To run on the CPU, set `deviceId = -1`; to run on a GPU, set `deviceId` to `"auto"` or to a specific value >= 0.
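
As an alternative to editing the config file, the device can usually be chosen on the command line, because CNTK accepts `name=value` overrides after `configFile=`. The sketch below assumes that this override applies to `deviceId` in `rnn.cntk` and that it is run from the Text folder, as in the Run section. `run_rnn` and `CNTK_BIN` are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper (not part of CNTK): run the example on a chosen device.
# Argument: -1 for CPU, "auto" (the default here), or a GPU index >= 0.
# CNTK_BIN is an assumed variable for pointing at a specific cntk executable.
run_rnn() {
    device="${1:-auto}"
    "${CNTK_BIN:-cntk}" configFile=Config/rnn.cntk currentDirectory=Data "deviceId=${device}"
}

# Example calls (commented out so that sourcing this file has no side effects):
# run_rnn -1      # CPU
# run_rnn 0       # first GPU
```

Keeping the device as a command-line argument leaves the config file unchanged, which may be convenient when the same configuration is run on machines with and without a GPU.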
The configuration contains three commands.
The first writes the word and class information as three separate files into the data directory.
The training command uses the SimpleNetworkBuilder to build a recurrent network
using `rnnType = CLASSLSTM` and the LMSequenceReader.
The test command evaluates the trained network against the specified `testFile`.
The trained models for each epoch are stored in the output models folder.
### Additional files
The `AdditionalFiles` folder contains perplexity and expected results files for comparison.