https://github.com/Microsoft/CNTK
Raw File
Tip revision: 9894f36d8f10390823f8fb5043bdc1780abf12c1 authored by Rui Zhao (SPEECH) on 31 July 2017, 21:41:00 UTC
fix tab issue
Tip revision: 9894f36
README.md
# CNTK example: Text 

## License

CNTK distribution contains a subset of the data of The Penn Treebank Project (https://www.cis.upenn.edu/~treebank/):

Marcus, Mitchell, Beatrice Santorini, and Mary Ann Marcinkiewicz. Treebank-2 LDC95T7. Web Download. Philadelphia: Linguistic Data Consortium, 1995.

See License.md in the root level folder of the CNTK repository for full license information.

## Overview

|Data      |The Penn Treebank Project (https://www.cis.upenn.edu/~treebank/) annotates naturally-occurring text for linguistic structure .
|:---------|:---|
|Purpose   |Showcase how to train a recurrent network for text data.
|Network   |SimpleNetworkBuilder for recurrent network with two hidden layers.
|Training  |Stochastic gradient descent with adjusted learning rate.
|Comments  |The provided configuration file performs class based RNN training.

## Running the example

### Getting the data

The data for this example is already contained in the folder PennTreebank/Data/.

### Setup

Compile the sources to generate the cntk executable (not required if you downloaded the binaries).

__Windows:__ Add the folder of the cntk executable to your path 
(e.g. `set PATH=%PATH%;c:/src/cntk/x64/Debug/;`) 
or prefix the call to the cntk executable with the corresponding folder. 

__Linux:__ Add the folder of the cntk executable to your path 
(e.g. `export PATH=$PATH:$HOME/src/cntk/build/debug/bin/`) 
or prefix the call to the cntk executable with the corresponding folder. 

### Run

Run the example from the Text/Data folder using:

`cntk configFile=../Config/rnn.cntk`

or run from any folder and specify the Data folder as the `currentDirectory`, 
e.g. running from the Text folder using:

`cntk configFile=Config/rnn.cntk currentDirectory=Data`

The output folder will be created inside Text/.

## Details

### Config files

The config files define a `RootDir` variable and several other variables for directories. 
The `ConfigDir` and `ModelDir` variables define the folders for additional config files and for model files. 
These variables will be overwritten when running on the Philly cluster. 
__It is therefore recommended to generally use `ConfigDir` and `ModelDir` in all config files.__ 
To run on CPU set `deviceId = -1`, to run on GPU set deviceId to "auto" or a specific value >= 0.

The configuration contains three commands. 
The first writes the word and class information as three separate files into the data directory.
The training command uses the SimpleNetworkBuilder to build a recurrent network 
using `rnnType = CLASSLSTM` and the LMSequenceReader.
The test command evaluates the trained network against the specified `testFile`.

The trained models for each epoch are stored in the output models folder. 

### Additional files

The 'AdditionalFiles' folder contains perplexity and expected results files for comparison.
back to top