# Mapping Language to Code in Programmatic Context

### Install requirements

PyTorch 0.3 (minor changes are needed for 0.4), plus the packages below:
```
pip install antlr4-python3-runtime==4.6
pip install allennlp==0.3.0
pip install ipython
```
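To double-check the environment before moving on (a quick sanity check, assuming everything above was installed with pip):
```
pip show torch allennlp antlr4-python3-runtime
```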

### Download data from Google Drive
```
mkdir concode
cd concode
```
Download the data from https://drive.google.com/drive/folders/1kC6fe7JgOmEHhVFaXjzOmKeatTJy1I1W into this folder.
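If you prefer the command line, one option (not part of this repo's requirements, so treat it as a suggestion) is the `gdown` tool, which can fetch a public Drive folder; run it from inside the `concode` folder created above:
```
pip install gdown
gdown --folder https://drive.google.com/drive/folders/1kC6fe7JgOmEHhVFaXjzOmKeatTJy1I1W
```
If gdown nests the files in a subdirectory named after the Drive folder, move them up into `concode/` so the paths in the commands below match.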

### Create production rules

This step restricts the data to 100,000 training examples and 2,000 validation/test examples; if your resources can support it, you can use more. The resulting `.dataset` files are written to the `data` folder and are used by the steps below.
`python build.py -train_file concode/train_shuffled_with_path_and_id_concode.json -valid_file concode/valid_shuffled_with_path_and_id_concode.json -test_file concode/test_shuffled_with_path_and_id_concode.json  -output_folder data  -train_num 100000 -valid_num 2000`

### Prepare pytorch datasets
```
mkdir data/d_100k_762
python preprocess.py -train data/train.dataset -valid data/valid.dataset -save_data data/d_100k_762/concode -train_max 100000 -valid_max 2000
``` 
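Once preprocessing finishes, the serialized datasets should appear under the `-save_data` prefix; a quick listing confirms they were written:
```
ls data/d_100k_762/
```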

### Train seq2seq
`python train.py -dropout 0.5  -data data/d_100k_762/concode -save_model data/d_100k_762/s2s -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2  -batch_size 50 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type regular -copy_attn `

### Train seq2prod
`python train.py -dropout 0.5  -data data/d_100k_762/concode -save_model data/d_100k_762/s2p -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2  -batch_size 20 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type prod -brnn -copy_attn `

### Train Concode
`python train.py -dropout 0.5  -data data/d_100k_762/concode -save_model data/d_100k_762/concode/ -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2  -batch_size 20 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 -decoder_rnn_size 1024 -encoder_type concode -decoder_type concode -brnn -copy_attn -twostep -method_names -var_names`

### Prediction

On Dev: 
```
ipython predict.ipy -- -start 5 -end 30 -beam 3 -models_dir  data/d_100k_762/concode/ -test_file data/valid.dataset -tgt_len 500 
```
On Test (use the best epoch from dev):
```
ipython predict.ipy -- -start 15 -end 15 -beam 3 -models_dir  data/d_100k_762/concode/ -test_file data/test.dataset -tgt_len 500 
```

For other model types, use the appropriate `-models_dir`.
