The official evaluation scripts live in this directory. We have provided sample output from the baseline model on
the development data in sample_output/. You can run the evaluation scripts on this output as shown in the example below.

Task 2 Evaluation:

In task 2, two accuracy figures are reported. The "original form accuracy" denotes accuracy w.r.t. the original word
form in the corpora that were used to create the development and test sets. The "plausible form accuracy", in turn,
denotes accuracy w.r.t. a set of contextually plausible word forms which have been manually added to the final
gold standard test sets. For the development data, both figures are identical because no additional plausible word
forms have been supplied.
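
For illustration, here is a minimal Python sketch of how the three figures above could be computed. It is not the
official eval_task2.py: it assumes the guesses, the original gold forms, and the per-example sets of plausible forms
have already been parsed into parallel Python lists (file parsing is omitted, since it depends on the data format).

def levenshtein(a, b):
    # Standard dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def evaluate(guesses, originals, plausible_sets):
    # guesses: predicted forms; originals: gold forms;
    # plausible_sets: one set of acceptable forms per example.
    n = len(guesses)
    original_acc = 100.0 * sum(g == o for g, o in zip(guesses, originals)) / n
    plausible_acc = 100.0 * sum(g in p for g, p in zip(guesses, plausible_sets)) / n
    avg_lev = sum(levenshtein(g, o) for g, o in zip(guesses, originals)) / n
    return original_acc, plausible_acc, avg_lev

On the development data, each plausible set would contain only the original form, which is why the two accuracy
figures coincide there.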
 
python eval_task2.py --guess sample_output/de-1-medium-out --gold ../devsets/de-uncovered 

original form accuracy:        51.05
plausible form accuracy:       51.05
levenshtein:   1.07