Daniel Deutsch
The code to run evaluate and score is duplicated. It appears in `evaluate.py`, `score.py`, and `metric_command.py`. This should be consolidated into two functions (one for evaluate, one for score)...
As a consequence of #35, the BEwT-E unit tests are disabled on GitHub. We should debug why they fail on GitHub and fix them. To repro the failure, remove the...
The original ROUGE script provides a scoring option: `-f A|B` where `A` averages the scores over the models and `B` takes the maximum. Similar functionality should be implemented for...
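
A minimal sketch of the two aggregation modes, assuming each summary has already been scored against every model separately. The function name `aggregate_over_models` and its `mode` parameter are hypothetical and only illustrate the `A` (average) versus `B` (maximum) behavior; they are not part of the existing code.

```python
from typing import Sequence


def aggregate_over_models(scores: Sequence[float], mode: str = "A") -> float:
    """Combine per-model scores for one summary (hypothetical helper).

    mode "A": average over the models.
    mode "B": take the maximum over the models.
    """
    if not scores:
        raise ValueError("at least one per-model score is required")
    if mode == "A":
        return sum(scores) / len(scores)
    if mode == "B":
        return max(scores)
    raise ValueError(f"unknown mode {mode!r}; expected 'A' or 'B'")


# Example: one summary scored against three models.
per_model_scores = [0.2, 0.4, 0.6]
average_score = aggregate_over_models(per_model_scores, mode="A")
best_score = aggregate_over_models(per_model_scores, mode="B")
```

Keeping the aggregation separate from the per-model scoring would let the same per-model scores be reused under either option.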