Revision 163c202a425ced43aa09789bccf387144cd961be authored by Ronghang Hu on 02 March 2021, 17:52:45 UTC, committed by Facebook GitHub Bot on 02 March 2021, 17:55:11 UTC
Summary:
This PR is built upon https://github.com/facebookresearch/mmf/pull/736

- Change the current evaluation loop (which runs on the union of all datasets) to per-dataset evaluation.
- Generate a prediction JSON object during the evaluation loop, which can be consumed by metrics that evaluate the entire dataset (e.g. mAP for object detection or CIDEr for image captioning) and cannot be expressed as an average over per-batch metrics (see the sketch after this list).
- Run metrics on a subset of datasets via an optional `datasets` config.
- Allow specifying per-dataset sampling ratios for multi-task training.
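
For metrics like mAP or CIDEr, averaging per-batch values is not equivalent to computing the metric over the full prediction set. Here is a minimal illustration using precision as a stand-in (the real detection/captioning metrics are more involved, and this snippet is not MMF code):

```
# Each tuple is (true positives, predicted positives) for one batch.
batches = [
    (1, 2),   # batch precision: 0.50
    (9, 10),  # batch precision: 0.90
]

# Averaging per-batch precisions gives 0.70 ...
avg_of_batches = sum(tp / pp for tp, pp in batches) / len(batches)

# ... but precision over the accumulated predictions is 10/12 ~ 0.83.
total_tp = sum(tp for tp, _ in batches)
total_pp = sum(pp for _, pp in batches)
dataset_level = total_tp / total_pp

print(avg_of_batches, dataset_level)
```
This is why the evaluation loop accumulates a prediction JSON object over the whole dataset and hands it to such metrics at the end.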

Pull Request resolved: https://github.com/facebookresearch/mmf/pull/739

Test Plan:
Tested locally and verified the outputs

---
Example of specifying per-dataset metrics with `datasets` (here `vqa_accuracy` will run only on vqa2, while `detection_mean_ap` will run only on detection_coco and detection_visual_genome). If `datasets` is not specified, the default behavior is to run the metric on all datasets:
```
evaluation:
  metrics:
  - type: vqa_accuracy
    datasets:
    - vqa2
  - type: detection_mean_ap
    datasets:
    - detection_coco
    - detection_visual_genome
```
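
A rough sketch of the intended selection semantics (this helper is hypothetical, not MMF's actual implementation):

```
def should_run_metric(metric_config, dataset_name):
    """Decide whether a metric applies to the current dataset."""
    # `datasets` is optional; when absent, the metric runs on every dataset.
    allowed = metric_config.get("datasets")
    return allowed is None or dataset_name in allowed

# e.g. returns True for "vqa2" and False for "detection_coco"
should_run_metric({"type": "vqa_accuracy", "datasets": ["vqa2"]}, "vqa2")
```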

---
Example of specifying per-dataset sampling ratios during training through `multitasking.enabled` and `multitasking.sampling_ratios`:
```
multitasking:
  enabled: true
  sampling_ratios:
    detection_coco: 0.2
    detection_visual_genome: 0.07
    visual_entailment: 0.12
    vqa2: 0.26
    glue_qnli: 0.1
    glue_mnli_mismatched: 0.1
    glue_qqp: 0.1
    glue_sst2: 0.05
```
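
A minimal sketch of how such ratios can drive multi-task batch sampling (illustrative only, not MMF's actual scheduler); note that the ratios above sum to 1.0:

```
import random

# Sampling ratios copied from the config above.
ratios = {
    "detection_coco": 0.2,
    "detection_visual_genome": 0.07,
    "visual_entailment": 0.12,
    "vqa2": 0.26,
    "glue_qnli": 0.1,
    "glue_mnli_mismatched": 0.1,
    "glue_qqp": 0.1,
    "glue_sst2": 0.05,
}

# Pick which dataset the next training batch comes from,
# proportionally to its configured ratio.
names, weights = zip(*ratios.items())
next_dataset = random.choices(names, weights=weights, k=1)[0]
```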

Reviewed By: apsdehal

Differential Revision: D26709926

Pulled By: ronghanghu

fbshipit-source-id: c6c0ebcdda5750890ffd4f93bcc909d9a39e257e
1 parent 5246609