https://github.com/facebookresearch/pythia
Revision 96ecb1128bf1786a8bba9038a7bdbbf407a1648c authored by Amanpreet Singh on 29 April 2021, 08:16:00 UTC, committed by Amanpreet Singh on 30 April 2021, 18:20:57 UTC

[feat] Adds audio (resnet18) and video (r2plus1d18) encoders (#879)

Summary:
This PR adds support for audio and video modality encoders to MMF. These
can be used in conjunction with MMFTransformer. An example config has
been added to showcase the usage.
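
The commit title names the backbones involved: a ResNet-18 for audio and an R(2+1)D-18 for video. As a rough, self-contained sketch of what such backbones consume and produce, here is a standalone example built directly on the torchvision implementations; whether MMF wraps these exact classes, and the single-channel spectrogram input for the audio branch, are assumptions, and MMF's own encoder classes and config keys are not shown.

```python
# A minimal sketch of the two backbones named in the commit title, using the
# torchvision implementations directly (assumption: MMF's encoders wrap
# equivalent backbones; MMF's own classes and config schema are not shown).
import torch
from torchvision.models import resnet18
from torchvision.models.video import r2plus1d_18

# Video branch: r2plus1d_18 expects clips shaped (batch, channels, frames, height, width).
video_backbone = r2plus1d_18()
video_backbone.fc = torch.nn.Identity()        # drop the classifier head to get pooled features
clips = torch.randn(2, 3, 16, 112, 112)        # two 16-frame RGB clips
video_features = video_backbone(clips)         # -> shape (2, 512)

# Audio branch: a ResNet-18 applied to spectrogram-like inputs (assumption);
# the first convolution is swapped to accept a single-channel spectrogram.
audio_backbone = resnet18()
audio_backbone.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
audio_backbone.fc = torch.nn.Identity()
spectrograms = torch.randn(2, 1, 128, 128)     # two single-channel spectrograms
audio_features = audio_backbone(spectrograms)  # -> shape (2, 512)
```

Inside MMF, encoders like these are selected and combined with MMFTransformer through the model config, as the example config mentioned above demonstrates; the sketch only illustrates the tensor shapes each modality branch deals with.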

Pull Request resolved: https://github.com/facebookresearch/mmf/pull/879

Test Plan: Unit tests have been added.

Reviewed By: ytsheng

Differential Revision: D27804875

Pulled By: apsdehal

fbshipit-source-id: 9f276dab2dc711fb8e5868a029f73c16083c1782
README.md

<div align="center">
<img src="https://mmf.sh/img/logo.svg" width="50%"/>
</div>

#

<div align="center">
  <a href="https://mmf.sh/docs">
  <img alt="Documentation Status" src="https://readthedocs.org/projects/mmf/badge/?version=latest"/>
  </a>
  <a href="https://circleci.com/gh/facebookresearch/mmf">
  <img alt="CircleCI" src="https://circleci.com/gh/facebookresearch/mmf.svg?style=svg"/>
  </a>
</div>

---

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at Facebook AI Research. See the full list of projects inside or built on MMF [here](https://mmf.sh/docs/notes/projects).

MMF is powered by PyTorch, allows distributed training, and is un-opinionated, scalable and fast. Use MMF to **_bootstrap_** your next vision and language multimodal research project by following the [installation instructions](https://mmf.sh/docs/getting_started/installation). Take a look at the list of MMF features [here](https://mmf.sh/docs/getting_started/features).

MMF also acts as a **starter codebase** for challenges around vision and
language datasets (the Hateful Memes, TextVQA, TextCaps and VQA challenges). MMF was formerly known as Pythia. For an overview of how datasets and models work inside MMF, check out MMF's [video overview](https://mmf.sh/docs/getting_started/video_overview).
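
As a taste of what the starter codebase looks like in practice, the sketch below follows the Hateful Memes inference example from MMF's documentation; the `MMBT.from_pretrained` zoo key, the `classify` signature, and the output keys are assumptions based on the docs around this release and may differ in other versions.

```python
# A small sketch following the Hateful Memes inference example from MMF's docs
# (assumptions: the zoo key "mmbt.hateful_memes.images", the classify()
# signature, and the output keys may differ across MMF versions).
from mmf.models.mmbt import MMBT

# Downloads a pretrained MMBT checkpoint from the MMF model zoo on first use.
model = MMBT.from_pretrained("mmbt.hateful_memes.images")

# classify() takes an image (local path or URL) and the meme text, and returns
# a dict with the predicted label and its confidence.
output = model.classify("path/to/meme.png", "look how happy everyone is")
print(output["label"], output["confidence"])
```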


## Installation

Follow the installation instructions in the [documentation](https://mmf.sh/docs/getting_started/installation).

## Documentation

Learn more about MMF [here](https://mmf.sh/docs).

## Citation

If you use MMF in your work or use any models published in MMF, please cite:

```bibtex
@misc{singh2020mmf,
  author =       {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and
                 Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
  title =        {MMF: A multimodal framework for vision and language research},
  howpublished = {\url{https://github.com/facebookresearch/mmf}},
  year =         {2020}
}
```

## License

MMF is licensed under the BSD license, available in the [LICENSE](LICENSE) file.