Introduction

English | 简体中文

Inspur Multimodal Intelligence X (iMIX) is an open-source multi-modal model building toolbox. The framework follows an out-of-the-box design philosophy: it is compatible with a rich set of multi-modal tasks, models, and datasets, and it is scalable, easy to use, and high-performance.

The master branch works with PyTorch.

Documentation: https://imix-docs.readthedocs.io/en/latest/


License

This project is released under the Apache 2.0 license.

Changelog

v0.1.0 is our first release. iMIX v0.1.0 supports mainstream multi-modal datasets and models, mixed-precision training, and distributed training across multiple GPUs and multiple nodes.

Subsequent versions will optimize the framework further. We will add more dual-stream and single-stream pre-training models, add more data processing methods such as masking, back translation, and unsupervised data augmentation, and support launching multiple training jobs on a single machine simultaneously.
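As a concrete illustration of the mixed-precision training that v0.1.0 supports, here is a minimal sketch of the generic PyTorch autocast/GradScaler mechanism that frameworks like iMIX typically build on. The model, data, and hyperparameters are illustrative placeholders, not the iMIX API.

```python
import torch
import torch.nn as nn

# Illustrative model and data; mixed precision is only active on GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
model = nn.Linear(16, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# GradScaler rescales the loss to avoid fp16 gradient underflow;
# with enabled=False it is a transparent no-op (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 16, device=device)
target = torch.randint(0, 2, (8,), device=device)

optimizer.zero_grad()
# Ops inside autocast run in half precision where it is safe to do so.
with torch.autocast(device_type=device, enabled=use_amp):
    loss = nn.functional.cross_entropy(model(x), target)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales gradients, then steps
scaler.update()                # adjusts the scale factor for next step
print(f"loss: {loss.item():.4f}")
```

The same pattern extends to multi-GPU runs by wrapping the model in `torch.nn.parallel.DistributedDataParallel` and launching one process per GPU.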

Benchmark and model zoo

Results and models are available in the model zoo.

All supported models and tasks are listed below.

Supported backbones:

LXMERT, UNITER, ViLBERT, DeVLBert, Oscar, VinVL, MCAN, LCGN, HGL, R2C, VisDial-BERT

Supported tasks:

VQA, GQA, NLVR, VQA_large, NLVR_large, GuessWhatPointing, VisualEntailment, GuessWhat, VCR_QAR, VCR_QA, Visual7w, RetrievalFlickr30k, GenomeQA, RetrievalCOCO, refcocog, refcoco, refcoco+, VisDial

Installation

Please refer to get_started.md for installation.

Getting Started

Please see quickrun for the basic usage of iMIX and the visual interface for inference. We provide a basic introduction to the iMIX core module engine, full guidance for configuration, and all results and models. There are also tutorials on finetuning models, adding new datasets, customizing models, customizing runtime settings, and useful tools.

Contributing

We appreciate all contributions to improve iMIX. Please refer to CONTRIBUTING.md for the contributing guidelines.

Acknowledgement

iMIX is an open-source project contributed to by researchers and engineers from Inspur. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark will serve the growing research community by providing a flexible toolkit for reimplementing existing methods and developing new models.

Citation

If you use this toolbox or benchmark in your research, please cite this project.

@misc{fan2021iMIX,
  author =       {Baoyu Fan and Liang Jin and Runze Zhang and Xiaochuan Li and Cong Xu and Hongzhi Shi and Jian Zhao and Yinyin Chao and Yingjie Zhang and Binqiang Wang and Zhenhua Guo and Yaqian Zhao and Rengang Li},
  title =        {iMIX: A multimodal framework for vision and language research},
  howpublished = {\url{https://github.com/inspur-hsslab/iMIX}},
  year =         {2021}
}