Visual Language Models on NVIDIA Hardware with VILA

Originally published at: Visual Language Models on NVIDIA Hardware with VILA | NVIDIA Technical Blog

Visual language models have evolved significantly in recent years. However, existing models typically support only a single image. They cannot reason across multiple images, support in-context learning, or understand videos. They are also not optimized for inference speed. We developed VILA, a visual language model with a holistic pretraining, instruction tuning, and deployment pipeline that helps…

VILA1.5 comes in four sizes that can run on a wide range of NVIDIA hardware. Each size also comes with a quantized version. Choose the size that fits your application and let us know how it works!