MobileVLM
MobileVLM is an open-source vision language model designed for mobile devices. It offers a strong and fast baseline for vision language tasks, enabling on-device AI processing.
About
MobileVLM is a multimodal vision language model (MMVLM) engineered to run efficiently on mobile devices. It combines a novel architectural design, a training scheme tailored for mobile VLMs, and high-quality dataset curation to improve performance. The model family comprises language models at 1.4B and 2.7B parameters, trained from scratch, and a multimodal vision model pre-trained in the CLIP fashion. MobileVLM V2, an enhanced version, demonstrates performance comparable to or exceeding much larger VLMs at the 3B and 7B+ scales, while maintaining state-of-the-art inference speeds on mobile hardware such as the Qualcomm Snapdragon 888 CPU and the NVIDIA Jetson Orin GPU. It is an open-source project that provides training and inference code, along with publicly available weights on Hugging Face.
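To give a feel for why parameter count matters on a phone, here is a rough back-of-the-envelope sketch of the memory needed just to hold the language model weights at the two sizes mentioned above. The precision table and the function name are illustrative assumptions, not part of the MobileVLM codebase, and the estimate ignores the vision encoder, activations, and the KV cache.

```python
# Rough weight-memory estimate for the 1.4B and 2.7B language models.
# Assumption: memory = parameter count x bytes per parameter. This ignores the
# vision encoder, activations, KV cache, and any quantization overhead.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts taken from the description above.
LANGUAGE_MODEL_PARAMS = {"1.4B": 1.4e9, "2.7B": 2.7e9}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate weight size in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for name, params in LANGUAGE_MODEL_PARAMS.items():
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gib(params, precision)
            print(f"{name} @ {precision}: {size:.2f} GiB")
```

Under these assumptions, lower-precision weights shrink the footprint proportionally, which is one reason quantized weights are commonly used for on-device inference.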
Pricing & Plans
Open Source
Free