Prismer
Prismer is an open-source implementation of a vision-language model with multi-task experts, providing code for the Prismer and PrismerZ models. It supports vision-language tasks such as image captioning and VQA.
About
Prismer is an open-source project that implements the paper "Prismer: A Vision-Language Model with Multi-Task Experts", covering both the Prismer and PrismerZ models. It is built on PyTorch 1.13 and uses the Hugging Face Accelerate toolkit for multi-node, multi-GPU training. The repository includes code for pre-training, fine-tuning, and evaluating models on image captioning (COCO, NoCaps) and visual question answering (VQAv2). Users can generate modality expert labels, download pre-trained checkpoints, and run a minimal image captioning example. The project emphasizes multi-task learning and offers both base and large model variants, reporting competitive results on these tasks.
Pricing & Plans
Open Source
Free