Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text • 3.23M downloads • 1.45k likes
Vision-language and speech models for multimodal I/O and perception tasks. A reference set for captioning, OCR-style workflows, automatic speech recognition (ASR), and VLM reasoning.