VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published 1 day ago • 3
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published 1 day ago • 3
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 27
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in Paper • 2512.14273 • Published Dec 16, 2025 • 8
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in Paper • 2512.14273 • Published Dec 16, 2025 • 8
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published Oct 22, 2025 • 31
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18, 2025 • 40
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14, 2025 • 33
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14, 2025 • 33
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 15
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 15