My thing
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Paper
•
2511.18373
•
Published
•
6
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper
•
2511.13288
•
Published
•
18
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Paper
•
2511.19418
•
Published
•
29
SAM 3: Segment Anything with Concepts
Paper
•
2511.16719
•
Published
•
129
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
Paper
•
2510.07319
•
Published
•
3
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
•
2511.16334
•
Published
•
93
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Paper
•
2511.13593
•
Published
•
26
RynnVLA-002: A Unified Vision-Language-Action and World Model
Paper
•
2511.17502
•
Published
•
26
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
Paper
•
2511.11007
•
Published
•
15
Depth Anything 3: Recovering the Visual Space from Any Views
Paper
•
2511.10647
•
Published
•
99
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
•
2410.05779
•
Published
•
28
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
•
2510.14528
•
Published
•
113
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper
•
2412.20138
•
Published
•
15
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Paper
•
2410.17799
•
Published
•
9
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
Paper
•
2511.13648
•
Published
•
52
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
•
2509.22186
•
Published
•
143
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Paper
•
2510.15869
•
Published
•
49
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Paper
•
2511.15705
•
Published
•
97
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Paper
•
2510.22543
•
Published
•
14
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
Paper
•
2511.19900
•
Published
•
48
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
Paper
•
2512.10867
•
Published
•
16
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
Paper
•
2503.04721
•
Published
•
2