X-Talk: On the Underestimated Potential of Modular Speech-to-Speech Dialogue System Paper • 2512.18706 • Published Dec 21, 2025 • 1
Article Introducing Waypoint-1: Real-time interactive video diffusion from Overworld +3 • 5 days ago • 19
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents Paper • 2512.23343 • Published 26 days ago • 28
Nemotron Speech Collection Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 17 items • Updated 4 days ago • 29
MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation Paper • 2508.19320 • Published Aug 26, 2025 • 29
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 22 items • Updated 5 days ago • 98
SANA-1.5 Collection SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items • Updated Sep 13, 2025 • 10
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 185
LongAI Collection Boost AI's Long ability, while keeping Efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive. • 8 items • Updated Nov 6, 2025 • 2
NVILA (HuggingFace) Collection HuggingFace Transformers can load us. • 5 items • Updated Sep 13, 2025 • 5
SANA-Video Collection 🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer • 8 items • Updated Dec 9, 2025 • 7
MotionStream: Real-Time Video Generation with Interactive Motion Controls Paper • 2511.01266 • Published Nov 3, 2025 • 30
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5, 2025 • 53
ChronoEdit Collection ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation • 8 items • Updated 4 days ago • 13