53 275

Shivam Kumar

shivamkumar

AI & ML interests

None yet

Recent Activity

liked a model about 14 hours ago

nvidia/personaplex-7b-v1

upvoted a paper about 13 hours ago

X-Talk: On the Underestimated Potential of Modular Speech-to-Speech Dialogue System

Paper • 2512.18706 • Published Dec 21, 2025 • 1

upvoted a paper about 14 hours ago

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published 3 days ago • 26

upvoted an article about 14 hours ago

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Article • 5 days ago • 19

upvoted a collection 1 day ago

Qwen3-TTS

7 items • Updated 2 days ago • 191

upvoted a collection 18 days ago

sam-audio

11 items • Updated Dec 16, 2025 • 118

upvoted a paper 18 days ago

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Paper • 2512.23343 • Published 26 days ago • 28

upvoted a collection 18 days ago

Nemotron Speech

Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 17 items • Updated 4 days ago • 29

upvoted 2 papers 2 months ago

Yan: Foundational Interactive Video Generation

Paper • 2508.08601 • Published Aug 12, 2025 • 1

MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

Paper • 2508.19320 • Published Aug 26, 2025 • 29

upvoted 3 collections 2 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated Sep 13, 2025 • 57

Sana

⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 22 items • Updated 5 days ago • 98

SANA-1.5

SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items • Updated Sep 13, 2025 • 10

upvoted a paper 2 months ago

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 185

upvoted 4 collections 2 months ago

LongAI

Boost AI's long ability while keeping it efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive. • 8 items • Updated Nov 6, 2025 • 2

NVILA (HuggingFace)

HuggingFace Transformers can load us. • 5 items • Updated Sep 13, 2025 • 5

Fast-dLLM

Efficient Diffusion LLM • 4 items • Updated Oct 8, 2025 • 8

SANA-Video

🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer • 8 items • Updated Dec 9, 2025 • 7

upvoted 2 papers 2 months ago

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Paper • 2511.01266 • Published Nov 3, 2025 • 30

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Paper • 2511.03334 • Published Nov 5, 2025 • 53

upvoted a collection 2 months ago

ChronoEdit

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation • 8 items • Updated 4 days ago • 13