High-performance zero-copy tensor serialization for Inference

Khushiyant · February 10, 2026, 8:52pm

We’re too comfortable with serialization that treats high-end silicon like a text parser. Tenso eliminates the invisible tax where formats like SafeTensors and Pickle burn 40% of your CPU just to move data.

The update introduces a Direct Pinned Memory Reader. It allocates page-locked memory to trigger async DMA transfers directly to VRAM for PyTorch and JAX, bypassing the copy overhead and keeping CPU usage at a minimal 0.8%.
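For readers who haven't used pinned memory before, here is a minimal sketch of the general pattern in PyTorch. It is not Tenso's reader: the function name `load_to_device` and its arguments are made up for illustration, and the payload is assumed to be raw, native-endian tensor bytes. It only uses standard PyTorch calls (`torch.frombuffer`, `pin_memory=True`, `non_blocking=True`) and falls back to the CPU when no GPU is present.

```python
import torch


def load_to_device(payload: bytes, shape, dtype=torch.float32):
    """Stage raw tensor bytes in page-locked memory, then copy to the GPU.

    Hypothetical helper for illustration only; not part of Tenso's API.
    """
    # View the bytes as a tensor. bytearray gives a writable buffer,
    # which torch.frombuffer expects (it warns on read-only buffers).
    src = torch.frombuffer(bytearray(payload), dtype=dtype).reshape(shape)

    if not torch.cuda.is_available():
        # Pinned memory only makes sense with a CUDA device; stay on the CPU.
        return src

    # Page-locked (pinned) staging buffer: the driver can DMA out of it
    # without first copying through a pageable bounce buffer.
    staging = torch.empty(shape, dtype=dtype, pin_memory=True)
    staging.copy_(src)

    # non_blocking=True queues the host-to-device copy on the current
    # CUDA stream and returns without waiting for it to finish.
    return staging.to("cuda", non_blocking=True)
```

Kernels launched later on the same stream are ordered after the copy, so the result is safe to use directly. If you reuse a staging buffer across calls, synchronize first (for example with `torch.cuda.current_stream().synchronize()`), or the next write can race the in-flight transfer.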

I’ve also hardened the protocol with strict validation guards and optional XXH3 checksums. Bluntly, enabling checksums kills the zero-copy speed, because verifying a checksum means the CPU has to read every byte of the payload, but safety is now a configurable trade-off. With native async support for FastAPI and gRPC, Tenso is finally a transport layer that respects the hardware.
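To make the trade-off concrete, here is a rough sketch of what validation guards plus an opt-in checksum can look like. Everything in it is an assumption for illustration: the `TNSR` magic, the header layout, and the `pack`/`unpack` names are hypothetical and are not Tenso's wire format, and `zlib.crc32` stands in for XXH3 so the example runs on the standard library alone.

```python
import struct
import zlib

# Hypothetical frame layout, NOT Tenso's real format:
# magic(4s) flags(u8) ndim(u8) itemsize(u16) payload_len(u64) checksum(u32)
MAGIC = b"TNSR"
HEADER = struct.Struct("<4sBBHQI")
FLAG_CHECKSUM = 0x01
MAX_NDIM = 8


def pack(payload: bytes, shape, itemsize: int, checksum: bool = False) -> bytes:
    """Build a frame: header, then one u64 per dimension, then the payload."""
    flags = FLAG_CHECKSUM if checksum else 0
    digest = zlib.crc32(payload) if checksum else 0  # stand-in for XXH3
    header = HEADER.pack(MAGIC, flags, len(shape), itemsize, len(payload), digest)
    dims = struct.pack(f"<{len(shape)}Q", *shape)
    return header + dims + payload


def unpack(frame):
    """Validate a frame and return (shape, payload) without copying the payload."""
    view = memoryview(frame)
    if len(view) < HEADER.size:
        raise ValueError("truncated header")
    magic, flags, ndim, itemsize, payload_len, digest = HEADER.unpack_from(view, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if ndim > MAX_NDIM:
        raise ValueError("too many dimensions")

    dims_end = HEADER.size + 8 * ndim
    if len(view) < dims_end:
        raise ValueError("truncated shape")
    shape = struct.unpack_from(f"<{ndim}Q", view, HEADER.size)

    # Guard: the declared shape must account for exactly the bytes present.
    expected = itemsize
    for dim in shape:
        expected *= dim
    if expected != payload_len or len(view) - dims_end != payload_len:
        raise ValueError("payload size does not match shape")

    payload = view[dims_end:]  # a slice of the memoryview: no copy
    if flags & FLAG_CHECKSUM and zlib.crc32(payload) != digest:
        # This is the expensive part: a full pass over the payload.
        raise ValueError("checksum mismatch")
    return shape, payload
```

Without the flag, `unpack` only touches the header and hands back a view. With it, the whole payload gets scanned once before anything can be shipped to the device, which is where the speed goes.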
