Lemon Slice-2
December 2025
We present Lemon Slice-2, a novel video diffusion transformer model and inference framework that enables real-time, interactive avatar experiences. Powered by a 20-billion-parameter, few-step causal model, it achieves a generation throughput of 20 frames per second on a single GPU. Efficient attention and caching strategies enable ultra-fast response times in interactive settings and infinite-length videos with zero error accumulation. Lemon Slice-2 supports full-body avatar generation with expressive, semantically aware gestures, and is now available to the public for general use.
breaking the real-time barrier

Lemon Slice-2 generates video frames faster than they can be watched. To break the real-time barrier, we combined causal attention, a novel training paradigm inspired by distribution matching distillation, efficient caching, CUDA graph acceleration, and quantization.
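As an illustration of the causal-attention idea (a minimal single-head sketch, not Lemon Slice-2's actual implementation), the snippet below masks out future positions so each step attends only to current and past context. This is the property that makes cached, frame-by-frame autoregressive generation possible, since past keys and values never change.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head attention where position i attends only to positions <= i."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (T, T) similarity scores
    future = np.triu(np.ones_like(scores), k=1)  # 1s above the diagonal = future positions
    scores = np.where(future.astype(bool), -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 4 positions, 8-dimensional tokens
rng = np.random.default_rng(0)
T, d = 4, 8
out = causal_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d)))
print(out.shape)  # (4, 8)
```

Because the mask forbids looking ahead, the first output row depends only on the first input token; appending a new token leaves all earlier outputs untouched, which is exactly what a KV cache exploits.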
ultra-fast response times

Users of Lemon Slice-2 experience an average response time of 2.8 seconds, of which video generation accounts for only 26% (about 730 milliseconds).
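The split above can be checked directly: 26% of the 2.8-second average response time is roughly 730 ms, leaving about 2.1 seconds for everything outside video generation (the grouping of that remainder is our rough framing, not an official breakdown).

```python
total_s = 2.8        # average end-to-end response time, seconds
gen_fraction = 0.26  # share of that time spent on video generation

gen_ms = total_s * gen_fraction * 1000
rest_ms = total_s * 1000 - gen_ms
print(round(gen_ms))   # 728 (reported as ~730 ms)
print(round(rest_ms))  # 2072
```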
any character
videos generated in real-time from a single image and audio sample on one GPU






any style
videos generated in real-time from a single image and audio sample on one GPU







expressive gestures & scene awareness
videos generated in real-time from a single image and audio sample on one GPU







infinite video

Because it is autoregressive, Lemon Slice-2 is not limited to generating videos of a fixed length. Critically, unlike other autoregressive models, it does not suffer from error accumulation, enabling infinite-length video generation.
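One common way to make unbounded autoregressive generation tractable is a fixed-size rolling context, so per-step cost and memory stay constant however long the stream runs. The sketch below is a hypothetical illustration of that pattern (the `step_fn` and cache size are stand-ins, not Lemon Slice-2's production pipeline):

```python
from collections import deque

def generate_stream(step_fn, first_frame, cache_size=16):
    """Yield frames forever, conditioning each step on a fixed-size window
    of recent frames rather than the full history. deque's maxlen evicts
    the oldest frame automatically, bounding memory for infinite video."""
    cache = deque([first_frame], maxlen=cache_size)
    while True:
        frame = step_fn(list(cache))  # generate the next frame from cached context
        cache.append(frame)
        yield frame

# Toy usage: a "frame" is just a number one higher than the latest cached one
gen = generate_stream(lambda ctx: ctx[-1] + 1, first_frame=0, cache_size=4)
frames = [next(gen) for _ in range(10)]
print(frames)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

A bounded context alone does not prevent drift; avoiding error accumulation also depends on how the model is trained, as described above.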
real-time interactions
Lemon Slice-2 enables real-time interactions with any character. Below we show screen recordings of the embeddable widget powered by the model, now available for general use.




