Introducing Lemon Slice-2
December 2025
Supporting our release of Lemon Slice Agents is Lemon Slice-2, a novel video diffusion transformer model and inference framework that enables real-time, interactive avatar experiences. Lemon Slice-2 is a 20-billion-parameter, few-step causal model that achieves a generation throughput of 20 frames per second on a single GPU. Efficient attention and caching strategies enable ultra-fast response times in interactive settings and infinite-length videos with zero error accumulation. Lemon Slice-2 supports full-body avatar generation with expressive, semantically aware gestures. It is now publicly available for general use.
Breaking the real-time barrier

Lemon Slice-2 generates video frames faster than they can be watched. To break the real-time barrier, we combined causal attention, a novel training paradigm inspired by distribution matching distillation, efficient caching, CUDA graph acceleration, and quantization.
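To give intuition for why causal attention pairs naturally with caching, here is a minimal sketch: because each chunk of frame tokens attends only to the past, the keys and values of earlier tokens never change and can be stored rather than recomputed. Everything below (the `CausalKVCache` class and its methods) is our own illustrative construction, not Lemon Slice-2's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CausalKVCache:
    """Toy key/value cache for causal attention: each new chunk of
    frame tokens attends to the full history of cached keys/values,
    so past tokens are never re-encoded."""

    def __init__(self, dim):
        self.dim = dim
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def attend(self, q, k, v):
        # Append this chunk's K/V to the cache, then attend over
        # everything generated so far (causal: no future tokens exist yet).
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        scores = q @ self.keys.T / np.sqrt(self.dim)
        return softmax(scores, axis=-1) @ self.values
```

Each call costs attention only between the new chunk and the accumulated history; the per-chunk work no longer grows with redundant recomputation, which is one of the ingredients that makes real-time generation feasible.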
Ultra-fast response times

Users of Lemon Slice-2 experience an average response time of 2.8 seconds, of which video generation accounts for only 26% (roughly 730 milliseconds).
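As a quick sanity check, the figures above are consistent with each other, and the stated 20 fps throughput implies a per-frame budget (the numbers come from this post; the variable names are ours):

```python
# Latency figures reported above.
total_response_s = 2.8    # average end-to-end response time
generation_share = 0.26   # fraction of that time spent on video generation

# 26% of 2.8 s is ~728 ms, matching the ~730 ms quoted.
generation_ms = total_response_s * generation_share * 1000

# At 20 frames per second, real-time playback leaves 50 ms per frame.
frame_budget_ms = 1000 / 20
```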
Any character
Videos generated in real time from a single image and audio sample on one GPU

Any style
Videos generated in real time from a single image and audio sample on one GPU

Expressive gestures & scene awareness
Videos generated in real time from a single image and audio sample on one GPU

Infinite video

Because Lemon Slice-2 is autoregressive, it is not limited to generating videos of a fixed length. Critically, unlike other autoregressive models, it does not suffer from error accumulation, which makes infinite-length video generation possible.
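The post does not describe the mechanism, but one way to picture bounded-memory autoregressive generation is a loop that conditions each new frame on a fixed-size window of recent frames, so compute and memory stay constant no matter how long the video runs. Avoiding drift in practice also depends on how the model is trained; everything below, including `model_step`, is a hypothetical sketch rather than Lemon Slice-2's actual pipeline.

```python
from collections import deque

def generate(model_step, first_frame, n_frames, window=16):
    """Hypothetical autoregressive rollout: each frame is conditioned
    on at most `window` previous frames, so state size is bounded
    regardless of total video length."""
    context = deque([first_frame], maxlen=window)  # oldest frames fall off
    for _ in range(n_frames):
        frame = model_step(list(context))  # predict next frame from context
        context.append(frame)
        yield frame
```

Because `generate` is a generator over a fixed-size deque, it can in principle run indefinitely, which mirrors the infinite-length property described above.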
Dynamic text control
Lemon Slice-2 enables real-time manipulation of video content via text prompting.
Real-time interactions
Lemon Slice-2 enables real-time interactions with any character. Below we show screen recordings of the embeddable widget powered by the model, now available for general use.
