December 2025
Supporting our release of LemonSlice Agents is LemonSlice-2, a novel video diffusion transformer model and inference framework that enables real-time, interactive avatar experiences. LemonSlice-2 is a 20-billion-parameter, few-step causal model that achieves a generation throughput of 20 frames per second on a single GPU. Efficient attention and caching strategies enable ultra-fast response times in an interactive setting and infinite-length videos with zero error accumulation. LemonSlice-2 supports full-body avatar generation with expressive, semantically aware gestures. It is now available to the public for general use.
Any character · Any style · Expressive gestures & scene awareness
Videos generated in real time from a single image and audio sample on one GPU.
Real-time interactions
LemonSlice-2 enables real-time interactions with any character. Below we show screen recordings of the embeddable widget powered by the model, now available for general use.
Breaking the real-time barrier
LemonSlice-2 generates video frames faster than they can be watched. Strategies we used to break the real-time barrier include causal attention, a novel training paradigm inspired by distribution matching distillation, efficient caching, CUDA graph acceleration, and quantization.
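To make one of these concrete, below is a minimal PyTorch sketch of CUDA graph capture and replay for a single generation step. The tiny linear layer is a stand-in for one few-step DiT forward pass; everything here illustrates the general technique, not LemonSlice-2's actual inference code.

```python
import torch

# Stand-in for one denoising/generation step; model and shapes are placeholders.
model = torch.nn.Linear(1024, 1024).cuda().half()
static_input = torch.randn(1, 1024, device="cuda", dtype=torch.half)
static_output = torch.empty_like(static_input)

# Warm up on a side stream so allocator state settles before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Record the step once; every replay re-runs the same kernel sequence
# with near-zero Python and kernel-launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

def run_step(new_input: torch.Tensor) -> torch.Tensor:
    static_input.copy_(new_input)  # write into the buffer the graph reads from
    graph.replay()                 # replay the recorded kernels
    return static_output
```

At tens of frames per second, each step has a budget of a few tens of milliseconds, so eliminating launch overhead this way can be a meaningful share of the speedup.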

Ultra-fast response times
Users of LemonSlice-2 experience an average response time of 2.8 seconds. Video generation accounts for only 26% of that time (0.26 × 2.8 s ≈ 730 milliseconds).

Infinite video
As an autoregressive model, LemonSlice-2 is not limited to generating videos of a fixed length. Critically, unlike other autoregressive models, it does not suffer from error accumulation, allowing for infinite-length video generation.
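To illustrate the streaming loop, here is a minimal sketch of chunked autoregressive generation with a fixed-size rolling context. The generate_chunk method and the window size are assumptions for illustration, not the model's real interface; the rolling cache is what keeps memory constant, while avoiding error accumulation is a property of the model's training that the sketch does not capture.

```python
from collections import deque

WINDOW = 16  # past chunks kept as context; illustrative, not the real value

def stream_video(model, audio_chunks, window=WINDOW):
    """Yield frames chunk by chunk for as long as audio keeps arriving."""
    cache = deque(maxlen=window)  # rolling context: the oldest entry falls off
    for audio in audio_chunks:
        # Hypothetical call: condition the next chunk on audio + recent context.
        frames, context_entry = model.generate_chunk(audio, list(cache))
        cache.append(context_entry)
        yield frames  # memory stays constant no matter how long the stream runs
```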

Dynamic text control
LemonSlice-2 enables real-time manipulation of video content via text prompting.
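One plausible shape for this, extending the sketch above: re-encode the text condition whenever a new instruction arrives and apply it from the next chunk onward. The encode_prompt helper and the extra generate_chunk argument are, again, hypothetical.

```python
from collections import deque

def stream_with_prompts(model, audio_chunks, prompt_queue, encode_prompt, window=16):
    """Like stream_video above, but swaps the text conditioning mid-stream."""
    cache = deque(maxlen=window)
    text_emb = encode_prompt("neutral, attentive")  # assumed default style
    for audio in audio_chunks:
        if prompt_queue:
            # A new user instruction arrived; it steers every chunk from here on.
            text_emb = encode_prompt(prompt_queue.pop(0))
        frames, entry = model.generate_chunk(audio, list(cache), text_emb)
        cache.append(entry)
        yield frames
```

In a design like this, a new prompt takes effect at the next chunk boundary rather than requiring a fresh generation.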