Introducing Lemon Slice-2

December 2025

Alongside the release of Lemon Slice Agents, we are introducing Lemon Slice-2, a novel video diffusion transformer model and inference framework for real-time, interactive avatar experiences. Lemon Slice-2 is a 20-billion-parameter, few-step causal model that generates 20 frames per second on a single GPU. Efficient attention and caching strategies enable ultra-fast response times in an interactive setting, as well as infinite-length videos with zero error accumulation. Lemon Slice-2 supports full-body avatar generation with expressive, semantically aware gestures. It is now available to the public for general use.

Breaking the real-time barrier

Graph showing Lemon Slice-2 video generation speed exceeds real-time barrier and outperforms competitors

Lemon Slice-2 generates video frames faster than they can be watched. The strategies we used to break the real-time barrier include causal attention, a novel training paradigm inspired by distribution matching distillation, efficient caching, CUDA graph acceleration, and quantization.
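This post does not describe the attention layout Lemon Slice-2 actually uses. As a generic illustration of what causal attention means in a frame-by-frame video model, the minimal sketch below builds a block-causal mask, in which tokens of a frame may attend to their own frame and to earlier frames but never to later ones. The function name `block_causal_mask` and the toy sizes are hypothetical and chosen only for this example.

```python
def block_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[q][k] == True when query token q may attend to key token k.

    Illustrative only: tokens are grouped into frames, and a token may see
    every token in its own frame and in all earlier frames.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        # Key k is visible if its frame is not later than the query's frame.
        row = [(k // tokens_per_frame) <= q_frame for k in range(total)]
        mask.append(row)
    return mask


# Toy example: 3 frames with 2 tokens each (assumed sizes, not the model's).
mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because no frame depends on a later one, frames can be emitted as soon as they are computed, which is the property that makes streaming generation possible in the first place.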

Ultra-fast response times

Graph showing Lemon Slice-2 has a very fast time to first byte (only 730 milliseconds)

Users of Lemon Slice-2 experience an average response time of 2.8 seconds. Video generation accounts for only 730 milliseconds of that time, or about 26%.
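The 26% figure follows directly from the two numbers reported above. The short calculation below reproduces it; the variable names are ours, and the post does not break down what the remaining time is spent on.

```python
total_response_ms = 2800   # average response time reported above (2.8 s)
video_generation_ms = 730  # video generation portion reported above

# Share of the total response time spent generating video.
video_share = video_generation_ms / total_response_ms

# Time left for everything else in the pipeline (not itemized in the post).
other_ms = total_response_ms - video_generation_ms

print(f"video generation share: {video_share:.0%}")
print(f"remaining time: {other_ms} ms")
```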

Any character

Videos generated in real time from a single image and audio sample on one GPU

Examples: Bear, Woman, Cute, Rock, Gorilla

Any style

Videos generated in real time from a single image and audio sample on one GPU

Examples: Catgirl, Funko, Lemon, Manga, Pop Art, Statue

Expressive gestures & scene awareness

Videos generated in real time from a single image and audio sample on one GPU

Examples: Manga, Old Man, Peasant, Woman, Fountain, Moto

Infinite video

Graph showing Lemon Slice-2 can generate infinitely long videos without error accumulation, far exceeding competitors

As an autoregressive model, Lemon Slice-2 is not limited to generating videos of a fixed length. Critically, unlike other autoregressive models, it does not accumulate errors over time, which allows for infinite-length video generation.
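The post does not say how Lemon Slice-2 manages its context or how it avoids error accumulation. As a generic illustration of how an autoregressive generator can run indefinitely with bounded memory, the sketch below keeps only a fixed-size window of recent frames in a cache. The `generate_frames` function and the window size are hypothetical, frame indices stand in for real cached state, and the sketch addresses only memory growth, not output quality.

```python
from collections import deque
from itertools import islice
from typing import Iterator


def generate_frames(window: int) -> Iterator[tuple[int, int]]:
    """Yield (frame_index, cache_size) forever using a fixed-size context window."""
    # A deque with maxlen evicts its oldest entry automatically when full.
    cache: deque[int] = deque(maxlen=window)
    frame_index = 0
    while True:
        # A real model would condition the new frame on the cached context here.
        cache.append(frame_index)
        yield frame_index, len(cache)
        frame_index += 1


# Take the first 10 frames of an otherwise endless stream (assumed window of 4).
sizes = [size for _, size in islice(generate_frames(window=4), 10)]
print(sizes)
```

The cache size grows until it reaches the window and then stays constant, so the cost per frame does not depend on how long the video has been running.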

Dynamic text control

Lemon Slice-2 enables real-time manipulation of video content via text prompting.

Real-time interactions

Lemon Slice-2 enables real-time interactions with any character. Below we show screen recordings of the embeddable widget powered by the model, now available for general use.

Examples: Education, Shrek Entertainment, Pikachu Entertainment, Emotional Support
Screen recordings: Customer Support, E-commerce, Lead Qualification, Two-Way Video