Imagine: you open a browser. No code, no HTML, no CSS layout engine. Every frame on the screen is a pixel video stream generated by an AI model in real time.
It can instantly understand your intent and dynamically reshape the entire interface—from planning a trip to Paris to complex data visualization—all as vivid as hand-drawn illustrations, seamlessly morphing and interacting with every click.
It feels like science fiction has arrived!
This is Flipbook, a prototype just released by Zain Shah (formerly of OpenAI, and a YC alumnus) and his team.
Experience it at: flipbook.page
In just one day, it exploded on X with two million views. Netizens have been going wild! It also works on mobile phones.
If you enable live video stream mode, the experience is even more breathtaking.
The demo Zain released is real, built on an optimized LTX Studio video model: 1080p at 24 fps, streamed in real time and backed by Modal GPU servers.
The traditional web development paradigm is completely shattered: no front-end layout needed; browsing feels like flipping through a book.
Current web reading "is generated by rigid code and rules, making it difficult to convey complex and detailed ideas."
Flipbook discards this. Its philosophy is: a picture is worth a thousand words. Every "page" you land on is an image.
The magic lies in the fact that clicking anywhere on the image gives you a new image, allowing you to continue exploring that subject in greater depth.
Nothing you see contains any HTML, code, specific links, or fields. The entire web is merely generated pixels displayed on your screen, and even the text is composed of pixels within the image.
A true page-flipping experience.
For the past 20 years, we have relied on HTML + CSS + JavaScript + React to build interfaces. Now, Flipbook simplifies everything into a "pixel stream": the model directly decides what you see and how you interact.
- No layout engine required: Illustrations adaptively morph with the window, no longer boxed in by CSS.
- Full-screen interaction: Every pixel responds to clicks; the model judges intent in real time, no longer limited to predefined buttons.
- Visual-first: Complex concepts are expressed with illustrations, animations, and realistic renderings, not dull text and rectangles.
The signal I take from this is that the era of front-end engineers "writing code to build interfaces" may be coming to an end.
AI-native browsing is truly invincible.
I immediately went to try it out, and indeed, there is a feeling of "infinitely exploratory reading."
For example: The Qwen3.6-27B model was released today. Previously, I always had to carefully look at the comparisons of various benchmark scores. Now, I directly handed it over to Flipbook to interpret for me.
I clicked on "SWE-bench Verified," a benchmark the industry is watching closely right now. Moments later, it acted like a magnifying glass, generating more specific comparative figures and analysis for me.
I then clicked on the closely related "Agentic Flow," and it produced a visual loop diagram.
This interactive style, which carries a strong sense of exploration, is unprecedented.
What if you want to return to the previous page? The navigation bar has already memorized the path for you; you just need to click back.
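The click-to-explore loop and the back navigation described above can be pictured as a simple history stack. The following is a minimal Python sketch, not Flipbook's actual implementation: the `Session` class, its method names, and the idea of passing in a text label for the clicked region are all hypothetical stand-ins for whatever the model really infers from a click.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical model of a browsing session as a stack of page prompts."""
    history: List[str] = field(default_factory=list)

    def open(self, prompt: str) -> str:
        # Start a fresh exploration from a typed prompt.
        self.history = [prompt]
        return self.current()

    def click(self, region_label: str) -> str:
        # Assumption: the model has already turned the click position into a
        # topic label; the new page drills into that topic within the old one.
        self.history.append(f"{self.current()} > {region_label}")
        return self.current()

    def back(self) -> str:
        # Drop the newest page, but never remove the starting page.
        if len(self.history) > 1:
            self.history.pop()
        return self.current()

    def current(self) -> str:
        # Requires that open() has been called at least once.
        return self.history[-1]


session = Session()
session.open("Qwen3.6-27B benchmarks")
session.click("SWE-bench Verified")
session.click("Agentic Flow")
previous = session.back()  # returns to the SWE-bench Verified page
```

Each click pushes a page and each "back" pops one, which is all the navigation bar needs to remember the path.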
Don't underestimate this small tool; outside observers see it as the signal of an industry earthquake.
Besides interpreting complex charts, I also discovered another clever use: interpreting the micro-expressions of real-life figures.
To analyze the facial expression traits of a celebrity, in the spirit of not wasting resources, I fed Flipbook the "Shen Teng Time Magazine cover." Every part of the expression, from the eyebrows and eyes to the nose, mouth, and even the dimples, was interpreted clearly and thoroughly.
Well, Shen Teng's face is just too authoritative.
Actually, there are plenty of offbeat uses; the only limit is what you can think of.
Let it help elementary school students with their homework. Abstract math problems are all "visualized," making them so much easier to solve, right?
In short, all abstractions here become vivid and easily perceptible!
And there's more. The imaginative possibilities are vast; it can interpret any image.
If you don't recognize a guest during a live stream, you can ask it. (P.S.: The capability isn't strong enough yet; it can only identify highly recognizable figures. It's prone to mistakes.)
And here comes the real killer feature!
What if there's no image? You can directly enter a prompt in the URL. For instance, I typed:
Make me an exploded view of Sakuragi Hanamichi's slam dunk motion!
And don't forget, every frame above also has a more advanced "real-time video stream" version. My internet connection just isn't fast enough; otherwise, I could watch the slow-motion replay of Sakuragi Hanamichi's dunk directly.
It's foreseeable that future product prototyping will also evolve from "drawing wireframes" to "generating interactive video interfaces directly from prompts."
And low-code/no-code will evolve into "zero-code AI-native interfaces."
An AI-native browser of infinite vision: the real-time breakthrough of AI video models.
On X, Zain specifically called out the technology behind the real-time video: an optimized version of the LTX Studio video model.
"To bring these images to life vividly, we significantly optimized the video model by @LTXStudio. It can stream 1080p video directly to users' screens at 24fps, leveraging @modal_labs serverless GPU infrastructure via WebSocket."
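To get a feel for what "1080p at 24fps" demands, here is a back-of-the-envelope calculation in Python. It is only a sketch under stated assumptions: it takes 1080p to mean 1920×1080 and uses uncompressed 8-bit RGB as a reference point. The real stream is almost certainly compressed, and the source does not say which codec or bitrate is used.

```python
# Rough numbers for a 1080p, 24 fps stream.
WIDTH, HEIGHT, FPS = 1920, 1080, 24   # assumption: "1080p" means 1920x1080
BYTES_PER_PIXEL = 3                   # assumption: uncompressed 8-bit RGB


def frame_budget_ms(fps: int) -> float:
    """Time available to generate and deliver one frame, in milliseconds."""
    return 1000.0 / fps


def raw_bitrate_mbps(width: int, height: int, fps: int, bytes_per_pixel: int) -> float:
    """Uncompressed video bitrate in megabits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


budget = frame_budget_ms(FPS)
raw = raw_bitrate_mbps(WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL)
print(f"Per-frame budget: {budget:.2f} ms")
print(f"Uncompressed bitrate: {raw:.1f} Mbps")
```

Under these assumptions each frame has roughly 41.7 ms end to end, and the uncompressed signal would be about 1.2 Gbps, which is why generation speed and stream compression both matter for real-time delivery.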
Seen this way, Flipbook looks like both an AI-native browser and an AI-native player. The interaction feels seamless, with no stutter, which is very different from typical video generation. So how is this achieved?
Actually, the core technology behind Flipbook is the LTX-2/LTX-2.3 series of open-source DiT models from Lightricks (an Israeli tech company focused on AI-first creativity). The series offers a highly compressed latent space, multi-scale rendering, and synchronized audio and video, runs faster than real time (generating several seconds of video within seconds on an H100), and has native support for 4K/1080p portrait mode.
With it, video generation becomes real-time, state-aware, and interactive. Combined with agentic search, the model can also pull in real-time data to help ensure accuracy.
The next ambition: structured UI programming
How will such a thrilling product experience be amplified next?
Zain honestly admits Flipbook is still very limited, so the team has currently chosen to design it around visual explanation.
But their ambition is actually much bigger: as models become more accurate and stateful, the set of things worth doing this way will expand, even to tasks you might think require a structured UI, like programming.
As mentioned earlier, this technology could disrupt almost all of our existing workflows, increasing speed tenfold:
- UI/UX Design: From static mockups to dynamic video prototypes, iteration speeds increase by 10x+. Designers can test a complete user journey with a single prompt.
- Content Creation & Education: Travel planning, tutorials, data stories all transform into immersive visual narratives. The education sector might usher in "AI dynamic textbooks."
- E-commerce / Product Demos: Product pages are no longer just images and text, but real-time generated, 3D-like interactive scenes that adapt to each user's preferences.
- Gaming & AR/VR: Real-time generation of cutscenes and interfaces paves the way for lightweight AI-driven experiences.
And that's not all. The current product experience is still constrained by the upper limits of model capabilities. If models develop to a sufficiently stable state, entire software interfaces could become "generative."
By then, the browser will still exist, but what runs inside it won't be web pages, but an AI-driven "infinite visual browser."
Behind Flipbook, two words: burning money. But the future is well worth it.
However, scaling such a forward-looking technological experience for everyone still presents considerable difficulties.
The primary issue is computational cost. As everyone knows, traditional web client rendering is almost free. Flipbook, however, requires continuous server-side GPU inference. Bandwidth and cost bottlenecks (a 50-150x gap between video streams and text streams) need addressing.
But this problem is not unsolvable. If we believe figures like Jensen Huang and institutions like a16z, the cost of inference compute falls to 20% or even 10% of the previous year's level every year. Moreover, open-source models are becoming runnable locally at an accelerating pace (for example, through FP8 quantization). It's estimated that the economic feasibility issues will be resolved within 5-10 years.
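The compounding behind that claim can be made concrete with a little arithmetic. The Python sketch below is illustrative only. It assumes the cost falls by a constant factor every year (to 20% or 10% of the prior year's level) and, as a simplification, treats the 50-150x gap between video and text streams as a pure cost gap. The function names are made up for this example.

```python
def remaining_cost_fraction(yearly_factor: float, years: int) -> float:
    """Fraction of today's cost left after `years` of compounding decline."""
    return yearly_factor ** years


def years_to_close_gap(gap: float, yearly_factor: float) -> int:
    """Whole years until a cost `gap` (e.g. 150x) shrinks to 1x or below."""
    years = 0
    while gap > 1.0:
        gap *= yearly_factor
        years += 1
    return years


for factor in (0.2, 0.1):
    for gap in (50, 150):
        n = years_to_close_gap(gap, factor)
        print(f"cost x{factor}/year, {gap}x gap: closed after {n} year(s)")

print(f"After 5 years at x0.2/year: {remaining_cost_fraction(0.2, 5):.5f} of today's cost")
```

Under these idealized assumptions the gap closes in two to four years. Real-world declines are unlikely to be this smooth, so the 5-10 year estimate is the more cautious figure.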
Additionally, companies like Lightricks are pushing open-source plus enterprise editions, and infrastructure like Modal is already in place. Whoever solves scaling first will reap the biggest dividends.
An even bigger signal is that the internet computing paradigm is shifting from "client-dominated" to "cloud AI generation-dominated." Consequently, our entire current tech stack (cloud computing, edge computing, browser architecture) will be reshaped, and may even give rise to a new "AI OS."
The team behind it
I also dug into the background of the team behind Flipbook.
The team is actually a small, cross-disciplinary "creative tech" group rather than a formal company team in the traditional sense.
To some extent, it can even be called a side project, rapidly pieced together by a group of passionate makers and tech geeks in a community lab environment.
The core figure is Zain Shah, the leader and initiator. Zain's resume includes a stint as a Creative Technologist at Samsung, where he developed prototypes for future devices, wearables, and AI assistants. Before that, he was a researcher at OpenAI; he is also a YC S13 alumnus (founder of Watchsend) with data science and engineering experience at Opendoor.
In short, Zain excels at combining AI with interactive interfaces and hardware prototypes.
Beyond that, it's worth mentioning that he co-founded MadSci, a non-profit community makerspace and lab in central San Francisco. Much of Flipbook's inspiration and actual development likely happened here.
Also, on his personal website, when mentioning Flipbook, he used the phrase "It took a village," indicating this was a collaborative effort, not a solo achievement.
In his post announcing Flipbook, Zain also specifically thanked the following individuals:
Eddie Jiao, a member of the San Francisco entrepreneur community South Park Commons, formerly at Humane, Slack, and Brown. Yes, another Eastern face!
Another core member is Drew O'Carr, formerly at Apple.
From these resumes, a commonality is easy to see: all three are experimental builders exploring the "interface forms of the AI age."
On the eve of the AI-native interaction boom
Since 2023, the industry has been buzzing with discussions about "AI-native products."
People have made numerous explorations into what products should look like in the generative AI era.
The approaches range from bolting a "chat box" onto traditional internet products to the pure CLI style of Claude Code. The former has been called gimmicky, while the latter is hard to spread quickly among non-programmers.
But Flipbook seems to have a chance to succeed!
From writing CSS to writing prompts: type a hint into the URL, and the information you want flows and recombines among the pixels of the previous frame.
The interactive experience where an image is everything, combined with the generative magic of prompts, arguably touches upon the definition of an "AI-native browser."
It is foreseeable that all our future interactions, whether at work (marketing design, programming, building slide decks, prototyping) or in daily life (traveling, helping children with homework, following celebrity news), may undergo a brand-new "visual" reshaping.
Final thoughts: HTML isn't far from retirement
Of course, as stated earlier, Flipbook is currently in the prototype stage, mainly used for "visual explanation," and its actual speed still has room for optimization.
But it clearly points to the future: when AI models are fast and smart enough, interfaces will be as rich, immediate, and personalized as the real world.
One can imagine that our websites will eventually shed their "color-blocked web page" form and transform into "visual universes tailored for every individual."
In short, HTML is not far from retiring!
What is certain is that this wave of AI interface revolution has only just begun.
Reference links:
https://x.com/zan2434/status/2046982383430496444
https://sandner.art/ltx-video-locally-facts-and-myths-debunked-tips-included/