Ray 3.2 Explained: Luma's Frame-by-Frame AI Video Model

Jun 11, 2026

For three years, AI video has had one fundamental frustration: you prompt, you wait, you hope. On June 9, 2026, Luma pushed back against that with Ray 3.2 — an update to its Ray3 video model built around a different verb entirely. In Luma's own framing, creators are no longer just prompting; they're directing. With up to 16 keyframes per clip, performance tracking across eight faces, HDR/EXR output, and 20-second 1080p generations, Ray 3.2 is a serious bid for professional pipelines — and for the first time, the whole control surface ships as an API.

What Is Ray 3.2?

Ray 3.2 is the latest version of Luma's Ray series of video generation models, announced June 9, 2026, and designed in collaboration with creatives from the entertainment, advertising, and gaming industries. The pitch is precision: instead of handing the model a prompt and accepting whatever motion comes back, Ray 3.2 lets you control how action moves and changes from the first frame to the last.

That aims it squarely at studios, agencies, and production teams: the part of the market that needs an AI shot to behave like a production asset, not a slot-machine output. But the ideas in it matter to every AI video creator, because they show where the whole category is heading.

What's New in Ray 3.2

The release centers on five upgrades, each aimed at a specific production pain:

  • Frame-level control with multi-keyframe. You can place up to 16 keyframes within a single clip, choreographing exact narrative beats, camera paths, and visual progressions. This is the headline feature — it turns a storyboard into something the model actually follows.
  • Performance preservation. Enhanced Performance Tracking and Expressive Facial Performance carry over an actor's skeletal posture, gestures, and full expressive state — tracking up to eight faces simultaneously, frame by frame.
  • Production-grade output. Native HDR generation with 16-bit EXR export means Ray 3.2 clips drop into professional color grading and compositing workflows without sacrificing dynamic range.
  • Reframe instead of reshoot. The enhanced Reframe capability reshapes a shot after the take — adapt the aspect ratio, extend the frame, or replace a background while preserving the original lighting. Client feedback becomes a targeted edit, not a re-generation.
  • Longer cinematic clips. Generations now run up to 20 seconds at 1080p, enough for real scene-building rather than fragmented teasers.
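To make the dynamic-range point in the production-grade output bullet concrete, here is a minimal sketch. It assumes "16-bit EXR" means OpenEXR's half-float channel type (the source does not spell out the encoding), and both helper names are hypothetical illustrations, not part of any Luma tool. It compares what survives an 8-bit display export with what survives a half-float round trip.

```python
import struct


def to_8bit_display(value):
    """Clamp a linear value to [0, 1] and quantize to 8 bits, as a typical SDR export does."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255) / 255


def to_half_float(value):
    """Round-trip a value through IEEE 754 half precision (struct format 'e').

    Assumption: this is the 16-bit storage used by the EXR export.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Linear light values; anything above 1.0 is a highlight brighter than display white.
for v in (0.5, 1.0, 4.0, 16.0):
    print(f"linear={v:>5}  8-bit={to_8bit_display(v):.4f}  half={to_half_float(v)}")
```

In the 8-bit path, highlights at 4.0 and 16.0 both collapse to display white, so a colorist cannot pull them apart later. The half-float path keeps them distinct, which is what leaves headroom for grading and compositing.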

The Ray 3.2 API: Built for Pipelines

The other half of the announcement: for the first time, the full control surface of the Ray model is available as an API. Luma's framing is "built for pipelines, not just previews" — developers can wire Ray 3.2 into render farms, in-house tools, and creative applications without middleware. Third-party inference platforms like fal.ai and Wavespeed listed Ray 3.2 endpoints (text-to-video and image-to-video) almost immediately, so the model is reachable both through Luma directly and through the broader API ecosystem.

That API release matters beyond Luma. Every time a frontier video model exposes its full controls programmatically, multi-model platforms get a richer menu to build from — and creators get access without committing to one vendor's app.

Why Frame-Level Control Matters

If you've generated AI video, you know the loop: write a prompt, generate, discard, tweak, repeat. The model decides the pacing, the camera, the moment the action lands — you just nudge from outside.

Keyframe direction inverts that. Pinning up to 16 moments across a clip means the structure of the shot is yours: where the camera is at second 3, the turn landing at second 9, the scene resolving at second 18. Combined with Reframe (fix the take you mostly like instead of rerolling it), the economics of iteration change: fewer wasted generations, more deliberate edits. That's the practical meaning of "directing, not prompting," and it's the direction the whole field is moving: more control surfaces, fewer dice rolls.
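One way to picture a keyframed shot is as a small, validated timeline. The sketch below is purely illustrative: `Keyframe`, `build_shot_plan`, and their fields are hypothetical names, not Luma's API, and the only facts taken from the announcement are the 16-keyframe and 20-second limits.

```python
from dataclasses import dataclass

MAX_KEYFRAMES = 16        # per-clip keyframe limit quoted in the article
MAX_CLIP_SECONDS = 20.0   # maximum clip length quoted in the article


@dataclass(frozen=True)
class Keyframe:
    """One pinned moment: a timestamp plus a note on what should happen there."""
    time_s: float
    note: str


def build_shot_plan(keyframes, clip_seconds=MAX_CLIP_SECONDS):
    """Validate keyframes against the stated limits and return them sorted by time."""
    if not 0 < clip_seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"clip length must be in (0, {MAX_CLIP_SECONDS}] seconds")
    if len(keyframes) > MAX_KEYFRAMES:
        raise ValueError(f"at most {MAX_KEYFRAMES} keyframes per clip")
    ordered = sorted(keyframes, key=lambda kf: kf.time_s)
    for kf in ordered:
        if not 0 <= kf.time_s <= clip_seconds:
            raise ValueError(f"keyframe at {kf.time_s}s falls outside the clip")
    times = [kf.time_s for kf in ordered]
    if len(set(times)) != len(times):
        raise ValueError("two keyframes share the same timestamp")
    return ordered


# The three beats from the paragraph above, deliberately given out of order.
plan = build_shot_plan([
    Keyframe(18.0, "scene resolves"),
    Keyframe(3.0, "camera position locked"),
    Keyframe(9.0, "the turn happens"),
])
for kf in plan:
    print(f"{kf.time_s:>5.1f}s  {kf.note}")
```

Validating the plan before any generation call is the design point: a storyboard that breaks the limits fails immediately and for free, rather than after a paid render.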

Ray 3.2 vs Kling 2.1: Where Each Fits

A natural question for anyone generating video today is how Ray 3.2 relates to established models like Kuaishou's Kling 2.1 — the model powering AI video generation on Polyfaced.

They're tuned for different jobs. Ray 3.2's edge is control and pipeline fit: multi-keyframe direction, HDR/EXR deliverables, 20-second takes, face-performance tracking — features that earn their keep in storyboarded, client-driven production. Kling 2.1's strength is cinematic quality with a fast, simple loop: describe a scene, pick aspect ratio and duration, and get a polished 1080p clip in about a minute — no post-production stack required.

In practice the two aren't rivals so much as different altitudes of the same craft. If you're compositing AI shots into a graded timeline, Ray 3.2's EXR output is built for you. If you want strong video from a prompt right now, a streamlined generator gets you there with far less ceremony. Polyfaced's studio runs Kling 2.1 today; as multi-model platforms expand their rosters, expect frame-directed models like Ray 3.2 to shape what "pro mode" looks like everywhere.

How to Access Ray 3.2

  • Luma's app — Ray 3.2 is live in Luma's product for direct use.
  • Luma's API — the first-party route, with the full control surface (keyframes, reframe, performance tools) exposed for integration.
  • Third-party platforms — fal.ai and Wavespeed list Ray 3.2 text-to-video and image-to-video endpoints, with more inference providers likely to follow.

Pricing varies by platform and usage, so check the current rates on whichever route you pick.

Frequently Asked Questions

What is Ray 3.2?

Ray 3.2 is Luma's latest AI video generation model, released June 9, 2026. It focuses on creative control: up to 16 keyframes per clip, performance tracking for up to eight faces, native HDR with 16-bit EXR export, Reframe editing, and clips up to 20 seconds at 1080p.

What's the difference between Ray 3.2 and Ray3?

Ray 3.2 is an update to the Ray3 family. The key additions are frame-level multi-keyframe control, enhanced performance and facial tracking, improved Reframe, longer 20-second generations, and — for the first time — the model's full control surface available as an API.

Does Ray 3.2 have an API?

Yes, and it's a first for the Ray line. Luma offers the API directly, and third-party inference platforms like fal.ai and Wavespeed list Ray 3.2 endpoints for text-to-video and image-to-video.

Is Ray 3.2 better than Kling 2.1?

They're aimed at different workflows. Ray 3.2 prioritizes frame-level direction and post-production-ready output (HDR/EXR, 20s clips) for professional pipelines; Kling 2.1 — the model behind Polyfaced's video studio — prioritizes cinematic quality from a simple prompt-to-clip loop. Which is "better" depends on whether you're compositing a graded timeline or creating finished clips directly.

How long can Ray 3.2 videos be?

Up to 20 seconds at 1080p in a single generation, long enough for continuous scene-building rather than short, fragmented shots.

The Bottom Line

Ray 3.2 is the clearest statement yet that AI video is splitting into two layers: fast prompt-to-clip creation, and frame-directed production work that slots into professional pipelines. Luma is racing up the second layer — 16 keyframes, EXR deliverables, an API built for render farms. For creators, both layers are good news: the pro tools push the ceiling higher, and the streamlined tools keep getting better underneath. If your next idea needs a finished cinematic clip more than a compositing pipeline, start where the loop is simplest — and keep an eye on this space, because frame-level direction is coming to everything.


Source: Luma — Introducing Ray3.2.

Polyfaced Team
