Technology

01 — Joint Perception

Audio-Video Learning

We curate paired, long-to-short multimodal data for film, with clean, time-aligned tracks for dialogue, music, sound effects, and visuals. These pairs support high-fidelity generation and the distillation of long footage into polished short-form assets while preserving temporal synchronization.

Input Video
Music Track
Dialogue Track
Sound Effect Track
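
One way to picture a long-to-short pair is as two clips that each carry the same set of time-aligned tracks. The sketch below is illustrative only: `Track`, `Clip`, `TrainingPair`, `make_clip`, and the 20 ms alignment tolerance are hypothetical names and values chosen for this example, not a description of our internal data format.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout for one long-to-short pair; every name here is illustrative.


@dataclass(frozen=True)
class Track:
    name: str       # e.g. "video", "music", "dialogue", "sound_effects"
    start_s: float  # start time within the clip, in seconds
    end_s: float    # end time within the clip, in seconds


@dataclass(frozen=True)
class Clip:
    tracks: Tuple[Track, ...]

    def duration_s(self) -> float:
        """Length of the clip, measured on its first track."""
        return self.tracks[0].end_s - self.tracks[0].start_s

    def is_time_aligned(self, tolerance_s: float = 0.02) -> bool:
        """True if every track starts and ends within tolerance_s of the first track."""
        if not self.tracks:
            return False
        ref = self.tracks[0]
        return all(
            abs(t.start_s - ref.start_s) <= tolerance_s
            and abs(t.end_s - ref.end_s) <= tolerance_s
            for t in self.tracks
        )


@dataclass(frozen=True)
class TrainingPair:
    long_clip: Clip   # source footage
    short_clip: Clip  # polished short-form target

    def is_valid(self) -> bool:
        """Both clips must be aligned, and the target must be shorter than the source."""
        if not (self.long_clip.is_time_aligned() and self.short_clip.is_time_aligned()):
            return False
        return self.short_clip.duration_s() < self.long_clip.duration_s()


def make_clip(duration_s: float) -> Clip:
    """Build a clip whose four tracks all span the same interval."""
    names = ("video", "music", "dialogue", "sound_effects")
    return Clip(tuple(Track(n, 0.0, duration_s) for n in names))
```

For example, `TrainingPair(make_clip(5400.0), make_clip(30.0)).is_valid()` accepts a 90-minute source paired with a 30-second target, while a clip whose music track starts half a second late fails `is_time_aligned()`.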

Demo: Teaser Generation

Joint audiovisual learning applied to teaser generation — distilling a long video into a compelling short-form clip.

Input Video

02 — Reusable Memory

Scalable Experience Library

Our NeurIPS-accepted work treats existing footage as reusable memory — a growing library of grounded experiences the system can reference, recombine, and adapt to new contexts at scale.

Demo: Short-Form Video Generation

The model retrieves relevant memories from existing footage and incorporates them into generation to produce coherent short-form content.
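
As a rough illustration of the retrieval step, the sketch below ranks stored clips by cosine similarity between embedding vectors. This is a generic nearest-neighbour lookup swapped in for illustration, not the method from our paper; the `retrieve` function, the clip names, and the three-dimensional embeddings are all made up for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(clip_id, cosine_similarity(query, emb)) for clip_id, emb in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy memory library: clip id -> embedding (hypothetical values).
library = {
    "beach_sunset": [0.9, 0.1, 0.0],
    "city_chase": [0.0, 0.2, 0.9],
    "harbor_dawn": [0.8, 0.3, 0.1],
}

top = retrieve([1.0, 0.2, 0.0], library, k=2)
print([clip_id for clip_id, _ in top])  # ['beach_sunset', 'harbor_dawn']
```

In a real system the retrieved clips would then condition the generator; that step is outside the scope of this sketch.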

Input Video

03 — Coherent Control

Unified Multimodal Editing

A proprietary architecture for simultaneous audio-video editing. It treats the two modalities as one coupled stream, so changes in motion, events, or structure remain temporally and semantically aligned.

Demo: Audiovisual Highlight

Without Highlight

With Highlight

Demo: Audiovisual Addition

Without Seagull

With Seagull

04 — Spatial Presence

3D Perception and Generation

Our models are trained on binaural and spatial audio, which enables physically grounded 3D soundscapes and strengthens the system's sense of where events occur in the world.

Demo: 2D 360° Video → Spatial Audio

Downstream Application

Immersive Streaming Platform

Our downstream application transforms ordinary video into real-time multimodal experiences. The system synchronizes visual understanding, spatial audio, and haptic feedback so users can move beyond passive watching.

As scenes evolve, the model captures motion, impact, rhythm, and environmental transitions, then maps them to tightly aligned feedback channels — maintaining temporal coherence across modalities.
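
The mapping from scene events to feedback channels can be pictured with a small sketch. Everything here is an assumption made for illustration: the `SceneEvent` and `FeedbackCue` types, the channel names, and the event-to-channel table are hypothetical and do not describe the platform's actual routing. The point it shows is that each cue keeps the timestamp of the event that caused it, which is what keeps the channels in step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneEvent:
    kind: str         # "motion", "impact", "rhythm", or "transition"
    time_s: float     # when the event occurs in the video, in seconds
    intensity: float  # nominally in [0, 1]


@dataclass(frozen=True)
class FeedbackCue:
    channel: str      # output channel that should react
    time_s: float     # copied from the event so the cue stays in sync
    strength: float   # clamped to [0, 1]


# Hypothetical routing table; a real system would likely learn or tune this.
CHANNEL_FOR_KIND = {
    "motion": "spatial_audio",
    "impact": "haptic",
    "rhythm": "haptic",
    "transition": "spatial_audio",
}


def map_events(events: List[SceneEvent]) -> List[FeedbackCue]:
    """Turn detected events into time-ordered cues, skipping unknown kinds."""
    cues = []
    for event in sorted(events, key=lambda e: e.time_s):
        channel = CHANNEL_FOR_KIND.get(event.kind)
        if channel is None:
            continue  # no channel is defined for this kind of event
        strength = min(1.0, max(0.0, event.intensity))
        cues.append(FeedbackCue(channel, event.time_s, strength))
    return cues


events = [
    SceneEvent("impact", 12.40, 1.3),
    SceneEvent("motion", 3.00, 0.5),
    SceneEvent("dialogue", 5.00, 0.7),
]
cues = map_events(events)
```

Here `cues` holds two entries, ordered by time: a `spatial_audio` cue at 3.00 s and a `haptic` cue at 12.40 s whose strength is clamped to 1.0. The `dialogue` event is dropped because the table defines no channel for it.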