OptiWorld

Optimal Control for Video World Generation under Physical Constraints

Purdue University¹, University of Oxford², SixteenMiles Labs³

TL;DR: OptiWorld makes video world models choose better futures by planning safer, smoother, more efficient, and physically constrained motion before rendering.

OptiWorld teaser

Without explicit planning, generated motion can be visually plausible but unsafe, jerky, or inefficient. OptiWorld first plans an optimized trajectory, then uses that trajectory as generation guidance.

Abstract

Video generation models are becoming a scalable form of world models, but they mainly generate plausible motion rather than proactively control or optimize the underlying dynamics. As a result, an object in the generated video may follow trajectories that are unsafe, not smooth, inefficient, or physically inconsistent. In this work, we propose OptiWorld, a framework that brings classical optimal control into video generation at inference time. OptiWorld first extracts a compact, task-relevant world state, then plans an optimal trajectory under physical constraints, and finally renders the video conditioned on this trajectory. We formulate planning as a geometric problem on a continuous manifold, which converts 3D geometry and task-dependent physical constraints into a unified planning geometry. By adding this optimal-control layer, OptiWorld generates videos with preferable dynamics across goal-conditioned image-to-video generation, video dynamics editing, and counterfactual generation.

How OptiWorld Works

OptiWorld pipeline

Understanding

Builds a compact 3D world state from geometry, segmentation, and VLM reasoning, plus 3D tracks when editing a source video.

Planning

Converts 3D geometry and task constraints into a continuous planning geometry where hazards, goals, smoothness, and efficiency can be optimized together.
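As a rough illustration of what optimizing hazards, goals, and smoothness together can look like, the sketch below runs gradient descent on a discretized 3D path with fixed start and goal. It is not OptiWorld's planner: the function name `plan_trajectory`, the spherical hazard, the elastic-band smoothness cost, and all weights are assumptions made for this example, and the page does not specify the actual manifold formulation.

```python
import numpy as np

def plan_trajectory(start, goal, hazard_center, hazard_radius,
                    n_points=50, n_iters=500, lr=0.05,
                    w_smooth=1.0, w_hazard=5.0):
    """Hypothetical planner: smooth path that is pushed out of a spherical hazard."""
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    center = np.asarray(hazard_center, dtype=float)

    # Initialize with a straight line from start to goal.
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    path = (1.0 - t) * start + t * goal

    for _ in range(n_iters):
        grad = np.zeros_like(path)
        # Smoothness: gradient of sum ||p[i+1] - p[i]||^2 at interior points.
        grad[1:-1] += 2.0 * w_smooth * (2.0 * path[1:-1] - path[:-2] - path[2:])
        # Hazard: gradient of 0.5 * depth^2, where depth is penetration into the sphere.
        offset = path - center
        dist = np.linalg.norm(offset, axis=1, keepdims=True)
        depth = np.maximum(hazard_radius - dist, 0.0)
        grad -= w_hazard * depth * offset / np.maximum(dist, 1e-9)
        # Start and goal are hard constraints, so they never move.
        grad[0] = 0.0
        grad[-1] = 0.0
        path -= lr * grad
    return path

def min_distance(path, center):
    """Smallest distance from any path sample to the hazard center."""
    return float(np.linalg.norm(path - np.asarray(center, dtype=float), axis=1).min())

hazard = (0.5, 0.05, 0.0)
optimized = plan_trajectory((0, 0, 0), (1, 0, 0), hazard, hazard_radius=0.2)
print(min_distance(optimized, hazard))
```

Changing only `goal` or `hazard_center` and re-running yields a different path from the same scene description, which is the pattern the counterfactual applications rely on.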

Generation

Renders video from the first frame, prompt, and optimized 3D trajectory using a controllable video generator.

Why Optimal Planning in a Unified Manifold?

Open-world video generation mixes geometry, intent, and physical constraints. A unified manifold gives these signals a common space, so planning can reason over them as one geometry rather than many disconnected rules.

A shared space

Different constraints become comparable only after they are placed on the same planning surface.

Geometry as judgment

The manifold expresses what is costly, reachable, or preferred through distance and direction.

One path, many needs

A single trajectory can balance task intent, physical structure, and generative guidance before rendering.
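One way to read "geometry as judgment" is that costly regions stretch local distances, so the shortest path under the stretched metric avoids them without a separate rule. The sketch below is a discrete stand-in for that idea, not the continuous manifold used by OptiWorld: the grid, the 4-connected neighborhood, the `1 + cost` distance scaling, and the name `geodesic_path` are all assumptions made for illustration.

```python
import heapq
import numpy as np

def geodesic_path(cost, start, goal):
    """Dijkstra on a 4-connected grid where each step's length is scaled by local cost."""
    rows, cols = cost.shape
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale heap entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Unit step stretched by the mean cost of its two endpoints.
                step = 1.0 + 0.5 * (float(cost[r, c]) + float(cost[nr, nc]))
                nd = d + step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk parents back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], dist[goal]

# A high-cost "wall" in the middle column; the cheapest path goes around it.
cost = np.zeros((21, 21))
cost[5:16, 10] = 100.0
path, length = geodesic_path(cost, (10, 0), (10, 20))
print(len(path), length)
```

Because hazards, preferences, and reachability all enter through the same cost field, adding a constraint changes the metric rather than the search procedure.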

Goal-conditioned Image-to-Video Generation

Each case compares OptiWorld with image-to-video and physics-aware baselines.

Panels: Input (input frame with goal and OptiWorld path), Cosmos-Predict, VLIPP, HunyuanVideo, Wan2.2, OptiWorld.

Quantitative Comparisons on Goal-conditioned I2V

Method             Goal err. ↓  Viol.@succ. ↓  Accel. ↓  Jerk ↓  Energy ↓  Mot. smooth ↑  BG cons. ↑  Flicker ↑
HunyuanVideo-1.5   0.548        0.523          0.0118    0.0197  0.0576    0.996          0.961       0.995
Wan2.2             0.549        0.523          0.0172    0.0304  0.0860    0.993          0.958       0.990
Cosmos-Predict2.5  0.733        0.286          0.0104    0.0186  0.0398    0.994          0.936       0.985
VLIPP              0.331        0.511          0.0099    0.0176  0.0419    0.994          0.964       0.992
OptiWorld          0.310        0.260          0.0070    0.0116  0.0126    0.997          0.982       0.996
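The page does not define how Accel., Jerk, and Energy are computed. A plausible reading, assumed here, is that they are finite-difference statistics of the tracked 3D trajectory: mean acceleration magnitude, mean jerk magnitude, and mean squared speed. The sketch below shows that reading; the function name `motion_metrics`, the frame spacing `dt`, and the exact normalization are hypothetical and may differ from the authors' evaluation code.

```python
import numpy as np

def motion_metrics(traj, dt=1.0):
    """Finite-difference motion statistics for a (T, 3) trajectory with T >= 4."""
    traj = np.asarray(traj, dtype=float)
    vel = np.diff(traj, n=1, axis=0) / dt        # first difference: velocity
    acc = np.diff(traj, n=2, axis=0) / dt**2     # second difference: acceleration
    jerk = np.diff(traj, n=3, axis=0) / dt**3    # third difference: jerk
    return {
        "accel": float(np.linalg.norm(acc, axis=1).mean()),
        "jerk": float(np.linalg.norm(jerk, axis=1).mean()),
        "energy": float((np.linalg.norm(vel, axis=1) ** 2).mean()),
    }

# Constant-velocity motion along x, then the same motion with a zigzag in y.
line = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
zigzag = line.copy()
zigzag[::2, 1] = 0.1
print(motion_metrics(line))
print(motion_metrics(zigzag))
```

Under these definitions a constant-velocity path has zero acceleration and jerk, and any frame-to-frame wobble raises both, which matches the direction of the arrows in the table.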

More OptiWorld I2V Cases

Each case shows the planned trajectory and the generated video rendered from it.

Panels: Optimized Plan, Generated Video.

Video Dynamics Editing

OptiWorld refines source 3D tracks before rendering, producing shorter, smoother, and more physically reasonable motion.

Panels: Source Video, Wan FLF2V, OptiWorld.

OptiWorld Result Comparison

The track visualizations compare the source motion with OptiWorld's optimized motion. Optimized Path Comparison overlays one tracked point before and after optimization.

Panels: Source Tracks, Optimized Tracks, Optimized Path Comparison.

Quantitative Comparisons on Video Dynamics Editing

Method        Track dev. ↓  Accel. ↓  Jerk ↓  Path len. ↓  Energy ↓  Mot. smooth ↑  BG cons. ↑  Flicker ↑
Source video  -             0.0124    0.0205  0.647        0.0269    0.993          0.969       0.988
Wan2.1 FLF2V  0.123         0.0157    0.0266  0.733        0.0387    0.995          0.957       0.993
OptiWorld     0.113         0.0107    0.0181  0.524        0.0236    0.996          0.967       0.992
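Track dev. and Path len. are not defined on the page either. The sketch below assumes the simplest reading: path length is the summed distance between consecutive trajectory samples, and track deviation is the mean per-frame Euclidean distance between the source track and the edited track. The names `path_length` and `track_deviation` are hypothetical, and the authors' metrics may use different normalization or units.

```python
import numpy as np

def path_length(traj):
    """Total arc length of a (T, 3) trajectory, summed over consecutive samples."""
    traj = np.asarray(traj, dtype=float)
    return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())

def track_deviation(source, edited):
    """Mean per-frame distance between two (T, 3) tracks of equal length."""
    source = np.asarray(source, dtype=float)
    edited = np.asarray(edited, dtype=float)
    return float(np.linalg.norm(source - edited, axis=1).mean())

# A straight source track and an edited copy shifted sideways by 0.1.
source = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11), np.zeros(11)], axis=1)
edited = source + np.array([0.0, 0.1, 0.0])
print(path_length(source), track_deviation(source, edited))
```

Read this way, the source video has no track deviation by construction, which is consistent with the "-" entry in its row.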

Counterfactual Physics Applications

OptiWorld can change goals or constraints before rendering, producing controlled physical variations from the same scene.

Different Goals

For the same scene, changing the target goal produces a different optimized trajectory while keeping the scene understanding and renderer fixed.

Scene 1: Plan A, Plan B
Scene 2: Plan A, Plan B
Scene 3: Plan A, Plan B

Different Safety Constraints

Examples: Safety 1, Safety 2, Safety 3, Safety 4, Safety 5.

BibTeX

@article{Yuan_2026_OptiWorld,
  title={{OptiWorld}: Optimal Control for Video World Generation under Physical Constraints},
  author={Yuan, Yu and Yuan, Jianhao and Wang, Xijun and Li, Daiqing and He, Liu and Ling, Lu and Chan, Stanley H.},
  journal={arXiv preprint arXiv:2606.00499},
  year={2026}
}