SeeU

Seeing the Unseen World via 4D Dynamics-aware Generation

Purdue University, Samsung Research America

TL;DR: SeeU learns continuous 4D dynamics from 2D inputs and generates novel content at unseen times and viewpoints.


SeeU generates more physically plausible motion and higher geometric consistency than baselines.


Abstract

Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation methods operate directly on 2D observations, leading to suboptimal performance. We propose SeeU, a novel approach that learns continuous 4D dynamics and generates unseen visual content. The principle behind SeeU is a new 2D→4D→2D learning framework. SeeU first reconstructs the 4D world from sparse, monocular 2D frames (2D→4D). It then learns continuous 4D dynamics using a low-rank representation and physical constraints (discrete 4D→continuous 4D). Finally, SeeU rolls the world forward in time, re-projects it back to 2D at sampled times and viewpoints, and generates unseen regions based on spatial-temporal context awareness (4D→2D). By modeling dynamics in 4D, SeeU achieves continuous and physically consistent novel visual generation, demonstrating strong potential in multiple tasks, including unseen temporal generation, unseen spatial generation, and video editing.

Why Model Continuous Dynamics in 4D?

Motivation Illustration

Projection, together with the entanglement of camera and scene motion, makes it particularly challenging to recover accurate 3D geometry and physical trajectories directly from 2D frames; in the 4D world, however, these quantities can usually be described explicitly, easily, and elegantly.

Method

SeeU follows a three-stage pipeline: (i) 2D→4D, (ii) Discrete 4D→Continuous 4D, and (iii) 4D→2D.

Pipeline Illustration

Pipeline of SeeU. (a) A dynamic scene is lifted into a 4D representation. (b) Continuous 4D dynamics are learned efficiently with physical and smoothness priors. (c) The learned dynamics evolve the 4D world, which is re-projected to 2D at unseen times and viewpoints; a spatial–temporal in-context video generator completes the unobserved or uncertain areas.
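To make stage (b) concrete, here is a loose, hypothetical sketch of learning smooth low-rank dynamics from discrete point trajectories and querying them at an unseen continuous time. All function names are illustrative assumptions (SVD for the low-rank spatial basis, low-degree polynomials for smooth temporal coefficients); the paper's actual model and priors are not shown on this page.

```python
import numpy as np

def fit_low_rank_dynamics(traj, rank=2, degree=3):
    """Fit a low-rank, temporally smooth model of point motion.

    traj: (T, N, 3) array of N 3D points observed at T known times.
    Returns a function t -> (N, 3) positions at continuous time t.
    Hypothetical sketch, not the paper's implementation.
    """
    T, N, _ = traj.shape
    X = traj.reshape(T, N * 3)              # one flattened frame per row
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coeff = U[:, :rank] * S[:rank]          # (T, rank) temporal coefficients
    basis = Vt[:rank]                       # (rank, 3N) spatial basis
    times = np.arange(T)
    # Smooth each coefficient over time with a low-degree polynomial
    # (a stand-in for the paper's smoothness/physical priors).
    polys = [np.polyfit(times, coeff[:, r], deg=degree) for r in range(rank)]

    def query(t):
        c = np.array([np.polyval(p, t) for p in polys])  # (rank,)
        return (c @ basis).reshape(N, 3)
    return query

# Toy scene: five points translating linearly, observed at 8 timestamps.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
velocity = np.array([0.1, 0.0, 0.05])
traj = np.stack([base + t * velocity for t in range(8)])
query = fit_low_rank_dynamics(traj, rank=2, degree=1)
mid = query(3.5)  # positions at an unseen, in-between time
```

For this linear toy motion the trajectory matrix is exactly rank 2 and the temporal coefficients are affine in time, so the queried in-between positions match the true continuous motion; real scenes would need the richer representation and constraints described above.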

Unseen Temporal Generation



Input
Learned 4D Continuous Dynamics
Output (5 past, 71 in-between, and 5 future frames)

Unseen Spatial Generation



Input
Dolly Out
Dolly Right
Dolly Up
Pan Right
Tilt Up
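The camera moves listed above can be described as simple perturbations of the camera extrinsics: dollies translate the camera along its own axes, while pans and tilts rotate it in place. A minimal sketch, assuming an OpenCV-style camera frame (+x right, +y down, +z forward); the helper names are hypothetical, not an API from the paper.

```python
import numpy as np

def rot_y(a):
    """Rotation about the camera's y axis (a pan)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_x(a):
    """Rotation about the camera's x axis (a tilt)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def camera_move(pose, move, amount):
    """Apply a named move to a 4x4 camera-to-world pose (hypothetical helper).

    Assumes OpenCV conventions: +x right, +y down, +z forward.
    """
    R, t = pose[:3, :3], pose[:3, 3]
    new = pose.copy()
    if move == "dolly_out":              # backward, away from the scene (-z)
        new[:3, 3] = t - R @ np.array([0.0, 0.0, amount])
    elif move == "dolly_right":          # sideways along +x
        new[:3, 3] = t + R @ np.array([amount, 0.0, 0.0])
    elif move == "dolly_up":             # upward is -y in this convention
        new[:3, 3] = t - R @ np.array([0.0, amount, 0.0])
    elif move == "pan_right":            # rotate the view toward +x
        new[:3, :3] = R @ rot_y(amount)
    elif move == "tilt_up":              # rotate the view toward -y (up)
        new[:3, :3] = R @ rot_x(amount)
    return new

start = np.eye(4)                        # camera at the origin, looking down +z
dollied = camera_move(start, "dolly_out", 0.5)
panned = camera_move(start, "pan_right", np.pi / 2)
```

After `dolly_out` the camera center moves to (0, 0, -0.5); after a 90° `pan_right` its forward axis points along world +x. Re-projecting the evolved 4D scene through such perturbed poses yields the unseen-viewpoint renders shown above.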

Video Editing



Input
Output
Input
Output

BibTeX

@article{Yuan_2025_SeeU,
  title={{SeeU}: Seeing the Unseen World via 4D Dynamics-aware Generation},
  author={Yuan, Yu and Wickremasinghe, Tharindu and Nadir, Zeeshan and Wang, Xijun and Chi, Yiheng and Chan, Stanley H.},
  journal={arXiv preprint arXiv: },
  year={2025}
}