Computational Photography

Ramesh Raskar, Jack Tumblin

Computational Photography captures a machine-readable representation of our world to synthesize the essence of our visual experience.

What is Computational Photography?
Computational photography combines plentiful computing, digital sensors, modern optics, actuators, and smart lights to escape the limitations of traditional film cameras and to enable novel imaging applications. Unbounded dynamic range; variable focus, resolution, and depth of field; hints about shape, reflectance, and lighting; and new interactive forms of photos that are partly snapshots and partly videos are just some of the new capabilities found in Computational Photography.

Pixels versus Rays
In traditional film-like digital photography, camera images represent a view of the scene via a 2D array of pixels. Computational Photography attempts to understand and analyze a higher dimensional representation of the scene. Rays are the fundamental primitives. The camera optics encode the scene by bending the rays, the sensor samples the rays over time, and the final 'picture' is decoded from these encoded samples. The lighting (scene illumination) follows a similar path from the source to the scene via optional spatio-temporal modulators and optics. In addition, the processing may adaptively control the parameters of the optics, sensor and illumination.
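To make the ray-based view concrete, the sketch below (a hypothetical illustration, not part of the original text) treats a 1D light field as a stack of sub-aperture views indexed by aperture position u. Synthetically refocusing is then a decoding step over ray samples: shift each view in proportion to u and average.

```python
import numpy as np

def refocus(lightfield, alpha):
    """Synthetic refocus by shift-and-add over sub-aperture views.

    lightfield: array of shape (U, X) -- 1D views indexed by
    aperture position u.  alpha controls the refocus depth: each view
    is shifted by (u - center) * alpha before averaging, so rays from
    a point at the matching depth line up and add coherently.
    """
    U, X = lightfield.shape
    out = np.zeros(X)
    for u in range(U):
        shift = int(round((u - U // 2) * alpha))
        out += np.roll(lightfield[u], shift)  # circular shift of this view
    return out / U
```

A point at the chosen depth appears at a u-dependent position in each view; with the matching alpha the shifts cancel that parallax and the point is reconstructed sharply, while points at other depths stay blurred.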

Encoding and Decoding

The encoding and decoding process differentiates Computational Photography from traditional 'film-like digital photography'. With film-like photography, the captured image is a 2D projection of the scene. Because of the camera's limited capabilities, the recorded image is only a partial representation of the view. Nevertheless, the captured image is ready for human consumption: what you see is (almost) what you get in the photo.

In Computational Photography, the goal is to achieve a potentially richer representation of the scene during the encoding process. In some cases, Computational Photography reduces to 'Epsilon Photography', in which the scene is recorded via multiple images, each captured by an epsilon variation of the camera parameters. For example, successive images (or neighboring pixels) may differ in exposure, focus, aperture, view, illumination, or instant of capture. Each setting records partial information about the scene, and the final image is reconstructed from these multiple observations. In other cases, Computational Photography techniques lead to 'Coded Photography', in which the recorded photos capture an encoded representation of the world. The raw sensed photos may appear distorted or random to a human observer, but the corresponding decoding recovers valuable information about the scene.
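As a minimal sketch of the Epsilon Photography idea — hypothetical code, assuming a linear sensor response with 8-bit clipping — the function below fuses an exposure-bracketed stack into a single radiance estimate, weighting each pixel by how far it sits from the under- and over-exposure limits:

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Fuse linear-response exposures (float arrays in 0..255) into one
    radiance map: divide each image by its exposure time, then blend the
    estimates with a hat weight that distrusts pixels near the clipping
    limits."""
    acc = np.zeros_like(images[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(images, exposure_times):
        w = 1.0 - 2.0 * np.abs(img / 255.0 - 0.5)  # peak trust at mid-gray
        acc += w * (img / t)                       # per-exposure radiance estimate
        wsum += w
    return acc / np.maximum(wsum, 1e-8)            # guard against zero weight
```

A long exposure recovers the dark regions that a short exposure under-exposes, and vice versa; the hat weight lets each exposure contribute only where its pixels are well exposed, so the merged result covers a wider dynamic range than any single shot.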

There are four elements of Computational Photography:

(i) Generalized Optics

(ii) Generalized Sensor

(iii) Processing, and

(iv) Generalized Illumination

The first three together form the Computational Camera. As in other imaging fields, in addition to these geometry-defining elements, Computational Photography also deals with dimensions such as time, wavelength, and polarization.

Defining Computational Photography
(** The slide is inspired by Shree Nayar's slide from his presentation in May 2005. Modifications by Raskar and Tumblin are: classification of Computational Photography into four elements, introduction of Computational Illumination as the fourth element, and emphasis on ray modulation in each element.)

What is Next

The field is evolving through three phases. The first phase was about building a super-camera with enhanced performance in terms of the traditional parameters, such as dynamic range, field of view, or depth of field. I call this Epsilon Photography. Because of a camera's limited capabilities, the scene is sampled via multiple photos, each captured by an epsilon variation of the camera parameters. This corresponds to low-level vision: estimating pixels and pixel features.

The second phase is building tools that go beyond the capabilities of this super-camera. I call this Coded Photography. The goal here is to reversibly encode information about the scene in a single photograph (or a very few photographs) so that the corresponding decoding allows powerful decompositions of the image into light fields, motion-deblurred images, or global/direct illumination components, or distinctions between geometric and material discontinuities. This corresponds to mid-level vision: segmentation, organization, and inferring shapes, materials, and edges.

The third phase will be about going beyond radiometric quantities and challenging the notion that a camera should mimic a single-chambered human eye. Instead of recovering physical parameters, the goal will be to capture the visual essence of the scene and analyze its perceptually critical components. I call this Essence Photography, and it may loosely resemble a depiction of the world after high-level vision processing. It will spawn new forms of visual artistic expression and communication. Please see the Introduction slides in the Siggraph 2008 course notes for details on the three phases and example projects.
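The Coded Photography phase can be illustrated with a toy version of coded-exposure (flutter-shutter) motion deblurring — a hypothetical 1D sketch, assuming circular motion blur and a noiseless sensor. A conventional always-open shutter produces a box-shaped blur whose frequency response has zeros, so the blur cannot be inverted; a broadband binary shutter code preserves all frequencies, and the blur can be decoded by division in the Fourier domain:

```python
import numpy as np

def blur_with_code(signal, code):
    """Circularly convolve a 1D scene with a binary shutter code
    (1 = shutter open, 0 = shutter closed during that time slice)."""
    kernel = np.zeros(len(signal))
    kernel[:len(code)] = code
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

def decode(blurred, code):
    """Invert the coded blur by Fourier-domain division; valid only
    when the code's spectrum has no zeros (a broadband code)."""
    kernel = np.zeros(len(blurred))
    kernel[:len(code)] = code
    return np.real(np.fft.ifft(np.fft.fft(blurred) / np.fft.fft(kernel)))
```

For a 16-sample signal, the box code [1, 1, 1, 1] has spectral zeros (e.g. at frequency index k = 4), while a code such as [1, 1, 0, 1] does not, so only the latter blur is decodable. Real coded-exposure systems use longer codes chosen to keep the spectrum as flat as possible in the presence of noise.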


