Dappled Photography: Mask Enhanced Cameras for Heterodyned Light Fields and Coded Aperture Refocusing
Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan and Jack Tumblin

Frequently Asked Questions
Q. What is this paper about?

This paper describes a new theory of light field capture in the frequency domain using non-refractive (passive) optical elements. We present mask-enhanced cameras, in which a transmissive mask (a printed transparency) is placed in the optical path of a conventional lens-based camera. We present a Fourier-domain analysis of the effect of placing a mask in the optical path and show that the mask serves as a modulator of the incoming light field, similar to heterodyning in radio communications: effectively, the mask acts as a carrier in the optical domain. Using this theory, we propose a new class of cameras that can capture the 4D light field on a 2D sensor using just a mask inside the camera, without any additional lens-like optical elements. In addition, our theory leads to the design of the optimal mask to place at the aperture for full-resolution digital refocusing.
Using our framework, out-of-focus blur can also be explained in the Fourier domain. "Dapple" means to mark, or become marked, with spots: the mask placed in the optical path shadows the incoming light and dapples the sensor.

Q. Is the mask placed at the lens or near the sensor?

The mask can be placed anywhere between the lens and the sensor, but its effect depends on where it is placed. We propose two designs based on the mask placement.
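How placement matters can be sketched in flatland (a 2D light field), with notation that is illustrative rather than the paper's exact symbols. Parameterize each ray by its intersection x with the sensor plane and v with the aperture plane, a distance D apart, and let m(.) be the transmittance of a mask at distance d from the sensor. Each ray is attenuated where it crosses the mask plane, so the sensor records a modulated projection of the light field l(x, v):

```latex
y(x) = \int l(x,v)\, m\!\left(\Big(1-\tfrac{d}{D}\Big)x + \tfrac{d}{D}\,v\right) dv
\qquad\Longrightarrow\qquad
\hat{y}(f_x) = \int M(f)\, \hat{l}\!\left(f_x - \Big(1-\tfrac{d}{D}\Big)f,\; -\tfrac{d}{D}\,f\right) df
```

A cosine mask m(\xi) = 1 + \cos(2\pi f_0 \xi) has an impulse spectrum M, so the captured spectrum contains replicas of the light-field spectrum shifted along a line whose slope is set by d/D: at d = 0 the mask modulates purely spatial frequencies, while at the aperture (d = D) it modulates purely angular frequencies. This is the optical analogue of heterodyning.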
In the first design, the mask is placed close to the sensor; we call this the Heterodyne Light Field Camera. It enables us to compute a 4D light field of the scene from a single captured 2D photo. The mask placed close to the sensor is a fine narrowband cosine mask. In the second design, a mask is placed at the lens aperture; we call this the Encoded Blur Camera or Coded Focus Camera. It enables us to digitally refocus a captured photo at full resolution for layered Lambertian scenes. The mask placed at the lens is a coarse broadband mask, which looks like a crossword puzzle.

Q. What are the differences between your Heterodyne Light Field Camera and the previously proposed Stanford handheld plenoptic camera?

There are several important differences between our design and the Stanford handheld plenoptic camera (SHPC) [1]. SHPC uses a lenslet array between the main lens and the sensor to capture the light field. The main lens focuses on the lenslet array, which in turn focuses on the sensor; the light field camera thus uses refractive optics. In contrast, we place a mask between the sensor and the lens, which only attenuates the incoming rays and does not bend them; we use non-refractive elements. The captured image is the convolution of the incoming light field with the mask light field. SHPC captures the 4D light field by ray binning: each ray is sampled individually at a sensor pixel. Our design instead samples coded linear combinations of rays; specifically, we capture the 4D light field directly in the Fourier domain. A 2D sensor pixel in our case thus represents a coded linear combination of several rays, and in software we can decode this combination to obtain the 4D light field.
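To make the decoding step concrete, here is a minimal sketch (assumed, not our released code): it presumes a cosine mask whose harmonics produce a 9 by 9 grid of spectral replicas, and it ignores calibration, windowing, and boundary effects; decode_light_field and n_ang are illustrative names.

```python
import numpy as np

def decode_light_field(photo, n_ang=9):
    """Recover a 4D light field L[u, v, y, x] from one mask-modulated photo."""
    H, W = photo.shape
    h, w = H // n_ang, W // n_ang          # spatial resolution of each spectral tile
    F = np.fft.fftshift(np.fft.fft2(photo))
    F = F[:h * n_ang, :w * n_ang]          # drop remainder rows/cols, if any
    # Cut the 2D spectrum into an n_ang x n_ang grid of tiles; each tile is one
    # angular-frequency slice of the 4D light-field spectrum.
    spectrum4d = F.reshape(n_ang, h, n_ang, w).transpose(0, 2, 1, 3)
    # A 4D inverse FFT over (u, v, y, x) recovers the light field.
    return np.real(np.fft.ifftn(np.fft.ifftshift(spectrum4d)))
```

Each view lf[u, v] then behaves like a low-resolution pinhole image; shifting these views against one another and summing gives light-field based digital refocusing, with the resolution loss discussed below.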
Q. Are there any design advantages over SHPC?

Yes, there are several.

1. Non-refractive optics are easier to implement, mount, and align. In the SHPC design, aligning the lenslet array with the sensor requires high precision: each microlens in the array must focus the main lens's aperture plane onto the sensor, to micrometer precision as mentioned by Ren Ng. Any lenslet-position error reduces angular resolution, while the number of lenslets limits spatial resolution. We do not require such high alignment precision. While the mask must be precisely parallel to the sensor, an error in its distance only changes the angular sampling rate in the frequency domain. We achieved our camera's 2 cm gap to only millimeter precision by sticking the mask to the glass surface of a cheap US$70 scanner mounted on the back of our old view-camera.

2. Masks make better use of sensor pixels for angular resolution. A microlens-array design needs many sensor pixels under each lenslet, not just a few: square lenslets waste sensor pixels when imaging a circular aperture, and off-axis lenslets form off-axis mini-images. Masks avoid this loss by partitioning the light field angularly/spatially in the frequency domain; every pixel receives masked light-field signal, and different mask frequencies and mask heights above the sensor give different angular/spatial trade-offs, as the sketch below illustrates.
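A minimal sketch of such a mask (illustrative parameters, not the fabricated pattern): a separable sum-of-cosines transmittance, where the number of harmonics sets the angular sampling and the fundamental frequency f0_cycles sets the replica spacing.

```python
import numpy as np

def cosine_mask(size=512, f0_cycles=16, n_harmonics=4):
    """1D sum of cosines, made 2D by a separable (outer) product.

    The 1D spectrum has 2 * n_harmonics + 1 impulses, so the 2D mask yields a
    (2n+1) x (2n+1) grid of light-field spectrum replicas; n_harmonics = 4
    gives the 9 x 9 angular sampling assumed in the decoding sketch above.
    """
    x = np.arange(size) / size
    m = np.ones(size)
    for k in range(1, n_harmonics + 1):
        m += np.cos(2 * np.pi * k * f0_cycles * x)
    m = (m - m.min()) / (m.max() - m.min())   # normalize to transmittance in [0, 1]
    return np.outer(m, m)
```

Raising f0_cycles spaces the replicas farther apart (more spatial bandwidth per angular sample), while the mask's height above the sensor sets which angular frequencies each replica samples.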
Q. How do you digitally refocus images?

There are two ways to obtain digitally refocused images.

1. A refocused image corresponds to a 2D slice of the 4D light field. The first method is therefore to compute the 4D light field of the scene and take the slice corresponding to a different plane of focus; we refer to this as Light Field based Digital Refocusing. This method was pioneered by Stanford's plenoptic camera. Its advantage is that it can handle very complex scenes, as shown by Ren Ng, since the entire light field is captured. A big disadvantage, however, is that the resolution of the refocused image is reduced by the number of angular samples in both the horizontal and vertical directions. For example, in Stanford's design, a 14MP camera yields refocused images of only 300 by 300 pixels. Note that this limitation is inherent to refocusing from captured light fields; our Heterodyne Light Field Camera suffers the same loss of resolution in Light Field based Digital Refocusing.
2. One can achieve full-resolution digital refocusing with our second design, the Encoded Blur Camera, by placing a broadband mask at the aperture. In this case we do not capture the entire light field of the scene. Instead, we model the captured photo as the linear convolution of an all-focus image with the depth-dependent defocus blur, and we achieve full-resolution refocusing using image deblurring techniques; we call this Deblurring based Digital Refocusing. The limitation is that complex scenes cannot be handled. Using a single image, we show that layered Lambertian scenes, which can be modeled as a few depth layers, can be handled. This could be improved using more than one image and/or user interaction, which remains an open area for future work.

Q. How do you print these masks?

For the Coded Focus Camera, the broadband mask is a binary 7 by 7 pattern (49 cells, approximately half opaque and half transparent). Each cell is larger than 1 mm^2 to avoid diffraction. The mask can be printed easily as a standard emulsion-based transparency; printing is cheap, and one can get 20 of these masks on a single A4-size transparency for $50. For amateur photography, one can print the mask on a standard overhead transparency with a home printer.

Q. Image deblurring usually leads to ringing artifacts. How do you get rid of those artifacts?

Image deblurring is a well-known ill-posed problem. Since high spatial frequencies are suppressed in the captured blurred photo, deblurring leads to increased noise and ringing artifacts when those frequencies are recovered. Most deblurring approaches are hallucination algorithms: using image priors or a training set, they try to generate the missing high spatial frequencies through complicated procedures. In this paper, we show that by modifying the aperture with a broadband mask, high spatial frequencies are preserved in the captured blurred image. Deblurring then becomes a well-posed problem with a valid solution: by simply solving the linear system corresponding to deblurring, one can recover fine features such as eye glints and hair strands (see the example in the paper).
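A minimal single-layer sketch of that linear inversion (assumed, not our released code): the defocus PSF is approximated as the aperture pattern scaled to the layer's blur size, and the exact linear solve is replaced by a Wiener-style regularized inverse filter; mask_pattern, blur_size, and eps are illustrative parameters.

```python
import numpy as np

def deblur_refocus(blurred, mask_pattern, blur_size, eps=1e-3):
    """Refocus one depth layer at full resolution by coded-aperture deblurring."""
    # Defocus PSF: the binary 7x7 aperture pattern scaled so each cell spans
    # blur_size pixels, normalized to unit sum.
    k = np.kron(mask_pattern.astype(float), np.ones((blur_size, blur_size)))
    k /= k.sum()
    # Embed the kernel in a full-size frame, centered at the origin so the
    # deblurred image is not shifted.
    H, W = blurred.shape
    psf = np.zeros((H, W))
    kh, kw = k.shape
    psf[:kh, :kw] = k
    psf = np.roll(psf, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # The broadband mask keeps |K| away from zero at all frequencies, so a
    # lightly regularized inverse filter recovers the sharp image.
    K = np.fft.fft2(psf)
    B = np.fft.fft2(blurred)
    X = B * np.conj(K) / (np.abs(K) ** 2 + eps)
    return np.real(np.fft.ifft2(X))
```

Refocusing to a different depth then amounts to deblurring with that layer's blur size.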
Contacts: Ramesh Raskar, Amit Agrawal