MAS 964 (H): Special Topics in Media Technology:
Camera Culture
Spring 2008


What will a camera look like in 10 years, 20 years?

What will be the dominant platform and why?


Contributors so far:
Matt Hirsch
Cyrus Wilson
Tsuyoshi Johnny Kuroki, Canon

(scroll to individual responses)


Matt Hirsch


Considering the adoption rates for drastically new technologies, in 10
years cameras will likely look superficially much as they do today.
Small changes to the physical sensors may occur, such as the ability to
image in other regions of the spectrum and with greater dynamic range.
The trend of increasing spatial and temporal resolution will continue,
as local storage continues to grow in capacity while shrinking in
physical size.

Within 10 years even cheap digital cameras will offer tens of megapixels,
and expensive cameras could approach gigapixels. Despite the lack of
superficial differences in the physical devices, the ecology of services
and embedded processing power surrounding cameras will grow considerably.
Powerful DSPs and microcontrollers coupled to image sensors will perform
real-time correction for adverse conditions. The camera will do a better
job of capturing the scene as you see it, or even as you would like to
see it, unfazed by low light, camera motion, glare, high dynamic range,
and rapidly changing scenes. All of this will happen without significant
user intervention (customizing settings on the fly, etc.).

The plethora of web applications focused on photos and video will begin
to leverage the vast databases of stored images they have accumulated.
Users will be able to upload their images and then view them in the
context of additional available content. Image-sharing sites will use
the collective content of their users (and the users of other sites),
appropriately anonymized, to render the scene surrounding (spatially and
temporally) the images uploaded by a particular user. In this way your
photos and videos would function as a key into digitally stored regions
of space-time; they would be contextualized in a way previously
unimaginable.
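
As a rough sketch of how such a space-time lookup might work, the Python
snippet below treats a photo's location and timestamp as the key and pulls
anonymized neighboring content from a shared database. The database
interface, field names, and thresholds are entirely hypothetical; nothing
here reflects any real photo-sharing service.

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class SpaceTimeKey:
        """Hypothetical key derived from an uploaded photo's metadata."""
        latitude: float
        longitude: float
        timestamp: datetime

    def surrounding_scene(key, image_db, radius_m=200.0, window=timedelta(hours=24)):
        """Fetch anonymized images near the key in both space and time.

        `image_db` is assumed to expose a `query(...)` method backed by a
        spatial index; the interface is invented for illustration only.
        """
        return image_db.query(
            lat=key.latitude,
            lon=key.longitude,
            radius_m=radius_m,
            start=key.timestamp - window,
            end=key.timestamp + window,
        )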

We will have to be careful that this technology is not misused to
restrict freedoms, whether by overeager copyright enforcers embedding
restrictive functions in camera firmware or by government spy programs
abusing the ubiquity of such potentially sensitive data.

As for 20 years from today, the number of camera devices in the
environment will begin to reach a critical mass, especially in heavily
populated areas. The devices will begin to transition out of the hands
of users and into the environment. Rather than pointing a camera and
clicking to take a photo, you may paint the scene you wish to capture
with a tiny wand containing an IR laser and terabytes of data storage
capacity. Cameras in the environment will work collectively to image the
scene from your perspective, and from other perspectives of interest,
and then transfer the resulting data to your wand. You could then dial
back from this point to see the scene at other times in the past,
perhaps displayed as a three dimensional scene overlaid in your vision
by small laser projectors that paint the images directly on your retina.
To allow some degree of privacy, the past scenes may only be rendered
for locations where you are physically present. This would allow public
spaces to remain public and private spaces private. It would also give
you the sensation of walking through past scenes, a sort of "time
travel."  One could also render "imaginary" objects into the real world.
This last bit would probably be abused for advertising purposes, which
is a scary prospect.


Cyrus Wilson

The "camera" as described in 2028:

Although the names for photographic devices are ultimately derived from the word
"camera", the traditional form of a dark chamber behind an optical system has been
abandoned entirely. It put too much of a constraint on the arrangement of sensor "pixels"
(a term which, together with "voxels", was eventually replaced by "scexels" in an attempt to
be more general, but that new term was quickly repurposed to refer to specific categories
of content) and limited scene capture to a single viewpoint or very narrow field of
viewpoints. In other words, the "camera" form was fine for image capture, but limiting
for scene capture.

Traditional optics have also been abandoned, as the cost of "camron" production is kept
low by fabricating everything on the silicon wafer. The "camron," of course, is the
sensor unit. Together, a collection of camrons capture a scene. (It was the inventor's
intent that the plural of "camron" be "camra", but that never caught on.)

A camron contains 5 photosensors, for UV, blue, green, red, and IR. UV is useful because
there is plenty of UV light available during the day (thanks to depletion of the ozone
layer) and it can also provide a bit of a depth cue based on absorption by smog in the
atmosphere. IR, on the other hand, is plentiful during the night, as most populated areas
have IR lighting for the benefit of vehicles and other motorized devices.  Microlenses
(fabricated on-chip as part of the process) maximize the light-gathering ability of the
photosensors.

A pressure sensor (thanks to pzs-on-chip technology) allows sensing of sound, though
sound is not further discussed here. Other circuitry includes the DPU, GPS, a solar cell
(with energy storage based on ucap-on-chip technology), and a Wi-Rad transceiver. Since
all of this takes up very little room, and does not require any physical wiring to
anything else to function, hundreds of camrons are fabricated on a wafer, which is then
diced up and sprinkled onto any surface.

The platform by which camrons collectively capture a scene is based on "throng
computing." Throng was developed by a grad student whose research was not getting
anywhere; therefore he took it upon himself to make it possible for a set of obsolete
media players to, together, run the very computationally-intensive game "Rock Idol
Massacre." With throng, running a massive distributed computing task on relatively weak
agents is simply a matter of combining enough agents; the computation proceeds even as
agents come and go due to factors such as proximity (entering or leaving the area).  It
formed the perfect basis for camron scene capture, after the emergence of Wi-Rad (Wi-Fi's
successor's successor), the first wireless technology that could scale to support throng
computing.

Each camron determines its position and orientation based on GPS information, Wi-Rad
signals from nearby camrons, and sensor observations compared with those of nearby camrons. With
their individual positions and orientations at a given time known, the combined IRGBU
observations of a throng of camrons are used to reconstruct the rays of light and
occlusions in the scene.
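
Purely as a thought experiment (the camron, Wi-Rad, and throng computing are
of course imagined here), the Python sketch below models a camron as a posed
bundle of five-band samples, with the pose-refinement step reduced to a
simple blend of the GPS fix with neighbor estimates rather than the joint
optimization a real system would need.

    import numpy as np

    class Camron:
        """Toy model of a single camron participating in a throng."""

        def __init__(self, gps_position, orientation, irgbu_samples):
            self.position = np.asarray(gps_position, dtype=float)    # rough (x, y, z) fix
            self.orientation = np.asarray(orientation, dtype=float)  # unit view direction
            self.irgbu_samples = irgbu_samples                       # list of (IR, R, G, B, UV) readings

        def refine_pose(self, neighbors):
            """Blend the noisy GPS fix toward the neighbors' mean position
            (a stand-in for fusing Wi-Rad ranging and shared observations)."""
            if neighbors:
                neighbor_mean = np.mean([n.position for n in neighbors], axis=0)
                self.position = 0.7 * self.position + 0.3 * neighbor_mean
            return self.position

    def collect_rays(throng):
        """Gather posed IRGBU samples from every camron; reconstructing the
        scene's rays and occlusions would operate on this combined set."""
        rays = []
        for camron in throng:
            for sample in camron.irgbu_samples:
                rays.append((camron.position.copy(), camron.orientation.copy(), sample))
        return rays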

When a person in the area wishes to capture the scene, he/she uses a client (which can
run on any device capable of Wi-Rad, Throng, and ideally GPS) to simply request scene
data from the nearby camrons. There are two typical modes of capture:  "Subject View" and
"Object View."  Subject View includes all that could be experienced from the position of
the person, looking in all directions.  Object View is the view of the person (in the
context of the scene), viewed from all directions.
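
The two modes could be imagined as nothing more than a flag in the client's
request to the surrounding camrons; the sketch below invents a minimal
request interface purely for illustration (`client.broadcast` is a
hypothetical method, not part of any real API).

    from enum import Enum

    class CaptureMode(Enum):
        SUBJECT_VIEW = "subject"  # everything visible from the person's position
        OBJECT_VIEW = "object"    # the person, seen from all surrounding directions

    def request_capture(client, position, timestamp, mode=CaptureMode.SUBJECT_VIEW):
        """Ask nearby camrons (via the imagined Wi-Rad/Throng stack) for
        scene data centered on `position` at `timestamp`."""
        return client.broadcast(
            {"position": position, "time": timestamp, "mode": mode.value}
        )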

If the person is in a relatively wide open area, a "Subject View" reconstruction from the
position of the "photographer" may suffer from a sparseness of data. For such
applications a "TriCam" gives improved results. The TriCam is simply a small sphere
(attached to a short stick for the photographer to hold up in the air) covered with
camrons. The camrons participate as part of the throng and increase the density of
samples from the photographer's position. Though the spherical arrangement may seem to
give only angular and not spatial resolution, the slight movement of the photographer's
hand allows the TriCam to collect samples from many closely spaced positions, giving
spatial resolution. (Note that the name "TriCam" has little meaning; it is the result of
a sequence of trademark disputes and settlements.)

The format in which a client receives and records scene data is a point stream:  a set
of points with R, G, B, X, Y, T values. (More accurately, the "point stream" is stored as
blocks which have some extent in space and time, to exploit spatial and especially
temporal coherence in typical scenes.) This matter-centric representation may seem a
suboptimal way to archive the originally light-centric data, but it is the format which
caught on. This representation is then used by viewer software to render views later on;
most applications give control over angle, field of view, and field of time. (More
capable software has never taken hold in the market.) Views are never printed, with the
exception of machines at tourist destinations which will print a 20 cm by 20 cm by 10 cm
wax replica of an Object View for approximately 10 euros.
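
One way to picture the described point stream is the Python sketch below,
which keeps exactly the per-point fields listed above (R, G, B, X, Y, T) and
groups points into blocks with an extent in space and time; the block layout
and names are assumptions made for the sake of the sketch. Viewer software
would then only need to touch blocks overlapping the requested field of view
and field of time when rendering.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ScenePoint:
        """One sample in the point stream: color plus position in space and time."""
        r: float
        g: float
        b: float
        x: float
        y: float
        t: float

    @dataclass
    class PointBlock:
        """A block of points sharing an extent in space and time, so spatial
        and (especially) temporal coherence can be exploited in storage."""
        x_range: Tuple[float, float]
        y_range: Tuple[float, float]
        t_range: Tuple[float, float]
        points: List[ScenePoint] = field(default_factory=list)

        def contains(self, p: ScenePoint) -> bool:
            return (self.x_range[0] <= p.x <= self.x_range[1]
                    and self.y_range[0] <= p.y <= self.y_range[1]
                    and self.t_range[0] <= p.t <= self.t_range[1])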



Tsuyoshi Johnny Kuroki


Cameras will keep getting smaller with the help of nanotechnology. It
will be possible to make a sheet-like camera, very thin, equipped with
a battery and a wireless interface. Such a sheet-like camera could be
attached anywhere. At this stage, privacy concerns will grow, and the
use of such ambient cameras may be restricted: they may be prohibited
in public spaces and allowed only in designated areas.

Cameras will be small enough to be implanted in humans, but most people
may not want implants. Only eccentric people, perhaps artists and
researchers, will take the trouble to do that. For most people,
eyeglass-like cameras may be a substitute, and such cameras will not
face the same restrictions.