MAS 964 (H): Special Topics in Media Technology: Camera Culture
Spring 2008
What will a camera look like in 10 years, 20 years? What will be the dominant platform and why?
Contributors so far:
Matt Hirsch
Cyrus Wilson
Tsuyoshi Johnny Kuroki, Canon
Matt Hirsch
Considering the adoption rates for drastically new technologies, in 10 years cameras will likely look superficially much as they do today. Small changes to the physical sensors may occur, such as the ability to image in other spectra of light and with greater dynamic range. The trend of increasing spatial and temporal resolution will continue, as local storage keeps growing in capacity and shrinking in physical size.
Within 10 years even cheap digital cameras will offer tens of megapixels, and expensive cameras could approach gigapixels. Despite the lack of superficial differences in the physical devices, the ecology of services and embedded processing power surrounding cameras will grow substantially. Powerful DSPs and microcontrollers coupled to image sensors will perform real-time correction for adverse conditions. The camera will do a better job of capturing the scene as you see it, or even as you would like to see it, unfazed by low light, camera motion, glare, high dynamic range, and rapidly evolving dynamic scenes. All of this will happen without significant user intervention (customizing settings on the fly, etc.).
The plethora of web applications focused on photos and video will begin to leverage the vast databases of stored images they have accumulated. Users will be able to upload their images and then view them in the context of additional available content. Image-sharing sites will use the collective content of their users (and the users of other sites), appropriately anonymized, to render the scene surrounding (spatially and temporally) the images uploaded by a particular user. In this way your photos and videos would function as a key into regions of space-time stored digitally. They would be contextualized in a way previously unimaginable.
We will have to be careful that this technology is not misused to restrict freedoms, whether by overeager copyright enforcers embedding restrictive functions into camera firmware or by government surveillance programs abusing the ubiquity of potentially sensitive data.
Twenty years from today, the number of camera devices in the environment will begin to reach a critical mass, especially in heavily populated areas. The devices will begin to transition out of the hands of users and into the environment. Rather than pointing a camera and clicking to take a photo, you may paint the scene you wish to capture with a tiny wand containing an IR laser and terabytes of data storage capacity. Cameras in the environment will work collectively to image the scene from your perspective, and from other perspectives of interest, and then transfer the resulting data to your wand. You could then dial back from this point to see the scene at other times in the past, perhaps displayed as a three-dimensional scene overlaid on your vision by small laser projectors that paint the images directly on your retina.

To allow some degree of privacy, past scenes may only be rendered for locations where you are physically present. This would allow public spaces to remain public and private spaces private. It would also give you the sensation of walking through past scenes, a sort of "time travel." One could also render "imaginary" objects into the real world. This last bit would probably be abused for advertising purposes, which is a scary prospect.
Cyrus Wilson
The "camera" as
described in 2028:
Although the names for photographic devices are ultimately derived from the word "camera", the traditional form of a dark chamber behind an optical system has been abandoned entirely. It put too much of a constraint on the arrangement of sensor "pixels" (a term which, together with "voxels", was eventually replaced by "scexels" in an attempt to be more general, though that new term was quickly repurposed to refer to specific categories of content) and limited scene capture to a single viewpoint or a very narrow field of viewpoints. In other words, the "camera" form was fine for image capture, but limiting for scene capture.
Traditional optics have also been abandoned, as the cost of "camron" production is kept low by fabricating everything on the silicon wafer. The "camron," of course, is the sensor unit. Together, a collection of camrons captures a scene. (It was the inventor's intent that the plural of "camron" be "camra", but that never caught on.)
A camron contains five photosensors, for UV, blue, green, red, and IR. UV is useful because there is plenty of UV light available during the day (thanks to depletion of the ozone layer), and it can also provide a bit of a depth cue based on absorption by smog in the atmosphere. IR, on the other hand, is plentiful at night, as most populated areas have IR lighting for the benefit of vehicles and other motorized devices. Microlenses (fabricated on-chip as part of the process) maximize the light-gathering ability of the photosensors.

A pressure sensor (thanks to pzs-on-chip technology) allows sensing of sound, though sound is not discussed further here. Other circuitry includes the DPU, GPS, a solar cell (with energy storage based on ucap-on-chip technology), and a Wi-Rad transceiver. Since all of this takes up very little room and requires no physical wiring to anything else in order to function, hundreds of camrons are fabricated on a wafer, which is then diced up and can be sprinkled on any surface.
The platform by which camrons collectively capture a scene is based on "throng computing." Throng was developed by a grad student whose research was not getting anywhere, so he took it upon himself to make it possible for a set of obsolete media players to run, together, the very computationally intensive game "Rock Idol Massacre." With throng, running a massive distributed computing task on relatively weak agents is simply a matter of combining enough agents; the computation proceeds even as agents come and go due to factors such as proximity (entering or leaving the area). Throng formed the perfect basis for camron scene capture after the emergence of Wi-Rad (Wi-Fi's successor's successor), the first wireless technology that could scale to support throng computing.
Each camron determines its position and orientation from GPS information, Wi-Rad signals from nearby camrons, and sensor observations compared with those of nearby camrons. With their individual positions and orientations at a given time known, the combined IRGBU observations of a throng of camrons are used to reconstruct the rays of light and occlusions in the scene.
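No particular algorithm is specified for this localization step, but as a purely illustrative sketch under assumed details, a camron might refine a coarse GPS fix against range estimates derived from neighbor signal strength. Every name, constant, and the gradient-descent refinement below is hypothetical.

    import math

    def range_from_rssi(rssi_dbm, tx_power_dbm=-30.0, path_loss_exp=2.0):
        """Rough distance (m) from received signal strength, log-distance path-loss model."""
        return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))

    def refine_position(gps_fix, neighbors, steps=200, lr=0.01):
        """Refine a 2-D GPS fix against already-localized neighbors.

        neighbors: list of ((x, y), rssi_dbm) pairs for nearby camrons.
        """
        x, y = gps_fix
        for _ in range(steps):
            gx = gy = 0.0
            for (nx, ny), rssi in neighbors:
                est = math.hypot(x - nx, y - ny) or 1e-9
                err = est - range_from_rssi(rssi)
                # Gradient of 0.5 * err**2 with respect to (x, y).
                gx += err * (x - nx) / est
                gy += err * (y - ny) / est
            x -= lr * gx
            y -= lr * gy
        return x, y

    # Example: a coarse fix refined against three localized neighbors.
    pose = refine_position((10.0, 5.0),
                           [((0.0, 0.0), -50.0),
                            ((20.0, 0.0), -50.0),
                            ((10.0, 15.0), -47.0)])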
When a person in the area wishes to capture the scene, he/she uses a client (which can run on any device capable of Wi-Rad, Throng, and ideally GPS) to simply request scene data from the nearby camrons. There are two typical modes of capture: "Subject View" and "Object View." Subject View includes all that could be experienced from the position of the person, looking in all directions. Object View is the view of the person (in the context of the scene), viewed from all directions.
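As a hedged illustration only, a capture request from a client to the surrounding throng might carry the chosen mode plus a few plausible fields like these; the essay names only the two modes, so the class names and fields are assumptions.

    from dataclasses import dataclass
    from enum import Enum

    class CaptureMode(Enum):
        SUBJECT_VIEW = "subject"  # all that could be experienced from the requester's position
        OBJECT_VIEW = "object"    # the requester, in context, viewed from all directions

    @dataclass
    class CaptureRequest:
        mode: CaptureMode
        position: tuple[float, float, float]  # client's GPS fix, if available
        start_time: float                     # seconds since epoch
        duration: float                       # seconds of scene data requested

    # Example: request an Object View of the requester for the next 30 seconds.
    request = CaptureRequest(CaptureMode.OBJECT_VIEW, (42.36, -71.09, 12.0), 1.9e9, 30.0)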
If the person is in a relatively wide-open area, a "Subject View" reconstruction from the position of the "photographer" may suffer from a sparseness of data. For such applications a "TriCam" gives improved results. The TriCam is simply a small sphere (attached to a short stick for the photographer to hold up in the air) covered with camrons. The camrons participate as part of the throng and increase the density of samples from the photographer's position. Though the spherical arrangement may seem to give only angular and not spatial resolution, the slight movement of the photographer's hand allows the TriCam to collect samples from many closely spaced positions, giving spatial resolution. (Note that the name "TriCam" has little meaning; it is the result of a sequence of trademark disputes and settlements.)
The format in which a client receives and records scene data is a point stream: a set of points with R, G, B, X, Y, T values. (More accurately, the "point stream" is stored as blocks which have some extent in space and time, to exploit spatial and especially temporal coherence in typical scenes.) This matter-centric representation may seem a suboptimal way to archive the originally light-centric data, but it is the format which caught on.
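A minimal sketch of this representation, assuming only what is stated here (per-point R, G, B, X, Y, T values grouped into blocks with some extent in space and time), might look like the following; the class names and block fields are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class ScenePoint:
        r: float  # red
        g: float  # green
        b: float  # blue
        x: float  # spatial coordinates
        y: float
        t: float  # time of observation

    @dataclass
    class PointBlock:
        # Bounding extent in space and time, used to exploit coherence.
        x_range: tuple[float, float]
        y_range: tuple[float, float]
        t_range: tuple[float, float]
        points: list[ScenePoint] = field(default_factory=list)

    # Example: one block of the stream holding a single point.
    block = PointBlock((0.0, 1.0), (0.0, 1.0), (0.0, 0.5))
    block.points.append(ScenePoint(0.8, 0.7, 0.6, 0.25, 0.4, 0.1))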
This point-stream representation is then used by viewer software to render views later on; most applications give control over angle, field of view, and field of time. (More capable software has never taken hold in the market.) Views are never printed, with the exception of machines at tourist destinations, which will print a 20 cm by 20 cm by 10 cm wax replica of an Object View for approximately 10 euros.
Tsuyoshi Johnny Kuroki
Cameras will keep getting smaller with the help of nanotechnology. It will become possible to make a very thin, sheet-like camera equipped with a battery and a wireless interface. Such sheet-like cameras could be attached anywhere. At that stage, privacy concerns will grow, and the use of such ambient cameras may be restricted: prohibited in public spaces and allowed only in designated areas.

Cameras will also be small enough to be implanted in humans, but most people may not want to implant them. Only eccentric people, perhaps artists and researchers, will take the trouble to do so. For most people, eyeglass-like cameras may serve as a substitute. Such cameras will not face the same restrictions on their use.