Summary, Camera for image ‘search’

Summarized by Lav Varshney

How can we augment the camera to support best 'image search'?

Brute force won’t work

by analogy to speech processing
error-tolerant subgraph isomorphism problem is NP-hard

Feedback

computer-vision directed
equivalent to communication with feedback (twenty questions)
suggested tagging (like list decoding in communications)

Active sensing

active illumination, nanoprojectors
radar or sonar

Active scenes

RFID, hidden messages, beacons

Multimodal

location, attention, sound, smell, temperature (correlated sources)
process capture, camera settings

Social

events database
trip itinerary
collaborative tagging
photographer/photographee identification + social network

Other comments on camera for image search

Photosynth project (http://labs.live.com/photosynth/)
Goal detection in soccer video simply by using audio

How should a query be specified?

- exemplar

- text

- sketch (Tom mentions ImageScape: http://skynet.liacs.nl/imagescape/)

- histogram

- features, relative shape

Overall, several different methods like the ones suggested by psychologists for human category learning (see e.g. F. Gregory Ashby and W. Todd Maddox, “Human Category Learning,” Annual Review of Psychology, vol. 56, pp. 149– 178, Feb. 2005.)

Fredo: Is it better to have the user spend time at indexing time or at querying time (probably querying time, since the user will be motivated and willing)
Tom: relevance feedback might be a useful tool
Femtochemistry to identify materials: ‘ingredients of pictures’

- nonlinear optics

- cancer from breath

Is the standard query-answer model appropriate (see also Other big problems)

Other big problems

Quinn: sometimes a picture does not have the correct lighting or pose for composition. Object-based rather than pixel-based editing would help this

Tilke: this is actually a problem in image search, since goal is to find an ε-perturbation of the image

Ramesh: Bill Freeman has work on generalized viewing

Fredo: Not just a single query, but organization of photos, browsing, etc.

- navigating images, topology of intuitive space

Group photos in public spaces: smile and the picture is to you in the mail automatically.

Sylvain: surveillance, smiling picture sent to you; crime photo sent to police

What are your questions about camera/technology/society?

Ramesh: pn junctions and diodes: what are they or how can they be used in fancy electronics.

Other courses

Eugene: Art & Photography course is very artsy

ε-photography

Name from analogy with ε-geometry, a branch of computational geometry. Intent is to have robustness in estimate of pixel value with respect to changes in exposure time, etc.

A basic result there is that Voronoi partitions are very sensitive to small perturbations in the data. In Bill Freeman’s generalized viewing, e.g. of the leaning tower of Pisa, small perturbations also cause significant changes to the scene.

Ramesh and Fredo disagreed as to whether the analogy is valid or whether the sense of ε in ε-photography is actually related to continuity rather than large changes caused by small changes.

What aspect of human eye are critical/ useless?

Tilke: focus

Tom: illusions, afterimages

Lav: feedback with brain, a la “what the frog's eye tells the frog's brain” but more so “what the brain tells the eye.”

More than stereo, multispectral, etc.

Cyrus: illumination – an animal produces light so as not to cast shadow on prey being pursued

Liquid Lens

Fredo: Philips, etc. also work with liquid lenses.

“Origami Lens”

½ cm in thickness, but same quality as a normal 35mm camera with good thick optics.

Like a telescope, shallow depth of field

Coded photography

Quinn: (midlevel cues): show different polarizations through optical illusions

Bill Freeman has motion without movement, adding optical flow to represent motion.

Windows DreamScene (http://en.wikipedia.org/wiki/Windows_DreamScene) allows a gentle breeze of wavy reflections based on prior statistics

Fredo: multiperspective photography

Fredo/Eugene: Beyond the flash-synchro limit, rolling shutter leads to different parts of the image being exposed at different times.

Ramesh: Camera obscura (room), “in camera” legalese

Essence Photography

Perceptual prosthetics, combinations of physics, algorithms, etc.

Matt: people like that photos are “honest.” Essence photography is inherently biased.

Encoding for several different fidelity criteria simultaneously

Visual Social Computing

Flea market has selling etc., but eBay allows quick information dispersion.

Social computing began before electronic computers. Parallel processing implemented when computers were human (see e.g. David Alan Grier, When Computers Were Human, Princeton University Press, 2005.). Similar parallel processing for visual tasks is visual social computing.

Doug: hate is promoted by text messages in Kenya. Similar in Philippines.

Ramesh: clearly one can create distrust, but is there a way to create trust?