Intuitive Interface

The Intuitive Interface is a research project that investigates new methods
of interacting with the computer by freeing it from the inflexible
desktop setup and integrating the interface into the everyday physical environment
of the user [8]. We focus on creative users, who are often unfamiliar
with computers and hesitate to get involved with mouse and keyboard
because of the symbolic and alphanumeric abstraction they require. The
system makes strong use of body movements to trigger commands and real space
in front of a rear projection screen to store and retrieve information (Figure
1). This so-called memory-function is based on a technique called "Ars
Memoriae" which was used by orators in antiquity to memorize long speeches
[15]. The orator would create icons of the subjects to be memorized and place
them at chosen locations in an imaginary architecture. The icons are used as
links to the information and as the orator (in his mind) walks to the memorized
locations he is able to retrieve the icons and with them the information. Our
scenario uses this metaphor by allowing the user to put entities at chosen locations
in real space and to retrieve these icons and the linked information at a later
time. A stereo computer vision system senses user posture and simple
gestures such as pointing, and speech recognition software is used to
trigger commands. The vision system is based on color segmentation and
blob analysis.
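To illustrate what color segmentation followed by blob analysis can look like, here is a minimal sketch in Python. It is not the project's implementation: the threshold in `is_green`, the 4-connected labeling, and all function names are assumptions made for this example.

```python
from collections import deque


def is_green(pixel):
    """Hypothetical color test: green clearly dominates red and blue."""
    r, g, b = pixel
    return g > 150 and g > 2 * r and g > 2 * b


def segment(image, color_test):
    """Color segmentation: binary mask that is True where a pixel passes the test."""
    return [[color_test(px) for px in row] for row in image]


def find_blobs(mask):
    """Blob analysis: area and centroid of every 4-connected True region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects all pixels of this blob.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = len(pixels)
            centroid = (sum(p[0] for p in pixels) / area,
                        sum(p[1] for p in pixels) / area)
            blobs.append({"area": area, "centroid": centroid})
    return blobs
```

Taking the largest blob as the marker and ignoring small ones is one simple way to reject isolated noise pixels. With two cameras, the centroid found in each image could then feed a stereo position estimate.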
Pointing and Selecting
A pointing gesture in front of a projection screen differs in many ways from pointing with the mouse on common desktop systems. When pointing with the mouse, the achieved accuracy lies within a few pixels, because visual feedback provides enough information for hand-eye coordination.
Figure 1: scenario of the Intuitive Interface
Pointing to objects shown on a projection screen is a different task from traditional mouse pointing. It is more closely related to pointing-like gestures in social communication. In face-to-face conversation, for example, humans frequently use deictic gestures (e.g., the index finger points at something) in parallel with verbal descriptions to identify a referent. Unlike the usual semantics of mouse clicks in direct manipulation environments, in human conversation the region at which the user points is not necessarily identical with the region to which he or she intends to refer [11]. Natural pointing behavior is often ambiguous or vague. Therefore, a desktop application cannot be converted directly into an application in our environment: interaction paradigms that work well for the desktop metaphor might not work with gesture input and a projection screen.
Interaction
Using the interaction paradigms of the Intuitive Interface, we created part of an interactive film planning system. Currently, a generic scene can be arranged, i.e. objects can be moved around, rotated, or deleted, and new objects can be added to the scene. Figure 2 shows a configuration of the example scene in which three figures and two objects have been arranged.

Figure 2: example scene of the prototype system
Selecting objects. We used two methods for selecting objects: a speech command and an auto-click operation that is triggered by pointing to the same location for a while. The speech command worked well and is intuitive to use. However, while testing we often found it annoying to repeat the command again and again. The second approach works without a special command and is based on the fact that humans tend not to rest at the same location for a long time while pointing. When the user points to the same location for more than a second, the object becomes selected, i.e. the system locks onto the object. Releasing the object is performed in the same way. We found this technique easy to use and suggest that it be combined with a speech command as an alternative.
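The auto-click operation can be sketched as a small dwell timer. The following Python example is only an illustration under stated assumptions: the class name `DwellSelector`, the 30-pixel tolerance radius, and the rule that the pointer must move away before the next dwell can fire are all choices made for this sketch, not details of the prototype.

```python
import math


class DwellSelector:
    """Toggles a lock when the pointer rests in one place longer than a dwell time."""

    def __init__(self, dwell_seconds=1.0, radius=30.0):
        self.dwell_seconds = dwell_seconds  # "more than a second" in the text
        self.radius = radius                # assumed tolerance for pointing jitter
        self.locked = False
        self._anchor = None
        self._anchor_time = None
        self._armed = False

    def update(self, position, timestamp):
        """Feed one pointing sample; return True if the lock state just toggled."""
        moved_away = (self._anchor is None
                      or math.dist(position, self._anchor) > self.radius)
        if moved_away:
            # The pointer left the rest area: restart the timer at the new spot.
            self._anchor = position
            self._anchor_time = timestamp
            self._armed = True
            return False
        if self._armed and timestamp - self._anchor_time > self.dwell_seconds:
            # Same gesture selects and releases, so the state is simply toggled.
            self.locked = not self.locked
            self._armed = False  # assumed: require movement before the next toggle
            return True
        return False
```

Disarming after each toggle prevents an object from being released again simply because the user keeps resting at the spot where it was selected. A speech command could set `locked` directly as the alternative path.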
Context switching. Context switching has to be performed when the user wants to add objects to the current scene. We use a selection area and combine this technique with a stage-like metaphor: while the user is acting in the front part of the real space, the scene is shown on the screen as in Figure 3. Here, objects may be moved around or manipulated. To add objects to the scene, the user moves backward in real space. The scene zooms out, revealing a stage-like setup that surrounds the previously shown scene (see Figure 4). Here the user can select from various props such as chairs and tables.


Figures 3 and 4: the scene zooms out when the user moves back in real space, revealing a stage setup for adding items to the scene
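One way to picture the coupling between the user's position and the zoom is a simple mapping from distance to zoom factor. The sketch below is an assumption-laden illustration: the distances, the linear interpolation, the zoom value for the stage view, and the function names are all hypothetical.

```python
def view_zoom(distance_m, front_limit_m=1.5, back_limit_m=2.5, stage_zoom=0.5):
    """Map the user's distance from the screen to a zoom factor.

    1.0 shows the scene at full size; stage_zoom shows the surrounding stage.
    All limits are hypothetical values chosen for this example.
    """
    if distance_m <= front_limit_m:
        return 1.0
    if distance_m >= back_limit_m:
        return stage_zoom
    # Between the two limits the view zooms out linearly as the user steps back.
    t = (distance_m - front_limit_m) / (back_limit_m - front_limit_m)
    return 1.0 + t * (stage_zoom - 1.0)


def context_for(distance_m, back_limit_m=2.5):
    """Assumed rule: adding objects is offered once the stage is fully revealed."""
    return "stage" if distance_m >= back_limit_m else "scene"
```

A continuous zoom gives the user immediate visual feedback while walking backward, so the context switch is anticipated rather than abrupt.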
To allow a greater selection, the objects are divided into several groups shown on a conveyor belt at the bottom of the scene in Figure 4. Each group is assigned a different location in real space. The user chooses a group by moving to its location, which moves the conveyor belt to the appropriate position, and then pointing to the group. The group of items is then moved onto the stage, where the user can select an appropriate item. Figure 5 shows a group of persons, one of which has already been moved into the scene.

Figure 5: activated group shown on stage
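The mapping from room locations to groups on the conveyor belt can be sketched as a nearest-location lookup. In this Python illustration the group names, the one-dimensional room coordinate, and the slot width are assumptions, not values from the prototype.

```python
def nearest_group(user_x, group_positions):
    """Return the group whose assigned room location is closest to the user."""
    return min(group_positions, key=lambda name: abs(group_positions[name] - user_x))


def belt_offset(group, group_order, slot_width=1.0):
    """Offset that brings the chosen group to the front of the conveyor belt."""
    return group_order.index(group) * slot_width
```

For example, with groups placed at 0.5 m, 1.5 m, and 2.5 m along the room, a user standing at 1.4 m would bring the middle group into position.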
Linking real objects to virtual.
In addition to tracking the user, we also track two wooden poles that are
marked by colored signs to facilitate tracking. The poles can be linked to virtual
objects by issuing a speech command. Thereafter, the virtual objects
follow the movements of the poles. This allows the arrangement
to be evaluated in real space. It also provides haptic feedback because of the
weight of the poles.
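The link between a pole and a virtual object can be illustrated as follows. This sketch assumes that a pose consists of a ground-plane position and a rotation about the vertical axis, and that real and virtual space share one scale; `Pose` and `PoleLink` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    angle: float  # rotation about the vertical axis, in radians


class PoleLink:
    """Couples a tracked pole to a virtual object from the moment of linking."""

    def __init__(self, pole_pose, object_pose):
        # Remember both poses at link time (e.g. when the speech command is issued).
        self._pole_start = pole_pose
        self._object_start = object_pose

    def object_pose(self, pole_pose):
        """Apply the pole's change since linking to the object's start pose."""
        return Pose(
            self._object_start.x + (pole_pose.x - self._pole_start.x),
            self._object_start.y + (pole_pose.y - self._pole_start.y),
            self._object_start.angle + (pole_pose.angle - self._pole_start.angle),
        )
```

Working with the change since linking, rather than the absolute pole pose, means the virtual object does not jump when the link is established.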
Performance
The image processing system, the speech recognition software, and the film planning application currently run together on a 200 MHz Pentium Pro PC with two Matrox Meteor frame grabber boards at about 12 frames per second (the image processing by itself runs at about 20 fps). The current prototype uses Macromedia Director for the film planning application. When a second PC runs the application program, the frame rate rises to about 16 fps. For moving the cursor on the projection screen we found that 12 fps is the minimum usable frame rate, but working at 16 fps feels far more comfortable.
Most of the problems we currently have with our system stem from jitter in the pointing position. This is due to the simple segmentation algorithm, which cannot compensate for all shadowing situations and is sensitive to hand movements of the user. We are currently working on improving the segmentation with a lookup table that is determined in a calibration phase. Another problem is the green marker the user has to wear. The user has to face the cameras all the time, and data drop-outs sometimes occur when the marker is not visible to both cameras.
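A lookup table determined in a calibration phase could, for instance, record which color bins were observed on the marker under different lighting. The Python sketch below shows that general idea only. The 4-bit quantization, the use of a set as the table, and the function names are assumptions and do not describe the improvement actually under development.

```python
def quantize(pixel, bits=4):
    """Reduce an 8-bit RGB pixel to a coarse bin so similar colors share an entry."""
    shift = 8 - bits
    r, g, b = pixel
    return (r >> shift, g >> shift, b >> shift)


def build_lookup_table(marker_samples, bits=4):
    """Calibration phase: remember every color bin observed on the marker."""
    return {quantize(px, bits) for px in marker_samples}


def is_marker(pixel, table, bits=4):
    """Run time: a pixel belongs to the marker if its color bin is in the table."""
    return quantize(pixel, bits) in table
```

Because the table is filled from samples taken in both lit and shadowed conditions, it can accept marker colors that a single fixed threshold would miss, while each run-time classification remains a constant-time lookup.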
First results of using the system show that the use of real space and the
memory-function are easy to learn and intuitive. We found that the user must
get used to the deictic gestures for moving objects around. Using real objects,
on the other hand, was easy and faster than pointing and selecting. To
gain further insight, we are currently investigating the needs of directors for such
a film planning system in an empirical study.
Video: Netscape Application of the Intuitive Interface
Video: A Prototype System for Intuitive Film Planning