  1. 2007-01-02

    Content Classification of Multimedia Documents using Partitions of Low-Level Features

    Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents, following the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50% to 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.

    JVRB, 3(2006), no. 6.

GI VR/AR 2005
  1. 2006-08-23

    Precise Near-to-Head Acoustics with Binaural Synthesis

    For enhanced immersion into a virtual scene, more than just the visual sense should be addressed by a Virtual Reality system. Additional auditory stimulation appears to have much potential, as it realizes a multisensory system. This is especially useful when the user does not have to wear any additional hardware, e.g., headphones. Creating a virtual sound scene with spatially distributed sources requires a technique for adding spatial cues to audio signals and an appropriate reproduction. In this paper we present a real-time audio rendering system that combines dynamic crosstalk cancellation and multi-track binaural synthesis for virtual acoustical imaging. This makes it possible to simulate spatially distributed sources and, in addition, near-to-head sources for a freely moving listener in room-mounted virtual environments without using any headphones. Special focus is placed on near-to-head acoustics, and the requirements for the head-related transfer function databases are discussed.

    JVRB, 3(2006), no. 2.

  2. 2006-08-11

    Automatic Data Normalization and Parameterization for Optical Motion Tracking

    Methods for optical motion capture often require time-consuming manual processing before the data can be used for subsequent tasks such as retargeting or character animation. These processing steps restrict the applicability of motion capturing, especially for dynamic VR environments with real-time requirements. To solve these problems, we present two additional, fast and automatic processing stages based on our motion capture pipeline presented in [HSK05]. A normalization step aligns the recorded coordinate systems with the skeleton structure to yield a common and intuitive data basis across different recording sessions. A second step computes a parameterization based on automatically extracted main movement axes to generate a compact motion description. Our method restricts neither the placement of marker bodies nor the recording setup, and only requires a short calibration phase.

    JVRB, 3(2006), no. 3.

PerGames 2005
  1. 2006-04-11

    Playing with the Real World

    In this paper we provide a framework that enables the rapid development of applications using non-standard input devices. Flash is chosen as the programming language since it can be used for quickly assembling applications. We overcome Flash's difficulties in accessing external devices by introducing a very generic concept: the state information generated by input devices is transferred to a PC, where a program collects it, interprets it and makes it available on a web server. Application developers can then integrate a Flash component that accesses the data stored in XML format and use it directly in their application.

    JVRB, 3(2006), no. 1.

EuroITV 2006
  1. 2007-05-15

    Video Search: New Challenges in the Pervasive Digital Video Era

    The explosion of multimedia digital content and the development of technologies that go beyond traditional broadcast and TV have rendered access to such content important for all end-users of these technologies. While originally developed for providing access to multimedia digital libraries, video search technologies now assume a more demanding role. In this paper, we attempt to shed light on this new role of video search technologies, looking at the rapid developments in the related market, the lessons learned from state-of-the-art video search prototypes developed mainly in the digital libraries context, and the new technological challenges that have arisen. We focus on one of the latter, i.e., the development of cross-media decision mechanisms, drawing examples from REVEAL THIS, an FP6 project on the retrieval of video and language for the home user. We argue that efficient video search holds a key to the usability of the new “pervasive digital video” technologies and that it should involve cross-media decision mechanisms.

    JVRB, 3(2006), no. 11.

  2. 2006-12-22

    Digital Illumination for Augmented Studios

    Virtual studio technology plays an important role in modern television productions. Blue-screen matting is a common technique for integrating real actors or moderators into computer-generated scenery. Augmented reality offers the possibility to mix real and virtual in a more general context. This article proposes a new technological approach for combining real studio content with computer-generated information. Digital light projection allows a controlled spatial, temporal, chrominance and luminance modulation of illumination, opening new possibilities for TV studios.

    JVRB, 3(2006), no. 8.

  3. 2006-11-23

    An Architecture for End-User TV Content Enrichment

    This paper proposes an extension to the television-watching paradigm that permits an end-user to enrich broadcast content. Examples of this enriched content are: virtual edits that allow the order of presentation within the content to be changed or that allow the content to be subsetted; conditional text, graphic or video objects that can be placed to appear within content and triggered by viewer interaction; additional navigation links that can be added to structure how other users view the base content object. The enriched content can be viewed directly within the context of the TV viewing experience. It may also be shared with other users within a distributed peer group. Our architecture is based on a model that allows the original content to remain unaltered, and which respects DRM restrictions on content reuse. The fundamental approach we use is to define an intermediate content enhancement layer that is based on the W3C’s SMIL language. Using a pen-based enhancement interface, end-users can manipulate content that is saved in a home PDR setting. This paper describes our architecture and provides several examples of how our system handles content enhancement. We also describe a reference implementation for creating and viewing enhancements.

    JVRB, 3(2006), no. 9.

  4. 2006-01-03

    MHP Oriented Interactive Augmented Reality System for Sports Broadcasting Environments

    Television and movie images have been altered ever since it has been technically possible. Nowadays embedding advertisements, or incorporating text and graphics in TV scenes, is common practice, but these elements cannot be considered an integrated part of the scene. The introduction of new services for interactive augmented television is discussed in this paper. We analyse the main aspects related to the whole chain of augmented reality production. Interactivity is one of the most important added values of digital television: this paper aims to break the model where all TV viewers receive the same final image. Thus, we introduce and discuss the new concept of interactive augmented television, i.e. real-time composition of video and computer graphics - e.g. a real scene and freely selectable images or spatially rendered objects - edited and customized by the end user within the context of the user's set-top box and TV receiver.

    JVRB, 3(2006), no. 13.

GRAPP 2006
  1. 2007-07-25

    High level methods for scene exploration

    Virtual world exploration techniques are used in a wide variety of domains, from graph drawing to robot motion. This paper is dedicated to virtual world exploration techniques that help a human being to understand a 3D scene. An improved method of viewpoint quality estimation is presented in the paper, together with a new off-line method for automatic 3D scene exploration, based on a virtual camera. The automatic exploration method works in two steps. In the first step, a set of “good” viewpoints is computed. The second step uses this set of viewpoints to compute a camera path around the scene. Finally, we define a notion of semantic distance between objects of the scene to improve the approach.

    JVRB, 3(2006), no. 12.

  2. 2007-01-24

    Exploring Urban Environments Using Virtual and Augmented Reality

    In this paper, we propose the use of a specific system architecture, based on a mobile device, for navigation in urban environments. The aim of this work is to assess how virtual and augmented reality interface paradigms can provide enhanced location-based services using real-time techniques in the context of these two different technologies. The virtual reality interface is based on a faithful graphical representation of the localities of interest, coupled with sensory information on the location and orientation of the user, while the augmented reality interface uses computer vision techniques to capture patterns from the real environment and overlay additional way-finding information, aligned with real imagery, in real time. The knowledge obtained from the evaluation of the virtual reality navigational experience has been used to inform the design of the augmented reality interface. Initial results of the user testing of the experimental augmented reality system for navigation are presented.

    JVRB, 3(2006), no. 5.

  3. 2007-01-10

    View-Dependent Extraction of Contours with Distance Transforms for adaptive polygonal Mesh-Simplification

    For decades, Distance Transforms have proven to be useful for many image processing applications, and more recently they have started to be used in computer graphics environments. The goal of this paper is to propose a new technique based on Distance Transforms for detecting mesh elements which are close to the objects' external contour (from a given point of view), and for using this information to weight the approximation error that will be tolerated during the mesh simplification process. The obtained results are evaluated in two ways: visually and using an objective metric that measures the geometrical difference between two polygonal meshes.

    JVRB, 3(2006), no. 4.

  4. 2007-01-03

    System Architecture of a Mixed Reality Framework

    In this paper the software architecture of a framework which simplifies the development of applications in the area of Virtual and Augmented Reality is presented. It is based on VRML/X3D to enable rendering of audio-visual information. We extended our VRML rendering system by a device management system that is based on the concept of a data-flow graph. The aim of the system is to create Mixed Reality (MR) applications simply by plugging together small prefabricated software components, instead of compiling monolithic C++ applications. The flexibility and the advantages of the presented framework are explained on the basis of an exemplary implementation of a classic Augmented Reality application and its extension to a collaborative remote expert scenario.

    JVRB, 3(2006), no. 7.

  5. 2006-12-13

    Lag Camera: A Moving Multi-Camera Array for Scene-Acquisition

    Many applications, such as telepresence, virtual reality, and interactive walkthroughs, require a three-dimensional (3D) model of real-world environments. Methods such as lightfields, geometric reconstruction and computer vision use cameras to acquire visual samples of the environment and construct a model. Unfortunately, obtaining models of real-world locations is a challenging task. In particular, important environments are often actively in use, containing moving objects, such as people entering and leaving the scene. The methods previously listed have difficulty capturing the color and structure of the environment in the presence of moving and temporary occluders. We describe a class of cameras called lag cameras. The main concept is to generalize a camera to take samples over space and time. Such a camera can easily and interactively detect moving objects while continuously moving through the environment. Moreover, since both the lag camera and occluder are moving, the scene behind the occluder is captured by the lag camera even from viewpoints where the occluder lies between the lag camera and the hidden scene. We demonstrate an implementation of a lag camera, complete with analysis and captured environments.

    JVRB, 3(2006), no. 10.