  1. 2018-05-25

    Audiovisual perception of real and virtual rooms

    Virtual environments utilized in experimental perception research are normally required to provide rich physical cues if they are to yield externally valid perceptual results. We investigated the perceptual difference between a real environment and a virtual environment under optical, acoustic, and optoacoustic conditions by conducting a 2 x 3 mixed design, with environment as a between-subjects factor and domain as a within-subjects factor. The dependent variables comprised auditory, visual, and audiovisual features including geometric estimates, aesthetic judgments, and sense of spatial presence. The real environment consisted of four visible loudspeakers in a small concert hall, playing back an anechoic multichannel recording of a string quartet. In the virtual environment, deemed the Virtual Concert Hall, the scene was reproduced three-dimensionally by applying dynamic binaural synthesis and stereoscopic projection on a 160° cylindrical screen. Most unimodal features were rated almost equally across the environments under both the optical/acoustic and the optoacoustic conditions. Estimates of geometric dimensions were lower (though not necessarily less accurate) in the virtual than in the real environment. Aesthetic features were rated almost equally across the environments under the acoustic condition, but not the optical, and similarly under the optoacoustic condition. Further results indicate that unimodal features of room perception might be subject to cognitive reconstruction due to both information acquired from another stimulus domain and abstract experiential knowledge of rooms. In conclusion, the validity of the Virtual Concert Hall for certain experimental applications is discussed.

    JVRB, 14(2017), no. 5.

  2. 2016-08-09

    Real-time depth camera tracking with CAD models and ICP

    In recent years, depth cameras have been widely utilized in camera tracking for augmented and mixed reality. Many of the studies focus on methods that generate the reference model simultaneously with the tracking and allow operation in unprepared environments. However, methods that rely on predefined CAD models have their advantages. In such methods, measurement errors do not accumulate in the model, the methods are tolerant of inaccurate initialization, and the tracking is always performed directly in the reference model's coordinate system. In this paper, we present a method for tracking a depth camera with existing CAD models and the Iterative Closest Point (ICP) algorithm. In our approach, we render the CAD model using the latest pose estimate and construct a point cloud from the corresponding depth map. We construct another point cloud from the currently captured depth frame, and find the incremental change in the camera pose by aligning the point clouds. We utilize a GPGPU-based implementation of the ICP which efficiently uses all the depth data in the process. The method runs in real time, it is robust to outliers, and it does not require any preprocessing of the CAD models. We evaluated the approach using the Kinect depth sensor, and compared the results to a 2D edge-based method, to a depth-based SLAM method, and to the ground truth. The results show that the approach is more stable than the edge-based method and suffers less from drift than the depth-based SLAM.

    JVRB, 13(2016), no. 1.
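
    The step of constructing a point cloud from a depth map can be illustrated with a minimal sketch. It assumes a simple pinhole camera model; the function name and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper, and the ICP alignment itself is not shown.

```python
def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points.

    depth is a list of rows of metric depth values; fx, fy, cx, cy are
    assumed pinhole intrinsics (focal lengths and principal point, in pixels).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Skip pixels without a valid depth measurement.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2 x 3 depth map with one invalid pixel (depth 0).
depth = [[1.0, 1.0, 0.0],
         [2.0, 2.0, 2.0]]
cloud = depth_map_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

    The same function could in principle be applied both to the depth map rendered from the CAD model and to the captured depth frame, giving the two point clouds that are then aligned.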

  3. 2014-12-01

    Estimating Gesture Accuracy in Motion-Based Health Games

    This manuscript details a technique for estimating gesture accuracy within the context of motion-based health video games using the MICROSOFT KINECT. We created a physical therapy game that requires players to imitate clinically significant reference gestures. Player performance is represented by the degree of similarity between the performed and reference gestures and is quantified by collecting the Euler angles of the player's gestures, converting them to a three-dimensional vector, and comparing the magnitude between the vectors. Lower difference values represent greater gestural correspondence and therefore greater player performance. A group of thirty-one subjects was tested. Subjects achieved gestural correspondence sufficient to complete the game's objectives while also improving their ability to perform reference gestures accurately.

    JVRB, 11(2014), no. 8.
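
    The similarity measure described in this abstract might be sketched as follows. The abstract does not give the exact formula, so this sketch assumes that each gesture is summarised by one triple of Euler angles and that the difference value is the Euclidean norm of the difference between the two triples; the function name and the sample angles are hypothetical.

```python
import math


def gesture_difference(performed, reference):
    """Return the Euclidean norm of the difference between two Euler-angle triples.

    Angles are in degrees. Lower values mean closer correspondence.
    Angle wrap-around (e.g. 359 vs. 1 degree) is deliberately ignored here.
    """
    if len(performed) != 3 or len(reference) != 3:
        raise ValueError("each gesture must be a triple of Euler angles")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(performed, reference)))


# Hypothetical reference gesture and two attempts (yaw, pitch, roll in degrees).
reference = (90.0, 0.0, 45.0)
close_attempt = (88.0, 1.0, 43.0)
poor_attempt = (60.0, 20.0, 10.0)

close_score = gesture_difference(close_attempt, reference)
poor_score = gesture_difference(poor_attempt, reference)
```

    Under these assumptions the closer attempt yields the lower difference value, which matches the abstract's statement that lower values represent greater gestural correspondence.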

  4. 2014-01-14

    Investigations into Velocity and Distance Perception Based on Different Types of Moving Sound Sources with Respect to Auditory Virtual Environments

    The characteristics of moving sound sources have strong implications for the listener's distance perception and the estimation of velocity. Changes to typical sound emissions, which are currently occurring due to the trend towards electromobility, affect pedestrians' safety in road traffic. Thus, investigations of the relevant cues for velocity and distance perception of moving sound sources are of interest not only to the psychoacoustic community but also for several applications, such as virtual reality, noise pollution, and road traffic safety. This article describes a series of psychoacoustic experiments in this field. Dichotic and diotic stimuli of a set of real-life recordings taken from a passing passenger car and a motorcycle were presented to test subjects, who in turn were asked to determine the velocity of the object and its minimal distance from the listener. The results of these psychoacoustic experiments show that the estimated velocity is strongly linked to the object's distance. Furthermore, it could be shown that binaural cues contribute significantly to the perception of velocity. In a further experiment, it was shown that, independently of the type of vehicle, the main parameter for distance determination is the maximum sound pressure level at the listener's position. The article suggests a system architecture for the adequate consideration of moving sound sources in virtual auditory environments. Virtual environments can thus be used to investigate the influence of new vehicle powertrain concepts and the related sound emissions of these vehicles on the pedestrians' ability to estimate the distance and velocity of moving objects.

    JVRB, 10(2013), no. 4.

  5. 2013-12-20

    Impact Study of Nonverbal Facial Cues on Spontaneous Chatting with Virtual Humans

    Non-verbal communication (NVC) is considered to represent more than 90 percent of everyday communication. In virtual worlds, this important aspect of interaction between virtual humans (VH) is strongly neglected. This paper presents a user-test study to demonstrate the impact of automatically generated graphics-based NVC expression on the dialog quality: first, we wanted to compare impassive and emotional facial expression simulation for impact on the chatting. Second, we wanted to see whether people like chatting within a 3D graphical environment. Our model only proposes facial expressions and head movements induced from spontaneous chatting between VHs. Only subtle facial expressions are used as nonverbal cues, i.e. those related to the emotional model. Motion capture animations related to hand gestures, such as cleaning glasses, were randomly used to make the virtual human lively. After briefly introducing the technical architecture of the 3D-chatting system, we focus on two aspects of chatting through VHs. First, what is the influence of facial expressions that are induced from text dialog? For this purpose, we exploited an emotion engine, developed previously [GAS11], that extracts emotional content from a text and depicts it on a virtual character. Second, as our goal was not to address automatic generation of text, we compared the impact of nonverbal cues in conversation with a chatbot or with a human operator using a Wizard of Oz approach. Among the main results, the within-group study, involving 40 subjects, suggests that subtle facial expressions significantly affect not only the quality of experience but also dialog understanding.

    JVRB, 10(2013), no. 6.

  6. 2010-10-28

    3-D Audio in Mobile Communication Devices: Effects of Self-Created and External Sounds on Presence in Auditory Virtual Environments

    This article describes a series of experiments which were carried out to measure the sense of presence in auditory virtual environments. Within the study a comparison of self-created signals to signals created by the surrounding environment is drawn. Furthermore, it is investigated if the room characteristics of the simulated environment have consequences on the perception of presence during vocalization or when listening to speech. Finally the experiments give information about the influence of background signals on the sense of presence. In the experiments subjects rated the degree of perceived presence in an auditory virtual environment on a perceptual scale. It is described which parameters have the most influence on the perception of presence and which ones are of minor influence. The results show that on the one hand an external speaker has more influence on the sense of presence than an adequate presentation of one’s own voice. On the other hand both room reflections and adequately presented background signals significantly increase the perceived presence in the virtual environment.

    JVRB, 7(2010), no. 11.

  7. 2008-02-29

    Presence in a Three-Dimensional Test Environment: Benefit or Threat to Market Research?

    In market research, the adoption of interactive virtual reality techniques could be expected to offer many advantages: artificial lab environments could be designed in a more realistic manner and the consideration of “time to the market”-factors could be improved. On the other hand, with an increasing degree of presence and the notional attendance in a simulated test environment, the market research task could fall prey to the tensing virtual reality adventure. In the following study a 3D-technique is empirically tested for its usability in market research. It will be shown that the interactive 3D-simulation is not biased by the immersion it generates and provides considerably better test results than 2D-stimuli do.

    JVRB, 5(2008), no. 1.

  8. 2007-10-11

    3-D Audio in Mobile Communication Devices: Methods for Mobile Head-Tracking

    Future generations of mobile communication devices will serve more and more as multimedia platforms capable of reproducing high quality audio. In order to achieve a 3-D sound perception the reproduction quality of audio via headphones can be significantly increased by applying binaural technology. To be independent of individual head-related transfer functions (HRTFs) and to guarantee a good performance for all listeners, an adaptation of the synthesized sound field to the listener's head movements is required. In this article several methods of head-tracking for mobile communication devices are presented and compared. A system for testing the identified methods is set up and experiments are performed to evaluate the pros and cons of each method. The implementation of such a device in a 3-D audio system is described and applications making use of such a system are identified and discussed.

    JVRB, 4(2007), no. 13.

  9. 2007-01-02

    Content Classification of Multimedia Documents using Partitions of Low-Level Features

    Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50% - 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.

    JVRB, 3(2006), no. 6.
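
    The bag-of-words representation mentioned in this abstract can be sketched in a few lines. The token names below are hypothetical stand-ins for syllable sequences, video words, and audio words, and the normalisation to relative frequencies is an assumption, since the abstract only states that word-analogue frequencies represent a document.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Map a document's word analogues to a vector of relative frequencies.

    tokens is the list of word analogues found in one document and
    vocabulary fixes the order of the feature vector. Tokens outside
    the vocabulary are ignored.
    """
    counts = Counter(tokens)
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


# Hypothetical vocabulary mixing the three modalities.
vocabulary = ["syl_ta", "syl_ges", "vid_17", "aud_3"]
document = ["syl_ta", "vid_17", "syl_ta", "aud_3"]
features = bag_of_words(document, vocabulary)
```

    A vector like `features` is the kind of input a supervised classifier such as a support vector machine would receive, one vector per document.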

  10. 2005-12-08

    Interactive Ray Tracing for Virtual TV Studio Applications

    In recent years, the well-known ray tracing algorithm has gained new popularity with the introduction of interactive ray tracing methods. The high modularity and the ability to produce highly realistic images make ray tracing an attractive alternative to raster graphics hardware. Interactive ray tracing has also proved its potential in the field of Mixed Reality rendering and provides novel methods for seamless integration of real and virtual content. Actor insertion methods, a subdomain of Mixed Reality and closely related to virtual television studio techniques, can use ray tracing for achieving high output quality in conjunction with appropriate visual cues like shadows and reflections at interactive frame rates. In this paper, we show how interactive ray tracing techniques can provide new ways of implementing virtual studio applications.

    JVRB, 2(2005), no. 1.

HC 2004
  1. 2004-12-13

    ARTHUR: A Collaborative Augmented Environment for Architectural Design and Urban Planning

    Projects in the area of architectural design and urban planning typically engage several architects as well as experts from other professions. While the design and review meetings thus often involve a large number of cooperating participants, the actual design is still done by the individuals in the time in between those meetings using desktop PCs and CAD applications. A real collaborative approach to architectural design and urban planning is often limited to early paper-based sketches. In order to overcome these limitations, we designed and realized the ARTHUR system, an Augmented Reality (AR) enhanced round table to support complex design and planning decisions for architects. While AR has been applied to this area earlier, our approach does not try to replace the use of CAD systems but rather integrates them seamlessly into the collaborative AR environment. The approach is enhanced by intuitive interaction mechanisms that can be easily configured for different application scenarios.

    JVRB, 1(2004), no. 1.

GI VR/AR 2005
  1. 2006-08-23

    Precise Near-to-Head Acoustics with Binaural Synthesis

    For enhanced immersion into a virtual scene more than just the visual sense should be addressed by a Virtual Reality system. Additional auditory stimulation appears to have much potential, as it realizes a multisensory system. This is especially useful when the user does not have to wear any additional hardware, e.g., headphones. Creating a virtual sound scene with spatially distributed sources requires a technique for adding spatial cues to audio signals and an appropriate reproduction. In this paper we present a real-time audio rendering system that combines dynamic crosstalk cancellation and multi-track binaural synthesis for virtual acoustical imaging. This provides the possibility of simulating spatially distributed sources and, in addition to that, near-to-head sources for a freely moving listener in room-mounted virtual environments without using any headphones. A special focus will be put on near-to-head acoustics, and requirements in respect of the head-related transfer function databases are discussed.

    JVRB, 3(2006), no. 2.

  2. 2006-08-11

    Automatic Data Normalization and Parameterization for Optical Motion Tracking

    Methods for optical motion capture often require time-consuming manual processing before the data can be used for subsequent tasks such as retargeting or character animation. These processing steps restrict the applicability of motion capturing especially for dynamic VR-environments with real-time requirements. To solve these problems, we present two additional, fast and automatic processing stages based on our motion capture pipeline presented in [HSK05]. A normalization step aligns the recorded coordinate systems with the skeleton structure to yield a common and intuitive data basis across different recording sessions. A second step computes a parameterization based on automatically extracted main movement axes to generate a compact motion description. Our method does not restrict the placement of marker bodies nor the recording setup, and only requires a short calibration phase.

    JVRB, 3(2006), no. 3.

PerGames 2005
  1. 2006-04-11

    Playing with the Real World

    In this paper we provide a framework that enables the rapid development of applications using non-standard input devices. Flash is chosen as programming language since it can be used for quickly assembling applications. We overcome the difficulties of Flash to access external devices by introducing a very generic concept: The state information generated by input devices is transferred to a PC where a program collects them, interprets them and makes them available on a web server. Application developers can now integrate a Flash component that accesses the data stored in XML format and directly use it in their application.

    JVRB, 3(2006), no. 1.

ACE 2006
  1. 2007-04-25

    Why Death Matters: Understanding Gameworld Experience

    This article presents a study of the staging and implementation of death and the death penalty in a number of popular MMOGs and relates it to players' general experience of gameworlds. Game mechanics, writings and stories by designers and players, and the results of an online survey are analysed and discussed. The study shows that the death penalty is implemented much in the same way across worlds; that death can be both trivial and non-trivial, part of the grind of everyday life, or essential in the creation of heroes, depending on context. Whatever function death may serve, it is argued that death plays an important part in the shaping and emergence of the social culture of a world, and in the individual player's experience of life within it.

    JVRB, 4(2007), no. 3.

EuroITV 2006
  1. 2007-07-31

    Methods and Applications in Interactive Broadcasting

    Interactive TV technology has been addressed in many previous works, but there is sparse research on the topic of interactive content broadcasting and how to support the production process. In this article, the interactive broadcasting process is broadly defined to include studio technology and digital TV applications at consumer set-top boxes. In particular, augmented reality studio technology employs smart-projectors as light sources and blends real scenes with interactive computer graphics that are controlled at end-user terminals. Moreover, TV producer-friendly multimedia authoring tools empower the development of novel TV formats. Finally, the support for user-contributed content raises the potential to revolutionize the hierarchical TV production process by introducing the viewer as part of the content delivery chain.

    JVRB, 4(2007), no. 19.

  2. 2007-07-24

    Exploiting OSGi capabilities from MHP applications

    In this paper we introduce a cooperative environment between the Interactive Digital TV (IDTV) and home networking with the aim of allowing the interaction between interactive TV applications and the controllers of the in-home appliances in a natural way. More specifically, our proposal consists of merging MHP (Multimedia Home Platform), one of the main standard frameworks for IDTV, with OSGi (Open Service Gateway Initiative), the most widely used open platform to set up Residential Gateways. To overcome the radically different nature of these specifications (the function-oriented MHP middleware and the service-oriented OSGi framework), we define a new kind of application, named XbundLET. Although this software bridge is suitable to enable the interaction between MHP and OSGi applications in both directions, we concretely focus on exposing our implementation experience in only one direction: from MHP to the OSGi world.

    JVRB, 4(2007), no. 16.

  3. 2007-07-04

    Semi-Automated Creation of Converged iTV Services: From Macromedia Director Simulations to Services Ready for Broadcast

    While sound and video may capture viewers' attention, interaction can captivate them. This has not been available prior to the advent of Digital Television. In fact, what lies at the heart of the Digital Television revolution is this new type of interactive content, offered in the form of interactive Television (iTV) services. On top of that, the new world of converged networks has created a demand for a new type of converged services on a range of mobile terminals (Tablet PCs, PDAs and mobile phones). This paper aims at presenting a new approach to service creation that allows for the semi-automatic translation of simulations and rapid prototypes created in the accessible desktop multimedia authoring package Macromedia Director into services ready for broadcast. This is achieved by a series of tools that de-skill and speed up the process of creating digital TV user interfaces (UI) and applications for mobile terminals. The benefits of rapid prototyping are essential for the production of these new types of services, and are therefore discussed in the first section of this paper. In the following sections, an overview of the operation of content, service, creation and management sub-systems is presented, which illustrates why these tools compose an important and integral part of a system responsible for creating, delivering and managing converged broadcast and telecommunications services. The next section examines a number of candidate metadata languages for describing the iTV services user interface and the schema language adopted in this project. A detailed description of the operation of the two tools is provided to offer an insight into how they can be used to de-skill and speed up the process of creating digital TV user interfaces and applications for mobile terminals. Finally, representative broadcast-oriented and telecommunication-oriented converged service components are also introduced, demonstrating how these tools have been used to generate different types of services.

    JVRB, 4(2007), no. 17.

  4. 2007-05-22

    Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry

    Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se this does not mean any additional value for consumers. Broadcasting and advertisement industries have not yet found new formats to reach the individual user with their content. In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition, new professions that will emerge in future TV production workflows are described, namely the 'video composer' and the 'live video conductor'.

    JVRB, 4(2007), no. 10.

  5. 2007-05-15

    Video Search: New Challenges in the Pervasive Digital Video Era

    The explosion of multimedia digital content and the development of technologies that go beyond traditional broadcast and TV have rendered access to such content important for all end-users of these technologies. While originally developed for providing access to multimedia digital libraries, video search technologies now assume a more demanding role. In this paper, we attempt to shed light on this new role of video search technologies, looking at the rapid developments in the related market, the lessons learned from state-of-the-art video search prototypes developed mainly in the digital libraries context and the new technological challenges that have arisen. We focus on one of the latter, i.e., the development of cross-media decision mechanisms, drawing examples from REVEAL THIS, an FP6 project on the retrieval of video and language for the home user. We argue that efficient video search holds a key to the usability of the new “pervasive digital video” technologies and that it should involve cross-media decision mechanisms.

    JVRB, 3(2006), no. 11.

  6. 2007-03-23

    How to Improve the Production Process for interactive TV with semi-formal Methods

    The central question for this paper is how to improve the production process by closing the gap between industrial designers and software engineers of television (TV)-based User Interfaces (UI) in an industrial environment. Software engineers are highly interested in whether one UI design can be converted into several fully functional UIs for TV products with different screen properties. The aim of the software engineers is to apply automatic layout and scaling in order to speed up and improve the production process. However, the question is whether a UI design lends itself to such automatic layout and scaling. This is investigated by analysing a prototype UI design done by industrial designers. In a first requirements study, industrial designers had created meta-annotations on top of their UI design in order to disclose their design rationale for discussions with software engineers. In a second study, five (out of ten) industrial designers assessed the potential of four different meta-annotation approaches. The question was which annotation method industrial designers would prefer and whether it could satisfy the technical requirements of the software engineering process. One main result is that the industrial designers preferred the method they were already familiar with, which therefore seems to be the most effective one, although the main objective of automatic layout and scaling could still not be achieved.

    JVRB, 4(2007), no. 8.

  7. 2006-12-22

    Digital Illumination for Augmented Studios

    Virtual studio technology plays an important role for modern television productions. Blue-screen matting is a common technique for integrating real actors or moderators into computer-generated sceneries. Augmented reality offers the possibility to mix real and virtual in a more general context. This article proposes a new technological approach for combining real studio content with computer-generated information. Digital light projection allows a controlled spatial, temporal, chrominance and luminance modulation of illumination – opening new possibilities for TV studios.

    JVRB, 3(2006), no. 8.

  8. 2006-11-23

    An Architecture for End-User TV Content Enrichment

    This paper proposes an extension to the television-watching paradigm that permits an end-user to enrich broadcast content. Examples of this enriched content are: virtual edits that allow the order of presentation within the content to be changed or that allow the content to be subsetted; conditional text, graphic or video objects that can be placed to appear within content and triggered by viewer interaction; additional navigation links that can be added to structure how other users view the base content object. The enriched content can be viewed directly within the context of the TV viewing experience. It may also be shared with other users within a distributed peer group. Our architecture is based on a model that allows the original content to remain unaltered, and which respects DRM restrictions on content reuse. The fundamental approach we use is to define an intermediate content enhancement layer that is based on the W3C’s SMIL language. Using a pen-based enhancement interface, end-users can manipulate content that is saved in a home PDR setting. This paper describes our architecture and provides several examples of how our system handles content enhancement. We also describe a reference implementation for creating and viewing enhancements.

    JVRB, 3(2006), no. 9.

  9. 2006-01-03

    MHP Oriented Interactive Augmented Reality System for Sports Broadcasting Environments

    Television and movie images have been altered ever since it was technically possible. Nowadays embedding advertisements, or incorporating text and graphics in TV scenes, are common practice, but they cannot be considered an integrated part of the scene. The introduction of new services for interactive augmented television is discussed in this paper. We analyse the main aspects related to the whole chain of augmented reality production. Interactivity is one of the most important added values of digital television: this paper aims to break the model where all TV viewers receive the same final image. Thus, we introduce and discuss the new concept of interactive augmented television, i.e. real-time composition of video and computer graphics (e.g. a real scene and freely selectable images or spatially rendered objects) edited and customized by the end user within the context of the user's set-top box and TV receiver.

    JVRB, 3(2006), no. 13.

GI VR/AR 2006
  1. 2008-01-22

    Augmenting a Laser Pointer with a Diffraction Grating for Monoscopic 6DOF Detection

    This article illustrates the detection of 6 degrees of freedom (DOF) for Virtual Environment interactions using a modified simple laser pointer device and a camera. The laser pointer is combined with a diffraction grating to project a unique laser grid onto the projection planes used in projection-based immersive VR setups. The distortion of the projected grid is used to calculate the translational and rotational degrees of freedom required for human-computer interaction purposes.

    JVRB, 4(2007), no. 14.

  2. 2008-01-18

    Interactive Augmentation of Live Images using a HDR Stereo Camera

    Adding virtual objects to real environments plays an important role in today's computer graphics: typical examples are virtual furniture in a real room and virtual characters in real movies. For a believable appearance, consistent lighting of the virtual objects is required. We present an augmented reality system that displays virtual objects with consistent illumination and shadows in the image of a simple webcam. We use two high dynamic range video cameras with fisheye lenses permanently recording the environment illumination. A sampling algorithm selects a few bright parts in one of the wide-angle images and the corresponding points in the second camera image. The 3D position can then be calculated using epipolar geometry. Finally, the selected point lights are used in a multi-pass algorithm to draw the virtual object with shadows. To validate our approach, we compare the appearance and shadows of the synthetic objects with real objects.

    JVRB, 4(2007), no. 12.

  3. 2007-09-27

    Passive-Active Geometric Calibration for View-Dependent Projections onto Arbitrary Surfaces

    In this paper we present a hybrid technique for correcting distortions that appear when projecting images onto geometrically complex, colored and textured surfaces. It analyzes the optical flow that results from perspective distortions during motions of the observer and tries to use this information for computing the correct image warping. If this fails due to an unreliable optical flow, an accurate, but slower and visible, structured light projection is automatically triggered. Together with an appropriate radiometric compensation, view-dependent content can be projected onto arbitrary everyday surfaces. An implementation mainly on the GPU ensures fast frame rates.

    JVRB, 4(2007), no. 6.

  4. 2007-08-21

    Tracking of industrial objects by using CAD models

    In this paper we present a model-based approach for real-time camera pose estimation in industrial scenarios. The line model which is used for tracking is generated by rendering a polygonal model and extracting contours out of the rendered scene. By un-projecting a point on the contour with the depth value stored in the z-buffer, the 3D coordinates of the contour can be calculated. For establishing 2D/3D correspondences the 3D control points on the contour are projected into the image and a perpendicular search for gradient maxima for every point on the contour is performed. Multiple hypotheses of 2D image points corresponding to a 3D control point make the pose estimation robust against ambiguous edges in the image.

    JVRB, 4(2007), no. 1.

GRAPP 2006
  1. 2009-11-12

    A Survey of Image-based Relighting Techniques

    Image-based Relighting (IBRL) has recently attracted a lot of research interest for its ability to relight real objects or scenes, from novel illuminations captured in natural/synthetic environments. Complex lighting effects such as subsurface scattering, interreflection, shadowing, mesostructural self-occlusion, refraction and other relevant phenomena can be generated using IBRL. The main advantage of image-based graphics is that the rendering time is independent of scene complexity as the rendering is actually a process of manipulating image pixels, instead of simulating light transport. The goal of this paper is to provide a complete and systematic overview of the research in Image-based Relighting. We observe that essentially all IBRL techniques can be broadly classified into three categories (Fig. 9), based on how the scene/illumination information is captured: Reflectance function-based, Basis function-based and Plenoptic function-based. We discuss the characteristics of each of these categories and their representative methods. We also discuss the sampling density and types of light source(s), which are relevant issues of IBRL.

    JVRB, 4(2007), no. 7.

  2. 2008-11-14

    Assessing Electromyographic Interfaces

    Electronic appliances are increasingly a part of our everyday lives. In particular, mobile devices, with their reduced dimensions and power rivaling desktop computers, have substantially augmented our communication abilities offering instant availability, anywhere, to everyone. These devices have become essential for human communication but also include a more comprehensive tool set to support productivity and leisure applications. However, the many applications commonly available are not adapted to people with special needs. Rather, most popular devices are targeted at teenagers or young adults with excellent eyesight and coordination. What is worse, most of the commonly used assistive control interfaces are not available in a mobile environment where the user's position, accommodation and capacities can vary widely. To address people with special needs, new approaches and techniques are sorely needed. This paper presents a control interface to allow tetraplegic users to interact with electronic devices. Our method uses myographic information (Electromyography or EMG) collected from residually controlled body areas. User evaluations validate electromyography as a daily wearable interface. In particular, our results show that EMG can be used even in mobility contexts.

    JVRB, 5(2008), no. 12.

  3. 2007-08-30

    Topologically Accurate Dual Isosurfacing Using Ray Intersection

    “Dual contouring” approaches provide an alternative to the standard Marching Cubes (MC) method to extract and approximate an isosurface from trivariate data given on a volumetric mesh. These dual approaches solve some of the problems encountered by the MC methods. We present a simple method based on the MC method and the ray intersection technique to compute isosurface points in the cell interior. One of the advantages of our method is that it does not require us to use a Hermite interpolation scheme, unlike other dual contouring methods. We perform a complete analysis of all possible configurations to generate a look-up table for all configurations. We use the look-up table to optimize the ray-intersection method to obtain the minimum number of points sufficient for defining topologically correct isosurfaces in all possible configurations. Isosurface points are connected using a simple strategy.

    JVRB, 4(2007), no. 4.

  4. 2007-08-28

    High-Level Modeling of Multimodal Interaction Techniques Using NiMMiT

    In the past few years, multimodal interaction has been gaining importance in virtual environments. Although multimodality renders interacting with an environment more natural and intuitive, the development cycle of such an application is often long and expensive. In our overall field of research, we investigate how model-based design can facilitate the development process by designing environments through the use of high-level diagrams. In this scope, we present ‘NiMMiT’, a graphical notation for expressing and evaluating multimodal user interaction; we elaborate on the NiMMiT primitives and demonstrate its use by means of a comprehensive example.

    JVRB, 4(2007), no. 2.

  5. 2007-07-25

    High level methods for scene exploration

    Virtual worlds exploration techniques are used in a wide variety of domains — from graph drawing to robot motion. This paper is dedicated to virtual world exploration techniques which have to help a human being to understand a 3D scene. An improved method of viewpoint quality estimation is presented in the paper, together with a new off-line method for automatic 3D scene exploration, based on a virtual camera. The automatic exploration method is working in two steps. In the first step, a set of “good” viewpoints is computed. The second step uses this set of points of view to compute a camera path around the scene. Finally, we define a notion of semantic distance between objects of the scene to improve the approach.

    JVRB, 3(2006), no. 12.

  6. 2007-01-24

    Exploring Urban Environments Using Virtual and Augmented Reality

    In this paper, we propose the use of a specific system architecture, based on a mobile device, for navigation in urban environments. The aim of this work is to assess how virtual and augmented reality interface paradigms can provide enhanced location-based services using real-time techniques in the context of these two different technologies. The virtual reality interface is based on a faithful graphical representation of the localities of interest, coupled with sensory information on the location and orientation of the user, while the augmented reality interface uses computer vision techniques to capture patterns from the real environment and overlay additional way-finding information, aligned with real imagery, in real-time. The knowledge obtained from the evaluation of the virtual reality navigational experience has been used to inform the design of the augmented reality interface. Initial results of the user testing of the experimental augmented reality system for navigation are presented.

    JVRB, 3(2006), no. 5.

  7. 2007-01-10

    View-Dependent Extraction of Contours with Distance Transforms for adaptive polygonal Mesh-Simplification

    For decades, Distance Transforms have proven to be useful for many image processing applications, and more recently, they have started to be used in computer graphics environments. The goal of this paper is to propose a new technique based on Distance Transforms for detecting mesh elements which are close to the objects' external contour (from a given point of view), and using this information for weighting the approximation error which will be tolerated during the mesh simplification process. The obtained results are evaluated in two ways: visually and using an objective metric that measures the geometrical difference between two polygonal meshes.

    JVRB, 3(2006), no. 4.

  8. 2007-01-03

    System Architecture of a Mixed Reality Framework

    In this paper the software architecture of a framework which simplifies the development of applications in the area of Virtual and Augmented Reality is presented. It is based on VRML/X3D to enable rendering of audio-visual information. We extended our VRML rendering system by a device management system that is based on the concept of a data-flow graph. The aim of the system is to create Mixed Reality (MR) applications simply by plugging together small prefabricated software components, instead of compiling monolithic C++ applications. The flexibility and the advantages of the presented framework are explained on the basis of an exemplary implementation of a classic Augmented Reality application and its extension to a collaborative remote expert scenario.

    JVRB, 3(2006), no. 7.

  9. 2006-12-13

    Lag Camera: A Moving Multi-Camera Array for Scene-Acquisition

    Many applications, such as telepresence, virtual reality, and interactive walkthroughs, require a three-dimensional (3D) model of real-world environments. Methods such as lightfields, geometric reconstruction and computer vision use cameras to acquire visual samples of the environment and construct a model. Unfortunately, obtaining models of real-world locations is a challenging task. In particular, important environments are often actively in use, containing moving objects, such as people entering and leaving the scene. The methods previously listed have difficulty in capturing the color and structure of the environment while in the presence of moving and temporary occluders. We describe a class of cameras called lag cameras. The main concept is to generalize a camera to take samples over space and time. Such a camera can easily and interactively detect moving objects while continuously moving through the environment. Moreover, since both the lag camera and occluder are moving, the scene behind the occluder is captured by the lag camera even from viewpoints where the occluder lies in between the lag camera and the hidden scene. We demonstrate an implementation of a lag camera, complete with analysis and captured environments.

    JVRB, 3(2006), no. 10.

HC 2006
  1. 2009-01-08

    Articulated Narrowcasting for Privacy and Awareness in Multimedia Conferencing Systems and Design for Implementation Within a SIP Framework

    This article proposes a new focus of research for multimedia conferencing systems which allows a participant to flexibly select another participant or a group for media transmission. For example, in a traditional conference system, participants' voices might by default be shared with all others, but one might want to select a subset of the conference members to send his/her media to or receive media from. We review the concept of narrowcasting, a model for limiting such information streams in a multimedia conference, and describe a design to use existing standard protocols (SIP and SDP) for controlling fine-grained narrowcasting sessions.

    JVRB, 5(2008), no. 14.

PerGames 2006
  1. 2007-02-06

    Marker-Based Embodied Interaction for Handheld Augmented Reality Games

    This article deals with embodied user interfaces for handheld augmented reality games, which consist of both physical and virtual components. We have developed a number of spatial interaction techniques that optically capture the device's movement and orientation relative to a visual marker. Such physical interactions in 3-D space enable manipulative control of mobile games. In addition to acting as a physical controller that recognizes multiple game-dependent gestures, the mobile device augments the camera view with graphical overlays. We describe three game prototypes that use ubiquitous product packaging and other passive media as backgrounds for handheld augmentation. The prototypes can be realized on widely available off-the-shelf hardware and require only minimal setup and infrastructure support.

    JVRB, 4(2007), no. 5.

Enactive 2007
  1. 2008-11-27

    Characterizing full-body reach duration across task and viewpoint modalities

    The full-body control of virtual characters is a promising technique for application fields such as Virtual Prototyping. However, it is important to assess to what extent the user's full-body behavior is modified when immersed in a virtual environment. In the present study we have measured reach durations for two types of task (controlling a simple rigid shape vs. a virtual character) and two types of viewpoint (1st person vs. 3rd person). The paper first describes the architecture of the motion capture approach retained for the on-line full-body reach experiment. We then present reach measurement results performed in a non-virtual environment. They show that the target height parameter leads to reach duration variation of ±25% around the average duration for the highest and lowest targets. This characteristic is highly accentuated in the virtual world as analyzed in the discussion section. In particular, the discrepancy observed for the first person viewpoint modality suggests adopting a third person viewpoint when controlling the posture of a virtual character in a virtual environment.

    JVRB, 5(2008), no. 15.

GI VR/AR 2007
  1. 2008-12-10

    Evaluation of Binocular Eye Trackers and Algorithms for 3D Gaze Interaction in Virtual Reality Environments

    Tracking a user’s visual attention is a fundamental aspect in novel human-computer interaction paradigms found in Virtual Reality. For example, multimodal interfaces or dialogue-based communications with virtual and real agents greatly benefit from the analysis of the user’s visual attention as a vital source for deictic references or turn-taking signals. Current approaches to determine visual attention rely primarily on monocular eye trackers. Hence they are restricted to the interpretation of two-dimensional fixations relative to a defined area of projection. The study presented in this article compares precision, accuracy and application performance of two binocular eye tracking devices. Two algorithms are compared which derive depth information as required for visual attention-based 3D interfaces. This information is further applied to an improved VR selection task in which a binocular eye tracker and an adaptive neural network algorithm are used during the disambiguation of partly occluded objects.

    JVRB, 5(2008), no. 16.

  2. 2008-07-24

    Multi-Contact Grasp Interaction for Virtual Environments

    The grasping of virtual objects has been an active research field for several years. Solutions providing realistic grasping rely on special hardware or require time-consuming parameterizations. Therefore, we introduce a flexible grasping algorithm enabling grasping without computationally complex physics. Objects can be grasped and manipulated with multiple fingers. In addition, multiple objects can be manipulated simultaneously with our approach. Through the usage of contact sensors the technique is easily configurable and versatile enough to be used in different scenarios.

    JVRB, 5(2008), no. 7.

GRAPP 2007
  1. 2008-10-23

    Real-Time Joint Coupling of the Spine for Inverse Kinematics

    In this paper we propose a simple model for the coupling behavior of the human spine for an inverse kinematics framework. Our spine model exhibits anatomically correct motions of the vertebrae of virtual mannequins by coupling standard swing and revolute joint models. The adjustment of the joints is made with several simple (in)equality constraints, resulting in a reduction of the solution space dimensionality for the inverse kinematics solver. By reducing the solution space dimensionality to feasible spine shapes, we prevent the inverse kinematics algorithm from providing infeasible postures for the spine. In this paper, we exploit how to apply these simple constraints to the human spine by a strict decoupling of the swing and torsion motion of the vertebrae. We demonstrate the validity of our approach on various experiments.

    JVRB, 5(2008), no. 11.

  2. 2008-08-07

    A Physically Based Transmission Model of Rough Surfaces

    Transparent and translucent objects involve both light reflection and transmission at surfaces. This paper presents a physically based transmission model of rough surfaces. The surface is assumed to be locally smooth, and statistical techniques are applied to calculate light transmission through a local illumination area. We have obtained an analytical expression for single scattering. The analytical model has been compared to our Monte Carlo simulations as well as to the previous simulations, and good agreement has been achieved. The presented model has potential applications for realistic rendering of transparent and translucent objects.

    JVRB, 5(2008), no. 9.

  3. 2008-07-09

    Predictive-DCT Coding for 3D Mesh Sequences Compression

    This paper proposes a new compression algorithm for dynamic 3D meshes. In such a sequence of meshes, neighboring vertices have a strong tendency to behave similarly and the degree of dependencies between their locations in two successive frames is very large, which can be efficiently exploited using a combination of Predictive and DCT coders (PDCT). Our strategy gathers mesh vertices of similar motions into clusters, establishes a local coordinate frame (LCF) for each cluster and encodes frame by frame and each cluster separately. The vertices of each cluster have small variation over time relative to the LCF. Therefore, the location of each new vertex is well predicted from its location in the previous frame relative to the LCF of its cluster. The difference between the original and the predicted local coordinates is then transformed into the frequency domain using DCT. The resulting DCT coefficients are quantized and compressed with entropy coding. The original sequence of meshes can be reconstructed from only a few non-zero DCT coefficients without significant loss in visual quality. Experimental results show that our strategy outperforms or comes close to other coders.

    JVRB, 5(2008), no. 6.

  4. 2008-06-05

    Multi-Mode Tensor Representation of Motion Data

    In this paper, we investigate how a multilinear model can be used to represent human motion data. Based on technical modes (referring to degrees of freedom and number of frames) and natural modes that typically appear in the context of a motion capture session (referring to actor, style, and repetition), the motion data is encoded in the form of a high-order tensor. This tensor is then reduced by using N-mode singular value decomposition. Our experiments show that the reduced model approximates the original motion better than previously introduced PCA-based approaches. Furthermore, we discuss how the tensor representation may be used as a valuable tool for the synthesis of new motions.

    JVRB, 5(2008), no. 5.

  5. 2008-03-20

    Adaptive Cube Tessellation for Topologically Correct Isosurfaces

    Three-dimensional datasets representing scalar fields are frequently rendered using isosurfaces. For datasets arranged as a cubic lattice, the marching cubes algorithm is the most used isosurface extraction method. However, the marching cubes algorithm produces some ambiguities which have been solved using different approaches that normally imply a more complex process. One of them is to tessellate the cubes into tetrahedra, and by using a similar method (marching tetrahedra), to build the isosurface. The main drawback of other tessellations is that they do not produce the same isosurface topologies as those generated by improved marching cubes algorithms. We propose an adaptive tessellation that, being independent of the isovalue, preserves the topology. Moreover, the tessellation allows the isosurface to evolve continuously when the isovalue is changed continuously.

    JVRB, 5(2008), no. 3.

  6. 2008-03-12

    Rendering Falling Leaves on Graphics Hardware

    There is a growing interest in simulating natural phenomena in computer graphics applications. Animating natural scenes in real time is one of the most challenging problems due to the inherent complexity of their structure, formed by millions of geometric entities, and the interactions that happen within. An example of a natural scenario that is needed for games or simulation programs is a forest. Forests are difficult to render because of the huge number of geometric entities and the large amount of detail to be represented. Moreover, the interactions between the objects (grass, leaves) and external forces such as wind are complex to model. In this paper we concentrate on the rendering of falling leaves at low cost. We present a technique that exploits graphics hardware in order to render thousands of leaves with different falling paths in real time and with low memory requirements.

    JVRB, 5(2008), no. 2.

  7. 2008-01-29

    Fitting 3D morphable models using implicit representations

    We consider the problem of approximating the 3D scan of a real object through an affine combination of examples. Common approaches depend either on the explicit estimation of point-to-point correspondences or on 2-dimensional projections of the target mesh; both present drawbacks. We follow an approach similar to [IF03] by representing the target via an implicit function, whose values at the vertices of the approximation are used to define a robust cost function. The problem is approached in two steps, by approximating first a coarse implicit representation of the whole target, and then finer, local ones; the local approximations are then merged together with a Poisson-based method. We report the results of applying our method on a subset of 3D scans from the Face Recognition Grand Challenge v.1.0.

    JVRB, 4(2007), no. 18.

  8. 2008-01-24

    The art to keep in touch: The good use of Lagrange multipliers

    Physically-based modeling for computer animation makes it possible to produce more realistic motions in less time without requiring the expertise of skilled animators. But a computer animation is not only a numerical simulation based on classical mechanics since it follows a precise story-line. One common way to define aims in an animation is to add geometric constraints. There are several methods to manage these constraints within a physically-based framework. In this paper, we present an algorithm for constraints handling based on Lagrange multipliers. After a few remarks on the equations of motion that we use, we present a first algorithm proposed by Platt. We show with a simple example that this method is not reliable. Our contribution consists in improving this algorithm to provide an efficient and robust method to handle simultaneous active constraints.

    JVRB, 4(2007), no. 15.

PerGames 2007
  1. 2008-11-26

    The Design of Networked Exertion Games

    Incorporating physical activity and exertion into pervasive gaming applications can provide health and social benefits. Prior research has resulted in several prototypes of pervasive games that encourage exertion as interaction form; however, no detailed critical account of the various approaches exists. We focus on networked exertion games and detail some of our work while identifying the remaining issues towards providing a coherent framework. We outline common lessons learned and use them as the basis for generalizations for the design of networked exertion games. We propose possible directions of further investigation, hoping to provide guidance for future work to facilitate greater awareness and exposure of exertion games and their benefits.

    JVRB, 5(2008), no. 13.

  2. 2008-08-08

    Gesture-Based, Touch-Free Multi-User Gaming on Wall-Sized, High-Resolution Tiled Displays

    Having to carry input devices can be inconvenient when interacting with wall-sized, high-resolution tiled displays. Such displays are typically driven by a cluster of computers. Running existing games on a cluster is non-trivial, and the performance attained using software solutions like Chromium is not good enough. This paper presents a touch-free, multi-user, human-computer interface for wall-sized displays that enables completely device-free interaction. The interface is built using 16 cameras and a cluster of computers, and is integrated with the games Quake 3 Arena (Q3A) and Homeworld. The two games were parallelized using two different approaches in order to run on a 7x4 tile, 21 megapixel display wall with good performance. The touch-free interface enables interaction with a latency of 116 ms, where 81 ms are due to the camera hardware. The rendering performance of the games is compared to their sequential counterparts running on the display wall using Chromium. Parallel Q3A’s framerate is an order of magnitude higher compared to using Chromium. The parallel version of Homeworld performed on par with the sequential, which did not run at all using Chromium. Informal use of the touch-free interface indicates that it works better for controlling Q3A than Homeworld.

    JVRB, 5(2008), no. 10.

  3. 2008-07-25

    The TViews Table Role-Playing Game

    The TViews Table Role-Playing Game (TTRPG) is a digital tabletop role-playing game that runs on the TViews table, bridging the separate worlds of traditional role-playing games with the growing area of massively multiplayer online role-playing games. The TViews table is an interactive tabletop media platform that can track the location of multiple tagged objects in real-time as they are moved around its surface, providing a simultaneous and coincident graphical display. In this paper we present the implementation of the first version of TTRPG, with a content set based on the traditional Dungeons & Dragons rule-set. We also discuss the results of a user study that used TTRPG to explore the possible social context of digital tabletop role-playing games.

    JVRB, 5(2008), no. 8.

  4. 2008-04-02

    RFIDice - Augmenting Tabletop Dice with RFID

    Augmented dice allow players of tabletop games to have the result of a roll be automatically recorded by a computer, e.g., for supporting strategy games. We have built a set of three augmented-dice-prototypes based on radio frequency identification (RFID) technology, which allows us to build robust, cheap, and small augmented dice. Using a corresponding readout infrastructure and a sample application, we have evaluated our approach and show its advantages over other dice augmentation methods discussed in the literature.

    JVRB, 5(2008), no. 4.

CVMP 2008
  1. 2010-10-26

    Effects of camera aperture correction on keying and compositing of broadcast video

    This contribution discusses the effects of camera aperture correction in broadcast video on colour-based keying. The aperture correction is used to ‘sharpen’ an image and is one element that distinguishes the ‘TV-look’ from ‘film-look’. If a very high level of sharpening is applied, as is the case in many TV productions, then this significantly shifts the colours around object boundaries with high contrast. This paper discusses these effects and their impact on keying and describes a simple low-pass filter to compensate for them. Tests with colour-based segmentation algorithms show that the proposed compensation is an effective way of decreasing the keying artefacts on object boundaries.

    JVRB, 7(2010), no. 9.

  2. 2010-10-01

    Algorithms For Automatic And Robust Registration Of 3D Head Scans

    Two methods for registering laser-scans of human heads and transforming them to a new semantically consistent topology defined by a user-provided template mesh are described. Both algorithms are stated within the Iterative Closest Point framework. The first method is based on finding landmark correspondences by iteratively registering the vicinity of a landmark with a re-weighted error function. Thin-plate spline interpolation is then used to deform the template mesh and finally the scan is resampled in the topology of the deformed template. The second algorithm employs a morphable shape model, which can be computed from a database of laser-scans using the first algorithm. It directly optimizes pose and shape of the morphable model. The use of the algorithm with PCA mixture models, where the shape is split up into regions each described by an individual subspace, is addressed. Mixture models require either blending or regularization strategies, both of which are described in detail. For both algorithms, strategies for filling in missing geometry for incomplete laser-scans are described. While an interpolation-based approach can be used to fill in small or smooth regions, the model-driven algorithm is capable of fitting a plausible complete head mesh to arbitrarily small geometry, which is known as "shape completion". The importance of regularization in the case of extreme shape completion is shown.

    JVRB, 7(2010), no. 7.

  3. 2010-09-22

    Reflectance Transfer for Material Editing and Relighting

    We present a new approach to diffuse reflectance estimation for dynamic scenes. Non-parametric image statistics are used to transfer reflectance properties from a static example set to a dynamic image sequence. The approach allows diffuse reflectance estimation for surface materials with inhomogeneous appearance, such as those which commonly occur with patterned or textured clothing. Material editing is also possible by transferring edited reflectance properties. Material reflectance properties are initially estimated from static images of the subject under multiple directional illuminations using photometric stereo. The estimated reflectance together with the corresponding image under uniform ambient illumination form a prior set of reference material observations. Material reflectance properties are then estimated for video sequences of a moving person captured under uniform ambient illumination by matching the observed local image statistics to the reference observations. Results demonstrate that the transfer of reflectance properties enables estimation of the dynamic surface normals and subsequent relighting combined with material editing. This approach overcomes limitations of previous work on material transfer and relighting of dynamic scenes which was limited to surfaces with regions of homogeneous reflectance. We evaluate our approach for relighting 3D model sequences reconstructed from multiple view video. Comparison to previous model relighting demonstrates improved reproduction of detailed texture and shape dynamics.

    JVRB, 7(2010), no. 6.

  4. 2010-07-19

    Increasing Realism and Supporting Content Planning for Dynamic Scenes in a Mixed Reality System incorporating a Time-of-Flight Camera

    For broadcasting purposes, Mixed Reality, the combination of real and virtual scene content, has become ubiquitous nowadays. Mixed Reality recording still requires expensive studio setups and is often limited to simple color keying. We present a system for Mixed Reality applications which uses depth keying and provides three-dimensional mixing of real and artificial content. It features enhanced realism through automatic shadow computation which we consider a core issue to obtain realism and a convincing visual perception, besides the correct alignment of the two modalities and correct occlusion handling. Furthermore we present a possibility to support placement of virtual content in the scene. The core feature of our system is the incorporation of a Time-of-Flight (ToF) camera device. This device delivers real-time depth images of the environment at a reasonable resolution and quality. This camera is used to build a static environment model and it also allows correct handling of mutual occlusions between real and virtual content, shadow computation and enhanced content planning. The presented system is inexpensive, compact, mobile, flexible and provides convenient calibration procedures. Chroma-keying is replaced by depth-keying which is efficiently performed on the Graphics Processing Unit (GPU) by the usage of an environment model and the current ToF-camera image. Automatic extraction and tracking of dynamic scene content is herewith performed and this information is used for planning and alignment of virtual content. An additional sustainable feature is that depth maps of the mixed content are available in real-time, which makes the approach suitable for future 3DTV productions. The presented paper gives an overview of the whole system approach including camera calibration, environment model generation, real-time keying and mixing of virtual and real content, shadowing for virtual content and dynamic object tracking for content planning.

    JVRB, 7(2010), no. 4.
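
    The depth-keying idea in the abstract above can be illustrated with a minimal CPU sketch. It is not the authors' GPU implementation: the function names, the tolerance value, and the list-of-rows image layout are assumptions made for illustration only. A pixel counts as dynamic foreground when its live depth is clearly closer than the static environment model, and real and virtual content are mixed with a per-pixel nearest-depth test, which also yields a depth map of the mixed result.

```python
def depth_key(live_depth, background_depth, tolerance=0.05):
    """Return a boolean mask that is True where the live depth is closer
    than the static environment model by more than `tolerance`.
    The tolerance (same unit as the depth maps) is a hypothetical value
    that absorbs sensor noise."""
    return [
        [live < background - tolerance
         for live, background in zip(live_row, background_row)]
        for live_row, background_row in zip(live_depth, background_depth)
    ]


def mix_by_depth(real_color, real_depth, virtual_color, virtual_depth):
    """Mix real and virtual content with a per-pixel depth test.
    Pixels without virtual content are expected to carry a virtual depth
    of float('inf'). Returns the mixed colour image and its depth map."""
    mixed_color, mixed_depth = [], []
    for rc_row, rd_row, vc_row, vd_row in zip(
            real_color, real_depth, virtual_color, virtual_depth):
        color_row, depth_row = [], []
        for rc, rd, vc, vd in zip(rc_row, rd_row, vc_row, vd_row):
            if rd <= vd:
                # Real content is nearer (or equally near): it occludes.
                color_row.append(rc)
                depth_row.append(rd)
            else:
                # Virtual content is nearer: it occludes the real scene.
                color_row.append(vc)
                depth_row.append(vd)
        mixed_color.append(color_row)
        mixed_depth.append(depth_row)
    return mixed_color, mixed_depth
```

    Calling `depth_key` with the current depth frame and the environment model gives the foreground mask; `mix_by_depth` then resolves mutual occlusions between real and virtual content.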

  5. 2010-07-16

    An Empirical Study of Non-Rigid Surface Feature Matching of Human from 3D Video

    This paper presents an empirical study of affine invariant feature detectors to perform matching on video sequences of people with non-rigid surface deformation. Recent advances in feature detection and wide baseline matching have focused on static scenes. Video frames of human movement capture highly non-rigid deformation such as loose hair, cloth creases, skin stretching and free flowing clothing. This study evaluates the performance of six widely used feature detectors for sparse temporal correspondence on single view and multiple view video sequences. Quantitative evaluation is performed of both the number of features detected and their temporal matching, with and without ground truth correspondence. Recall-accuracy analysis of feature matching is reported for temporal correspondence on single view and multiple view sequences of people with variation in clothing and movement. This analysis identifies that existing feature detection and matching algorithms are unreliable for fast movement with common clothing.

    JVRB, 7(2010), no. 3.

  6. 2010-03-23

    Registration of Sub-Sequence and Multi-Camera Reconstructions for Camera Motion Estimation

    This paper presents different application scenarios for which the registration of sub-sequence reconstructions or multi-camera reconstructions is essential for successful camera motion estimation and 3D reconstruction from video. The registration is achieved by merging unconnected feature point tracks between the reconstructions. One application is drift removal for sequential camera motion estimation of long sequences. The state-of-the-art in drift removal is to apply a RANSAC approach to find unconnected feature point tracks. In this paper an alternative spectral algorithm for pairwise matching of unconnected feature point tracks is used. It is then shown that the algorithms can be combined and applied to novel scenarios where independent camera motion estimations must be registered into a common global coordinate system. In the first scenario multiple moving cameras, which capture the same scene simultaneously, are registered. A second new scenario occurs in situations where the tracking of feature points during sequential camera motion estimation fails completely, e.g., due to large occluding objects in the foreground, and the unconnected tracks of the independent reconstructions must be merged. In the third scenario image sequences of the same scene, which are captured under different illuminations, are registered. Several experiments with challenging real video sequences demonstrate that the presented techniques work in practice.

    JVRB, 7(2010), no. 2.

GI VR/AR 2008
  1. 2010-03-18

    GPU-based Ray Tracing of Dynamic Scenes

    Interactive ray tracing of non-trivial scenes is just becoming feasible on single graphics processing units (GPUs). Recent work in this area focuses on building effective acceleration structures, which work well under the constraints of current GPUs. Most approaches are targeted at static scenes and only allow navigation in the virtual scene. So far, support for dynamic scenes has not been considered for GPU implementations. We have developed a GPU-based ray tracing system for dynamic scenes consisting of a set of individual objects. Each object may independently move around, but its geometry and topology are static.

    JVRB, 7(2010), no. 1.

GRAPP 2008
  1. 2009-03-18

    Quasi-Convolution Pyramidal Blurring

    Efficient image blurring techniques based on the pyramid algorithm can be implemented on modern graphics hardware; thus, image blurring with arbitrary blur width is possible in real time even for large images. However, pyramidal blurring methods do not achieve the image quality provided by convolution filters; in particular, the shape of the corresponding filter kernel varies locally, which potentially results in objectionable rendering artifacts. In this work, a new analysis filter is designed that significantly reduces this variation for a particular pyramidal blurring technique. Moreover, the pyramidal blur algorithm is generalized to allow for a continuous variation of the blur width. Furthermore, an efficient implementation for programmable graphics hardware is presented. The proposed method is named “quasi-convolution pyramidal blurring” since the resulting effect is very close to image blurring based on a convolution filter for many applications.

    JVRB, 6(2009), no. 6.

VRIC 2008
  1. 2010-03-19

    Considering Stage Direction as Building Informed Virtual Environments

    This article begins with some recent considerations about real-time music, inspired by the latest contribution of French composer Philippe Manoury. Then, through the case study of the scenic performance La Traversée de la nuit, we analyse some perspectives for designing an Informed Virtual Environment dedicated to the artistic domain of live shows.

    JVRB, 6(2009), no. 10.

  2. 2010-01-29

    VR Based Visualization and Exploration of Plant Biological Data

    This paper investigates the use of virtual reality (VR) technologies to facilitate the analysis of plant biological data in distinctive steps in the application pipeline. Reconstructed three-dimensional biological models (primary polygonal models) transferred to a virtual environment support scientists' collaborative exploration of biological datasets so that they obtain accurate analysis results and uncover information hidden in the data. Examples of the use of virtual reality in practice are provided and a complementary user study was performed.

    JVRB, 6(2009), no. 8.

  3. 2009-12-03

    Survey on haptic rendering of data sets: Exploration of scalar and vector fields

    Complementary to automatic extraction processes, Virtual Reality technologies provide an adequate framework to integrate human perception in the exploration of large data sets. In such a multisensory system, thanks to intuitive interactions, a user can take advantage of all his perceptual abilities in the exploration task. In this context, haptic perception, coupled with visual rendering, has been investigated for the last two decades, with significant achievements. In this paper, we present a survey of the use of haptic feedback in the exploration of large data sets. For each haptic technique introduced, we describe its principles and its effectiveness.

    JVRB, 6(2009), no. 9.

  4. 2009-02-20

    Spatial audition in a static virtual environment: the role of auditory-visual interaction

    The integration of the auditory modality in virtual reality environments is known to promote the sensations of immersion and presence. However, it is also known from psychophysics studies that auditory-visual interaction obeys complex rules and that multisensory conflicts may disrupt the participant's adherence to the presented virtual scene. It is thus important to measure the accuracy of the auditory spatial cues reproduced by the auditory display and their consistency with the spatial visual cues. This study evaluates auditory localization performance under various unimodal and auditory-visual bimodal conditions in a virtual reality (VR) setup using a stereoscopic display and binaural reproduction over headphones in static conditions. The auditory localization performance observed in the present study is in line with that reported in real conditions, suggesting that VR gives rise to consistent auditory and visual spatial cues. These results validate the use of VR for future psychophysics experiments with auditory and visual stimuli. They also emphasize the importance of a spatially accurate auditory and visual rendering for VR setups.

    JVRB, 6(2009), no. 5.

  5. 2009-02-05

    Transfer of spatial knowledge from a virtual environment to reality: Impact of route complexity and subject’s strategy on the exploration mode

    The use of virtual reality as a tool in the area of spatial cognition raises the question of the quality of learning transfer from a virtual to a real environment. It is first necessary to determine, with healthy subjects, the cognitive aids that improve the quality of transfer and the conditions required, especially since virtual reality can be used as an effective tool in cognitive rehabilitation. The purpose of this study was to investigate the influence of the exploration mode of the virtual environment (Passive vs. Active) according to Route complexity (Simple vs. Complex) on the quality of spatial knowledge transfer in three spatial tasks. Ninety subjects (45 men and 45 women) participated. Spatial learning was evaluated by Wayfinding, Sketch-mapping and Picture classification tasks in the context of the Bordeaux district. In the Wayfinding task, results indicated that active learning in a Virtual Environment (VE) improved performance compared to the passive learning condition, irrespective of the route complexity factor. In the Sketch-mapping task, active learning in a VE helped the subjects to transfer their spatial knowledge from the VE to reality, but only when the route was complex. In the Picture classification task, active learning in a VE when the route was complex did not help the subjects to transfer their spatial knowledge. These results are explained in terms of knowledge levels and frame/strategy of reference [SW75, PL81, TH82].

    JVRB, 6(2009), no. 4.

  6. 2009-02-04

    Gaze behavior nonlinear dynamics assessed in virtual immersion as a diagnostic index of sexual deviancy: preliminary results

    This paper presents preliminary results about the use of virtual characters, penile plethysmography and gaze behaviour dynamics to assess deviant sexual preferences. Pedophile patients’ responses are compared to those of non-deviant subjects while they were immersed with virtual characters depicting relevant sexual features.

    JVRB, 6(2009), no. 3.

  7. 2009-02-03

    Real Walking through Virtual Environments by Redirection Techniques

    We present redirection techniques that support exploration of large-scale virtual environments (VEs) by means of real walking. We quantify to what degree users can unknowingly be redirected in order to guide them through VEs in which virtual paths differ from the physical paths. We further introduce the concept of dynamic passive haptics by which any number of virtual objects can be mapped to real physical proxy props having similar haptic properties (i.e., size, shape, and surface structure), such that the user can sense these virtual objects by touching their real world counterparts. Dynamic passive haptics provides the user with the illusion of interacting with a desired virtual object by redirecting her to the corresponding proxy prop. We describe the concepts of generic redirected walking and dynamic passive haptics and present experiments in which we have evaluated these concepts. Furthermore, we discuss implications that have been derived from a user study, and we present approaches that derive physical paths which may vary from the virtual counterparts.

    JVRB, 6(2009), no. 2.

  8. 2009-01-26

    The MIRELA framework: modeling and analyzing mixed reality applications using timed automata

    Mixed Reality (MR) aims to link virtual entities with the real world and has many applications in domains such as the military and medicine [JBL+00, NFB07]. In many MR systems, and more precisely in augmented scenes, one needs the application to render the virtual part accurately at the right time. To achieve this, such systems acquire data related to the real world from a set of sensors before rendering virtual entities. A suitable system architecture should minimize the delays to keep the overall system delay (also called end-to-end latency) within the requirements for real-time performance. In this context, we propose a compositional modeling framework for MR software architectures in order to specify, simulate and validate formally the time constraints of such systems. Our approach is first based on a functional decomposition of such systems into generic components. The obtained elements as well as their typical interactions give rise to generic representations in terms of timed automata. A whole system is then obtained as a composition of such defined components. To write specifications, a textual language named MIRELA (MIxed REality LAnguage) is proposed along with the corresponding compilation tools. The generated output contains timed automata in UPPAAL format for simulation and verification of time constraints. These automata may also be used to generate source code skeletons for an implementation on an MR platform. The approach is illustrated first on a small example. A realistic case study is also developed. It is modeled by several timed automata synchronizing through channels and including a large number of time constraints. Both systems have been simulated in UPPAAL and checked against the required behavioral properties.

    JVRB, 6(2009), no. 1.

CVMP 2009
  1. 2013-06-28

    Constructing And Rendering Vectorised Photographic Images

    We address the problem of representing captured images in the continuous mathematical space more usually associated with certain forms of drawn ('vector') images. Such an image is resolution-independent so can be used as a master for varying resolution-specific formats. We briefly describe the main features of a vectorising codec for photographic images, whose significance is that drawing programs can access images and image components as first-class vector objects. This paper focuses on the problem of rendering from the isochromic contour form of a vectorised image and demonstrates a new fill algorithm which could also be used in drawing generally. The fill method is described in terms of level set diffusion equations for clarity. Finally we show that image warping is both simplified and enhanced in the vector form and that we can demonstrate real histogram equalisation with genuinely rectangular histograms straightforwardly.

    JVRB, 9(2012), no. 3.

  2. 2013-06-27

    A multi-modal approach to perceptual tone mapping

    We present an improvement of TSTM, a recently proposed tone mapping operator for High Dynamic Range (HDR) images, based on a multi-modal analysis. One of the key features of TSTM is a suitable implementation of the Naka-Rushton equation that mimics the visual adaptation performed by the human visual system coherently with Weber-Fechner's law of contrast perception. In the present paper we use the Gaussian Mixture Model (GMM) in order to detect the modes of the log-scale luminance histogram of a given HDR image and then we use the information provided by GMM to properly devise a Naka-Rushton equation for each mode. Finally, we properly select the parameters in order to merge those equations into a continuous function. We provide and discuss tests showing how this new method improves on the performance of TSTM, as well as comparisons with state-of-the-art methods.

    JVRB, 9(2012), no. 7.
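
    For readers unfamiliar with the Naka-Rushton equation mentioned in the abstract above, its standard textbook form is given below. The symbols are the conventional ones and are not taken from the paper; how TSTM chooses the parameters for each histogram mode is not stated in this listing.

```latex
% Standard Naka-Rushton response: I is the input luminance, \sigma the
% semi-saturation level (the luminance giving half the maximum response),
% n a sensitivity exponent, and R_{\max} the maximum response.
R(I) = R_{\max} \, \frac{I^{n}}{I^{n} + \sigma^{n}}
```

    The response rises monotonically from 0 towards R_max and equals R_max / 2 at I = σ, which is why σ is usually tied to the adaptation luminance.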

  3. 2013-04-25

    Sharpness Matching in Stereo Images

    When stereo images are captured under less than ideal conditions, there may be inconsistencies between the two images in brightness, contrast, blurring, etc. When stereo matching is performed between the images, these variations can greatly reduce the quality of the resulting depth map. In this paper we propose a method for correcting sharpness variations in stereo image pairs which is performed as a pre-processing step to stereo matching. Our method is based on scaling the 2D discrete cosine transform (DCT) coefficients of both images so that the two images have the same amount of energy in each of a set of frequency bands. Experiments show that applying the proposed correction method can greatly improve the disparity map quality when one image in a stereo pair is more blurred than the other.

    JVRB, 9(2012), no. 4.
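
    The band-energy equalisation described in the abstract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' method: the band layout (square bands indexed by max(u, v)), the number of bands, and the choice of the larger of the two band energies as the common target are all hypothetical, and the pure-Python DCT is written for small square images only.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(img):
    """2D DCT of a square image given as a list of rows."""
    c = dct_matrix(len(img))
    return matmul(matmul(c, img), transpose(c))


def idct2(coef):
    """Inverse of dct2 (the basis is orthonormal, so the inverse is the transpose)."""
    c = dct_matrix(len(coef))
    return matmul(matmul(transpose(c), coef), c)


def band_of(u, v, n, n_bands):
    """Assumed band layout: square frequency bands indexed by max(u, v)."""
    return min(n_bands - 1, max(u, v) * n_bands // n)


def band_energies(coef, n_bands):
    """Sum of squared DCT coefficients per band, excluding the DC term."""
    n = len(coef)
    energy = [0.0] * n_bands
    for u in range(n):
        for v in range(n):
            if u or v:
                energy[band_of(u, v, n, n_bands)] += coef[u][v] ** 2
    return energy


def match_sharpness(left, right, n_bands=4):
    """Scale the DCT coefficients of both images so that each band carries
    the same energy in both. Target energy = the larger of the two
    (an assumption made for this sketch)."""
    cl, cr = dct2(left), dct2(right)
    el, er = band_energies(cl, n_bands), band_energies(cr, n_bands)
    n = len(left)
    for u in range(n):
        for v in range(n):
            if not (u or v):
                continue  # leave the DC term (mean brightness) untouched
            b = band_of(u, v, n, n_bands)
            target = max(el[b], er[b])
            if el[b] > 0:
                cl[u][v] *= math.sqrt(target / el[b])
            if er[b] > 0:
                cr[u][v] *= math.sqrt(target / er[b])
    return idct2(cl), idct2(cr)
```

    Passing a sharp image and a blurred copy to `match_sharpness` returns two images whose per-band DCT energies agree, which is the property the abstract describes as the goal of the pre-processing step.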

  4. 2012-02-22

    Cosine Lobe Based Relighting from Gradient Illumination Photographs

    We present an image-based method for relighting a scene by analytically fitting cosine lobes to the reflectance function at each pixel, based on gradient illumination photographs. Realistic relighting results for many materials are obtained using a single per-pixel cosine lobe obtained from just two color photographs: one under uniform white illumination and the other under colored gradient illumination. For materials with wavelength-dependent scattering, a better fit can be obtained using independent cosine lobes for the red, green, and blue channels, obtained from three achromatic gradient illumination conditions instead of the colored gradient condition. We explore two cosine lobe reflectance functions, both of which allow an analytic fit to the gradient conditions. One is non-zero over half the sphere of lighting directions, which works well for diffuse and specular materials, but fails for materials with broader scattering such as fur. The other is non-zero everywhere, which works well for broadly scattering materials and still produces visually plausible results for diffuse and specular materials. We also perform an approximate diffuse/specular separation of the reflectance, and estimate scene geometry from the recovered photometric normals to produce hard shadows cast by the geometry, while still reconstructing the input photographs exactly.

    JVRB, 9(2012), no. 2.

  5. 2011-01-31

    Visual Fixation for 3D Video Stabilization

    Visual fixation is employed by humans and some animals to keep a specific 3D location at the center of the visual gaze. Inspired by this phenomenon in nature, this paper explores the idea of transferring this mechanism to the context of video stabilization for a handheld video camera. A novel approach is presented that stabilizes a video by fixating on automatically extracted 3D target points. This approach is different from existing automatic solutions that stabilize the video by smoothing. To determine the 3D target points, the recorded scene is analyzed with a state-of-the-art structure-from-motion algorithm, which estimates camera motion and reconstructs a 3D point cloud of the static scene objects. Special algorithms are presented that search either virtual or real 3D target points, which back-project close to the center of the image for as long a period of time as possible. The stabilization algorithm then transforms the original images of the sequence so that these 3D target points are kept exactly in the center of the image, which, in case of real 3D target points, produces a perfectly stable result at the image center. Furthermore, different methods of additional user interaction are investigated. It is shown that the stabilization process can easily be controlled and that it can be combined with state-of-the-art tracking techniques in order to obtain a powerful image stabilization tool. The approach is evaluated on a variety of videos taken with a hand-held camera in natural scenes.

    JVRB, 8(2011), no. 2.

GI VR/AR 2009
  1. 2012-02-10

    XSAMPL3D: An Action Description Language for the Animation of Virtual Characters

    In this paper we present XSAMPL3D, a novel language for the high-level representation of actions performed on objects by (virtual) humans. XSAMPL3D was designed to serve as action representation language in an imitation-based approach to character animation: First, a human demonstrates a sequence of object manipulations in an immersive Virtual Reality (VR) environment. From this demonstration, an XSAMPL3D description is automatically derived that represents the actions in terms of high-level action types and involved objects. The XSAMPL3D action description can then be used for the synthesis of animations where virtual humans of different body sizes and proportions reproduce the demonstrated action. Actions are encoded in a compact and human-readable XML format. Thus, XSAMPL3D descriptions are also amenable to manual authoring, e.g. for rapid prototyping of animations when no immersive VR environment is at the animator's disposal. However, when XSAMPL3D descriptions are derived from VR interactions, they can accommodate many details of the demonstrated action, such as motion trajectories, hand shapes and other hand-object relations during grasping. Such detail would be hard to specify with manual motion authoring techniques only. Through the inclusion of language features that allow the representation of all relevant aspects of demonstrated object manipulations, XSAMPL3D is a suitable action representation language for the imitation-based approach to character animation.

    JVRB, 9(2012), no. 1.

  2. 2011-01-25

    Real-time Human Motion Capture with Simple Marker Sets and Monocular Video

    In this paper we present a hybrid method to track human motions in real-time. With simplified marker sets and monocular video input, the strengths of both marker-based and marker-free motion capturing are utilized: A cumbersome marker calibration is avoided while the robustness of the marker-free tracking is enhanced by referencing the tracked marker positions. An improved inverse kinematics solver is employed for real-time pose estimation. A computer-vision-based approach is applied to refine the pose estimation and reduce the ambiguity of the inverse kinematics solutions. We use this hybrid method to capture typical table tennis upper body movements in a real-time virtual reality application.

    JVRB, 8(2011), no. 1.

GRAPP 2009
  1. 2010-10-27

    Spare Time Activity Sheets from Photo Albums

    Given arbitrary pictures, we explore the possibility of using new techniques from computer vision and artificial intelligence to create customized visual games on-the-fly. This includes popular games such as coloring books, link-the-dots and spot-the-difference. The feasibility of these systems is discussed and we describe prototype implementations that work well in practice in an automatic or semi-automatic way.

    JVRB, 7(2010), no. 10.

VRIC 2009
  1. 2011-02-24

    Intelligent Virtual Patients for Training Clinical Skills

    The article presents the design process of intelligent virtual human patients that are used for the enhancement of clinical skills. The description covers the development from conceptualization and character creation to technical components and the application in clinical research and training. The aim is to create believable social interactions with virtual agents that help the clinician to develop skills in symptom and ability assessment, diagnosis, interview techniques and interpersonal communication. The virtual patient fulfills the requirements of a standardized patient producing consistent, reliable and valid interactions in portraying symptoms and behaviour related to a specific clinical condition.

    JVRB, 8(2011), no. 3.

  2. 2010-10-21

    Efficient Bimanual Symmetric 3D Manipulation for Bare-Handed Interaction

    Recently, stable markerless 6 DOF video-based hand-tracking devices became available. These devices simultaneously track the positions and orientations of both of the user's hands in different postures with at least 25 frames per second. Such hand-tracking allows for using the human hands as natural input devices. However, the absence of physical buttons for performing click actions and state changes poses severe challenges in designing an efficient and easy to use 3D interface on top of such a device. In particular, for coupling and decoupling a virtual object’s movements to the user’s hand (i.e. grabbing and releasing) a solution has to be found. In this paper, we introduce a novel technique for efficient two-handed grabbing and releasing of objects and intuitively manipulating them in the virtual space. This technique is integrated in a novel 3D interface for virtual manipulations. A user experiment shows the superior applicability of this new technique. Last but not least, we describe how this technique can be exploited in practice to improve interaction by integrating it with RTT DeltaGen, a professional CAD/CAS visualization and editing tool.

    JVRB, 7(2010), no. 8.

  3. 2010-08-19

    Virtual characters designed for forensic assessment and rehabilitation of sex offenders: standardized and made-to-measure

    This paper presents two studies pertaining to the use of virtual characters applied in clinical forensic rehabilitation of sex offenders. The first study is about the validation of the perceived age of virtual characters designed to simulate primary and secondary sexual characteristics of typical adult and child individuals. The second study puts these virtual characters to use in comparing a group of sex offenders and a group of non-deviant individuals on their sexual arousal responses as recorded in virtual immersion. Finally, two clinical vignettes illustrating the use of made-to-measure virtual characters to more closely fit sexual preferences are presented in the discussion.

    JVRB, 7(2010), no. 5.

GI VR/AR 2010
  1. 2012-11-28

    OCTAVIS: Optimization Techniques for Multi-GPU Multi-View Rendering

    We present a high-performance yet low-cost system for multi-view rendering in virtual reality (VR) applications. In contrast to complex CAVE installations, which are typically driven by one render client per view, we arrange eight displays in an octagon around the viewer to provide a full 360° projection, and we drive these eight displays by a single PC equipped with multiple graphics units (GPUs). In this paper we describe the hardware and software setup, as well as the necessary low-level and high-level optimizations to optimally exploit the parallelism of this multi-GPU multi-view VR system.

    JVRB, 9(2012), no. 6.

CVMP 2010
  1. 2014-01-22

    Generating Realistic Camera Shake for Virtual Scenes

    When depicting both virtual and physical worlds, the viewer's impression of presence in these worlds is strongly linked to camera motion. Plausible and artist-controlled camera movement can substantially increase scene immersion. While physical camera motion exhibits subtle details of position, rotation, and acceleration, these details are often missing for virtual camera motion. In this work, we analyze camera movement using signal theory. Our system allows us to stylize a smooth user-defined virtual base camera motion by enriching it with plausible details. A key component of our system is a database of videos filmed by physical cameras. These videos are analyzed with a camera-motion estimation algorithm (structure-from-motion) and labeled manually with a specific style. By considering spectral properties of location, orientation and acceleration, our solution learns camera motion details. Consequently, an arbitrary virtual base motion, defined in any conventional animation package, can be automatically modified according to a user-selected style. In an animation package the camera motion base path is typically defined by the user via function curves. Another possibility is to obtain the camera path by using a mixed reality camera in a motion capture studio. As shown in our experiments, the resulting shots are still fully artist-controlled, but appear richer and more physically plausible.

    JVRB, 10(2013), no. 7.

  2. 2014-01-14

    A Video Database for the Development of Stereo-3D Post-Production Algorithms

    This paper introduces a database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic of broadcast footage. Material was gathered at different locations including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. The database also includes video sequences with accompanying 3D audio data recorded in an Ambisonics format. An intended consequence of gathering the material is that the database contains examples of degradations that would be commonly present in real-world scenarios. This paper describes one such artefact caused by uneven exposure in the stereo views, causing saturation in the over-exposed view. An algorithm for the restoration of this artefact is proposed in order to highlight the usefulness of the database.

    JVRB, 10(2013), no. 3.

  3. 2013-12-31

    Bitmap Movement Detection: HDR for Dynamic Scenes

    Exposure Fusion and other HDR techniques generate well-exposed images from a bracketed image sequence while reproducing a large dynamic range that far exceeds the dynamic range of a single exposure. Common to all these techniques is the problem that the smallest movements in the captured images generate artefacts (ghosting) that dramatically affect the quality of the final images. This limits the use of HDR and Exposure Fusion techniques because common scenes of interest are usually dynamic. We present a method that adapts Exposure Fusion, as well as standard HDR techniques, to allow for dynamic scenes without introducing artefacts. Our method detects clusters of moving pixels within a bracketed exposure sequence with simple binary operations. We show that the proposed technique is able to deal with a large amount of movement in the scene and different movement configurations. The result is a ghost-free and highly detailed exposure fused image at a low computational cost.

    JVRB, 10(2013), no. 2.

  4. 2013-07-23

    Virtual camera synthesis for soccer game replays

    In this paper, we present a set of tools developed during the creation of a platform that allows the automatic generation of virtual views in a live soccer game production. Observing the scene through a multi-camera system, a 3D approximation of the players is computed and used for the synthesis of virtual views. The system is suitable both for static scenes, to create bullet time effects, and for video applications, where the virtual camera moves as the game plays.

    JVRB, 9(2012), no. 5.

  5. 2012-12-28

    High Resolution Image Correspondences for Video Post-Production

    We present an algorithm for estimating dense image correspondences. Our versatile approach lends itself to various tasks typical for video post-processing, including image morphing, optical flow estimation, stereo rectification, disparity/depth reconstruction, and baseline adjustment. We incorporate recent advances in feature matching, energy minimization, stereo vision, and data clustering into our approach. At the core of our correspondence estimation we use Efficient Belief Propagation for energy minimization. While state-of-the-art algorithms only work on thumbnail-sized images, our novel feature downsampling scheme, in combination with a simple yet efficient data term compression, can cope with high-resolution data. The incorporation of SIFT (Scale-Invariant Feature Transform) features into data term computation further resolves matching ambiguities, making long-range correspondence estimation possible. We detect occluded areas by evaluating the correspondence symmetry, and we further apply Geodesic matting to automatically determine plausible values in these regions.

    JVRB, 9(2012), no. 8.

VRIC 2010
  1. 2014-02-26

    Connecting Interactive Arts and Virtual Reality with Enaction

    This paper reports on a Virtual Reality theater experiment named Il était Xn fois, conducted by artists and computer scientists working in cognitive science. It offered the opportunity for an exchange of knowledge and ideas between these groups, highlighting the benefits of collaboration of this kind. Section 1 explains the link between enaction in cognitive science and virtual reality, and specifically the need to develop an autonomous entity which enhances presence in an artificial world. Section 2 argues that enactive artificial intelligence is able to produce such autonomy. This was demonstrated by the theatrical experiment, "Il était Xn fois" (in English: Once upon Xn time), explained in section 3. Its first public performance was in 2009, by the company Dérézo. The last section offers the view that enaction can form a common ground between the artistic and computer science areas.

    JVRB, 11(2014), no. 2.

GRAPP 2011
  1. 2013-12-04

    Using Opaque Image Blur for Real-Time Depth-of-Field Rendering and Image-Based Motion Blur

    While depth of field is an important cinematographic means, its use in real-time computer graphics is still limited by the computational costs that are necessary to achieve a sufficient image quality. Specifically, color bleeding artifacts between objects at different depths are most effectively avoided by a decomposition into sub-images and the independent blurring of each sub-image. This decomposition, however, can result in rendering artifacts at silhouettes of objects. We propose a new blur filter that increases the opacity of all pixels to avoid these artifacts at the cost of physically less accurate but still plausible rendering results. The proposed filter is named "opaque image blur" and is based on a glow filter that is applied to the alpha channel. We present a highly efficient GPU-based pyramid algorithm that implements this filter for depth-of-field rendering. Moreover, we demonstrate that the opaque image blur can also be used to add motion blur effects to images in real time.

    JVRB, 10(2013), no. 5.

  2. 2012-12-31

    Head Tracking Based Avatar Control for Virtual Environment Teamwork Training

    Virtual environments (VE) are gaining in popularity and are increasingly used for teamwork training purposes, e.g., for medical teams. One shortcoming of modern VEs is that nonverbal communication channels, essential for teamwork, are not supported well. We address this issue by using an inexpensive webcam to track the user's head. This tracking information is used to control the head movement of the user's avatar, thereby conveying head gestures and adding a nonverbal communication channel. We conducted a user study investigating the influence of head tracking based avatar control on the perceived realism of the VE and on the performance of a surgical teamwork training scenario. Our results show that head tracking positively influences the perceived realism of the VE and the communication, but has no major influence on the training outcome.

    JVRB, 9(2012), no. 9.

GI VR/AR 2011
  1. 2013-07-29

    Generating and Rendering Large Scale Tiled Plant Populations

    Generating and visualizing large areas of vegetation that look natural makes terrain surfaces much more realistic. However, this is a challenging field in computer graphics, because ecological systems are complex and visually appealing plant models are geometrically detailed. This work presents Silva (System for the Instantiation of Large Vegetated Areas), a system to generate and visualize large vegetated areas based on the ecological surrounding. Silva generates vegetation on Wang-tiles with associated reusable distributions enabling multi-level instantiation. This paper presents a method to generate Poisson Disc Distributions (PDDs) with variable radii on Wang-tile sets (without a global optimization) that is able to generate seamless tilings. Because Silva has a freely configurable generation pipeline and can consider plant neighborhoods it is able to incorporate arbitrary abiotic and biotic components during generation. Based on multi-level instancing and nested kd-trees, the distributions on the Wang-tiles allow their acceleration structures to be reused during visualization. This enables Silva to visualize large vegetated areas of several hundred square kilometers with low render times and a small memory footprint.

    JVRB, 10(2013), no. 1.

GI VR/AR 2012
  1. 2015-01-29

    A Comparative Evaluation of Three Skin Color Detection Approaches

    Skin segmentation is a challenging task due to several influences such as unknown lighting conditions, skin colored background, and camera limitations. Many skin segmentation approaches have been proposed in the past, including adaptive (in the sense of updating the skin color online) and non-adaptive approaches. In this paper, we compare three skin segmentation approaches that promise to work well for hand tracking, which is our main motivation for this work. Hand tracking can be widely used in VR/AR, e.g., for navigation and object manipulation. The first skin segmentation approach is a well-known non-adaptive approach. It is based on a simple, pre-computed skin color distribution. Methods two and three adaptively estimate the skin color in each frame utilizing clustering algorithms. The second approach uses a hierarchical clustering for a simultaneous image and color space segmentation, while the third approach is a pure color space clustering, but with a more sophisticated clustering approach. For evaluation, we compared the segmentation results of the approaches against a ground truth dataset. To obtain the ground truth dataset, we labeled about 500 images captured under various conditions.

    JVRB, 12(2015), no. 1.

  2. 2014-12-09

    Application of Time-Delay Estimation to Mixed Reality Multisensor Tracking

    Spatial tracking is one of the most challenging and important parts of Mixed Reality environments. Many applications, especially in the domain of Augmented Reality, rely on the fusion of several tracking systems in order to optimize the overall performance. While the topic of spatial tracking sensor fusion has already seen considerable interest, most results only deal with the integration of carefully arranged setups as opposed to dynamic sensor fusion setups. A crucial prerequisite for correct sensor fusion is the temporal alignment of the tracking data from several sensors. Tracking sensors that are typically encountered in Mixed Reality applications are generally not synchronized. We present a general method to calibrate the temporal offset between different sensors by the Time Delay Estimation method which can be used to perform on-line temporal calibration. By applying Time Delay Estimation on the tracking data, we show that the temporal offset between generic Mixed Reality spatial tracking sensors can be calibrated. To show the correctness and the feasibility of this approach, we have examined different variations of our method and evaluated various combinations of tracking sensors. We furthermore integrated this time synchronization method into our UBITRACK Mixed Reality tracking framework to provide facilities for calibration and real-time data alignment.

    JVRB, 11(2014), no. 3.
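
    The abstract above does not spell out how the temporal offset is computed. One common way to perform time delay estimation is to search for the lag that maximizes the cross-correlation of two signals; the sketch below assumes that approach and two equally sampled 1-D signals. It is an illustration only and does not reproduce the UBITRACK implementation; `estimate_delay` and `max_lag` are hypothetical names.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    Assumes both signals are sampled at the same, constant rate.
    """
    ref_mean = sum(reference) / len(reference)
    del_mean = sum(delayed) / len(delayed)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of the mean-free signals at this lag.
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += (r - ref_mean) * (delayed[j] - del_mean)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example: one sensor reports the same motion 4 samples later.
reference = [0.0] * 40
reference[10:15] = [1.0, 2.0, 3.0, 2.0, 1.0]
delayed = [0.0] * 4 + reference[:-4]
lag = estimate_delay(reference, delayed, max_lag=8)
```

    In practice the estimated lag would be converted to seconds using the sampling interval, and real tracking data would first need to be reduced to a comparable scalar signal and resampled to a common rate.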

  3. 2014-09-04

    Hands-Free Navigation in Immersive Environments for the Evaluation of the Effectiveness of Indoor Navigation Systems

    While navigation systems for cars are in widespread use, only recently have indoor navigation systems based on smartphone apps become technically feasible. Hence, tools to plan and evaluate particular designs of information provision are needed. Since tests in real infrastructures are costly and environmental conditions cannot be held constant, one must resort to virtual infrastructures. This paper presents the development of an environment for the support of the design of indoor navigation systems whose centerpiece is a hands-free navigation method using the Microsoft Kinect in the four-sided Definitely Affordable Virtual Environment (DAVE). Navigation controls using the user's gestures and postures as the input to the controls are designed and implemented. The installation of expensive and bulky hardware like treadmills is avoided while still giving the user a good impression of the distance she has traveled in virtual space. An advantage in comparison to approaches using a head mounted display is that the DAVE allows the users to interact with their smartphone. Thus the effects of different indoor navigation systems can be evaluated already in the planning phase using the resulting system.

    JVRB, 11(2014), no. 4.

  4. 2014-01-31

    Learning Two-Person Interaction Models for Responsive Synthetic Humanoids

    Imitation learning is a promising approach for generating life-like behaviors of virtual humans and humanoid robots. So far, however, imitation learning has been mostly restricted to single agent settings where observed motions are adapted to new environment conditions but not to the dynamic behavior of interaction partners. In this paper, we introduce a new imitation learning approach that is based on the simultaneous motion capture of two human interaction partners. From the observed interactions, low-dimensional motion models are extracted and a mapping between these motion models is learned. This interaction model allows the real-time generation of agent behaviors that are responsive to the body movements of an interaction partner. The interaction model can be applied both to the animation of virtual characters as well as to the behavior generation for humanoid robots.

    JVRB, 11(2014), no. 1.

GI VR/AR 2013
  1. 2015-10-20

    Influence of Information and Instructions on Human Behavior in Tunnel Accidents: A Virtual Reality Study

    Human behavior is a major factor modulating the consequences of road tunnel accidents. We investigated the effect of information and instruction on drivers' behavior as well as the usability of virtual environments to simulate such emergency situations. Tunnel safety knowledge of the general population was assessed using an online questionnaire, and tunnel safety behavior was investigated in a virtual reality experiment. Forty-four participants completed three drives through a virtual road tunnel and were confronted with a traffic jam, no event, and an accident blocking the road. Participants were randomly assigned to a control group (no intervention), an informed group who read a brochure containing safety information prior to the tunnel drives, or an informed and instructed group who read the same brochure and received additional instructions during the emergency situation. Informed participants showed better and quicker safety behavior than the control group. Self-reports of anxiety were assessed three times during each drive. Anxiety was elevated during and after the emergency situation. The findings demonstrate problematic safety behavior in the control group and that knowledge of safety information fosters adequate behavior in tunnel emergencies. Enhanced anxiety ratings during the emergency situation indicate external validity of the virtual environment.

    JVRB, 12(2015), no. 3.

  2. 2015-07-07

    Influence of Comfort on 3D Selection Task Performance in Immersive Desktop Setups

    Immersive virtual environments (IVEs) have the potential to afford natural interaction in the three-dimensional (3D) space around a user. However, interaction performance in 3D mid-air is often reduced and depends on a variety of ergonomics factors, the user's endurance, muscular strength, as well as fitness. In particular, in contrast to traditional desktop-based setups, users often cannot rest their arms in a comfortable pose during the interaction. In this article we analyze the impact of comfort on 3D selection tasks in an immersive desktop setup. First, in a pre-study we identified how comfortable or uncomfortable specific interaction positions and poses are for users who are standing upright. Then, we investigated differences in 3D selection task performance when users interact with their hands in a comfortable or uncomfortable body pose, while sitting on a chair in front of a table with the VE displayed on a head-mounted display (HMD). We conducted a Fitts' Law experiment to evaluate selection performance in different poses. The results suggest that users achieve a significantly higher performance in a comfortable pose when they rest their elbow on the table.

    JVRB, 12(2015), no. 2.
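
    For readers unfamiliar with Fitts' Law experiments, the quantities usually reported can be sketched as follows. The sketch uses the widely used Shannon formulation of the index of difficulty; whether the article uses exactly this formulation is an assumption, and the function names and example values are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)


def throughput(distance, width, movement_time):
    """Throughput in bits per second for one selection trial."""
    return index_of_difficulty(distance, width) / movement_time


# Hypothetical trial: target 7 units away, 1 unit wide, selected in 1.5 s.
difficulty = index_of_difficulty(7, 1)
rate = throughput(7, 1, 1.5)
```

    A higher throughput for the same targets indicates better selection performance, which is one way a comparison between comfortable and uncomfortable poses could be expressed.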

  3. 2014-12-19

    Simulating Wind and Warmth in Virtual Reality: Conception, Realization and Evaluation for a CAVE Environment

    Wind and warmth sensations proved to be able to enhance users' state of presence in Virtual Reality applications. Still, only few projects deal with their detailed effect on the user and general ways of implementing such stimuli. This work tries to fill this gap: After analyzing requirements for hardware and software concerning wind and warmth simulations, a hardware and also a software setup for the application in a CAVE environment is proposed. The setup is evaluated with regard to technical details and requirements, but also - in the form of a pilot study - in view of user experience and presence. Our setup proved to comply with the requirements and leads to satisfactory results. To our knowledge, the low cost simulation system (approx. 2200 Euro) presented here is one of the most extensive, most flexible and best evaluated systems for creating wind and warmth stimuli in CAVE-based VR applications.

    JVRB, 11(2014), no. 10.

  4. 2014-11-28

    Comparison of 2D and 3D GUI Widgets for Stereoscopic Multitouch Setups

    Recent developments in the area of interactive entertainment have suggested combining stereoscopic visualization with multi-touch displays, which has the potential to open up new vistas for natural interaction with interactive three-dimensional (3D) applications. However, the question arises how the user interfaces for system control in such 3D setups should be designed in order to provide an effective user experience. In this article we introduce 3D GUI widgets for interaction with stereoscopic touch displays. The design of the widgets was inspired by skeuomorphism and affordances in such a way that the user should be able to operate the virtual objects in the same way as their real-world equivalents. We evaluated the developed widgets and compared them with their 2D counterparts in the scope of an example application in order to analyze the usability of and user behavior with the widgets. The results reveal differences in user behavior with and without stereoscopic display during touch interaction, and show that the developed 2D as well as 3D GUI widgets can be used effectively in different applications.

    JVRB, 11(2014), no. 7.

VRIC 2012
  1. 2014-10-15

    Virtual Reality as a Support Tool for Ergonomic-Style Convergence

    The competitive industrial context compels companies to speed up every new product design. In order to keep designing products that meet the needs of the end user, a human centered concurrent product design methodology has been proposed. Its setting up is complicated by the difficulties of collaboration between experts involved in the design process. In order to ease this collaboration, we propose the use of virtual reality as an intermediate design representation in the form of light and specialized immersive convergence support applications. In this paper, we present the As Soon As Possible (ASAP) methodology making possible the development of these tools while ensuring their usefulness and usability. The relevance of this approach is validated by an industrial use case through the design of an ergonomic-style convergence support tool.

    JVRB, 11(2014), no. 5.

GI VR/AR 2014
  1. 2016-03-18

    Advanced luminance control and black offset correction for multi-projector display systems

    In order to display a homogeneous image using multiple projectors, differences in the projected intensities must be compensated. In this paper, we present novel approaches to combine and extend existing techniques for edge blending and luminance harmonization to achieve a detailed luminance control. Furthermore, we apply techniques for improving the contrast ratio of multi-segmented displays also to the black offset correction. We also present a simple scheme to involve the displayed context in the correction process to dynamically improve the contrast in brighter images. In addition, we present a metric to evaluate the different methods and their influence on the visual quality.

    JVRB, 12(2015), no. 4.

VRIC 2011
  1. 2014-10-21

    Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture

    We present in this paper several contributions on the collision detection optimization centered on hardware performance. We focus on the broad phase which is the first step of the collision detection process and propose three new ways of parallelization of the well-known Sweep and Prune algorithm. We first developed a multi-core model that takes into account the number of available cores. Multi-core architecture enables us to distribute geometric computations with use of multi-threading. Critical writing section and threads idling have been minimized by introducing new data structures for each thread. Programming with directives, like OpenMP, appears to be a good compromise for code portability. We then proposed a new GPU-based algorithm also based on the "Sweep and Prune" that has been adapted to multi-GPU architectures. Our technique is based on a spatial subdivision method used to distribute computations among GPUs. Results show that significant speed-up can be obtained by passing from 1 to 4 GPUs in a large-scale environment.

    JVRB, 11(2014), no. 6.
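
    As background for the abstract above, the basic sequential Sweep and Prune idea can be sketched in a few lines: sort axis-aligned bounding boxes along one axis, sweep over them while keeping a list of boxes whose intervals are still open, and test only those candidates on the remaining axes. This is a plain single-threaded illustration, not the multi-core or multi-GPU variants proposed in the paper; the box representation and function name are assumptions.

```python
def sweep_and_prune(boxes):
    """Return the set of index pairs whose axis-aligned boxes overlap.

    boxes: list of (min_corner, max_corner), each corner an (x, y, z) tuple.
    """
    # Sort box indices by the start of their interval on the x-axis.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active = []   # boxes whose x-interval may still overlap later boxes
    pairs = set()
    for i in order:
        lo_i, hi_i = boxes[i]
        # Prune: drop boxes whose x-interval ended before this one starts.
        active = [j for j in active if boxes[j][1][0] >= lo_i[0]]
        for j in active:
            lo_j, hi_j = boxes[j]
            # The x-intervals overlap; confirm on the y- and z-axes.
            if all(lo_i[k] <= hi_j[k] and lo_j[k] <= hi_i[k] for k in (1, 2)):
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs


boxes = [
    ((0, 0, 0), (2, 2, 2)),
    ((1, 1, 1), (3, 3, 3)),
    ((5, 5, 5), (6, 6, 6)),
    ((1.5, 10, 0), (2.5, 11, 1)),
]
pairs = sweep_and_prune(boxes)
```

    Only the first two boxes are reported as a candidate pair; the fourth box overlaps them on the x-axis but is rejected by the check on the y-axis. The pairs found by such a broad phase would then be handed to a more exact narrow-phase test.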

CVMP 2013
  1. 2014-12-08

    A user supported object tracking framework for interactive video production

    We present a user supported tracking framework that combines automatic tracking with extended user input to create error free tracking results that are suitable for interactive video production. The goal of our approach is to keep the necessary user input as small as possible. In our framework, the user can select between different tracking algorithms - existing ones and new ones that are described in this paper. Furthermore, the user can automatically fuse the results of different tracking algorithms with our robust fusion approach. The tracked object can be marked in more than one frame, which can significantly improve the tracking result. After tracking, the user can validate the results in an easy way, thanks to the support of a powerful interpolation technique. The tracking results are iteratively improved until the complete track has been found. After the iterative editing process the tracking result of each object is stored in an interactive video file that can be loaded by our player for interactive videos.

    JVRB, 11(2014), no. 9.

  1. 2017-01-19

    A Comprehensive Framework for Evaluation of Stereo Correspondence Solutions in Immersive Augmented and Virtual Realities

    In this article, a comprehensive approach for the evaluation of hardware and software solutions to support stereo vision and depth-dependent interactions based on the specific requirements of the human visual system within the context of augmented reality applications is presented. To evaluate stereo correspondence solutions in software, we present an evaluation model that integrates existing metrics of stereo correspondence algorithms with additional metrics that consider human factors that are relevant in the context of outdoor augmented reality systems. Our model provides modified metrics of stereoacuity, average outliers, disparity error, and processing time. These metrics have been modified to provide more relevant information with respect to the target application. We illustrate how this model can be used to evaluate two stereo correspondence methods: the OpenCV implementation of the semi-global block matching, also known as SGBM, which is a modified version of the semi-global matching by Hirschmuller; and ADCensusB, our implementation of ADCensus, by Mei et al. To test these methods, we use a sample of fifty-two image pairs selected from the Kitti stereo dataset, which depicts many situations typical of outdoor scenery. Further on, we present an analysis of the effect and the trade-off of the post processing steps in the stereo algorithms between the accuracy of the results and performance. Experimental results show that our proposed model can provide a more detailed evaluation of both algorithms. To evaluate the hardware solutions, we use the characteristics of the human visual system as a baseline to characterize the state-of-the-art in equipment designed to support interactions within immersive augmented and virtual reality systems. The analysis suggests that current hardware developments have not yet reached the point where their characteristics adequately match the capabilities of the human visual system and serves as a reference point as to what are the desirable characteristics of such systems.

    JVRB, 13(2016), no. 2.

EuroVR 2015
  1. 2017-02-28

    Presenting a Holistic Framework for Scalable, Marker-less Motion Capturing: Skeletal Tracking Performance Analysis, Sensor Fusion Algorithms and Usage in Automotive Industry

    Even though there is promising technological progress, input is currently still one of virtual reality's biggest issues. Off-the-shelf depth cameras have the potential to resolve these tracking problems. These sensors have become common in several application areas due to their availability and affordability. However, various applications in industry and research still require large-scale tracking systems, e.g., for interaction with virtual environments. As single depth-cameras have limited performance in this context, we propose a novel set of methods for multiple depth-camera registration and heuristic-based sensor fusion using skeletal tracking. An in-depth accuracy analysis of Kinect v2 skeletal tracking is presented in which a robot moves a mannequin for accurate, reproducible motion paths. Based on the results of this evaluation, a distributed and service-oriented marker-less tracking system consisting of multiple Kinect v2 sensors is developed for real-time interaction with virtual environments. Evaluation shows that this approach helps in increasing tracking areas, resolving occlusions and improving human posture analysis. Additionally, an advanced error prediction model is proposed to further improve skeletal tracking results. The overall system is evaluated by using it for realistic ergonomic assessments in automotive production verification workshops. It is shown that the performance and applicability of the system are suitable for use in the automotive industry and that the system may partially replace conventional high-end marker-based systems in this domain.

    JVRB, 13(2016), no. 3.

VRIC 2015
  1. 2018-06-06

    A Classification of Human-to-Human Communication during the Use of Immersive Teleoperation Interfaces

    We propose a classification of human-to-human communication during the use of immersive teleoperation interfaces based on real-life examples. While a large body of research is concerned with communication in collaborative virtual environments (CVEs), less research focuses on cases where only one of two communicating users is immersed in a virtual or remote environment. Furthermore, we identify the unmediated communication between co-located users of an immersive teleoperation interface as another conceptually important — but usually neglected — case. To cover these scenarios, one of the dimensions of the proposed classification is the level of copresence of the communicating users. Further dimensions are the virtuality of the immersive environment, the virtual transport of the immersed user(s), the point of view of the user(s), the asynchronicity of the users’ communication, the communication channel, and the mediation of the communication. We find that an extension of the proposed classification to real environments can offer useful reference cases. Using this extended classification not only allows us to discuss and understand differences and similarities of various forms of communication in a more systematic way, but it also provides guidelines and reference cases for the design of immersive teleoperation interfaces to better support human-to-human communication.

    JVRB, 14(2017), no. 1.