EuroITV 2006
Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry
extended and revised for JVRB
urn:nbn:de:0009-6-10767
Keywords: Interactive TV, Video Composer, Video Conductor, Professions, Digital Broadcasting Framework
Subjects: Interactive Television
Abstract
Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se this does not mean any additional value for consumers. Broadcasting and advertisement industries have not yet found new formats to reach the individual user with their content.
In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition, we describe new professions that will emerge in future TV production workflows, namely the ′video composer′ and the ′live video conductor′.
Television is undergoing a historical change. Interactive Digital Broadcasting will be reality in 2010+. Video material will be generated by TV producers in abundance, which will overwhelm both the broadcasters as well as the consumers. Current TV formats and forms of broadcasting do not satisfy the personal moods and interests of the consumer.
Television originated from theatre and film. Both media (and art) forms tend to be of a primarily entertaining nature and entail a lean-back mentality of the consumer [ Mel03 ]. To expect that consumers will simply switch to a complete lean-forward mentality, like they do when sitting in front of a PC, once iTV becomes reality is an illusion [ FL03 ]. Moreover it is not desirable either, since both producer and end user are overwhelmed by the mere quantity of live video material provided.
On the other hand art has a long tradition in dealing with the question of variability in the process of presentation. Already at the end of the 19th century random processes were introduced as a deliberate aesthetic element. Early experiments on the basis of electronic machines were carried out in music in the middle of the 20th century. Young artists tried to overcome the extreme determinism of serial compositions. Under the term aleatoric composition musicians like Karlheinz Stockhausen, John Cage, Pierre Boulez and Earle Brown worked on more open methods for composition. Openness in art means that certain details of a piece are left undefined by the composer. The conductor (i.e. the interpreter) is left with the task of filling the gaps while performing the piece. Another possibility is to determine the undefined parts by using the outcome of random processes. But randomness in art does not mean that the resulting work is completely unforeseeable. Art uses random processes as a medium to achieve variable but still planned and well-formed structures. Randomness is used as a generative method to achieve structural freedom and to attack the stiffness of closed works.
Bearing in mind the existing (and future) infrastructures of digital video broadcasting when bringing this kind of variability to the medium TV, we propose the development of a TV environment which allows for the establishment of ′virtual personalised channels′. To do so, (live) semantic annotation of video material as well as methods for live staging of media events have to be designed. The result is a drastically different, more evolutionary process of content production and a different form of consuming. This evolutionary character will give rise to multiple variations and categories of possible broadcasting and presentation formats, which are better suited to satisfying individual human needs. The approaches outlined in this paper are the basis for our upcoming IST research project LIVE.
Due to the growing technical possibilities today′s broadcasting industry is able to offer a huge number of different channels. Nevertheless we can observe two main tendencies due to the ′24 hour per day restriction′ for every channel:
- Channels serving and restricting themselves to a singular and very special category, e.g. golf, news loops, cartoons, region, etc. (i.e. ′Greatest Common Divider′ approach)
- Channels serving a mean set of interests trying to reach the mass audience (i.e. ′Least Common Denominator′ approach)
Of course there are some exceptions to these tendencies. Broadcasters like some public stations try to combine broad overview with intermittent in-depth coverage (e.g. arte theme night). But the current setting of autonomous media streams (i.e. ′channels′) will not lead to a satisfaction of the individual consumer′s needs. And obviously the number of channels serving certain moods is (and will always be) too small compared to billions of individual moods. A ′solution′ that offers one channel per single consumer is again an illusory vision (too many consumers, too few ′classical producers′). In addition each single end user has varying (and unknown) moods. To offer ′video on demand′ (mainly feature films) is just a fragmentary answer to this problem, since neither the hunger for being part of the world community nor the demand for coverage of live events can be satisfied this way. Hence solutions are urgently needed for the treatment of such live events for future digital broadcasting.
For the creation of iTV productions some commercial products (authoring tools) are available. One of the products, ′Cardinal Studio′ from Cardinal Information Systems Ltd. [ Ltd06 ], allows for the design of MHP-based iTV applications. Possible applications which can be created with Cardinal Studio are enhanced digital teletext with embedded graphics, news-tickers displaying the latest news and stock headlines, weather service with a world weather overview, electronic program guides and simple trivia games. Services like this are provided by almost every public broadcaster who is capable of transmitting iTV productions. With these applications the limits are already reached, since the application only allows for the editing of the graphical layer on top of the audio-visual content.
In order to realise a TV application which accompanies the content, the described system is not sufficient, because the interrelationships of the TV content cannot be synchronised for the final application. For such cases ′on-Q Create Suite′ by Ensequence [ Ens06 ] offers a solution. It provides a timeline to arrange the content. It offers a huge range of solutions, from organising the TV program to defining the behaviour of interactive elements such as buttons and texts, and assists the author during the entire workflow. A comparable product to realise the above described scenario is ′Modelstream′ by emuse-technologies [ ET06 ]. None of the above products supports more than one TV program, and therefore none provides a way to create a relationship between parallel TV content streams. Their main focus is the graphical presentation and user interface. Since they were all developed for linear TV productions they do not consider the new possibilities of nonlinear productions or events with parallel streams.
One approach to linking the different audio-visual contents is taken in the DVD production ′Switching′ by Oncotype [ Pro03 ]. All through the story the actors pause for a moment in order to give notice to the viewer of the possibility to switch to another ′story branch′. Thus every user follows her or his own interest, creates her or his own path through the story and defines the duration of the story. Nevertheless this intriguing approach is not suitable for television because of the time-based nature of broadcasting. For television the story of every viewer may be variable in its path but not in its duration.
In [ MW03 ] the scheduling and automation system of the BBC′s interactive TV playout system is presented. iTV events can be activated by an operator, by a calendar or by filters listening to incoming data from the automation systems. This rule based system is fully integrated in the broadcasting chain.
For combining media streams in a meaningful way a description of the media content is required. To do so media content has to be described in a multiperspective way, ranging from the description of camera angles for a single shot up to the semantics of a movie or the action of humans within a video. In Marc Davis′s iconic visual language Media Streams [ Dav95 ] this issue was addressed for offline annotation and repurposing of audio and video content. It is an open question whether this approach is suitable for TV productions, and in particular whether it can be used for annotating live events in (near-)real-time.
A promising approach for the personalisation of broadcasting content is described in [ RWC04 ]. With the help of a trellis graph (i.e. a directed acyclic graph arranged on a timeline) a storyline is modelled. Nodes represent content segments and edges between the nodes represent permitted transitions between these content segments. This formal model is integrated into an overall framework, where adaptive content is produced at the broadcaster side. Yet it is still unclear how this approach can be adapted for the production of live events, where exact knowledge about the time when certain media content is produced is not available and the semantics of the events cannot be anticipated accurately.
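The trellis idea can be illustrated with a minimal sketch. It is not the implementation of [ RWC04 ]; the class name, the segment names and the rule that an edge may only lead to the directly following time slot are assumptions made for illustration.

```python
from collections import defaultdict


class Trellis:
    """Directed acyclic graph whose nodes (content segments) sit in time slots."""

    def __init__(self):
        self.slot_of = {}                    # segment name -> time slot index
        self.successors = defaultdict(set)   # segment name -> permitted next segments

    def add_segment(self, segment, slot):
        self.slot_of[segment] = slot

    def permit(self, src, dst):
        # Assumption: a transition may only lead to the next time slot,
        # which keeps the graph acyclic and all storylines equally long.
        if self.slot_of[dst] != self.slot_of[src] + 1:
            raise ValueError(f"{src} -> {dst} does not lead to the next time slot")
        self.successors[src].add(dst)

    def storylines(self, start):
        """Return every permitted path from start to a segment without successors."""
        nexts = sorted(self.successors.get(start, ()))
        if not nexts:
            return [[start]]
        return [[start] + tail for nxt in nexts for tail in self.storylines(nxt)]


# Hypothetical storyline with two alternative segments in the middle slot.
trellis = Trellis()
for segment, slot in [("intro", 0), ("sport", 1), ("politics", 1), ("outro", 2)]:
    trellis.add_segment(segment, slot)
for src, dst in [("intro", "sport"), ("intro", "politics"),
                 ("sport", "outro"), ("politics", "outro")]:
    trellis.permit(src, dst)

for path in trellis.storylines("intro"):
    print(" -> ".join(path))
```

Every storyline produced this way has the same number of slots, which matches the requirement that a broadcast may vary in its path but not in its duration.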
A more evolutionary character of the envisioned future digital broadcasting environment entails three key elements:
- Variation: The existence and availability of a range of video material possibly dealing with one and the same event.
- Selection: Out of this range consumers and professional users (live video conductors) choose only certain individual video objects during the running broadcasting scenario. It is important to stress this latter point, since it is this very type of live selection which gives the environment a performative character in contrast to usual preproduction and pre-selection authoring methods in the TV industry.
- Recombination: ′Successful′ (in terms of extensive use) video objects are reused, cited and rearranged during future staging processes. Due to the huge variety of user and consumer interests most of the video objects will ′survive′ and only a few might get lost forever (in analogy to natural evolution). It is important however, that the environment ′is able to utilise′ these objects, meaning there has to be an abstract description of the video object (the actual representation).
Putting the consumer in the centre of our research we ask ourselves how we can create an environment which allows for personalised TV in covering mass media events. Our approach to this dilemma is to offer ′everything′ (i.e. all, possibly unedited camera streams) and at the same time provide the prerequisites for the establishment of ′virtual personalised channels′. To do so we have to realise the following:
- Production methods and tools for ′staging′ or ′conducting′ a live event have to be conceived. Design principles for staging live media events and for content creation have to be developed and workflows have to be defined which support the collaborative staging by professional users. The interfaces have to hide the underlying complexity of the media objects and the workflow requirements.
- Personal user interests have to be linked semantically with video content. Within a robust framework, which allows for the live staging of media events, methodologies for staging and content research, content oriented detection, extraction and annotation of video material and the personalisation of the users [ LLN02 ] have to work together and reinforce each other in an intelligent way. The main effort will be to develop an open framework for broadcasting environments, which utilises the interaction of cognition based content knowledge with social based consumer knowledge (connect item based knowledge with user based knowledge). Scientific questions to be answered will be:
  - How/what will be annotated?
  - How do we link the annotated video objects (consumer/video conductor)?
The goals of this research theme are to develop new methods, tools and interfaces for detection, extraction (of knowledge) and annotation of video material and other resources. A special focus is the ′in-vivo generation of knowledge′, i.e. annotation metadata is created while content is created. More specifically this will include:
- Automatic offline and online detection and extraction of knowledge from media archives and from live produced media and the annotation of media objects
- Semi-automatic online annotation of live video material. Human annotation with metaknowledge about content will be used for constellations where automation is not possible (e.g. social meaning).
- Automatic integration of knowledge and content from external resources
- Integration of all the above methods to support the staging process
Equipping the audio-visual component of an intelligent video object with an additional component which holds (invisible) semantic metadata radically changes the process dynamics of producing, broadcasting and consuming. Now all essential prerequisites are given for a more ′evolutionary′ character of the production process:
Instead of broadcasting just a single, carefully edited video stream chosen by a producer, several (or even all) camera streams are broadcast.
Besides the mere increase of the number of ′allowed′ images this means the availability of an entire set of images for a particular incident. The variability of these (possibly imperfect) images allows for a direct or indirect (e.g. mood based) selection and a recombination not only by the professional producer, but also by the individual consumer.
As a consequence new and ′personalised′ perspectives (or formats) evolve as an additional value for consumers. Closing the system by a feedback of consumer desires (e.g. annotation information) to the on-site producers (or even camera men) allows for a replication loop through the consumer′s influence on the generation of new images.
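A minimal sketch of such an intelligent video object and of an interest based selection could look as follows. The field names, the annotation vocabulary and the overlap based ranking are illustrative assumptions and not part of the proposed framework.

```python
from dataclasses import dataclass, field


@dataclass
class VideoObject:
    """Audio-visual stream plus an (invisible) semantic metadata component."""
    stream_id: str
    annotations: set = field(default_factory=set)


def select_for_viewer(objects, interests):
    """Rank streams by how many annotations match the viewer's interests.

    Streams without any match are dropped; ties are broken by stream id.
    """
    scored = [(len(obj.annotations & interests), obj.stream_id) for obj in objects]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [stream_id for score, stream_id in ranked if score > 0]


# Hypothetical camera streams of one and the same incident.
streams = [
    VideoObject("cam1", {"finish line", "tension"}),
    VideoObject("cam2", {"crowd", "joy"}),
    VideoObject("cam3", {"athlete", "tension", "close-up"}),
]

print(select_for_viewer(streams, {"tension", "athlete"}))
```

In this sketch the same ranking function could serve both the professional producer and the individual consumer; only the set of interests passed in differs.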
The vision developed above gives rise to the emergence of completely novel professions such as Video Composer and Video Conductor, as an analogue to a music composer and conductor who stage live events for an audience. Staging for digital TV is comparable to composing and conducting a masterpiece of music for an orchestra. The music composer as well as the conductor have the parameter of time and the possibility of using several different instruments playing in parallel to create a masterpiece of music. The composer writes the piece and the conductor directs the live music event.
During staging for digital TV the video composer and the conductor are also restricted by the time limit of the TV broadcast and they can use parallel videos to create a nonlinear story composition or a live media event. The video composer and the conductor can synchronise and link the parallel video streams at special points in time to allow the viewer to switch channels according to her/his interests. The added value for the user is a deeper emotional and intellectual experience through intuitive mood based navigation and the serving of her/his individual interests.
A remaining key question for the switching concept is: how can the video composer and conductor stimulate the viewer to actively switch the video stream at the right time without disturbing her/his immersion into the story or the live media event?
The role of the Video Composer is to investigate and develop the different design principles for staging live media events in the off-line mode whereas the role of the Video Conductor is to test out these staging methods during the live broadcast (or maybe even invent new ones by improvising during the live show).
Staging live media events means for the Video Conductor to create a nonlinear multi-stream video show in real-time, which changes according to the interests of the end user (consumer). Today, with the introduction of digital TV, it is possible and easy for public broadcasters such as BBC, ARD and ORF to offer multi-stream videos (i.e. a digital bouquet) about live media events such as the Olympic Games, because live and archived material exists in abundance.
What is missing, however, are methods and tools for the creation of parallel video streams broadcast in real-time with the evolutionary character described in Section 3.
The upcoming IST Integrated Project ′LIVE′ will address these research questions and will use the experience and results gathered in the previous project ′MECiTV′.
In the IST research project MECiTV a first set of design principles to stage nonlinear iTV stories was developed. An interactive docudrama ′Vision Europe′ was produced, in which these design principles were tested.
The basic idea behind ′Vision Europe′ is described by the video composers as follows:
“Vision Europe is a nonlinear docudrama for iTV which leads the viewer through a staged story composition, a labyrinth of time and space. It is a time travel through the past, present and future, stories of the past are linked to stories of the present and the future by reappearing protagonists and locations.
The viewer sits on a sofa in front of a TV screen, the remote control in her/his hand, and will follow individual people on the screen on their search for the Vision Europe.
As soon as she/he is attracted by a protagonist, the visitor can follow her/him into her/his life, work, find out more about her/him and take a travel through time with her/him. Just by pressing the ′OK′ button on the remote control in special moments in time the visitor can switch the video streams. The visitor is immersed into the story and she/he will press intuitively, when she/he feels like it. Sometimes she/he gets an explicit hint by the moderator on the screen, that it is possible to switch the video service, sometimes it is a second video screen or just a feeling. If the visitor feels satisfied with what she/he watches on the screen in the moment, she/he may not switch at all. There is no punishment for passive viewers.”
The production of the docudrama started on April 12th, 2003 in Budapest, when the Hungarians voted for joining the EU and the filming ended on May 1st, 2004, when the country finally entered the EU. Some material used in the film is historical and archived video material.
In ′Vision Europe′ ten different ′paths′ of a story are told in parallel. Two tracks (i.e. ′Sziget′ and ′Michelle′) are called ′main streams′ and form the structural backbone of the production. In Figure 2 the ten tracks are shown as horizontal bars. The time when the viewer is able to switch from one track to another is represented by a transition point.
Thus the viewer is only able to switch during these intervals from the current track she or he watches to another by pressing the ′OK′ button on the remote control. Pressing the ′OK′ button outside these intervals leads to one of the two main streams.
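The switching rule can be sketched as follows. This is an illustration and not the iTV Composing Tool: the track name ′archive′, the interval times and the choice of which main stream serves as fallback are assumptions, since the text only states that pressing ′OK′ outside an interval leads to one of the two main streams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Interval during which the viewer may switch from source to target."""
    source: str
    target: str
    start: float  # seconds from the beginning of the broadcast
    end: float


MAIN_STREAMS = ("Sziget", "Michelle")


def press_ok(current_track, time, transitions, fallback=MAIN_STREAMS[0]):
    """Return the track shown after the viewer presses 'OK' at the given time."""
    for transition in transitions:
        if transition.source == current_track and transition.start <= time <= transition.end:
            return transition.target
    # Outside every transition interval the viewer is led to a main stream.
    return fallback


# Hypothetical transition from a main stream to a side track.
transitions = [Transition("Sziget", "archive", start=60.0, end=75.0)]

print(press_ok("Sziget", 70.0, transitions))    # inside the interval
print(press_ok("archive", 100.0, transitions))  # outside any interval
```

A viewer who never presses the button simply stays on the current track, so passive viewing remains possible.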
MECiTV searched for alternatives to the ′classical′ approach of guiding the consumer through the video production with the help of a graphical user interface. It was proposed not to use any special graphical elements for a production, or at least to use them only sparingly. To achieve this, two design patterns were proposed, namely Moderator Driven Navigation and Navigation by Visualisation. These are content based approaches to stimulate audience interaction with the video stream. These patterns were explored in the actual production ′Vision Europe′, which was produced with the help of the iTV Composing Tool developed within the MECiTV project.
This approach is realised by a story path where the end user is directly addressed by a moderator. The moderator explicitly asks the user to use the ′OK′ button of the remote control to get for example more information on a person. The moderator also explains the alternatives the user has. As an example, not using the switching functionality will lead to an automatic start of a certain story.
Figure 3. Guidance: The moderator tells the user explicitly to follow her by pressing the remote control
This simple concept helps to introduce the viewers step by step to the capabilities of iTV and motivates interaction. On the other hand, the drawback of this method is that a moderator or at least a voice is needed to explain the interaction possibilities. Besides the increase in personnel costs, the tension of the global story or the immersion of the viewer may be lost if the moderator intervenes too often.
The second pattern which has been explored is called ′navigation by visualisation′. Instead of the proactive interaction request by a moderator, the viewer is seduced to switch by audio-visual temptations. The idea is not to break the story flow and the tension but instead to give the viewer some subtle audio-visual clues about the actions in another story path. In our production this is achieved by attaching a small moving window showing the current scenery of another story path without sound (Figures 4-9).
In contrast to the first pattern, the viewer can already see what to expect when pressing the OK button and switching to the target path. The target path itself can again reference another target path and so forth. In Vision Europe three different variations of transitions were implemented: Seduction, Illustration and Temptation:
- Seduction: the protagonist seduces the user by a body gesture to follow her (Figure 4, Figure 5).
- Illustration: a second video shows the content of the main video and gives a deeper insight into the story (Figure 6, Figure 7).
- Temptation: a second video appears on the screen tempting the user to switch to another channel and change the experience (Figure 8, Figure 9).
Among others this first set of design patterns will also be tested and used in our upcoming research project LIVE by the video conductor for staging of live media events.
After its premiere for an invited audience at the Laboratory for Mixed Realities in 2004 the nonlinear docudrama ′Vision Europe′ was presented at the IBC 2004 in Amsterdam and at the 2004 IST Conference in The Hague. In addition a special DVD version of the docudrama was created which was handed out to numerous individuals from the media production and TV industry around Europe.
People who watched and interacted with our ′Vision Europe′ production for the most part liked the concept and understood it quickly. Only a very short introduction of two sentences was necessary to explain the interaction concept and the use of the OK button of the remote control. Reactions of viewers ranged from ′what′s the novelty?′ to ′very interesting′. Interestingly, out of the small percentage of people asking for the novelty in this concept many were expecting ′typical′ iTV behaviour like graphical user interfaces and popup windows. Overall it is fair to say, that this prototype of an iTV production was a great success and people are ready to accept this kind of iTV production. It was even proposed by some to transfer this concept to other media such as broadband internet access (video on demand) and radio (interactive radio production).
As a result of the reactions several suggestions concerning the improvement of iTV production and application could be extracted. The envisioned formats go far beyond our prototype, most notably by moving from purely pre-produced material to the use of live material. Thus in MECiTV the central idea for the project LIVE was born: the interactive coverage of live events, e.g. elections and major sport events like the Olympic Games, would be the perfect playground for challenging new forms of iTV content.
A novel digital broadcasting framework was proposed, which allows for the live staging of (live) media events and an improved consumer personalisation due to its evolutionary character. In addition new professions for future TV production workflows will emerge, namely the ′Video Composer′ and the ′Live Video Conductor′.
Starting from results achieved in the preceding iTV research project MECiTV, several research questions will be addressed in the upcoming IST Integrated Project ′LIVE′.
One of the main problems is the lack of methods and tools for the video conductor to stage live media events such as the 2008 Olympic Games. The professional user should be enabled to link live video streams at special (spontaneous) points in time, so that the consumer can switch channels according to her/his interests.
Connected with this question another question has to be answered: How can the professional user stage content based and time-driven live video streams and get feedback from the consumers directly? Content based means that the professional user can link videos when the content ′demands′ it.
If for example an athlete wins unexpectedly during live coverage of the Olympics, the video conductor should be enabled to spontaneously link, at that very moment, the winner′s video to a live interview and to (archived) background information about the athlete. In addition links might be established to other competitions in the same sport discipline from the archive. The consumer must be enabled to switch to any of these video streams, which are thematically interlinked with the video stream she/he is currently watching.
Furthermore, the professional user requires methods and tools to analyse and visualise the consumer feedback in real-time, in order to answer questions like:
- Do the consumers like the video show in progress or not (voting/explicit behaviour)?
- How does the individual consumer navigate through the offered streams (implicit behaviour)?
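As a rough sketch of how both kinds of feedback might be aggregated for the video conductor, the following example computes an approval rate from explicit votes and counts the most frequent stream switches as implicit behaviour. The event formats, the function name and the stream names are hypothetical.

```python
from collections import Counter


def summarise_feedback(votes, switches):
    """Aggregate explicit votes and implicit navigation events.

    votes:    iterable of (viewer_id, liked) pairs, liked being True or False
    switches: iterable of (viewer_id, from_stream, to_stream) triples
    """
    votes = list(votes)
    likes = sum(1 for _, liked in votes if liked)
    approval = likes / len(votes) if votes else None
    flows = Counter((src, dst) for _, src, dst in switches)
    return approval, flows.most_common()


# Hypothetical feedback collected during a live show.
votes = [("a", True), ("b", False), ("c", True), ("d", True)]
switches = [
    ("a", "stadium", "interview"),
    ("b", "stadium", "interview"),
    ("c", "stadium", "archive"),
]

approval, flows = summarise_feedback(votes, switches)
print(f"approval: {approval:.0%}")
for (src, dst), count in flows:
    print(f"{src} -> {dst}: {count} viewers")
```

A real-time system would update such counts incrementally as events arrive; the batch form is used here only to keep the example short.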
The vision is that the viewer of tomorrow will not follow channels anymore. Rather, individual consumers will follow their favourite Video Conductor or even navigate by themselves through live media events such as the Olympic Games. At times the viewer will be in the mood to lean back on the sofa and be guided by the Video Conductor; at other times she/he will lean forward and take control of the journey through the video streams. Thus the human need to lose control (by being entertained) as well as to take control (by interaction) will be addressed.
The authors would like to thank Anibal Kreki, Misael Mikic and Stefan Krauss (†) for their contributions and support. This work was partially funded by the European Commission within the 5th framework of the IST under grant number IST- 2001-37330. Our research work is continued in the LIVE project, which is funded by the European Commission within the 6th framework of the IST under grant number FP6-27312.
All statements in this work reflect the personal ideas and opinions of the authors and not necessarily the opinions of the European Commission.
[Dav95] Media Streams: An Iconic Visual Language for Video Representation, in Readings in Human-Computer Interaction: Toward the Year 2000, Chapter 13: Hypertext and Multimedia, pp. 854-866, Morgan Kaufmann Publishers, Inc., 1995, ISBN 1-55860-246-1.
[Ens06] on-Q Create Suite, 2006, http://www.ensequence.com, retrieved March 12th, 2006.
[ET06] Modelstream, 2006, http://www.emuse-tech.com, retrieved March 12th, 2006.
[FL03] Using Attitude Based Segmentation to Better Understand Viewers′ Usability Issues with Digital and Interactive TV, Proceedings of the European Conference on Interactive Television EuroITV '03: From viewers to actors?, 2003, pp. 91-97.
[LLN02] Personalized Contents Guide and Browsing based on User Preference, Proceedings of the 2nd Workshop on Personalization in Future TV, 2002, Malaga, Spain.
[Ltd06] Cardinal Information Studio, 2006, www.cardinal.fi, retrieved March 12th, 2006.
[Mel03] Massive Media, Interactive Telecommunications Program, Tisch School of the Arts, New York, 2003, retrieved March 12th, 2006, from massive-media.org.
[MW03] Case Study: Automation of the BBC's interactive TV Playout System, IBC Conference Papers, 2003, retrieved March 12th, 2006, from www.itvworld.com.
[Pro03] Switching, 2003, www.switching.dk/en, retrieved March 12th, 2006.
[RWC04] Content Morphing: A Novel System for Broadcast Delivery of Personalizable Content, in Personalized Digital Television, Human-Computer Interaction Series 6, Part 2: Broadcast News and Personalized Content, Chapter 9, pp. 235-255, Kluwer Academic Publishers, 2004, ISBN 1-4020-2163-1.
License
Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing License. The text of the license may be accessed and retrieved at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_en_06-2004.html.
Recommended citation
Richard Wages, Carmen Mac Williams, Stefan M. Grünvogel, and Georg Trogemann, Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry. JVRB - Journal of Virtual Reality and Broadcasting, 4(2007), no. 10. (urn:nbn:de:0009-6-10767)
Please provide the exact URL and date of your last visit when citing this article.