HC 2006
Articulated Narrowcasting for Privacy and Awareness
in Multimedia Conferencing Systems
and Design for Implementation Within a SIP Framework
First presented at the International Conference on Human and Computer HC 2006,
extended and revised for JVRB
urn:nbn:de:0009-6-14724
Abstract
This article proposes a new focus of research for multimedia conferencing systems which allows a participant to flexibly select another participant or a group for media transmission. For example, in a traditional conference system, participants' voices might by default be shared with all others, but one might want to select a subset of the conference members to send his/her media to or receive media from. We review the concept of narrowcasting, a model for limiting such information streams in a multimedia conference, and describe a design to use existing standard protocols (SIP and SDP) for controlling fine-grained narrowcasting sessions.
Keywords: Narrowcasting, SIP, SDP, conferencing, device capability, media direction control, privacy and awareness
Subjects: Session Initiation Protocol, Privacy
Multimedia conferencing has been on the research agenda for many years. A traditional conferencing system over the PSTN (public switched telephone network) has many features implemented in a centrally controlled conference server. The development of IP technology has brought new media (e.g., video) into conferencing systems. H.323 [IT03] and SIP (Session Initiation Protocol) [RSC02, Joh04] are popular protocols for IP-based conferencing systems. SIP, a simpler text-based protocol developed by the IETF (Internet Engineering Task Force), has added presence features that let users discover the availability of participants and also, through extensive extensions, control media transmission and its direction from the endpoints. Although SIP was designed for multimedia conferencing systems, so far only VoIP applications have gained popularity in the industry and received priority in the SIP design community (working groups). The SIPPING [CBP08] and XCON [JRPJ08] working groups (WGs) within the IETF are considering conferencing frameworks. While SIPPING is designing a conferencing framework using SIP, the XCON system is independent of any signaling protocol. Both conferencing models focus only on centralized conferencing systems, where signaling and media mixing are handled by a central conference server and a centralized media mixer.
However, one may want to control the media of a particular participant, e.g., participant P1 wanting to block media from participant P2 or to receive media streams only from participant P3. Controlling such media vectors from an endpoint has been a challenging issue. As a simple example, a user's voice might by default be shared with all others in a conference, but a versatile interface would allow a secret to be shared only with a selected subset of the members. Current commercially available conference systems do not generally support such features.
Our research introduces a flexible, user-adjustable multiparty multimedia conference system, including “narrowcasting” functionality, as an application within the SIP framework. A human user wants to distribute attention and availability, and narrowcasting provides a formalization of such presence filters. Narrowcasting systems extend broadcasting and multicasting systems by allowing media streams to be filtered for relevancy control, privacy, security, and user interface optimization. As SIP was designed for multimedia session control, narrowcasting attributes can be implemented within the existing SIP framework. In this article, we propose a design for narrowcasting attributes and consider the feasibility of implementing it in a SIP framework.
The rest of this article is structured as follows. Section 2 reviews background information regarding conferencing. Section 3 explains our proposal for a SIP-based implementation. Section 4 details the call flow of the narrowcasting implementation in SIP. Finally, Section 5 presents the conclusion and ideas for future research.
This section discusses a common conference architecture, the requirements of a typical conference system, and the limitations of existing systems.
Conventional conferencing systems can be categorized into three different types, depending upon where media streams from participants are mixed.
Centralized Conferencing
In the centralized model, every participant connects to a conference bridge [SKBR03]. The conference bridge is a conceptually simple device, consisting of a SIP user agent to handle signaling, an RTP mixer to handle the media streams, a conference application layer for authentication, authorization & accounting services, and possibly conference control functions. Participants establish one-to-one media and signaling connections with the bridge. The bridge establishes voice paths between endpoints by collecting input signals and returning summed signals to conferees. Figure 1 illustrates how the media is mixed, (en/de)coded if necessary, and redistributed to participants.
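The bridge's summing step can be sketched as follows. This is a minimal illustration, not the bridge of [SKBR03]: it assumes 16-bit linear PCM frames of equal length and the common "mix-minus" practice in which each conferee receives the sum of everyone else's signal but not their own. The names `mix_minus` and `clamp16` are hypothetical, and codec handling is omitted.

```python
from typing import Dict, List

PCM16_MIN, PCM16_MAX = -32768, 32767


def clamp16(sample: int) -> int:
    """Saturate a summed sample to the signed 16-bit range."""
    return max(PCM16_MIN, min(PCM16_MAX, sample))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each conferee, the sum of all OTHER conferees' frames.

    Assumes every frame holds the same number of 16-bit linear PCM samples.
    """
    length = len(next(iter(frames.values())))
    # Sum all inputs once, then subtract each conferee's own contribution.
    total = [sum(frame[k] for frame in frames.values()) for k in range(length)]
    return {
        name: [clamp16(total[k] - frame[k]) for k in range(length)]
        for name, frame in frames.items()
    }


if __name__ == "__main__":
    out = mix_minus({"P1": [100, -200], "P2": [10, 20], "P3": [1, 2]})
    print(out)  # {'P1': [11, 22], 'P2': [101, -198], 'P3': [110, -180]}
```

Summing once and subtracting each input keeps the work linear in the number of conferees, rather than re-adding N - 1 streams for every output.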
Most current multimedia conferencing systems fall into this category. As permissions are controlled by an administrator (a.k.a. floor controller), end users don't have much access to configuration features.
Decentralized Conferencing
In a decentralized model, signaling control is centralized but media are exchanged between participants without going through a centralized bridge. There is no conference server or central point of control. Decentralized conferencing can be either of two types: full mesh or multicast.
A. Full Mesh Conferencing: A full-duplex media link (Figure 2) can be established between every pair of participants, resulting in a fully-connected mesh. Each endpoint transmits a copy of its stream to the N - 1 other endpoints, and receives N - 1 streams in return, on separate ports. Each pair of participants can communicate through any mutually supported codec type.
B. Multicast Conferencing: In a multicast conference, participants join a session by subscribing to a conference multicast address. This address might be advertised by one of the participants or a central server, or distributed to conferees prior to a conference. Each participant transmits a single copy of his stream to the conference multicast address, receiving N - 1 streams in return. From a receiver's perspective, nothing changes from the full mesh arrangement except that the streams arrive on a single port. Multicast conferences can scale up to millions of users and do not strictly require SIP signaling. However, native multicast is not yet widely available.
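The per-endpoint load of the three arrangements follows directly from the descriptions above. The sketch below tabulates it; the topology labels and the `StreamLoad` type are illustrative names, not part of any standard, and the centralized case assumes the bridge returns a single mixed stream to each endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    sent_per_endpoint: int      # copies of its own stream each endpoint transmits
    received_per_endpoint: int  # streams each endpoint receives
    receive_ports: int          # separate ports needed to receive those streams


def stream_load(topology: str, n: int) -> StreamLoad:
    """Per-endpoint stream counts for an n-party conference (n >= 2)."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    if topology == "centralized":
        # One stream up to the bridge, one mixed stream back.
        return StreamLoad(1, 1, 1)
    if topology == "full_mesh":
        # A copy to each of the n - 1 peers; n - 1 streams back on separate ports.
        return StreamLoad(n - 1, n - 1, n - 1)
    if topology == "multicast":
        # One copy to the group address; n - 1 streams back on a single port.
        return StreamLoad(1, n - 1, 1)
    raise ValueError(f"unknown topology: {topology}")
```

For five participants, `stream_load("full_mesh", 5)` gives four copies sent and four streams received per endpoint, while multicast reduces the sending side to one copy.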
To implement a flexible end-to-end conferencing system, the following considerations apply:
General requirements: A conference control framework should be scalable, extensible, generic, reliable, and secure. The scalability requirement means that the conference control framework must support reasonably large, geographically distributed conferences. Moreover, it should be modular and extensible so that new components can be easily added or existing components changed. The conference control framework must also be generic so that it is not tied to any particular application. While conference control protocols are likely to consume significantly less bandwidth than media streams, some care needs to be taken for large conferences. Since the conference description and policy information can be massive, incremental updates are preferred to resending entire descriptions after each change. Similarly, changes in participant lists should be distributed as additions and removals. Also, not all participants care about the same level of detail; for example, some may only be interested when new members join or leave, but not when a participant adds herself to a floor queue. The importance of reliability and security is obvious.
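The preference for incremental updates can be illustrated with a participant list. In this minimal sketch, assuming participants are identified by URI strings, a notifier sends only the set difference and a subscriber rebuilds the current list from its previous copy. The function names are hypothetical and no wire format is implied.

```python
from typing import Dict, Set


def roster_delta(old: Set[str], new: Set[str]) -> Dict[str, Set[str]]:
    """Describe a roster change as additions and removals instead of a full list."""
    return {"added": new - old, "removed": old - new}


def apply_delta(roster: Set[str], delta: Dict[str, Set[str]]) -> Set[str]:
    """Rebuild the current roster on the receiving side from the previous one."""
    return (roster - delta["removed"]) | delta["added"]
```

If P2 leaves and P3 joins a conference of P1 and P2, the delta carries just those two entries, however large the rest of the roster is.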
Session establishment: A mechanism is required to establish connections among multiple participants and to manipulate and describe media “mixing” or “topology” for multiple media types (audio, video, text, position data, etc.). SIP is a good candidate for this purpose. Technical challenges involve flexibly defining the media and its transmission using SDP (Session Description Protocol) [HJ98].
Network resource management: Network resources are an important factor determining the communication quality of a conference, or “QoS” (quality of service). Conferencing over the best-effort Internet is an ongoing challenge. Large delay or jitter irritates participants and degrades conference quality. Proper encoding/decoding schemes must be chosen with network characteristics and available bandwidth in mind.
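As a rough worked example of why bandwidth matters, consider G.711 audio (RTP payload type 0, the format used in the SDP examples later in this article). Assuming 20 ms packetization and 40 bytes of IPv4, UDP, and RTP headers per packet (no header options, no link-layer overhead), each stream costs about 80 kbit/s at the IP level, and a full-mesh endpoint must upload one copy per peer. The function names are illustrative.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers, no options


def g711_stream_kbps(ptime_ms: int = 20) -> float:
    """IP-level bit rate of one G.711 stream (64 kbit/s payload)."""
    payload_bytes = 8 * ptime_ms          # 8000 samples/s x 1 byte/sample
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000


def full_mesh_uplink_kbps(n: int, ptime_ms: int = 20) -> float:
    """Upstream load when each endpoint sends a separate copy to n - 1 peers."""
    return (n - 1) * g711_stream_kbps(ptime_ms)
```

With four participants in a full mesh, each endpoint uploads 3 x 80 = 240 kbit/s; shortening the packetization to 10 ms raises the per-stream cost to 96 kbit/s because the header overhead is paid twice as often.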
Policy: A user rights database specifies the privileges of potential participants. User rights lists might include information about who can authorize the admission or expulsion of participants and who can act on floor control requests. Such functions are often combined into the role of a moderator, but a flexible system should allow them to be distributed among a set of participants.
Security and privacy: Unwelcome participants must be excluded so that no unauthorized party can intrude upon or eavesdrop on a conference. A mechanism for membership and authorization control is required. The policy may describe which users are pre-authorized to join (“white list”), are explicitly forbidden from joining (“black list” or “block list”), or may join but only in listen-only (“lurk”) mode. Since Internet-based signaling protocols offer a variety of authentication mechanisms, a policy might also define how strongly each participant must authenticate. Unauthenticated users may be rejected or relegated to audience status.
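A list-based admission policy of this kind might be evaluated as below. This is a hypothetical sketch: the precedence order (block list first, then authentication, then lurk and white lists) and the default of refusing users who appear on no list are our assumptions, not rules taken from any conferencing standard.

```python
from enum import Enum
from typing import Set


class Admission(Enum):
    REJECT = "reject"
    LISTEN_ONLY = "lurk"       # may join but only listen
    AUDIENCE = "audience"      # unauthenticated user relegated to audience status
    FULL = "full"


def admit(uri: str, authenticated: bool, white: Set[str], black: Set[str],
          lurk: Set[str], reject_unauthenticated: bool = True) -> Admission:
    """Decide how a joining user is admitted under a simple list-based policy."""
    if uri in black:
        return Admission.REJECT
    if not authenticated:
        return Admission.REJECT if reject_unauthenticated else Admission.AUDIENCE
    if uri in lurk:
        return Admission.LISTEN_ONLY
    if uri in white:
        return Admission.FULL
    return Admission.REJECT   # assumption: users not pre-authorized are refused
```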
Over the years, there have been many studies in the area of conference control [KSW02, SNS01]. Most earlier works discuss only the floor control aspects of conference control. Standardization efforts have met with limited success. H.323, developed by the ITU-T, has several problems, including scalability issues due to the insufficient T.124 database replication protocol and its restriction to binary ASN.1 encoding rather than a text-based format. SIP, in contrast, is a text-based protocol that can easily interact with other Internet protocols. SIP is a signaling protocol for creating, modifying, and terminating multimedia sessions between multiple participants. Conferencing is possible using standard SIP methods [RSC02], allowing users to join and leave conferences and to invite other participants. However, SIP by itself does not offer configurable conference policies, participant access lists, floor control, or user privilege levels. The SIPPING (Session Initiation Protocol Project Investigation) WG [CBP08] is chartered to develop requirements for extensions to SIP needed for multiparty applications. XCON, working closely with SIPPING, focuses on developing a standardized suite of protocols for tightly coupled multimedia conferences [JRPJ08].
A limitation of traditional conferencing systems is that a participant (as opposed to a conference administrator) cannot control other participants' displays. Current conferencing systems generally do not let a participant select a subset of the conference participants to whom her media are sent or from whom she receives streams. In this article, we introduce narrowcasting attributes to implement such media restriction features within a SIP framework.
In this section, we describe the feature set for narrowcasting in SIP-based conferences. In our group's earlier publications [FCDK05, ACA05, FAD06], we introduced the concept of narrowcasting attributes, described functions to apply these features in a standard conferencing model (recapitulated in Figure 6), and proposed how the features could be implemented using standard SIP methods and headers defined in RFC 3261 [RSC02]. An advantage of such a deployment is that no new methods or header extensions are required to implement the features.
Figure 3 shows a famous Japanese sculpture that is a good example of narrowcasting attributes. Three monkeys, Mizaru (the monkey with eyes covered), Iwazaru (mouth covered), and Kikazaru (ears blocked), manifest the notion of limiting media vectors. Mizaru cannot see (but can hear and speak); Iwazaru cannot speak (but can see and hear); Kikazaru cannot hear (but can speak and see).
By analogy to broadcasting, multicasting, and anycasting, narrowcasting is a technique for limiting and focusing information streams, at either sources or sinks (receivers). We employ the paradigm of multiple simultaneous chatspaces, each with several or many conversants, across which one has “multipresence,” permitting designation of multiple instances of one's “self.” The audio windows narrowcasting predicate calculus [Coh00] is a formalization of such a permission scheme. Table 1 lists the narrowcasting audio attributes and explains their characteristics. This article proposes deployment of these attributes within a SIP framework.
Figure 4 shows the initial state of a conference in which three participants, P1, P2, and P3, can talk to and hear each other. In other words, all the participants are in a fully connected media relationship. Our design allows each user to send or receive data streams to or from a specific set of participants in a session. For easier understanding, we consider only audio streams in this article. However, the design applies equally well to other media types, including video, text, and data (geographic location, for example).
Table 1. Proposed Audio Narrowcasting Attributes
| Attribute | Description |
| Mute | Blocks the media stream coming from a source. In Table 2(a,b), P1 mutes P2, i.e., P1 blocks the media coming from P2. As a result, P1 does not hear P2. However, P2 can still hear P1. |
| Select | Limits the projected sound to particular sources. In Table 2(c,d), P1 selects P2, i.e., P1 focuses on media coming from P2. As a result, P1 can listen only to P2's voice; P1 cannot hear other participants. |
| Deafen | Blocks media streams going to a sink. In Table 2(e,f), P1 deafens P2, i.e., P1 blocks media going towards P2. As a result, P2 cannot hear P1. The relationship between P1 and other participants remains the same. |
| Attend | Limits received sound to particular sinks. In Table 2(g,h), P1 attends P2, i.e., media from P1 can go only to P2. As a result, only P2 can hear P1; others cannot. |
A “mute” function is available in present-day conference systems. However, in most cases, a participant mutes herself by placing the other conversant on “music on hold.” On-hold parties hear the music, but no voice media is transmitted. In our definition, a user can explicitly mute another party.
In Table 2(a), three participants take part in a conference in which P2 has been muted by P1. This means P1 does not want to hear P2, but only P3. Specifically,
- P1 has a simplex (one-way) relationship with P2, P1 → P2.
- P1 has a duplex (two-way) media relationship with P3, P1 ↔ P3.
- P2 has a duplex media relationship with P3, P2 ↔ P3.
- When P1 speaks, both P2 and P3 will hear.
- When P2 speaks, only P3 will hear (and NOT P1).
- When P3 speaks, both P1 and P2 will hear.
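Equivalently, for this simple example, P3 might be selected by P1. The connectivity of the situation shown in Table 2(a) can be portrayed as P1 → P2, P1 ↔ P3, P2 ↔ P3, representable in matrix form as

$$
C = \begin{pmatrix} \ast & 1 & 1 \\ 0 & \ast & 1 \\ 1 & 1 & \ast \end{pmatrix}
$$

where entry c_ij of the matrix represents the connectivity of source Pi to sink Pj (1 if Pj receives the media of Pi, 0 otherwise), and the main diagonal is populated by “don't care”s (∗).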
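A scenario with four participants in a session is shown in Table 2(d). Here P2 is selected by P1, so P1 can hear only P2 but not the others. The other participants can hear as usual. The connectivity of Table 2(d) is represented as

$$
C = \begin{pmatrix} \ast & 1 & 1 & 1 \\ 1 & \ast & 1 & 1 \\ 0 & 1 & \ast & 1 \\ 0 & 1 & 1 & \ast \end{pmatrix}
$$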
Remote deafen is also available in full-featured conferencing systems as “listen-only mode.” In most cases, only an end user or administrator may invoke this feature. In our definition, any user can control the media sent to or received from another.
| Control | Situation | Result |
| Mute | A participant wants to block media from a specific participant. In this scenario, P1 mutes P2. | P1 has only a send-only relationship with P2. Other media vectors remain the same. |
| Select | A participant wants to receive media only from a particular participant. In this scenario, P1 selects P2. | Only P1 ↔ P2 remains the same. Other participants have a receive-only media relationship with P1. |
| Deafen | A participant wants to block media to specific participant(s). In this scenario, P1 deafens P2. | P1 has a receive-only media relationship with P2. Others remain the same. |
| Attend | A participant wants to send media to a specific participant. In this scenario, P1 attends P2. | Media from P1 only goes to P2. Others only send to P1 but cannot receive media from P1. |
[Diagrams of the resulting P1 → P2 media vectors for each control, Table 2(a) to 2(h), not reproduced.]
In Table 2(e), P2 is deafened by P1. This means P1 does not want P2 to hear his voice. Specifically,
- P1 has a simplex media relationship with P2, P1 ← P2.
- P1 has a duplex media relationship with P3.
- P2 has a duplex media relationship with P3.
In this case:
- When P1 speaks, P3 will hear, but P2 won't.
- When P2 speaks, both P1 and P3 will hear.
- When P3 speaks, both P1 and P2 will hear.
Equivalently, P3 might be attended by P1, so that only P3 can hear P1. P1 can still hear all other streams. The connectivity matrix for Table 2(e) is

$$
C = \begin{pmatrix} \ast & 0 & 1 \\ 1 & \ast & 1 \\ 1 & 1 & \ast \end{pmatrix}
$$
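In Table 2(h), P2 is attended by P1. As a result, only P2 can hear P1. Assuming four participants, as in the select scenario of Table 2(d), the connectivity matrix of this situation is

$$
C = \begin{pmatrix} \ast & 1 & 0 & 0 \\ 1 & \ast & 1 & 1 \\ 1 & 1 & \ast & 1 \\ 1 & 1 & 1 & \ast \end{pmatrix}
$$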
For egalitarian models with flat hierarchies, there is an asymmetry regarding both mute/select and deafen/attend: audibility of a source with respect to a sink is treated as a revocable privilege and a forsakable right. By default, a sink can hear collocated sources, subject to narrowcasting commands. For example, if P2 attends P1 but P1 has muted P2, P1 won't hear P2. Further policy extensions could extend the permissions of such a protocol, including the ability to force audibility by overriding a source's mute or a sink's deafen (which a parent might invoke when telechiding a distracted child: “How dare you mute me?!”). Consideration of such role-based issues will be the focus of future research.
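The rule that audibility requires the consent of both source and sink can be sketched as a predicate over each participant's settings. This is our own minimal reading, not the predicate calculus of [Coh00] verbatim: it assumes that a non-empty select set implicitly excludes every unselected source and a non-empty attend set implicitly excludes every unattended sink, as in Table 1. It reproduces the outcomes listed for Table 2(a) and 2(d) and the P1/P2 example in the paragraph above. Names such as `Narrowcast` and `hears` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Narrowcast:
    """One participant's narrowcasting settings toward the others."""
    mute: Set[str] = field(default_factory=set)    # sources this sink refuses
    select: Set[str] = field(default_factory=set)  # if non-empty, the only sources heard
    deafen: Set[str] = field(default_factory=set)  # sinks this source refuses
    attend: Set[str] = field(default_factory=set)  # if non-empty, the only sinks addressed


def hears(source: str, sink: str, cfg: Dict[str, Narrowcast]) -> bool:
    """True if sink receives source's media: both ends must permit it."""
    src, snk = cfg[source], cfg[sink]
    sink_accepts = source not in snk.mute and (not snk.select or source in snk.select)
    source_sends = sink not in src.deafen and (not src.attend or sink in src.attend)
    return sink_accepts and source_sends


def connectivity(order: List[str], cfg: Dict[str, Narrowcast]) -> List[List[Optional[int]]]:
    """Matrix c[i][j] = 1 if source i reaches sink j; None marks the diagonal."""
    return [[None if i == j else int(hears(i, j, cfg)) for j in order] for i in order]
```

For example, `connectivity(["P1", "P2", "P3"], {"P1": Narrowcast(mute={"P2"}), "P2": Narrowcast(), "P3": Narrowcast()})` yields a zero only in the P2 row, P1 column, matching the muting scenario.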
Peers in a SIP session are called user agents, and can function in the following roles:
User-Agent Client (UAC): A client application that initiates a SIP request.
User-Agent Server (UAS): A server application that contacts the user when a SIP request is received and returns a response on behalf of the user.
A SIP endpoint is capable of functioning as both a UAC and a UAS; its role is determined per transaction: it acts as a UAC when it sends a request and as a UAS when it receives one.
SIP makes use of elements called proxy servers to help route requests to users' current locations, authenticate and authorize users for services, implement provider call-routing policies, and provide features to users. SIP also provides a registration function that allows users to upload their current locations (IP addresses) for use by proxy servers.
A typical handshaking exchange is shown in Figure 5, with P1 sending an INVITE request with media capabilities to P2. A 100/Trying response confirms that the request has been received, and a 180/Ringing response confirms that P2 is being alerted. A 200/OK message (which might also contain the final session description message body, whose significance will be explained later) is sent once P2 accepts the INVITE, notifying that a connection has been made. Upon receiving the 200/OK from P2, P1 sends an ACK; whereas the 200/OK is usually triggered by the called user answering, the ACK is generated automatically by P1's user agent. A two-party duplex session is established at this point. The delay between the 180/Ringing and 200/OK messages depends upon how many rings pass before the user accepts the call. Participants wishing to leave a session send a BYE request within the session dialog [ACA04].
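The caller's side of this exchange can be summarized as a rule on response classes. The sketch below is simplified and hypothetical: it ignores retransmissions, timers, forking, and redirection, and encodes only that 1xx responses are provisional while a 2xx final response to INVITE is acknowledged by the caller's user agent with an ACK, as RFC 3261 specifies.

```python
from typing import Iterable, List, Tuple


def uac_next_action(status: int) -> str:
    """What a UAC does on receiving a response to its INVITE (simplified)."""
    if 100 <= status < 200:
        return "wait"            # provisional, e.g. 100 Trying, 180 Ringing
    if 200 <= status < 300:
        return "send ACK"        # 2xx final: the caller's UA core sends the ACK
    if 300 <= status < 700:
        return "non-2xx final"   # transaction layer ACKs it; no session is set up
    raise ValueError(f"not a SIP response code: {status}")


def ladder(responses: Iterable[int]) -> List[Tuple[int, str]]:
    """Replay a response sequence and return the UAC's action for each."""
    return [(code, uac_next_action(code)) for code in responses]
```

Replaying the Figure 5 sequence, `ladder([100, 180, 200])`, yields two waits followed by the ACK.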
SIP signaling can be transported over either TCP or UDP; a standard SIP entity must support both [RSC02]. For realizing narrowcasting attributes over SIP, a client will follow the guideline of RFC 3261 Section 18: if a request is within 200 bytes of the path MTU (maximum transmission unit), or if it is larger than 1300 bytes and the path MTU is unknown, the request must be sent using an RFC 2914 congestion-controlled transport protocol, such as TCP.
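The size rule can be expressed as a small predicate. This sketch covers only the size test, assuming the serialized request size and, when known, the path MTU are available in bytes; reading "within 200 bytes" as inclusive is our assumption, and the function name is hypothetical.

```python
from typing import Optional


def choose_transport(request_bytes: int, path_mtu: Optional[int]) -> str:
    """Transport choice following the size rule of RFC 3261, Section 18.1.1.

    path_mtu is None when the path MTU is unknown.
    """
    if path_mtu is not None:
        needs_congestion_control = request_bytes >= path_mtu - 200  # within 200 bytes of MTU
    else:
        needs_congestion_control = request_bytes > 1300             # MTU unknown
    # "TCP" stands for any RFC 2914 congestion-controlled transport.
    return "TCP" if needs_congestion_control else "UDP"
```

A re-INVITE carrying a short SDP body, like those shown below, stays well under these thresholds and may use UDP; bodies describing many media streams could cross them.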
Narrowcasting attributes can be implemented in both centralized and decentralized conferences. This article focuses on a decentralized conference architecture, in which the media is mixed at each endpoint. Figure 6 illustrates components of the conferencing system and their roles. We have extended the model being proposed by the IETF with narrowcasting attributes.
Focus: The focus is a SIP user agent addressed by a conference URI (uniform resource identifier). It handles SIP signaling between participants in a conference. The focus establishes media exchange among participants in a conference, and also implements conference policies. Its logical role is analogous to that of the controller in an architecture with centralized signaling and distributed media.
Participants: Participants are user agents, each identified by a URI, that communicate with each other after having been connected through the focus.
Conference notification service: The focus can act logically as a notifier [Roa02], accepting subscriptions to the conference state and notifying subscribers about changes to that state. The state includes the state maintained by the focus itself, the conference policy, and the media policy.
Conference policy server: A conference policy server stores and manipulates the rules associated with participation in a conference, using an XCAP (Extensible Markup Language Configuration Access Protocol) [Ros07] database. These rules include directives on the lifespan of the conference, who can and cannot join it, who can override the media policy, definitions of roles available in the conference, and the responsibilities associated with those roles.
Conference policy: The complete set of rules governing a particular conference is interpreted and enforced by the conference policy server.
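The notifier role of the focus can be pictured with an in-memory toy. It is a hypothetical sketch, not SIP code: plain callbacks stand in for SUBSCRIBE/NOTIFY [Roa02], a subscriber receives the full state once and only changes afterward (echoing the incremental-update requirement), and the conference policy server is omitted.

```python
from typing import Callable, Dict, List

Listener = Callable[[Dict[str, str]], None]


class Focus:
    """Toy focus: holds conference state and notifies subscribers of changes."""

    def __init__(self, conference_uri: str) -> None:
        self.conference_uri = conference_uri
        self.state: Dict[str, str] = {}   # participant URI -> status (illustrative)
        self._subscribers: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._subscribers.append(listener)
        listener(dict(self.state))        # initial notification carries the full state

    def update(self, participant: str, status: str) -> None:
        self.state[participant] = status
        for listener in self._subscribers:
            listener({participant: status})  # later notifications carry only the change
```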
Narrowcasting attributes can be implemented in SIP by modifying only the generator of the SDP message body. Section 3.3 described session establishment in SIP, where SDP is used to indicate media capabilities and destination addresses.
Media negotiation is part of the INVITE/200/ACK sequence that establishes a SIP session between two endpoints. SIP itself does not negotiate media, but it carries the SDP bodies through which user agents do. Each participant sends, via SDP in either an INVITE or an ACK, information about her terminal's media capabilities and the transport address at which she wishes to receive RTP packets. In the SDP body carried in the SIP message, the user agents specify the media type, codec, IP address, and port number for each media stream. In the message body of the 200/OK response to the INVITE, the answering party sends its accepted media capabilities and the transport address to which the participant should send RTP packets. Our implementation in SIP [ACA07] will use the narrowcasting attributes mute, select, deafen, and attend, along with the media capabilities, in the SDP bodies of the INVITE/200/ACK sequence.
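Because only the SDP generator changes, the essential step is emitting a body whose direction attribute reflects the narrowcasting state. The sketch below assumes a single PCMU audio stream over IPv4; unlike the abbreviated examples in this section, it also includes the s= and t= lines that the SDP specification requires, and it expects the caller to increment `sess_version` for each re-INVITE. The function name and parameters are hypothetical.

```python
def sdp_body(user: str, sess_id: int, sess_version: int, addr: str,
             port: int, direction: str, session_name: str = "-") -> str:
    """Build a minimal SDP audio description carrying a direction attribute."""
    if direction not in ("sendrecv", "sendonly", "recvonly", "inactive"):
        raise ValueError(f"bad direction attribute: {direction}")
    lines = [
        "v=0",
        f"o={user} {sess_id} {sess_version} IN IP4 {addr}",
        f"s={session_name}",
        f"c=IN IP4 {addr}",
        "t=0 0",                          # unbounded session time
        f"m=audio {port} RTP/AVP 0",      # payload type 0 = PCMU (G.711 mu-law)
        f"a={direction}",
    ]
    return "\r\n".join(lines) + "\r\n"
```

The Content-Length header would then be set to the byte length of the returned string, e.g. `len(body.encode("ascii"))`, rather than to a fixed value.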
Figure 4 showed multiparty voice communication between P1, P2, and P3. Considering the participants' media flows, we propose the protocol elaborated below. Our design starts from the existing standard media session and sends a re-INVITE with a modified SDP body.
Figure 7 illustrates a scenario in which P1, P2, and P3 are in an RTP media session. If P1 wants to mute P2, P1 sends a re-INVITE to P2 with a modified SDP attribute, a=sendonly. P2 then responds with a 200/OK including a=recvonly along with other SDP attributes. As the negotiation determines that media is only sent from P1 to P2, a one-way RTP connection is established (P1 → P2). Thus P2 is muted by P1. The status of other participants (i.e., P3 in this example) remains unchanged. An example of the re-INVITE/OK handshake in Figure 7 is shown below, where the first block of each log is the SIP header and the second block is the SDP body.
INVITE sip:cohen@voice.u-aizu.ac.jp SIP/2.0
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: cohen <sip:cohen@voice.u-aizu.ac.jp>
Call-ID:627802096@judo.u-aizu.ac.jp
CSeq: 1 INVITE
Contact:<sip:sabbir@123.456.789.101>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 judo.u-aizu.ac.jp
c=IN IP4 123.456.789.101
m=audio 2410 RTP/AVP 0
a=sendonly
The 200/OK response looks like this:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir<sip:sabbir@judo.u-aizu.ac.jp>
To: cohen <sip:cohen@voice.u-aizu.ac.jp>;tag=659882290
Call-ID:627802096@judo.u-aizu.ac.jp
CSeq: 1 INVITE
Contact:<sip:cohen@123.456.789.102>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 voice.u-aizu.ac.jp
c=IN IP4 123.456.789.102
m=audio 2410 RTP/AVP 0
a=recvonly
In order to deafen P2, P1 sends a re-INVITE to P2 with a modified SDP attribute, a=recvonly. P2 then responds with a 200/OK including a=sendonly along with other SDP attributes. As the negotiation determines that media is only transmitted from P2 to P1, a simplex media connection is established (P2 → P1), so P2 is deafened by P1.
INVITE sip:cohen@voice.u-aizu.ac.jp SIP/2.0
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: cohen <sip:cohen@voice.u-aizu.ac.jp>
Call-ID:627802097@judo.u-aizu.ac.jp
CSeq: 2 INVITE
Contact:<sip:sabbir@123.456.789.101>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 judo.u-aizu.ac.jp
c=IN IP4 123.456.789.101
m=audio 2410 RTP/AVP 0
a=recvonly
The 200/OK response looks like this:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir<sip:sabbir@judo.u-aizu.ac.jp>
To: cohen <sip:cohen@voice.u-aizu.ac.jp>;tag=659882291
Call-ID:627802097@judo.u-aizu.ac.jp
CSeq: 2 INVITE
Contact:<sip:cohen@123.456.789.102>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 voice.u-aizu.ac.jp
c=IN IP4 123.456.789.102
m=audio 2410 RTP/AVP 0
a=sendonly
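The mute and deafen exchanges above differ only in the direction attribute that P1 offers and the mirrored value in the answer. A small sketch of that mapping follows. Combining mute and deafen toward the same peer into `a=inactive` is our assumption (the attribute is standard SDP, but this article does not discuss the combination), and the function names are hypothetical.

```python
from typing import Set


def offer_direction(send: bool, receive: bool) -> str:
    """SDP direction attribute for what the offerer still wants on the stream."""
    return {
        (True, True): "sendrecv",
        (True, False): "sendonly",
        (False, True): "recvonly",
        (False, False): "inactive",
    }[(send, receive)]


def answer_direction(offered: str) -> str:
    """Mirror an offered direction, as the answerer does in the logs above."""
    return {"sendrecv": "sendrecv", "sendonly": "recvonly",
            "recvonly": "sendonly", "inactive": "inactive"}[offered]


def direction_toward(peer: str, mutes: Set[str], deafens: Set[str]) -> str:
    """Offer sent to `peer`: muting stops receiving, deafening stops sending."""
    return offer_direction(send=peer not in deafens, receive=peer not in mutes)
```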
In order for P1 to select P2, P1 sends a re-INVITE with a modified SDP attribute, a=sendonly, to all other participants except P2, and those participants respond with a 200/OK with a=recvonly along with other SDP attributes. One-way media connections (from P1 to each of them) are established between P1 and every participant other than P2, so P2 is selected by P1.
INVITE sip:ashir@gifu.u-aizu.ac.jp SIP/2.0
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: ashir <sip:ashir@gifu.u-aizu.ac.jp>
Call-ID:627802098@judo.u-aizu.ac.jp
CSeq: 3 INVITE
Contact:<sip:sabbir@123.456.789.101>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 judo.u-aizu.ac.jp
c=IN IP4 123.456.789.101
m=audio 2410 RTP/AVP 0
a=sendonly
A 200/OK from P3 returned to P1 confirms the implicit mute.
SIP/2.0 200 OK
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: ashir <sip:ashir@gifu.u-aizu.ac.jp>;tag=659882292
Call-ID:627802098@judo.u-aizu.ac.jp
CSeq: 3 INVITE
Contact:<sip:ashir@123.456.789.103>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 sound.u-aizu.ac.jp
c=IN IP4 123.456.789.103
m=audio 2410 RTP/AVP 0
a=recvonly
As illustrated by Figure 10, P1 sends a re-INVITE with a modified SDP attribute, a=recvonly, to all other participants except P2, who respond with a 200/OK including a=sendonly along with other SDP attributes. One-way RTP media connections (from each of them to P1) are thus established with the other participants (except P2), so P2 is attended by P1.
INVITE sip:ashir@gifu.u-aizu.ac.jp SIP/2.0
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: ashir <sip:ashir@gifu.u-aizu.ac.jp>
Call-ID:627802099@judo.u-aizu.ac.jp
CSeq: 4 INVITE
Contact:<sip:sabbir@123.456.789.101>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 judo.u-aizu.ac.jp
c=IN IP4 123.456.789.101
m=audio 2410 RTP/AVP 0
a=recvonly
The 200/OK response looks like this:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 123.456.789.101
From: sabbir <sip:sabbir@judo.u-aizu.ac.jp>
To: ashir <sip:ashir@gifu.u-aizu.ac.jp>;tag=659882293
Call-ID:627802099@judo.u-aizu.ac.jp
CSeq: 4 INVITE
Contact:<sip:ashir@123.456.789.104>
Content-type: application/sdp
Content-Length: 110
v=0
o=sabbir 2345 3345 IN IP4 sound.u-aizu.ac.jp
c=IN IP4 123.456.789.104
m=audio 2410 RTP/AVP 0
a=sendonly
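The four call flows reduce to deciding which peers receive a re-INVITE and with which direction attribute: mute and deafen touch only the target, while select and attend touch every other participant. A hypothetical planning function, assuming participants are identified by simple labels and that the session starts fully connected (all sendrecv), could look like this:

```python
from typing import Dict, Iterable


def plan_reinvites(command: str, actor: str, target: str,
                   participants: Iterable[str]) -> Dict[str, str]:
    """Map each peer that must receive a re-INVITE to the a= direction offered."""
    others = [p for p in participants if p not in (actor, target)]
    if command == "mute":
        return {target: "sendonly"}               # stop receiving from target
    if command == "deafen":
        return {target: "recvonly"}               # stop sending to target
    if command == "select":
        return {p: "sendonly" for p in others}    # stop receiving from all but target
    if command == "attend":
        return {p: "recvonly" for p in others}    # stop sending to all but target
    raise ValueError(f"unknown narrowcasting command: {command}")
```

Merging a new command with narrowcasting state already in force (for instance, a mute followed by a deafen of the same peer) is not handled here and would need the per-peer direction logic to be combined.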
In ordinary conversation, participants generally observe turn-taking, as in a CSMA/CD (carrier-sense multiple access with collision detection) protocol with discretionary backoff. That is, an utterance that collides with another will cause one or both of the simultaneous speakers to stop and wait until a break before repeating.
One might wonder what happens to such conversational turn-taking in the presence of asymmetric media filters and the absence of a moderator. Narrowcasting features, like blocklists, side channels, and call-within-a-call, complicate teleconferences, since a deafened conversant might not be aware that another is talking and multiple sources might speak at once. If some avatars in a conference are muted or deafened to some other participants, without formal floor control there is a danger of some “talking on top of” others. In the absence of common floor control, won't private chats and decentralized control lead to anarchy? Without “traffic signals,” how can collisions be avoided?
In fact, such parallel conversation streams are not a problem. For example, if two participants set up a private side-conference using narrowcasting commands, even though their utterances might collide with others', they wouldn't expect or want others to stop conversing. Rather they “listen with one ear” to ongoing conversations while enjoying their own caucus. Listeners can still untangle conversational threads, by context, voice quality, etc. Just as in real social contexts, including informal gatherings like parties, multiple simultaneous speakers are analyzable. Even “linear” conversations like formal meetings might have some subsets of conversants whispering among themselves while a main speaker is talking. Narrowcasting interfaces will be even more useful when extended by spatial audio and attenuation based on mutual virtual position (source projection, sink bearing, and distance), distributing the respective voices across a soundscape.
The status of each participant's privacy in terms of the media relationship with other participants requires consideration. In this article, we have introduced a design of new features for multimedia conferencing systems. These features could provide enhanced conference functions at the user end, “the edge of the network,” rather than at the server. As a result, a conference participant (not an administrator) could easily control media transmission. We also described the design of these features and a method of implementing them within the standard SIP framework.
Future challenges include developing an algorithm for role-based policy, and adaptive media-mixing at a centralized media mixer for subscribed users.
[ACA04] A Case Study of VoIP Performance Across Different Networks, Proc. ICECE: 3rd Int. Conf. on Electrical & Computer Engineering (Dhaka), December 2004, pp. 295–298, ISBN 984-32-1804-4.
[ACA05] Design of Narrowcasting Implementation in SIP, Proc. HC-2005: Eighth Int. Conf. on Human and Computer (Aizu-Wakamatsu), August 2005, pp. 255–260.
[ACA07] Narrowcasting: Controlling Media Privacy in SIP Multimedia Conferencing, 4th IEEE Consumer Communications and Networking Conference CCNC 2007 (Las Vegas), January 2007, pp. 110–115, ISBN 1-4244-0667-6.
[CBP08] Session Initiation Proposal Investigation (SIPPING), 2008, www.ietf.org/html.charters/sipping-charter.html, Last Accessed July 11th, 2008.
[Coh00] Exclude and include for audio sources and sinks: Analogs of mute & solo are deafen & attend, Presence: Teleoperators and Virtual Environments, (2000), no. 1, 84–96, ISSN 1054-7460.
[FAD06] Audio Narrowcasting and Privacy for Multipresent Avatars on Workstations and Mobile Phones, IEICE Trans. on Information and Systems E89-D, (2006), no. 1, 73–87, ISSN 0916-8532.
[FCDK05] Duplex narrowcasting operations for multipresent groupware avatars on mobile devices, IJWMC: Int. J. of Wireless and Mobile Computing, (2005), no. 5, Special Issue on Mobile Multimedia Systems and Applications, ISSN 1741-1084.
[HJ98] RFC 2327: SDP: Session Description Protocol, 1998, www.ietf.org/rfc/rfc2327.txt, Last Accessed July 11th, 2008.
[IT03] ITU-T Recommendation H.323 (07/2003): Packet-based Multimedia Communications Systems, Series H: Audiovisual and multimedia systems, 2003, http://www.itu.int/rec/T-REC-H.323-200307-S/en, Last Accessed July 11th, 2008.
[Joh04] SIP: Understanding the Session Initiation Protocol, Artech House, London, 2004, ISBN 1580531687.
[JRPJ08] Centralized Conferencing (XCON), 2008, www.ietf.org/html.charters/xcon-charter.html, Last Accessed July 11th, 2008.
[KSW02] A SIP-based Conference Control Framework, NOSSDAV '02: Proc. 12th Int. Wkshp. on Network and Operating Systems Support for Digital Audio and Video, New York, NY, ACM Press, 2002, pp. 53–61, ISBN 1-58113-512-2.
[Roa02] RFC 3265: Session Initiation Protocol (SIP)-Specific Event Notification, 2002, www.ietf.org/rfc/rfc3265.txt, Last Accessed July 11th, 2008.
[Ros07] RFC 4825: The Extensible Markup Language (XML) Configuration Access Protocol (XCAP), May 2007, www.ietf.org/rfc/rfc4825.txt, Last Accessed July 11th, 2008.
[RSC02] RFC 3261: SIP: Session Initiation Protocol, 2002, www.ietf.org/rfc/rfc3261.txt, Last Accessed July 11th, 2008.
[SKBR03] Tandem-Free VoIP Conferencing: A Bridge to Next-Generation Networks, IEEE Communications Magazine, (2003), no. 5, 136–145, ISSN 0163-6804.
[SNS01] Centralized Conferencing using SIP, Proc. Internet Telephony Workshop, April 2001, New York.
License
Anyone may transmit this work electronically and make it available for download under the terms of the Digital Peer Publishing License. The license text is available online at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_de_06-2004.html.
Recommended citation
Mohammad Sabbir Alam, Michael Cohen, and Ashir Ahmed, Articulated Narrowcasting for Privacy and Awareness in Multimedia Conferencing Systems and Design for Implementation Within a SIP Framework. JVRB - Journal of Virtual Reality and Broadcasting, 5(2008), no. 14. (urn:nbn:de:0009-6-14724)
When citing this article, please give the exact URL and the date of your last visit to this online address.