W3C

WebRTC 1.0: Real-time Communication Between Browsers

W3C Editor's Draft 23 August 2011

This version:
http://dev.w3.org/2011/webrtc/editor/webrtc-20110823.html
Latest published version:
http://www.w3.org/TR/webrtc/
Latest editor's draft:
http://dev.w3.org/2011/webrtc/editor/webrtc.html
Previous version:
none
Editors:
Adam Bergkvist, Ericsson
Daniel C. Burnett, Voxeo
Cullen Jennings, Cisco
Anant Narayanan, Mozilla

Abstract

TBD

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the Web Real-Time Communications Working Group as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-webrtc@w3.org (subscribe, archives). All feedback is welcome.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

This section is non-normative.

There are a number of facets to video-conferencing in HTML:

This document defines the APIs used for these features.

2. Obtaining local multimedia content

2.1 Definition

2.2 Examples

A voice chat feature in a game could attempt to get access to the user's microphone by calling the API as follows:

<script>
 navigator.getUserMedia('audio', gotAudio);
 function gotAudio(stream) {
   // ... use 'stream' ...
 }
</script>

A video-conferencing system would ask for both audio and video:

<script>
 function beginCall() {
   navigator.getUserMedia('audio,video user', gotStream);
 }
 function gotStream(stream) {
   // ... use 'stream' ...
 }
</script>

3. Stream API

3.1 Introduction

The MediaStream interface is used to represent streams of media data, typically (but not necessarily) of audio and/or video content, e.g. from a local camera or a remote site. The data from a MediaStream object does not necessarily have a canonical binary form; for example, it could just be "the video currently coming from the user's video camera". This allows user agents to manipulate media streams in whatever fashion is most suitable on the user's platform.

Each MediaStream object can represent zero or more tracks, in particular audio and video tracks. Tracks can contain multiple channels of parallel data; for example a single audio track could have nine channels of audio data to represent a 7.2 surround sound audio track.

Each track represented by a MediaStream object has a corresponding MediaStreamTrack object.

A MediaStream object has an input and an output. The input depends on how the object was created: a LocalMediaStream object generated by a getUserMedia() call, for instance, might take its input from the user's local camera, while a MediaStream created by a PeerConnection object will take as input the data received from a remote peer. The output of the object controls how the object is used, e.g. what is saved if the object is written to a file, what is displayed if the object is used in a video element, or indeed what is transmitted to a remote peer if the object is used with a PeerConnection object.

Each track in a MediaStream object can be disabled, meaning that it is muted in the object's output. All tracks are initially enabled.

A MediaStream can be finished, indicating that its inputs have forever stopped providing data. When a MediaStream object is finished, all its tracks are muted regardless of whether they are enabled or disabled.

The output of a MediaStream object must correspond to the tracks in its input. Muted audio tracks must be replaced with silence. Muted video tracks must be replaced with blackness.
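
The muting rules above can be modelled in a few lines. The following is a sketch for illustration only and is not part of the API: the helper name trackOutput and the plain objects standing in for streams and tracks are assumptions made for this example.

```javascript
// Illustrative model only: plain objects stand in for MediaStream and
// MediaStreamTrack; 'trackOutput' is a hypothetical helper, not part of the API.
function trackOutput(stream, track) {
  // A track is muted if it is disabled, or if the whole stream is finished
  // (a finished stream mutes every track, enabled or not).
  var muted = stream.finished || !track.enabled;
  if (!muted) {
    return 'data';
  }
  // Muted audio is replaced with silence, muted video with blackness.
  // Kinds other than 'audio' are treated as video here for brevity.
  return track.kind === 'audio' ? 'silence' : 'blackness';
}

var mic = { kind: 'audio', enabled: true };
var cam = { kind: 'video', enabled: false };
var liveStream = { finished: false, tracks: [mic, cam] };
var endedStream = { finished: true, tracks: [mic, cam] };
```

In this model the enabled microphone track of liveStream yields data, the disabled camera track yields blackness, and every track of endedStream is muted regardless of its enabled flag.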

A MediaStream object's output can be "forked" by creating a new MediaStream object from it using the MediaStream() constructor. The new MediaStream object's input is the output of the object from which it was created, with any disabled tracks removed, and its output is therefore at most a subset of that "parent" object. (Merely muted tracks are not removed, so the tracks do not change when the parent is finished.) When such a fork's parent finishes, the fork is also said to have finished.

This can be used, for instance, in a video-conferencing scenario to display the local video from the user's camera and microphone in a local monitor, while only transmitting the audio to the remote peer (e.g. in response to the user using a "video mute" feature).

When a track in a MediaStream parent is disabled, any MediaStreamTrack objects corresponding to the tracks in any MediaStream objects that were created from parent are disassociated from any track, and must not be reused for tracks again. If a disabled track in a MediaStream parent is re-enabled, from the perspective of any MediaStream objects that were created from parent it is a new track and thus new MediaStreamTrack objects must be created for the tracks that correspond to the re-enabled track.

The LocalMediaStream interface is used when the user agent is generating the stream's data (e.g. from a camera or streaming it from a local video file). It allows authors to control individual tracks during the generation of the content, e.g. to allow the user to temporarily disable a local camera during a video-conference chat.

When a LocalMediaStream object is being generated from a local file (as opposed to a live audio/video source), the user agent should stream the data from the file in real time, not all at once. This reduces the ease with which pages can distinguish live video from pre-recorded video, which can help protect the user's privacy.

3.2 Interface definitions

3.2.1 MediaStream

The MediaStream(parentStream) constructor must return a new MediaStream object whose tracks at any moment in time are the enabled tracks of parentStream at that moment, and whose label is equal to the parentStream's.
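
The relationship between a fork and its parent can be sketched as follows. This is a simplified model, not the constructor itself: forkStream is a hypothetical helper, plain objects stand in for streams, and the fork reuses the parent's track objects, whereas a user agent would expose separate MediaStreamTrack objects on the fork.

```javascript
// Illustrative model only: 'forkStream' is a hypothetical stand-in for the
// MediaStream(parentStream) constructor, using plain objects.
function forkStream(parentStream) {
  return {
    // The fork carries the same label as its parent.
    label: parentStream.label,
    // Computed on each access, so the result always reflects the tracks
    // that are enabled on the parent at that moment.
    get tracks() {
      return parentStream.tracks.filter(function (t) { return t.enabled; });
    }
  };
}

var source = {
  label: 'example-label',
  tracks: [
    { kind: 'audio', enabled: true },
    { kind: 'video', enabled: true }
  ]
};
var fork = forkStream(source);
source.tracks[1].enabled = false; // e.g. a "video mute" applied to the parent
```

After the video track is disabled on the parent, the fork in this model exposes only the audio track.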

A MediaStream object is said to end when the user agent learns that no more data will ever be forthcoming for this stream.

When a MediaStream object ends for any reason (e.g. because the user rescinds the permission for the page to use the local camera, or because the data comes from a finite file and the file's end has been reached and the user has not requested that it be looped, or because the stream comes from a remote peer and the remote peer has permanently stopped sending data, or because the MediaStream was created from another MediaStream and that stream has just itself ended), it is said to be finished. When this happens for any reason other than the stop() method being invoked, the user agent must queue a task that runs the following steps:

  1. If the object's readyState attribute has the value ENDED (2) already, then abort these steps. (The stop() method was probably called just before the stream stopped for other reasons, e.g. the user clicked an in-page stop button and then the user-agent-provided stop button.)

  2. Set the object's readyState attribute to ENDED (2).

  3. Fire a simple event named ended at the object.

As soon as a MediaStream object is finished, the stream's tracks start outputting only silence and/or blackness, as appropriate, as defined earlier.

If the end of the stream was reached due to a user request, the task source for this task is the user interaction task source. Otherwise the task source for this task is the networking task source.
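
The three steps above can be sketched as a small model. It is an approximation under stated assumptions: ModelStream is a hypothetical stand-in for a MediaStream, and the steps run immediately rather than in a queued task.

```javascript
// Illustrative model only: 'ModelStream' is a hypothetical stand-in for a
// MediaStream, and the steps run immediately rather than in a queued task.
var LIVE = 1, ENDED = 2;

function ModelStream() {
  this.readyState = LIVE;
  this.onended = null;
}

// Steps run when the stream finishes for a reason other than stop().
ModelStream.prototype.finish = function () {
  // Step 1: the stream has already ended, so there is nothing to do.
  if (this.readyState === ENDED) return;
  // Step 2: record the new state.
  this.readyState = ENDED;
  // Step 3: fire 'ended' at the object.
  if (typeof this.onended === 'function') this.onended({ type: 'ended' });
};

var endedCount = 0;
var modelStream = new ModelStream();
modelStream.onended = function () { endedCount++; };
modelStream.finish();
modelStream.finish(); // no effect: step 1 aborts the second run
```

The guard in step 1 is what keeps the ended event from firing twice when a stream stops for two reasons in quick succession.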

[Constructor (in MediaStream parentStream)]
interface MediaStream {
    readonly attribute DOMString            label;
    readonly attribute MediaStreamTrackList tracks;
    MediaStreamRecorder record ();
    const unsigned short LIVE = 1;
    const unsigned short ENDED = 2;
    readonly attribute unsigned short       readyState;
             attribute Function?            onended;
};
3.2.1.1 Attributes
label of type DOMString, readonly
Returns a label that is unique to this stream, so that streams can be recognised after they are sent through the PeerConnection API.
No exceptions.
onended of type Function, nullable
This event handler, of event handler event type ended, must be supported by all objects implementing the MediaStream interface.
No exceptions.
readyState of type unsigned short, readonly

The readyState attribute represents the state of the stream. It must return the value to which the user agent last set it (as defined below). It can have the following values: LIVE or ENDED.

When a MediaStream object is created, its readyState attribute must be set to LIVE (1), unless it is being created using the MediaStream() constructor whose argument is a MediaStream object whose readyState attribute has the value ENDED (2), in which case the MediaStream object must be created with its readyState attribute set to ENDED (2).

No exceptions.
tracks of type MediaStreamTrackList, readonly

Returns a MediaStreamTrackList object representing the tracks that can be enabled and disabled.

A MediaStream can have multiple audio and video sources (e.g. because the user has multiple microphones, or because the real source of the stream is a media resource with many media tracks). The stream represented by a MediaStream thus has zero or more tracks.

The tracks attribute must return an array host object for objects of type MediaStreamTrack that is fixed length and read only. The same object must be returned each time the attribute is accessed. [WEBIDL]

The array must contain the MediaStreamTrack objects that correspond to the tracks of the stream. The relative order of all tracks in a user agent must be stable. All audio tracks must precede all video tracks. Tracks that come from a media resource whose format defines an order must be in the order defined by the format; tracks that come from a media resource whose format does not define an order must be in the relative order in which the tracks are declared in that media resource. Within these constraints, the order is user-agent defined.
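
One ordering that satisfies these constraints is sketched below. The helper orderTracks and the sample track labels are assumptions made for this example; a user agent is free to choose any other order that meets the requirements.

```javascript
// Illustrative only: 'orderTracks' is a hypothetical helper showing one
// ordering that satisfies the constraints; it is not part of the API.
function orderTracks(tracks) {
  // Partition rather than sort, so that the declared relative order is
  // preserved within the audio group and within the remaining group.
  var audio = tracks.filter(function (t) { return t.kind === 'audio'; });
  var others = tracks.filter(function (t) { return t.kind !== 'audio'; });
  return audio.concat(others);
}

var declared = [
  { kind: 'video', label: 'cam-1' },
  { kind: 'audio', label: 'mic-1' },
  { kind: 'video', label: 'cam-2' },
  { kind: 'audio', label: 'mic-2' }
];
var ordered = orderTracks(declared).map(function (t) { return t.label; });
```

Here ordered lists both audio tracks first, followed by the video tracks, each group in its declared order.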

No exceptions.
3.2.1.2 Methods
record

Begins recording the stream. The returned MediaStreamRecorder object provides access to the recorded data.

When the record() method is invoked, the user agent must return a new MediaStreamRecorder object associated with the stream.

No parameters.
No exceptions.
Return type: MediaStreamRecorder
3.2.1.3 Constants
ENDED of type unsigned short
The stream has finished (the user agent is no longer receiving or generating data, and will never receive or generate more data for this stream).
LIVE of type unsigned short
The stream is active (the user agent is making a best-effort attempt to receive or generate data in real time).
MediaStream implements EventTarget;

All instances of the MediaStream type are defined to also implement the EventTarget interface.

3.2.2 LocalMediaStream

interface LocalMediaStream : MediaStream {
    void stop ();
};
3.2.2.1 Methods
stop

When a LocalMediaStream object's stop() method is invoked, the user agent must queue a task that runs the following steps:

  1. If the object's readyState attribute is in the ENDED (2) state, then abort these steps.

  2. Permanently stop the generation of data for the stream. If the data is being generated from a live source (e.g. a microphone or camera), and no other stream is being generated from a live source, then the user agent should remove any active "on-air" indicator. If the data is being generated from a prerecorded source (e.g. a video file), any remaining content in the file is ignored. The stream is finished. The stream's tracks start outputting only silence and/or blackness, as appropriate, as defined earlier.

  3. Set the object's readyState attribute to ENDED (2).

  4. Fire a simple event named ended at the object.

The task source for the tasks queued for the stop() method is the DOM manipulation task source.

No parameters.
No exceptions.
Return type: void

3.2.3 MediaStreamTrack

typedef MediaStreamTrack[] MediaStreamTrackList;
Throughout this specification, the identifier MediaStreamTrackList is used to refer to the array of MediaStreamTrack type.
interface MediaStreamTrack {
    readonly attribute DOMString kind;
    readonly attribute DOMString label;
             attribute boolean   enabled;
};
3.2.3.1 Attributes
enabled of type boolean

The MediaStreamTrack.enabled attribute, on getting, must return the last value to which it was set. On setting, it must be set to the new value, and then, if the MediaStreamTrack object is still associated with a track, must enable the track if the new value is true, and disable it otherwise.

Thus, after a MediaStreamTrack is disassociated from its track, its enabled attribute still changes value when set; the new value simply has no further effect.
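
The getter and setter behaviour can be modelled as follows. ModelTrack and its private fields are hypothetical names used only for this sketch; disassociation is simulated by clearing the reference to the underlying track.

```javascript
// Illustrative model only: 'ModelTrack' is a hypothetical stand-in for a
// MediaStreamTrack; 'underlying' represents the track it is associated with.
function ModelTrack(underlying) {
  this._enabled = true;
  this._underlying = underlying; // set to null once disassociated
}

Object.defineProperty(ModelTrack.prototype, 'enabled', {
  get: function () { return this._enabled; },
  set: function (value) {
    this._enabled = Boolean(value);
    // Only a still-associated object affects the underlying track.
    if (this._underlying) this._underlying.enabled = this._enabled;
  }
});

var underlyingTrack = { enabled: true };
var modelTrack = new ModelTrack(underlyingTrack);
modelTrack.enabled = false;    // disables the underlying track
modelTrack._underlying = null; // simulate disassociation
modelTrack.enabled = true;     // the attribute changes; the track does not
```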

No exceptions.
kind of type DOMString, readonly

The MediaStreamTrack.kind attribute must return the string "audio" if the object's corresponding track is or was an audio track, "video" if the corresponding track is or was a video track, and a user-agent defined string otherwise.

No exceptions.
label of type DOMString, readonly

When a LocalMediaStream object is created, the user agent must generate a globally unique identifier string, and must initialize the object's label attribute to that string. Such strings must only use characters in the ranges U+0021, U+0023 to U+0027, U+002A to U+002B, U+002D to U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005E to U+007E, and must be 36 characters long.
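
The character and length constraints can be checked with a single regular expression. The helper below is a sketch, and isValidLocalLabel is a hypothetical name, not part of the API. As an aside, a UUID in its canonical lowercase textual form is 36 characters long and uses only digits, the letters a to f and hyphens, so it falls within the permitted ranges; this specification does not require that labels be UUIDs.

```javascript
// Illustrative only: 'isValidLocalLabel' is a hypothetical helper that checks
// the character ranges and the 36-character length; it cannot check uniqueness.
var LABEL_PATTERN =
  /^[\u0021\u0023-\u0027\u002A-\u002B\u002D-\u002E\u0030-\u0039\u0041-\u005A\u005E-\u007E]{36}$/;

function isValidLocalLabel(label) {
  return typeof label === 'string' && LABEL_PATTERN.test(label);
}

// A sample value in canonical UUID form (36 characters).
var sampleLabel = '123e4567-e89b-12d3-a456-426614174000';
var sampleIsValid = isValidLocalLabel(sampleLabel);
```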

When a MediaStream is created to represent a stream obtained from a remote peer, the label attribute is initialized from information provided by the remote source.

When a MediaStream is created from another using the MediaStream() constructor, the label attribute is initialized from the original.

The label attribute must return the value to which it was initialized when the object was created.

The label of a MediaStream object is unique to the source of the stream, but that does not mean it is not possible to end up with duplicates. For example, when a MediaStream object is created from another using the MediaStream() constructor, the fork has the same label as the original. Similarly, a locally generated stream could be sent from one user to a remote peer using PeerConnection, and then sent back to the original user in the same manner, in which case the original user will have multiple streams with the same label (the locally-generated one and the one received from the remote peer).

User agents may label audio and video sources (e.g. "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute must return the label of the object's corresponding track, if any. If the corresponding track has or had no label, the attribute must instead return the empty string.

Thus the kind and label attributes do not change value, even if the MediaStreamTrack object is disassociated from its corresponding track.

No exceptions.

3.2.4 MediaStreamRecorder

interface MediaStreamRecorder {
    void getRecordedData (in BlobCallback? callback);
};
3.2.4.1 Methods
getRecordedData

Creates a Blob of the recorded data, and invokes the provided callback with that Blob.

When the getRecordedData() method is called, the user agent must run the following steps:

  1. Let callback be the callback indicated by the method's first argument.

  2. If callback is null, abort these steps.

  3. Let data be the data that was streamed by the MediaStream object from which the MediaStreamRecorder was created since the creation of the MediaStreamRecorder object.

  4. Return, and run the remaining steps asynchronously.

  5. Generate a file containing data in a format supported by the user agent for use in audio and video elements.

  6. Let blob be a Blob object representing the contents of the file generated in the previous step. [FILE-API]

  7. Queue a task to invoke callback with blob as its argument.

The getRecordedData() method can be called multiple times on one MediaStreamRecorder object; each time, it will create a new file as if this was the first time the method was being called. In particular, the method does not stop or reset the recording when the method is called.

Parameter   Type           Nullable   Optional   Description
callback    BlobCallback   yes        no
No exceptions.
Return type: void

3.2.5 BlobCallback

[Callback=FunctionOnly, NoInterfaceObject]
interface BlobCallback {
    void handleEvent (in Blob blob);
};
3.2.5.1 Methods
handleEvent
Def TBD
Parameter   Type   Nullable   Optional   Description
blob        Blob   no         no
No exceptions.
Return type: void

3.2.6 URL

Note that the following is actually only a partial interface, but ReSpec does not yet support that.

interface URL {
    static DOMString createObjectURL (in MediaStream stream);
};
3.2.6.1 Methods
createObjectURL

Mints a Blob URL to refer to the given MediaStream.

When the createObjectURL() method is called with a MediaStream argument, the user agent must return a unique Blob URL for the given MediaStream. [FILE-API]

For audio and video streams, the data exposed on that stream must be in a format supported by the user agent for use in audio and video elements.

A Blob URL is the same as what the File API specification calls a Blob URI, except that anything in the definition of that feature that refers to File and Blob objects is hereby extended to also apply to MediaStream and LocalMediaStream objects.

Parameter   Type          Nullable   Optional   Description
stream      MediaStream   no         no
No exceptions.
Return type: static DOMString

3.3 Examples

This sample code exposes a button. When clicked, the button is disabled and the user is prompted to offer a stream. The user can cause the button to be re-enabled by providing a stream (e.g. giving the page access to the local camera) and then disabling the stream (e.g. revoking that access).

<input type="button" value="Start" onclick="start()" id="startBtn">
<script>
 var startBtn = document.getElementById('startBtn');
 function start() {
   navigator.getUserMedia('audio,video', gotStream);
   startBtn.disabled = true;
 }
 function gotStream(stream) {
   stream.onended = function () {
     startBtn.disabled = false;
   }
 }
</script>

This example allows people to record a short audio message and upload it to the server. This example even shows rudimentary error handling.

<input type="button" value="⚫" onclick="msgRecord()" id="recBtn">
<input type="button" value="◼" onclick="msgStop()" id="stopBtn" disabled>
<p id="status">To start recording, press the ⚫ button.</p>
<script>
 var recBtn = document.getElementById('recBtn');
 var stopBtn = document.getElementById('stopBtn');
 function report(s) {
   document.getElementById('status').textContent = s;
 }
 function msgRecord() {
   report('Attempting to access microphone...');
   navigator.getUserMedia('audio', gotStream, noStream);
   recBtn.disabled = true;
 }
 var msgStream, msgStreamRecorder;
 function gotStream(stream) {
   report('Recording... To stop, press the ◼ button.');
   msgStream = stream;
   msgStreamRecorder = stream.record();
   stopBtn.disabled = false;
   stream.onended = function () {
     msgStop();     
   }
 }
 function msgStop() {
   report('Creating file...');
   stopBtn.disabled = true;
   msgStream.onended = null;
   msgStream.stop();
   msgStreamRecorder.getRecordedData(msgSave);
 }
 function msgSave(blob) {
   report('Uploading file...');
   var x = new XMLHttpRequest();
   x.open('POST', 'uploadMessage');
   x.send(blob);
   x.onload = function () {
     report('Done! To record a new message, press the ⚫ button.');
     recBtn.disabled = false;
   };
   x.onerror = function () {
     report('Failed to upload message. To try recording a message again, press the ⚫ button.');
     recBtn.disabled = false;
   };
 }
 function noStream() {
   report('Could not obtain access to your microphone. To try again, press the ⚫ button.');
   recBtn.disabled = false;
 }
</script>

This example allows people to take photos of themselves from the local video camera.

<article>
 <style scoped>
  video { transform: scaleX(-1); }
  p { text-align: center; }
 </style>
 <h1>Snapshot Kiosk</h1>
 <section id="splash">
  <p id="errorMessage">Loading...</p>
 </section>
 <section id="app" hidden>
  <p><video id="monitor" autoplay></video> <canvas id="photo"></canvas>
  <p><input type=button value="&#x1F4F7;" onclick="snapshot()">
 </section>
 <script>
  navigator.getUserMedia('video user', gotStream, noStream);
  var video = document.getElementById('monitor');
  var canvas = document.getElementById('photo');
  function gotStream(stream) {
    video.src = URL.createObjectURL(stream);
    video.onerror = function () {
      stream.stop();
    };
    stream.onended = noStream;
    video.onloadedmetadata = function () {
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      document.getElementById('splash').hidden = true;
      document.getElementById('app').hidden = false;
    };
  }
  function noStream() {
    document.getElementById('errorMessage').textContent = 'No camera available.';
  }
  function snapshot() {
    canvas.getContext('2d').drawImage(video, 0, 0);
  }
 </script>
</article>

4. Peer-to-peer connections

A PeerConnection allows two users to communicate directly, browser-to-browser. Communications are coordinated via a signaling channel provided by script in the page via the server, e.g. using XMLHttpRequest.

Calling "new PeerConnection(configuration, signalingCallback)" creates a PeerConnection object.

The configuration string gives the address of a STUN or TURN server to use to establish the connection. [STUN] [TURN]

The allowed formats for this string are:

"TYPE 203.0.113.2:3478"

Indicates a specific IP address and port for the server.

"TYPE relay.example.net:3478"

Indicates a specific host and port for the server; the user agent will look up the IP address in DNS.

"TYPE example.net"

Indicates a specific domain for the server; the user agent will look up the IP address and port in DNS.

The "TYPE" is one of:

STUN
Indicates a STUN server
STUNS
Indicates a STUN server that is to be contacted using a TLS session.
TURN
Indicates a TURN server
TURNS
Indicates a TURN server that is to be contacted using a TLS session.

The signalingCallback argument is a method that will be invoked when the user agent needs to send a message to the other host over the signaling channel. When the callback is invoked, convey its first argument (a string) to the other peer using whatever method is being used by the Web application to relay signaling messages. (Messages returned from the other peer are provided back to the user agent using the processSignalingMessage() method.)

A PeerConnection object has an associated PeerConnection signaling callback, a PeerConnection ICE Agent, a PeerConnection data UDP media stream, a PeerConnection readiness state and an ICE started flag. These are initialized when the object is created.

When the PeerConnection() constructor is invoked, the user agent must run the following steps. This algorithm has a synchronous section (which is triggered as part of the event loop algorithm). Steps in the synchronous section are marked with ⌛.

  1. Let serverConfiguration be the constructor's first argument.

  2. Let signalingCallback be the constructor's second argument.

  3. Let connection be a newly created PeerConnection object.

  4. Create an ICE Agent and let connection's PeerConnection ICE Agent be that ICE Agent. [ICE]

  5. If serverConfiguration contains a U+000A LINE FEED (LF) character or a U+000D CARRIAGE RETURN (CR) character (or both), remove all characters from serverConfiguration after the first such character.

  6. Split serverConfiguration on spaces to obtain configuration components.

  7. If configuration components has two or more components, and the first component is a case-sensitive match for one of the following strings:

    • "STUN"
    • "STUNS"
    • "TURN"
    • "TURNS"

    ...then run the following substeps:

    1. Let server type be STUN if the first component of configuration components is "STUN" or "STUNS", and TURN otherwise (the first component of configuration components is "TURN" or "TURNS").

    2. Let secure be true if the first component of configuration components is "STUNS" or "TURNS", and false otherwise.

    3. Let host be the contents of the second component of configuration components up to the character before the first U+003A COLON character (:), if any, or the entire string otherwise.

    4. Let port be the contents of the second component of configuration components from the character after the first U+003A COLON character (:) up to the end, if any, or the empty string otherwise.

    5. Configure the PeerConnection ICE Agent's STUN or TURN server as follows:

      • If server type is STUN, the server is a STUN server. Otherwise, server type is TURN and the server is a TURN server.
      • If secure is true, the server is to be contacted using TLS-over-TCP, otherwise, it is to be contacted using UDP.
      • The IP address, host name, or domain name of the server is host.
      • The port to use is port. If this is the empty string, then only a domain name is configured (and the ICE Agent will use DNS SRV requests to determine the IP address and port).
      • The long-term username for the STUN or TURN server is the ASCII serialization of the entry script's origin; the long-term password is the empty string.

      If the given IP address, host name, domain name, or port are invalid, then the user agent must act as if no STUN or TURN server is configured.

  8. Let the connection's PeerConnection signaling callback be signalingCallback.

  9. Set connection's PeerConnection readiness state to NEW (0).

  10. Set connection's ICE started flag to false.

  11. Let connection's PeerConnection data UDP media stream be a new data UDP media stream.

  12. Let connection's localStreams attribute be an empty read-only MediaStream array. [WEBIDL]

  13. Let connection's remoteStreams attribute be an empty read-only MediaStream array. [WEBIDL]

  14. Return connection, but continue these steps asynchronously.

  15. Await a stable state. The synchronous section consists of the remaining steps of this algorithm. (Steps in synchronous sections are marked with ⌛.)

  16. ⌛ If connection's ICE started flag is still false, start the PeerConnection ICE Agent and send the initial offer. The initial offer must include a media description for the PeerConnection data UDP media stream, marked as "sendrecv", and for all the streams in localStreams (marked as "sendonly"). [ICE] [SDPOFFERANSWER]

  17. ⌛ Let connection's ICE started flag be true.

  18. ⌛ If connection's PeerConnection readiness state is still NEW (0), then queue a task that sets it to NEGOTIATING (1) and then fires a simple event named connecting at the PeerConnection object.
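
Steps 5 to 7 of the constructor algorithm, which parse the configuration string, can be sketched as follows. This is an approximation under stated assumptions: parseServerConfiguration is a hypothetical helper, runs of spaces are treated as a single separator, the line break itself is dropped along with everything after it, and no validation of the host or port is attempted.

```javascript
// Illustrative only: 'parseServerConfiguration' is a hypothetical helper that
// mirrors steps 5 to 7 of the constructor algorithm; it is not part of the API.
function parseServerConfiguration(serverConfiguration) {
  // Step 5: keep only the text before the first LF or CR
  // (assumption: the line break itself is dropped as well).
  var firstLine = serverConfiguration.split(/[\n\r]/)[0];
  // Step 6: split on spaces (assumption: runs of spaces act as one separator).
  var components = firstLine.split(' ').filter(function (c) { return c !== ''; });
  // Step 7: require a known, case-sensitive type and at least one more component.
  var types = ['STUN', 'STUNS', 'TURN', 'TURNS'];
  if (components.length < 2 || types.indexOf(components[0]) === -1) {
    return null; // treated as "no STUN or TURN server configured"
  }
  var type = components[0];
  var hostPort = components[1];
  var colon = hostPort.indexOf(':');
  return {
    serverType: (type === 'STUN' || type === 'STUNS') ? 'STUN' : 'TURN',
    secure: type === 'STUNS' || type === 'TURNS',
    host: colon === -1 ? hostPort : hostPort.slice(0, colon),
    // An empty port means only a domain name was given; the ICE Agent
    // would then use DNS SRV requests to find the address and port.
    port: colon === -1 ? '' : hostPort.slice(colon + 1)
  };
}

var stunConfig = parseServerConfiguration('STUN 203.0.113.2:3478');
var turnsConfig = parseServerConfiguration('TURNS example.net\nignored text');
var badConfig = parseServerConfiguration('stun example.net');
```

In this sketch stunConfig describes a STUN server at 203.0.113.2 on port 3478 contacted without TLS, turnsConfig describes a TLS-secured TURN server identified only by the domain example.net, and badConfig is null because the type match is case-sensitive.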

When a PeerConnection ICE Agent is required to send SDP offers or answers, the user agent must follow these steps:

  1. Let sdp be the SDP offer or answer to be sent. [SDPOFFERANSWER]

  2. Let message be the concatenation of the string "SDP", a U+000A LINE FEED (LF) character, and sdp, in that order.

  3. Queue a task to invoke that PeerConnection ICE Agent's PeerConnection signaling callback with message as its first argument and the PeerConnection as its second argument.
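
The message format and the callback invocation can be sketched as follows. The helper names, the placeholder connection object and the SDP fragment are assumptions made for this example; a user agent would queue a task rather than call the callback directly.

```javascript
// Illustrative only: 'buildSignalingMessage' and 'sendDescription' are
// hypothetical helpers, not part of the API.
function buildSignalingMessage(sdp) {
  // The string "SDP", then a single LINE FEED, then the offer or answer.
  return 'SDP' + '\n' + sdp;
}

function sendDescription(connection, signalingCallback, sdp) {
  // A user agent would queue a task; this sketch calls the callback directly.
  // The message is the first argument and the connection is the second.
  signalingCallback(buildSignalingMessage(sdp), connection);
}

var sent = [];
var fakeConnection = {}; // placeholder standing in for the PeerConnection
sendDescription(fakeConnection, function (message, source) {
  sent.push({ message: message, source: source });
}, 'v=0\r\n'); // placeholder fragment, not a complete session description
```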

All streams represented by MediaStream objects must be marked as "sendonly" by the peer that initially adds the stream to the session. The PeerConnection API does not support bidirectional ("sendrecv") audio or video media streams. [SDPOFFERANSWER]

User agents may negotiate any codec and any resolution, bitrate, or other quality metric. User agents are encouraged to initially negotiate for the native resolution of the stream. For streams that are then rendered (using a video element), user agents are encouraged to renegotiate for a resolution that matches the rendered display size.

Starting with the native resolution means that if the Web application notifies its peer of the native resolution as it starts sending data, and the peer prepares its video element accordingly, there will be no need for a renegotiation once the stream is flowing.

All SDP media descriptions for streams represented by MediaStream objects must include a label attribute ("a=label:") whose value is the value of the MediaStream object's label attribute. [SDP] [SDPLABEL]

PeerConnection ICE Agents must not generate any candidates for media streams whose media descriptions do not have a label attribute ("a=label:"). [ICE] [SDP] [SDPLABEL]
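
Reading the label attribute out of a media description can be sketched as follows. The helper getMediaLabel, the assumption that the description is available as a string of SDP lines, and the sample descriptions are all illustrative.

```javascript
// Illustrative only: 'getMediaLabel' is a hypothetical helper; the media
// description is assumed to be given as a string of SDP lines.
function getMediaLabel(mediaDescription) {
  var lines = mediaDescription.split(/\r\n|\n/);
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf('a=label:') === 0) {
      return lines[i].slice('a=label:'.length);
    }
  }
  // No label attribute: no candidates would be generated for this stream.
  return null;
}

var labelled =
  'm=video 49170 RTP/AVP 31\r\na=sendonly\r\na=label:example-stream-label\r\n';
var unlabelled = 'm=audio 49172 RTP/AVP 0\r\na=sendonly\r\n';
```

Under the rule above, only the first of these two descriptions would be eligible for candidate generation.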

When a user agent starts receiving media for a component and a candidate was provided for that component by a PeerConnection ICE Agent, the user agent must follow these steps:

  1. Let connection be the PeerConnection whose ICE Agent is expecting this media.

  2. If there is already a MediaStream object for the media stream to which this component belongs, then associate the component with that media stream and abort these steps. (Some media streams have multiple components; this API does not expose the role of these individual components in ICE.)

  3. Create a MediaStream object to represent the media stream. Set its label attribute to the value of the SDP Label attribute for that component's media stream.

  4. Queue a task to run the following substeps:

    1. If the connection's PeerConnection readiness state is CLOSED (3), abort these steps.

    2. Add the newly created MediaStream object to the end of connection's remoteStreams array.

    3. Fire a stream event named addstream with the newly created MediaStream object at the connection object.

When a PeerConnection ICE Agent finds that a stream from the remote peer has been removed (its port has been set to zero in a media description sent on the signaling channel), the user agent must follow these steps:

  1. Let connection be the PeerConnection whose PeerConnection ICE Agent has determined that a stream is being removed.

  2. Let stream be the MediaStream object that represents the media stream being removed, if any. If there isn't one, then abort these steps.

  3. By definition, stream is now finished.

    A task is thus queued to update stream and fire an event.

  4. Queue a task to run the following substeps:

    1. If the connection's PeerConnection readiness state is CLOSED (3), abort these steps.

    2. Remove stream from connection's remoteStreams array.

    3. Fire a stream event named removestream with stream at the connection object.

The task source for the tasks listed in this section is the networking task source.
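
The bookkeeping for remote streams described in the two algorithms above can be sketched as a small model. It makes several simplifying assumptions: ModelConnection is a hypothetical stand-in for a PeerConnection, streams are identified by label alone, tasks run immediately instead of being queued, and fired events are recorded in an array rather than dispatched.

```javascript
// Illustrative model only: 'ModelConnection' is a hypothetical stand-in for a
// PeerConnection; tasks run immediately instead of being queued.
var CLOSED = 3;

function ModelConnection() {
  this.readyState = 0;     // NEW
  this.remoteStreams = [];
  this.events = [];        // records fired events for inspection
}

// Media has started arriving for the stream identified by its SDP label.
ModelConnection.prototype.mediaReceived = function (label) {
  var known = this.remoteStreams.some(function (s) { return s.label === label; });
  if (known) return;                               // stream already represented
  var stream = { label: label, finished: false };  // new stream object
  if (this.readyState === CLOSED) return;          // closed: do nothing further
  this.remoteStreams.push(stream);                 // append to remoteStreams
  this.events.push('addstream:' + label);          // fire addstream
};

// The remote peer has removed a stream (its port was set to zero).
ModelConnection.prototype.streamRemoved = function (label) {
  var index = -1;
  this.remoteStreams.forEach(function (s, i) { if (s.label === label) index = i; });
  if (index === -1) return;                        // no such stream
  this.remoteStreams[index].finished = true;       // the stream is now finished
  if (this.readyState === CLOSED) return;          // closed: do nothing further
  this.remoteStreams.splice(index, 1);             // remove from remoteStreams
  this.events.push('removestream:' + label);       // fire removestream
};

var modelConnection = new ModelConnection();
modelConnection.mediaReceived('a');
modelConnection.mediaReceived('a'); // second component of the same stream
modelConnection.streamRemoved('a');
modelConnection.streamRemoved('b'); // unknown stream: ignored
```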

To prevent network sniffing from allowing a fourth party to establish a connection to a peer using the information sent out-of-band to the other peer and thus spoofing the client, the configuration information should always be transmitted using an encrypted connection.

4.1 PeerConnection

[Constructor (in DOMString configuration, in SignalingCallback signalingCallback)]
interface PeerConnection {
    void processSignalingMessage (in DOMString message);
    const unsigned short NEW = 0;
    const unsigned short NEGOTIATING = 1;
    const unsigned short ACTIVE = 2;
    const unsigned short CLOSED = 3;
    readonly attribute unsigned short readyState;
    void send (in DOMString text);
    void addStream (in MediaStream stream);
    void removeStream (in MediaStream stream);
    readonly attribute MediaStream[]  localStreams;
    readonly attribute MediaStream[]  remoteStreams;
    void close ();
             attribute Function?      onconnecting;
             attribute Function?      onopen;
             attribute Function?      onmessage;
             attribute Function?      onaddstream;
             attribute Function?      onremovestream;
};

4.1.1 Attributes

localStreams of type array of MediaStream, readonly

Returns a live array containing the streams that the user agent is currently attempting to transmit to the remote peer (those that were added with addStream()).

Specifically, it must return the read-only MediaStream array that the attribute was set to when the PeerConnection's constructor ran.

No exceptions.
onaddstream of type Function, nullable
This event handler, of event handler event type addstream, must be supported by all objects implementing the PeerConnection interface.
No exceptions.
onconnecting of type Function, nullable
This event handler, of event handler event type connecting, must be supported by all objects implementing the PeerConnection interface.
No exceptions.
onmessage of type Function, nullable
This event handler, of event handler event type message, must be supported by all objects implementing the PeerConnection interface.
No exceptions.
onopen of type Function, nullable
This event handler, of event handler event type open, must be supported by all objects implementing the PeerConnection interface.
No exceptions.
onremovestream of type Function, nullable
This event handler, of event handler event type removestream, must be supported by all objects implementing the PeerConnection interface.
No exceptions.
readyState of type unsigned short, readonly

The readyState attribute must return the PeerConnection object's PeerConnection readiness state, represented by a number from the following list:

PeerConnection . NEW (0)
The object was just created, and no networking has yet occurred.
PeerConnection . NEGOTIATING (1)
The user agent is attempting to establish a connection.
PeerConnection . ACTIVE (2)
The connection is as good as it's going to get.
PeerConnection . CLOSED (3)
The connection is closed.
No exceptions.
remoteStreams of type array of MediaStream, readonly

Returns a live array containing the streams that the user agent is currently receiving from the remote peer.

Specifically, it must return the read-only MediaStream array that the attribute was set to when the PeerConnection's constructor ran.

This array is updated when addstream and removestream events are fired.

No exceptions.

4.1.2 Methods

addStream

Attempts to start sending the given stream to the remote peer.

When the other peer starts sending a stream in this manner, an addstream event is fired at the PeerConnection object.

When the addStream() method is invoked, the user agent must run the following steps:

  1. Let stream be the method's argument.

  2. If the PeerConnection object's PeerConnection readiness state is CLOSED (3), throw an INVALID_STATE_ERR exception.

  3. If stream is already in the PeerConnection object's localStreams object, then abort these steps.

  4. Add stream to the end of the PeerConnection object's localStreams object.

  5. Return from the method.

  6. If the PeerConnection's ICE started flag is false, then abort these steps.

  7. Have the PeerConnection's PeerConnection ICE Agent add a media stream for stream the next time the user agent provides a stable state. Any other pending stream additions and removals must be processed at the same time. [ICE]

Parameter Type Nullable Optional Description
stream MediaStream
No exceptions.
Return type: void
close

When the close() method is invoked, the user agent must run the following steps:

  1. If the PeerConnection object's PeerConnection readiness state is CLOSED (3), throw an INVALID_STATE_ERR exception.

  2. Destroy the PeerConnection ICE Agent, abruptly ending any active ICE processing and any active streaming, and releasing any relevant resources (e.g. TURN permissions).

  3. Set the object's PeerConnection readiness state to CLOSED (3).

The localStreams and remoteStreams objects remain in the state they were in when the object was closed.

No parameters.
No exceptions.
Return type: void
processSignalingMessage

When a message relayed from the remote peer over the signaling channel is received by the Web application, the application passes it to the user agent by calling the processSignalingMessage() method.

The order of messages is important. Passing messages to the user agent in a different order than they were generated by the remote peer's user agent can prevent a successful connection from being established or degrade the connection's quality if one is established.

When the processSignalingMessage() method is invoked, the user agent must run the following steps:

  1. Let message be the method's argument.

  2. Let connection be the PeerConnection object on which the method was invoked.

  3. If connection's PeerConnection readiness state is CLOSED (3), throw an INVALID_STATE_ERR exception.

  4. If the first four characters of message are not "SDP" followed by a U+000A LINE FEED (LF) character, then abort these steps. (This indicates an error in the signaling channel implementation. User agents may report such errors to their developer consoles to aid debugging.)

    Future extensions to the PeerConnection interface might use other prefix values to implement additional features.

  5. Let sdp be the string consisting of all but the first four characters of message.

  6. If connection's ICE started flag is true, then pass sdp to the PeerConnection ICE Agent as a subsequent offer or answer, to be interpreted as appropriate given the current state of the ICE Agent, and abort these steps. [ICE]

  7. The ICE started flag is false. Start the PeerConnection ICE Agent and pass it sdp as the initial offer from the other peer; the ICE Agent will then (asynchronously) construct the initial answer and transmit it as described above.

    If there is a remotely-initiated data UDP media stream in the initial offer, and it has an encryption key advertised in its media description that is 16 bytes long, then that is the PeerConnection data UDP media stream.

    After the initial answer has been sent, the ICE Agent must add all the streams in localStreams to the session, as described above. [ICE]

  8. Let connection's ICE started flag be true.

  9. Queue a task that sets connection's PeerConnection readiness state to NEGOTIATING (1) and then fires a simple event named connecting at the PeerConnection object.

When a PeerConnection ICE Agent completes ICE processing (even if there are no active streams), the user agent must queue a task that sets the PeerConnection object's PeerConnection readiness state to ACTIVE (2) and then fires a simple event named open at the PeerConnection object.

When a PeerConnection ICE Agent restarts ICE processing for any reason (e.g. because a peer is adding or removing a stream), the user agent must queue a task that sets the PeerConnection object's PeerConnection readiness state to NEGOTIATING (1) and then fires a simple event named connecting at the PeerConnection object.

Parameter Type Nullable Optional Description
message DOMString
No exceptions.
Return type: void
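
Steps 4 and 5 of the processSignalingMessage() algorithm above (checking the "SDP" prefix and extracting the payload) can be sketched as follows. This is only an illustration: the function name extractSdp is hypothetical and is not part of the PeerConnection interface.

```javascript
// Hypothetical helper (not part of the PeerConnection API). Mirrors steps 4
// and 5 of processSignalingMessage(): returns the SDP payload, or null when
// the message does not start with "SDP" followed by U+000A LINE FEED (LF).
function extractSdp(message) {
  var prefix = 'SDP\n';
  if (message.slice(0, 4) !== prefix) {
    return null; // the algorithm aborts here; a user agent may log the error
  }
  return message.slice(4); // all but the first four characters
}
```

A caller would treat a null result as a signaling channel error and ignore the message.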
removeStream

Stops sending the given stream to the remote peer.

When the other peer stops sending a stream in this manner, a removestream event is fired at the PeerConnection object.

When the removeStream() method is invoked, the user agent must run the following steps:

  1. Let stream be the method's argument.

  2. If the PeerConnection object's PeerConnection readiness state is CLOSED (3), throw an INVALID_STATE_ERR exception.

  3. If stream is not in the PeerConnection object's localStreams object, then abort these steps.

  4. Remove stream from the PeerConnection object's localStreams object.

  5. Return from the method.

  6. If the PeerConnection's ICE started flag is false, then abort these steps.

  7. Have the PeerConnection's PeerConnection ICE Agent remove the media stream for stream the next time the user agent provides a stable state. Any other pending stream additions and removals must be processed at the same time. [ICE]

Parameter Type Nullable Optional Description
stream MediaStream
No exceptions.
Return type: void
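
The bookkeeping performed by the addStream() and removeStream() algorithms in this section can be modeled roughly as follows. This is a simplified sketch, not a PeerConnection implementation: LocalStreamsModel, its pending queue, and flush() are hypothetical names, and the exception is represented by a plain Error.

```javascript
// Hypothetical model of the localStreams bookkeeping; not a real PeerConnection.
var CLOSED = 3;

function LocalStreamsModel() {
  this.readyState = 0;      // NEW
  this.iceStarted = false;  // models the "ICE started flag"
  this.localStreams = [];
  this.pending = [];        // additions and removals awaiting a stable state
}

LocalStreamsModel.prototype.addStream = function (stream) {
  if (this.readyState === CLOSED) throw new Error('INVALID_STATE_ERR');
  if (this.localStreams.indexOf(stream) !== -1) return; // already present
  this.localStreams.push(stream);
  if (this.iceStarted) this.pending.push({ op: 'add', stream: stream });
};

LocalStreamsModel.prototype.removeStream = function (stream) {
  if (this.readyState === CLOSED) throw new Error('INVALID_STATE_ERR');
  var index = this.localStreams.indexOf(stream);
  if (index === -1) return; // not present
  this.localStreams.splice(index, 1);
  if (this.iceStarted) this.pending.push({ op: 'remove', stream: stream });
};

// Called when the user agent next provides a stable state: all pending
// additions and removals are handed to the ICE Agent together.
LocalStreamsModel.prototype.flush = function () {
  var batch = this.pending;
  this.pending = [];
  return batch;
};
```

Batching the pending operations reflects the requirement that other pending stream additions and removals be processed at the same time.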
send

Attempts to send the given text to the remote peer. This uses UDP, which is inherently unreliable; there is no guarantee that every message will be received.

When a message sent in this manner from the other peer is received, a message event is fired at the PeerConnection object.

The maximum length of text is 504 bytes after encoding the string as UTF-8; attempting to send a payload greater than 504 bytes results in an INVALID_ACCESS_ERR exception.

When the send() method is invoked, the user agent must run the following steps:

  1. Let message be the method's first argument.

  2. If the PeerConnection object's PeerConnection readiness state is CLOSED (3), throw an INVALID_STATE_ERR exception.

  3. Let data be message encoded as UTF-8. [UTF-8]

  4. If data is longer than 504 bytes, throw an INVALID_ACCESS_ERR exception and abort these steps.

  5. If the PeerConnection's PeerConnection data UDP media stream is not an active data UDP media stream, abort these steps. No message is sent.

  6. If the user agent is rate-limiting packets sent using this API, and sending the data packet at this time would exceed the limit, then abort these steps. User agents may report this to the user, e.g. in a development console.

  7. Transmit a data packet to a peer using the PeerConnection's PeerConnection data UDP media stream with data as the message.

Parameter Type Nullable Optional Description
text DOMString
No exceptions.
Return type: void
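
Because the 504-byte limit applies to the UTF-8 encoding rather than to the string length, an application might check a message before calling send(). The following sketch assumes an environment that provides the TextEncoder interface; the helper names utf8Length and fitsInDataPacket are hypothetical.

```javascript
// Hypothetical pre-flight check mirroring steps 3 and 4 of send().
var MAX_PAYLOAD_BYTES = 504;

// Number of bytes in the UTF-8 encoding of text.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

// True if send(text) would not exceed the payload limit.
function fitsInDataPacket(text) {
  return utf8Length(text) <= MAX_PAYLOAD_BYTES;
}
```

For example, a string of 253 copies of U+00E9 has a length of 253 but encodes to 506 bytes, so it would be rejected.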

4.1.3 Constants

ACTIVE of type unsigned short
The ICE Agent has concluded ICE processing. If any media streams were successfully negotiated, any relevant media is streaming.
CLOSED of type unsigned short
The close() method has been invoked.
NEGOTIATING of type unsigned short
The ICE Agent is actively performing ICE processing.
NEW of type unsigned short
The object was just created and its ICE Agent has not yet been started.
PeerConnection implements EventTarget;

All instances of the PeerConnection type are defined to also implement the EventTarget interface.
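
As an illustration of the readyState constants above, a page might map the numeric value to a name for logging. The helper describeReadyState is hypothetical; connection stands for any object exposing a readyState attribute.

```javascript
// Constant values as defined on the PeerConnection interface.
var READY_STATE_NAMES = { 0: 'NEW', 1: 'NEGOTIATING', 2: 'ACTIVE', 3: 'CLOSED' };

// Hypothetical logging helper; returns 'UNKNOWN' for values outside the list.
function describeReadyState(connection) {
  var name = READY_STATE_NAMES[connection.readyState];
  return name === undefined ? 'UNKNOWN' : name;
}
```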

4.2 SignalingCallback

[Callback=FunctionOnly, NoInterfaceObject]
interface SignalingCallback {
    void handleEvent (in DOMString message, in PeerConnection source);
};

4.2.1 Methods

handleEvent
Def TBD
Parameter Type Nullable Optional Description
message DOMString
source PeerConnection
No exceptions.
Return type: void

4.3 Examples

When two peers decide they are going to set up a connection to each other, they both go through these steps. The STUN/TURN server configuration describes a server they can use to get things like their public IP address or to set up NAT traversal. They also have to send data for the signaling channel to each other using the same out-of-band mechanism they used to establish that they were going to communicate in the first place.

// the first argument describes the STUN/TURN server configuration
var local = new PeerConnection('TURNS example.net', sendSignalingChannel);
local.processSignalingMessage(...); // if we have a message from the other side, pass it along here

// (aLocalStream is some LocalMediaStream object)
local.addStream(aLocalStream); // start sending video

function sendSignalingChannel(message) {
  ... // send message to the other side via the signaling channel
}

function receiveSignalingChannel (message) {
  // call this whenever we get a message on the signaling channel
  local.processSignalingMessage(message);
}

local.onaddstream = function (event) {
  // (videoElement is some <video> element)
  videoElement.src = URL.getObjectURL(event.stream);
};

5. The data stream

All PeerConnection connections include a data UDP media stream, which is used to send data packets peer-to-peer, for instance game control packets. This data channel is unreliable (packets are not guaranteed to be delivered), and packets received out of order are discarded.

SDP media descriptions for data UDP media streams must use the "application" media type, the "udp" transport protocol, and the "application/html-peer-connection-data" media format description. [SDP]

All SDP media descriptions for data UDP media streams must include a label attribute ("a=label:") whose value is the string "data". [SDP] [SDPLABEL]

All SDP media descriptions for data UDP media streams must also include a key field ("k="), with the value being a base64-encoded representation of 16 cryptographically random bytes determined on a per-ICE-Agent basis. [SDP]
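
A media description satisfying the three requirements above might be assembled as in the following sketch, written for Node.js. Several details are assumptions rather than requirements of this specification: the function names are hypothetical, the port is supplied by the caller, and the key field is written using the "base64:" method of SDP.

```javascript
var crypto = require('crypto');

// 16 cryptographically random bytes; generated once per ICE Agent.
function generateDataKey() {
  return crypto.randomBytes(16);
}

// Hypothetical helper: builds the SDP lines for a data UDP media stream.
// Assumption: the key field uses the "k=base64:<key>" form.
function buildDataMediaDescription(port, key) {
  return [
    'm=application ' + port + ' udp application/html-peer-connection-data',
    'k=base64:' + key.toString('base64'),
    'a=label:data'
  ].join('\r\n') + '\r\n';
}
```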

PeerConnection ICE Agents must attempt to establish a connection for their PeerConnection data UDP media stream with the initial offer/answer exchange, and must maintain that UDP media stream for the ICE Agents' whole lifetime.

Each PeerConnection data UDP media stream has a sending sequence number, which must initially be set to one (1), and a most recently received sequence number, which must initially be zero (0).

A data UDP media stream is an active data UDP media stream if the PeerConnection ICE Agent has selected a destination for it. A data UDP media stream can change active status many times during the lifetime of its PeerConnection object (e.g. any time the network topology changes and the ICE Agent performs an ICE Restart). [ICE]

Bytes transmitted on a data UDP media stream are masked so as to prevent cross-protocol attacks (data UDP media streams always appear to contain random noise to other protocols). For the purposes of masking, the data UDP media stream masking salt is defined to be the following 16 bytes, described here as hexadecimal numbers: DB 68 B5 FD 17 0E 15 77 56 AF 7A 3A 1A 57 75 02

Bytes transmitted on a data UDP media stream are also hashed so as to prevent forgery attacks (an attacker cannot change the data without knowing the key negotiated via the signaling channel). For the purposes of this hashing, the data UDP media stream hashing salt is defined to be the following 16 bytes, described here as hexadecimal numbers: 4E 2F 96 AB 0A 39 92 A2 56 94 91 F5 7E 58 2E FA

When the user agent is to transmit a data packet to a peer using a data UDP media stream and with a byte string payload raw message, the user agent must run the following steps:

  1. Let nonce be 16 cryptographically random bytes.

  2. Let ice-key be the 16 bytes given as the encryption key for the data UDP media stream in its media description, as defined above.

  3. Let sending sequence number be the current sending sequence number.

  4. Increment the sending sequence number by one (1).

  5. Let mask-key be the first 16 bytes of the HMAC-SHA1 of the 16 data UDP media stream masking salt bytes keyed with the 16 ice-key bytes. [HMAC] [SHA1]

  6. Let typed raw message be the concatenation of the sequence number as a big-endian 64 bit integer, three 0x00 bytes, a 0x01 byte, and raw message.

  7. Let masked message be the result of encrypting typed raw message using AES-128-CTR keyed with mask-key and using the 16 nonce bytes as the initial counter value. [AES]

  8. Let masked message with nonce be the concatenation of nonce and masked message.

  9. Let hash-key be the first 16 bytes of the HMAC-SHA1 of the 16 data UDP media stream hashing salt bytes keyed with the 16 ice-key bytes. [HMAC] [SHA1]

  10. Let hash be the first 16 bytes of the HMAC-SHA1 of masked message with nonce keyed with the 16 hash-key bytes. [HMAC] [SHA1]

  11. Let hashed masked message with nonce be the concatenation of hash and masked message with nonce.

  12. Send hashed masked message with nonce in a UDP packet to the destination that the relevant PeerConnection ICE Agent has selected for the data UDP media stream.
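
The packet construction in the transmission steps above can be sketched with the Node.js crypto module as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the caller is assumed to track and increment the sending sequence number (steps 3 and 4) and to send the result (step 12), and the optional nonce parameter exists only so that the output can be made reproducible.

```javascript
var crypto = require('crypto');

var MASKING_SALT = Buffer.from('DB68B5FD170E157756AF7A3A1A577502', 'hex');
var HASHING_SALT = Buffer.from('4E2F96AB0A3992A2569491F57E582EFA', 'hex');

// First 16 bytes of the HMAC-SHA1 of data, keyed with key.
function hmac16(key, data) {
  return crypto.createHmac('sha1', key).update(data).digest().subarray(0, 16);
}

// iceKey: 16-byte Buffer; sequenceNumber: integer; rawMessage: Buffer.
function buildDataPacket(iceKey, sequenceNumber, rawMessage, nonce) {
  nonce = nonce || crypto.randomBytes(16);                  // step 1
  var maskKey = hmac16(iceKey, MASKING_SALT);               // step 5
  var header = Buffer.alloc(12);                            // zero-filled
  header.writeBigUInt64BE(BigInt(sequenceNumber), 0);       // 64 bit big-endian
  header.writeUInt8(0x01, 11);                              // 0x00 0x00 0x00 0x01
  var typed = Buffer.concat([header, rawMessage]);          // step 6
  var cipher = crypto.createCipheriv('aes-128-ctr', maskKey, nonce);
  var masked = Buffer.concat([cipher.update(typed), cipher.final()]); // step 7
  var maskedWithNonce = Buffer.concat([nonce, masked]);     // step 8
  var hashKey = hmac16(iceKey, HASHING_SALT);               // step 9
  var hash = hmac16(hashKey, maskedWithNonce);              // step 10
  return Buffer.concat([hash, maskedWithNonce]);            // step 11
}
```

The resulting buffer is 44 bytes longer than rawMessage: a 16 byte hash, a 16 byte nonce, and the 12 byte encrypted header.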

When a packet that is part of a data UDP media stream is received, the user agent must run the following steps:

  1. Let hashed masked message with nonce be the UDP packet's data.

  2. If hashed masked message with nonce is shorter than 32 bytes, then abort these steps.

  3. Let ice-key be the 16 bytes given as the encryption key for the data UDP media stream in the media description for this media stream. [SDP]

  4. Let hash-key be the first 16 bytes of the HMAC-SHA1 of the 16 data UDP media stream hashing salt bytes keyed with the 16 ice-key bytes. [HMAC] [SHA1]

  5. Let hash be the first 16 bytes of the hashed masked message with nonce.

  6. Let masked message with nonce be all but the first 16 bytes of hashed masked message with nonce.

  7. If hash does not equal the first 16 bytes of the HMAC-SHA1 of masked message with nonce keyed with the 16 hash-key bytes, abort these steps. [HMAC] [SHA1]

  8. Let nonce be the first 16 bytes of the masked message with nonce.

  9. Let masked message be all but the first 16 bytes of masked message with nonce.

  10. Let mask-key be the first 16 bytes of the HMAC-SHA1 of the 16 data UDP media stream masking salt bytes keyed with the 16 ice-key bytes. [HMAC] [SHA1]

  11. Let typed raw message be the result of decrypting masked message using AES-128-CTR keyed with mask-key and using the 16 nonce bytes as the initial counter value. [AES]

  12. Let sequence number be the result of interpreting the first eight bytes of typed raw message as a 64 bit big-endian integer.

  13. If sequence number is less than the most recently received sequence number then abort these steps.

  14. Let the most recently received sequence number be sequence number.

  15. If the ninth, tenth, eleventh, and twelfth bytes of typed raw message are not 0x00, 0x00, 0x00, and 0x01 respectively, then abort these steps.

  16. Let raw message be the byte string consisting of all but the first twelve bytes of typed raw message.

  17. Let message be raw message decoded as UTF-8, with error handling.

  18. Create an event that uses the MessageEvent interface, with the name message, which does not bubble, is not cancelable, has no default action, and has a data attribute whose value is message, and queue a task to dispatch the event at the PeerConnection object responsible for this side of the data UDP media stream.

Though described above as being computed for each packet, the ice-key, hash-key, and mask-key values can be precomputed as soon as the PeerConnection ICE Agent is started.
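
The receiving steps above can be sketched with the Node.js crypto module as follows. The names are hypothetical, and the sketch makes a few choices the algorithm does not prescribe: the hash is compared in constant time, a decrypted message too short to hold the 12 byte header is rejected, and the most recently received sequence number is kept as a BigInt on a caller-supplied state object. Event dispatch (step 18) is left to the caller.

```javascript
var crypto = require('crypto');

var MASKING_SALT = Buffer.from('DB68B5FD170E157756AF7A3A1A577502', 'hex');
var HASHING_SALT = Buffer.from('4E2F96AB0A3992A2569491F57E582EFA', 'hex');

// First 16 bytes of the HMAC-SHA1 of data, keyed with key.
function hmac16(key, data) {
  return crypto.createHmac('sha1', key).update(data).digest().subarray(0, 16);
}

// Returns { sequenceNumber, message }, or null where the algorithm aborts.
function parseDataPacket(iceKey, packet, state) {
  if (packet.length < 32) return null;                        // step 2
  var hashKey = hmac16(iceKey, HASHING_SALT);                 // step 4
  var hash = packet.subarray(0, 16);                          // step 5
  var maskedWithNonce = packet.subarray(16);                  // step 6
  var expected = hmac16(hashKey, maskedWithNonce);
  if (!crypto.timingSafeEqual(hash, expected)) return null;   // step 7
  var nonce = maskedWithNonce.subarray(0, 16);                // step 8
  var masked = maskedWithNonce.subarray(16);                  // step 9
  var maskKey = hmac16(iceKey, MASKING_SALT);                 // step 10
  var decipher = crypto.createDecipheriv('aes-128-ctr', maskKey, nonce);
  var typed = Buffer.concat([decipher.update(masked), decipher.final()]); // step 11
  if (typed.length < 12) return null; // assumption: header must be complete
  var sequenceNumber = typed.readBigUInt64BE(0);              // step 12
  if (sequenceNumber < state.lastSequenceNumber) return null; // step 13
  state.lastSequenceNumber = sequenceNumber;                  // step 14
  if (typed.readUInt32BE(8) !== 1) return null;               // step 15
  return {
    sequenceNumber: sequenceNumber,
    message: typed.subarray(12).toString('utf8')              // steps 16 and 17
  };
}
```

A caller would start with a state of { lastSequenceNumber: 0n } and dispatch a message event for every non-null result.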

The format of a packet sent over a data UDP media stream, as generated and parsed by the algorithms above, is as follows. The total overhead per packet is thus 44 bytes, of which four are intended for future extensions.

                /'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''.
+--------------+ +---------------+ +-ENCRYPTED------------------------------------------------------------+ :
| 16 byte hash | | 16 byte nonce | | [ 8 bytes of sequence number ] [ 4 bytes of frame type ] [ data... ] | :
+--------------+ +---------------+ +----------------------------------------------------------------------+ :
                \...........................................................................................'

A remotely-initiated data UDP media stream is the first "sendrecv" media stream in the initial offer whose media is "application", whose transport protocol is "udp", whose media format description is "application/html-peer-connection-data", and whose label attribute ("a=label:") has the value "data".
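
The selection rule above might be approximated by the following scan over an SDP offer. This is a deliberately simplified sketch with a hypothetical function name: it only looks at "m=" and "a=" lines, ignores session-level attributes, and assumes that a media description without an explicit direction attribute is "sendrecv". It does not check the length of the advertised encryption key.

```javascript
var DIRECTIONS = ['sendrecv', 'sendonly', 'recvonly', 'inactive'];

// Hypothetical helper: returns the index (among media descriptions) of the
// remotely-initiated data UDP media stream, or -1 if the offer has none.
function findRemoteDataStream(sdp) {
  var sections = [];
  sdp.split(/\r?\n/).forEach(function (line) {
    if (line.indexOf('m=') === 0) {
      sections.push({ mline: line, attrs: [] });
    } else if (sections.length > 0 && line.indexOf('a=') === 0) {
      sections[sections.length - 1].attrs.push(line.slice(2));
    }
  });
  for (var i = 0; i < sections.length; i++) {
    // "m=<media> <port> <proto> <fmt>"
    var parts = sections[i].mline.slice(2).split(' ');
    var attrs = sections[i].attrs;
    var direction = 'sendrecv'; // assumed default when no attribute is given
    attrs.forEach(function (attr) {
      if (DIRECTIONS.indexOf(attr) !== -1) direction = attr;
    });
    if (parts[0] === 'application' &&
        parts[2] === 'udp' &&
        parts[3] === 'application/html-peer-connection-data' &&
        attrs.indexOf('label:data') !== -1 &&
        direction === 'sendrecv') {
      return i; // the first match wins
    }
  }
  return -1;
}
```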

The task source for this task is the networking task source.

5.1 Security considerations

The data UDP media stream packet format is designed to protect against several obvious attacks. The data is made to appear pseudo-random, so that it cannot be used in a cross-protocol attack, even if somehow the stream were to be directed at an unsuspecting remote host. The data is hashed in such a way that it cannot be modified in transit without detection. The data is encrypted so that it cannot be read in transit.

These security mechanisms rely in part on a key that is negotiated over the signaling channel; as such, the security is only as strong as the security of the signaling channel. Authors are encouraged to use TLS to protect the signaling channel and the page(s) hosting the application, and are encouraged to secure the host used to relay the signaling channel.

To avoid network traffic congestion and other denial of service attacks based on traffic volume, user agents should apply rate-limiting to data UDP media streams.
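
This specification does not mandate a particular rate-limiting technique. One possibility is a token bucket, sketched below; the function name, the parameters, and the injectable clock are all hypothetical and exist only for illustration.

```javascript
// Token bucket sketch. packetsPerSecond is the sustained rate, burst is the
// bucket capacity, and now is an optional clock returning milliseconds.
function createRateLimiter(packetsPerSecond, burst, now) {
  now = now || Date.now;
  var tokens = burst;
  var last = now();
  // Returns true if a packet may be sent now, false if it must be dropped.
  return function tryConsume() {
    var current = now();
    var refill = (current - last) / 1000 * packetsPerSecond;
    tokens = Math.min(burst, tokens + refill);
    last = current;
    if (tokens < 1) return false; // over the limit, as in step 6 of send()
    tokens -= 1;
    return true;
  };
}
```

A user agent following this approach would call the returned function once per outgoing data packet and silently drop the packet when it returns false.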

6. Garbage collection

A Window object has a strong reference to any PeerConnection objects created from the constructor whose global object is that Window object.

7. Event definitions

The addstream and removestream events use the MediaStreamEvent interface:

7.1 MediaStreamEvent

Firing a stream event named e with a MediaStream stream means that an event with the name e, which does not bubble (except where otherwise stated) and is not cancelable (except where otherwise stated), and which uses the MediaStreamEvent interface with the stream attribute set to stream, must be created and dispatched at the given target.

interface MediaStreamEvent : Event {
    readonly attribute MediaStream? stream;
    void initMediaStreamEvent (in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in MediaStream? streamArg);
};

7.1.1 Attributes

stream of type MediaStream, readonly, nullable

The stream attribute represents the MediaStream object associated with the event.

No exceptions.

7.1.2 Methods

initMediaStreamEvent

The initMediaStreamEvent() method must initialize the event in a manner analogous to the similarly-named method in the DOM Events interfaces. [DOM-LEVEL-3-EVENTS]

Parameter Type Nullable Optional Description
typeArg DOMString
canBubbleArg boolean
cancelableArg boolean
streamArg MediaStream
No exceptions.
Return type: void
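
The effect of firing a stream event can be approximated in script as follows. This sketch assumes an environment that provides the Event constructor and EventTarget; MediaStreamEventSketch and fireStreamEvent are hypothetical names, and the stand-in class takes the stream as a constructor argument rather than through initMediaStreamEvent().

```javascript
// Hypothetical stand-in for MediaStreamEvent, built on the Event constructor.
class MediaStreamEventSketch extends Event {
  constructor(type, stream) {
    // Stream events do not bubble and are not cancelable.
    super(type, { bubbles: false, cancelable: false });
    this.stream = stream;
  }
}

// Creates a stream event named name with the given stream and dispatches it.
function fireStreamEvent(target, name, stream) {
  return target.dispatchEvent(new MediaStreamEventSketch(name, stream));
}
```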

8. Event summary

This section is non-normative.

The following event fires on MediaStream objects:

Event name Interface Fired when...
ended Event The MediaStream object will no longer stream any data, either because the user revoked the permissions, or because the source device has been ejected, or because the remote peer stopped sending data, or because the stop() method was invoked.

The following events fire on PeerConnection objects:

Event name Interface Fired when...
connecting Event The ICE Agent has begun negotiating with the peer. This can happen multiple times during the lifetime of the PeerConnection object.
open Event The ICE Agent has finished negotiating with the peer.
message MessageEvent A data UDP media stream message was received.
addstream MediaStreamEvent A new stream has been added to the remoteStreams array.
removestream MediaStreamEvent A stream has been removed from the remoteStreams array.

9. application/html-peer-connection-data

This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.

Type name:
application
Subtype name:
html-peer-connection-data
Required parameters:
No required parameters
Optional parameters:
No optional parameters
Encoding considerations:
This MIME type defines a binary protocol format which uses UTF-8 for text encoding.
Security considerations:

This format is used for encoding UDP packets transmitted by potentially hostile Web page content via a trusted user agent to a destination selected by a potentially hostile remote server. To prevent this mechanism from being abused for cross-protocol attacks, all the data in these packets is masked so as to appear to be random noise. The intent of this masking is to reduce the potential attack scenarios to those already possible previously.

However, this feature still allows random data to be sent to destinations that might not normally have been able to receive it, such as hosts within the victim's intranet. If a service within such an intranet cannot handle receiving UDP packets containing random noise, it might be vulnerable to attack from this feature.

Interoperability considerations:
Rules for processing both conforming and non-conforming content are defined in this specification.
Published specification:
This document is the relevant specification.
Applications that use this media type:
This type is only intended for use with SDP. [SDP]
Additional information:
Magic number(s):
No sequence of bytes can uniquely identify data in this format, as all data in this format is intentionally masked to avoid cross-protocol attacks.
File extension(s):
This format is not for use with files.
Macintosh file type code(s):
This format is not for use with files.
Person & email address to contact for further information:
Daniel C. Burnett <dburnett@voxeo.com>
Intended usage:
Common
Restrictions on usage:
No restrictions apply.
Author:
Daniel C. Burnett <dburnett@voxeo.com>
Change controller:
W3C

Fragment identifiers cannot be used with application/html-peer-connection-data as URLs cannot be used to identify streams that use this format.

A. Acknowledgements

The editors wish to thank the Working Group chairs, Harald Alvestrand and Stefan Håkansson, for their support.

B. References

B.1 Normative references

[AES]
NIST FIPS 197: Advanced Encryption Standard (AES). November 2001. URL: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[DOM-LEVEL-3-EVENTS]
Björn Höhrmann; Tom Pixley; Philippe Le Hégaret. Document Object Model (DOM) Level 3 Events Specification. 31 May 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-DOM-Level-3-Events-20110531/
[FILE-API]
Arun Ranganathan. File API. 17 November 2009. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2009/WD-FileAPI-20091117/
[UTF-8]
F. Yergeau. UTF-8, a transformation format of ISO 10646. IETF RFC 3629. November 2003. URL: http://www.ietf.org/rfc/rfc3629.txt
[WEBIDL]
Cameron McCormack. Web IDL. 19 December 2008. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2008/WD-WebIDL-20081219

B.2 Informative references

No informative references.