JSEP1 BRANCH - WebRTC 1.0: Real-time Communication Between Browsers

1. Conformance
2. Introduction
3. Network Stream API
4. Peer-to-peer connections
- 4.1 PeerConnection
5. IANA Registrations
- 5.1 Constraints
6. Simple Example
7. Advanced Example
8. Peer-to-peer Data API
- 8.1 DataChannel
- 8.2 Examples
9. Garbage collection
10. Event definitions
11. Event summary
12. Change Log
A. Acknowledgements
B. References
- B.1 Normative references
- B.2 Informative references

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.

3. Network Stream API

3.1 Introduction

The MediaStream interface, as defined in the [GETUSERMEDIA] specification, typically represents a stream of audio and/or video data. A MediaStream may be extended to represent a stream that either comes from or is sent to a remote node (and not just the local camera, for instance). The extensions required to enable this capability on the MediaStream object will be described in this document.

A MediaStream as defined in [GETUSERMEDIA] may contain zero or more MediaStreamTrack objects. A MediaStreamTrack sent to another peer will appear as one and only one MediaStreamTrack to the recipient.

Channels are the smallest unit considered in the MediaStream specification. Channels are intended to be encoded together for transmission as, for instance, an RTP payload type. All of the channels that a codec needs to encode jointly must be in the same MediaStreamTrack and the codecs should be able to encode, or discard, all the channels in the track.

The concepts of an input and output to a given MediaStream apply in the case of MediaStream objects transmitted over the network as well. A MediaStream created by a PeerConnection object (later described in this document) will take as input the data received from a remote peer. Similarly, a MediaStream from a local source, for instance a camera via [GETUSERMEDIA], will have an output that represents what is transmitted to a remote peer if the object is used with a PeerConnection object.

The concept of duplicating MediaStream objects as described in [GETUSERMEDIA] is also applicable here. This feature can be used, for instance, in a video-conferencing scenario to display the local video from the user’s camera and microphone in a local monitor, while only transmitting the audio to the remote peer (e.g. in response to the user using a "video mute" feature). Combining tracks from different MediaStream objects into a new MediaStream is useful in certain cases.

3.2 Interface definitions

In this section, we only specify aspects of the following objects that are relevant when used along with a PeerConnection. Please refer to the original definitions of the objects in the [GETUSERMEDIA] document for general information on using MediaStream and MediaStreamTrack both in and outside the context of PeerConnection.

3.2.1 MediaStream

3.2.1.1 label

The label attribute specified in MediaStream returns a label that is unique to this stream, so that streams can be recognized after they are sent through the PeerConnection API.

When a MediaStream is created to represent a stream obtained from a remote peer, the label attribute is initialized from information provided by the remote source.

The label of a MediaStream object is unique to the source of the stream, but that does not mean it is not possible to end up with duplicates. For example, a locally generated stream could be sent from one user to a remote peer using PeerConnection, and then sent back to the original user in the same manner, in which case the original user will have multiple streams with the same label (the locally-generated one and the one received from the remote peer).

3.2.1.2 Events on MediaStream

A new media component may be associated with an existing MediaStream. This happens, e.g., on the A-side when the B-side adds a new MediaStreamTrack object to one of the track lists of a MediaStream that is being sent over a PeerConnection. If this happens for the reason exemplified, or for any other reason than the add() [GETUSERMEDIA] method being invoked locally on a MediaStreamTrackList or tracks are being added as the stream is created (i.e. the stream is initialized with tracks), the user agent must run the following steps:

Create a MediaStreamTrack object track to represent the new media component.
If track’s kind attribute equals "audio", add it to the MediaStream object’s audioTracks MediaStreamTrackList object. [[OPEN ISSUE: Is there a way to generalize this so that if we add a "smell" track this continues to work.]]
If track’s kind attribute equals "video", add it to the MediaStream object’s videoTracks MediaStreamTrackList object.
Fire a track event named addtrack with the newly created track at the MediaStreamTrackList object.
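
The steps above can be sketched in JavaScript roughly as follows. This is a non-normative illustration only: the plain-object stream, the listeners array, and the helper names makeTrackList and addRemoteTrack are assumptions made for this sketch and are not part of the API defined here. Returning null for an unknown kind is also an assumption, since the handling of other kinds is an open issue in the text.

```javascript
// Non-normative sketch. makeTrackList, addRemoteTrack and the plain-object
// shapes are hypothetical stand-ins, not part of the specified API.
function makeTrackList() {
  return { tracks: [], listeners: [] };
}

function addRemoteTrack(stream, kind) {
  // Step 1: create an object to represent the new media component.
  const track = { kind: kind };
  // Steps 2 and 3: pick the track list that matches the track's kind.
  let list = null;
  if (kind === "audio") list = stream.audioTracks;
  if (kind === "video") list = stream.videoTracks;
  if (list === null) return null; // assumption: other kinds are not handled
  list.tracks.push(track);
  // Step 4: fire "addtrack" at the track list, carrying the new track.
  for (const listener of list.listeners) {
    listener({ type: "addtrack", track: track });
  }
  return track;
}

const sketchStream = { audioTracks: makeTrackList(), videoTracks: makeTrackList() };
const seenTrackEvents = [];
sketchStream.videoTracks.listeners.push((event) => {
  seenTrackEvents.push(event.type + ":" + event.track.kind);
});
addRemoteTrack(sketchStream, "video");
```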

An existing media component may also be disassociated from a MediaStream. If this happens for any other reason than the remove() [GETUSERMEDIA] method being invoked locally on a MediaStreamTrackList or the stream is being destroyed, the user agent must run the following steps:

Let track be the MediaStreamTrack object representing the media component about to be removed.
Remove track from the MediaStreamTrackList object.
Fire a track event named removetrack with track at the MediaStreamTrackList object.

The event source for the onended event in the networked case is the PeerConnection object.

3.2.2 MediaStreamTrack

A MediaStreamTrack object’s reference to its MediaStream in the non-local media source case (an RTP source, as is the case for a MediaStream received over a PeerConnection) is always strong.

When a track belongs to a MediaStream that comes from a remote peer and the remote peer has permanently stopped sending data, the ended event must be fired on the track, as specified in [GETUSERMEDIA]. [[ OPEN ISSUE: How do you know when it has stopped? This seems like an SDP question, not a media-level question.]]

A track in a MediaStream, received with a PeerConnection, must have its readyState attribute [GETUSERMEDIA] set to MUTED (1) until media data arrives.

In addition, a MediaStreamTrack has its readyState set to MUTED on the B-side if the A-side disables the corresponding MediaStreamTrack in the MediaStream that is being sent. When the addstream event triggers on a PeerConnection, all MediaStreamTrack objects in the resulting MediaStream are muted until media data can be read from the RTP source. [[ OPEN ISSUE: How do you know when it has been disabled? This seems like an SDP question, not a media-level question.]]

3.3 AudioMediaStreamTrack

The AudioMediaStreamTrack is a specialization of a normal MediaStreamTrack that only carries audio and is extended to have the capability to send and/or receive DTMF codes.

interface AudioMediaStreamTrack : MediaStreamTrack {
    readonly attribute boolean canInsertDTMF;
    void insertDTMF (DOMString tones, optional long duration);
};

3.3.1 Attributes

canInsertDTMF of type boolean, readonly: The canInsertDTMF attribute must indicate if the AudioMediaStreamTrack is capable of sending DTMF.

3.3.2 Methods

insertDTMF

When an AudioMediaStreamTrack object’s insertDTMF() method is invoked, the user agent must queue a task that sends the DTMF tones.

The tones parameter is treated as a series of characters. The characters 0 to 9, A to D, #, and * generate the associated DTMF tones. The characters a to d are equivalent to A to D. The character , indicates a delay of 2 seconds before processing the next character in the tones parameter. Unrecognized characters are ignored.

The duration parameter indicates the duration in ms to play each DTMF tone passed in the tones parameter. The duration cannot be more than 6000 or less than 70. The default duration is 100 ms for each tone. The gap between tones must be at least 50 ms but should be as short as possible. [[OPEN ISSUE: How are invalid values handled?]]

If insertDTMF is called on the same object while an existing task for this object to generate DTMF is still running, the previous task is canceled. Calling insertDTMF with an empty tones parameter can be used to cancel any tones currently being sent.
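
The character and duration rules above can be illustrated with a small, non-normative JavaScript sketch that turns a tones string into a list of tone and pause entries. The helper name planDtmf and the shape of its return value are assumptions made for this illustration. Throwing a RangeError for an out-of-range duration is also an assumption, because the handling of invalid values is an open issue.

```javascript
// Non-normative sketch. planDtmf and its return shape are hypothetical.
const MIN_DURATION_MS = 70;
const MAX_DURATION_MS = 6000;
const DEFAULT_DURATION_MS = 100;
const COMMA_DELAY_MS = 2000;

function planDtmf(tones, duration = DEFAULT_DURATION_MS) {
  // Assumption: reject out-of-range durations; the text leaves this open.
  if (duration < MIN_DURATION_MS || duration > MAX_DURATION_MS) {
    throw new RangeError("duration must be between 70 and 6000 ms");
  }
  const plan = [];
  for (const ch of tones) {
    if (ch === ",") {
      // A comma is a 2 second delay before the next character.
      plan.push({ pause: COMMA_DELAY_MS });
    } else if (/^[0-9A-Da-d#*]$/.test(ch)) {
      // a to d are equivalent to A to D.
      plan.push({ tone: ch.toUpperCase(), duration: duration });
    }
    // Any other character is ignored.
  }
  return plan;
}

const dtmfPlan = planDtmf("1a,x#");
```

In this sketch an empty tones string yields an empty plan, which matches the idea of using an empty string to cancel pending tones.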

Editor Note: We need to add a callback that is set on the object that is called after the tones are sent. This is needed to allow the application to know when it can send new tones without canceling the tones that are currently being sent.

Editor Note: It seems we would want a callback or event for incoming tones. The proposal sent to the list had them played as audio to the speaker but I don’t see how that is useful.

Parameter	Type	Nullable	Optional	Description
tones	`DOMString`	✘	✘
duration	`long`	✘	✔

Return type: void

4. Peer-to-peer connections

A PeerConnection allows two users to communicate directly, browser to browser. Communications are coordinated via a signaling channel which is provided by unspecified means, but generally by a script in the page via the server, e.g. using XMLHttpRequest.

Calling new PeerConnection(configuration) creates a PeerConnection object.

The configuration has the information to find and access the [STUN] and [TURN] servers. There may be multiple servers of each type and any TURN server also acts as a STUN server.

A PeerConnection object has an associated ICE Agent, PeerConnection state, and ICE State. These are initialized when the object is created.

When the PeerConnection() constructor is invoked, the user agent must run the following steps. This algorithm has a synchronous section (which is triggered as part of the event loop algorithm).

Create an ICE Agent and let connection’s PeerConnection ICE Agent be that ICE Agent and provide it the STUN and TURN servers from the configuration array. The ICE Agent [ICE] will proceed with gathering as soon as the IceTransports constraint is not set to "none". At this point the ICE Agent does not know how many ICE components it needs (and hence the number of candidates to gather), but it can make a reasonable assumption and adjust the number of components as the PeerConnection object gets more information.
Set connection’s PeerConnection readiness state to "new".
Set connection’s PeerConnection ice state to "new".
Let connection’s localStreams attribute be an empty read-only MediaStream array.
Let connection’s remoteStreams attribute be an empty read-only MediaStream array.
Return connection, but continue these steps asynchronously.
Await a stable state. The synchronous section consists of the remaining steps of this algorithm.

During the lifetime of the PeerConnection object, the following procedures are followed:

If the ice state is "new" and the IceTransports constraint is not set to "none", the user agent must queue a task to start gathering ICE addresses and set the ice state to "gathering".
If the ICE Agent has found one or more candidate pairs for any MediaTrack that forms a valid connection, the ICE state is changed to "connected".
When the ICE Agent finishes checking all candidate pairs, if at least one connection has been found for some MediaTrack, the iceState is changed to "completed" and if no connection has been found for any MediaTrack, the iceState is changed to "failed". [[OPEN ISSUE: Note that this means that if I was able to negotiate audio but not video via ICE, then iceState == "completed". Is this really what is desired?]]
If the iceState is "connected" or "completed" and both the local and remote session descriptions are set, the peerState is set to "active".
If the iceState is "failed", a task is queued to call the close method. Open Issue: CJ - this seems wrong to me.
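
The state changes listed above can be sketched, non-normatively, as a pair of small JavaScript functions. The event names ("start", "pair-found", "checks-done"), the function names, and the placeholder constraint value "all" are assumptions made for this illustration; only the state strings and the "none" constraint value come from the text. The sketch does not model the queued close on failure.

```javascript
// Non-normative sketch. Event names and function names are hypothetical.
function nextIceState(iceState, event, iceTransports) {
  if (iceState === "closed") return "closed";
  if (event === "start") {
    // Gathering starts only from "new" and only if IceTransports is not "none".
    return iceState === "new" && iceTransports !== "none" ? "gathering" : iceState;
  }
  if (event === "pair-found") return "connected";
  if (event === "checks-done") {
    // "connected" means at least one valid candidate pair was found earlier.
    const found = iceState === "connected" || iceState === "completed";
    return found ? "completed" : "failed";
  }
  return iceState;
}

function isPeerActive(iceState, hasLocalDescription, hasRemoteDescription) {
  const iceReady = iceState === "connected" || iceState === "completed";
  return iceReady && hasLocalDescription && hasRemoteDescription;
}

const assumedTransports = "all"; // placeholder; the text only names "none"
let iceSketchState = "new";
iceSketchState = nextIceState(iceSketchState, "start", assumedTransports);
const afterStart = iceSketchState;
iceSketchState = nextIceState(iceSketchState, "pair-found", assumedTransports);
const afterPair = iceSketchState;
iceSketchState = nextIceState(iceSketchState, "checks-done", assumedTransports);
```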

User agents negotiate the codec resolution, bitrate, and other media parameters. User agents are encouraged to initially negotiate for the maximum resolution of a video stream. For streams that are then rendered (using a video element), user agents are encouraged to renegotiate for a resolution that matches the rendered display size.

Starting with the native resolution means that if the Web application notifies its peer of the native resolution as it starts sending data, and the peer prepares its video element accordingly, there will be no need for a renegotiation once the stream is flowing.

The word "components" in this context refers to an RTP media flow and does not have anything to do with how [ICE] uses the term "component".

When a user agent has reached the point where a MediaStream can be created to represent incoming components, the user agent must run the following steps:

Let connection be the PeerConnection expecting this media.
Create a MediaStream object to represent the media stream. [[OPEN ISSUE: What if one already exists?]]
Run the following steps for each component in the media stream.
1. Create a MediaStreamTrack object track to represent the component. [[EDITORIAL: Can we just reference 3.2.1.2 here?]]
2. If track's kind attribute equals "audio", add it to the MediaStream object's audioTracks MediaStreamTrackList object.
3. If track's kind attribute equals "video", add it to the MediaStream object's videoTracks MediaStreamTrackList object.
The creation of new incoming MediaStreams may be triggered either by SDP negotiation or by the receipt of media on a given flow.

The internal order in the MediaStreamTrackList objects on the receiving side should reflect the order on the sending side. One way to enforce this is to specify the order in the SDP.
Queue a task to run the following substeps:
1. If the connection’s PeerConnection readiness state is CLOSED (3), abort these steps.
2. Add the newly created MediaStream object to the end of connection’s remoteStreams array.
3. Fire a stream event named addstream with the newly created MediaStream object at the connection object.

When a user agent has negotiated media for a component that belongs to a media stream that is already represented by an existing MediaStream object, the user agent must associate the component with that MediaStream object.

When a PeerConnection finds that a stream from the remote peer has been removed, the user agent must follow these steps:

Let connection be the PeerConnection associated with the stream being removed.
Let stream be the MediaStream object that represents the media stream being removed, if any. If there isn't one, then abort these steps.
By definition, stream is now finished.

A task is thus queued to update stream and fire an event.
Queue a task to run the following substeps:
1. If the connection’s PeerConnection readiness state is CLOSED (3), abort these steps.
2. Remove stream from connection’s remoteStreams array.
3. Fire a stream event named removestream with stream at the connection object.
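
The queued substeps for adding and removing a remote stream can be sketched in JavaScript as follows. This is a non-normative illustration: FakeConnection, the explicit pendingTasks array (standing in for "queue a task"), and the helper names are assumptions made for this sketch. The closed state is written as the string "closed" here, following the PeerState enum, where the steps above write CLOSED (3).

```javascript
// Non-normative sketch. FakeConnection and the task helpers are hypothetical.
const pendingTasks = [];
function queueTask(task) { pendingTasks.push(task); }
function runQueuedTasks() {
  while (pendingTasks.length > 0) pendingTasks.shift()();
}

class FakeConnection {
  constructor() {
    this.readyState = "new";
    this.remoteStreams = [];
    this.fired = [];
  }
  fire(type, stream) { this.fired.push(type + ":" + stream.label); }
}

function queueAddStream(connection, stream) {
  queueTask(() => {
    if (connection.readyState === "closed") return; // abort these steps
    connection.remoteStreams.push(stream);
    connection.fire("addstream", stream);
  });
}

function queueRemoveStream(connection, stream) {
  queueTask(() => {
    if (connection.readyState === "closed") return; // abort these steps
    const index = connection.remoteStreams.indexOf(stream);
    if (index !== -1) connection.remoteStreams.splice(index, 1);
    connection.fire("removestream", stream);
  });
}

const sketchConnection = new FakeConnection();
const remoteStream = { label: "s1" };
queueAddStream(sketchConnection, remoteStream);
runQueuedTasks();
```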

The task source for the tasks listed in this section is the networking task source.

If something in the browser changes that causes the PeerConnection object to need to initiate a new session description negotiation, a renegotiationneeded event is fired at the PeerConnection object.

In particular, if a PeerConnection object is consuming a MediaStream and a track is added to one of the stream's MediaStreamTrackList objects, by, e.g., the add() method being invoked, the PeerConnection object must fire the "renegotiationneeded" event. Removal of media components must also trigger "renegotiationneeded".

To prevent network sniffing from allowing a fourth party to establish a connection to a peer using the information sent out-of-band to the other peer and thus spoofing the client, the configuration information should always be transmitted using an encrypted connection.

4.1 PeerConnection

The general operation of the PeerConnection is described in [RTCWEB-JSEP].

4.1.1 SdpType

The SdpType enums serve as arguments to setLocalDescription and setRemoteDescription. They provide information as to how the SDP should be handled.

 enum SdpType { "offer", "pranswer", "answer" }

"offer": An SdpType of "offer" indicates that a description should be treated as an [SDP] offer.
"pranswer": An SdpType of "pranswer" indicates that a description should be treated as an [SDP] answer, but not a final answer. A description used as an SDP "pranswer" may be applied as a response to an SDP offer, or an update to a previously sent SDP "pranswer".
"answer": An SdpType of "answer" indicates that a description should be treated as an [SDP] final answer, and the offer-answer exchange should be considered complete. A description used as an SDP answer may be applied as a response to an SDP offer, or an update to a previously sent SDP "pranswer".

4.1.2 SessionDescription Class

The SessionDescription() constructor takes one argument, description, whose content is used to construct the new SessionDescription object. This class is a future extensible carrier for the data contained in it and does not perform any substantive processing.

[Constructor (DOMString description)]
interface SessionDescription {
    attribute SdpType   type;
    attribute DOMString sdp;
    stringifier DOMString ();
};

4.1.2.1 Attributes

sdp of type DOMString: The string representation of the SDP [SDP]
type of type SdpType: What type of SDP this SessionDescription represents.

4.1.2.2 Methods

DOMString: Objects that implement the SessionDescription interface must stringify as [SDP].

No parameters.
Return type: stringifier

4.1.3 SessionDescriptionCallback

 callback SessionDescriptionCallback = void (SessionDescription sdp)

SessionDescription sdp: The object containing the SDP [SDP].

4.1.4 PeerConnectionErrorCallback

 callback PeerConnectionErrorCallback = void (DOMString errorInformation)

DOMString errorInformation: Information about what went wrong. Open Issue: How does this work? Is it human readable? I18N? ENUM?

TODO: Open Issue: should this be defined as event like NavigatorUserMediaErrorCallback in getusermedia

4.1.5 PeerState Enum

enum PeerState { "new", "opening", "active", "closing", "closed" }

"new": The object was just created, and no networking has yet occurred.
"opening": The user agent is attempting to establish a connection with the ICE Agent and is waiting for local and remote SDP to be set. (Open Issue: do we need more states between "opening" and "active")
"active": The ICE Agent has found a connection and both the local and remote SDP have been set. It is possible for media to flow.
"closing": The PeerConnection object is terminating all media and is in the process of closing the connection.
"closed": The connection is closed.

4.1.6 IceState Enum

 enum IceState { "new", "gathering", "waiting", "checking", "connected", "completed", "failed", "closed" }

"new": The PeerConnection object was just created, and no networking has yet occurred.
"gathering": The ICE Agent is attempting to gather addresses.
"waiting": The ICE Agent is not gathering any addresses and is waiting for candidates from the other side before it can start checking.
"checking": The ICE Agent is checking candidate pairs but has not yet found a connection. In addition to checking, it may also still be gathering.
"connected": The ICE Agent has found a connection but is still checking other candidate pairs to see if there is a better connection. It may also still be gathering.
"completed": The ICE Agent has finished gathering and checking and found a connection.
"failed": The ICE Agent is finished checking all candidate pairs and failed to find a connection.
"closed": The ICE Agent has shut down and is no longer responding to STUN requests.

4.1.7 IceCandidate Type

The IceCandidate() constructor takes one argument, candidate, whose content is used to construct the new IceCandidate object. This class is a future extensible carrier for the data contained in it and does not perform any substantive processing.

[Constructor (DOMString candidate)]
interface IceCandidate {
    attribute DOMString candidate;
    stringifier DOMString ();
};

4.1.7.1 Attributes

candidate of type DOMString: This carries the candidate-attribute as defined in section 15.1 of [ICE]. ( TODO - need to add more information to allow this to match to correct m line - Open Issue: How to correlate. Need to wait for the mapping from media tracks to SDP to be resolved in IETF before tackling this problem).

4.1.7.2 Methods

DOMString: Objects that implement the IceCandidate interface must stringify as the candidate-attribute as defined in section 15.1 of [ICE].

No parameters.
Return type: stringifier

4.1.8 IceCandidateCallback

 callback IceCandidateCallback = void (IceCandidate candidate)

IceCandidate candidate: The new ICE candidate.

4.1.9 IceServers Type - Option 1

Open Issue: choose option 1 or option 2 for IceServers Type.

interface IceServers {
    attribute DOMString servers[][];
};

4.1.9.1 Attributes

servers[][] of type DOMString: The IceServers type is an array of pairs where each pair is defined as an array. Each pair provides the information to reach and use one STUN or TURN server. The first element in each pair is a STUN or TURN URI as defined in [STUN-URI] and [TURN-URI]. If the first element of the pair is a TURN URI, then the second element of the pair is the credential to use with that TURN server.

In network topologies with multiple layers of NATs, it is desirable to have STUN servers between every layer of NATs in addition to the TURN servers to minimize the peer-to-peer network latency.

An example configuration object is:

{ servers:[ ["stun:stun.example.net"] , ["turn:user@turn.example.org","myPassword"] ]}
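
As a non-normative illustration of how an application or implementation might read this Option 1 shape, the following JavaScript sketch walks the array of pairs. The helper name describeServerPairs and the returned object shape are assumptions made for this sketch. Classifying an entry by its "turn:" or "turns:" URI scheme is also an assumption about how the URI is inspected.

```javascript
// Non-normative sketch. describeServerPairs is a hypothetical helper.
function describeServerPairs(configuration) {
  return configuration.servers.map((pair) => {
    const uri = pair[0];
    // Assumption: the URI scheme tells STUN entries apart from TURN entries.
    const isTurn = /^turns?:/i.test(uri);
    return {
      uri: uri,
      kind: isTurn ? "turn" : "stun",
      // Only TURN entries carry a credential as the second element.
      credential: isTurn ? pair[1] : undefined,
    };
  });
}

const describedPairs = describeServerPairs({
  servers: [["stun:stun.example.net"], ["turn:user@turn.example.org", "myPassword"]],
});
```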

4.1.10 IceServers Type - Option 2

Open Issue: choose option 1 or option 2 for IceServers Type.

interface IceServers {
    attribute DOMString servers[];
};

4.1.10.1 Attributes

servers[] of type DOMString: The IceServers type is an array of strings where each string provides the URL and credentials for a server. Each string is either the URL to reach a STUN server as defined in [STUN-URI] or the URL of a TURN server as defined in [TURN-URI] followed by a single space, with the rest of the string being the credential used to access that server. Note the credential may contain spaces.

In network topologies with multiple layers of NATs, it is desirable to have STUN servers between every layer of NATs in addition to the TURN servers to minimize the peer-to-peer network latency.

An example configuration object is:

{ servers:[ "stun:stun.example.net" , "turn:user@turn.example.org myPassword" ]}
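
Because the credential in this Option 2 shape may itself contain spaces, only the first space separates the URL from the credential. A non-normative JavaScript sketch of that split follows; the helper name parseServerString and the returned object shape are assumptions made for this illustration.

```javascript
// Non-normative sketch. parseServerString is a hypothetical helper.
function parseServerString(entry) {
  // Split on the first space only; the credential may contain more spaces.
  const firstSpace = entry.indexOf(" ");
  if (firstSpace === -1) {
    return { uri: entry, credential: undefined };
  }
  return {
    uri: entry.slice(0, firstSpace),
    credential: entry.slice(firstSpace + 1),
  };
}

const parsedStun = parseServerString("stun:stun.example.net");
const parsedTurn = parseServerString("turn:user@turn.example.org my pass word");
```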

4.1.11 PeerConnection Interface

Open Issue: should we collapse some of these functions into a single "processRemoteSignal" method?

[Constructor (IceServers configuration, optional MediaConstraints constraints)]
interface PeerConnection {
    void        createOffer (SessionDescriptionCallback successCallback, optional PeerConnectionErrorCallback failureCallback, optional MediaConstraints constraints);
    void        createAnswer (SessionDescription offer, SessionDescriptionCallback successCallback, optional PeerConnectionErrorCallback failureCallback, optional MediaConstraints constraints, optional boolean createProvisionalAnswer=false);
    void        setLocalDescription (SdpType action, SessionDescription description);
    readonly attribute SessionDescription localDescription;
    void        setRemoteDescription (SdpType action, SessionDescription description);
    readonly attribute SessionDescription remoteDescription;
    readonly attribute PeerState          readyState;
    void        updateIce (optional IceServers configuration, optional MediaConstraints constraints, optional boolean restart=false);
    void        addIceCandidate (IceCandidate candidate);
    readonly attribute IceState           iceState;
    readonly attribute MediaStream[]      localStreams;
    readonly attribute MediaStream[]      remoteStreams;
    DataChannel createDataChannel ([TreatNullAs=EmptyString] DOMString? label, optional DataChannelInit? dataChannelDict);
             attribute Function?          ondatachannel;
    void        addStream (MediaStream stream, optional MediaConstraints constraints);
    void        removeStream (MediaStream stream);
    void        close ();
             attribute Function?          onrenegotiationneeded;
             attribute Function?          onicecandidate;
             attribute Function?          onconnecting;
             attribute Function?          onopen;
             attribute Function?          onstatechange;
             attribute Function?          onaddstream;
             attribute Function?          onremovestream;
             attribute Function?          onicechange;
};

4.1.11.1 Attributes

iceState of type IceState, readonly

The iceState attribute must return the ICE state of the PeerConnection ICE Agent.

localDescription of type SessionDescription, readonly

The localDescription attribute returns a copy of the SessionDescription that was most recently passed to setLocalDescription, plus any local candidates that have been generated by the ICE Agent since then.

A null object will be returned if the local description has not yet been set.

localStreams of type array of MediaStream, readonly

Returns a live array containing the local streams (those that were added with addStream()).

onaddstream of type Function, nullable

This event handler, of event handler event type addstream