WebRTC for Web Demystified

Mohyaddin Alaoddin
8 min read · Jul 24, 2020


WebRTC (Web Real-Time Communication) is a standard that enables developers to create live audio, video, and text chat applications, whether on the web or on mobile devices. In this article I'll explain its implementation on the web, so that you can start creating your own live chat web application or site.

The standard is supported by a wide variety of browsers, but they differ in some details. The implementation explained in this article should get you up and running on Google Chrome as well as Mozilla Firefox; I'm still working out its usage on other web browsers such as Apple Safari and Microsoft Edge.

The Concept

The idea of this standard is simple: it creates peer-to-peer connections between the parties who want to have a call, by introducing their device capabilities (audio and video encoding and decoding) and networking details to one another. In other words, when you call your friend Pete, you exchange your devices' media and networking details with him, in order for you both to be connected in a call.

The Main Protocols

The standard essentially works on top of two major protocols. First is SDP (Session Description Protocol), which describes the device's media capabilities and constraints; second is ICE (Interactive Connectivity Establishment), which defines the network information that other devices can use to communicate with your device.
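To make the SDP side of this concrete, here is a heavily trimmed-down sketch of what a session description's body looks like, plus a tiny helper that lists the media kinds it declares. The sample text and the `mediaKinds` helper are my own illustration; real descriptions produced by `createOffer()` are much longer.

```javascript
// A trimmed-down SDP body, just to show its shape -- real descriptions
// generated by the browser carry many more lines per media section.
var sampleSdp =
    "v=0\r\n" +
    "o=- 4611731400430051336 2 IN IP4 127.0.0.1\r\n" +
    "s=-\r\n" +
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n" +
    "a=rtpmap:111 opus/48000/2\r\n" +
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n" +
    "a=rtpmap:96 VP8/90000\r\n";

// List the media kinds ("audio", "video") declared by the m= lines.
function mediaKinds(sdp){
    return sdp.split("\r\n")
        .filter(function(line){ return line.indexOf("m=") === 0; })
        .map(function(line){ return line.slice(2).split(" ")[0]; });
}

console.log(mediaKinds(sampleSdp)); // [ 'audio', 'video' ]
```

Each `m=` section advertises one media track's codecs and transport, and it is exactly this text that the two sides exchange during signaling.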

The Device Media

Before you initiate a call you must initialize your own device's media capturing, i.e. you need access to your microphone to stream your audio, and to your webcam or screen feed to stream your video. Of course, you can decide to have an audio-only or a video call. You can get the stream of your media devices in the JavaScript of your app like this:

var localStream;
navigator.mediaDevices.getUserMedia({video: true, audio: true}).then(function(stream){
    localStream = stream;
    handlePendingData(); // Defined later on.
}).catch(function(error){
    console.log(error);
});

Note the object given to the getUserMedia method above: it decides whether to capture audio, video, or both. To view the captured stream you'll need a video tag in your HTML view, with the muted attribute added so that you don't hear your own audio back. Then in the script you'll write $('#your-video-id')[0].srcObject = localStream;. Of course, it's up to you how you name and select your video element, and yes, the $ sign is jQuery.

Caveats to know before we move on

The first thing to bear in mind is that most WebRTC-related methods are async functions that return a Promise, i.e. the next step after calling any of them must be carried out in a callback passed to the then method of the returned Promise, or by using the await expression where applicable.
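The two styles look like this, with a hypothetical `fakeGetUserMedia` (my own stand-in, not a real API) playing the role of any promise-returning WebRTC call:

```javascript
// Stand-in for any promise-returning WebRTC call such as getUserMedia,
// createOffer, or setLocalDescription.
function fakeGetUserMedia(){
    return Promise.resolve({kind: 'stream'});
}

// Style 1: then-chaining -- the next step lives in the callback.
fakeGetUserMedia().then(function(stream){
    console.log('then:', stream.kind);
});

// Style 2: await -- only valid inside an async function.
async function start(){
    var stream = await fakeGetUserMedia();
    console.log('await:', stream.kind);
}
start();
```

Both do the same thing; the article's code sticks to then-chaining since it works everywhere without async function wrappers.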

The second thing to realize is that your own device is not defined as a peer, i.e. you don't create an RTCPeerConnection object for your own device; rather, you create one for each other device that is contacting you.

The RTCPeerConnection object is what defines the connection between your device and another device. So in a call between two parties, each party will have one RTCPeerConnection that ties it to the other device; in a group call of four parties, each party will have three RTCPeerConnection objects. Consider that object as the line that connects two dots together, where each dot is a peer, in other words a device.

STUN, TURN, and Signaling servers

Peers (devices) need to know each other's networking and media information in order to have a proper connection where the call can take place. To achieve this, the WebRTC standard requires helper servers to introduce those peers to one another.

First is the STUN server (Session Traversal Utilities for NAT), which identifies your network details in a way that enables other peers to communicate with your device over the internet.

Second is the TURN server (Traversal Using Relays around NAT), which can be used to relay the communication between your device and other peers. Your communication with other devices is then no longer direct peer-to-peer: the call stream goes through this server first, and it in turn pushes the stream to the other end of the call.

Those two types of servers help collect what is called an RTCIceCandidate, which defines a possible network path for remote peers (other devices) to use in their communication with your device. As you've noticed, the data gathered describes your own device; exchanging that data with other devices, including the SDPs, is what the signaling server does.
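Since the signaling channel just carries JSON, it helps to pin down the message shapes before writing the handlers. These builders match the field names (action, id, offer, candidate) used later in this article; the schema itself is ours to choose, not something WebRTC mandates:

```javascript
// Builders for the JSON messages our signaling channel carries.
// WebRTC does not dictate this format -- any serialization that gets
// the SDPs and candidates across will do.
function offerMessage(peerId, offer){
    return JSON.stringify({action: 'offer', id: peerId, offer: offer});
}
function candidateMessage(peerId, candidate){
    return JSON.stringify({action: 'candidate', id: peerId, candidate: candidate});
}

var msg = JSON.parse(offerMessage(42, {type: 'offer', sdp: 'v=0\r\n'}));
console.log(msg.action, msg.id); // offer 42
```

The id field names the target peer; the server is expected to route the message there and to tell the receiver who sent it.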

You can use some of the publicly available STUN and TURN servers; lists of them are easy to find online.

Let’s Begin

Now that we know the basics and have initialized our media devices in the localStream variable, we can start defining the main requirements in our application's script. Following the code block written above, we add:

// ICE candidate gathering servers, you can use up to five at once.
var iceServers = [
    {urls: "stun:stun.l.google.com:19302"},
    {urls: "stun:stun1.l.google.com:19302"}
];
// Initializing the signaling channel, which is basically a websocket.
var signalingChannel = new WebSocket("wss://your_websocket_server");
var peers = {}, pendingPeers = [], pendingMessages = [], pendingSdps = {}, pendingCandidates = {}, ready = false, currentId = Math.floor(Date.now() / 1000);

STUN and TURN servers are provided as cloud services; some are free, and some are paid and require credentials. The signaling server, however, is ours to implement, and it's up to you how: with socket.io, plain Node.js, or PHP with Ratchet, for example.
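Whatever transport you pick, the server's job boils down to one routing rule, which can be sketched independently of any websocket library. The names here (`route`, `clients`) are my own; in a real server the callbacks in `clients` would wrap live websocket connections:

```javascript
// Transport-agnostic routing rule for the signaling server: an 'init'
// message is broadcast to everyone else, while anything carrying a
// target id is relayed to that peer only. senderId is attached so the
// receiver knows who is talking.
function route(message, senderId, clients){
    message.senderId = senderId;
    var payload = JSON.stringify(message);
    if(message.action === 'init'){
        for(var id in clients)
            if(id !== senderId)
                clients[id](payload);
    } else if(clients[message.id])
        clients[message.id](payload);
}

// Quick demo with in-memory "connections":
var inbox = {a: [], b: []};
var clients = {
    a: function(p){ inbox.a.push(JSON.parse(p)); },
    b: function(p){ inbox.b.push(JSON.parse(p)); }
};
route({action: 'init', id: 'b'}, 'b', clients);          // broadcast: only a receives it
route({action: 'offer', id: 'b', offer: {}}, 'a', clients); // relayed to b only
console.log(inbox.a.length, inbox.b.length); // 1 1
```

Notice the server never inspects the SDPs or candidates themselves; it is a dumb relay, which is exactly why the signaling layer is left out of the WebRTC standard.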

Let’s define some handlers

The next functions will handle different events that occur within the application. Initially the application is set to not ready, which means we haven't been granted access to the user's media devices yet; once access is given, localStream will be set and the following function will be called:

function handlePendingData(){
    ready = true;
    for(var i in pendingPeers)
        initPeer(pendingPeers[i]);
    pendingPeers = [];
    setTimeout(function(){
        for(var i in pendingMessages)
            handleSignalingMessage(pendingMessages[i]);
        pendingMessages = [];
    }, 1000);
}
function initPeer(peerId){
    peers[peerId] = new RTCPeerConnection({
        iceServers: iceServers
    });
    initRemoteStream(peerId);
    addIceListeners(peerId);
    // Creating an offer SDP for the joining peer.
    if(peerId > currentId){
        var peer = peers[peerId];
        peer.createOffer().then(function(offer){
            return peer.setLocalDescription(offer);
        }).then(function(){
            pendingSdps[peerId] = {
                action: 'offer',
                id: peerId,
                offer: peer.localDescription
            };
        });
    }
}

The function basically initializes any existing peers that were already in the room when we came in: it attaches a new video tag to our HTML view for each of them and adds their corresponding stream listeners in the initRemoteStream function, plus it sets ICE candidate event listeners in the addIceListeners function. Let's see those two functions in detail:
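One detail in initPeer worth calling out is the peerId > currentId check. Both sides of a pair must not create offers at the same time (a situation known as "glare"), so the id comparison acts as a deterministic tie-break. Extracted into a standalone sketch:

```javascript
// Deterministic tie-break: for any pair of peers, exactly one side
// creates the offer, so the two never offer to each other
// simultaneously ("glare"). Mirrors the peerId > currentId check:
// the peer with the smaller id offers to the one with the larger id.
function shouldCreateOffer(currentId, peerId){
    return peerId > currentId;
}

console.log(shouldCreateOffer(1000, 2000)); // true  -- we call them
console.log(shouldCreateOffer(2000, 1000)); // false -- they call us
```

Since currentId is derived from the join timestamp, existing members end up offering to each newcomer, which is exactly the behavior the room needs.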

function initRemoteStream(peerId){
    var peer = peers[peerId];
    $('body').append('<video id="' + peerId + '" autoplay></video>');
    if(!sendLocalStream(peerId))
        var localStreamSent = setInterval(function(){
            if(sendLocalStream(peerId))
                clearInterval(localStreamSent);
        }, 100);
    var stream = new MediaStream();
    peer.addEventListener('track', function(e){
        var peerVideo = $('#' + peerId)[0];
        stream.addTrack(e.track);
        if(!peerVideo.srcObject)
            peerVideo.srcObject = stream;
        console.log('Remote Track Added', e.track);
    });
}
function addIceListeners(peerId){
    var peer = peers[peerId];
    peer.addEventListener('icecandidate', function(e){
        if(e.candidate)
            signalingChannel.send(JSON.stringify({
                action: 'candidate',
                id: peerId,
                candidate: e.candidate
            }));
    });
    peer.addEventListener('icecandidateerror', function(e){
        console.error('ICE error:', e);
    });
    peer.addEventListener('icegatheringstatechange', function(e){
        var connection = e.target;
        if(connection.iceGatheringState == 'complete' && pendingSdps[peerId] != null){
            signalingChannel.send(JSON.stringify(pendingSdps[peerId]));
            pendingSdps[peerId] = null;
        }
    });
}

Now, if you take a closer look at the last event listener added, icegatheringstatechange, you'll see that the signalingChannel is used to send a pending SDP (your device's media capabilities and constraints) to the other peer. This means we're not sending the SDP as soon as it's created, but rather once all ICE candidates have been gathered.

As for the sendLocalStream call above, it's not a one-shot call, because our device's stream might not have been initialized yet; that's why it was put in a repeating interval, which stops once the stream is successfully initialized. Let's see what that function does:

function sendLocalStream(peerId){
    var peer = peers[peerId];
    if(localStream != null){
        localStream.getTracks().forEach(function(track){
            peer.addTrack(track);
            console.log('Local Track Added', track);
        });
        return true;
    }
    return false;
}

In short, the previous function sends our media stream (audio and video, depending on our initialization) to the other peer over its RTCPeerConnection object; i.e. peer is an RTCPeerConnection object here and in all other mentioned functions.

Wrapping things up!

I know it's been a long journey, but bear with me a little longer and by the end of it you'll have the full picture of making a functional WebRTC video meeting (or chat room). Let's define the handleSignalingMessage function, which handles incoming signaling server messages sent to us by other peers in the room, or by the server itself.

function handleSignalingMessage(message){
    switch(message.action){
        case 'offer':
            handleOffer(message.offer, message.senderId);
            break;
        case 'answer':
            handleAnswer(message.answer, message.senderId);
            break;
        case 'candidate':
            handleCandidate(message.candidate, message.senderId);
            break;
        case 'close':
            delete peers[message.id];
            $('#' + message.id).remove();
    }
}
function handleOffer(offer, senderId){
    var peer = peers[senderId];
    peer.setRemoteDescription(new RTCSessionDescription(offer)).then(function(){
        handlePendingCandidates(senderId);
        return peer.createAnswer();
    }).then(function(answer){
        return peer.setLocalDescription(answer);
    }).then(function(){
        pendingSdps[senderId] = {
            action: 'answer',
            id: senderId,
            answer: peer.localDescription
        };
    });
}
function handleAnswer(answer, peerId){
    var peer = peers[peerId];
    if(!peer.remoteDescription){
        peer.setRemoteDescription(new RTCSessionDescription(answer)).then(function(){
            handlePendingCandidates(peerId);
        });
    }
}
function handleCandidate(candidate, peerId){
    var peer = peers[peerId];
    if(!peer.remoteDescription){
        if(pendingCandidates[peerId] === undefined)
            pendingCandidates[peerId] = [candidate];
        else
            pendingCandidates[peerId].push(candidate);
    } else
        peer.addIceCandidate(new RTCIceCandidate(candidate)).catch(function(e){
            console.error('Could not add received ICE candidate', e);
        });
}
function handlePendingCandidates(peerId){
    var peer = peers[peerId];
    if(pendingCandidates[peerId] === undefined)
        return;
    pendingCandidates[peerId].forEach(function(candidate){
        peer.addIceCandidate(new RTCIceCandidate(candidate)).catch(function(e){
            console.error('Could not add pending ICE candidate', e);
        });
    });
    pendingCandidates[peerId] = [];
}
signalingChannel.onmessage = function(e){
    var message = JSON.parse(e.data);
    if(message.action == 'init'){
        if(ready)
            initPeer(message.id);
        else
            pendingPeers.push(message.id);
    } else if(ready)
        handleSignalingMessage(message);
    else
        pendingMessages.push(message);
};
signalingChannel.onopen = function(){
    signalingChannel.send(JSON.stringify({
        action: 'init',
        id: currentId
    }));
};

That's all of the application logic laid out. The handlers handleOffer, handleAnswer, and handleCandidate deal with incoming offer SDPs, answer SDPs, and ICE candidates from other peers. Handling some of them is delayed until the application is set to ready, which happens when access to the webcam or screen feed is granted. As for the candidates, they're not handled until the remote SDP of the peer is set; if they were handled before that, the candidate gathering of the current peer would stop, as observed in Google Chrome.
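That buffer-until-ready rule for candidates is worth isolating, because it's a pattern you'll need in any WebRTC app. Here it is as a pure-logic sketch, with `apply` standing in for peer.addIceCandidate (the factory name is my own, not part of any API):

```javascript
// Pure-logic sketch of the buffering rule used by handleCandidate and
// handlePendingCandidates: candidates that arrive before the remote
// description is set are queued, then flushed once it is; anything
// arriving later is applied immediately.
function makeCandidateQueue(apply){
    var pending = [], remoteSet = false;
    return {
        add: function(candidate){
            if(remoteSet) apply(candidate);
            else pending.push(candidate);
        },
        remoteDescriptionSet: function(){
            remoteSet = true;
            pending.forEach(apply);
            pending = [];
        }
    };
}

var q = makeCandidateQueue(function(c){ console.log('applied', c); });
q.add('candidate-1');     // queued, nothing applied yet
q.remoteDescriptionSet(); // prints: applied candidate-1
```

The same structure could replace the pendingCandidates bookkeeping in the article's code, one queue per peer.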

The starting point of the application is the onopen handler set on the signalingChannel, which sends the ID of the current peer to all other peers once the channel (websocket) opens; all other actions are triggered in the onmessage handler of the channel.

In summary, this meeting web application opens your webcam, and whenever someone else joins by running the application, they call all the existing parties in the meeting (including you) by creating an offer SDP and sending it over to them. The created offer SDP is set as the local description of the peer connection, which triggers the gathering of that peer's ICE candidates (network paths) and sends them to the other parties to connect.

Finally, when candidate gathering is complete, the created offer SDP (along with any other collected SDPs) is sent. Other peers receiving the offer SDPs set them as remote descriptions, create answer SDPs and set them as local descriptions (which triggers ICE candidate gathering on their side), then send them back to the peers who sent the offers. The candidates gathered on the answering side are sent to the offering side, and now all parties know one another's networking and media details, so they'll seamlessly connect and stream their webcams (or recordings) to one another.

I hope I've given you the clearest possible picture of WebRTC and how to start using it. I'll try to provide a functional prototype of the app explained in this article on GitHub whenever time allows. Thanks for reading this far, and if there are any questions, don't hesitate to ask.
