What kinds of HTML5 streaming are there (and why “mp4 streaming” is not one of them)

Customers often ask us whether our server can do “mp4 streaming to HTML5.” In 99% of cases, they don’t understand what they’re asking for. It’s hard to blame them though: with the confusing terminology, technical complexity, and overwhelming variety of ways to stream video on the web, it’s easy to get confused.

In this article, we will once and for all clear up the confusion. We’ll explain what kinds of HTML5 streaming there are, which of them are any good, and why, for God’s sake, you can’t say “mp4 streaming.”

Glossary

HTML5 video is when you put a <video> tag in your web page and set a certain src for it. HTML5 streaming is the same thing, except src points not to a complete video file but to an ever-updating video stream. YouTube does HTML5 video, Twitch does HTML5 streaming.

The <video> tag doesn’t care how the stream is formed or transmitted, or whether the browser will be able to play it at all. It only cares about src pointing to some video stream. Technically speaking, the HTML5 spec says nothing about which protocols, transports, or codecs should be used with HTML5 video.
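To make this concrete, here is a tiny sketch of the only thing the browser is ever given in both cases: a src. The `videoTag` helper and the file paths are made up for illustration.

```typescript
// Hypothetical helper: builds the markup for an HTML5 <video> element.
// The tag itself is identical for a file and for a live stream; only
// what the src points at differs.
function videoTag(src: string): string {
  return `<video controls src="${src}"></video>`;
}

// A complete file: this is HTML5 video.
const vod = videoTag("/media/movie.mp4");

// An ever-updating stream (an HLS playlist here): this is HTML5 streaming.
const live = videoTag("/live/stream.m3u8");

console.log(vod);
console.log(live);
```

Whether the browser can actually play either src is an entirely separate question, decided by the protocols, transports, and codecs discussed below.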

Protocol defines how the communicating parties interact in order to exchange data. The two parties are almost always a client and a server, the client being the one that initiates the communication. The stream can flow from the server to the client (video playback) or from the client to the server (broadcasting). Even when a monstrous petaFLOPS supercomputer connects to a tiny IP camera, the supercomputer is the client and the camera is the server.

A protocol usually implies at least a Play command (“start playback”), but there can be others: pause, continue, broadcast, fast-forward, etc.

Examples of protocols: RTSP, RTMP, HTTP, HLS, IGMP.

Transport, a.k.a. transport container, a.k.a. container, defines how compressed video gets packed into bytes for transmission over the wire from one party to another (using some protocol).

Examples of containers: MPEG-TS, RTMP, RTP.

Notice that RTMP is both a protocol and a transport. That is because RTMP’s spec describes both how the communicating parties should interact in order for the video stream to flow (i.e. protocol) and how the video should be packed (i.e. transport). This is not always the case. For example, the RTSP protocol uses RTP transport.

Codec actually means a lot of things but in our case it’s a way to compress raw video. Codecs are about compressing the video before streaming, transports are about sending the compressed video over a particular protocol. Most streaming services don’t deal with codec-level compression and work with protocols and transports only.

Examples of codecs: h264, aac, mp3.

Because the word codec can mean a lot of related things, it’s easy to get confused. For instance, H.264 is a standard for compressing huge video frames into a small number of bytes, libx264 is a video compression library that implements this standard, and there’s also eponymous software for Windows that decodes h264 and plays it on the screen.


To sum it up, the HTML5 spec doesn’t describe protocols, transports, or codecs. This is why browser developers are free to choose what combinations to support. All these different combinations are called “HTML5 streaming.”

However, there are combinations supported by most browsers. Let’s look at the most promising ones.

HLS


HLS is h264-compressed video with aac- or mp3-compressed audio, transported in MPEG-TS. The stream is divided into chunks described by m3u8 playlists and is transmitted over HTTP. HLS supports multi-bitrate streams and both live and VOD. HLS looks simple on the surface but hides plenty of quirks, so it behaves differently on different devices.
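A minimal sketch of what a player does with an m3u8 playlist: pull out the chunk URIs and their durations. The `parsePlaylist` helper and the sample playlist are invented for illustration; real playlists carry many more tags, and real players (hls.js, native Safari) handle all of them.

```typescript
// One HLS media chunk as described by the playlist.
interface Segment {
  duration: number; // seconds, from the #EXTINF tag
  uri: string;      // where to fetch the MPEG-TS chunk
}

// Extract segments from an m3u8 media playlist. Each #EXTINF line
// declares a duration; the URI of the chunk follows on the next line.
function parsePlaylist(text: string): Segment[] {
  const lines = text.split("\n").map((l) => l.trim());
  const segments: Segment[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith("#EXTINF:")) {
      const duration = parseFloat(lines[i].slice("#EXTINF:".length));
      const uri = lines[i + 1]; // the chunk URI follows its #EXTINF tag
      segments.push({ duration, uri });
    }
  }
  return segments;
}

// A sample live playlist; for a live stream the server keeps rewriting
// this file as new chunks appear, and the player keeps re-fetching it.
const playlist = `#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:9.8,
chunk-001.ts
#EXTINF:10.0,
chunk-002.ts`;

console.log(parsePlaylist(playlist));
```

The player then downloads each chunk over plain HTTP, which is exactly why HLS traverses proxies and CDNs so easily.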

HLS was developed by Apple and initially worked only in Safari on iOS and Mac. Even now Safari for Windows can’t play HLS. However, as of today almost all STBs and Android devices can.

Nonetheless, there’s a serious issue with HLS support in 3rd-party players. At a certain point, their developers dumped Apple’s standard for delivering multiple audio tracks and added support for everything found in regular MPEG-TS: mpeg2 video, mpeg2 audio, etc. Because of this, you have to provide different playlist formats for different players.

MPEG-DASH


MPEG-DASH is typically h264/h265-compressed video with aac audio packed in mp4, or vp8/vp9 packed in WebM. The standard, however, is not bound to any particular codecs, protocols, or transports. As with HLS, the stream can be divided into chunks, but here it’s optional. Instead of playlists, MPEG-DASH uses XML-formatted MPD manifests.
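A toy sketch of what lives inside an MPD manifest. The `parseMpd` helper and the sample manifest are invented for illustration, and a regex stands in for a proper XML parser; real DASH clients use a full parser and a much richer data model, but the essentials are visible here: per-Representation codec strings and bandwidths that the player switches between.

```typescript
// One quality level of the stream, as advertised by the manifest.
interface Representation {
  codecs: string;    // e.g. "avc1.64001f" for an h264 profile/level
  bandwidth: number; // bits per second, used for adaptive switching
}

// Pull codecs/bandwidth pairs out of an MPD document.
// Illustration only: a regex instead of real XML parsing.
function parseMpd(xml: string): Representation[] {
  const reps: Representation[] = [];
  const re = /<Representation[^>]*codecs="([^"]+)"[^>]*bandwidth="(\d+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    reps.push({ codecs: m[1], bandwidth: parseInt(m[2], 10) });
  }
  return reps;
}

// A drastically simplified manifest with two quality levels.
const mpd = `<MPD><Period><AdaptationSet mimeType="video/mp4">
  <Representation id="720p" codecs="avc1.64001f" bandwidth="3000000"/>
  <Representation id="480p" codecs="avc1.64001e" bandwidth="1500000"/>
</AdaptationSet></Period></MPD>`;

console.log(parseMpd(mpd));
```

This is the multi-bitrate story from HLS again, just described in XML instead of nested playlists.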

MPEG-DASH is similar to HLS in many ways. It may be even more popular, since giants like YouTube and Netflix have been using it for years as their main means of delivering content.

MPEG-DASH’s advantage is that it’s supported natively in most browsers via MSE (more on that below). It doesn’t even have a Flash-based implementation, which makes it pure, uncompromised HTML5.

MPEG-DASH is definitely the pure HTML5 streaming of the future.

MSE

When it finally became clear that Flash was dying out (after what seemed like a hundred failed attempts to kill it), we found ourselves face to face with the question of what would replace it. It would be great to have a way to play video in a browser as easily as Flash allowed, and with comparable quality.

For a long time, Flash had a convenient mechanism for playing various video types called appendBytes. It works like this: the client code itself fetches the frames of a compressed video, packs them into an agreed-upon container (with Flash it’s flv), and feeds them to the player. The important bit is that the protocol and transport are implemented by the client code running in the browser.

MSE (Media Source Extensions) is an extension of the HTML5 spec that provides a mechanism similar to appendBytes. Unfortunately, MSE is a lot harder to understand and implement.
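To give a feel for the API, here is a sketch of the first step of any MSE player: picking a container/codec string the browser can actually play. In the browser you would call MediaSource.isTypeSupported; here the check is injected as a parameter so the logic runs anywhere, and the `pickSupportedType` helper is hypothetical. The rest of an MSE player (creating a MediaSource, opening a SourceBuffer, calling appendBuffer() on fetched segments) is browser-only and omitted.

```typescript
// Return the first MIME/codec string the browser reports as playable,
// or null if none is. In a real page, pass MediaSource.isTypeSupported
// as the first argument.
function pickSupportedType(
  isTypeSupported: (mime: string) => boolean,
  candidates: string[],
): string | null {
  for (const c of candidates) {
    if (isTypeSupported(c)) return c;
  }
  return null;
}

// Typical candidates for an MSE-based player, best first:
const candidates = [
  'video/mp4; codecs="avc1.64001f,mp4a.40.2"', // h264 + aac in mp4
  'video/webm; codecs="vp9,opus"',             // the WebM alternative
];
```

This tiny negotiation step is exactly where the "spec doesn't mandate codecs" problem from the glossary lands on the developer: every MSE player has to probe what this particular browser agreed to support.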

MPEG-DASH is built on top of MSE and is even trickier. As a result, some aspects of working with MPEG-DASH are quite painful: tons of XML, parsing binary containers in JavaScript, dealing with a poorly designed segmentation policy.

Interestingly enough, MSE works with HLS as well. There’s an implementation called hls.js that downloads HLS playlists and MPEG-TS chunks, repacks them into an MSE-compatible format, and plays them via MSE. Apple even made a move towards MPEG-DASH compatibility by allowing mp4 containers in HLS.

By the end of 2017, Flash will probably disappear completely, and it’s a good idea to use MPEG-DASH for new projects already.

WebRTC


Flash combined real-time communication and broadcasting under the same technological umbrella. Unfortunately, HTML5 is not quite there. We have MSE for broadcasting and WebRTC for online video calls.

WebRTC is basically SIP in the browser: a way to build audio, video, and data channels between two browsers with the help of a server in the middle.

This technology is not intended for streaming, but it’s technically capable of it, so it would be wrong of us to leave it out of this article. WebRTC is also considered HTML5, since it requires nothing apart from JavaScript in the browser. On the other hand, it does require the latest versions of the two most popular browsers, and doesn’t work at all on Microsoft Edge yet.

A major source of confusion in understanding WebRTC is its usage in torrent-based TV delivery. What happens is browsers build a network of data channels with WebRTC and transmit HLS- or MSE-based video chunks over this network. The playback happens via Flash or MSE. In this scenario, WebRTC is used for delivery and MSE for playback; don’t confuse this with using WebRTC itself for video playback.

What’s the deal with mp4 streaming?

Any modern browser can request an h264/aac-compressed, mp4-packed file over HTTP. Some will even attempt to play it. This is the most convenient, easy-to-understand, and standard way of playing video files online: the file just lies on disk and gets served by nginx. The code responsible for mp4 playback in browsers is sophisticated enough to download the video in chunks (unlike the Flash player, which fetched the entire file at once).

There’s a certain infamy surrounding h264 due to its closed, patented nature. So there’s an “open” alternative actively pushed by Google: the vp8 and vp9 video codecs packed in the WebM transport. WebM is a subset of the mkv transport (a.k.a. Matroska). WebM is similar to mp4, but unlike plain mp4 it is streamable.

Here’s where this whole “mp4 streaming,” which works like WebM, comes from. The thing is, a regular mp4 file contains an index describing the whole file, which can only be written once the video is complete. Thus we can’t stream live via plain mp4. To make it work, there’s a trick: send an mp4 header without frames and then append blocks of frames several seconds long. This is what is called fragmented mp4, or “mp4 streaming.”
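To show what “fragmented” means at the byte level, here is a sketch that walks the top-level boxes of an mp4 stream. The `listBoxes` helper and the sample bytes are made up for illustration; the box layout itself is standard: every mp4 box starts with a 4-byte big-endian size and a 4-byte type, and a fragmented stream is an init segment followed by repeating moof+mdat pairs, each carrying a few seconds of frames.

```typescript
// Walk the top-level mp4 boxes in a byte buffer. In fragmented mp4
// you'd expect ftyp+moov once (the init segment), then moof+mdat
// pairs repeating for as long as the live stream runs.
function listBoxes(data: Uint8Array): { type: string; size: number }[] {
  const boxes: { type: string; size: number }[] = [];
  let offset = 0;
  while (offset + 8 <= data.length) {
    const view = new DataView(data.buffer, data.byteOffset + offset);
    const size = view.getUint32(0); // big-endian 32-bit box size
    const type = String.fromCharCode(
      data[offset + 4], data[offset + 5], data[offset + 6], data[offset + 7],
    );
    boxes.push({ type, size });
    if (size < 8) break; // malformed (64-bit sizes etc. not handled here)
    offset += size;
  }
  return boxes;
}

// Two empty 8-byte boxes, "moof" then "mdat", as synthetic test data:
const bytes = new Uint8Array([
  0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x66, // "moof"
  0, 0, 0, 8, 0x6d, 0x64, 0x61, 0x74, // "mdat"
]);

console.log(listBoxes(bytes));
```

Because each moof+mdat pair is self-describing, the server can keep appending them forever, which is the whole trick behind “mp4 streaming.”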

All in all, it’s not really streaming at all. It’s a workaround that creates the impression of streaming. Mp4 is a great format for downloadable videos, but it’s not a fit for live streaming. So it’s safe to forget about mp4 in the context of HTML5 streaming and just never say “mp4 streaming.”

Summary

  • Good choices for HTML5 streaming are MPEG-DASH and HLS. They work on mobile devices, desktops, and STBs.
  • Flash will die, and MSE is what takes its place.
  • WebRTC is an HTML5 technology primarily for video calls, not for video broadcasting.
  • Avoid bringing old codecs to the web; mpeg2 video and mpeg2 audio have no place in HLS, even if a player can play them.
  • Do not say “mp4 streaming.” Please.