28 May 2011

HTML5 video and DRM, part one

Having long been interested in HTML5 (and all that other stuff), I sometimes think about its future, especially regarding its commercial use. Once, I had a particularly evil vision of a possible (video) DRM implementation using HTML5 goodness.
We can argue whether the whole DRM concept is a good or a bad thing. It is, though, inevitable that commercial content will come to the HTML5 world. This is probably where I should rant about the music/movie industry being stagnant, old-fashioned and closed-minded. I won't; there are plenty of examples in the Wikipedia article on DRM. Folks are really vocal.
In this (and probably one future) post I'm going to think out loud about the technical side of this problem.

The problem

Right now, HTML5 video works approximately like this (don't mind my poor drawing skills):
What's not corporate-friendly in the picture above:
  • once you have a video in your cache you can practically watch it any way you like, yay!
  • the user is free to download the video/audio data for later playback, editing, etc.
  • live streaming and dynamic quality adjustment are (as far as I know) not possible
  • it's too easy to filter out the ads if they're not embedded directly into the video (man, advertising is hard)
These just make some content-clingy people sad and angry. There's no easy way to specify how and when you can play a particular video. Also, our Flash and Silverlight friends seem to have it all figured out.

Enter WebSockets
Using a WebSocket to transport video data to the client may offer a few advantages. One can build a dedicated, header-light video serving architecture. You can stream chunks to the browser in any fashion you like. You get all sorts of real-time feedback since the protocol is full-duplex. We can cautiously claim the third issue is taken care of.
However, you also inherit a new problem. The WebSocket protocol currently supports only well-formed UTF-8 payloads (still, only 2 bytes of overhead per message -- cool stuff). Your video is not a UTF-8 string; it is binary data. Your best option is to base-64 encode it on the server side and then decode it with JavaScript on the browser side. Firefox, Chrome and possibly some other browsers have window.atob included by default. This method does the base-64 decoding for you. You then use a data URL (or maybe a Blob from the File API) to place the received content inside your video player. The previous diagram still applies -- you only put another data conversion layer on top.
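The receiving side of that conversion layer could look roughly like the sketch below. It assumes the server sends each payload as a base-64 text frame. The helper names `base64ToBytes` and `toDataUrl`, the `ws://example.invalid/video` endpoint and the `video/webm` MIME type are placeholders of mine, not part of any real service.

```javascript
// Decode a base-64 string into raw bytes using the built-in atob.
function base64ToBytes(b64) {
  const binary = atob(b64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Wrap an already base-64 encoded payload in a data URL for <video src>.
function toDataUrl(b64, mimeType) {
  return "data:" + mimeType + ";base64," + b64;
}

// Browser-only wiring (hypothetical endpoint, left commented out):
// const socket = new WebSocket("ws://example.invalid/video");
// socket.onmessage = (event) => {
//   document.querySelector("video").src = toDataUrl(event.data, "video/webm");
// };
```

Note that the data URL route never needs the decoded bytes at all; `base64ToBytes` only matters if you go the Blob way.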
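But boy, is it slow -- base-64 encoding inflates the data by roughly a third (four characters for every three bytes), and decoding it in JavaScript costs CPU time on top of that. You eat up more resources and waste a lot of bandwidth. Thankfully, HTML5 people are working hard to provide binary frame support for WebSockets.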
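To put a number on that overhead: base-64 emits four characters for every three input bytes (padding the last group), so the size on the wire can be estimated without encoding anything. A small sketch, with function names of my own choosing:

```javascript
// Length of the base-64 text for a payload of byteCount bytes (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Relative size increase, e.g. 0.33 means about 33% more data on the wire.
function base64Overhead(byteCount) {
  return base64Length(byteCount) / byteCount - 1;
}

// A 3,000,000-byte chunk becomes 4,000,000 characters of text.
console.log(base64Length(3000000));
console.log(base64Overhead(3000000).toFixed(2));
```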
Meanwhile, one can say we've eliminated the last problem, since it becomes difficult (if not impossible) to tell the main (expected) content apart from advertising. Our workflow provides another layer of opacity.

That's neat, but there's another issue. It is not possible to just inject intermediate video data into your HTML5 video player. You have to provide a complete video file, headers included. Otherwise the browser will just ignore you, at best. Some work is being done at Mozilla to provide low-level access to audio data. There's no such thing for video content, shoot.
Unfortunately, transferring a whole video file (with 30% overhead), decoding it with JavaScript and storing it in memory is a no-go. By doing this you'd undoubtedly kill mobile browsers. Desktop users won't be happy either. Currently your only option is to segment your video content into small (try a few seconds long) full-fledged video files and send them to the browser one after another. You'd have to invent another encapsulation layer in the process -- a client needs to know where in the stream one video file ends and the next one begins. On the client side you hook into the video element's events and replace the media source after each chunk has finished playing.
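The chunk-swapping loop described above could be sketched as follows. Everything named here is an assumption for illustration: the JSON envelope with `seq`, `mime` and `data` fields is one made-up encapsulation format among many, and `ChunkPlayer` relies only on the `src` property, `play()` method and `ended` event of a standard video element.

```javascript
// Hypothetical envelope: one WebSocket text message per video file, e.g.
// {"seq": 0, "mime": "video/webm", "data": "<base-64 of a complete file>"}
function parseChunk(message) {
  const chunk = JSON.parse(message);
  if (!Number.isInteger(chunk.seq) || typeof chunk.data !== "string") {
    throw new Error("malformed chunk envelope");
  }
  return chunk;
}

// Plays chunks in sequence order, swapping the source when one ends.
class ChunkPlayer {
  constructor(video) {
    this.video = video;
    this.pending = new Map(); // seq -> chunk, tolerates out-of-order arrival
    this.nextSeq = 0;
    this.playing = false;
    video.addEventListener("ended", () => {
      this.playing = false;
      this.playNext();
    });
  }

  // Call this from socket.onmessage with event.data.
  receive(message) {
    const chunk = parseChunk(message);
    this.pending.set(chunk.seq, chunk);
    if (!this.playing) this.playNext();
  }

  playNext() {
    const chunk = this.pending.get(this.nextSeq);
    if (!chunk) return; // next chunk has not arrived yet, wait for it
    this.pending.delete(this.nextSeq);
    this.nextSeq += 1;
    this.playing = true;
    this.video.src = "data:" + chunk.mime + ";base64," + chunk.data;
    this.video.play();
  }
}
```

In a browser you would create it with `new ChunkPlayer(document.querySelector("video"))` and feed it from the socket's message handler. Swapping `src` makes the browser load a new file each time, so a short stall between chunks is likely.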
I guess that means the first problem is also solved. You can now use many signals (headers, cookie data, browser info, you name it) and make fine-grained decisions about the content that's being played. You react to changes at the level of a (small) chunk. After implementing your own player controls (those provided by the browser are now useless) you decide which piece of content is viewed by whom. After all, you can simply cut the cord by closing the WebSocket.
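On the server side, that per-chunk decision might look like the sketch below. The session fields (`paid`, `freeChunks`) and the function names are all assumptions for illustration; a real policy would draw on whatever signals you actually collect. The only thing it needs from the socket object is a `send()` and `close()` pair.

```javascript
// Decide whether this viewer may receive the chunk with the given index.
// Hypothetical policy: paying users see everything, others get a preview.
function mayReceive(session, chunkIndex) {
  return session.paid || chunkIndex < session.freeChunks;
}

// Send one chunk, or cut the cord. Returns true if the chunk was sent.
function sendChunk(socket, session, chunkIndex, payload) {
  if (!mayReceive(session, chunkIndex)) {
    socket.close(); // no more video for this client
    return false;
  }
  socket.send(payload);
  return true;
}
```

Because the check runs before every chunk, a change in the session (a subscription lapsing, say) takes effect within a few seconds of video.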

Enough for this post. Still, it's not the end of the journey.
Some issues to consider:
  • the feasibility of the serialized file-chunks model -- conversion cost, synchronization issues
  • the last, and probably the most serious, problem -- content security
  • seeking -- the caching problem


  1. This comment has been removed by the author.

  2. Your chunks approach is already done, as "HTTP adaptive streaming" - see Adobe HDS, Apple HLS, MS Smooth Streaming. The simplest IMO is Apple HLS. Its index of chunks is simply a .m3u8 file, a plaintext extension of the M3U (MP3 playlist) format.
    On Apple devices (including computers), their browser can use a .m3u8 as the src of a video tag and play it natively - quite nice. It even has provision for encryption (128-bit AES), but the problem is protecting the key - not during transport, which is easily enough done with SSL, but once it's on the client. The problem with a non-compiled approach is that it's easier to reverse engineer and extract the decryption key, which is why Silverlight and Flash are still attractive in this case, even if WebSockets did binary efficiently.

  3. Hey Nick, thanks for the info about adaptive streaming. I'm aware of the security implications of storing/handling a crypto key in the JS layer. Still, I don't think it'd be particularly hard for a determined hacker to extract such a key from a Silverlight/Flash environment. As we all know well, one leak is usually enough to lose hold of loads of content.
    On the other hand I have to agree, there is not much to do (for now?) without a browser plugin.