Explaining Video Accessibility For The Web

Oct 6, 2016


JW Player has included support for captions with its online video player for years by using internal player components to handle both parsing and rendering. As support for HTML5 video expanded and captions support was added natively to browsers, captions could be rendered by the browser itself. This is great in theory, but in practice not every browser handles HTML5 captions the same way.

In addition to the progress made by browsers, new FCC mandates require broadcast content to include captions that can be styled based on user preference and positioned appropriately based on the video itself. For example, if a character is speaking offscreen on the left, text should be located on the left side of the screen. Because of the new FCC requirements and the progress made by native HTML5 video, it made sense to revisit our captions rendering model.

While captions assist those who may have issues with their hearing, screen readers are an equally important part of web accessibility for those who may have visual impairments. With the introduction of ARIA support, individual buttons can now be tabbed to, and screen readers can recognize and call out the particular button.

The latest releases of JW Player feature some extensive improvements to our captions rendering process and our ARIA voice narration support, both of which we’ll discuss today.

But what are captions?

Regardless of format, captions data in its most basic form includes two pieces of information: the text that should be shown and when it should be displayed. For example, at 5 minutes into a video, display “This is a caption!” for 10 seconds. In VTT format, this would look like:



WEBVTT

00:05:00.000 --> 00:05:10.000
This is a caption!
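To make the "text plus timing" idea concrete, here is a minimal sketch of turning a single VTT cue block into a start time, an end time, and the caption text. The names Cue, parseTimestamp, and parseCue are made up for illustration; this is not JW Player's parser, and it ignores most of the WebVTT format.

```typescript
// A caption cue in its most basic form: when to show it and what to show.
interface Cue {
  start: number; // seconds from the start of the video
  end: number;   // seconds from the start of the video
  text: string;
}

// Convert a VTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") to seconds.
function parseTimestamp(stamp: string): number {
  const match = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(stamp.trim());
  if (!match) {
    throw new Error(`Invalid timestamp: ${stamp}`);
  }
  const hours = match[1] ? parseInt(match[1], 10) : 0;
  const minutes = parseInt(match[2], 10);
  const seconds = parseInt(match[3], 10);
  const millis = parseInt(match[4], 10);
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

// Parse one cue block: a timing line followed by one or more text lines.
function parseCue(block: string): Cue {
  const lines = block.trim().split(/\r?\n/);
  // An optional cue identifier may come first, so look for the "-->" line.
  let timingIndex = -1;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("-->") !== -1) {
      timingIndex = i;
      break;
    }
  }
  if (timingIndex === -1) {
    throw new Error("Cue block has no timing line");
  }
  const parts = lines[timingIndex].split("-->");
  const start = parseTimestamp(parts[0]);
  // Anything after the end timestamp on this line is cue settings; ignored here.
  const end = parseTimestamp(parts[1].trim().split(/\s+/)[0]);
  return { start, end, text: lines.slice(timingIndex + 1).join("\n") };
}
```

Passing the cue shown above to parseCue should give a start of 300 seconds, an end of 310 seconds, and the text “This is a caption!”.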

Including captions in an HTML5 video embed is relatively simple: add a track element as a child of the video tag and set its src attribute to a .vtt file hosted locally or on a remote server.
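As a rough illustration of that markup, the sketch below builds the HTML string for a video tag with track children. The helper names (CaptionTrack, escapeAttr, buildVideoMarkup) and the file names used with it are placeholders, not part of any player API; kind, src, srclang, label, and default are standard attributes of the track element.

```typescript
// Describes one sideloaded caption file.
interface CaptionTrack {
  src: string;         // URL of a .vtt file, local or remote
  srclang: string;     // language code, e.g. "en"
  label: string;       // name shown in the browser's captions menu
  isDefault?: boolean; // whether this track is enabled by default
}

// Escape a value so it is safe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a <video> tag with one <track> child per caption file.
function buildVideoMarkup(videoSrc: string, tracks: CaptionTrack[]): string {
  const trackTags = tracks.map((t) => {
    const def = t.isDefault ? " default" : "";
    return `  <track kind="captions" src="${escapeAttr(t.src)}" ` +
      `srclang="${escapeAttr(t.srclang)}" label="${escapeAttr(t.label)}"${def}>`;
  });
  return [`<video controls src="${escapeAttr(videoSrc)}">`]
    .concat(trackTags, ["</video>"])
    .join("\n");
}
```

Called with a hypothetical "video.mp4" and a single English track pointing at "captions.vtt", this produces a video tag wrapping one track tag.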


However, this method only works with the VTT caption format. Other caption formats, such as SRT and DFXP, require additional parsing outside of browser APIs. JW Player has long supported these other formats, but we’ve now normalized how they are rendered: we parse out the text and timestamps and add them to the video object with the Text Track API. Embedded captions such as CEA 608 and 708 carry their caption data within an HLS stream; this information is parsed and rendered just like sideloaded caption files. It’s worth noting that VTT files and 708 captions can also include positioning data. When it is detected, JW 7.5 uses this data as well.
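The general idea of parsing a non-VTT format and handing the result to the Text Track API can be sketched as follows. This is a simplified SRT reader written for illustration, not JW Player's actual parser; parseSrt, srtTimeToSeconds, CueTarget, and addCuesToTrack are hypothetical names. The track and cue constructor are passed in so the parsing logic can run outside a browser.

```typescript
interface ParsedCue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// SRT timestamps use a comma before the milliseconds: "00:05:00,000".
function srtTimeToSeconds(stamp: string): number {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) {
    throw new Error(`Invalid SRT timestamp: ${stamp}`);
  }
  return parseInt(m[1], 10) * 3600 + parseInt(m[2], 10) * 60 +
    parseInt(m[3], 10) + parseInt(m[4], 10) / 1000;
}

// Split an SRT file into blocks separated by blank lines and parse each one.
function parseSrt(srt: string): ParsedCue[] {
  const cues: ParsedCue[] = [];
  const blocks = srt.replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  for (let i = 0; i < blocks.length; i++) {
    const lines = blocks[i].split("\n");
    // The first line is normally a numeric counter; the timing line follows it.
    const timingIndex = lines[0].indexOf("-->") !== -1 ? 0 : 1;
    const timing = lines[timingIndex];
    if (!timing || timing.indexOf("-->") === -1) {
      continue; // not a cue block
    }
    const parts = timing.split("-->");
    cues.push({
      start: srtTimeToSeconds(parts[0]),
      end: srtTimeToSeconds(parts[1]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Minimal shape of a text track: anything with an addCue method.
interface CueTarget<C> {
  addCue(cue: C): void;
}

// Add parsed cues to a track, using the supplied factory to create cue objects.
function addCuesToTrack<C>(
  track: CueTarget<C>,
  cues: ParsedCue[],
  makeCue: (start: number, end: number, text: string) => C
): void {
  for (let i = 0; i < cues.length; i++) {
    track.addCue(makeCue(cues[i].start, cues[i].end, cues[i].text));
  }
}

// In a browser, the wiring might look like this (not executed here):
//   const track = video.addTextTrack("captions", "English", "en");
//   track.mode = "showing";
//   addCuesToTrack(track, parseSrt(srtText), (s, e, t) => new VTTCue(s, e, t));
```

Keeping the parser separate from the DOM calls means the same cue list could, in principle, feed either a native text track or another renderer.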

Positioning of Captions

One of the major reasons for the captions work in JW 7.5 was support for caption positioning. We currently support positioning data in sideloaded VTT files and in 708 captions inside of HLS streams when rendering in HTML5. Positioning is very important for broadcasters for two reasons. First, the location of captions indicates which character is speaking at a given time, even if they are offscreen. Second, when important information is displayed in a certain area of the video, positioning allows captions to move to a less intrusive location.
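In a VTT file, positioning data appears as cue settings after the end timestamp, for example "position:10% align:start". The sketch below shows one way such settings could be read from a timing line. parseCueSettings and CueSettings are illustrative names, and the code only collects the raw values; it does not validate them or apply them to a cue.

```typescript
// Raw cue settings keyed by name, e.g. { position: "10%", align: "start" }.
type CueSettings = { [key: string]: string };

// Setting names defined by the WebVTT format.
const KNOWN_SETTINGS = ["vertical", "line", "position", "size", "align", "region"];

// Extract cue settings from a line like
// "00:05:00.000 --> 00:05:10.000 position:10% align:start".
function parseCueSettings(timingLine: string): CueSettings {
  const settings: CueSettings = {};
  const arrow = timingLine.indexOf("-->");
  if (arrow === -1) {
    return settings;
  }
  // After the arrow: the end timestamp, then zero or more key:value tokens.
  const tokens = timingLine.slice(arrow + 3).trim().split(/\s+/).slice(1);
  for (let i = 0; i < tokens.length; i++) {
    const colon = tokens[i].indexOf(":");
    if (colon <= 0) {
      continue; // no key before the colon, or no colon at all
    }
    const key = tokens[i].slice(0, colon);
    const value = tokens[i].slice(colon + 1);
    if (KNOWN_SETTINGS.indexOf(key) !== -1 && value !== "") {
      settings[key] = value;
    }
  }
  return settings;
}
```

For a character speaking offscreen on the left, a caption author might use a small position value so the text sits toward the left edge of the video.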

Tracks Across Browsers & Providers

Since tracks are included in a W3C spec that is implemented across browsers, the Text Track API can be used to varying degrees in all modern browsers. However, because support varies, this solution alone did not solve all of our captions rendering woes. A few inconsistencies with text tracks popped up during development. For example, Firefox did not include a way to natively style captions, and IE/Edge did not provide a VTTCue interface, which is required to support captions positioning.

In addition to browser discrepancies, there is still a reliance on Flash for video playback on the web. Although its use is decreasing, our HTML5 method of rendering captions would not work for our Flash provider. For these two reasons, an alternate method of rendering captions was needed.

To help alleviate these issues, we’ve leveraged Mozilla’s VTT.js polyfill for rendering and styling captions on certain browsers. VTT.js is implemented to the W3C spec and allows the player to be browser-agnostic when rendering captions. JW Player is able to choose native or VTT.js rendering based on browser support and provider.
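The choice between native and VTT.js rendering could be expressed as a small decision function. The sketch below is an assumption based only on the constraints described above (no HTML5 rendering under Flash, no VTTCue on IE/Edge, no native styling on Firefox); it is not JW Player's actual selection logic, and every name in it is hypothetical.

```typescript
type Provider = "html5" | "flash";
type CaptionRenderer = "native" | "vttjs";

// Capabilities the player would need to detect at runtime.
interface BrowserSupport {
  hasVTTCue: boolean;              // needed for caption positioning
  canStyleNativeCaptions: boolean; // needed for user-configurable styles
}

// Assumed rule: use native rendering only when nothing would be lost by it.
function chooseCaptionRenderer(
  provider: Provider,
  support: BrowserSupport
): CaptionRenderer {
  if (provider === "flash") {
    return "vttjs"; // native HTML5 text tracks are not available to Flash
  }
  if (!support.hasVTTCue || !support.canStyleNativeCaptions) {
    return "vttjs"; // fall back to the polyfill for positioning or styling
  }
  return "native";
}
```

In a browser, hasVTTCue could plausibly come from a feature check such as typeof VTTCue !== "undefined", while styling support would likely need a per-browser check.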

A final table of our captions rendering modes can be seen below:

ARIA Support

ARIA, or “Accessible Rich Internet Applications”, is a set of standards that dictate how web applications should accommodate users who rely on screen readers. Typically, this is accomplished through certain markup that is added to a web page. When tabbed to, or selected, this markup will be read out loud, giving the user context for what is happening on the page.

Thanks to the work done by GitHub user francoismassart, JW Player now has support for these screen readers. Users can now hear the name of a particular button when it is in focus, providing insight into the various player functions.
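To show the kind of markup involved, the sketch below generates a focusable control with an accessible name. The role, tabindex, and aria-label attributes are standard; the helper names and the choice of a div element are assumptions for illustration and do not reflect JW Player's actual markup.

```typescript
// A player control that should be announced by screen readers.
interface ControlButton {
  label: string; // text to announce, e.g. "Play"
}

// Escape a label so it is safe inside a double-quoted HTML attribute.
function escapeAriaLabel(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// role="button" tells assistive technology what the element is,
// tabindex="0" puts it in the tab order, and aria-label gives it a name.
function renderControl(button: ControlButton): string {
  return `<div role="button" tabindex="0" ` +
    `aria-label="${escapeAriaLabel(button.label)}"></div>`;
}

// Keyboard users expect Enter or Space to activate a focused button.
// These are the KeyboardEvent.key values for those two keys.
function isActivationKey(key: string): boolean {
  return key === "Enter" || key === " ";
}
```

With markup like this, tabbing to the control lets a screen reader announce something along the lines of “Play, button”, though the exact wording depends on the screen reader.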

Newer operating systems offer this narration support natively, and it can be enabled from the accessibility settings menu. The use of screen narration on macOS can be seen (but not heard) below:


Accessibility in the age of the internet is something that JW Player takes very seriously. With the inclusion of ARIA and with the improvements to our captions, we are striving to be the most accessible player available.

The goal of our captions refactor was to provide a more streamlined and straightforward experience. In certain cases, when tracks are rendered by the operating system, caption styles on iOS and macOS can be set on a per-device basis for the Safari browser, providing a more consistent experience all around. Going forward, we aim to keep accessibility both easily configurable and available for publishers and viewers alike.