What are Media Alternatives and Captions? · Rocket Validator Glossary

When audio or video content appears on a web page, not everyone can perceive it the same way. A person who is deaf or hard of hearing cannot access spoken dialogue in a video. A person who is blind cannot see on-screen action or text overlays. Media alternatives and captions address these gaps by providing equivalent information in a different format: captions render speech and meaningful sounds as synchronized text, transcripts present the full content as a readable document, and audio descriptions narrate visual information that is not conveyed through the existing soundtrack.

These provisions are defined across several WCAG success criteria. At Level A, WCAG 1.2.1 requires alternatives for prerecorded audio-only and video-only media, WCAG 1.2.2 requires captions for prerecorded audio content in synchronized media, and WCAG 1.2.3 requires an audio description or a media alternative for prerecorded video in synchronized media. At Level AA, WCAG 1.2.4 requires captions for live audio in synchronized media, and WCAG 1.2.5 requires an audio description for prerecorded video.

Why media alternatives and captions matter

People who are deaf, hard of hearing, or in noisy environments rely on captions to follow spoken content. People who are blind or have low vision rely on audio descriptions to understand visual-only information like character actions, scene changes, or on-screen text. People with cognitive disabilities sometimes prefer reading a transcript at their own pace rather than processing audio in real time.

Without these alternatives, media content is inaccessible to a large portion of users. In the United States, roughly 15% of adults report some degree of hearing difficulty (National Institute on Deafness and Other Communication Disorders). Captions also benefit anyone watching video without sound, which is common on mobile devices and in public spaces.

From a compliance standpoint, failing to provide captions or media alternatives causes WCAG Level A and Level AA failures, which can create legal risk and exclude users from publicly available content.

How media alternatives and captions work

Captions

Captions are time-synchronized text that appears over or below a video. They include dialogue, speaker identification, and descriptions of meaningful non-speech audio such as music, laughter, or sound effects. In HTML, captions are delivered through the <track> element inside a <video> element.

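The <track> element references a WebVTT (.vtt) file that contains the timed text. Setting kind="captions" tells the browser the track contains captions. The srclang attribute identifies the language of the track text, the label attribute gives the track a user-visible name in the player's menu, and the default attribute enables the track automatically unless the user's preferences select a different one.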

Transcripts

A transcript is a text document that presents all the spoken content and relevant non-speech audio in a readable form. For audio-only content like a podcast, a transcript is the minimum requirement under WCAG 1.2.1. Transcripts are typically placed on the same page as the media or linked nearby.
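When a transcript is too long to sit comfortably on the page, linking to it directly beside the player is one option. The following is a minimal sketch, assuming a hypothetical transcript page at episode-12-transcript.html; the file names are placeholders for illustration.

```html
<!-- Sketch: audio player followed by a link to a separate transcript page (URL is assumed) -->
<audio controls>
  <source src="episode-12.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
<p>
  <a href="episode-12-transcript.html">Read the transcript of episode 12</a>
</p>
```

Keeping the link immediately after the player, with link text that names the episode, helps users find the transcript without hunting for it.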

Audio descriptions

Audio descriptions are a supplementary audio track that narrates visual information during natural pauses in dialogue. WCAG 1.2.3 (Level A) accepts either an audio description or a full text media alternative, while WCAG 1.2.5 (Level AA) requires an audio description. In HTML, a text description track can be added with <track kind="descriptions">, though browser support for rendering description tracks is limited. A common workaround is to provide a separate version of the video with descriptions mixed into the audio.

The `<track>` element

The <track> element accepts several values for its kind attribute:

captions for closed captions (includes non-speech audio cues)
subtitles for translations of dialogue only
descriptions for text descriptions of visual content, intended to be read aloud
chapters for navigation within the media
metadata for machine-readable data

Each <track> must have a src attribute pointing to a valid WebVTT file and a srclang attribute when kind is subtitles.
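To illustrate how these kind values combine, here is a minimal sketch of one video offering English captions alongside Spanish subtitles. The file names (lecture.mp4, lecture-captions.en.vtt, lecture-subtitles.es.vtt) are hypothetical placeholders, not files referenced elsewhere in this entry.

```html
<!-- Sketch: captions and translated subtitles on one video (file names are assumed) -->
<video controls>
  <source src="lecture.mp4" type="video/mp4">
  <!-- Captions: dialogue plus non-speech audio, in the language of the soundtrack -->
  <track kind="captions" src="lecture-captions.en.vtt" srclang="en" label="English captions" default>
  <!-- Subtitles: translated dialogue; srclang is required for this kind -->
  <track kind="subtitles" src="lecture-subtitles.es.vtt" srclang="es" label="Español">
  Your browser does not support the video element.
</video>
```

Giving each track a distinct label helps users tell the options apart in the player's track menu.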

Code examples

A video with spoken audio and no captions, whether as a track or burned into the picture, fails WCAG 1.2.2:

<!-- Bad: no captions provided -->
<video controls>
  <source src="interview.mp4" type="video/mp4">
  Your browser does not support the video element.
</video>

Adding a <track> element with kind="captions" fixes the issue:

<!-- Good: captions provided via a WebVTT file -->
<video controls>
  <source src="interview.mp4" type="video/mp4">
  <track kind="captions" src="interview-captions.vtt" srclang="en" label="English captions" default>
  Your browser does not support the video element.
</video>

For audio-only content, a transcript satisfies WCAG 1.2.1. Placing the transcript on the same page next to the player makes it easy to find:

<!-- Good: audio with an on-page transcript -->
<h2>Episode 12: Accessible design patterns</h2>
<audio controls>
  <source src="episode-12.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
<details>
  <summary>Read transcript</summary>
  <p>[Host] Welcome to episode 12. Today we discuss accessible design patterns...</p>
  <p>[Guest] Thanks for having me. The first pattern I want to cover is...</p>
</details>

A video with both captions and an audio description track:

<!-- Good: captions and descriptions -->
<video controls>
  <source src="demo.mp4" type="video/mp4">
  <track kind="captions" src="demo-captions.vtt" srclang="en" label="English captions" default>
  <track kind="descriptions" src="demo-descriptions.vtt" srclang="en" label="Audio descriptions">
  Your browser does not support the video element.
</video>

Because browser support for the descriptions track kind is inconsistent, a practical alternative is to link to a separate video version that has audio descriptions baked into the soundtrack:

<!-- Good: link to described version as a fallback -->
<p>
  <a href="demo-described.mp4">Watch version with audio descriptions</a>
</p>

A well-formatted WebVTT caption file looks like this:

WEBVTT

00:00:01.000 --> 00:00:04.500
[Host] Welcome to the show.

00:00:05.000 --> 00:00:08.200
[Guest] Thanks, happy to be here.

00:00:09.000 --> 00:00:11.500
[upbeat music playing]

Each cue has a start time, end time, and text content. Speaker identification and descriptions of non-speech audio are wrapped in square brackets by convention.
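The cue structure described above can also be inspected from script, which can help when checking that a caption file loaded as expected. The following is a rough sketch, not a required part of any caption setup: it reuses the interview.mp4 and interview-captions.vtt names from the earlier example, and the id values are assumed names chosen for illustration.

```html
<!-- Sketch: log each caption cue once the track has loaded (id values are assumed names) -->
<video id="show-video" controls>
  <source src="interview.mp4" type="video/mp4">
  <track id="show-captions" kind="captions" src="interview-captions.vtt" srclang="en" label="English captions" default>
</video>
<script>
  const trackElement = document.getElementById("show-captions");

  function logCues() {
    const cues = trackElement.track.cues; // null while the track is disabled
    if (!cues) return;
    for (let i = 0; i < cues.length; i++) {
      // startTime and endTime are in seconds; text holds the cue's text content
      console.log(cues[i].startTime, cues[i].endTime, cues[i].text);
    }
  }

  // readyState 2 means the track file has already finished loading
  if (trackElement.readyState === 2) {
    logCues();
  } else {
    trackElement.addEventListener("load", logCues);
  }
</script>
```

The default attribute matters here: browsers generally fetch a track's cues only once the track is enabled, so a disabled track would report no cues.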

Providing captions, transcripts, and audio descriptions where needed ensures that media content is perceivable regardless of a user's abilities or environment. These are not optional extras; they are baseline requirements for accessible web content under WCAG.

Related terms

Text alternatives are textual substitutes for non-text content such as images, icons, videos, and controls, enabling people who cannot perceive the original content to understand its meaning and purpose through assistive technologies like screen readers.

WCAG Success Criteria are the individual, testable requirements defined by the Web Content Accessibility Guidelines that determine whether web content is accessible to people with disabilities. Each criterion addresses a specific accessibility barrier and is assigned a conformance level of A, AA, or AAA.

WCAG (Web Content Accessibility Guidelines) is the global technical standard for making web content accessible to people with disabilities, published by the W3C Web Accessibility Initiative.

Alt text (alternative text) is a short written description added to an image's HTML code that conveys the image's content and function to users who cannot see it, primarily through screen readers.

An accessible description is a supplementary text string, computed by the browser's accessibility API, that provides additional context or instructions about a user interface element beyond what its accessible name conveys.

Embedded content and iframes accessibility refers to the set of practices that make content loaded inside <iframe>, <video>, <audio>, <object>, and similar elements perceivable, operable, and understandable for all users, including those who rely on assistive technologies.
