# Media Alternatives and Captions

> Canonical HTML version: https://rocketvalidator.com/glossary/media-alternatives-captions
> Attribution: Rocket Validator (https://rocketvalidator.com)
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

Media alternatives and captions are accessibility provisions that make audio and video content perceivable to people who cannot hear the audio track or see the video track, typically through captions, transcripts, and audio descriptions.

When audio or video content appears on a web page, not everyone can perceive it the same way. A person who is deaf or hard of hearing cannot access spoken dialogue in a video. A person who is blind cannot see on-screen action or text overlays. Media alternatives and captions address these gaps by providing equivalent information in a different format: captions render speech and meaningful sounds as synchronized text, transcripts present the full content as a readable document, and audio descriptions narrate visual information that is not conveyed through the existing soundtrack.

These provisions are defined across several WCAG success criteria. WCAG 1.2.1 (Level A) requires an alternative for prerecorded audio-only and video-only media. WCAG 1.2.2 (Level A) requires captions for prerecorded audio content in synchronized media. WCAG 1.2.3 (Level A) requires audio description or a media alternative for prerecorded video. At Level AA, WCAG 1.2.4 requires captions for live audio in synchronized media, and WCAG 1.2.5 requires audio description for prerecorded video.

## Why media alternatives and captions matter

People who are deaf, hard of hearing, or in noisy environments rely on captions to follow spoken content. People who are blind or have low vision rely on audio descriptions to understand visual-only information like character actions, scene changes, or on-screen text. People with cognitive disabilities sometimes prefer reading a transcript at their own pace rather than processing audio in real time.

Without these alternatives, media content is inaccessible to a large portion of users. In the United States, roughly 15% of adults report some degree of hearing difficulty (National Institute on Deafness and Other Communication Disorders). Captions also benefit anyone watching video without sound, which is common on mobile devices and in public spaces.

From a compliance standpoint, failing to provide captions or media alternatives causes WCAG Level A and Level AA failures, which can create legal risk and exclude users from publicly available content.

## How media alternatives and captions work

### Captions

Captions are time-synchronized text that appears over or below a video. They include dialogue, speaker identification, and descriptions of meaningful non-speech audio such as music, laughter, or sound effects. In HTML, captions are delivered through the `<track>` element inside a `<video>` element.

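The `<track>` element references a WebVTT (.vtt) file that contains the timed text. Setting `kind="captions"` tells the browser the track contains captions. The `srclang` attribute identifies the language of the track text, the `label` attribute gives the track a human-readable name in the player's menu, and the `default` attribute enables the track automatically unless the user's preferences indicate that another track is more appropriate.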

### Transcripts

A transcript is a text document that presents all the spoken content and relevant non-speech audio in a readable form. For audio-only content like a podcast, a transcript is the minimum requirement under WCAG 1.2.1. Transcripts are typically placed on the same page as the media or linked nearby.

### Audio descriptions

An audio description is supplementary narration that conveys visual information during natural pauses in dialogue. At WCAG Level A (1.2.3), either audio description or a full text media alternative is acceptable; at Level AA (1.2.5), audio description itself is required. In HTML, a description track can be added with `<track kind="descriptions">`, which supplies text descriptions intended to be rendered as speech, though browser support for rendering description tracks is limited. A common workaround is to provide a separate version of the video with descriptions mixed into the audio.

### The `<track>` element

The `<track>` element accepts several values for its `kind` attribute:

- `captions` for closed captions (includes non-speech audio cues)
- `subtitles` for translations of dialogue only
- `descriptions` for text descriptions of visual content, intended to be rendered as speech
- `chapters` for navigation within the media
- `metadata` for machine-readable data

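Each `<track>` must have a `src` attribute pointing to a valid WebVTT file. A `srclang` attribute is also required when `kind` is `subtitles`, and because `kind` defaults to `subtitles` when omitted, it is safest to set both attributes explicitly.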
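
As a hedged illustration of how several `kind` values can sit side by side, the sketch below attaches captions, translated subtitles, and a chapter track to one video. The file names (`lecture.mp4`, `lecture-captions.en.vtt`, and so on) and the labels are hypothetical placeholders chosen for this example.

```html
<!-- Sketch: one video with three text tracks; all file names are hypothetical -->
<video controls>
  <source src="lecture.mp4" type="video/mp4">
  <!-- Captions in the video's own language, including non-speech audio -->
  <track kind="captions" src="lecture-captions.en.vtt" srclang="en" label="English captions" default>
  <!-- Translated dialogue; srclang is required for subtitles -->
  <track kind="subtitles" src="lecture-subtitles.es.vtt" srclang="es" label="Español">
  <!-- Chapter titles used for navigation within the media -->
  <track kind="chapters" src="lecture-chapters.vtt" srclang="en" label="Chapters">
  Your browser does not support the video element.
</video>
```

Only one of the captions and subtitles tracks carries `default` here, since at most one such track per media element may be marked as the default. Giving each track a distinct `label` helps users tell them apart in the player's track menu.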

## Code examples

A video with an audio track but no captions fails WCAG 1.2.2:

```html
<!-- Bad: no captions provided -->
<video controls>
  <source src="interview.mp4" type="video/mp4">
  Your browser does not support the video element.
</video>
```

Adding a `<track>` element with `kind="captions"` fixes the issue:

```html
<!-- Good: captions provided via a WebVTT file -->
<video controls>
  <source src="interview.mp4" type="video/mp4">
  <track kind="captions" src="interview-captions.vtt" srclang="en" label="English captions" default>
  Your browser does not support the video element.
</video>
```

For audio-only content, a transcript satisfies WCAG 1.2.1. Placing the transcript on the same page next to the player makes it easy to find:

```html
<!-- Good: audio with an on-page transcript in a disclosure widget -->
<h2>Episode 12: Accessible design patterns</h2>
<audio controls>
  <source src="episode-12.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
<details>
  <summary>Read transcript</summary>
  <p>[Host] Welcome to episode 12. Today we discuss accessible design patterns...</p>
  <p>[Guest] Thanks for having me. The first pattern I want to cover is...</p>
</details>
```
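
When a transcript is too long to embed, it can instead live on its own page and be linked right beside the player. The following is a minimal sketch of that variant; the `episode-12-transcript.html` URL is a hypothetical placeholder.

```html
<!-- Sketch: audio with a transcript linked directly below the player -->
<h2>Episode 12: Accessible design patterns</h2>
<audio controls>
  <source src="episode-12.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
<p>
  <!-- Hypothetical URL; the link text names the episode so it makes sense out of context -->
  <a href="episode-12-transcript.html">Read the transcript of Episode 12</a>
</p>
```

Keeping the link immediately after the player, with link text that names the episode, makes the transcript easy to discover for people navigating by links or headings.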

A video with both captions and an audio description track:

```html
<!-- Good: captions and descriptions -->
<video controls>
  <source src="demo.mp4" type="video/mp4">
  <track kind="captions" src="demo-captions.vtt" srclang="en" label="English captions" default>
  <track kind="descriptions" src="demo-descriptions.vtt" srclang="en" label="Audio descriptions">
  Your browser does not support the video element.
</video>
```

Because browser support for the `descriptions` track kind is inconsistent, a practical alternative is to link to a separate video version that has audio descriptions baked into the soundtrack:

```html
<!-- Good: link to described version as a fallback -->
<p>
  <a href="demo-described.mp4">Watch version with audio descriptions</a>
</p>
```

A well-formatted WebVTT caption file looks like this:

```text
WEBVTT

00:00:01.000 --> 00:00:04.500
[Host] Welcome to the show.

00:00:05.000 --> 00:00:08.200
[Guest] Thanks, happy to be here.

00:00:09.000 --> 00:00:11.500
[upbeat music playing]
```

Each cue has a start time, end time, and text content. Speaker identification and non-speech audio descriptions are wrapped in square brackets by convention.

Providing captions, transcripts, and audio descriptions where needed ensures that media content is perceivable regardless of a user's abilities or environment. These are not optional extras; they are baseline requirements for accessible web content under WCAG.
