When audio or video content appears on a web page, not everyone can perceive it the same way. A person who is deaf or hard of hearing cannot access spoken dialogue in a video. A person who is blind cannot see on-screen action or text overlays. Media alternatives and captions address these gaps by providing equivalent information in a different format: captions render speech and meaningful sounds as synchronized text, transcripts present the full content as a readable document, and audio descriptions narrate visual information that is not conveyed through the existing soundtrack.
These provisions are defined across several WCAG success criteria. WCAG 1.2.1 (Level A) requires alternatives for prerecorded audio-only and video-only media. WCAG 1.2.2 (Level A) requires captions for prerecorded audio content in synchronized media. WCAG 1.2.3 (Level A) requires an audio description or a media alternative for prerecorded video. At Level AA, WCAG 1.2.4 requires captions for live audio in synchronized media, and WCAG 1.2.5 requires an audio description for prerecorded video.
Why media alternatives and captions matter
People who are deaf, hard of hearing, or in noisy environments rely on captions to follow spoken content. People who are blind or have low vision rely on audio descriptions to understand visual-only information like character actions, scene changes, or on-screen text. People with cognitive disabilities sometimes prefer reading a transcript at their own pace rather than processing audio in real time.
Without these alternatives, media content is inaccessible to a large portion of users. In the United States, roughly 15% of adults report some degree of hearing difficulty (National Institute on Deafness and Other Communication Disorders). Captions also benefit anyone watching video without sound, which is common on mobile devices and in public spaces.
From a compliance standpoint, failing to provide captions or media alternatives causes WCAG Level A and Level AA failures, which can create legal risk and exclude users from publicly available content.
How media alternatives and captions work
Captions
Captions are time-synchronized text that appears over or below a video. They include dialogue, speaker identification, and descriptions of meaningful non-speech audio such as music, laughter, or sound effects. In HTML, captions are delivered through the <track> element inside a <video> element.
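The <track> element references a WebVTT (.vtt) file that contains the timed text. Setting kind="captions" tells the browser the track contains captions. The srclang attribute identifies the language of the track text, the label attribute gives the track a human-readable name in the player's menu, and the default attribute asks the browser to enable the track unless the user's preferences indicate that another track is more appropriate.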
Transcripts
A transcript is a text document that presents all the spoken content and relevant non-speech audio in a readable form. For audio-only content like a podcast, a transcript is the minimum requirement under WCAG 1.2.1. Transcripts are typically placed on the same page as the media or linked nearby.
Audio descriptions
Audio descriptions are supplementary narration that conveys visual information during natural pauses in dialogue. At WCAG Level A (1.2.3), prerecorded video needs either an audio description or a full media alternative such as a descriptive transcript; at Level AA (1.2.5), an audio description is required. In HTML, <track kind="descriptions"> attaches a text track of descriptions that is intended to be read aloud by speech synthesis, though browser support for rendering description tracks is limited. A common workaround is to provide a separate version of the video with descriptions mixed into the audio.
The <track> element
The <track> element accepts several values for its kind attribute:
captions: for closed captions (includes non-speech audio cues)
subtitles: for translations of dialogue only
descriptions: for audio descriptions of visual content
chapters: for navigation within the media
metadata: for machine-readable data
Each <track> must have a src attribute pointing to a valid WebVTT file and a srclang attribute when kind is subtitles.
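The attribute rules above can be checked mechanically. The following is a minimal Python sketch, not a full validator, that uses the standard library's html.parser to flag <track> elements that lack src, or that lack srclang when the kind is subtitles. The class name TrackAttributeChecker and the function check_tracks are hypothetical names chosen for illustration. The sketch assumes that a <track> with no kind attribute is treated as subtitles, which is the HTML default.

```python
from html.parser import HTMLParser


class TrackAttributeChecker(HTMLParser):
    """Collect attribute problems on <track> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "track":
            return
        attributes = dict(attrs)
        if "kind" in attributes:
            kind = (attributes["kind"] or "").lower()
        else:
            kind = "subtitles"  # HTML's default when kind is omitted
        # src is required on every track.
        if not attributes.get("src"):
            self.problems.append(f"track kind={kind!r} is missing src")
        # srclang is required when the track holds subtitles.
        if kind == "subtitles" and not attributes.get("srclang"):
            self.problems.append("subtitles track is missing srclang")


def check_tracks(html):
    """Return a list of human-readable problems found in the markup."""
    checker = TrackAttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Calling check_tracks on a page's markup returns an empty list when every track passes these two checks. It does not verify that the referenced file exists or is valid WebVTT.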
Code examples
A video element with no caption track fails WCAG 1.2.2:
<!-- Bad: no captions provided -->
<video controls>
<source src="interview.mp4" type="video/mp4">
Your browser does not support the video element.
</video>
Adding a <track> element with kind="captions" fixes the issue:
<!-- Good: captions provided via a WebVTT file -->
<video controls>
<source src="interview.mp4" type="video/mp4">
<track kind="captions" src="interview-captions.vtt" srclang="en" label="English captions" default>
Your browser does not support the video element.
</video>
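To find this problem across many pages, a script can look for <video> elements that contain no captions track. The sketch below is one possible approach in Python using the standard library's html.parser; CaptionTrackFinder and videos_without_captions are hypothetical names. It is a heuristic only: a video may carry open captions burned into the picture or captions supplied by a custom player, and the presence of a track says nothing about caption quality.

```python
from html.parser import HTMLParser


class CaptionTrackFinder(HTMLParser):
    """Record each <video> and whether it contains <track kind="captions">."""

    def __init__(self):
        super().__init__()
        self.videos = []   # one dict per <video> element found
        self._open = []    # stack of <video> elements not yet closed

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            # getpos() returns (line, column) of the tag being handled.
            video = {"line": self.getpos()[0], "has_captions": False}
            self.videos.append(video)
            self._open.append(video)
        elif tag == "track" and self._open:
            kind = (dict(attrs).get("kind") or "").lower()
            if kind == "captions":
                self._open[-1]["has_captions"] = True

    def handle_endtag(self, tag):
        if tag == "video" and self._open:
            self._open.pop()


def videos_without_captions(html):
    """Return the line numbers of <video> elements with no captions track."""
    finder = CaptionTrackFinder()
    finder.feed(html)
    finder.close()
    return [v["line"] for v in finder.videos if not v["has_captions"]]
```

Each reported line number is a candidate for manual review rather than a confirmed WCAG 1.2.2 failure.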
For audio-only content, a transcript satisfies WCAG 1.2.1. Placing the transcript on the same page next to the player makes it easy to find:
<!-- Good: audio with a transcript on the same page -->
<h2>Episode 12: Accessible design patterns</h2>
<audio controls>
<source src="episode-12.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>
<details>
<summary>Read transcript</summary>
<p>[Host] Welcome to episode 12. Today we discuss accessible design patterns...</p>
<p>[Guest] Thanks for having me. The first pattern I want to cover is...</p>
</details>
A video with both a captions track and a text descriptions track:
<!-- Good: captions and descriptions -->
<video controls>
<source src="demo.mp4" type="video/mp4">
<track kind="captions" src="demo-captions.vtt" srclang="en" label="English captions" default>
<track kind="descriptions" src="demo-descriptions.vtt" srclang="en" label="Audio descriptions">
Your browser does not support the video element.
</video>
Because browser support for the descriptions track kind is inconsistent, a practical alternative is to link to a separate video version that has audio descriptions baked into the soundtrack:
<!-- Good: link to described version as a fallback -->
<p>
<a href="demo-described.mp4">Watch version with audio descriptions</a>
</p>
A well-formatted WebVTT caption file looks like this:
WEBVTT

00:00:01.000 --> 00:00:04.500
[Host] Welcome to the show.

00:00:05.000 --> 00:00:08.200
[Guest] Thanks, happy to be here.

00:00:09.000 --> 00:00:11.500
[upbeat music playing]
Each cue has a start time, an end time, and text content, and a blank line separates the WEBVTT header and each cue from the next. Speaker identification and descriptions of non-speech sounds are wrapped in square brackets by convention.
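The cue structure can be read with a few lines of code, which is handy for sanity-checking a caption file before publishing it. The following Python sketch is a simplified reader under stated assumptions, not a spec-complete WebVTT parser: it expects timestamps in the hh:mm:ss.ttt or mm:ss.ttt form, skips any block without a timing line, and ignores cue settings after the end time. The names to_milliseconds and parse_cues are hypothetical.

```python
import re

# Hours are optional in WebVTT; minutes, seconds, and milliseconds are not.
TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$")


def to_milliseconds(timestamp):
    """Convert a WebVTT timestamp string to integer milliseconds."""
    match = TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_cues(vtt_text):
    """Return a list of (start_ms, end_ms, text) tuples, one per cue."""
    normalized = "\n".join(vtt_text.strip().splitlines())
    if not normalized.startswith("WEBVTT"):
        raise ValueError("a WebVTT file must start with WEBVTT")
    cues = []
    # Blocks are separated by one or more blank lines; the first is the header.
    for block in re.split(r"\n\s*\n", normalized)[1:]:
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # not a cue (for example a NOTE or STYLE block)
        start_text, rest = lines[timing].split("-->", 1)
        start = to_milliseconds(start_text.strip())
        end = to_milliseconds(rest.split()[0])  # drop any cue settings
        if end <= start:
            raise ValueError(f"cue must end after it starts: {lines[timing]!r}")
        cues.append((start, end, "\n".join(lines[timing + 1:])))
    return cues
```

Running parse_cues over a caption file yields the cues in file order, so overlapping or out-of-order timings can be spotted by comparing neighbouring tuples.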
Providing captions, transcripts, and audio descriptions where needed ensures that media content is perceivable regardless of a user's abilities or environment. These are not optional extras; they are baseline requirements for accessible web content under WCAG.