About This Accessibility Rule
Captions are the primary way deaf and hard-of-hearing users access the audio portion of video content. When a video lacks captions, these users can see the visual content but miss everything communicated through sound — including spoken dialogue, narration, music, ambient sounds, and sound effects that provide context or meaning. This creates a critical barrier to understanding.
This rule relates to WCAG 2.0/2.1/2.2 Success Criterion 1.2.2: Captions (Prerecorded) at Level A, which requires captions for all prerecorded audio content in synchronized media, except when the media is itself a media alternative for text and is clearly labeled as such. Captions are also required under Section 508 and EN 301 549. Because this is a Level A requirement, it represents the minimum baseline for accessibility; missing captions are among the most serious accessibility failures a video can have.
Captions vs. Subtitles
It’s important to understand the difference between captions and subtitles, as they serve different purposes:
- Captions (kind="captions") are designed for deaf and hard-of-hearing users. They include all dialogue plus descriptions of meaningful non-speech audio such as sound effects, music, speaker identification, and other auditory cues (e.g., “[door slams]”, “[dramatic orchestral music]”, “[audience applause]”).
- Subtitles (kind="subtitles") are language translations of dialogue and narration, intended for hearing users who don’t understand the spoken language. Subtitles generally do not include non-speech audio descriptions.
For accessibility compliance, you must use kind="captions", not kind="subtitles".
What Makes Good Captions
Good captions go beyond transcribing dialogue. They should:
- Identify who is speaking when it’s not visually obvious
- Include meaningful sound effects (e.g., “[phone ringing]”, “[glass shattering]”)
- Describe music when it’s relevant (e.g., “[soft piano music]”, “[upbeat pop song playing]”)
- Note significant silence or pauses when they carry meaning
- Be accurately synchronized with the audio
- Use proper spelling, grammar, and punctuation
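The first three items in this list can be spot-checked mechanically. The following is a minimal sketch, assuming cue texts have already been extracted as plain strings; the `CaptionFeatures` type and `summarizeCaptionFeatures` function are hypothetical names, and the patterns are rough heuristics (bracketed descriptions, "Name: " prefixes) rather than any standard. A count of zero is a hint that a file may contain dialogue only, not proof of poor captions.

```typescript
// Hypothetical summary of caption features found in a list of cue texts.
interface CaptionFeatures {
  soundDescriptions: number; // cues containing text such as "[phone ringing]"
  speakerLabels: number;     // cues beginning with a label such as "Sarah: "
}

// Counts cues that contain a bracketed description or start with a speaker label.
// Both patterns are heuristics chosen for illustration.
function summarizeCaptionFeatures(cueTexts: string[]): CaptionFeatures {
  let soundDescriptions = 0;
  let speakerLabels = 0;
  for (const text of cueTexts) {
    if (/\[[^\]]+\]/.test(text)) soundDescriptions++;
    if (/^[A-Z][\w .'-]{0,30}:\s/.test(text)) speakerLabels++;
  }
  return { soundDescriptions, speakerLabels };
}
```

Human review is still needed to judge whether the descriptions are accurate and whether every meaningful sound is covered.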
How to Fix the Problem
Add a <track> element inside your <video> element with the following attributes:
- src: the URL of the caption file (typically in WebVTT .vtt format)
- kind: set to "captions"
- srclang: the language code of the captions (e.g., "en" for English)
- label: a human-readable label for the track (e.g., "English")
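Only src is required by the HTML specification for a captions track, but kind must still be set explicitly: when kind is omitted, the track defaults to subtitles and does not count as captions. srclang and label are strongly recommended so that browsers and assistive technologies can identify the track and present it correctly in the captions menu.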
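The check described above can be approximated in a few lines. This is a rough sketch of the idea, not axe-core's actual implementation: `TrackInfo` is a hypothetical plain-data description of a <track> element's attributes, and `hasCaptionsTrack` is an illustrative name.

```typescript
// Hypothetical plain-data description of a <track> element's attributes.
interface TrackInfo {
  src: string;
  kind?: string;
  srclang?: string;
  label?: string;
}

// Returns true when at least one track is marked kind="captions".
// An omitted kind is treated as "subtitles", the HTML default for <track>.
// The comparison ignores case, as HTML does for this attribute.
function hasCaptionsTrack(tracks: TrackInfo[]): boolean {
  return tracks.some(
    (track) => (track.kind ?? "subtitles").toLowerCase() === "captions"
  );
}
```

A video with no tracks, or with only subtitle tracks, returns false under this sketch, which matches the two incorrect cases discussed in this guide.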
Examples
Incorrect: Video with no captions
<video width="640" height="360" controls>
  <source src="presentation.mp4" type="video/mp4">
</video>
This video has no <track> element, so deaf and hard-of-hearing users cannot access any of the audio content.
Incorrect: Using subtitles instead of captions
<video width="640" height="360" controls>
  <source src="presentation.mp4" type="video/mp4">
  <track src="subs_en.vtt" kind="subtitles" srclang="en" label="English">
</video>
While this provides a text track, kind="subtitles" does not satisfy the captions requirement. Subtitles typically include only dialogue and won’t convey non-speech audio information.
Correct: Video with captions
<video width="640" height="360" controls>
  <source src="presentation.mp4" type="video/mp4">
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English">
</video>
Correct: Video with captions in multiple languages
<video width="640" height="360" controls>
  <source src="presentation.mp4" type="video/mp4">
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English" default>
  <track src="captions_es.vtt" kind="captions" srclang="es" label="Español">
</video>
The default attribute marks the track the browser should enable unless the user's preferences indicate that another track is more appropriate. At most one captions or subtitles track per video should carry it.
Example WebVTT Caption File
A basic captions_en.vtt file looks like this:
WEBVTT

00:00:01.000 --> 00:00:04.000
[upbeat music playing]

00:00:05.000 --> 00:00:08.000
Sarah: Welcome to our annual conference!

00:00:09.000 --> 00:00:12.000
[audience applause]

00:00:13.000 --> 00:00:17.000
Sarah: Today we'll explore three key topics.
Notice how the captions identify the speaker, describe non-speech sounds, and are synchronized to specific timestamps.
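Timing errors are easy to introduce when a caption file is edited by hand. The sketch below shows one way to parse cue timings and flag basic problems. It is a simplified illustration, not a full WebVTT parser: it assumes cues are separated by blank lines, ignores cue settings and styling, and all names (`Cue`, `parseTimestamp`, `parseCues`, `findTimingProblems`) are hypothetical.

```typescript
interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Converts "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds; returns NaN if malformed.
function parseTimestamp(ts: string): number {
  const m = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(ts.trim());
  if (!m) return NaN;
  const hours = Number(m[1] ?? 0);
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Splits the file into blank-line-separated blocks and keeps those with a timing line.
function parseCues(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    let timingIndex = -1;
    for (let i = 0; i < lines.length; i++) {
      if ((lines[i] ?? "").indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) continue; // header or other non-cue block
    const parts = (lines[timingIndex] ?? "").split("-->");
    const startRaw = parts[0] ?? "";
    // Anything after the end timestamp on the timing line is ignored in this sketch.
    const endRaw = (parts[1] ?? "").trim().split(/\s+/)[0] ?? "";
    cues.push({
      start: parseTimestamp(startRaw),
      end: parseTimestamp(endRaw),
      text: lines.slice(timingIndex + 1).join("\n").trim(),
    });
  }
  return cues;
}

// Reports malformed timestamps, cues that end before they start,
// and cues that start earlier than a preceding cue.
function findTimingProblems(cues: Cue[]): string[] {
  const problems: string[] = [];
  let previousStart = 0;
  cues.forEach((cue, index) => {
    const n = index + 1;
    if (isNaN(cue.start) || isNaN(cue.end)) {
      problems.push(`cue ${n}: malformed timestamp`);
      return;
    }
    if (cue.end <= cue.start) problems.push(`cue ${n}: end is not after start`);
    if (cue.start < previousStart) problems.push(`cue ${n}: starts before an earlier cue`);
    previousStart = Math.max(previousStart, cue.start);
  });
  return problems;
}
```

Passing the contents of a caption file to `parseCues` and then to `findTimingProblems` returns an empty list when no timing issues are detected. A check like this confirms only that the timings are well-formed; whether the captions actually match the audio still has to be verified by watching the video.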