# <audio> elements must have a captions track

> Canonical HTML version: https://rocketvalidator.com/accessibility-validation/axe/4.11/audio-caption
> Attribution: Rocket Validator (https://rocketvalidator.com)
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

When an `<audio>` element lacks a captions track, all of the information it conveys — dialogue, narration, sound effects, musical cues, and speaker identification — becomes completely inaccessible to users who are deaf or deafblind. This is considered a **critical** accessibility issue because it blocks entire groups of users from accessing content.

This rule relates to **WCAG Success Criterion 1.2.1: Audio-only and Video-only (Prerecorded)** (Level A), which requires that a text alternative be provided for prerecorded audio-only content. It also falls under **Section 508** requirements and **EN 301 549**. Level A criteria represent the most fundamental accessibility requirements — failing to meet them means significant barriers exist for users with disabilities.

## Captions vs. Subtitles

Captions and subtitles are not the same thing:

- **Captions** (`kind="captions"`) are designed for deaf and hard-of-hearing users. They include dialogue, speaker identification, sound effects (e.g., "[door slams]"), musical cues (e.g., "[soft piano music]"), and other meaningful audio information.
- **Subtitles** (`kind="subtitles"`) are language translations intended for hearing users who don't understand the spoken language. They typically include only dialogue and narration.

Because of this distinction, you must use `kind="captions"`, not `kind="subtitles"`, to satisfy this rule.

## How to Fix It

1. Create a captions file (typically in [WebVTT `.vtt` format](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API)) that includes all meaningful audio information: who is speaking, what they say, and relevant non-speech sounds.
2. Add a `<track>` element inside your `<audio>` element.
3. Set the `kind` attribute to `"captions"`.
4. Set the `src` attribute to the path of your captions file.
5. Use the `srclang` attribute to specify the language of the captions.
6. Use the `label` attribute to give the track a human-readable name.

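Only `src` is required for a `<track>` element to be valid HTML, but `kind` defaults to `subtitles` when it is omitted, so this rule needs `kind="captions"` set explicitly. Including `srclang` and `label` is strongly recommended so that browsers and assistive technologies can identify the track's language and present it under a readable name.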
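
To find `<audio>` elements that still need this fix, the rule's core condition can be approximated with a short script. The following is a minimal sketch using Python's standard-library `html.parser`; it is not how axe-core implements the rule, and the names `AudioCaptionChecker` and `find_uncaptioned_audio` are hypothetical. It assumes every `<audio>` element has a closing tag, and it only inspects static markup, so tracks added by script at runtime are not seen.

```python
from html.parser import HTMLParser


class AudioCaptionChecker(HTMLParser):
    """Records <audio> elements that contain no <track kind="captions">."""

    def __init__(self):
        super().__init__()
        self.failures = []     # (line, column) of each failing <audio> start tag
        self._open_audio = []  # stack of [position, has_captions] for open <audio> tags

    def handle_starttag(self, tag, attrs):
        if tag == "audio":
            self._open_audio.append([self.getpos(), False])
        elif tag == "track" and self._open_audio:
            # Attribute values are None for valueless attributes, hence the `or ""`.
            kind = (dict(attrs).get("kind") or "").strip().lower()
            if kind == "captions":
                self._open_audio[-1][1] = True

    def handle_endtag(self, tag):
        if tag == "audio" and self._open_audio:
            position, has_captions = self._open_audio.pop()
            if not has_captions:
                self.failures.append(position)


def find_uncaptioned_audio(html_text):
    """Return the (line, column) position of every <audio> lacking a captions track."""
    checker = AudioCaptionChecker()
    checker.feed(html_text)
    checker.close()
    return checker.failures


# Illustrative markup: no track, a subtitles-only track, and a captions track.
SAMPLE = """\
<audio controls src="podcast.mp3">
</audio>
<audio controls src="podcast.mp3">
  <track src="subs_en.vtt" kind="subtitles" srclang="en" label="English">
</audio>
<audio controls src="podcast.mp3">
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English Captions">
</audio>
"""

for line, column in find_uncaptioned_audio(SAMPLE):
    print(f"line {line}, column {column}: <audio> has no captions track")
```

In this sample the first two elements are reported and the third is not. A script like this can only confirm that a captions track is declared; whether the captions file is accurate and complete still needs human review.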

## Examples

### Incorrect: `<audio>` with no captions track

```html
<audio controls>
  <source src="podcast.mp3" type="audio/mpeg">
</audio>
```

This fails the rule because there is no `<track>` element providing captions.

### Incorrect: `<track>` with wrong `kind` value

```html
<audio controls>
  <source src="podcast.mp3" type="audio/mpeg">
  <track src="subs_en.vtt" kind="subtitles" srclang="en" label="English">
</audio>
```

This fails because `kind="subtitles"` does not satisfy the captions requirement. Subtitles are not a substitute for captions.

### Correct: `<audio>` with a captions track

```html
<audio controls>
  <source src="podcast.mp3" type="audio/mpeg">
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English Captions">
</audio>
```

### Correct: `<audio>` with multiple caption tracks for different languages

```html
<audio controls>
  <source src="interview.mp3" type="audio/mpeg">
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English Captions">
  <track src="captions_es.vtt" kind="captions" srclang="es" label="Subtítulos en español">
</audio>
```

Providing captions in multiple languages lets each user pick the track that matches their language. The `srclang` attribute identifies the language of each track, and `label` is the name shown when tracks are listed for selection.

### Example WebVTT captions file

A basic `captions_en.vtt` file might look like this:

```vtt
WEBVTT

00:00:01.000 --> 00:00:04.000
[Upbeat intro music]

00:00:04.500 --> 00:00:07.000
Host: Welcome to the show, everyone.

00:00:07.500 --> 00:00:10.000
Guest: Thanks for having me!

00:00:10.500 --> 00:00:12.000
[Audience applause]
```

Notice how the captions include speaker identification (`Host:`, `Guest:`), non-speech sounds (`[Upbeat intro music]`, `[Audience applause]`), and the full dialogue. This level of detail is what makes captions effective for deaf and deafblind users.
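
As a rough way to review a captions file for this kind of detail, the cues can be sorted by the conventions used in the example. The sketch below is an assumption-laden illustration, not a WebVTT validator: the `Name:` prefix and the `[sound]` brackets are conventions of this particular file rather than requirements of the format, and `classify_cues` is a hypothetical helper name.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000" (hours are optional in WebVTT).
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)
SOUND = re.compile(r"^\[.+\]$")              # e.g. "[Audience applause]"
SPEAKER = re.compile(r"^[^:\[\]]+:\s+\S")    # e.g. "Host: Welcome to the show"


def classify_cues(vtt_text):
    """Return a (kind, text) pair per cue; kind is 'sound', 'speaker', or 'unlabelled'."""
    results = []
    # Blocks in a WebVTT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            if TIMING.match(line.strip()):
                # Everything after the timing line is the cue text.
                payload = " ".join(part.strip() for part in lines[index + 1:]).strip()
                break
        else:
            continue  # no timing line: the WEBVTT header or a NOTE/STYLE block
        if not payload:
            continue
        if SOUND.match(payload):
            results.append(("sound", payload))
        elif SPEAKER.match(payload):
            results.append(("speaker", payload))
        else:
            results.append(("unlabelled", payload))
    return results


SAMPLE_VTT = """\
WEBVTT

00:00:01.000 --> 00:00:04.000
[Upbeat intro music]

00:00:04.500 --> 00:00:07.000
Host: Welcome to the show, everyone.

00:00:07.500 --> 00:00:10.000
Guest: Thanks for having me!

00:00:10.500 --> 00:00:12.000
[Audience applause]
"""

for kind, text in classify_cues(SAMPLE_VTT):
    print(f"{kind:<10} {text}")
```

A file in which every cue comes back as `unlabelled` is not necessarily wrong, but it is a reasonable candidate for a manual check that speakers and non-speech sounds have not been left out.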
