# Character reference expands to a control character (U+0002).

> Canonical HTML version: https://rocketvalidator.com/html-validation/character-reference-expands-to-a-control-character-u-0002
> Attribution: Rocket Validator (https://rocketvalidator.com)
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

## What Are Control Characters?

Control characters occupy code points U+0000 through U+001F and U+007F through U+009F in Unicode. They were originally designed to control hardware devices (for example, U+0002 is "Start of Text," U+0007 is "Bell," and U+001B is "Escape"). Apart from a few whitespace characters such as tab and line feed, they have no visual representation and carry no semantic meaning in a web document.

The HTML specification explicitly forbids character references that resolve to most control characters. Although `&#2;` is a syntactically valid character reference, the character it points to is not permitted as content. The W3C validator raises this error for references such as `&#1;`, `&#2;`, `&#8;`, and `&#11;` that fall within the control character ranges. A few neighbouring cases, such as `&#0;`, are also rejected but are reported with their own messages.
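To make the rule concrete, here is a minimal Python sketch that decodes a numeric character reference and checks whether it lands in the control ranges described above. The function names are hypothetical, and the set of permitted whitespace (tab, line feed, form feed) is an assumption based on the HTML parsing rules. The validator's exact message for a given code point may differ.

```python
import re

# Assumption: tab, line feed, and form feed count as permitted whitespace.
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0C}

# Matches decimal (&#2;) and hexadecimal (&#x02;) numeric references.
NUMERIC_REFERENCE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def reference_code_point(reference: str) -> int:
    """Return the code point that a numeric character reference expands to."""
    match = NUMERIC_REFERENCE.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a numeric character reference: {reference!r}")
    hex_digits, decimal_digits = match.groups()
    return int(hex_digits, 16) if hex_digits else int(decimal_digits)


def is_control_code_point(code_point: int) -> bool:
    """True for U+0000-U+001F and U+007F-U+009F, minus the allowed whitespace."""
    in_control_range = code_point <= 0x1F or 0x7F <= code_point <= 0x9F
    return in_control_range and code_point not in ALLOWED_WHITESPACE


print(is_control_code_point(reference_code_point("&#2;")))    # True
print(is_control_code_point(reference_code_point("&#169;")))  # False
```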

## Why This Is a Problem

- **Standards compliance:** The [WHATWG HTML Living Standard](https://html.spec.whatwg.org/multipage/syntax.html#character-references) defines a specific set of "noncharacter" and "control character" code points that must not be referenced. Using them produces a parse error.
- **Unpredictable rendering:** Control characters have no defined glyph, so what the user sees depends on the browser and font. The character may be silently discarded, rendered as a replacement character (�), or drawn as an empty box.
- **Accessibility:** Screen readers and other assistive technologies may choke on or misinterpret control characters, degrading the experience for users who rely on these tools.
- **Data integrity:** Control characters in your markup often indicate a copy-paste error, a corrupted data source, or a templating bug that inserts raw binary data into HTML output.

## How to Fix It

1. **Identify the offending reference** — look for character references like `&#2;`, `&#x02;`, `&#0;`, `&#127;`, or similar that point to control character code points.
2. **Determine intent** — figure out what character or content was actually intended. Often, a control character reference is the result of a bug in a data pipeline or template engine.
3. **Remove or replace** — either delete the reference entirely or replace it with the correct printable character or HTML entity.
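The first step can be partly automated. The following Python sketch scans markup for numeric character references that expand to control characters and reports the line each one appears on. The function name `find_control_references` is hypothetical, the whitespace exceptions are an assumption, and the regular expression only matches references that end with a semicolon. It is a quick search aid, not a substitute for the validator.

```python
import re

# Matches decimal (&#2;) and hexadecimal (&#x02;) numeric references.
NUMERIC_REFERENCE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Assumption: tab, line feed, and form feed count as permitted whitespace.
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0C}


def find_control_references(markup: str) -> list[tuple[int, str, int]]:
    """Return (line number, reference text, code point) for each suspect reference."""
    findings = []
    for line_number, line in enumerate(markup.split("\n"), start=1):
        for match in NUMERIC_REFERENCE.finditer(line):
            hex_digits, decimal_digits = match.groups()
            code_point = int(hex_digits, 16) if hex_digits else int(decimal_digits)
            is_control = code_point <= 0x1F or 0x7F <= code_point <= 0x9F
            if is_control and code_point not in ALLOWED_WHITESPACE:
                findings.append((line_number, match.group(0), code_point))
    return findings


sample = (
    "<p>Some text &#2; more text</p>\n"
    "<p>Data: &#x02;</p>\n"
    "<p>Copyright &#169; 2024</p>"
)
for line_number, reference, code_point in find_control_references(sample):
    print(f"line {line_number}: {reference} expands to U+{code_point:04X}")
# line 1: &#2; expands to U+0002
# line 2: &#x02; expands to U+0002
```

Once the offending references are located, steps 2 and 3 still need a human decision about what the content was meant to be.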

## Examples

### Incorrect: Control character reference

This markup contains `&#2;`, which expands to the control character U+0002 (Start of Text) and triggers the validation error:

```html
<p>Some text &#2; more text</p>
```

### Incorrect: Hexadecimal form of a control character

The same problem occurs with the hexadecimal syntax:

```html
<p>Data: &#x02;</p>
```

### Correct: Remove the control character reference

If the control character was unintentional, simply remove it:

```html
<p>Some text more text</p>
```

### Correct: Use a valid character reference instead

If you intended to display a special character, use the correct printable code point or named entity. For example, to display a bullet (•), copyright sign (©), or ampersand (&):

```html
<p>Item &#8226; Details</p>
<p>Copyright &#169; 2024</p>
<p>Tom &amp; Jerry</p>
```

### Correct: Full document without control characters

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example Page</title>
</head>
<body>
  <p>This paragraph uses only valid character references: &amp; &lt; &gt; &#169;</p>
</body>
</html>
```

### Common Control Character Code Points to Avoid

| Reference | Code Point | Name |
|-----------|-----------|------|
| `&#0;` | U+0000 | Null |
| `&#1;` | U+0001 | Start of Heading |
| `&#2;` | U+0002 | Start of Text |
| `&#7;` | U+0007 | Bell |
| `&#8;` | U+0008 | Backspace |
| `&#11;` | U+000B | Vertical Tab |
| `&#12;` | U+000C | Form Feed |
| `&#127;` | U+007F | Delete |

If your content is generated dynamically (from a database, API, or user input), sanitize the data by stripping out control characters before inserting it into HTML. Most server-side languages and templating engines provide utilities for this purpose.
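As one illustration of such sanitizing, the Python sketch below removes raw control characters from a string and then escapes it for HTML with the standard library's `html.escape`. The function name `sanitize_for_html` is hypothetical, and keeping tab, line feed, and carriage return is an assumption. Adjust the character class if your content needs different rules.

```python
import html
import re

# Raw control characters in U+0000-U+001F and U+007F-U+009F,
# keeping tab (\x09), line feed (\x0a), and carriage return (\x0d).
CONTROL_CHARACTERS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def sanitize_for_html(value: str) -> str:
    """Drop control characters, then escape HTML-significant characters."""
    cleaned = CONTROL_CHARACTERS.sub("", value)
    return html.escape(cleaned)


print(sanitize_for_html("Tom & Jerry\x02"))  # Tom &amp; Jerry
```

Stripping before escaping means a stray U+0002 in the source data never reaches the serializer, so it cannot be emitted as `&#2;` later in the pipeline.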
