About This HTML Issue
The HTML specification explicitly forbids certain Unicode code points from appearing anywhere in an HTML document. These include most ASCII control characters (such as U+0000 NULL, U+0008 BACKSPACE, or U+000B VERTICAL TAB), as well as Unicode noncharacters like U+FFFE, U+FFFF, and the range U+FDD0 to U+FDEF. When the W3C validator encounters one of these code points, it reports the error “Forbidden code point” followed by the specific value.
These characters are forbidden because they have no defined meaning in HTML and can cause unpredictable behavior across browsers and platforms. Some may be silently dropped, others may produce rendering glitches, and some could interfere with parsing. Screen readers and other assistive technologies may also behave erratically when encountering these characters, making this an accessibility concern as well.
How forbidden characters get into your code
- Copy-pasting from external sources like word processors, PDFs, or databases that embed invisible control characters.
- Faulty text editors or build tools that introduce stray bytes during file processing.
- Incorrect character encoding where byte sequences are misinterpreted, resulting in forbidden code points.
- Programmatic content generation where strings aren’t properly sanitized before being inserted into HTML.
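The encoding case above can be reproduced in a few lines. The following is a minimal sketch, assuming Node.js (for `Buffer`) and an illustrative sample string. The names `original`, `misread`, `hasC1Control`, and `toHex` are hypothetical. It shows how UTF-8 bytes decoded as Latin-1 turn curly quotes into C1 control characters (U+0080 to U+009F), which the HTML Standard classes as control characters.

```javascript
// Sketch (Node.js): a wrong decoding choice can introduce forbidden code points.
const original = '\u201CHi\u201D';            // "Hi" wrapped in curly quotes
const bytes = Buffer.from(original, 'utf8');  // bytes: E2 80 9C 48 69 E2 80 9D

// Node's 'latin1' decoding maps every byte to the code point of the same value,
// so the continuation bytes 0x80, 0x9C, and 0x9D become C1 control characters.
const misread = bytes.toString('latin1');

// Decoding with the encoding the bytes were written in restores the text.
const correct = bytes.toString('utf8');

// True if the text contains any C1 control (U+0080 to U+009F).
function hasC1Control(text) {
  return /[\u0080-\u009F]/.test(text);
}

// Formats a single character as a U+XXXX label.
function toHex(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

const found = (misread.match(/[\u0080-\u009F]/g) || []).map(toHex);
console.log(found);                  // [ 'U+0080', 'U+009C', 'U+0080', 'U+009D' ]
console.log(hasC1Control(correct));  // false
```

If a page shows this pattern, the likely fix is to read the source with its actual encoding rather than to strip characters afterwards.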
How to fix it
- Identify the character and its location. The validator message includes the code point (e.g., U+000B) and the line number. Use a text editor that can show invisible characters (such as VS Code with the “Render Whitespace” or “Render Control Characters” setting enabled, or a hex editor).
- Remove or replace the character. In most cases, the forbidden character serves no purpose and can simply be deleted. If it was standing in for a space or line break, replace it with the appropriate standard character.
- Sanitize content at the source. If your HTML is generated dynamically, strip forbidden code points from strings before outputting them. In JavaScript, you can use a regular expression to remove them.
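// Remove common forbidden code points: control characters other than tab, LF, FF, and CR
// (including DELETE and the C1 range U+0080 to U+009F) and noncharacters up to U+FFFF.
text = text.replace(/[\x00-\x08\x0B\x0E-\x1F\x7F-\x9F\uFDD0-\uFDEF\uFFFE\uFFFF]/g, '');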
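The identify and sanitize steps can also be sketched as small helpers. This is an illustrative sketch, not the validator's own logic. The function names are hypothetical, and the line and column it reports are counted in code points from 1, which may not match the validator's numbering exactly. Unlike a plain character-class regex, it also covers noncharacters above U+FFFF (such as U+1FFFE).

```javascript
// Returns true for code points this sketch treats as forbidden:
// control characters other than tab, LF, FF, and CR, plus Unicode noncharacters.
function isForbiddenCodePoint(cp) {
  if (cp === 0x09 || cp === 0x0A || cp === 0x0C || cp === 0x0D) return false; // allowed whitespace
  if (cp <= 0x1F) return true;                    // remaining C0 controls, including NULL
  if (cp >= 0x7F && cp <= 0x9F) return true;      // DELETE and C1 controls
  if (cp >= 0xFDD0 && cp <= 0xFDEF) return true;  // noncharacter block
  return (cp & 0xFFFE) === 0xFFFE;                // U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF
}

// Reports each forbidden code point with a 1-based line and column.
function findForbiddenCodePoints(text) {
  const hits = [];
  let line = 1;
  let column = 1;
  for (const ch of text) { // iterates by code point, not by UTF-16 code unit
    const cp = ch.codePointAt(0);
    if (isForbiddenCodePoint(cp)) {
      const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
      hits.push({ codePoint: label, line, column });
    }
    if (ch === '\n') {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return hits;
}

// Removes every forbidden code point and leaves all other text untouched.
function stripForbiddenCodePoints(text) {
  return Array.from(text)
    .filter((ch) => !isForbiddenCodePoint(ch.codePointAt(0)))
    .join('');
}

const sample = '<p>Hello\u000BWorld</p>\n<div title="Some\u0000Text">';
console.log(findForbiddenCodePoints(sample));
// [ { codePoint: 'U+000B', line: 1, column: 9 },
//   { codePoint: 'U+0000', line: 2, column: 17 } ]
```

Keeping the check in one predicate means the finder and the stripper cannot drift apart. Lone surrogates are not handled here and would need a separate check.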
Examples
Incorrect — contains a forbidden control character
In this example, a vertical tab character (U+000B) is embedded between “Hello” and “World.” It is invisible in most editors, but the validator will flag it.
<!-- The ␋ below represents U+000B VERTICAL TAB, an invisible forbidden character -->
<p>Hello␋World</p>
Correct — forbidden character removed
<p>Hello World</p>
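When the forbidden character was standing in for a space, as in the example above, replacing it is safer than deleting it, since deletion would fuse the two words. A minimal sketch of that choice, using a hypothetical helper name and assuming a space is the intended substitute by default:

```javascript
// Hypothetical helper: swaps each vertical tab (U+000B) for a replacement string.
// The default replacement is a single space; pass '\n' if it stood for a line break.
function replaceVerticalTabs(text, replacement = ' ') {
  // A function callback avoids special handling of "$" patterns in the replacement.
  return text.replace(/\u000B/g, () => replacement);
}

console.log(replaceVerticalTabs('<p>Hello\u000BWorld</p>')); // <p>Hello World</p>
```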
Incorrect — NULL character in an attribute value
A U+0000 NULL character may appear inside an attribute, often from programmatic output.
<!-- The ␀ below represents U+0000 NULL, an invisible forbidden character -->
<div title="Some␀Text">Content</div>
Correct — NULL character removed from attribute
<div title="SomeText">Content</div>
Allowed control characters
Not all control characters are forbidden. The following are explicitly permitted in HTML:
- U+0009: Horizontal tab (regular tab character)
- U+000A: Line feed (newline)
- U+000C: Form feed
- U+000D: Carriage return
<pre>Line one
Line two with a tab</pre>
This is valid because it uses only standard whitespace characters (U+000A for the newline and U+0009 for the tab).