HTML Guides for stream
Learn how to identify and fix common HTML validation errors flagged by the W3C Validator, so your pages are standards-compliant and render correctly across every browser.
U+0000, commonly called the "null character" or NUL, is a non-printable control character. It's invisible in most text editors and browsers, so you won't see it just by looking at your code. However, the HTML parsing algorithm defined by the WHATWG HTML Living Standard treats U+0000 as a parse error in virtually every context — whether it appears in text content, attribute values, comments, or anywhere else in the document.
When a browser encounters a null character, it typically either ignores it or replaces it with the Unicode replacement character (U+FFFD, �). This means the character serves no useful purpose and can only cause problems.
Why Does This Happen?
Null characters usually sneak into HTML files through one of these common scenarios:
- Copy and paste from applications like word processors, PDFs, or terminal output that embed hidden control characters.
- File encoding corruption, such as when a file is converted between encodings incorrectly or transferred in binary mode.
- Editor or build tool bugs that inadvertently insert null bytes into output files.
- Database content that contains null characters which then gets rendered into HTML templates.
- Concatenation of binary data with text during a build process.
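For the database scenario in particular, one option is to strip U+0000 from values before they reach the template. The following is a minimal sketch in Python, assuming the template context is built from plain strings, dicts, lists, and tuples. The names `strip_nul` and `clean_context` are hypothetical and not part of any template engine.

```python
from typing import Any

NUL = "\x00"  # U+0000, the null character


def strip_nul(value: str) -> str:
    """Return value with every U+0000 character removed."""
    return value.replace(NUL, "")


def clean_context(data: Any) -> Any:
    """Recursively strip U+0000 from strings inside dicts, lists, and tuples.

    Values of any other type are returned unchanged.
    """
    if isinstance(data, str):
        return strip_nul(data)
    if isinstance(data, dict):
        return {key: clean_context(val) for key, val in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(clean_context(item) for item in data)
    return data
```

Cleaning values at this boundary only hides the symptom in the rendered page. The stored data still contains the null characters, so it is usually worth tracing where they entered the database as well.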
Why It Matters
- Standards compliance: The HTML specification explicitly forbids U+0000. Its presence makes your document non-conforming.
- Unpredictable rendering: Different browsers may handle null characters differently. Some ignore them, others replace them with �, which can produce visible artifacts in your content.
- Accessibility: Screen readers and assistive technologies may behave unexpectedly when encountering null characters, potentially skipping content or producing garbled output.
- Data integrity: The presence of null characters is often a symptom of a deeper problem, such as encoding corruption, that could affect other parts of your content.
How to Fix It
Reveal hidden characters in your text editor. Most code editors have this option:
- In VS Code, use the setting "editor.renderControlCharacters": true.
- In Sublime Text, look for non-printable characters via Find & Replace with regex enabled.
- In Vim, null characters typically display as ^@.
Search and remove null characters. Use your editor's find-and-replace with regex support to search for \x00 or \0 and replace with nothing (an empty string).
Use command-line tools for a quick fix:
- On Linux/macOS: tr -d '\0' < input.html > output.html
- With GNU sed: sed -i 's/\x00//g' file.html (the BSD sed shipped with macOS handles -i differently and may not support \x escapes, so tr is the safer choice there)
Check your build pipeline. If the null characters reappear after cleaning, the issue likely originates upstream — in a database, a template engine, or a build step that concatenates files.
Re-validate your HTML after cleaning to confirm the issue is resolved.
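If the editor view is not enough to see where the null characters sit, a short script can report their positions before you remove them. This is a minimal sketch in Python that works on raw bytes. The function names `find_nul_bytes` and `strip_nul_bytes` are hypothetical helpers chosen for illustration.

```python
def find_nul_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for every 0x00 byte in data.

    Columns count bytes, not characters, so they can differ from what an
    editor shows on lines that contain multi-byte UTF-8 sequences.
    """
    positions = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        start = 0
        while (idx := line.find(b"\x00", start)) != -1:
            positions.append((line_no, idx + 1))
            start = idx + 1
    return positions


def strip_nul_bytes(data: bytes) -> bytes:
    """Return data with every 0x00 byte removed."""
    return data.replace(b"\x00", b"")
```

To use it on a file, read the contents with `pathlib.Path("page.html").read_bytes()` (the filename here is a placeholder), pass the result to `find_nul_bytes` to see where the problem is, and write the output of `strip_nul_bytes` back once you are satisfied.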
Examples
Incorrect — null character in text content
In the example below, \0 represents the invisible U+0000 character for illustration purposes:
<p>Welcome to our site!\0 We're glad you're here.</p>
This triggers the "Saw U+0000 in stream" error.
Correct — null character removed
<p>Welcome to our site! We're glad you're here.</p>
Incorrect — null character inside an attribute value
<a href="/about\0us">About Us</a>
Correct — null character removed from attribute
<a href="/about-us">About Us</a>
Incorrect — null character in a comment
<!-- This comment has a null \0 character -->
Correct — clean comment
<!-- This comment has no null character -->
Null characters are forbidden in all contexts, so no matter where they appear — in elements, attributes, comments, or even the DOCTYPE — they must be removed. If you consistently see this error after saving files, investigate your editor's encoding settings and ensure files are saved as UTF-8 without any binary artifacts.
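One encoding problem that produces many null characters at once is a UTF-16 file being read as UTF-8: UTF-16 stores each ASCII character as two bytes, one of which is 0x00. The sketch below, in Python, guesses whether a byte string is UTF-16 and re-encodes it as UTF-8. The function names and the 90%/10% thresholds are assumptions made for this example, and the check is a rough heuristic, not a full encoding detector.

```python
import codecs

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_like_utf16(data: bytes) -> bool:
    """Heuristically guess whether data is UTF-16 encoded, mostly-ASCII text."""
    if data.startswith(UTF16_BOMS):
        return True
    if len(data) < 4 or len(data) % 2:
        return False
    even_nuls = data[0::2].count(0)
    odd_nuls = data[1::2].count(0)
    half = len(data) // 2
    # Mostly-ASCII UTF-16 has 0x00 in nearly every other byte and almost
    # none in the remaining positions. The thresholds are arbitrary.
    high, low = max(even_nuls, odd_nuls), min(even_nuls, odd_nuls)
    return high >= 0.9 * half and low <= 0.1 * half


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8 without a BOM."""
    if data.startswith(UTF16_BOMS):
        text = data.decode("utf-16")  # the codec reads and drops the BOM
    elif data[1::2].count(0) >= data[0::2].count(0):
        text = data.decode("utf-16-le")  # ASCII in LE: zero byte comes second
    else:
        text = data.decode("utf-16-be")  # ASCII in BE: zero byte comes first
    return text.encode("utf-8")
```

If `looks_like_utf16` returns true for a file that triggers this error, converting the file is likely a better fix than deleting the null bytes one by one, since deletion would also mangle any non-ASCII characters in the text.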
The validator reports "Stream length exceeds limit" when the submitted document is larger than the maximum input size the checker accepts, so it stops before parsing the markup.
This is not a markup mistake. The checker reads your page as a byte stream and refuses input past a fixed size cap, which is why the report names no element or attribute. The whole document was rejected for being too big.
Oversized HTML usually comes from content embedded directly in the page rather than from the hand-written markup. The common culprits are base64 data URIs for images or fonts, large inline <script> blocks carrying JSON or generated code, and big inline <style> sheets. A runaway template or a loop that repeats a fragment thousands of times can also push a page past the limit.
Start by measuring the document, then move large inline content into external files the page links to. If the page is genuinely large, validate a representative section on its own instead of the full document.
Check the document size
curl -s https://example.com/page.html | wc -c
Before: a large inline image
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...">
After: reference an external file
<img src="/images/logo.png" alt="Company logo">
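To see which kind of inline content accounts for most of a page's size, a rough audit can help decide what to move out first. The following Python sketch counts the bytes taken by data URIs, inline scripts, and inline styles. The function name `inline_weight` is hypothetical, and the regular expressions are deliberate approximations and not a real HTML parser, so treat the numbers as estimates.

```python
import re

# Rough patterns; they will miss unusual markup and may over-match in places.
DATA_URI = re.compile(r"\bdata:[^\"'\s)]+")
INLINE_SCRIPT = re.compile(r"<script\b[^>]*>(.*?)</script>", re.I | re.S)
INLINE_STYLE = re.compile(r"<style\b[^>]*>(.*?)</style>", re.I | re.S)


def inline_weight(html: str) -> dict[str, int]:
    """Return UTF-8 byte counts for the page and for its inline content."""

    def total(chunks: list[str]) -> int:
        return sum(len(chunk.encode("utf-8")) for chunk in chunks)

    return {
        "document": len(html.encode("utf-8")),
        "data_uris": total(DATA_URI.findall(html)),
        "inline_scripts": total(INLINE_SCRIPT.findall(html)),
        "inline_styles": total(INLINE_STYLE.findall(html)),
    }
```

A data URI that appears inside an inline script or style sheet is counted in both categories, so the category totals can add up to more than the document size.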