About This HTML Issue
Unicode allows certain characters — especially accented letters and other composed characters — to be represented in multiple ways. For example, the letter “é” can be a single precomposed character (U+00E9, NFC form) or a base letter “e” (U+0065) followed by a combining acute accent (U+0301, NFD form). While they may look identical on screen, they are different byte sequences. The HTML specification requires that all attribute values use NFC to ensure consistent behavior across browsers, search engines, and assistive technologies.
This matters for several important reasons:
-
String matching and comparison: Browsers and scripts may compare attribute values byte-by-byte. An
idvalue in NFD form won’t match a CSS selector or fragment identifier targeting the NFC form, causing broken links and broken styles. - Accessibility: Screen readers and other assistive technologies may process NFC and NFD strings differently, potentially mispronouncing text or failing to match ARIA references.
- Interoperability: Different operating systems produce different normalization forms by default (macOS file systems historically use NFD, for example). Copying text from various sources can introduce non-NFC characters without any visual indication.
- Standards compliance: The WHATWG HTML specification and W3C guidance on normalization explicitly recommend NFC for all HTML content.
The issue most commonly appears when attribute values contain accented characters (like in id, class, alt, title, or value attributes) that were copied from a source using NFD normalization, or when files are created on systems that default to NFD.
To fix the problem, you need to convert the affected attribute values to NFC. You can do this by:
- Retyping the characters directly in your editor, which usually produces NFC by default.
-
Using a programming tool such as Python’s
unicodedata.normalize('NFC', text), JavaScript’stext.normalize('NFC'), or similar utilities in your language of choice. - Using a text editor that supports normalization conversion (some editors have built-in Unicode normalization features or plugins).
- Running a batch conversion on your HTML files before deployment as part of your build process.
Examples
Incorrect: Attribute value uses NFD (decomposed form)
In this example, the id attribute value for “résumé” uses decomposed characters (base letter + combining accent), which triggers the validation error. The decomposition is invisible in source code but present at the byte level.
<!-- The "é" here is stored as "e" + combining acute accent (NFD) -->
<div id="résumé">
<p>My résumé content</p>
</div>
Correct: Attribute value uses NFC (precomposed form)
Here, the id attribute value uses precomposed characters, which is the correct NFC form.
<!-- The "é" here is stored as a single precomposed character (NFC) -->
<div id="résumé">
<p>My résumé content</p>
</div>
While these two examples look identical in source view, they differ at the byte level. You can verify the normalization form using browser developer tools or a hex editor.
Checking and fixing with JavaScript
You can programmatically normalize attribute values:
<script>
// Check if a string is in NFC
const text = "résumé";
const nfcText = text.normalize("NFC");
console.log(text === nfcText); // false if original was NFD
</script>
Checking and fixing with Python
import unicodedata
text = "r\u0065\u0301sume\u0301" # NFD form
normalized = unicodedata.normalize('NFC', text)
print(normalized) # Outputs NFC form: "résumé"
If you encounter this validation error, inspect the flagged attribute value carefully and ensure all characters are in their precomposed NFC form. Adding a normalization step to your build pipeline is a reliable way to prevent this issue from recurring.
Find issues like this automatically
Rocket Validator scans thousands of pages in seconds, detecting HTML issues across your entire site.
Learn more: