About This HTML Issue
When a browser or validator reads your HTML file, it interprets the raw bytes according to a character encoding, most commonly UTF-8. Each encoding has rules about which byte sequences are valid. In UTF-8, bytes 0x00–0x7F stand alone as ASCII characters, but any byte above 0x7F must belong to a multi-byte sequence: a lead byte (0xC2–0xF4) followed by one to three continuation bytes (0x80–0xBF), with a few tighter limits after specific lead bytes. If the validator encounters a byte or sequence of bytes that violates these rules, it reports a “malformed byte sequence” error because it cannot decode those bytes into characters.
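To make these rules concrete, here is a minimal Python sketch of a UTF-8 structure check. The function name `first_invalid_utf8_offset` is hypothetical and is not part of any validator's API. The function follows the well-formed byte ranges from RFC 3629, including the narrower ranges after 0xE0, 0xED, 0xF0 and 0xF4. Those narrower ranges rule out overlong forms, UTF-16 surrogates, and code points above U+10FFFF.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset where the first malformed UTF-8 sequence starts, or None if valid.

    Hypothetical helper for illustration; ranges follow RFC 3629.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII: a complete one-byte character
            i += 1
            continue
        # Lead byte -> (continuation bytes needed, allowed range of the FIRST continuation byte)
        if 0xC2 <= b <= 0xDF:
            count, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            count, lo, hi = 2, 0xA0, 0xBF  # rules out overlong three-byte forms
        elif b == 0xED:
            count, lo, hi = 2, 0x80, 0x9F  # rules out UTF-16 surrogates
        elif 0xE1 <= b <= 0xEF:
            count, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            count, lo, hi = 3, 0x90, 0xBF  # rules out overlong four-byte forms
        elif 0xF1 <= b <= 0xF3:
            count, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            count, lo, hi = 3, 0x80, 0x8F  # caps code points at U+10FFFF
        else:
            return i  # 0x80-0xC1 and 0xF5-0xFF can never start a sequence
        for k in range(1, count + 1):
            if i + k >= n:
                return i  # sequence truncated by end of data
            low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not (low <= data[i + k] <= high):
                return i  # expected a continuation byte, got something else
        i += count + 1
    return None


print(first_invalid_utf8_offset("<p>Resumé</p>".encode("utf-8")))  # None
print(first_invalid_utf8_offset(b"<p>Resum\xe9</p>"))               # 8
```

Python's own `bytes.decode("utf-8")` enforces the same rules. When decoding fails, the `start` attribute of the `UnicodeDecodeError` it raises points at the same offset.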
This problem commonly arises in a few scenarios:
- Encoding mismatch: Your file is saved as Windows-1252 (or Latin-1, ISO-8859-1) but the document declares UTF-8, or vice versa. Characters like curly quotes (“ ”), em dashes (—), or accented letters (é, ñ) are encoded differently in these encodings. Reading the bytes under the wrong one produces invalid byte sequences or garbled text.
- Copy-pasting from word processors: Content copied from Microsoft Word or similar applications often includes “smart quotes” and other special characters. If these end up stored as Windows-1252 bytes, they become malformed bytes in a UTF-8 file.
- File corruption: The file was partially corrupted during transfer (e.g., FTP in the wrong mode) or by a tool that modified it without respecting its encoding.
- Mixed encodings: Parts of the file were written or appended using different encodings, resulting in some sections containing invalid byte sequences.
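The encoding-mismatch scenario can be reproduced in a few lines of Python. This sketch encodes the same curly-quoted string in Windows-1252 (Python's codec name is `cp1252`) and in UTF-8. It then decodes the Windows-1252 bytes as UTF-8, which is what happens when a file declares UTF-8 but was saved as Windows-1252.

```python
# “Resumé” with curly quotes, as a word processor would produce it
text = "\u201cResum\u00e9\u201d"

cp1252_bytes = text.encode("cp1252")  # one byte per character
utf8_bytes = text.encode("utf-8")     # multi-byte sequences for non-ASCII characters

print(cp1252_bytes.hex(" "))  # 93 52 65 73 75 6d e9 94
print(utf8_bytes.hex(" "))    # e2 80 9c 52 65 73 75 6d c3 a9 e2 80 9d

# Reading the Windows-1252 bytes as UTF-8 fails on the very first byte:
# 0x93 is a continuation byte in UTF-8, so it cannot start a sequence.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason, "at byte", err.start)  # invalid start byte at byte 0
```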
This is a serious problem because browsers may display garbled text (mojibake), skip characters entirely, or substitute replacement characters (�). It also breaks accessibility tools like screen readers, which may mispronounce or skip corrupted text. Search engines may index garbled content, harming your SEO.
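Both visible symptoms are easy to reproduce with Python's built-in codecs. This sketch shows mojibake, which is UTF-8 bytes read as Windows-1252. It also shows the U+FFFD replacement character that a lenient decoder substitutes for a malformed byte. Browsers behave along the same lines, though their exact output can differ.

```python
utf8_bytes = "Resumé".encode("utf-8")     # b'Resum\xc3\xa9'
cp1252_bytes = "Resumé".encode("cp1252")  # b'Resum\xe9'

# UTF-8 bytes read as Windows-1252: each byte becomes its own character (mojibake)
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # ResumÃ©

# Windows-1252 bytes read as UTF-8 with lenient error handling:
# the malformed byte is replaced by U+FFFD REPLACEMENT CHARACTER
replaced = cp1252_bytes.decode("utf-8", errors="replace")
print(replaced)  # Resum�
```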
How to Fix It
- Declare UTF-8 encoding in your HTML with <meta charset="utf-8"> as the first element inside <head>.
- Save your file as UTF-8 in your text editor. Most editors have an option like “Save with Encoding” or “File Encoding” in the status bar or save dialog. Choose “UTF-8” or “UTF-8 without BOM.”
- Re-encode the file if it was originally saved in a different encoding. Tools like iconv on the command line can convert between encodings, for example: iconv -f WINDOWS-1252 -t UTF-8 input.html > output.html
- Replace problematic characters by re-typing them or using HTML character references if needed.
- Check your server configuration. Suppose your server sends a Content-Type header whose charset conflicts with the file’s actual encoding (e.g., Content-Type: text/html; charset=iso-8859-1 for a UTF-8 file). Browsers and the validator give the HTTP header precedence over the <meta> declaration, so they decode the file with the wrong encoding. Configure the server to send charset=utf-8.
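The re-encoding and server-header checks can also be scripted. This is a minimal Python sketch with two assumptions. First, the legacy encoding is Windows-1252. Second, any file that already decodes as UTF-8 should be left alone, which avoids double-encoding a file that was already fixed. The function names are hypothetical. `header_charset` relies on the standard library's `email.message.Message` to parse a Content-Type value.

```python
from email.message import Message
from pathlib import Path


def to_utf8_bytes(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return raw as UTF-8, re-encoding from fallback_encoding only if raw is not valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8: leave untouched to avoid double-encoding
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError if raw is not valid in fallback_encoding either
        return raw.decode(fallback_encoding).encode("utf-8")


def convert_file_to_utf8(src: str, dst: str, fallback_encoding: str = "cp1252") -> None:
    """Rough equivalent of `iconv -f WINDOWS-1252 -t UTF-8 src > dst`, skipping files already in UTF-8."""
    Path(dst).write_bytes(to_utf8_bytes(Path(src).read_bytes(), fallback_encoding))


def header_charset(content_type: str):
    """Return the lowercased charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(to_utf8_bytes(b"<p>Resum\xe9</p>"))              # b'<p>Resum\xc3\xa9</p>'
print(header_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
```

To check a live server, look at the `headers` object returned by `urllib.request.urlopen(url)`. It exposes the same `get_content_charset()` method, so you can compare its result against `"utf-8"`. The "already valid UTF-8" shortcut is a heuristic. It works because Windows-1252 text containing non-ASCII characters almost never happens to form valid UTF-8.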
Examples
Incorrect — Encoding mismatch
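A file saved in Windows-1252 but declaring UTF-8. The byte 0xE9 represents é in Windows-1252, but in UTF-8 it can only start a three-byte sequence. The next byte, 0x3C (the < of </p>), is not a continuation byte, so the sequence is invalid and the validator reports a malformed byte sequence.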
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>My Page</title>
</head>
<body>
<!-- If the file is saved as Windows-1252, the é below is byte 0xE9, -->
<!-- which is not a valid UTF-8 sequence -->
<p>Resumé</p>
</body>
</html>
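Validators usually report where the bad bytes are. A rough way to do the same for a file like the one above is to decode it with Python and translate each error offset into a line and column. This is a hypothetical helper, not how any particular validator works. Columns are counted in bytes rather than characters.

```python
def malformed_positions(data: bytes):
    """Yield (line, column, byte) for each malformed UTF-8 sequence in data.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            start = pos + err.start  # err offsets are relative to data[pos:]
            line = data.count(b"\n", 0, start) + 1
            line_start = data.rfind(b"\n", 0, start) + 1
            yield line, start - line_start + 1, data[start]
            pos += err.end  # resume scanning just past the malformed bytes


# The <p> line from the incorrect example, with é saved as the Windows-1252 byte 0xE9
sample = b"<body>\n<p>Resum\xe9</p>\n</body>\n"
for line, column, byte in malformed_positions(sample):
    print(f"line {line}, column {column}: byte 0x{byte:02X}")
# line 2, column 9: byte 0xE9
```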
Correct — File properly saved as UTF-8
The same document, but the file is actually saved in UTF-8 encoding. The character é is stored as the two-byte sequence 0xC3 0xA9, which is valid UTF-8.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>My Page</title>
</head>
<body>
<p>Resumé</p>
</body>
</html>
Alternative — Using character references
If you can’t resolve the encoding issue immediately, you can use HTML character references to avoid non-ASCII bytes entirely:
<p>Resum&#233;</p>
Or using the named reference:
<p>Resum&eacute;</p>
Both render as “Resumé” regardless of file encoding, though this is a workaround — properly saving the file as UTF-8 is the preferred long-term solution.
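Python's standard library can illustrate both directions of this workaround. `html.unescape` shows that the numeric and named references decode to the same character. The `xmlcharrefreplace` error handler turns every non-ASCII character into a decimal reference, so the output is pure ASCII. Treat this as a sketch with two caveats. It rewrites every non-ASCII character. Character references are also not decoded inside <script> and <style> elements, so only apply it to ordinary markup.

```python
import html

# Numeric and named references both decode to U+00E9 (é)
print(html.unescape("Resum&#233;"), html.unescape("Resum&eacute;"))  # Resumé Resumé


def to_ascii_html(markup: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    return markup.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("<p>Resumé</p>"))  # <p>Resum&#233;</p>
```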