HTML check: Malformed byte sequence:

About This HTML Issue

When a browser or validator reads your HTML file, it interprets the raw bytes according to a character encoding, most commonly UTF-8. Each encoding has rules about which byte sequences are valid. In UTF-8, for example, every byte above 0x7F must belong to a well-formed multi-byte sequence: a lead byte followed by the expected number of continuation bytes (0x80 to 0xBF). If the validator encounters a byte or sequence of bytes that violates these rules, it reports a “malformed byte sequence” error because it cannot decode those bytes into characters.
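To see the rule in action, the following Python sketch reports where a byte string first stops being valid UTF-8. The function name `first_malformed_offset` is made up for this illustration; it relies only on the strict decoding behaviour of the standard `bytes.decode` method and is not how any particular validator is implemented.

```python
def first_malformed_offset(data: bytes):
    """Return the index of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start  # offset of the first offending byte
    return None


# "Resum" followed by a lone 0xE9 (e-acute in Windows-1252), then "</p>"
print(first_malformed_offset(b"Resum\xe9</p>"))
# The same text with e-acute stored as the UTF-8 pair 0xC3 0xA9
print(first_malformed_offset(b"Resum\xc3\xa9</p>"))
```

The first call prints the offset of the stray byte; the second prints `None` because the input decodes cleanly.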

This problem commonly arises in a few scenarios:

Encoding mismatch: Your file is saved as Windows-1252 or ISO-8859-1 (Latin-1) but the document declares UTF-8, or vice versa. Characters like curly quotes (“ ”), em dashes (—), or accented letters (é, ñ) are stored as different bytes in each encoding, so they produce invalid byte sequences or garbled text when interpreted under the wrong one.
Copy-pasting from word processors: Content copied from Microsoft Word or similar applications often includes “smart quotes” and special characters encoded in Windows-1252, which can produce malformed bytes in a UTF-8 file.
File corruption: The file was partially corrupted during transfer (e.g., FTP in the wrong mode) or by a tool that modified it without respecting its encoding.
Mixed encodings: Parts of the file were written or appended using different encodings, resulting in some sections containing invalid byte sequences.
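The byte-level difference behind these scenarios can be inspected directly. This is a small sketch using Python's built-in `cp1252` (Windows-1252) and `utf-8` codecs; the helper name `bytes_in_both` is hypothetical.

```python
def bytes_in_both(ch: str):
    """Return (Windows-1252 bytes, UTF-8 bytes) for a single character."""
    return ch.encode("cp1252"), ch.encode("utf-8")


samples = [
    ("left curly quote", "\u201c"),
    ("em dash", "\u2014"),
    ("e with acute", "\u00e9"),
]
for name, ch in samples:
    legacy, utf8 = bytes_in_both(ch)
    print(f"{name}: cp1252={legacy.hex()} utf-8={utf8.hex()}")
```

Each character is a single byte in Windows-1252 but two or three bytes in UTF-8, which is why a single-byte form left inside a UTF-8 document is reported as malformed.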

This is a serious problem because browsers may display garbled text (mojibake), skip characters entirely, or substitute replacement characters (�). It also breaks accessibility tools like screen readers, which may mispronounce or skip corrupted text. Search engines may index garbled content, harming your SEO.
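Both visible symptoms, replacement characters and mojibake, can be reproduced in a few lines. The sketch below is only an approximation of what a browser does: it uses Python's `errors="replace"` handler to stand in for the browser's substitution of U+FFFD, and the function name `show_damage` is invented for the example.

```python
def show_damage(text: str):
    """Decode text under the wrong encoding in both directions."""
    as_cp1252 = text.encode("cp1252")
    as_utf8 = text.encode("utf-8")
    # Windows-1252 bytes read as UTF-8: invalid bytes become U+FFFD
    replaced = as_cp1252.decode("utf-8", errors="replace")
    # UTF-8 bytes read as Windows-1252: each byte becomes its own character
    mojibake = as_utf8.decode("cp1252")
    return replaced, mojibake


replaced, mojibake = show_damage("Resum\u00e9")
print(replaced)
print(mojibake)
```

The second decode can itself raise an error for the few byte values that Windows-1252 leaves undefined, so treat this as a demonstration rather than a general-purpose tool.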

How to Fix It

Declare UTF-8 encoding in your HTML with <meta charset="utf-8"> as the first element inside <head>.
Save your file as UTF-8 in your text editor. Most editors have an option like “Save with Encoding” or “File Encoding” in the status bar or save dialog. Choose “UTF-8” or “UTF-8 without BOM.”
Re-encode the file if it was originally saved in a different encoding. Tools like iconv on the command line can convert between encodings:
```
iconv -f WINDOWS-1252 -t UTF-8 input.html > output.html
```
Replace problematic characters by re-typing them or using HTML character references if needed.
Check your server configuration. If your server sends a Content-Type header with a charset that conflicts with the file’s actual encoding (e.g., Content-Type: text/html; charset=iso-8859-1 for a UTF-8 file), the validator will use the HTTP header’s encoding, causing mismatches.
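If iconv is not available, the re-encoding step can be approximated in Python. This is a minimal sketch that assumes the whole file uses a single encoding and that the legacy encoding is Windows-1252; the names `to_utf8` and `convert_file` and the `fallback` parameter are illustrative. It does not handle files that mix encodings.

```python
from pathlib import Path


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return data as UTF-8 bytes, re-encoding from `fallback` if needed."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8: leave it untouched
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in `fallback`
        return data.decode(fallback).encode("utf-8")


def convert_file(src: str, dst: str, fallback: str = "cp1252") -> None:
    """Read src as raw bytes and write a UTF-8 version to dst."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), fallback))


# Example call (file names are placeholders):
# convert_file("input.html", "output.html")
```

Checking for valid UTF-8 first avoids double-encoding a file that was already correct, which would otherwise turn é into Ã©.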

Examples

Incorrect — Encoding mismatch

A file saved in Windows-1252 but declaring UTF-8. The byte 0xE9 represents é in Windows-1252, but in UTF-8 it is a lead byte that must be followed by two continuation bytes. Here it is followed by the < of </p>, so the validator reports a malformed byte sequence.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Page</title>
  </head>
  <body>
    <!-- If the file is saved as Windows-1252, the é below is byte 0xE9, -->
    <!-- which is not a valid UTF-8 sequence -->
    <p>Resumé</p>
  </body>
</html>

Correct — File properly saved as UTF-8

The same document, but the file is actually saved in UTF-8 encoding. The character é is stored as the two-byte sequence 0xC3 0xA9, which is valid UTF-8.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Page</title>
  </head>
  <body>
    <p>Resumé</p>
  </body>
</html>

Alternative — Using character references

If you can’t resolve the encoding issue immediately, you can use HTML character references to avoid non-ASCII bytes entirely:

<p>Resum&#xe9;</p>

Or using the named reference:

<p>Resum&eacute;</p>

Both render as “Resumé” regardless of file encoding, though this is a workaround — properly saving the file as UTF-8 is the preferred long-term solution.
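For longer passages, the conversion to character references can be automated. The sketch below uses Python's standard `xmlcharrefreplace` error handler, which emits decimal references (`&#233;` is the same character as `&#xe9;`), and `html.unescape` to confirm that both reference forms decode to the same text. The helper name `to_char_refs` is hypothetical.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_char_refs("Resum\u00e9"))  # Resum&#233;

# Numeric and named references both decode to the same character
print(html.unescape("Resum&#xe9;") == html.unescape("Resum&eacute;"))  # True
```

The output contains only ASCII bytes, so it is valid under UTF-8, Windows-1252, and ISO-8859-1 alike.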


