# Malformed byte sequence:

> Canonical HTML version: https://rocketvalidator.com/html-validation/malformed-byte-sequence
> Attribution: Rocket Validator (https://rocketvalidator.com)
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

When a browser or validator reads your HTML file, it interprets the raw bytes according to a character encoding — most commonly UTF-8. Each encoding has rules about which byte sequences are valid. For example, in UTF-8, bytes above `0x7F` must follow specific multi-byte patterns. If the validator encounters a byte or sequence of bytes that violates these rules, it reports a "malformed byte sequence" error because it literally cannot decode the bytes into meaningful characters.
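
As a rough illustration of that check, the following Python sketch reports where strict UTF-8 decoding first fails. The helper name `first_malformed_offset` is hypothetical, and this only approximates what a validator does internally:

```python
from typing import Optional


def first_malformed_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding first fails, or None."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


valid = "Resumé".encode("utf-8")    # é stored as the two bytes 0xC3 0xA9
broken = "Resumé".encode("cp1252")  # é stored as the single byte 0xE9

print(first_malformed_offset(valid))   # None
print(first_malformed_offset(broken))  # 5
```

The offset is counted in bytes, not characters, which is usually what you need to locate the problem in a hex editor.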

This problem commonly arises in a few scenarios:

- **Encoding mismatch:** Your file is saved as Windows-1252 (or ISO-8859-1, also known as Latin-1) but the document declares UTF-8, or vice versa. Characters like curly quotes (`“` `”`), em dashes (`—`), or accented letters (`é`, `ñ`) are stored as different bytes in these encodings, so reading them under the wrong one yields invalid byte sequences or garbled text.
- **Copy-pasting from word processors:** Content copied from Microsoft Word or similar applications often includes "smart quotes" and special characters encoded in Windows-1252, which can produce malformed bytes in a UTF-8 file.
- **File corruption:** The file was partially corrupted during transfer (e.g., FTP in the wrong mode) or by a tool that modified it without respecting its encoding.
- **Mixed encodings:** Parts of the file were written or appended using different encodings, resulting in some sections containing invalid byte sequences.

This is a serious problem because browsers may display garbled text (mojibake), skip characters entirely, or substitute replacement characters (`�`). It also breaks accessibility tools like screen readers, which may mispronounce or skip corrupted text. Search engines may index garbled content, harming your SEO.
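
Both symptoms can be reproduced in a few lines of Python. This is only a sketch of the effect, assuming Windows-1252 (`cp1252` in Python) as the conflicting encoding:

```python
# UTF-8 bytes read as Windows-1252: every byte decodes, but to the wrong characters.
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)                          # Ã©

# Windows-1252 bytes read as UTF-8: the lone 0xE9 cannot be decoded, so a
# lenient decoder substitutes U+FFFD, the replacement character.
cp1252_bytes = "é".encode("cp1252")     # b'\xe9'
replaced = cp1252_bytes.decode("utf-8", errors="replace")
print(replaced == "\ufffd")              # True
```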

## How to Fix It

1. **Declare UTF-8 encoding** in your HTML with `<meta charset="utf-8">` as the first element inside `<head>`.
2. **Save your file as UTF-8** in your text editor. Most editors have an option like "Save with Encoding" or "File Encoding" in the status bar or save dialog. Choose "UTF-8" or "UTF-8 without BOM."
3. **Re-encode the file** if it was originally saved in a different encoding. Tools like `iconv` on the command line can convert between encodings:
   ```
   iconv -f WINDOWS-1252 -t UTF-8 input.html > output.html
   ```
4. **Replace problematic characters** by re-typing them or using HTML character references if needed.
5. **Check your server configuration.** If your server sends a `Content-Type` header with a charset that conflicts with the file's actual encoding (e.g., `Content-Type: text/html; charset=iso-8859-1` for a UTF-8 file), the HTTP header takes precedence over the `<meta>` declaration, so the validator decodes the file with the wrong encoding.
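
If `iconv` is not available, the re-encoding in step 3 can be sketched in Python. The function names below are hypothetical, and the default source encoding of Windows-1252 is an assumption you should replace with the file's real encoding:

```python
from pathlib import Path


def to_utf8_bytes(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8, assuming it was written in source_encoding."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8; converting again would double-encode it
    except UnicodeDecodeError:
        pass
    # Strict decoding: raises if a byte is undefined in source_encoding.
    return raw.decode(source_encoding).encode("utf-8")


def reencode_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Hypothetical helper: read src as bytes, convert, and write dst as UTF-8."""
    Path(dst).write_bytes(to_utf8_bytes(Path(src).read_bytes(), source_encoding))
```

Calling `reencode_file("input.html", "output.html")` would mirror the `iconv` command in step 3. Working on raw bytes avoids any newline translation, and the up-front validity check keeps a file that is already UTF-8 from being converted twice.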

## Examples

### Incorrect — Encoding mismatch

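A file saved in Windows-1252 but declaring UTF-8. The byte `0xE9` represents `é` in Windows-1252, but in UTF-8 it is the lead byte of a three-byte sequence and must be followed by two continuation bytes. Here it is followed by `<`, so the sequence is invalid and triggers the malformed byte sequence error.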

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Page</title>
  </head>
  <body>
    <!-- If the file is saved as Windows-1252, the é below is byte 0xE9, -->
    <!-- which is not a valid UTF-8 sequence -->
    <p>Resumé</p>
  </body>
</html>
```

### Correct — File properly saved as UTF-8

The same document, but the file is actually saved in UTF-8 encoding. The character `é` is stored as the two-byte sequence `0xC3 0xA9`, which is valid UTF-8.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Page</title>
  </head>
  <body>
    <p>Resumé</p>
  </body>
</html>
```

### Alternative — Using character references

If you can't resolve the encoding issue immediately, you can use HTML character references to avoid non-ASCII bytes entirely:

```html
<p>Resum&#xe9;</p>
```

Or using the named reference:

```html
<p>Resum&eacute;</p>
```

Both render as "Resumé" regardless of file encoding, though this is a workaround — properly saving the file as UTF-8 is the preferred long-term solution.
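
If many characters need this treatment, the substitution can be automated. As a minimal sketch, Python's built-in `xmlcharrefreplace` error handler turns every non-ASCII character into a decimal numeric reference (`&#233;` is the decimal form of `&#xe9;`):

```python
text = "<p>Resumé</p>"

# Encode to ASCII, replacing anything outside ASCII with a numeric character reference.
ascii_only = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_only)  # b'<p>Resum&#233;</p>'
```

The result contains only ASCII bytes, which decode identically under UTF-8, Windows-1252, and ISO-8859-1.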
