Description
Currently the only way to specify the charset is in the document itself (with a BOM or a <meta charset> element); if the charset is known but not declared in the document, there is no way to pass it to the parser.
Additionally, charset detection does not always work well, even with Heuristics.ALL; in particular, it fails to recognize UTF-8, at least when the first non-ASCII byte appears late in the document. The WHATWG spec recommends, in a non-normative note, that systems be able to recognize UTF-8 even if they aren't good at detecting other charsets:
The UTF-8 encoding has a highly detectable bit pattern. Files from the local file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents with byte sequences that do not match it are very likely not. When a user agent can examine the whole file, rather than just the preamble, detecting for UTF-8 specifically can be especially effective. [PPUTF8] [UTF8DET]
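The whole-file check described in the note can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the parser's actual detection code: Utf8Sniffer and Verdict are hypothetical names, and the sketch assumes the complete byte buffer is available. It first scans every byte for a value above 0x7F; if one is found, it decodes the buffer with a strict java.nio UTF-8 decoder that reports malformed input instead of substituting U+FFFD. Because the entire buffer is examined, a non-ASCII byte that first appears near the end of the file is still taken into account.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper sketching whole-file UTF-8 detection; not part of any parser's API. */
final class Utf8Sniffer {

    /** Possible outcomes of the check. */
    enum Verdict { ASCII_ONLY, LIKELY_UTF8, NOT_UTF8 }

    private Utf8Sniffer() {}

    static Verdict sniff(byte[] bytes) {
        // Scan the whole buffer, not just a preamble, for any byte with the high bit set.
        boolean hasHighByte = false;
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                hasHighByte = true;
                break;
            }
        }
        // Pure ASCII decodes identically in UTF-8 and every ASCII-compatible charset.
        if (!hasHighByte) {
            return Verdict.ASCII_ONLY;
        }
        // Strict decoder: malformed or unmappable input throws instead of being replaced.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return Verdict.LIKELY_UTF8;
        } catch (CharacterCodingException e) {
            return Verdict.NOT_UTF8;
        }
    }

    public static void main(String[] args) {
        String sample = "<mo>\u00B1</mo>"; // U+00B1 PLUS-MINUS SIGN
        System.out.println(sniff(sample.getBytes(StandardCharsets.UTF_8)));      // LIKELY_UTF8 (C2 B1)
        System.out.println(sniff(sample.getBytes(StandardCharsets.ISO_8859_1))); // NOT_UTF8 (lone B1)
        System.out.println(sniff("<p>plain</p>".getBytes(StandardCharsets.US_ASCII))); // ASCII_ONLY
    }
}
```

A real detector would still need a policy for the ASCII_ONLY case and, for streamed input, a decision about how much of the file to buffer before committing to an encoding; both are left out of this sketch.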
(This is reproducible with multiple test documents; the smallest is below. Another one produced a warning message that a UTF-8 character was invalid in Windows-1252, meaning the parser fell back to the default, which was a particularly bad guess.)
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="stylesheet" href="https://fred-wang.github.io/mathml.css/mathml.css">
<title>Circle equation</title>
<!-- <meta charset="utf-8" /> -->
</head>
<body>
<p>
The equation
<math display=inline>
<mi>y</mi><mo>=</mo><mo>±</mo>
<msqrt>
<msup><mi>r</mi><mn>2</mn></msup>
<mo>-</mo>
<msup><mi>x</mi><mn>2</mn></msup>
</msqrt>
</math>
produces a circle with radius <math display=inline><mi>r</mi></math>:
</p>
<svg width="10em" height="10em" viewBox="0 0 100 100">
<desc>A circle</desc>
<circle cx="50" cy="50" r="40" fill="none" stroke="blue" stroke-width="1" />
</svg>
</body>
</html>
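To illustrate why a Windows-1252 fallback is a bad guess for a file like this one: assuming the document is saved as UTF-8, its only non-ASCII character is ± (U+00B1), stored as the two bytes 0xC2 0xB1, and a Windows-1252 decoder reads those as two characters, "Â±". The Java sketch below assumes the parser falls back to Windows-1252 as described above; MojibakeDemo and misdecode are hypothetical names used only for this demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: how UTF-8 text is garbled when decoded as Windows-1252. */
final class MojibakeDemo {

    private MojibakeDemo() {}

    /** Encodes text as UTF-8, then decodes those bytes with the given (wrong) charset. */
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        String original = "<mo>\u00B1</mo>"; // the plus-minus operator from the test document
        String garbled = misdecode(original, windows1252);
        // U+00B1 is encoded as C2 B1; Windows-1252 maps 0xC2 to U+00C2 and 0xB1 to U+00B1.
        System.out.println(garbled.equals("<mo>\u00C2\u00B1</mo>")); // true
        System.out.println(original.length() + " -> " + garbled.length()); // 10 -> 11
    }
}
```

Because every byte in the 0xC2 0xB1 pair is defined in Windows-1252, this kind of misreading produces no decoding error at all, only silently wrong text.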