Add XHTML/XML option #15
Comments
It would be nice to have an option (default off) that excludes the characters that are not allowed in XML, even as an entity reference. These characters cause XML validation to fail. For example, see https://github.com/MylesBorins/xml-sanitizer/. he could either strip the invalid characters (as that library does) or replace them with a non-entity (like ESC for 0x1B).
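A minimal sketch of both behaviours, assuming "invalid" means any code point outside the `Char` production of XML 1.0. The names `INVALID_XML_CHARS`, `stripInvalidXmlChars`, `nameInvalidXmlChars` and `CONTROL_NAMES` are hypothetical and are not part of he or xml-sanitizer; the name table is deliberately partial.

```javascript
// Any code point outside the XML 1.0 `Char` production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical, partial table of readable names for control characters.
const CONTROL_NAMES = { 0x00: 'NUL', 0x1b: 'ESC' };

// Option 1: drop the invalid characters entirely.
function stripInvalidXmlChars(input) {
  return input.replace(INVALID_XML_CHARS, '');
}

// Option 2: replace each invalid character with a plain-text name,
// falling back to U+XXXX notation when no name is known.
function nameInvalidXmlChars(input) {
  return input.replace(INVALID_XML_CHARS, (ch) => {
    const cp = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase().padStart(4, '0');
    return CONTROL_NAMES[cp] ?? `U+${hex}`;
  });
}
```

Tab, line feed and carriage return pass through untouched, since XML allows them; unpaired surrogates are treated as invalid because the regular expression uses the `u` flag.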
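Another XML vs. HTML issue is how to encode using only the named XML entities (i.e. &amp;, &quot;, &apos;, &lt;, &gt;). I ended up with a few extra lines of JS to work around this:
const he = require('he');
// The five named entities that XML predefines.
const ENTITIES = ['&amp;', '&quot;', '&apos;', '&lt;', '&gt;'];
// Decimal form that he produces for a given entity, e.g. '&amp;' becomes '&#38;'.
const recode = (s) => he.encode(he.decode(s), { decimal: true });
// Swap those decimal references back to their named XML form.
const fix = (s) => ENTITIES.reduce((s, e) => s.replaceAll(recode(e), e), s);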
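For comparison, here is a rough standalone sketch of the same idea that does not go through he at all: escape the five characters XML predefines names for, and emit decimal references for everything non-ASCII. `XML_NAMED` and `encodeForXml` are hypothetical names, and this is not how he implements encoding.

```javascript
// The five characters with predefined named entities in XML.
const XML_NAMED = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

// Use a named entity where XML defines one; otherwise fall back to a
// decimal numeric reference. The `u` flag keeps surrogate pairs together.
function encodeForXml(input) {
  return input.replace(/[&<>"']|[^\x00-\x7F]/gu, (ch) =>
    XML_NAMED[ch] ?? `&#${ch.codePointAt(0)};`
  );
}
```

This sketch does not filter characters that XML forbids outright, so it would need to be combined with some form of sanitizing before the output is guaranteed to validate.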
This may not be worth it, but here goes…
E.g. … → U+0085 in XHTML, while in HTML it’s U+2026. http://www.w3.org/TR/xml/#d0e3895
Entities for these symbols are allowed in XML: http://www.w3.org/TR/xml/#NT-Char
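To make the U+0085 vs. U+2026 difference concrete, here is a small sketch of how a numeric character reference might resolve in each mode. It assumes the HTML behaviour of remapping references in the 0x80 to 0x9F range through a Windows-1252 style table; `HTML_C1_REMAP` and `resolveNumericReference` are hypothetical names, and the table below is only a partial copy for illustration.

```javascript
// Partial, illustrative copy of the HTML remapping table for numeric
// references in the 0x80-0x9F range.
const HTML_C1_REMAP = new Map([
  [0x80, 0x20ac], // euro sign
  [0x85, 0x2026], // horizontal ellipsis
  [0x91, 0x2018], // left single quotation mark
  [0x92, 0x2019], // right single quotation mark
  [0x93, 0x201c], // left double quotation mark
  [0x94, 0x201d], // right double quotation mark
]);

// Resolve a numeric reference's code point in 'html' or 'xml' mode.
// XML takes the code point literally; HTML remaps it when the table has an entry.
function resolveNumericReference(codePoint, mode) {
  let resolved = codePoint;
  if (mode === 'html' && HTML_C1_REMAP.has(codePoint)) {
    resolved = HTML_C1_REMAP.get(codePoint);
  }
  return String.fromCodePoint(resolved);
}
```

With this sketch, `resolveNumericReference(0x85, 'xml')` yields U+0085 and `resolveNumericReference(0x85, 'html')` yields U+2026, which is the mismatch an XHTML/XML option would have to account for.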