What do your example files look like? Can you share one here (or at least the beginning of it)?
All examples in the linked discussion seem to have a proper character set declaration, but lxml does not recognize the short form `<meta charset='utf-8'>`. Do you face the same problem?
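If the short form is the culprit, one option is to sniff the declaration ourselves before handing the bytes to lxml. The sketch below is a hypothetical helper (`sniff_charset` is not part of lxml). It assumes the declaration appears within the first 1024 bytes, checks for a byte-order mark first, then matches both the short `<meta charset=...>` form and the older `http-equiv` form with a simple regex, and falls back to an assumed default of UTF-8.

```python
import codecs
import re

# Hypothetical pattern: matches both <meta charset="..."> (HTML5 short form)
# and <meta http-equiv="Content-Type" content="text/html; charset=...">,
# since both contain "charset=" inside a <meta ...> tag.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Guess an HTML document's encoding from its raw bytes (hypothetical helper)."""
    # A byte-order mark takes precedence over any <meta> declaration.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # Assumption: the declaration sits near the top of the document.
    match = _META_CHARSET.search(raw[:1024])
    if match:
        label = match.group(1).decode("ascii").lower()
        try:
            # Normalise aliases such as "UTF8" and reject unknown labels.
            return codecs.lookup(label).name
        except LookupError:
            pass
    return default
```

The returned name can then be passed as the `encoding` argument of lxml's `HTMLParser`, so lxml no longer has to recognize the declaration itself.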
Many HTML files do not contain proper character set declarations, but we still need to be able to read them. lxml is a bit too picky and fails when such files are opened with:
```python
from lxml import html

doc = html.parse(args.file)
```
See the discussion here for how to fix it:
https://stackoverflow.com/questions/15302125/html-encoding-and-lxml-parsing
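A minimal sketch of one approach in that direction: override lxml's encoding guess by passing an explicit `encoding` to `lxml.html.HTMLParser` and using that parser with `html.parse`. The function name `parse_html_file` and the UTF-8 default are assumptions for illustration, not part of lxml or of this project.

```python
from typing import BinaryIO, Union

from lxml import html


def parse_html_file(
    source: Union[str, BinaryIO], encoding: str = "utf-8"
) -> html.HtmlElement:
    """Parse an HTML file, forcing `encoding` instead of lxml's own guess.

    `source` may be a filename or a binary file object, as accepted by
    lxml.html.parse. The UTF-8 default is an assumption, not a detection.
    """
    # An explicit encoding makes the parser ignore any (missing or
    # unrecognized) charset declaration in the document itself.
    parser = html.HTMLParser(encoding=encoding)
    tree = html.parse(source, parser=parser)
    return tree.getroot()
```

With this, `doc = parse_html_file(args.file)` would decode the file as UTF-8 even when it carries no usable `<meta>` declaration; the trade-off is that files actually written in another encoding would need that encoding passed in explicitly.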
I'm going to see whether I can add a fix for this.