Handling of malformed iframe tags #86
Comments
I maintain a Python binding for Modest; lexbor is not ready to replace it yet.
Any updates on this?
Maybe we should tell the parser that we have scripting enabled:
myhtml_tree_t* tree = myhtml_tree_create();
myhtml_tree_init(tree, myhtml);
tree->flags |= MyHTML_TREE_FLAGS_SCRIPT;
myhtml_parse(tree, MyENCODING_UTF_8, html, length);
Hi @lexborisov, that sounds like a great idea!
I've noticed a pretty annoying problem on some websites (I think there are at least a thousand of them in Alexa 1M).
An unclosed iframe tag breaks all the HTML below it.
Here is an example:
It's missing the closing iframe tag, but it still works when parsed with Modest. For some reason, though, if you open it in Chrome (to render the JavaScript parts) and dump the HTML, you get this:
Now neither iframe has a closing tag.
The problem with this is that Modest will ignore everything after such a tag:
Searching for script nodes using myhtml_get_nodes_by_name or CSS selectors returns no results.
@lexborisov Are there any ways to improve this? Other parsers can still handle this.
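For anyone trying to picture why the nodes vanish: the HTML parsing rules treat the content of an iframe as raw text, which only ends at a matching end tag, so with no closing tag the rest of the document is swallowed as text. The sketch below is a simplified model of that scan and is not Modest's actual code; the function name iframe_rawtext_end is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/*
 * Hypothetical helper (not part of MyHTML/Modest): starting at offset `from`,
 * return the offset of the first iframe end tag, or `len` if there is none.
 * A return value of `len` means everything up to the end of the input would
 * be consumed as raw text, so later elements never become nodes.
 */
size_t iframe_rawtext_end(const char *html, size_t len, size_t from)
{
    const char close_tag[] = "</iframe";
    const size_t n = sizeof(close_tag) - 1;

    for (size_t i = from; i + n <= len; i++) {
        size_t k = 0;

        /* Tag names are matched case-insensitively. */
        while (k < n && tolower((unsigned char)html[i + k]) == close_tag[k])
            k++;
        if (k != n)
            continue;

        /* The name must be followed by whitespace, '/' or '>'. */
        if (i + n < len) {
            char c = html[i + n];
            if (c == '>' || c == '/' || c == ' ' || c == '\t' ||
                c == '\n' || c == '\f' || c == '\r')
                return i;
        }
    }
    return len; /* no end tag found: raw text runs to the end of the input */
}
```

Calling it on a page with an unclosed iframe returns the input length, which matches the symptom above: the script elements that follow end up inside the iframe's text rather than in the tree.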