Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: css pseudo-class error raised by XPathVisitor #3223

Merged
merged 2 commits into from
Jun 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
### Fixed

* `Node#clone`, `NodeSet#clone`, and `*::Document#clone` all properly copy the metaclass of the original as expected. Previously, `#clone` had been aliased to `#dup` for these classes (since v1.3.0 in 2009). [#316, #3117] @flavorjones
* CSS queries for pseudo-selectors that cannot be transpiled into XPath queries now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath query was created and a hard-to-understand XPath error was being raised by the query engine. [#3197] @flavorjones
* CSS queries for pseudo-selectors that cannot be translated into XPath expressions now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath expression was evaluated and a hard-to-understand XPath error was raised by the query engine. [#3193] @flavorjones
* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* [CRuby] Update node GC lifecycle to avoid a potential memory leak with fragments in libxml 2.13.0 caused by changes in `xmlAddChild`. [#3156] @flavorjones

Expand Down
463 changes: 225 additions & 238 deletions lib/nokogiri/css/parser.rb

Large diffs are not rendered by default.

14 changes: 5 additions & 9 deletions lib/nokogiri/css/parser.y
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
class Nokogiri::CSS::Parser

token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS MINUS GREATER S STRING IDENT
token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS GREATER S STRING IDENT
token COMMA NUMBER PREFIXMATCH SUFFIXMATCH SUBSTRINGMATCH TILDE NOT_EQUAL
token SLASH DOUBLESLASH NOT EQUAL RPAREN LSQUARE RSQUARE HAS

Expand Down Expand Up @@ -143,17 +143,13 @@ rule
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
end
}
| IDENT PLUS NUMBER { # n+3
| IDENT PLUS NUMBER { # n+3, -n+3
if val[0] == 'n'
val.unshift("1")
result = Node.new(:NTH, val)
else
raise Racc::ParseError, "parse error on IDENT '#{val[0]}'"
end
}
| MINUS IDENT PLUS NUMBER { # -n+3
if val[1] == 'n'
val[0] = '-1'
elsif val[0] == '-n'
val[0] = 'n'
val.unshift("-1")
result = Node.new(:NTH, val)
else
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
Expand Down
7 changes: 2 additions & 5 deletions lib/nokogiri/css/tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,10 +63,10 @@
when (text = @ss.scan(/has\([\s]*/))
action { [:HAS, text] }

when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
Dismissed Show dismissed Hide dismissed
action { [:FUNCTION, text] }

when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
action { [:IDENT, text] }

when (text = @ss.scan(/\#([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))+/))
Expand Down Expand Up @@ -120,9 +120,6 @@
when (text = @ss.scan(/-?([0-9]+|[0-9]*\.[0-9]+)/))
action { [:NUMBER, text] }

when (text = @ss.scan(/[\s]*\-[\s]*/))
action { [:MINUS, text] }

when (text = @ss.scan(/[\s]*\/\/[\s]*/))
action { [:DOUBLESLASH, text] }

Expand Down
6 changes: 3 additions & 3 deletions lib/nokogiri/css/tokenizer.rex
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ macro
nmchar ([_A-Za-z0-9-]|{nonascii}|{escape})
nmstart ([_A-Za-z]|{nonascii}|{escape})
name {nmstart}{nmchar}*
ident -?{name}
charref {nmchar}+
string1 "([^\n\r\f"]|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*"
string2 '([^\n\r\f']|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*'
Expand All @@ -24,8 +25,8 @@ rule
# [:state] pattern [actions]

has\({w} { [:HAS, text] }
{name}\({w} { [:FUNCTION, text] }
{name} { [:IDENT, text] }
{ident}\({w} { [:FUNCTION, text] }
{ident} { [:IDENT, text] }
\#{charref} { [:HASH, text] }
{w}~={w} { [:INCLUDES, text] }
{w}\|={w} { [:DASHMATCH, text] }
Expand All @@ -43,7 +44,6 @@ rule
{w}~{w} { [:TILDE, text] }
\:not\({w} { [:NOT, text] }
{num} { [:NUMBER, text] }
{w}\-{w} { [:MINUS, text] }
{w}\/\/{w} { [:DOUBLESLASH, text] }
{w}\/{w} { [:SLASH, text] }

Expand Down
9 changes: 9 additions & 0 deletions lib/nokogiri/css/xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,8 @@ def visit_function(node)
is_direct = node.value[1].value[0].nil? # e.g. "has(> a)", "has(~ a)", "has(+ a)"
".#{"//" unless is_direct}#{node.value[1].accept(self)}"
else
validate_xpath_function_name(node.value.first)

# xpath function call, let's marshal those arguments
args = ["."]
args += node.value[1..-1].map do |n|
Expand Down Expand Up @@ -207,6 +209,7 @@ def visit_pseudo_class(node)
when "parent" then "node()"
when "root" then "not(parent::*)"
else
validate_xpath_function_name(node.value.first)
"nokogiri:#{node.value.first}(.)"
end
end
Expand Down Expand Up @@ -270,6 +273,12 @@ def accept(node)

private

def validate_xpath_function_name(name)
if name.start_with?("-")
raise Nokogiri::CSS::SyntaxError, "Invalid XPath function name '#{name}'"
end
end

def html5_element_name_needs_namespace_handling(node)
# if this is the wildcard selector "*", use it as normal
node.value.first != "*" &&
Expand Down
3 changes: 1 addition & 2 deletions test/css/test_tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -262,8 +262,7 @@ def test_scan_nth
[:IDENT, "x"],
[":", ":"],
[:FUNCTION, "nth-child("],
[:MINUS, "-"],
[:IDENT, "n"],
[:IDENT, "-n"],
[:PLUS, "+"],
[:NUMBER, "3"],
[:RPAREN, ")"],
Expand Down
3 changes: 2 additions & 1 deletion test/css/test_xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -372,7 +372,8 @@ def assert_xpath(expecteds, asts)

it "raises an exception for pseudo-classes that are not XPath Names" do
# see https://github.com/sparklemotion/nokogiri/issues/3193
assert_raises(Nokogiri::CSS::SyntaxError) { parser.parse("div:-moz-drag-over") }
assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over") }
assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over()") }
end
end

Expand Down