Skip to content

Commit

Permalink
fix: css pseudo-class error raised by XPathVisitor (#3223)
Browse files Browse the repository at this point in the history
**What problem is this PR intended to solve?**

After some thought, I decided I didn't like the solution to #3193
introduced in #3197, in which the exception was raised by the CSS
parser, because it's an XPath translation problem and not a CSS problem.

This PR reverts that change, and instead raises the exception from the
XPathVisitor class when it tries to translate the pseudo-class selector
or function into an XPath function call.


**Have you included adequate test coverage?**

Yes.


**Does this change affect the behavior of either the C or the Java
implementations?**

N/A
  • Loading branch information
flavorjones committed Jun 11, 2024
2 parents 09c35b5 + b918ce1 commit 47884cc
Show file tree
Hide file tree
Showing 8 changed files with 248 additions and 259 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
### Fixed

* `Node#clone`, `NodeSet#clone`, and `*::Document#clone` all properly copy the metaclass of the original as expected. Previously, `#clone` had been aliased to `#dup` for these classes (since v1.3.0 in 2009). [#316, #3117] @flavorjones
* CSS queries for pseudo-selectors that cannot be transpiled into XPath queries now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath query was created and a hard-to-understand XPath error was being raised by the query engine. [#3197] @flavorjones
* CSS queries for pseudo-selectors that cannot be translated into XPath expressions now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath expression was evaluated and a hard-to-understand XPath error was raised by the query engine. [#3193] @flavorjones
* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* [CRuby] Update node GC lifecycle to avoid a potential memory leak with fragments in libxml 2.13.0 caused by changes in `xmlAddChild`. [#3156] @flavorjones

Expand Down
463 changes: 225 additions & 238 deletions lib/nokogiri/css/parser.rb

Large diffs are not rendered by default.

14 changes: 5 additions & 9 deletions lib/nokogiri/css/parser.y
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
class Nokogiri::CSS::Parser

token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS MINUS GREATER S STRING IDENT
token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS GREATER S STRING IDENT
token COMMA NUMBER PREFIXMATCH SUFFIXMATCH SUBSTRINGMATCH TILDE NOT_EQUAL
token SLASH DOUBLESLASH NOT EQUAL RPAREN LSQUARE RSQUARE HAS

Expand Down Expand Up @@ -143,17 +143,13 @@ rule
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
end
}
| IDENT PLUS NUMBER { # n+3
| IDENT PLUS NUMBER { # n+3, -n+3
if val[0] == 'n'
val.unshift("1")
result = Node.new(:NTH, val)
else
raise Racc::ParseError, "parse error on IDENT '#{val[0]}'"
end
}
| MINUS IDENT PLUS NUMBER { # -n+3
if val[1] == 'n'
val[0] = '-1'
elsif val[0] == '-n'
val[0] = 'n'
val.unshift("-1")
result = Node.new(:NTH, val)
else
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
Expand Down
7 changes: 2 additions & 5 deletions lib/nokogiri/css/tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,10 +63,10 @@ def _next_token
when (text = @ss.scan(/has\([\s]*/))
action { [:HAS, text] }

when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
action { [:FUNCTION, text] }

when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
action { [:IDENT, text] }

when (text = @ss.scan(/\#([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))+/))
Expand Down Expand Up @@ -120,9 +120,6 @@ def _next_token
when (text = @ss.scan(/-?([0-9]+|[0-9]*\.[0-9]+)/))
action { [:NUMBER, text] }

when (text = @ss.scan(/[\s]*\-[\s]*/))
action { [:MINUS, text] }

when (text = @ss.scan(/[\s]*\/\/[\s]*/))
action { [:DOUBLESLASH, text] }

Expand Down
6 changes: 3 additions & 3 deletions lib/nokogiri/css/tokenizer.rex
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ macro
nmchar ([_A-Za-z0-9-]|{nonascii}|{escape})
nmstart ([_A-Za-z]|{nonascii}|{escape})
name {nmstart}{nmchar}*
ident -?{name}
charref {nmchar}+
string1 "([^\n\r\f"]|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*"
string2 '([^\n\r\f']|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*'
Expand All @@ -24,8 +25,8 @@ rule
# [:state] pattern [actions]
has\({w} { [:HAS, text] }
{name}\({w} { [:FUNCTION, text] }
{name} { [:IDENT, text] }
{ident}\({w} { [:FUNCTION, text] }
{ident} { [:IDENT, text] }
\#{charref} { [:HASH, text] }
{w}~={w} { [:INCLUDES, text] }
{w}\|={w} { [:DASHMATCH, text] }
Expand All @@ -43,7 +44,6 @@ rule
{w}~{w} { [:TILDE, text] }
\:not\({w} { [:NOT, text] }
{num} { [:NUMBER, text] }
{w}\-{w} { [:MINUS, text] }
{w}\/\/{w} { [:DOUBLESLASH, text] }
{w}\/{w} { [:SLASH, text] }
Expand Down
9 changes: 9 additions & 0 deletions lib/nokogiri/css/xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,8 @@ def visit_function(node)
is_direct = node.value[1].value[0].nil? # e.g. "has(> a)", "has(~ a)", "has(+ a)"
".#{"//" unless is_direct}#{node.value[1].accept(self)}"
else
validate_xpath_function_name(node.value.first)

# xpath function call, let's marshal those arguments
args = ["."]
args += node.value[1..-1].map do |n|
Expand Down Expand Up @@ -207,6 +209,7 @@ def visit_pseudo_class(node)
when "parent" then "node()"
when "root" then "not(parent::*)"
else
validate_xpath_function_name(node.value.first)
"nokogiri:#{node.value.first}(.)"
end
end
Expand Down Expand Up @@ -270,6 +273,12 @@ def accept(node)

private

def validate_xpath_function_name(name)
if name.start_with?("-")
raise Nokogiri::CSS::SyntaxError, "Invalid XPath function name '#{name}'"
end
end

def html5_element_name_needs_namespace_handling(node)
# if this is the wildcard selector "*", use it as normal
node.value.first != "*" &&
Expand Down
3 changes: 1 addition & 2 deletions test/css/test_tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -262,8 +262,7 @@ def test_scan_nth
[:IDENT, "x"],
[":", ":"],
[:FUNCTION, "nth-child("],
[:MINUS, "-"],
[:IDENT, "n"],
[:IDENT, "-n"],
[:PLUS, "+"],
[:NUMBER, "3"],
[:RPAREN, ")"],
Expand Down
3 changes: 2 additions & 1 deletion test/css/test_xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -372,7 +372,8 @@ def assert_xpath(expecteds, asts)

it "raises an exception for pseudo-classes that are not XPath Names" do
# see https://github.com/sparklemotion/nokogiri/issues/3193
assert_raises(Nokogiri::CSS::SyntaxError) { parser.parse("div:-moz-drag-over") }
assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over") }
assert_raises(Nokogiri::CSS::SyntaxError) { Nokogiri::CSS.xpath_for("div:-moz-drag-over()") }
end
end

Expand Down

0 comments on commit 47884cc

Please sign in to comment.