Skip to content

Commit

Permalink
fix: raise CSS::SyntaxError if a pseudo-class is not an XPath Name
Browse files Browse the repository at this point in the history
Some pseudo-classes cannot be converted into an XPath function name,
and libxml2 will raise an Nokogiri::XML::XPath::SyntaxError at
query-time:

    nokogiri/xml/searchable.rb:238:in `evaluate': ERROR: Invalid expression: //*:div[nokogiri:-moz-drag-over(.)] (Nokogiri::XML::XPath::SyntaxError)

This change moves the error from query-time to parse-time, in the
hopes that this is more rescuable (and the error is more descriptive):

    nokogiri/css/parser_extras.rb:86:in `on_error': unexpected '-' after ':' (Nokogiri::CSS::SyntaxError)

Closes #3193
  • Loading branch information
flavorjones committed May 24, 2024
1 parent 3d3238c commit d3a60cb
Show file tree
Hide file tree
Showing 7 changed files with 264 additions and 237 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,9 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA

### Fixed

* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* `Node#clone`, `NodeSet#clone`, and `*::Document#clone` all properly copy the metaclass of the original as expected. Previously, `#clone` had been aliased to `#dup` for these classes (since v1.3.0 in 2009). [#316, #3117] @flavorjones
* CSS queries for pseudo-selectors that cannot be transpiled into XPath queries now raise a more descriptive `Nokogiri::CSS::SyntaxError` when they are parsed. Previously, an invalid XPath query was created and a hard-to-understand XPath error was being raised by the query engine. [#3197] @flavorjones
* [CRuby] libgumbo (the HTML5 parser) treats reaching max-depth as EOF. This addresses a class of issues when the parser is interrupted in this way. [#3121] @stevecheckoway
* [CRuby] Update node GC lifecycle to avoid a potential memory leak with fragments in libxml 2.13.0 caused by changes in `xmlAddChild`. [#3156] @flavorjones


Expand Down
463 changes: 238 additions & 225 deletions lib/nokogiri/css/parser.rb

Large diffs are not rendered by default.

14 changes: 9 additions & 5 deletions lib/nokogiri/css/parser.y
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
class Nokogiri::CSS::Parser

token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS GREATER S STRING IDENT
token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS MINUS GREATER S STRING IDENT
token COMMA NUMBER PREFIXMATCH SUFFIXMATCH SUBSTRINGMATCH TILDE NOT_EQUAL
token SLASH DOUBLESLASH NOT EQUAL RPAREN LSQUARE RSQUARE HAS

Expand Down Expand Up @@ -143,13 +143,17 @@ rule
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
end
}
| IDENT PLUS NUMBER { # n+3, -n+3
| IDENT PLUS NUMBER { # n+3
if val[0] == 'n'
val.unshift("1")
result = Node.new(:NTH, val)
elsif val[0] == '-n'
val[0] = 'n'
val.unshift("-1")
else
raise Racc::ParseError, "parse error on IDENT '#{val[0]}'"
end
}
| MINUS IDENT PLUS NUMBER { # -n+3
if val[1] == 'n'
val[0] = '-1'
result = Node.new(:NTH, val)
else
raise Racc::ParseError, "parse error on IDENT '#{val[1]}'"
Expand Down
7 changes: 5 additions & 2 deletions lib/nokogiri/css/tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,10 +63,10 @@ def _next_token
when (text = @ss.scan(/has\([\s]*/))
action { [:HAS, text] }

when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*\([\s]*/))
action { [:FUNCTION, text] }

when (text = @ss.scan(/-?([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
when (text = @ss.scan(/([_A-Za-z]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*/))
action { [:IDENT, text] }

when (text = @ss.scan(/\#([_A-Za-z0-9-]|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))+/))
Expand Down Expand Up @@ -120,6 +120,9 @@ def _next_token
when (text = @ss.scan(/-?([0-9]+|[0-9]*\.[0-9]+)/))
action { [:NUMBER, text] }

when (text = @ss.scan(/[\s]*\-[\s]*/))
action { [:MINUS, text] }

when (text = @ss.scan(/[\s]*\/\/[\s]*/))
action { [:DOUBLESLASH, text] }

Expand Down
6 changes: 3 additions & 3 deletions lib/nokogiri/css/tokenizer.rex
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@ macro
nmchar ([_A-Za-z0-9-]|{nonascii}|{escape})
nmstart ([_A-Za-z]|{nonascii}|{escape})
name {nmstart}{nmchar}*
ident -?{name}
charref {nmchar}+
string1 "([^\n\r\f"]|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*"
string2 '([^\n\r\f']|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*'
Expand All @@ -25,8 +24,8 @@ rule
# [:state] pattern [actions]
has\({w} { [:HAS, text] }
{ident}\({w} { [:FUNCTION, text] }
{ident} { [:IDENT, text] }
{name}\({w} { [:FUNCTION, text] }
{name} { [:IDENT, text] }
\#{charref} { [:HASH, text] }
{w}~={w} { [:INCLUDES, text] }
{w}\|={w} { [:DASHMATCH, text] }
Expand All @@ -44,6 +43,7 @@ rule
{w}~{w} { [:TILDE, text] }
\:not\({w} { [:NOT, text] }
{num} { [:NUMBER, text] }
{w}\-{w} { [:MINUS, text] }
{w}\/\/{w} { [:DOUBLESLASH, text] }
{w}\/{w} { [:SLASH, text] }
Expand Down
3 changes: 2 additions & 1 deletion test/css/test_tokenizer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,8 @@ def test_scan_nth
[:IDENT, "x"],
[":", ":"],
[:FUNCTION, "nth-child("],
[:IDENT, "-n"],
[:MINUS, "-"],
[:IDENT, "n"],
[:PLUS, "+"],
[:NUMBER, "3"],
[:RPAREN, ")"],
Expand Down
5 changes: 5 additions & 0 deletions test/css/test_xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -369,6 +369,11 @@ def assert_xpath(expecteds, asts)
assert_xpath("//*[not(@id='foo')]", parser.parse(":not(#foo)"))
assert_xpath("//*[count(preceding-sibling::*)=0]", parser.parse(":first-child"))
end

it "raises an exception for pseudo-classes that are not XPath Names" do
# see https://github.com/sparklemotion/nokogiri/issues/3193
assert_raises(Nokogiri::CSS::SyntaxError) { parser.parse("div:-moz-drag-over") }
end
end

describe "combinators" do
Expand Down

0 comments on commit d3a60cb

Please sign in to comment.