
Commit 2ca9c08

NodePattern compiler complete rewrite. Add support for multiple variadic terms.
1 parent 12f8a81 commit 2ca9c08

31 files changed, 1880 additions, 881 deletions

.github/workflows/rubocop.yml

Lines changed: 3 additions & 1 deletion
@@ -74,7 +74,7 @@ jobs:
 run: bundle exec rake spec
 - name: internal investigation
 if: matrix.internal_investigation
-run: bundle exec rake internal_investigation
+run: bundle exec rake generate internal_investigation
 rubocop_specs:
 name: >-
 Main Gem Specs | RuboCop: ${{ matrix.rubocop }} | ${{ matrix.ruby }} (${{ matrix.os }})
@@ -98,6 +98,8 @@ jobs:
 ruby-version: ${{ matrix.ruby }}
 - name: install dependencies
 run: bundle install --jobs 3 --retry 3
+- name: generate lexer and parser
+run: bundle exec rake generate
 - name: clone rubocop from source for full specs -- master
 if: matrix.rubocop == 'master'
 run: git clone --branch ${{ matrix.rubocop }} https://github.com/rubocop-hq/rubocop.git ../rubocop

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+# generated parser / lexer
+/lib/rubocop/ast/node_pattern/parser.racc.rb
+/lib/rubocop/ast/node_pattern/parser.output
+/lib/rubocop/ast/node_pattern/lexer.rex.rb
+
 # rcov generated
 coverage
 coverage.data

.rubocop.yml

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,9 @@ AllCops:
 - 'spec/fixtures/**/*'
 - 'tmp/**/*'
 - '.git/**/*'
+- 'lib/rubocop/ast/node_pattern/parser.racc.rb'
+- 'lib/rubocop/ast/node_pattern/lexer.rex.rb'
+- 'spec/rubocop/ast/node_pattern/parse_helper.rb'
 TargetRubyVersion: 2.4
 
 Naming/PredicateName:

.rubocop_todo.yml

Lines changed: 3 additions & 1 deletion
@@ -32,7 +32,7 @@ Metrics/MethodLength:
 # Offense count: 1
 # Configuration parameters: CountComments.
 Metrics/ModuleLength:
-Max: 101
+Max: 108
 
 # Offense count: 1
 # Configuration parameters: ExpectMatchingDefinition, Regex, IgnoreExecutableScripts, AllowedAcronyms.
@@ -65,6 +65,7 @@ RSpec/ContextWording:
 - 'spec/rubocop/ast/resbody_node_spec.rb'
 - 'spec/rubocop/ast/token_spec.rb'
 - 'spec/spec_helper.rb'
+- 'spec/rubocop/ast/node_pattern/helper.rb'
 
 # Offense count: 6
 # Configuration parameters: Max.
@@ -73,6 +74,7 @@ RSpec/ExampleLength:
 - 'spec/rubocop/ast/node_pattern_spec.rb'
 - 'spec/rubocop/ast/processed_source_spec.rb'
 - 'spec/rubocop/ast/send_node_spec.rb'
+- 'spec/rubocop/ast/node_pattern/parser_spec.rb'
 
 # Offense count: 6
 RSpec/LeakyConstantDeclaration:

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
 
 ## master (unreleased)
 
+### New features
+
+* [#105](https://github.com/rubocop-hq/rubocop-ast/pull/105): `NodePattern` compiler [complete rewrite](https://docs.rubocop.org/rubocop-ast/node_pattern_compiler.html). Add support for multiple variadic terms. ([@marcandre][])
+
 ## 0.6.0 (2020-09-26)
 
 ### New features

Gemfile

Lines changed: 3 additions & 1 deletion
@@ -5,8 +5,10 @@ source 'https://rubygems.org'
 gemspec
 
 gem 'bump', require: false
+gem 'oedipus_lex', require: false
 gem 'pry'
-gem 'rake', '~> 12.0'
+gem 'racc'
+gem 'rake', '~> 13.0'
 gem 'rspec', '~> 3.7'
 local_ast = File.expand_path('../rubocop', __dir__)
 if Dir.exist? local_ast

Rakefile

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ end
 
 require 'rspec/core/rake_task'
 
-RSpec::Core::RakeTask.new(:spec) do |spec|
+RSpec::Core::RakeTask.new(spec: :generate) do |spec|
 spec.pattern = FileList['spec/**/*_spec.rb']
 end
 
docs/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 * xref:installation.adoc[Installation]
 * xref:node_types.adoc[Node Types]
 * xref:node_pattern.adoc[Node Pattern]
+* xref:node_pattern_compiler.adoc[Node Pattern Compiler]

Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
= Hacker's guide to the `NodePattern` compiler

This documentation is aimed at anyone wanting to understand or modify the `NodePattern` compiler.
It assumes some familiarity with the syntax of https://github.com/rubocop-hq/rubocop-ast/blob/master/doc/modules/ROOT/pages/node_pattern.md[`NodePattern`], as well as with the AST produced by the `parser` gem.

== High level view

The `NodePattern` compiler uses the same techniques as the `parser` gem:

* a `Lexer` that breaks the source into tokens
* a `Parser` that uses the tokens and a `Builder` to emit an AST
* a `Compiler` that converts this AST into Ruby code

Example:

* Pattern: `+(send nil? {:puts :p} $...)+`
* Tokens: `+'(', [:tNODE_TYPE, :send], [:tPREDICATE, :nil?], '{', ...+`
* AST: `+s(:sequence, s(:node_type, :send), s(:predicate, :nil?), s(:union, ...+`
* Ruby code:
+
[source,ruby]
----
node.is_a?(::RuboCop::AST::Node) && node.children.size >= 2 &&
node.send_type? &&
node.children[0].nil?() &&
(union2 = node.children[1]; ...
----

The different parts are described below.

== Vocabulary

*"node pattern"*: something that can be matched against a single `AST::Node`.
While `(int 42)` and `#is_fun?` both correspond to node patterns, `+...+` (without the parentheses) is not a node pattern.

*"sequence"*: a node pattern that describes the sequence of children of a node (and its type): `+(type first_child second_child ...)+`

*"variadic"*: an element of a sequence that can match a variable number of children.
`+(send _ int* ...)+` has two variadic elements (`int*` and `+...+`).
`(send _ :name)` contains no variadic element.
Note that a sequence is itself never variadic.

*"atom"*: an element of a pattern that corresponds to a simple Ruby object.
`(send nil? :puts (str 'hello'))` has two atoms: `:puts` and `'hello'`.

== Lexer

`lexer.rb` defines `Lexer` and contains the few definitions needed for the lexer to work.
The bulk of the processing is in the inherited class that is generated by https://github.com/seattlerb/oedipus_lex[`oedipus_lex`].

[discrete]
==== Rules

https://github.com/seattlerb/oedipus_lex[`oedipus_lex`] generates the Ruby file `lexer.rex.rb` from the rules defined in `lexer.rex`.

These rules map a Regexp to code that emits a token.

`oedipus_lex` aims to be simple, and the generated file is readable.
It uses https://ruby-doc.org/stdlib-2.7.1/libdoc/strscan/rdoc/StringScanner.html[`StringScanner`] behind the scenes.
It selects the first rule that matches, unlike many lexing tools, which prioritize the longest match.
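
To see why first-match selection makes rule order significant, here is a minimal sketch of a rule-driven lexer built directly on `StringScanner`. It is not the code `oedipus_lex` generates from `lexer.rex`: the rule table, the `:punct` marker and the `first_match_tokenize` helper are hypothetical. With the predicate rule listed before the node-type rule, `nil?` becomes a single `:tPREDICATE` token; swapping the two rules makes the lexer stop after `nil`.

```ruby
require 'strscan'

# Hypothetical rule table: each entry maps a Regexp to the token type it emits.
# Order matters: the first rule that matches at the current position wins,
# even if a later rule would match a longer piece of input.
RULES = [
  [/\s+/,       nil],          # whitespace: skipped, no token emitted
  [/[a-z_]+\?/, :tPREDICATE],  # e.g. nil?
  [/[a-z_]+/,   :tNODE_TYPE],  # e.g. send
  [/[()?]/,     :punct]        # hypothetical marker for single-character symbols
].freeze

def first_match_tokenize(source, rules = RULES)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    # StringScanner#scan only matches at the current position; try rules in order.
    matched = rules.any? do |regexp, type|
      text = scanner.scan(regexp)
      next false unless text

      if type == :punct
        tokens << [text, text] # syntactic symbols use the string itself as type
      elsif type
        tokens << [type, text.to_sym]
      end
      true
    end
    raise ArgumentError, "unexpected input at #{scanner.pos}" unless matched
  end
  tokens
end

p first_match_tokenize('(send nil?)')
# => [["(", "("], [:tNODE_TYPE, :send], [:tPREDICATE, :nil?], [")", ")"]]

# With the node-type rule first, `nil?` is split into `nil` and `?`.
p first_match_tokenize('(send nil?)', [RULES[0], RULES[2], RULES[1], RULES[3]])
# => [["(", "("], [:tNODE_TYPE, :send], [:tNODE_TYPE, :nil], ["?", "?"], [")", ")"]]
```

A longest-match lexer would pick `nil?` regardless of order; with first-match, the order of the rules in the rule file is what encodes the priority.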

[discrete]
==== Tokens

The `Lexer` emits tokens with types that are:

* strings for the syntactic symbols (e.g. `'('`, `'$'`, `+'...'+`)
* symbols of the form `:tTOKEN_TYPE` for the rest (e.g. `:tPREDICATE`)

Tokens are stored as `[type, value]`.

[discrete]
==== Generation

Use `rake generate:lexer` to generate `lexer.rex.rb` from the `lexer.rex` file.
This is done automatically by `rake spec`.

NOTE: `lexer.rex.rb` is not under source control, but is included in the gem.

== Parser

Similar to the `Lexer`, `parser.rb` defines `Parser` and contains the few definitions needed for the parser to work.
The bulk of the processing is in the inherited class `parser.racc.rb` that is generated by https://ruby-doc.org/stdlib-2.7.0/libdoc/racc/parser/rdoc/Racc.html#module-Racc-label-Writing+A+Racc+Grammar+File[`racc`] from the rules in `parser.y`.

[discrete]
==== Nodes

The `Parser` emits `NodePattern::Node` instances, which are similar to RuboCop's nodes.
They both inherit from ``parser``'s `Parser::AST::Node` and share additional methods too.

As with RuboCop's nodes, some nodes have specialized classes (e.g. `Sequence`) while other nodes use the base class directly (e.g. `s(:number, 42)`).

[discrete]
==== Rules

The rules closely follow the definitions above.
In particular, they distinguish between `node_pattern_list`, a list of node patterns (each term matches a single node), and the more generic `variadic_pattern_list`, a list of elements, some of which may be variadic while others are simple node patterns.

[discrete]
==== Generation

As with the lexer, use `rake generate:parser` to generate `parser.racc.rb` from the `parser.y` file.
This is done automatically by `rake spec`.

NOTE: `parser.racc.rb` is not under source control, but is included in the gem.

== Compiler

The compiler's core is the `Compiler` class.
It holds the global state (e.g. references to named arguments).
The goal of the compiler is to produce `matching_code`: Ruby code that can be run against an `AST::Node`, or against any Ruby object for that matter.

Packaging of that `matching_code` into code for a `lambda` or a method `def` is handled separately by the `MethodDefiner` module.
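
The separation between producing `matching_code` and packaging it can be sketched as follows. This is not `MethodDefiner`'s interface: the helpers `as_lambda_source` and `as_def_source`, the `Matchers` class and the hand-written matching code (which uses a plain `Array` in place of an AST node) are all hypothetical. The point is that the same expression of `node` is reused unchanged inside both wrappers.

```ruby
# Matching code as the compiler might emit it: an expression of `node`.
# (Written by hand here; a plain Array stands in for an AST node.)
MATCHING_CODE = 'node.is_a?(Array) && node.size >= 2 && node[0] == :send'

# Hypothetical packaging helpers: one piece of matching code, two wrappers.
def as_lambda_source(matching_code)
  "->(node) { #{matching_code} }"
end

def as_def_source(method_name, matching_code)
  <<~RUBY
    def #{method_name}(node)
      #{matching_code}
    end
  RUBY
end

# Package the code as a lambda...
SEND_MATCHER = eval(as_lambda_source(MATCHING_CODE))

# ...and as an instance method of a class.
class Matchers
  class_eval(as_def_source(:send_like?, MATCHING_CODE))
end

p SEND_MATCHER.call([:send, nil, :puts]) # => true
p Matchers.new.send_like?([:int, 42])    # => false
```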

The compilation itself is handled by three subcompilers:

* `NodePatternSubcompiler`
* `AtomSubcompiler`
* `SequenceSubcompiler`

=== Visitors

The subcompilers use the https://en.wikipedia.org/wiki/Visitor_pattern[visitor pattern].

The methods starting with `visit_` are used to process the different types of nodes.
For a node of type `:capture`, the method `visit_capture` is called if it is defined; otherwise `visit_other_type` is called.

No argument is passed, as the visited node is accessible through the `node` attribute reader.
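
The dispatch rule can be sketched in a few lines. `ToyNode` and `ToySubcompiler` are hypothetical stand-ins, not the actual `Subcompiler` class: the sketch builds the method name `visit_<type>`, falls back to `visit_other_type` when no such method exists, and lets the visit methods read the current node through `node`. Saving and restoring `@node` around each call is a choice made here so that a visitor can recurse into children and still see its own node afterwards.

```ruby
# Minimal stand-in for a NodePattern::Node: only `type` and `children`.
ToyNode = Struct.new(:type, :children)

class ToySubcompiler
  attr_reader :node

  def compile(node)
    previous = @node
    @node = node
    method_name = :"visit_#{node.type}"
    # Use a dedicated visitor when one exists, otherwise the generic fallback.
    if respond_to?(method_name, true)
      send(method_name)
    else
      visit_other_type
    end
  ensure
    @node = previous # restore the caller's node after recursing
  end

  private

  def visit_capture
    "capture(#{compile(node.children.first)})"
  end

  def visit_other_type
    "other(#{node.type})"
  end
end

tree = ToyNode.new(:capture, [ToyNode.new(:node_type, [:send])])
puts ToySubcompiler.new.compile(tree) # prints "capture(other(node_type))"
```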

=== NodePatternSubcompiler

Given any `NodePattern::Node`, it generates Ruby code that returns `true` or `false` for the given node (or for its type, when in sequence head).

==== `var` vs `access`

The subcompiler can be called with the current node stored either in a variable (provided with the `var:` keyword argument) or via a Ruby expression (e.g. `access: 'current_node.children[2]'`).

The subcompiler will not generate code that evaluates this `access` expression more than once or twice.
If it might access the node more often than that, `multiple_access` stores the result in a temporary variable (e.g. `union`).
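
The idea behind `multiple_access` can be illustrated with a small sketch. The `node_code` helper, its `uses` threshold and the temporary name `union1` are hypothetical; the generated shape mirrors the `(union2 = node.children[1]; ...` fragment from the high level example. A counting stand-in node confirms that the access expression runs only once when the temporary is used.

```ruby
# Hypothetical helper: wraps code that refers to the current node `uses` times.
# Up to two references reuse the access expression directly; beyond that, the
# expression is evaluated once into a temporary variable.
def node_code(access, uses)
  return yield(access) if uses <= 2

  tmp = 'union1' # illustrative temporary variable name
  "(#{tmp} = #{access}; #{yield(tmp)})"
end

CODE = node_code('node.children[1]', 3) { |n| "#{n} == :sym || #{n} == :str || #{n} == :int" }
puts CODE
# (union1 = node.children[1]; union1 == :sym || union1 == :str || union1 == :int)

# A stand-in node that counts how many times `children` is read.
CountingNode = Struct.new(:reads) do
  def children
    self.reads += 1
    [nil, :int]
  end
end

node = CountingNode.new(0)
p eval(CODE) # => true
p node.reads # => 1, the access expression ran only once
```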

==== Sequences

Sequences are the most difficult elements to handle and are delegated to the `SequenceSubcompiler`.

==== Atoms

Atoms are handled with `visit_other_type`, which defers to the `AtomSubcompiler` and converts that result to a node pattern by appending `=== cur_node` (or `=== cur_node.type` if in sequence head).

This way, the two arguments in `(_ #func?(%1) %2)` are compiled differently:
`%1` is compiled as `param1`, while `%2` is compiled as `param2 === node.children[1]`.
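
The `===` in `param2 === node.children[1]` is Ruby's case-equality operator, which is what lets a parameter be a literal, a class, a range or a callable. The snippet below only exercises plain Ruby semantics; the `PARAMS` table and the `case_equality_results` helper are made up for the demonstration, with each value playing the role of `param2`.

```ruby
# Each value plays the role of `param2` in `param2 === node.children[1]`.
PARAMS = {
  literal: 42,                # Integer#=== compares values
  klass: Integer,             # Module#=== checks is_a?
  range: (40..50),            # Range#=== checks inclusion
  callable: ->(x) { x.even? } # Proc#=== calls the proc with the operand
}.freeze

def case_equality_results(child)
  PARAMS.transform_values { |param| param === child }
end

# For 42 every check succeeds; for 7 only the `klass` check does.
p case_equality_results(42)
p case_equality_results(7)
```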

==== Precedence

The generated code has precedence higher than or equal to that of `&&`, so that it can be conveniently chained.
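
Why the precedence requirement matters: when fragments are chained with `&&`, a fragment containing `||` (for instance, one coming from a union) needs parentheses, because `&&` binds tighter than `||`. The fragments, the `evaluate` helper and the `Array` standing in for a node are all hypothetical.

```ruby
# Hypothetical fragments of generated code; an Array stands in for the node.
TYPE_CHECK = 'node[0] == :send'
UNION      = 'node[2] == :puts || node[2] == :p' # contains `||`
SIZE_CHECK = 'node.size == 4'

NAIVE   = [TYPE_CHECK, UNION, SIZE_CHECK].join(' && ')
GUARDED = [TYPE_CHECK, "(#{UNION})", SIZE_CHECK].join(' && ')

# Evaluate a code string with `node` bound as a local variable.
def evaluate(code, node)
  eval(code)
end

node = [:send, nil, :puts] # size 3, so SIZE_CHECK is false
p evaluate(NAIVE, node)   # => true: parsed as (a && b) || (c && d)
p evaluate(GUARDED, node) # => false: every fragment is required
```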

=== AtomSubcompiler

This subcompiler produces Ruby code that evaluates to a Ruby object, e.g. `"42"`, `:a_symbol`, `param1`.

A good way to think about it is as the code needed when a value has to be passed as an argument to a function call.
For example:

[source,ruby]
----
# Pattern '#func(42, %1)' compiles to
func(node, 42, param1)
----

Note that any node pattern can be output by this subcompiler, but those that don't correspond to a Ruby literal are output as a lambda so they can be combined.
For example:

[source,ruby]
----
# Pattern '#func(int)' compiles to
func(node, ->(compare) { compare.is_a?(::RuboCop::AST::Node) && compare.int_type? })
----
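
The lambda form fits naturally because Ruby's `Proc#===` calls the proc, so a helper that compares with `===` accepts literals and compiled sub-patterns alike. In this sketch `func` is a hypothetical user-defined helper (the `#func` of the pattern) that checks whether any child matches its argument, and the `FakeNode` struct stands in for `RuboCop::AST::Node`.

```ruby
# Stand-in for an AST node, with just enough API for the lambda below.
FakeNode = Struct.new(:type, :children) do
  def int_type?
    type == :int
  end
end

# Hypothetical `#func` helper: true when any child matches `matcher`,
# where `matcher` may be a literal, a class, or a compiled lambda.
def func(node, matcher)
  node.children.any? { |child| matcher === child }
end

# What `#func(int)` would pass, with FakeNode replacing the real node class.
INT_MATCHER = ->(compare) { compare.is_a?(FakeNode) && compare.int_type? }

parent = FakeNode.new(:array, [FakeNode.new(:int, [1]), FakeNode.new(:str, ['a'])])
p func(parent, INT_MATCHER) # => true
p func(parent, :sym)        # => false
```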

=== SequenceSubcompiler

The subcompiler compiles the sequence's terms in turn, keeping track of which children of the `AST::Node` are being matched.

==== Variadic terms

The complexity comes from variadic elements, which require complex processing _and_ may make it impossible to know at compile time which children are matched by the subsequent terms.

*Example* (no variadic terms)

----
(_type int _ str)
----

The first child must match `int` and the third child must match `str`.
The subcompiler will use `children[0]` and `children[2]`.

*Example* (one variadic term)

----
(_type int _* str)
----

The first child must match `int` and the _last_ child must match `str`.
The subcompiler will use `children[0]` and `children[-1]`.
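
The index arithmetic of these two examples can be sketched as follows. `child_indices` and `variadic?` are hypothetical helpers, not the compiler's code. They take the terms after the sequence head, treat only `+...+` and terms ending in `*` or `+` as variadic (a simplification of the real grammar), and, assuming at most one variadic term, count from the start before it and from the end after it. The `_` wildcard is dropped since it needs no check.

```ruby
# Simplified variadic test for this sketch: `...`, `x*` and `x+` only.
def variadic?(term)
  term == '...' || term.end_with?('*', '+')
end

# Hypothetical helper: fixed `children[]` index for each non-variadic term,
# assuming the sequence holds at most one variadic term.
def child_indices(terms)
  split = terms.index { |t| variadic?(t) }
  pairs =
    if split.nil?
      terms.each_with_index.to_a
    else
      after = terms[(split + 1)..-1]
      terms[0...split].each_with_index.to_a +
        after.each_with_index.map { |t, i| [t, i - after.size] }
    end
  pairs.reject { |term, _| term == '_' } # `_` matches anything: no check needed
end

p child_indices(%w[int _ str])  # => [["int", 0], ["str", 2]]
p child_indices(%w[int _* str]) # => [["int", 0], ["str", -1]]
```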

*Example* (multiple variadic terms)

----
(_type int+ sym str+)
----

The subcompiler cannot use a fixed integer index into `children[]` to match `sym`.
This must be tracked at runtime in a variable (`cur_index`).

The subcompiler will use fixed indices before the first variadic element and after the last one.
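
For `(_type int+ sym str+)` there are no fixed indices at all, so a runtime cursor does all the work. The sketch below is hand-written Ruby, not compiler output: `match_int_sym_str?` is hypothetical, the children are plain symbols standing in for child node types, and the repetitions are matched greedily without backtracking, which is enough here only because `int`, `sym` and `str` match disjoint children.

```ruby
# Hand-written matcher for `(_type int+ sym str+)`; `children` holds the
# types of the child nodes. Greedy, no backtracking (enough for this pattern).
def match_int_sym_str?(children)
  cur_index = 0

  # int+ : one or more :int children
  start = cur_index
  cur_index += 1 while children[cur_index] == :int
  return false if cur_index == start

  # sym : exactly one child, at a position only known at runtime
  return false unless children[cur_index] == :sym

  cur_index += 1

  # str+ : one or more :str children, which must reach the last child
  start = cur_index
  cur_index += 1 while children[cur_index] == :str
  cur_index > start && cur_index == children.size
end

p match_int_sym_str?(%i[int sym str])             # => true
p match_int_sym_str?(%i[int int int sym str str]) # => true
p match_int_sym_str?(%i[sym str])                 # => false (int+ needs a child)
p match_int_sym_str?(%i[int sym str int])         # => false (extra trailing child)
```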

==== Node pattern terms

The node pattern terms are delegated to the `NodePatternSubcompiler`.

In the pattern `(:sym :sym)`, the two `:sym` terms are compiled differently because the first one is in "sequence head": they produce `:sym === node.type` and `:sym == node.children[0]` respectively.
The subcompiler indicates whether or not the pattern is in "sequence head", so that the `NodePatternSubcompiler` can produce the right code.

Variadic elements may not (currently) cover the sequence head.
As a convenience, `+(...)+` is understood as `+(_ ...)+`.
Other types of nodes will raise an error (e.g. `(<will not compile>)`; see `Node#in_sequence_head`).

==== Precedence

Like the node pattern subcompiler, it generates code that has precedence higher than or equal to that of `&&`, so that it can be conveniently chained.

lib/rubocop/ast.rb

Lines changed: 12 additions & 0 deletions
@@ -6,8 +6,20 @@
 
 require_relative 'ast/ext/range'
 require_relative 'ast/ext/set'
+require_relative 'ast/node_pattern/method_definer'
 require_relative 'ast/node_pattern'
 require_relative 'ast/node/mixin/descendence'
+require_relative 'ast/node_pattern/builder'
+require_relative 'ast/node_pattern/comment'
+require_relative 'ast/node_pattern/compiler'
+require_relative 'ast/node_pattern/compiler/subcompiler'
+require_relative 'ast/node_pattern/compiler/atom_subcompiler'
+require_relative 'ast/node_pattern/compiler/binding'
+require_relative 'ast/node_pattern/compiler/node_pattern_subcompiler'
+require_relative 'ast/node_pattern/compiler/sequence_subcompiler'
+require_relative 'ast/node_pattern/lexer'
+require_relative 'ast/node_pattern/node'
+require_relative 'ast/node_pattern/parser'
 require_relative 'ast/sexp'
 require_relative 'ast/node'
 require_relative 'ast/node/mixin/method_identifier_predicates'
