1.11.0-dev.217 | 2024-06-19 09:29:16 +0200
* Fix lints reported by clang-tidy-18 (Benjamin Bannier, Corelight)
1.11.0-dev.202 | 2024-06-18 12:31:36 +0200
* GH-1763: Only allow creation of `const` variables from literals. (Benjamin Bannier, Corelight)
Since it is hard to enforce that `const` variables are initialized in
the correct order (declaration order usually is not significant in
Spicy/HILTI, which works fine for `global`s, for example, but not for
`const`s due to how code is generated), this patch disallows creating
`const` values from anything but literals. This completely removes the
ordering issue.
1.11.0-dev.200 | 2024-06-14 10:56:04 +0200
* GH-1759: Fix `if`-condition with `switch` parsing. (Robin Sommer, Corelight)
The parser generator was ignoring `if` conditions attached to `switch`
constructs. While we actually had a test for this already, it turns out
we had recorded a broken baseline. Plus, we were testing only one
variant of `switch` (expression-based, not look-ahead-based). This
implements and tests both variants now.
* Fix clang-tidy warnings. (Robin Sommer, Corelight)
1.11.0-dev.197 | 2024-06-13 12:43:33 +0200
* GH-1750: Add `to_real` method to `bytes`. (Robin Sommer, Corelight)
This interprets the data as representing an ASCII-encoded floating-point
number and converts it into a ``real``. The data can be in either
decimal or hexadecimal format. If it cannot be parsed as either, it
throws an `InvalidValue` exception.
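A minimal C++ sketch of the conversion semantics described above, assuming `std::strtod` as the underlying parser (it accepts both decimal and hexadecimal floating-point notation). `to_real` and `InvalidValue` here are hypothetical stand-ins, not the Spicy runtime's actual implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for the runtime's InvalidValue exception.
struct InvalidValue : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// Sketch: interpret `data` as an ASCII floating-point number in decimal
// ("3.5") or hexadecimal ("0x1.8p1") notation; std::strtod accepts both.
double to_real(std::string_view data) {
    std::string buf(data); // strtod needs a NUL-terminated string
    if (buf.empty())
        throw InvalidValue("empty input");

    errno = 0;
    char* end = nullptr;
    double result = std::strtod(buf.c_str(), &end);

    // Reject partial parses (e.g., "1.5abc") and out-of-range values.
    if (end != buf.c_str() + buf.size() || errno == ERANGE)
        throw InvalidValue("not a valid real: " + buf);

    return result;
}
```

Unlike a strict parser, `std::strtod` also skips leading whitespace and accepts `inf`/`nan`, so the real runtime may reject inputs that this sketch allows.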
* GH-1608: Add `get_optional` method to maps. (Robin Sommer, Corelight)
This returns an optional that contains the map's element for the given
key if that entry exists, and is unset otherwise.
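For illustration, these semantics map onto C++17's `std::optional` roughly as follows. `get_optional` below is a hypothetical free function over `std::map`, not the HILTI runtime's map type.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Sketch of `get_optional` semantics: return the value stored for `key`
// if present, otherwise an unset optional. A miss never inserts an entry.
template <typename K, typename V>
std::optional<V> get_optional(const std::map<K, V>& m, const K& key) {
    if (auto it = m.find(key); it != m.end())
        return it->second;
    return std::nullopt;
}
```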
* Update some debug baselines. (Robin Sommer, Corelight)
1.11.0-dev.193 | 2024-06-13 12:10:22 +0200
* GH-1760: Fix generated code for huge `const` collections. (Benjamin Bannier, Corelight)
1.11.0-dev.191 | 2024-06-13 09:59:22 +0200
* GH-1598: Enforce that the argument to `new` is either a type or a
ctor. (Robin Sommer, Corelight)
So far we allowed some more generic expressions as well, but it's hard
for the parser to support arbitrary expressions here due to parsing
ambiguities, which left things inconsistent. So we now limit it to what
was pretty much the original intent anyway.
Note that the error message for #1598 stays the same: it's not great,
but seems good enough. However, we now actually disallow the
workaround shown in the ticket as well for consistency. The new
workaround is shown in the changes to `hilti.codegen.type-info`.
* GH-90/GH-1733: Add `result` and `spicy::Error` types to Spicy to
facilitate error handling. (Robin Sommer, Corelight)
The `result` and `error` types were already implemented internally
HILTI-side, but not yet available to Spicy users. This exposes them to
Spicy as well. To avoid name clashes with existing code, we don't
introduce `error` as a new type keyword, but instead make it available
as a library type ``spicy::Error``.
Typical usage is something like this:
```
function foo() : result<int64> {
    ...
    if ( everything_is_ok )
        return 42;
    else
        return error"Something went wrong.";
}

if ( local x = foo() )
    print "result: %d " % *x;
else
    print "error: %s " % x.error();
```
The documentation has more specifics.
* Add support for `result<void>` to HILTI. (Robin Sommer, Corelight)
This allows capturing errors even when there's no actual result value
otherwise. Example (HILTI syntax):
```
function result<void> x(bool b) {
    if ( b )
        return Null; # coerces to a successful result<void>
    else
        return error("trouble...");
}

assert x(True);
assert x(False).error() == error("trouble...");
```
* Add `==`/`!=` operators for HILTI `error` instances. (Robin Sommer, Corelight)
1.11.0-dev.185 | 2024-06-11 18:47:19 +0200
* Bump centos-stream in CI. (Benjamin Bannier, Corelight)
1.11.0-dev.183 | 2024-06-07 10:35:53 +0200
* GH-1745: Fix C++ initialization of global constants through global functions. (Robin Sommer, Corelight)
This changes the ordering of the emitted global declarations so that
functions now come first, allowing them to be used inside subsequent
constant initializations.
1.11.0-dev.181 | 2024-06-04 10:18:25 +0200
* Clean up includes. (Benjamin Bannier, Corelight)
* Fix Flex linker error when building as part of Zeek. (Benjamin Bannier, Corelight)
When building Spicy as part of Zeek against the Homebrew flex-2.6.4, I
saw linker errors after f52325693aad8bd7931d51c4658027a8b7d7adae:
```
ld: Undefined symbols:
HiltiFlexLexer::LexerInput(char*, unsigned long), referenced from:
vtable for hilti::detail::parser::Scanner in driver.cc.o
HiltiFlexLexer::LexerOutput(char const*, unsigned long), referenced from:
vtable for hilti::detail::parser::Scanner in driver.cc.o
```
It is still not clear to me how this error comes about, but the change in
this patch seems to address the issue.
1.11.0-dev.178 | 2024-06-03 09:50:53 +0200
* Fix a typo in packing.rst (Tanner Kvarfordt)
Fix a typo in packing.rst where the template argument to unpack was (u)int
instead of real.
* Expand guidelines on improving compilation performance. [skip CI] (Benjamin Bannier, Corelight)
* Fix documented type mapping for integers. [skip CI] (Benjamin Bannier, Corelight)
1.11.0-dev.172 | 2024-05-17 12:51:34 +0200
* GH-1742: Unroll ctors of big containers. (Benjamin Bannier, Corelight)
When generating C++ code for container ctors, we previously would directly
invoke the respective C++ ctors taking an initializer list. For very big
initializer lists this led to very poor C++ compiler performance; e.g.,
compiling code constructing a vector with 10,000 elements could take
minutes.
With this patch we unroll such ctor calls by calling a dedicated
initialization function. For huge containers this results in big
functions instead of big initializer lists, but compiling functions
seems to behave more predictably.
Closes #1742.
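A hypothetical sketch of the shape of this change in C++ terms. The function name and the exact statements spicyc emits are assumptions; this only contrasts a single braced initializer list with an unrolled initialization function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only; not the code spicyc actually generates.
//
// Before: the container ctor became one braced initializer list, e.g.
//   std::vector<std::int64_t> v = {1, 2, 3, /* ...thousands more... */};
//
// After: the ctor is unrolled into a dedicated initialization function
// that appends one element per statement.
std::vector<std::int64_t> init_big_vector() {
    std::vector<std::int64_t> v;
    v.reserve(4);
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
    return v;
}
```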
1.11.0-dev.170 | 2024-05-17 12:51:06 +0200
* GH-1743: Use a checked cast for `map`'s `in` operator. (Benjamin Bannier, Corelight)
1.11.0-dev.168 | 2024-05-15 10:19:54 +0200
* Fix behavior for unset accept and decline hooks. (Benjamin Bannier, Corelight)
We did not properly initialize pointer values for accept and decline
hooks. In downstream code they then appeared to be set when they were in
fact not.
With this patch we initialize them with proper defaults.
1.11.0-dev.166 | 2024-05-14 11:54:16 +0200
* Docs: Add new section with guidelines and best practices. (Robin Sommer, Corelight)
This focuses on performance for now, but may be extended with other
areas later.
Much of the content was contributed by Corelight Labs.
* Docs: Update Custom Extensions section. (Robin Sommer, Corelight)
The usage of `-P` wasn't up to date.
* Docs: Update feedback section. (Robin Sommer, Corelight)
1.11.0-dev.162 | 2024-05-13 14:42:08 +0200
* Update types.rst (Smoot)
1.11.0-dev.160 | 2024-05-13 11:45:04 +0200
* GH-1657: Update Spicy runtime driver to use new stream features for improved performance. (Robin Sommer, Corelight)
This does two things:
- When adding data to a stream, we now do that without copying
anything initially. For block input (e.g., UDP) that's always fine
because the parser will never suspend before it's fully done
parsing; hence we can safely delete it once the parser returns. For
stream input (e.g., TCP), we make the stream own its data later
if (and only if) the parser suspends.
- For block input (e.g., UDP) we now keep reusing the same stream for
subsequent blocks, instead of creating a new one each time. This
allows the stream to reuse an allocated chunk that it may have still
cached internally.
The result of this, plus the new chunk caching introduced earlier, is
that for a UDP flow we never need to allocate more than one chunk and
never need to copy any data. For TCP it's the same as long as parsing
consumes all data before suspending (which should be a common case);
and when we do allocate new storage, we only copy data that didn't get
trimmed immediately anyway.
* Give stream a method to reset it into a freshly initialized state. (Robin Sommer, Corelight)
This does not clear the internal chunk cache.
* Cache previously trimmed chunks inside stream for reuse. (Robin Sommer, Corelight)
A chain now retains one previously used but no longer needed chunk for
reuse, so that we can avoid constant cycles of creating/destructing
chunks (and their payload memory) in the common case of a parser
consuming full chunks without yielding. The caching is also geared
towards owning/non-owning semantics staying consistent across
subsequent append operations.
* Extend stream API to allow for chunks that don't own their data. (Robin Sommer, Corelight)
By default, we still copy data when creating chunks but we add a
parallel API that just stores pointers, assuming the data will stay
around as long as needed. If the stream owner cannot guarantee that,
they may at any point convert all not-owned data into owned data
through a corresponding `makeOwning()` stream method. To make that
method efficient even with long chains of chunks, we internally
maintain an invariant that only the last chunk of a chain can be
non-owning: whenever we add a new chunk to a chain, we ensure that the
previous tail becomes owning at that point. In other words, we amortize
the work across all newly added chunks.
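A simplified C++ sketch of the ownership invariant described above, assuming a chunk is either a heap copy or a view onto caller memory. `Chunk`, `Chain`, and their methods are hypothetical and only mirror the described `makeOwning()` behavior, not Spicy's actual stream classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// A chunk either owns a copy of its bytes or merely points at memory the
// caller promises to keep alive.
class Chunk {
public:
    static Chunk owning(std::string_view data) {
        Chunk c;
        c.owned_ = std::make_unique<std::string>(data);
        c.view_ = *c.owned_;
        return c;
    }
    static Chunk nonOwning(std::string_view data) {
        Chunk c;
        c.view_ = data; // no copy; caller must keep `data` alive
        return c;
    }
    bool isOwning() const { return owned_ != nullptr; }
    void makeOwning() {
        if (!isOwning()) {
            owned_ = std::make_unique<std::string>(view_);
            view_ = *owned_;
        }
    }
    std::string_view data() const { return view_; }

private:
    // Heap-allocated so that view_ stays valid when the Chunk is moved.
    std::unique_ptr<std::string> owned_;
    std::string_view view_;
};

class Chain {
public:
    // Appends without copying. The previous tail becomes owning first,
    // so at most one chunk (the tail) is ever non-owning.
    void appendNonOwning(std::string_view data) {
        if (!chunks_.empty())
            chunks_.back().makeOwning();
        chunks_.push_back(Chunk::nonOwning(data));
    }
    // Thanks to the invariant, this copies at most one chunk.
    void makeOwning() {
        if (!chunks_.empty())
            chunks_.back().makeOwning();
    }
    bool allOwning() const {
        for (const auto& c : chunks_)
            if (!c.isOwning())
                return false;
        return true;
    }
    std::string str() const {
        std::string s;
        for (const auto& c : chunks_)
            s += c.data();
        return s;
    }

private:
    std::vector<Chunk> chunks_;
};
```

Because `appendNonOwning()` converts the old tail before adding a new one, `Chain::makeOwning()` never has to walk the whole chain, which is the amortization described above.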
* Remove `std::function` from `DeferredExpression`. (Robin Sommer)
* Revert "Remove support for deferred expressions." (Robin Sommer)
Turns out this is actually still being used by the Zeek
integration.
1.11.0-dev.151 | 2024-05-10 16:35:50 +0200
* Remove unused include headers. (Robin Sommer, Corelight)
* Remove all usage of `std::function` from toolchain. (Robin Sommer, Corelight)
* Remove usage of `std::function` from parser and sink runtime representations. (Robin Sommer, Corelight)
We switch to raw function pointers, which is easy enough.
* Remove use of `std::function` from `spicy::rt::Configuration`. (Robin Sommer, Corelight)
Callbacks are now classic function pointers.
* Remove unused functional header. (Robin Sommer, Corelight)
* Remove use of `std::function` from runtime vector class. (Robin Sommer, Corelight)
* Remove support for deferred expressions. (Robin Sommer, Corelight)
These weren't used anymore anywhere so we can remove the corresponding
code from toolchain and runtime.
1.11.0-dev.142 | 2024-05-06 15:14:40 +0200
* GH-1664: Fix `&convert` typing issue with bit ranges. (Robin Sommer, Corelight)
It turns out #1664 was only indirectly related to `&convert` itself;
the real issue was that we couldn't assign one bitfield struct to
another if their field types didn't match exactly, even in cases where
at the C++ level there was no meaningful difference. In this case we
ended up with a field that had C++ type `rt::Bool` in one type and
`bool` in another, leading to errors when assigning the latter to the
former. We now allow creating instances of the former from the
latter through standard C++ type conversions on a per-field basis.
* Suppress new `clang-tidy` warnings. (Robin Sommer, Corelight)
* Fix a Spicy scoping issue across imports. (Robin Sommer)
We could get a bogus "unknown ID" error for default arguments of
functions defined in an imported module if that default argument was
itself referring to an identifier inside yet another imported module.
The test case shows the exact situation that was broken.
1.11.0-dev.137 | 2024-04-25 13:38:42 +0200
* Remove Spicy parser support for unsupported `&priority` attribute. (Benjamin Bannier, Corelight)
* Make spelling of hook `priority` consistent across Spicy and HILTI. (Benjamin Bannier, Corelight)
We were naming the priority attribute differently in Spicy (`priority`)
and HILTI (`&priority`). While, e.g., a Spicy `Hook` could correctly
extract its priority, this could still have led to issues if we
attempted to access the priority of a Spicy hook from HILTI, as we do
not perform any adjustment of this attribute when lowering to HILTI.
This distinction also made it hard to generate the intended code from
the outside using our API (e.g., from Zeek) since one needed to be aware
at which level the attribute was injected (Spicy or HILTI).
With this patch we internally translate a Spicy `priority` attribute to
HILTI's `&priority` syntax.
1.11.0-dev.134 | 2024-04-25 13:36:59 +0200
* Fix incremental skipping. (Benjamin Bannier, Corelight)
We previously would incorrectly compute the amount of data to skip, which
could potentially have led to the parser consuming more data than
available. With this patch we correctly use the actually consumed amount
to compute what to trim from the input.
* GH-1724: Fix skipping in size-constrained units. (Benjamin Bannier, Corelight)
We previously could skip too much data if `skip` was used in a unit with
a global `&size`. This was due to the machinery that moves the input
forward believing that `skip` units were not really field productions.
While this is true in a sense, in this particular case it still led to
incorrect behavior; in particular, as far as input handling is concerned,
`skip` productions largely behave like any other unit.
1.11.0-dev.131 | 2024-04-25 09:27:48 +0200
* Fix potential internal error in port string conversion. (Robin Sommer)
* GH-1284: GH-1693: Promote use of `-x` over `-c`, clean up `-P`. (Robin Sommer)
This includes these pieces:
- `-P` now requires a prefix argument that sets the C++ namespace, so
that generated prototypes match those of `-x`. Like with `-x`, the
prefix may be empty (`-P ""`) to get back to the old `hlt::`.
- For both `-P` and `-x` the prefix now must be a valid C++ identifier,
because we use it as such.
- The `spicyc` usage message now describes `-c` and `-l` as intended for
debugging use.
- Update documentation on host applications, switching to using `-x`
instead of `-{cl}`.
- Scanned the docs for other uses of `-c` and `-l` as well; the
remaining ones seem fine.
* Remove generated, in-code linker JSON meta data. (Robin Sommer)
This was originally to allow for compiling multiple Spicy modules
separately, with the meta data providing what's necessary to add
any cross-module functionality. However, we've moved away from that
approach and now require the compiler to always see all code,
so this is no longer needed/possible.
1.11.0-dev.125 | 2024-04-18 15:53:48 +0200
* GH-1501: Improve some error messages for runtime parse errors. (Robin Sommer, Corelight)
* GH-1586: Make skip productions behave like the production they are wrapping. (Robin Sommer, Corelight)
* GH-1719: Fix `new` passing a unit reference to an `inout` unit parameter. (Robin Sommer, Corelight)
One might debate whether this is something we should allow at all but
we do permit it elsewhere when passing unit parameters, and changing
that would probably break code, so for consistency this allows it for
`new` as well.
Internally, there was a more general inconsistency introduced by
automatic derefs of Spicy-level strong references. While we do need
that deref to happen for operator resolution, we now remove it after
generation of HILTI code so that HILTI-level resolution can then work
as expected on the value references introduced for units. I wouldn't
be surprised if the prior behavior was causing more trouble than just
#1719 and we just hadn't run into it yet.
* Mark automatic derefs inside the operator's AST node. (Robin Sommer, Corelight)
This allows differentiating between explicit `deref` operations that are
part of the source code and implicit `deref` operations inserted by the
coercer. A subsequent commit will leverage this information, but I've
been meaning to do this anyway because it could be hard to track
where a particular `deref` was coming from.
* Unify code generation for `new`. (Robin Sommer, Corelight)
There's no functional change (afaict) but for consistency of
implementation and results, `new` should go through
`compileCallArguments()`.
* GH-1655: Reject joint usage of filters and look-ahead. (Robin Sommer, Corelight)
We cannot retrieve look-ahead information from units to which a filter
is connected, because parsing for outer and inner layers will be
operating on separate streams. While we could pass the lahead token
number upwards, we would also need the stream position for the end of
the lahead symbol, but we cannot tie an iterator on the filtered
stream back to a position on the original stream. So we now reject
this.
This shouldn't cause any actual backwards-incompatibility because this
wasn't working in the first place: it would reliably abort with an
invalid stream iterator exception during parsing.
* Add test confirming that `&parse-{at,from}` don't interfere with outer look-ahead parsing. (Robin Sommer, Corelight)
This makes sure fields with these attributes are ignored for
look-ahead computation of the unit containing them. We actually have a
related test for `&parse-from` already, but it can't hurt to have it
covered further here.
1.11.0-dev.114 | 2024-04-16 09:38:38 +0200
* Bump baselines. (Benjamin Bannier, Corelight)
1.11.0-dev.112 | 2024-04-15 17:22:55 +0200
* Update NEWS with a list of incompatibilities compared to previous version. (Robin Sommer)
* Pretty print reference types in Spicy output. (Robin Sommer)
We now render `strong_ref<T>` as `T&`, as one would expect on
the Spicy side.
* Do not perform automatic deref on RHS of an assignment. (Robin Sommer)
This used to be accepted but did not have the intended effect at runtime:
```
function f(s: string&) {
    *s = new "xxx";
}
```
* Fix line numbers. (Robin Sommer)
Line numbers could be off in the presence of comments including '#'
characters.
* Fix auto-deref of LHS references. (Robin Sommer)
We were checking the constness of the reference, not of the wrapped
value.
* Improve error message. (Robin Sommer)
1.11.0-dev.105 | 2024-04-10 13:24:03 +0200
* Add missing file to git. (Robin Sommer, Corelight)
* Document generic operators. (Robin Sommer, Corelight)
* Refactor output logic in `spicy-doc-to-rst`. (Robin Sommer, Corelight)
* GH-1711: Fix forwarding of a reference unit parameter to a non-reference parameter. (Robin Sommer, Corelight)
* GH-1599: Fix integer increment/decrement operators to require mutable arguments. (Robin Sommer, Corelight)
* Add tests checking sinks as unit parameters. (Robin Sommer, Corelight)
* GH-1710: Improve implementation of sink type. (Robin Sommer, Corelight)
So far we lowered Spicy's `sink` type into a `strong_ref` at HILTI
codegen time, meaning that a unit's `sink` field would have type
`sink` at the Spicy level and then later `strong_ref<sink>` inside
HILTI (actually: `strong_ref<spicy_rt::Sink>`). This approach led to some
inconsistencies because of the mismatch between the two levels, and it
also made the implementation more complex than necessary. We now let
`sink` fields simply have type `sink&` (i.e., `strong_ref<sink>`) in
Spicy; everything else then falls in place more easily. From a user
perspective, the change should remain largely invisible.
* Fix internal assertion potentially triggering erroneously during retrieval of parent node. (Robin Sommer, Corelight)
* GH-1618: Fix and clarify usage of references in Spicy. (Robin Sommer, Corelight)
References weren't fully consistent in their properties and
implementation, and they weren't documented either. This commit cleans
that up and adds documentation.
Changes:
- Constness 1: For a type `T&`, which internally is `strong_ref<T>`,
we now consistently make the inner `T` a mutable LHS type. That
seems most natural and useful from a user perspective. In
particular, this allows mutating objects passed into functions as
parameters without declaring them `inout`: (`foo (x: bytes&)`). This
is important because declaring the parameter `inout` makes the
reference itself mutable (not the contained object), which isn't
what one wants. Plus, the latter doesn't work for unit parameters
anyway. Accordingly, we also adapt our built-in operators to
use `T&` instead of `inout T&`.
- Constness 2: For a type `T&`, we now make the outer reference type
constant and non-mutable. This is mostly for consistency; it doesn't
really change anything as we didn't have mutating operations on
references anyway.
- Feature: We add `new BASIC_TYPE` as syntax for creating default
initialized values of basic types. So far we only had `new
NAMED_TYPE` and `new VALUE`.
Technical note to the reviewer: the changes to the `recreateAs*`
methods are necessary to avoid complex types being copied into the
newly created type, which would lead to name resolution problems. This
is tested through existing tests which fail otherwise
(`spicy.types.sink.filter-it.spicy`,
`spicy.rt.base64-filter-eod.spicy`). (Well, `recreateAsLHS()` is
tested that way, but the others would have the same issue.) This is a
good change anyways, because it keeps the AST smaller.
* GH-1515: Catch unsupported types for unit inout parameters. (Robin Sommer, Corelight)
We support only types that are (internally) passed around as
references.
* GH-1583: Disallow coercion when passing arguments to `inout` parameters. (Robin Sommer, Corelight)
1.11.0-dev.92 | 2024-04-10 09:40:32 +0200
* Update release version in documentation. (Robin Sommer, Corelight)
This hadn't been updated, leading to outdated links.
* Do not require all AST nodes to be destroyed before we begin compilation. (Robin Sommer)
If a third party (like Zeek) still retains a pointer to a `Node`, the
check would trigger, making usage awkward. Instead we now just
check at context destruction time that we no longer have any live
nodes.
1.11.0-dev.88 | 2024-04-09 09:33:03 +0200
* Bump 3rdparty/utf8proc from `1fe43f5` to `894e810` (dependabot[bot])
1.11.0-dev.86 | 2024-04-04 15:51:17 +0200
* Overhaul AST node memory management. (Robin Sommer)
We switch back to reference counting nodes, but through a custom
scheme that allows us to continue passing around raw pointers most
of the time.
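A minimal sketch of intrusive reference counting in C++, assuming the count lives inside each node so that any raw `Node*` can be re-wrapped into an owning handle without a separate control block. `Node` and `NodeRef` are hypothetical names; this is one way raw pointers and counted ownership can coexist, not Spicy's actual scheme.

```cpp
#include <cassert>
#include <utility>

class Node {
public:
    virtual ~Node() = default;
    void retain() { ++refs_; }
    void release() {
        if (--refs_ == 0)
            delete this; // last owner gone
    }
    int refCount() const { return refs_; }

private:
    int refs_ = 0; // the count is stored in the object itself
};

// Owning handle; code that merely inspects nodes keeps using raw pointers.
class NodeRef {
public:
    NodeRef() = default;
    explicit NodeRef(Node* n) : n_(n) {
        if (n_)
            n_->retain();
    }
    NodeRef(const NodeRef& o) : NodeRef(o.n_) {}
    NodeRef(NodeRef&& o) noexcept : n_(std::exchange(o.n_, nullptr)) {}
    NodeRef& operator=(NodeRef o) noexcept { // copy-and-swap
        std::swap(n_, o.n_);
        return *this;
    }
    ~NodeRef() {
        if (n_)
            n_->release();
    }
    Node* get() const { return n_; } // borrow without touching the count

private:
    Node* n_ = nullptr;
};
```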
* Remove unused class. (Robin Sommer)
* Reduce memory usage for operators with complex argument types. (Robin Sommer)
We now use external types to store operands of name types, which
reduces the number of AST nodes created for operators substantially.
* Remove meta/location information from AST IDs. (Robin Sommer)
Turns out these aren't used anywhere.
* Use less memory for storing context IDs. (Robin Sommer)
32 bits should still be plenty. Because these are stored in many
nodes, the change is noticeable in overall memory consumption.
* Compute an ID's internal views on demand. (Robin Sommer)
We had previously optimized IDs by pre-computing some information for
quick access. However, for most IDs that information isn't actually
accessed at all, so we now compute it only on demand the first time
it's needed. This saves both CPU and memory.
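A small C++ sketch of computing a derived view lazily and caching it, assuming a `std::optional` cache held in a `mutable` member. The `ID` class and `namespacePart()` are hypothetical simplifications, not Spicy's real ID class.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

class ID {
public:
    explicit ID(std::string s) : str_(std::move(s)) {}

    // Everything before the last "::", or "" if the ID is unqualified.
    // Computed on first access only, then served from the cache.
    const std::string& namespacePart() const {
        if (!ns_) {
            ++computations;
            auto pos = str_.rfind("::");
            ns_ = (pos == std::string::npos) ? std::string() : str_.substr(0, pos);
        }
        return *ns_;
    }

    mutable int computations = 0; // instrumentation for this example only

private:
    std::string str_;
    mutable std::optional<std::string> ns_; // empty until first requested
};
```

IDs whose views are never queried only pay for one empty `std::optional`.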
* Move inherit-scope information into virtual method. (Robin Sommer)
This saves space, and is more aligned with how other information is
managed.
* Change storage for AstContext. (Robin Sommer)
No need for `shared_ptr`, now using a `unique_ptr`.
* Do not auto-allocate storage for errors with each AST node. (Robin Sommer)
We now create the vector for error messages only when needed.
* Reuse `Meta` instances across nodes. (Robin Sommer)
We now store each `Meta` value only once globally. Saves about 15%
memory.
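A minimal sketch of the interning idea, assuming a global `std::set` pool keyed by value so that nodes with equal meta information share one stored copy. The `Meta` fields and `intern()` are hypothetical simplifications.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for a node's meta information.
struct Meta {
    std::string file;
    int line = 0;
    bool operator<(const Meta& o) const {
        return std::tie(file, line) < std::tie(o.file, o.line);
    }
};

// Each distinct Meta value is stored once; callers keep a pointer to the
// shared copy. std::set never invalidates element addresses on insert.
// (Not thread-safe as written.)
const Meta* intern(const Meta& m) {
    static std::set<Meta> pool;
    return &*pool.insert(m).first;
}
```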
1.11.0-dev.75 | 2024-04-03 11:55:16 +0200
* Remove a few left-over unused variables. (Benjamin Bannier, Corelight)
* Fix a few instances where codegen was non-deterministic. (Benjamin Bannier, Corelight)
1.11.0-dev.72 | 2024-04-03 09:35:23 +0200
* Fix repeated evaluations of `&parse-at` expression. (Robin Sommer)
* GH-1316: GH-1635: Provide better error messages for some cases of unknown unit IDs. (Robin Sommer, Corelight)
* GH-1493: Support/fix public type aliases to units. (Robin Sommer, Corelight)
An alias like `public type Unit1 = Unit2` used to lead to C++-side
compiler errors, which this fixes. We also fully support this now by
making both `Unit1` and `Unit2` available for parsing to host
applications. Internally, the `Unit1` parser is just a small facade
pointing to the parsing functions for `Unit2`.
* GH-1661: Deprecate usage of `&convert` with `&chunked`. (Robin Sommer, Corelight)
Per discussion in #1661, this combination can lead to confusion and
can be worked around if really needed.
1.11.0-dev.64 | 2024-04-02 17:44:13 +0200
* Fix GCC false positive around `strncpy` use. (Benjamin Bannier, Corelight)
1.11.0-dev.62 | 2024-03-25 10:39:42 +0100
* Reimplement `IDBase` for better performance. (Robin Sommer)
Our codegen phase had quite some overhead due to repeated ID
operations recomputing the same information over and over again (e.g.,
subpaths and namespaces). This reimplements the class to compute
everything once upfront, cutting codegen time in half.
* Add more tests for `IDBase`. (Robin Sommer)
This includes a tiny semantic change for `IDBase::length()`: it now
returns zero for an empty ID, which seems more consistent. (This
doesn't seem to have an impact anywhere, all tests pass.)
1.11.0-dev.59 | 2024-03-21 12:35:06 +0100
* Rework order of C++ codegen. (Robin Sommer)
We used to potentially codegen modules multiple times. Now we cache a
module's generated C++ code inside the AST node the first time we
create it. From there, we can then easily reuse it later, in
particular when needing to import its declarations into another
module. Internally, we switch storage for `cxx::Unit` to shared
pointers, and tweak the debug logging a bit for better readability in
this new model.
* Cleanup: Remove flag to compile implementation from higher-level codegen method. (Robin Sommer)
This pushes the condition that we used to pass in down into lower-level
code.
* Fix duplicates in module dependency tracking. (Robin Sommer)
We now use a set instead of a vector to deduplicate dependencies
automatically.
* Fix file name case for fuzzer builds. (Benjamin Bannier, Corelight)
1.11.0-dev.54 | 2024-03-18 09:56:58 +0100
* Refresh CI platforms. (Benjamin Bannier, Corelight)
1.11.0-dev.50 | 2024-03-16 08:55:11 +0100
* Rework memory management for AST nodes. (Robin Sommer, Corelight)
We switch memory management of AST nodes to an arena/bump allocator
that releases them as a whole once the AST gets destroyed, not
individually/continuously through their own lifetime scoping. This
then allows us to also switch them from `shared_ptr` to raw pointers
throughout the AST code. The result is a compiler speed-up of about
20% for some complex analyzers.
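A minimal sketch of an arena/bump allocator in C++17, assuming objects are placement-constructed into fixed-size blocks and destroyed only when the arena itself goes away. `Arena` and `make<T>()` are hypothetical names, not the allocator Spicy actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t block_size = 4096) : block_size_(block_size) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    ~Arena() {
        // Run destructors in reverse creation order; the blocks themselves
        // are freed afterwards when `blocks_` is destroyed.
        for (auto it = dtors_.rbegin(); it != dtors_.rend(); ++it)
            it->destroy(it->obj);
    }

    template <typename T, typename... Args>
    T* make(Args&&... args) {
        static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned types not supported");
        void* mem = allocate(sizeof(T), alignof(T));
        T* obj = new (mem) T(std::forward<Args>(args)...);
        dtors_.push_back({obj, [](void* p) { static_cast<T*>(p)->~T(); }});
        return obj; // plain raw pointer; no per-object ownership
    }

private:
    struct Dtor {
        void* obj;
        void (*destroy)(void*);
    };

    // Bump allocation: round the offset up to `align`, start a new block
    // if the current one is full. Blocks never move, so pointers stay valid.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t start = (offset_ + align - 1) / align * align;
        if (blocks_.empty() || start + size > capacity_) {
            capacity_ = std::max(block_size_, size);
            blocks_.push_back(std::make_unique<std::byte[]>(capacity_));
            start = 0;
        }
        offset_ = start + size;
        return blocks_.back().get() + start;
    }

    std::size_t block_size_;
    std::size_t capacity_ = 0;
    std::size_t offset_ = 0;
    std::vector<std::unique_ptr<std::byte[]>> blocks_;
    std::vector<Dtor> dtors_;
};
```

Individual objects are never freed on their own; everything is released as a whole, which is what makes plain raw pointers safe for the arena's lifetime.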
* Centralize the `vector` type we use for storing AST nodes. (Robin Sommer, Corelight)
No functional change, this just unifies the various vectors of
`Node`-derived classes so that there's a central place where to define
the underlying vector type. For now this is just for easier
maintenance. In the future we could experiment with different vector
or allocator implementations.
* Make node tags `constexpr`. (Robin Sommer, Corelight)
1.11.0-dev.45 | 2024-03-15 15:11:26 +0100
* Fix broken f-string. (Benjamin Bannier, Corelight)
* Remove Monterey Homebrew CI tasks. (Benjamin Bannier, Corelight)
* Add Cirrus tasks running non-Homebrew macOS builds and tests. (Benjamin Bannier, Corelight)
1.11.0-dev.41 | 2024-03-15 12:47:42 +0100
* Fix fuzzer builds for reworked AST. (Benjamin Bannier, Corelight)
1.11.0-dev.39 | 2024-03-13 09:07:42 +0100
* Bump softprops/action-gh-release from 1 to 2 (dependabot[bot])
* Bump typos pre-commit hook. (Benjamin Bannier, Corelight)
* Bump clang-format. (Benjamin Bannier, Corelight)
* Modernize Python scripts with `pyupgrade`. (Benjamin Bannier, Corelight)
* Reformat Python with ruff-format. (Benjamin Bannier, Corelight)
* Fix generation of doc example code. (Benjamin Bannier, Corelight)
* Fix Python lints diagnosed by ruff. (Benjamin Bannier, Corelight)
1.11.0-dev.30 | 2024-03-12 12:02:43 +0100
* Get rid of more dynamic casts. (Robin Sommer, Corelight)
The main use of dynamic casts that's left now (other than for
debugging code) is inside the grammar's production hierarchy. That
logic isn't standing out during profiling, so seems fine to leave for
now.
* Introduce custom RTTI system for casting safely between node
types. (Robin Sommer, Corelight)
Our custom system is faster than C++'s `dynamic_cast<>`, resulting in
a noticeable speed-up for larger parsers.
We assign a unique integer tag to each `Node`-derived class. Each
class' constructors pass an array through to the top-level `Node`
class that contains a series of these integers describing the
inheritance path from the derived class back up to `Node`. The `Node`
class stores this path for fast type checks. In addition, each
`Node`-derived class `T` gets a couple of constants as members: (1) a
copy of its own tag (`T::NodeTag`), and (2) a level indicating its
distance from `Node` in the inheritance tree (`T::NodeLevel`). The
level is used as an index into an instance's tag array when performing
type checks. The result is that we can perform an `isA<T>` operation
with just a single comparison between an array element and a constant value.
This system is optimized for our very simple Node hierarchy: not very
deep (max. 4 entries in the tag array), single-inheritance only,
derived classes are always at the same distance from `Node`, and
all `Node`-derived classes are known upfront.
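A compact C++ sketch of the scheme described above. The class names, tag values, and the use of tag `0` for "no class at this level" are assumptions for illustration; the sketch also assumes the tag array is indexed by level, with `Node`'s own tag at index 0. With that layout, `isA<T>()` is a single load and compare.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tag = std::uint16_t;           // 0 means "no class at this level"
using TagPath = std::array<Tag, 4>;  // max. depth of the hierarchy

class Node {
public:
    static constexpr Tag NodeTag = 1;
    static constexpr std::size_t NodeLevel = 0;

    virtual ~Node() = default;

    // One array access and one comparison; no dynamic_cast<>.
    template<typename T>
    bool isA() const { return _path[T::NodeLevel] == T::NodeTag; }

    template<typename T>
    T* as() {
        assert(isA<T>());
        return static_cast<T*>(this);
    }

    template<typename T>
    T* tryAs() { return isA<T>() ? static_cast<T*>(this) : nullptr; }

protected:
    // path[i] holds the tag of the class at distance i from Node.
    explicit Node(TagPath path) : _path(path) {}

private:
    TagPath _path;
};

class Expression : public Node {
public:
    static constexpr Tag NodeTag = 2;
    static constexpr std::size_t NodeLevel = 1;

protected:
    explicit Expression(TagPath path) : Node(path) {}
};

class Ctor : public Expression {
public:
    static constexpr Tag NodeTag = 3;
    static constexpr std::size_t NodeLevel = 2;
    Ctor() : Expression({Node::NodeTag, Expression::NodeTag, NodeTag, 0}) {}
};

class Name : public Expression {
public:
    static constexpr Tag NodeTag = 4;
    static constexpr std::size_t NodeLevel = 2;
    Name() : Expression({Node::NodeTag, Expression::NodeTag, NodeTag, 0}) {}
};

class Statement : public Node {
public:
    static constexpr Tag NodeTag = 5;
    static constexpr std::size_t NodeLevel = 1;
    Statement() : Node({Node::NodeTag, NodeTag, 0, 0}) {}
};
```

Since every check compares a slot at a compile-time index against a compile-time constant, there is no walk up the hierarchy; the price is that the hierarchy layout must be known upfront, matching the constraints listed above.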
1.11.0-dev.27 | 2024-03-06 17:06:09 +0100
* Speed up `util::toIdentifier`. (Robin Sommer)
It turns out this is called a lot. This changes baselines due to slight
differences in generated IDs (the changes are good because the new IDs
better avoid potential conflicts).
This also removes the option to ensure non-keyword IDs: I don't
think we need that because we check for keywords when we generate C++
code.
* GH-1675: Extend runtime profiling to measure parser input volume. (Robin Sommer, Corelight)
With `--enable-profiling` the output for Spicy units/fields now
includes a new `volume` column, like this:
```
#name count time avg-% total-% volume
[...]
spicy/unit/test::A 1 285500 43.96 43.96 8
spicy/unit/test::A/__gap__ 4 3167 0.12 0.49 0
spicy/unit/test::A/__synchronize__ 1 35500 5.47 5.47 4
spicy/unit/test::A::a 1 74833 11.52 11.52 -
spicy/unit/test::A::b 1 15333 2.36 2.36 1
spicy/unit/test::A::c 1 19125 2.94 2.94 1
spicy/unit/test::A::d 1 7583 1.17 1.17 1
spicy/unit/test::A::e 1 8042 1.24 1.24 1
```
Three different things are new here:
- The `volume` column for `spicy/unit/TYPE` and
`spicy/unit/TYPE::FIELD` augments the already existing timing
measurement and reports the total aggregate number of bytes that
this unit/field had to parse over the course of the processing.
- For units going into synchronization mode, there are now additional
rows `spicy/unit/TYPE/__synchronize__` that report both CPU time and
volume spent in synchronization while processing that unit.
- For units encountering input gaps during synchronization, there are
now additional rows `spicy/unit/TYPE/__gap__` that report the total
aggregate gap size encountered while processing the unit.
All the volume measurements are taken as differences of two offsets
inside the input stream. For normal unit/field parsing, we subtract
the final offset after parsing an instance from the initial offset
where its parsing started.[1] For synchronization, it's the offset
where synchronization stopped successfully minus where it started.[2]
For gaps, it's the offset where we continued after the gap minus where
the gap started.[3] All these differences are then added up for each
row over the course of total input stream processing.
Note that volume isn't counted if parsing for some reason never
reaches the point where the end measurement would be taken (e.g., a
parser error prevents it from being reached; in the output above
that's the case for `spicy/unit/test::A::a`).
Closes #1675.
[1] This *includes* any ranges that the unit spent in synchronization
mode trying to recover from parse errors.
[2] This does *not* include any gaps encountered because they don't
affect stream offsets.
[3] Little glitch: these values can currently be off by one due to some
internal ambiguity.
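A minimal sketch of the bookkeeping described above, assuming one accumulator per output row. `VolumeProfiler`, `enter()`, and `leave()` are hypothetical names, not Spicy's profiler API. Each measurement adds the difference between an end offset and a start offset, and a row whose end point is never reached keeps an empty volume, matching the `-` in the table.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-row statistics, e.g. for "spicy/unit/test::A::b".
struct Row {
    std::uint64_t count = 0;
    std::optional<std::uint64_t> volume;  // stays empty (shown as "-") if no end offset was reached
};

class VolumeProfiler {
public:
    // Parsing (or synchronization, or a gap) starts at `offset`; the
    // returned offset is handed back to leave() later.
    std::uint64_t enter(const std::string& row, std::uint64_t offset) {
        ++_rows[row].count;
        return offset;
    }

    // The end point was reached: add the difference of the two offsets.
    void leave(const std::string& row, std::uint64_t start_offset, std::uint64_t end_offset) {
        auto& r = _rows[row];
        r.volume = r.volume.value_or(0) + (end_offset - start_offset);
    }

    const Row& row(const std::string& name) { return _rows[name]; }

private:
    std::map<std::string, Row> _rows;
};
```

Summing per-instance differences (rather than tracking a single global position) is what lets the same row aggregate many unit instances and many gaps over the course of one input stream.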
* Move inferring of a unit's context type into the resolver. (Robin Sommer)
No functional change here, just cleanup: this is where the logic
belongs.
* Infer constness of `<unit>.context()` from that of `<unit>`. (Robin Sommer)
Closes https://github.com/corelight/zeek-spicy-openvpn/issues/11.
* Ensure constness of result for const map/vector index operator. (Robin Sommer)
Not sure this actually changes anything, but it ensures the result types are correct.
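For readers less familiar with the idea, here is a plain C++ analogy (not HILTI code; `Unit`, `Context`, and `Table` are hypothetical): overloading accessors on `const` makes the constness of the result follow the constness of the object it is accessed through, which is the property the two entries above establish for `<unit>.context()` and for the map/vector index operator.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

struct Context {
    int requests = 0;
};

class Unit {
public:
    Context& context() { return _context; }              // mutable unit -> mutable context
    const Context& context() const { return _context; }  // const unit -> const context

private:
    Context _context;
};

class Table {
public:
    int& operator[](const std::string& key) { return _data[key]; }
    // std::map has no const operator[], so the const overload uses at(),
    // which throws for a missing key instead of inserting one.
    const int& operator[](const std::string& key) const { return _data.at(key); }

private:
    std::map<std::string, int> _data;
};
```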
* Simplify constness check for struct field assignments. (Robin Sommer)
Same treatment as for unit fields.
* Bump 3rdparty/filesystem from `2fc4b46` to `42ea4fc` (dependabot[bot])
1.11.0-dev.15 | 2024-03-05 10:43:27 +0100
* Bump 3rdparty/filesystem from `2fc4b46` to `42ea4fc` (dependabot[bot])
1.11.0-dev.11 | 2024-03-04 10:07:25 +0100
* New AST architecture. (Robin Sommer)
This is a large revamp of compiler internals, cleaning up and speeding
up lots of the AST pipeline. From a user perspective, nothing changes,
except that the new compiler is slightly stricter: it turns out that
in rare cases the old compiler ended up accepting some ill-formed
Spicy code that is now (rightfully) rejected. Specifically, there are
two known cases where existing Spicy code may now need
tweaking:
- Identifiers from the (internal) `hilti::` namespace are no longer
accessible. Usually you can just scope them with `spicy::` instead,
and it'll work.
- The old compiler didn't always enforce constness as it should have.
In particular, function parameters could end up being mutable even
when they weren't declared as `inout`. Now `inout` is required for
supporting any mutable operations on a parameter, so make sure to
add it where needed.
1.11.0-dev.9 | 2024-03-01 18:10:49 +0100
* Simplify `while` loops over constant conditions. (Benjamin Bannier, Corelight)
`while` loops with constant conditions are unlikely to occur in code
we generate, but could appear in user code. Cleaning them up is
simple and cheap, so this patch implements a pass that simplifies them.
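A toy version of such a pass, assuming a simplified statement model; the `Stmt` type and `simplifyWhileLoops()` are hypothetical, not HILTI's AST or optimizer API. The model gives loops an optional `else` block (an assumption for illustration): a loop whose condition is the constant `false` never runs its body, so only its `else` statements survive. A constant `true` condition is left alone here because the body may still `break` out.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy statement type, not HILTI's AST.
struct Stmt {
    enum class Kind { Plain, While } kind = Kind::Plain;
    std::string text;                        // Plain: description of the statement
    std::optional<bool> constant_condition;  // While: set if the condition is a literal
    std::vector<Stmt> body;                  // While: loop body
    std::vector<Stmt> else_body;             // While: runs once the condition is false
};

// Replaces `while ( false ) { ... } else { E }` with `E`. Everything else is
// kept, but nested blocks are still simplified recursively.
std::vector<Stmt> simplifyWhileLoops(const std::vector<Stmt>& block) {
    std::vector<Stmt> out;
    for ( const auto& s : block ) {
        if ( s.kind == Stmt::Kind::While && s.constant_condition == false ) {
            auto else_stmts = simplifyWhileLoops(s.else_body);
            out.insert(out.end(), else_stmts.begin(), else_stmts.end());
            continue;
        }
        Stmt copy = s;
        copy.body = simplifyWhileLoops(s.body);
        copy.else_body = simplifyWhileLoops(s.else_body);
        out.push_back(std::move(copy));
    }
    return out;
}
```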
1.11.0-dev.7 | 2024-02-27 17:44:22 +0100
* GH-1624: Enable optimizations when running `spicy-build`. (Benjamin Bannier, Corelight)
We previously would emit single C++ source files from `spicy-build`.
This made it impossible to enable optimizations since we could not know
whether individual pieces of code were used. Since optional features are
enabled by default and only disabled through an optimizer pass, this
meant that `spicy-build` emitted code that performed a lot of work which
is usually not done.
This patch reworks `spicy-build` to use `spicyc -x` instead, which
has a global view, can run optimizations, and emits all source files
into a prefix.
Closes #1624.
Closes #1625.
Closes #1622.
* Fix some shellcheck warnings in `spicy-build`. (Benjamin Bannier, Corelight)
* Fix handling of `spicyc -x` when passed a filename. (Benjamin Bannier, Corelight)
For the `-x` flag, users can pass either the name of a directory or a
file. If passed a filename, the file name is incorporated into the
generated namespace identifier, e.g., of the form `__hlt_FILENAME`,
in the emitted C++ files. Previously, when passed a directory, we
would still emit a `_` separator, but since there was no filename,
nothing followed it, so we generated namespace identifiers like
`__hlt_`.
With this patch, we now only emit the separator if passed a filename.
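A minimal sketch of the fixed logic, using a hypothetical helper name and leaving out the sanitizing a real implementation would need to turn arbitrary file names into valid C++ identifiers:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the namespace prefix for emitted C++ files.
// `filename` is empty when `-x` was given a directory.
std::string namespaceIdentifier(const std::string& filename) {
    std::string id = "__hlt";
    if ( ! filename.empty() )
        id += "_" + filename;  // only add the separator if there is a filename
    return id;
}
```

In this sketch, passing a directory yields `__hlt` rather than `__hlt_`, while a file `foo` yields `__hlt_foo`.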
* Do not optimize out public functions. (Benjamin Bannier, Corelight)
Optimizing out public functions breaks the assumption we have elsewhere
that something marked `public` is part of the public API and will not
get optimized out. This is, e.g., the behavior we have for units.
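A toy sketch of the rule, assuming a much simplified model where each function only records whether it is `public` and how often it is referenced (the `Function` type and `removeUnusedFunctions()` are hypothetical):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of a function as seen by an optimizer pass.
struct Function {
    std::string name;
    bool is_public = false;
    int uses = 0;  // number of references found in the program
};

// Drops unreferenced functions, but never public ones: they are part of
// the API and may be called from outside the compiled code.
std::vector<Function> removeUnusedFunctions(const std::vector<Function>& fns) {
    std::vector<Function> kept;
    for ( const auto& f : fns ) {
        if ( f.is_public || f.uses > 0 )
            kept.push_back(f);
    }
    return kept;
}
```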
1.11.0-dev.2 | 2024-02-26 12:16:17 +0100
* Fix stray Python escape sequence. (Benjamin Bannier, Corelight)
1.10.0-dev.149 | 2024-02-22 09:48:01 +0100
* GH-1585: Put closing of unit sinks behind feature guard. (Benjamin Bannier, Corelight)
This code gets emitted, regardless of whether a sink was actually
connected or not. Put it behind a feature guard so it does not enable
the feature on its own.
Closes #1585.
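One way to picture a feature guard in C++, as an illustration only: here the guard is a compile-time template parameter, a hypothetical stand-in for the flag Spicy's optimizer may turn off (the mechanism in the actual generated code may differ). When the flag is `false`, the sink-closing code compiles away instead of pulling in the feature on its own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal types for illustration.
struct Sink {
    bool closed = false;
};

struct Unit {
    std::vector<Sink*> sinks;
};

// `SupportsSinks` plays the role of the feature flag: if an optimizer proves
// that no sink is ever connected, it would instantiate this with `false`.
template<bool SupportsSinks>
void finishUnit(Unit& unit) {
    if constexpr ( SupportsSinks ) {
        for ( auto* s : unit.sinks )
            s->closed = true;
    }
}
```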
1.10.0-dev.147 | 2024-02-21 10:30:22 +0100
* GH-1667: Always advance input before attempting resynchronization. (Benjamin Bannier, Corelight)
When we enter resynchronization after hitting a parse error we
previously would have left the input alone, even though we know it fails
to parse. We then relied fully on resynchronization to advance the
input.
While this just pushed work downstream when synchronizing on literals,
it could cause us to lose input when synchronizing on regular expressions
if we happened to fail parsing due to a gap which is now at the front of
the input (parse errors from gaps are the most likely resynchronization
scenario when parsing genuine traffic); in this case the regular
expression would synchronize at the second byte after the input, and we
would synchronize only at a later position.
With this patch we always forcibly advance the input to the next non-gap
position. This has no effect for synchronization on literals, but allows
it to happen earlier for regular expressions.
Closes #1667.
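A sketch of the position computation under a simplified model where gaps are a sorted list of non-overlapping, half-open byte ranges (`Gap` and `resyncStart()` are hypothetical). It assumes that "advancing" means skipping at least the byte that failed to parse and then skipping any gap that position falls into:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical gap record: bytes in [begin, end) are missing from the input.
struct Gap {
    std::uint64_t begin;
    std::uint64_t end;
};

// Returns where resynchronization should start after a parse error at `pos`.
// `gaps` must be sorted by `begin` and non-overlapping.
std::uint64_t resyncStart(std::uint64_t pos, const std::vector<Gap>& gaps) {
    std::uint64_t next = pos + 1;  // the current position is known not to parse
    for ( const auto& g : gaps ) {
        if ( next >= g.begin && next < g.end )
            next = g.end;  // skip past the gap; adjacent gaps are handled by later iterations
    }
    return next;
}
```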
* Refactor test `spicy.types.unit.synchronize-on-gap`. (Benjamin Bannier, Corelight)
This refactoring cleans up how we feed gaps into the parser to make
testing with more inputs simpler.
1.10.0-dev.144 | 2024-02-14 15:55:35 +0100
* GH-1652: Fix filters consuming too much data. (Benjamin Bannier, Corelight)
We would previously assume that a filter would consume all available
data. This only holds if the filter is attached to a top-level unit, but
not in general if some sub-unit uses a filter. With this patch we
explicitly compute how much data is consumed.
Closes #1652.
1.10.0-dev.142 | 2024-02-08 17:00:53 +0100
* GH-1668: Fix incorrect data consumption for `&max-size`. (Benjamin Bannier, Corelight)
We would previously handle `&size` and `&max-size` almost identically,
the only difference being that `&max-size` sets up a slightly larger view
to accommodate a sentinel. In particular, we also used identical code to
set up the position where parsing should resume after such a field.
This was incorrect as it is in general impossible to tell where parsing
continues after a field with `&max-size` since it does not signify a
fixed view like `&size`. In this patch we now compute the next position
for a `&max-size` field by inspecting the limited view to detect how
much data was extracted.
Closes #1668.
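A sketch contrasting the two attributes, assuming a field that parses a NUL-terminated string. All function names are hypothetical, and the extra sentinel byte that `&max-size` sets up is left out for simplicity. With `&size` the resume position is fixed; with `&max-size` it depends on how much the field actually consumed inside the limited view:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical field parser: a NUL-terminated string. Returns the number of
// bytes consumed (including the terminator), or nullopt if none was found.
std::optional<std::size_t> parseCString(std::string_view input) {
    auto n = input.find('\0');
    if ( n == std::string_view::npos )
        return std::nullopt;
    return n + 1;
}

// &size: the field always occupies exactly `size` bytes.
std::size_t nextPositionSize(std::size_t start, std::size_t size) { return start + size; }

// &max-size: `max_size` is only an upper bound, so resume after whatever the
// field actually consumed inside the limited view.
std::optional<std::size_t> nextPositionMaxSize(std::string_view data, std::size_t start, std::size_t max_size) {
    auto limited = data.substr(start, max_size);
    auto consumed = parseCString(limited);
    if ( ! consumed )
        return std::nullopt;  // no terminator within the limit: parse error
    return start + *consumed;
}
```

With a 5-byte limit and input `ab` followed by a NUL byte, parsing resumes at offset 3; reusing the `&size` logic would have resumed at offset 5.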
1.10.0-dev.140 | 2024-02-08 13:22:52 +0100
* GH-1522: Drop overzealous validator. (Benjamin Bannier, Corelight)
This validator was intended to reject incorrect parsing of vectors but instead
ended up rejecting all vector parsing if the vector elements themselves produced
vectors. Since this code has no test and seems to have no clear purpose, this
patch drops this validation.
Closes #1522.
1.10.0-dev.138 | 2024-02-08 13:20:31 +0100