Input manager on large files does not suppress a large quantity of errors #3692

FishyFluffer · 2024-04-18T18:46:31Z

When a large malformed input file is loaded into Zeek, I'd expect the framework to limit the errors it reports so as not to flood the output, based on the documentation at: https://docs.zeek.org/en/master/frameworks/input.html#broken-input-data

This doesn't seem to be the case: the reporter.log file fills up with every error encountered while loading. On a table file containing almost a million entries, most of them faulty, an error is reported for each faulty entry, resulting in a massive flood.

A simple reproduction uses an input file generated with:

(printf '#fields\tindicator\tindicator_type\tmeta.source\n'; yes hello | head -n 1000) > intel.dat

and a zeek script loading it:

cat intel.zeek
redef Intel::read_files += { fmt("%s/intel.dat", @DIR) };

Running this script results in a flood of output:

warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 2: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 3: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 4: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 5: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 6: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 7: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1

...
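To quantify the flood, here is a minimal sketch that collapses the per-line warnings into one count per distinct message. It assumes the warnings were captured to a file first; the function name `summarize_warnings` and the file name `warnings.txt` are hypothetical and not part of Zeek.

```shell
# Hypothetical helper: collapse warnings that differ only in their line number.
# Assumes the warnings were captured beforehand, e.g.
#   zeek intel.zeek 2> warnings.txt
summarize_warnings() {
    # Replace the varying ", line <N>:" with a fixed token so identical
    # messages compare equal, then count duplicates, most frequent first.
    sed -E 's/, line [0-9]+:/, line N:/' "$1" | sort | uniq -c | sort -rn
}
```

Running `summarize_warnings warnings.txt` on the output above should print a single line whose leading count equals the number of malformed rows.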

I've attached the two sample intel files here, along with an additional table.zeek which calls the input framework directly to load a table from the intel.dat file. Both Zeek scripts result in a reporter.log containing every error encountered.
testFiles.zip


ckreibich · 2024-04-18T19:06:31Z

Thank you! It looks like there are at least two parts to this:

While reader backends have suppression logic, we may not be requesting suppression in all the right places.
The input manager also triggers warnings (like here), and there's no explicit suppression logic. There is error handling, though, so we should check whether it is supposed to avoid repeat messages but somehow doesn't work right.
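As an illustration of the behavior the reporter expects, the following sketch caps the number of warnings shown and prints one summary line for the rest. It is a post-processing filter on Zeek's output, not Zeek's own suppression logic; the function name `limit_warnings` and the default cap of 3 are assumptions made for the example.

```shell
# Hypothetical filter: pass through at most <cap> warnings, then summarize.
# This only post-processes text; it does not change what Zeek itself reports.
limit_warnings() {
    awk -v cap="${1:-3}" '
        # Count every warning line, but print only the first <cap> of them.
        /^warning: / { n++; if (n <= cap) print; next }
        # All other lines pass through unchanged.
        { print }
        # After the input ends, report how many warnings were dropped.
        END { if (n > cap) printf "(%d further warnings suppressed)\n", n - cap }
    '
}
```

It could be used as `zeek intel.zeek 2>&1 | limit_warnings 5`. Note that this hides the flood on the terminal only and leaves reporter.log untouched.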

ckreibich added the labels "Type: Bug 🐛", "Area: Input", and "Implementation: Core" on Apr 18, 2024
