Input manager on large files does not suppress a large quantity of errors #3692

Open
FishyFluffer opened this issue Apr 18, 2024 · 1 comment
Labels
Area: Input Implementation: Core Implementation requires modification of the Zeek core Type: Bug 🐛 Unexpected behavior or output.

Comments

FishyFluffer commented Apr 18, 2024

When a large malformed input file is loaded into Zeek, I'd expect the framework to limit the errors it reports so as not to flood the output, based on the information at: https://docs.zeek.org/en/master/frameworks/input.html#broken-input-data

This doesn't seem to be the case, as the reporter.log file fills up with every error encountered while loading. On a table file containing almost a million entries, most of them faulty, an error is reported for each case resulting in a massive flood.

A simple reproduction can be shown, with an input file generated with:

(printf '#fields\tindicator\tindicator_type\tmeta.source\n'; yes hello | head -n 1000) > intel.dat
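
To scale the reproduction toward the near-million-entry case, the generator can be wrapped in a small function. This is only a sketch: the name `make_bad_intel` and its arguments are hypothetical, and it uses `printf` for the header so that the tab escapes are expanded consistently across shells.

```shell
#!/bin/sh
# Hypothetical helper: write an intel file with a valid header and COUNT malformed rows.
# Usage: make_bad_intel COUNT OUTFILE
make_bad_intel() {
    count=$1
    out=$2
    {
        # printf expands \t portably; the header declares three tab-separated columns.
        printf '#fields\tindicator\tindicator_type\tmeta.source\n'
        # Each "hello" row carries a single column, so it is missing required fields.
        yes hello | head -n "$count"
    } > "$out"
}
```

For example, `make_bad_intel 1000000 intel.dat` would produce a file of roughly the size described in this report.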

and a zeek script loading it:

cat intel.zeek
redef Intel::read_files += { fmt("%s/intel.dat", @DIR) };

Running this script results in a flood of output:

warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 2: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 3: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 4: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 5: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 6: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1
warning: /private/tmp/./intel.dat/Input::READER_ASCII: /private/tmp/./intel.dat, line 7: Not enough fields in line 'hello' of /private/tmp/./intel.dat. Found 0 fields, want positions 1 and -1

...

I've attached the two sample intel files here, along with an additional table.zeek that calls the input framework directly to load a table from intel.dat. Both Zeek scripts result in a reporter.log containing every error encountered.
testFiles.zip
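
One way to quantify the flood is to count the reader warnings in the resulting reporter.log. A minimal sketch, assuming the log sits in the current directory and that the message text matches the output shown above; `count_field_warnings` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count "Not enough fields" warnings in a log file.
# Usage: count_field_warnings LOGFILE
count_field_warnings() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so `|| true` keeps the function from failing then.
    grep -c 'Not enough fields in line' "$1" || true
}
```

After running `zeek intel.zeek`, calling `count_field_warnings reporter.log` gives a single number to compare against the number of malformed rows in intel.dat.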

@ckreibich ckreibich added Type: Bug 🐛 Unexpected behavior or output. Area: Input Implementation: Core Implementation requires modification of the Zeek core labels Apr 18, 2024
@ckreibich
Member

Thank you! It looks like there are at least two parts to this:

  • While reader backends have suppression logic, we may not be requesting suppression in all the right places.
  • The input manager also triggers warnings (like here), and there's no explicit suppression logic. There is error handling, though, so we should check whether that is supposed to avoid repeat messages but somehow doesn't work right.
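
To illustrate the kind of behavior being asked for, here is a sketch of an external filter that passes the first few warnings per source and then prints a summary count. This is not Zeek's suppression mechanism, only a post-processing illustration. The name `limit_warnings`, the default limit of 3, and keying on the second `": "`-separated field of each warning line are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical filter: keep at most LIMIT warnings per source, then summarize.
# Usage: some_command 2>&1 | limit_warnings [LIMIT]
limit_warnings() {
    limit=${1:-3}
    awk -F': ' -v limit="$limit" '
        # Lines that are not warnings pass through unchanged.
        $1 != "warning" { print; next }
        {
            # Field 2 is the source, e.g. ".../intel.dat/Input::READER_ASCII".
            seen[$2]++
            if (seen[$2] <= limit) print
        }
        END {
            # Report how many warnings were dropped for each noisy source.
            for (src in seen)
                if (seen[src] > limit)
                    printf "suppressed %d further warnings from %s\n", seen[src] - limit, src
        }
    '
}
```

Assuming the warnings arrive on standard error, `zeek intel.zeek 2>&1 | limit_warnings 5` would show five warnings for the intel.dat reader followed by one summary line.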
