[NAT] Pivot to being a test runner #171
Conversation
    # Example gate definition
    - id: alphanet
      description: "Alphanet validation gate"
      inherits: ["localnet"]
Does the inherits config have to be defined in the same yaml file at this point in time?
Great question. Yes. For now, we only support a single 'validation config' file.
Added this to an idea backlog.
    # Run test 'TestInteropSystem' in package 'github.com/ethereum-optimism/optimism/kurtosis-devnet/tests/interop'
    - name: TestInteropSystem
What about a config example for running a single test file within a package? Is that possible at this time?
File-based isn't possible at this time. It's also unclear to me whether we want it.
For now we can express:
- Single test
- Entire package
I think down the line, expressing a specific file would be a nice-to-have. Let's add it to the NAT backlog.
It might be slightly tricky. I think we'd need to analyze the file to extract the list of individual test cases (a test file is not always self-contained, so we might need to load the entire module anyway).
Probably easier to start with specifying individual tests.
Yes, it's non-trivial, otherwise I'd have just added it.
I believe `go test` doesn't facilitate "run this file". IDE plugins, for example, handle this by inspecting the target file, extracting all the test names within, and then passing that list to `go test` — something like `go test -run ^(TestA|TestB|TestC) x/y/package`.
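For reference, a rough sketch of that extraction step, assuming we parsed a single test file to build the `-run` expression; this is not part of the PR, and the program name/usage is hypothetical:

```go
// Hypothetical sketch: extract Test* function names from a single _test.go
// file with the standard library parser, then print a -run expression.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: listtests <file_test.go>")
		os.Exit(1)
	}
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, os.Args[1], nil, 0)
	if err != nil {
		panic(err)
	}
	var names []string
	for _, decl := range f.Decls {
		// Top-level functions named Test* (no receiver) are candidate tests.
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Recv == nil && strings.HasPrefix(fn.Name.Name, "Test") {
			names = append(names, fn.Name.Name)
		}
	}
	// Feed this into: go test -run '^(TestA|TestB|TestC)$' x/y/package
	fmt.Printf("^(%s)$\n", strings.Join(names, "|"))
}
```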
Added this to an idea backlog.
op-nat/config.go
Outdated
    TestDir         string
    ValidatorConfig string
    TargetGate      string
    Wallets         []*wallet.Wallet
Are wallets still necessary in the new version? Shouldn't these be defined in the tests?
You're right. I'll try to remove these as part of this PR/change. Thanks.
Done. Removed in 29434e5.
op-nat/config.go
Outdated
    if validatorConfig == "" {
        return nil, errors.New("validator config path is required")
    }
    if gate == "" {
Does gate need to be required? What if the whole config file should be run?
It's a good question. I figured it'd be better if we have to specify a gate.
Let me know if you want to chat about it though!
Sure, feel free to leave a gate for now; however, it would be nice to make NAT compatible with just running a single test. Thinking about a case where NAT is running on a network and debugging a single failure, it would be convenient for the developer to run a single test without the overhead of having to define a whole gate.
Can be backlogged.
I considered that but decided the overhead is trivial. Do you disagree?
They'd add this to fix/debug a single test (say, 'TestFindRPCEndpoints'):

    gates:
      - id: debug
        tests:
          - name: TestFindRPCEndpoints
            package: github.com/ethereum-optimism/optimism/kurtosis-devnet/pkg/kurtosis/api/run

Then run NAT/test-runner with `-gate debug`.
Good with me
op-nat/config.go
Outdated
    // Network config
    SC SuperchainManifest
    Validators []Validator

    Wallets []*wallet.Wallet

    // Networks
    SC SuperchainManifest
    L1  *network.Network
    L2A *network.Network
I also believe network objects will be defined in the tests, so these can be removed as well.
Absolutely. Removed in d808c28.
Thanks!
    devnet_system:
      description: "System tests"
      tests:
        - name: TestWallet
          package: github.com/ethereum-optimism/optimism/devnet-sdk/system
        - name: TestChainUser
          package: github.com/ethereum-optimism/optimism/devnet-sdk/system
Just curious, can suites be defined outside of a gate?
E.g., to compose gates from multiple suites, without needing to inherit a sub-gate?
Not as-is. Do you think we'll need this?
My reasoning for not needing it is that if we don't "force" suites to be part of gates, they could become stale/unused over time.
E.g.:
- suite1: [test1]
- suite2: [test2]
- suite3: [test3]
- gate1: [suite2, suite3]
In this example suite1 and test1 are unused (and thus useless?), and we may not notice among all the config.
Yeah, I get that. I feel as though for composability purposes it may be worth it:
- suite1: [test1]
- suite2: [test2]
- suite3: [test3]
- gate1: [suite2, suite3]
- gate2: [suite1, suite3]
It would avoid suite3 being defined multiple times, especially as the number of gates and tests grows.
Thinking of something similar to Ansible with tasks, plays, and playbooks.
Curious what @sigma thinks.
Oh! I slightly misunderstood you earlier.
Again, this isn't possible as-is, but we could make it so.
I believe this could be a great improvement, iff we don't end up using gates such that each is a superset of the previous.
I'm not sure how the gates will be used right now. My guess is that they'll look like this:
- alphanet: [stuff]
- betanet: [alphanet, more-stuff-only-for-betanet]
- mainnet: [betanet, more-stuff-only-for-mainnet]
In other words, each "stricter" gate MUST include all previous gates' tests/criteria.
It's also worth noting that defining suites/tests outside of gates, for more modularity, would almost certainly be better in the general case of a test runner.
So this really should be thought about more. The only question is: do we need to think about it in this PR/change? I would argue not.
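For concreteness, a hypothetical sketch of what top-level, reusable suite definitions could look like if we did go that way (keys, test names, and packages are illustrative; this is not supported as-is):

```yaml
# Hypothetical, NOT supported as-is: suites defined once at the top level,
# then referenced from any number of gates.
suites:
  suite1:
    tests:
      - name: TestOne
        package: example.com/pkg/one
  suite3:
    tests:
      - name: TestThree
        package: example.com/pkg/three

gates:
  - id: gate1
    suites: [suite2, suite3]
  - id: gate2
    suites: [suite1, suite3]
```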
    })

    acceptancesTotal = promauto.NewCounterVec(prometheus.CounterOpts{
    acceptanceResults = promauto.NewGaugeVec(prometheus.GaugeOpts{
For all of these metrics, what's the rationale for using Gauges instead of Counters? Unless the metric is expected to decrement, counters are typically used.
I'm referring back to the conversation about modeling metrics after an HTTP server; most of those are counters.
I keep going back and forth on this myself. We can always reassess when we're actually consuming them.
My thinking is essentially (sketched below):
- 'totals' should be Counters (the number of tests only increases)
- 'results' perhaps should be Gauges (the result of each test goes "up & down" between 0 and 1 [fail, pass])
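A minimal sketch of that split, using hypothetical metric names and a hypothetical helper rather than the PR's actual ones:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// 'totals' only ever increase, so a Counter fits.
	testsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "nat",
		Name:      "tests_total",
		Help:      "Total number of test executions",
	}, []string{"network_name", "run_id"})

	// 'results' move between 0 (fail) and 1 (pass), so a Gauge fits.
	testResult = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "nat",
		Name:      "test_result",
		Help:      "Latest result of an individual test",
	}, []string{"network_name", "run_id", "test"})
)

// RecordTest is a hypothetical helper, not the PR's API.
func RecordTest(network, runID, test string, passed bool) {
	testsTotal.WithLabelValues(network, runID).Inc()
	v := 0.0
	if passed {
		v = 1.0
	}
	testResult.WithLabelValues(network, runID, test).Set(v)
}
```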
Created #194 to address this properly.
Thanks!
op-nat/metrics/metrics.go
Outdated
    acceptanceTestFailed = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Namespace: MetricsNamespace,
        Name:      "acceptance_test_failed",
        Help:      "Number of failed acceptance tests",
    }, []string{
        "network_name",
        "run_id",
    })
Seems to be missing an 'acceptance test skipped' metric in this file as well.
It would also be nice to emit metrics with labels such as gate=<>, suite=<>, test=<>, to allow for grouping tests together along with the run_id.
Can be backlogged if you like.
On skipped validator metrics: created #195.
On more labels/dimensions: created #196.
    acceptanceResults.WithLabelValues(network, runID, result).Set(1)
    acceptanceTestTotal.WithLabelValues(network, runID).Set(float64(total))
    acceptanceTestPassed.WithLabelValues(network, runID).Set(float64(passed))
    acceptanceTestFailed.WithLabelValues(network, runID).Set(float64(failed))
    acceptanceTestDuration.WithLabelValues(network, runID).Set(duration.Seconds())
Hmm, to allow for finer-grained metrics, I think total failures and total acceptances should come from aggregating individual test results in the Grafana backend, rather than aggregating them in the service.
The query in Grafana could look like:

    sum by (runID) acceptance_test_result{result="passed"}
    sum by (runID) acceptance_test_result{result="failed"}
I think the metrics need a little tweaking, but this can be backlogged; once it's fully pivoted to a test runner, rework these.
Let's backlog and reconsider how we approach it.
I was treating it like http_requests_total in the prometheus docs here: https://prometheus.io/docs/practices/naming/#labels
Created the following issue to address it: #197
    ctx      context.Context
    config   *Config
    version  string
    registry *registry.Registry
    runner   runner.TestRunner
    result   *runner.RunnerResult
Should the logger be a part of the nat object instead of the config object?
    // TODO: This shouldn't be here; needs a refactor
    // TODO: don't hardcode the network name
    metrics.RecordAcceptance("todo", runID, overallResult.String())
    metrics.RecordAcceptance(
Metrics should be emitted as each test case completes, instead of when the suite ends, to allow for viewing test progression. This may already be the case; I will double check.
Agreed. I think that's what we have, but there's a different metric (and helper function) for each:
RecordValidation is used for tests and RecordAcceptance for the overall run pass/fail.
    r.mu.Lock()
    defer r.mu.Unlock()
Just curious what the purpose of the mutex is?
Will keep reviewing... maybe it becomes apparent
Not strictly needed; this is possibly premature optimisation.
It guards the registry.validators slice in case of concurrent access, for example multiple calls to loadValidators. However, as-is, I don't think that'll happen in practice.
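For illustration, the guarded pattern looks roughly like this; the Registry type and its fields here are stand-ins, not the PR's actual code:

```go
package registry

import "sync"

// ValidatorMetadata stands in for the real type; this is just a sketch.
type ValidatorMetadata struct{ Name string }

// Registry serializes access to its validators slice with a mutex.
type Registry struct {
	mu         sync.Mutex
	validators []ValidatorMetadata
}

// loadValidators can then be called concurrently without racing on the slice.
func (r *Registry) loadValidators(vs []ValidatorMetadata) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.validators = append(r.validators, vs...)
}
```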
    // Check for circular inheritance before resolving
    for _, gate := range config.Gates {
        if err := r.checkCircularInheritance(gate.ID, gate.Inherits, gateMap, make(map[string]bool)); err != nil {
Nice forward thinking
Thanks. And if we add the 'modular suite definitions' idea you mentioned, we should probably add a similar check for unused and duplicate validator definitions.
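Roughly what that check does, as a self-contained sketch; the real call in the diff takes gateMap and a visited set, but the names and shapes here are simplified assumptions:

```go
package main

import "fmt"

// checkCircularInheritance walks the inherits chain depth-first and errors
// out if a gate is revisited along the current path. Simplified sketch only.
func checkCircularInheritance(id string, inherits []string, gates map[string][]string, visiting map[string]bool) error {
	if visiting[id] {
		return fmt.Errorf("circular inheritance detected at gate %q", id)
	}
	visiting[id] = true
	defer delete(visiting, id) // backtrack so siblings may share ancestors
	for _, parent := range inherits {
		if err := checkCircularInheritance(parent, gates[parent], gates, visiting); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	gates := map[string][]string{
		"localnet": nil,
		"alphanet": {"localnet"},
		"betanet":  {"alphanet", "betanet"}, // self-cycle for demonstration
	}
	for id, inherits := range gates {
		if err := checkCircularInheritance(id, inherits, gates, map[string]bool{}); err != nil {
			fmt.Println(err)
		}
	}
}
```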
    },
    }
    ### Adding a new test/suite/gate
    All tests, suites, and gates are defined in a `validators.yaml` file. The filename is not important.
Is there a (current) requirement for the location of these files?
There is not. Only that there's just one of them.
op-nat/config.go
Outdated
    log.Warn("error creating wallet: %w", err)
    }
    wallets = append(wallets, w)
    // Parse kurtosis-devnet manifest
nit: it's meant to not be kurtosis-specific :)
Definitely. I've since removed this actually. The assumption now is that the tests must use devnet-sdk and it will handle the manifest-parsing.
    # test: (go_test "./...")
    test:
        go build ./... && go test -v ./...
        CGO_ENABLED=0 go build ./... && go test -count=1 -v ./...
I'm curious: why disable the test cache?
Paranoia :)
I do plan to remove this.
op-nat/nat.go
Outdated
    overallErr = errors.Join(overallErr)
    if res.Result == ResultFailed {
        overallResult = ResultFailed
    t.SetTitle(fmt.Sprintf("NAT Results (%s)", formatDuration(n.result.Duration)))
It would be nice to separate the raw data from its formatting (I can guarantee we'll want a JSON output in 3... 2... 1... :))
Isn't it already, though? This function, printResultsTable, simply iterates over nat.result (the data). We could write a similar function, printJSON, that iterates over the same results data.
Or am I missing something?
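Something like this, as a sketch; the RunnerResult shape here is invented for illustration and is not the PR's actual type:

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// RunnerResult is a stand-in for the real results type.
type RunnerResult struct {
	Duration time.Duration     `json:"duration"`
	Results  map[string]string `json:"results"`
}

// printJSON walks the same data printResultsTable would, but emits JSON.
func printJSON(result *RunnerResult) error {
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	return enc.Encode(result)
}

func main() {
	_ = printJSON(&RunnerResult{
		Duration: 3 * time.Second,
		Results:  map[string]string{"TestWallet": "pass"},
	})
}
```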
op-nat/runner/runner.go
Outdated
    )

    // TestResult captures the outcome of a single test run
    type TestResult struct {
Should we have some support for test artifacts here? Logs, for example.
Yes, but I was thinking of tackling this in a separate PR. May be non-trivial to parse it nicely.
#198 created to capture the work; please add any more ideas/context to that ticket.
Sound OK?
op-nat/runner/runner.go
Outdated
    // TestResult captures the outcome of a single test run
    type TestResult struct {
        Metadata types.ValidatorMetadata
        Status   string
I'm a bit worried about leaving the status free-form. Maybe some enum would be appropriate
Agreed. Updated this to an enum (and moved it to the types package)
op-nat/runner/runner.go
Outdated
    type TestResult struct {
        Metadata types.ValidatorMetadata
        Status   string
        Error    string
I'd use the error interface here. At least that'd leave some room for detecting error types
Agreed; updated.
    gateResult.Suites[suiteName] = suiteResult

    // Run all tests in the suite
    for _, validator := range suiteTests {
Not urgent at all, but it'd be nice to have some sort of job executor here, to abstract away the loop and make room for parallel execution, limiting the number of workers, and so on.
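A small sketch of what such an executor could look like, using only the standard library; the job type and names here are placeholders, not the PR's suite/validator types:

```go
package main

import (
	"fmt"
	"sync"
)

// runAll executes jobs with at most maxWorkers running concurrently.
// This stands in for the per-suite loop mentioned above.
func runAll(jobs []string, maxWorkers int, run func(string) error) []error {
	sem := make(chan struct{}, maxWorkers)
	errs := make([]error, len(jobs))
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(i int, job string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			errs[i] = run(job)
		}(i, job)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := runAll([]string{"TestA", "TestB", "TestC"}, 2, func(name string) error {
		fmt.Println("running", name)
		return nil
	})
	fmt.Println(errs)
}
```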
op-nat/runner/runner.go
Outdated
    // listTestsInPackage returns all test names in a package
    func (r *runner) listTestsInPackage(pkg string) ([]string, error) {
        listCmd := exec.Command("go", "test", pkg, "-list", "^Test")
Probably want to make the location of the "go" binary configurable from main. Someone somewhere (probably a fellow Nix user) won't have it in their PATH :)
Done; made it configurable (via a flag or envvar).
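For illustration, the lookup could be as simple as the following; the flag and env var names here are assumptions, not necessarily what landed in the PR:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Assumed names: a --go-binary flag with a GO_BINARY env fallback.
	goBin := flag.String("go-binary", "", "path to the go binary")
	flag.Parse()

	bin := *goBin
	if bin == "" {
		bin = os.Getenv("GO_BINARY")
	}
	if bin == "" {
		bin = "go" // fall back to PATH lookup
	}

	out, err := exec.Command(bin, "test", "./...", "-list", "^Test").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Print(string(out))
}
```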
jelias2 left a comment:
Overall, the main functionality for the test-running logic is there, IMO. LG(reat)TM
op-nat/runner/runner.go
Outdated
    func (r *runner) runIndividualTest(pkg, testName string) (bool, string) {
        r.log.Debug("Running individual test", "testName", testName)

        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
A 5-second timeout seems a bit short; if an L2 has 2s block times, that's only a couple of blocks.
It would be cool if users could configure a timeout in the validator.yaml.
Changed to 5 minutes.
A per-test timeout would indeed be good. I'll add that to the backlog. Parameterisation in general needs to be reconsidered with this pivot.
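A sketch of what the per-test timeout shape could look like once it's configurable; the function, parameter, and default here are assumptions, not the PR's exact code:

```go
package runner

import (
	"context"
	"os/exec"
	"time"
)

// runWithTimeout is a sketch: a per-test timeout, defaulting to 5 minutes,
// that could later be driven by a value from validators.yaml.
func runWithTimeout(pkg, testName string, timeout time.Duration) ([]byte, error) {
	if timeout == 0 {
		timeout = 5 * time.Minute
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "go", "test", pkg, "-run", "^"+testName+"$", "-count=1", "-v")
	return cmd.CombinedOutput()
}
```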
    }

    // buildTestArgs constructs the command line arguments for running a test
    func (r *runner) buildTestArgs(metadata types.ValidatorMetadata) []string {
I thought the idea was to use go test -json?
It was/is. I forgot while implementing it. :/
I've added a backlog item to revisit this, if we don't mind.
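For when that backlog item gets picked up: `go test -json` emits a stream of events in the format documented for `go tool test2json`. A sketch of consuming it (the summarize helper is illustrative, not part of the PR):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// TestEvent mirrors the event structure produced by `go test -json`.
type TestEvent struct {
	Action  string  `json:"Action"`
	Package string  `json:"Package"`
	Test    string  `json:"Test"`
	Elapsed float64 `json:"Elapsed"`
	Output  string  `json:"Output"`
}

// summarize counts per-test pass/fail events from a -json stream.
func summarize(r io.Reader) (passed, failed int, err error) {
	dec := json.NewDecoder(r)
	for {
		var ev TestEvent
		if err := dec.Decode(&ev); err == io.EOF {
			return passed, failed, nil
		} else if err != nil {
			return passed, failed, err
		}
		if ev.Test == "" {
			continue // package-level event
		}
		switch ev.Action {
		case "pass":
			passed++
		case "fail":
			failed++
		}
	}
}

func main() {
	// e.g. pipe in: go test -json ./... | thisprogram
	p, f, err := summarize(os.Stdin)
	fmt.Println("passed:", p, "failed:", f, "err:", err)
}
```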
* Renamed to op-acceptor
* Focused on being a test runner
Description
This PR implements the pivot for NAT, making it more focused on being a test runner.
Key features:
Video summary:
https://www.loom.com/share/e64a3236ea874fbdafcac67d8233249b?sid=249b72ef-d9eb-4ad1-9e9f-60cb4d82cc0c
CLI interface:
Example Validators YAML:
Metadata
Resolves #172