Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange hanging provider verification #17

Open
mefellows opened this issue Dec 22, 2022 · 1 comment
Open

Strange hanging provider verification #17

mefellows opened this issue Dec 22, 2022 · 1 comment

Comments

@mefellows
Copy link
Member

mefellows commented Dec 22, 2022

I've been running into a strange verification issue, where my build just hangs. Locally, it seems to reliably run just fine (M1 Mac and also via intel emulation in Docker).

The setup:

2022/12/22 10:07:12 lumberjack logging initialised
2022/12/22 10:07:12 [DEBUG] initialised logging
2022/12/22 10:07:12 [INFO] server on port 43957
2022/12/22 10:07:12 [INFO] InitPlugin request: plugin-driver-rust 0.2.2
2022/12/22 10:07:12 [INFO] UpdateCatalogue request: [type:TRANSPORT  key:"matt" key:"matt"  values:{key:"content-types"  value:"text/matt;application/matt"}]

2022/12/22 10:08:00 lumberjack logging initialised
2022/12/22 10:08:00 [DEBUG] initialised logging
2022/12/22 10:08:00 [INFO] server on port 43251
2022/12/22 10:08:00 [INFO] InitPlugin request: plugin-driver-rust 0.2.2

2022/12/22 10:08:00 lumberjack logging initialised
2022/12/22 10:08:00 [DEBUG] initialised logging
2022/12/22 10:08:00 [INFO] server on port 38435
2022/12/22 10:08:00 [INFO] InitPlugin request: plugin-driver-rust 0.2.2

2022/12/22 10:08:00 lumberjack logging initialised
2022/12/22 10:08:00 [DEBUG] initialised logging
2022/12/22 10:08:00 [INFO] server on port 34817

The test times out after 30s (default Golang test timeout). If you notice, the plugin is actually started 4 times (resulting in 4 separate processes on different ports). 3 of them received an InitPlugin request and the first also received the UpdateCatalogue request.

Running locally, I can see that 3/4 of the matt processes were started and continue to run after the tests complete:

  UID   PID  PPID   C STIME   TTY           TIME CMD
  502 28591     1   0  9:21pm ttys001    0:00.01 /Users/matthew.fellows/.pact/plugins/matt-0.0.7/matt
  502 28653     1   0  9:21pm ttys001    0:00.01 /Users/matthew.fellows/.pact/plugins/matt-0.0.7/matt
  502 28715     1   0  9:21pm ttys001    0:00.02 /Users/matthew.fellows/.pact/plugins/matt-0.0.7/matt

This doesn't seem to happen with the protobuf plugin, so I do have my suspicions of this being a matt plugin specific problem.

Update

So I increased the test timeout by another 100s to get past the Golang timeout (in case that was all it was. It wasn't). The build failed with a verification error code of 2 (https://github.com/pact-foundation/pact-go/actions/runs/3758088926/jobs/6386014379#step:5:2542) which indicates the verification process wasn't able to run.

This leads me further down the "matt plugin" problem, but there are no obvious logs to indicate the process was killed/died or panic'd (I've also wrapped the plugin's main process in a panic recover(), so it would catch and log any of those). What's also confusing, is that this plugin works with the Pact JS builds on linux. But it does seem like the driver is trying to start the plugin and run the verification, but something goes wrong and it has to continually try and start a new plugin process.

The logs from the plugin are interesting, with an extra plugin process starting this time:

2022/12/22 13:02:45 lumberjack logging initialised
2022/12/22 13:02:45 [DEBUG] initialised logging
2022/12/22 13:02:45 [INFO] server on port 35233
2022/12/22 13:02:45 [INFO] InitPlugin request: plugin-driver-rust 0.2.2
2022/12/22 13:02:45 [INFO] UpdateCatalogue request: [key:"matt"  values:{key:"content-types"  value:"text/matt;application/matt"} type:TRANSPORT  key:"matt"]
2022/12/22 13:03:47 lumberjack logging initialised
2022/12/22 13:03:47 [DEBUG] initialised logging
2022/12/22 13:03:47 [INFO] server on port 36493
2022/12/22 13:03:47 [INFO] InitPlugin request: plugin-driver-rust 0.2.2
2022/12/22 13:03:47 lumberjack logging initialised
2022/12/22 13:03:47 [DEBUG] initialised logging
2022/12/22 13:03:47 [INFO] server on port 37547
2022/12/22 13:03:47 [INFO] InitPlugin request: plugin-driver-rust 0.2.2
2022/12/22 13:03:47 [INFO] UpdateCatalogue request: [type:MATCHER  key:"v3-includes" type:MATCHER  key:"v2-max-type" key:"json"  values:{key:"content-types"  value:"application/.*json,application/json-rpc,application/jsonrequest"} type:MATCHER  key:"v2-min-type" type:MATCHER  key:"v3-number-type" type:TRANSPORT  key:"matt" type:MATCHER  key:"v4-not-empty" key:"matt"  values:{key:"content-types"  value:"text/matt;application/matt"} type:CONTENT_GENERATOR  key:"json"  values:{key:"content-types"  value:"application/.*json,application/json-rpc,application/jsonrequest"} type:MATCHER  key:"v3-null" type:MATCHER  key:"v2-minmax-type" type:MATCHER  key:"v3-date" key:"text"  values:{key:"content-types"  value:"text/plain"} key:"xml"  values:{key:"content-types"  value:"application/.*xml,text/xml"} type:MATCHER  key:"v3-integer-type" type:MATCHER  key:"v4-max-equals-ignore-order" key:"multipart-form-data"  values:{key:"content-types"  value:"multipart/form-data,multipart/mixed"} type:MATCHER  key:"v4-array-contains" type:MATCHER  key:"v1-equality" type:MATCHER  key:"v4-minmax-equals-ignore-order" type:CONTENT_GENERATOR  key:"binary"  values:{key:"content-types"  value:"application/octet-stream"} type:MATCHER  key:"v3-decimal-type" type:MATCHER  key:"v4-semver" type:MATCHER  key:"v3-time" type:MATCHER  key:"v2-regex" type:MATCHER  key:"v4-min-equals-ignore-order" type:MATCHER  key:"v2-type" type:MATCHER  key:"v4-equals-ignore-order" type:MATCHER  key:"v3-datetime" type:MATCHER  key:"v3-content-type"]
2022/12/22 13:03:47 lumberjack logging initialised
2022/12/22 13:03:47 [DEBUG] initialised logging
2022/12/22 13:03:47 [INFO] server on port 41019
2022/12/22 13:03:47 [INFO] InitPlugin request: plugin-driver-rust 0.2.2
2022/12/22 13:03:47 lumberjack logging initialised
2022/12/22 13:03:47 [DEBUG] initialised logging
2022/12/22 13:03:47 [INFO] server on port 41107

What I found really interesting, is that the github agent detected the orphaned processes:

2022-12-22T13:18:57.6873264Z Cleaning up orphan processes
2022-12-22T13:18:57.7135586Z Terminate orphan process: pid (6006) (matt)
2022-12-22T13:18:57.7156236Z Terminate orphan process: pid (6014) (matt)
2022-12-22T13:18:57.7173464Z Terminate orphan process: pid (6023) (matt)

These were the first three plugins started by the framework, my assumption is that the last one the driver started (PID 6031) was shutdown by the framework (albeit I can't see any logs to validate this). This indicates the processes didn't die, but somehow the driver lost track of them or deliberately orphaned them?

@mefellows
Copy link
Member Author

mefellows commented Dec 22, 2022

So after rubber ducking, I noticed that the pact file curiously had the following:

...
  "metadata": {
    "pactRust": {
      "ffi": "0.3.18",
      "mockserver": "0.9.8",
      "models": "1.0.2"
    },
    "pactSpecification": {
      "version": "4.0"
    },
    "plugins": [
      {
        "configuration": {},
        "name": "matt",
        "version": "0.0.1"
      },
      {
        "configuration": {},
        "name": "matt",
        "version": "0.0.2"
      },
      {
        "configuration": {},
        "name": "matt",
        "version": "0.0.4"
      },
      {
        "configuration": {},
        "name": "matt",
        "version": "0.0.5"
      }
    ]
  },
...

Note the version of the plugin in the test itself was 0.0.7. I presumably hadn't run and committed the pact file with the latest version, but that might explain why the plugin was started 5 times.

Interestingly, even though 0.0.7 wasn't in the pact file, it was specified in the test itself, and that was the only version that was started up (as all of the log files were obtained from that version).

The other plugin tests were reliably working because this just so happened to be the only test that didn't write to a shared pact directory which was cleaned out prior to each test run - so that is likely the explanation for how so many plugins were stored in the pact file.

The fix

For clarity, the fix was simply to remove the pact file prior to running the test, to avoid stale interactions/configurations being written to the file.

What next

Given the difficultly in tracking this down, is there a change we can make to either make the problem more visible to the user or perhaps preventing having multiple unused plugins written to the pact file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant