Details of task notifications #652

agluszak · 2024-02-12T14:44:22Z

agluszak
Feb 12, 2024
Maintainer

A simple task of requesting the server to run some tests turns out to hit a lot of undefined or vaguely defined areas in the protocol specification. Let's review a few scenarios and encode the conclusions in the spec.

Scenario 1 (the simplest happy case):
User wants to test target T1 which contains only one test case C1. There is no compilation involved. C1 succeeds.

In my opinion, the json-rpc exchange between the client and the server should look like this:

buildTarget/test (client -> server)

{
  "targets": ["T1"],
  "originId": "TestRequestOriginId"
}

build/taskStart (server -> client) [1]

{
  "taskId": {
    "id": "TestingStarted"
    // parents can be either null or an empty list, see https://github.com/build-server-protocol/build-server-protocol/issues/649
  },
  "originId": "TestRequestOriginId",
  "dataKind": "test-task",
  "data": {
    "target": "T1"
  }
}

build/taskStart (server -> client) [2]

{
  "taskId": {
    "id": "C1Started",
    "parents": ["TestingStarted"]
  },
  "originId": "TestRequestOriginId",
  "dataKind": "test-start",
  "data": {
    "displayName": "C1"
  }
}

build/taskFinish (server -> client)

{
  "taskId": {
    "id": "C1Started",
    "parents": ["TestingStarted"] // [3]
  },
  "originId": "TestRequestOriginId",
  "status": 1, // This is of type StatusCode
  "dataKind": "test-finish",
  "data": {
    "displayName": "C1", // [4]
    "status": 1 // This is of type TestStatus [5]
  }
}

build/taskFinish (server -> client)

{
  "taskId": {
    "id": "TestingStarted"
  },
  "originId": "TestRequestOriginId",
  "status": 1, // [6]
  "dataKind": "test-report",
  "data": {
    "target": "T1", // [7]
    "passed": 1, // [8]
    "failed": 0,
    "ignored": 0,
    "cancelled": 0,
    "skipped": 0
  }
}

buildTarget/test (request response, server -> client)

{
  "originId": "TestRequestOriginId",
  "statusCode": 1 // [9]
}
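
The exchange above implies that every build/taskStart is eventually matched by a build/taskFinish carrying the same task id. A minimal sketch of how a client could check that, assuming simplified notification shapes; the `TaskTracker` class and its method names are hypothetical and not part of BSP:

```typescript
// Simplified shapes; field names follow the messages in Scenario 1.
interface TaskId {
  id: string;
  parents?: string[] | null;
}

interface TaskStartParams {
  taskId: TaskId;
  originId?: string;
  dataKind?: string;
  data?: unknown;
}

interface TaskFinishParams {
  taskId: TaskId;
  originId?: string;
  status: number; // StatusCode: 1 = OK, 2 = error, 3 = cancelled
  dataKind?: string;
  data?: unknown;
}

// Hypothetical helper: remembers which tasks have started but not finished.
class TaskTracker {
  private open: Record<string, TaskStartParams> = {};

  onTaskStart(params: TaskStartParams): void {
    const id = params.taskId.id;
    if (Object.prototype.hasOwnProperty.call(this.open, id)) {
      throw new Error("task " + id + " started twice");
    }
    this.open[id] = params;
  }

  onTaskFinish(params: TaskFinishParams): void {
    const id = params.taskId.id;
    if (!Object.prototype.hasOwnProperty.call(this.open, id)) {
      throw new Error("task " + id + " finished without a matching start");
    }
    delete this.open[id];
  }

  // Ids of tasks that are still running, sorted for stable output.
  openTaskIds(): string[] {
    return Object.keys(this.open).sort();
  }
}
```

When the buildTarget/test response arrives, `openTaskIds()` should be empty; anything left over is a task the server never finished.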

Scenario 2 (the simplest failure case)
User wants to test target T1 which contains only one test case C1. There is no compilation involved. C1 fails.

My understanding is that the exchange would be exactly the same as above, but with different statuses. The TestStatus should certainly be set to 2 (failed), but what about the rest? I'd say that all of them should be set to 1 (OK), because both the TestTask and the test request itself succeeded. The TestFinish task itself also succeeded, even though the test failed. But if so, what would it mean for the task to fail? The spec currently says nothing about this.

Scenario 3 (multiple test cases)
User wants to test target T1 which contains two test cases C1 and C2. There is no compilation involved. C1 succeeds, C2 fails.

There is one "outer" TestTask task. So there would be 3 build/taskStart notifications and 3 matching build/taskFinish notifications.

               /--- C1
TestingStarted
               \--- C2

Scenario 4 (multiple targets)
User wants to test targets T1 and T2 in one request. T1 contains two test cases C1 and C2. T2 contains test case C3. No compilation.

My understanding is that there would be 5 build/taskStart notifications.

                 /--- C1
T1TestingStarted
                 \--- C2

T2TestingStarted --- C3

Unless we agree that there has to be an additional top level task.
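
The task trees in Scenarios 3 and 4 can be rebuilt on the client side from the `parents` field of each build/taskStart notification. A rough sketch, assuming that a missing, null, or empty `parents` all mean "top level task" (which is exactly what issue #649 asks to pin down); `rootTasks` and `childrenOf` are hypothetical helper names:

```typescript
// Only the part of a build/taskStart notification needed to build the tree.
interface StartedTask {
  id: string;
  parents?: string[] | null;
}

// Ids of tasks without parents, i.e. the roots of the forest.
function rootTasks(tasks: StartedTask[]): string[] {
  return tasks
    .filter((t) => !t.parents || t.parents.length === 0)
    .map((t) => t.id);
}

// Ids of the direct children of the given task.
function childrenOf(tasks: StartedTask[], parentId: string): string[] {
  return tasks
    .filter((t) => (t.parents || []).indexOf(parentId) !== -1)
    .map((t) => t.id);
}

// The five build/taskStart notifications of Scenario 4, reduced to task ids.
const scenario4: StartedTask[] = [
  { id: "T1TestingStarted" },
  { id: "C1", parents: ["T1TestingStarted"] },
  { id: "C2", parents: ["T1TestingStarted"] },
  { id: "T2TestingStarted" },
  { id: "C3", parents: ["T2TestingStarted"] },
];
```

With an additional top level task there would be a single root and six notifications instead of two roots and five.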

Scenario 5 (building)
User wants to test target T1 which contains only one test case C1. T1 needs to be built first.

My understanding is that there would be 3 build/taskStart notifications.

Before notifications described in Scenario 1 there would be a CompileTask notification, perhaps followed by some build/taskProgress notifications.

Scenario 6 (cancellation)
Same as above, but the user cancels the task before the target is built or before the tests are run.

The statusCode in TestResult (the result of the entire request) is set to 3 (cancelled) [12]. If the cancellation happens after the building is done but before the tests are run, the task notifications from building are still sent from the server to the client.

Scenario 7 (building error)
Same as above, but building/compilation fails.

buildTarget/test (client -> server)

{
  "targets": ["T1"],
  "originId": "TestRequestOriginId"
}

build/taskStart (server -> client)

{
  "taskId": {
    "id": "BuildingStarted"
  },
  "originId": "TestRequestOriginId",
  "dataKind": "compile-task",
  "data": {
    "target": "T1"
  }
}

build/taskProgress (server -> client)

{
  "taskId": {
    "id": "BuildingStarted"
  },
  "originId": "TestRequestOriginId",
  "message": "Reticulating splines in file.format",
  "total": 314,
  "progress": 9,
  "unit": "splines"
}

build/publishDiagnostics (server -> client)

{
  "textDocument": {
    "uri": "file://somewhere/file.format"
  },
  "buildTarget": "T1",
  "originId": "TestRequestOriginId",
  "diagnostics": [
    {
      "range": {
        "start": 0,
        "end": 42
      },
      "message": "some error"
    }
  ],
  "reset": true // [10]
}

build/taskFinish (server -> client)

{
  "taskId": {
    "id": "BuildingStarted"
  },
  "originId": "TestRequestOriginId",
  "status": 2,
  "dataKind": "compile-report",
  "data": {
    "target": "T1",
    "errors": 5,
    "warnings": 6
  }
}

buildTarget/test (request response, server -> client)

{
  "originId": "TestRequestOriginId",
  "statusCode": 2 // [11]
}

[1] In my opinion we need to say in the spec either that:

- this top level task is optional and must be present if you want to get a TestReport task notification later on. However, this significantly increases the implementation complexity, because now there are two code paths: either all tests are grouped under a top level task or they are parentless. This essentially makes the TestReport task notification useless, because a compliant BSP client would always have to be able to handle both cases: with a TestReport (so it doesn't have to keep track of the number of succeeded/failed/ignored tests) or without one. In other words: depending on whether you get a TestReport, the test statistics have to be computed either on the client side or on the server side.
- this top level task must always be present and a TestReport must be sent after all tests for a given target are run. This, however, makes the task mechanism useless for this use case, because this data could be included in the TestResult response (which is, by definition, always returned for a buildTarget/test request).

The spec currently says here that "The beginning of a testing unit may be signalled to the client with a build/taskStart notification. When the testing unit is a build target, the notification's dataKind field must be test-task and the data field must include a TestTask object."

What else can be a testing unit? What if you have multiple "testing units" in a build target? Will you get multiple TestReports? Or should there be a top level task anyway? (with its own TestReport?). I feel like this part of the specification tries to be too general.

[2] Should we also send a build/taskProgress[13] notification? Or instead of this one? Technically speaking, running a test case is a form of making progress in the greater task of testing a build target. The spec even suggests that build/taskProgress notification can be used for telling the client about running tests:

/** If known, total amount of work units in this task. */
total?: Long;

/** If known, completed amount of work units in this task. */
progress?: Long;

/** Name of a work unit. For example, "files" or "tests". May be empty. */
unit?: string;

However, in my opinion, it makes no sense to use build/taskProgress for reporting of tests being started, because that prevents you from getting a TestFinish notification later on. So we should explicitly prohibit that.

[3] In theory, once a task has started, the list of parents can no longer change. So this field could be ignored. But what if it's present and, what's worse, it contains data different from what it contained when the task was started? We could say in the specification that for notifications other than build/taskStart this field MUST be absent. But then having TaskId as a separate structure would no longer make sense and it would be simpler to "extract" its fields directly to the fields of the notification parameters. build/taskStart would have both id and parents, build/taskProgress and build/taskFinish would only have id.
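
To make the two options in [3] concrete, here is a small sketch. The `Flat*` interfaces are only an illustration of the "extracted fields" idea, not an agreed design, and `parentsConflict` is a hypothetical helper showing the check a client would need if `parents` stays allowed on later notifications:

```typescript
// Current shape: every task notification carries a full TaskId.
interface TaskId {
  id: string;
  parents?: string[];
}

// Hypothetical flattened shapes: only the start notification knows the parents.
interface FlatTaskStart {
  id: string;
  parents?: string[];
}

interface FlatTaskFinish {
  id: string;
  status: number; // StatusCode
}

// True when a later notification repeats `parents` with different content
// than the one seen at build/taskStart. Order of parents is ignored.
function parentsConflict(atStart: TaskId, later: TaskId): boolean {
  if (later.parents === undefined) {
    return false; // nothing was repeated, so nothing can contradict
  }
  const a = (atStart.parents || []).slice().sort();
  const b = later.parents.slice().sort();
  return a.length !== b.length || a.some((p, i) => p !== b[i]);
}
```

With the flattened shapes the conflict cannot be expressed at all, which is the main argument for them.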

[4] What should happen in the client if it’s different from the name received in the start notification? What is the point of the possibility that these two values could be different? Also: why is this field present in test notifications, but not compile notifications?

IMO we need to clarify that this field SHOULD NOT be used as an identifier.

[5] How is one status different from another? How can a test succeed, but the task containing it fail?

[6] What does this status really mean? Should it be set to failed if any of the contained tests fail? Or should it be always set to OK and ignored?

[7] Redundant. We already know that from the TestTask notification.

[8] As mentioned in [1], what if this number is not equal to the number of TestFinish notifications with status == 1 received? Who should be the authority computing the stats? The client or the server?
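
The question in [8] becomes visible as soon as a client tallies the TestFinish notifications itself and compares the result with the TestReport. A minimal sketch, assuming the TestStatus values 1 = passed, 2 = failed, 3 = ignored, 4 = cancelled, 5 = skipped; `tally` and `mismatches` are hypothetical names:

```typescript
// The counters carried by a TestReport (the target field is left out).
interface TestReportCounts {
  passed: number;
  failed: number;
  ignored: number;
  cancelled: number;
  skipped: number;
}

// Client-side statistics from the TestStatus of each TestFinish notification.
// Unknown status values are silently skipped in this sketch.
function tally(testStatuses: number[]): TestReportCounts {
  const counts: TestReportCounts = { passed: 0, failed: 0, ignored: 0, cancelled: 0, skipped: 0 };
  for (const s of testStatuses) {
    if (s === 1) counts.passed++;
    else if (s === 2) counts.failed++;
    else if (s === 3) counts.ignored++;
    else if (s === 4) counts.cancelled++;
    else if (s === 5) counts.skipped++;
  }
  return counts;
}

// Names of the counters on which server and client disagree.
function mismatches(server: TestReportCounts, client: TestReportCounts): string[] {
  const keys: (keyof TestReportCounts)[] = ["passed", "failed", "ignored", "cancelled", "skipped"];
  return keys.filter((k) => server[k] !== client[k]);
}
```

What the client should do with a non-empty `mismatches` result is precisely what the spec would have to decide.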

[9] This time the meaning of the status is clear: it is the status of the entire request. It will be discussed later on.

[10] Perhaps instead of this boolean switch (which, from my experience as an implementor, is hard to work with, because it requires some implicit, mutable state) we could have an obligatory taskId field. This way we would always know which diagnostics should be displayed together, even if they come in separate batches. If you use BSP for background typechecking, for example, then each time the compiler runs and starts emitting diagnostics, start a new task.
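
A sketch of what the client side of [10] could look like. The `taskId` field on the diagnostics notification is the proposal from this thread and does not exist in the current spec; `DiagnosticStore` is a hypothetical name. Batches are simply appended under their task, so no `reset` flag and no implicit state are needed:

```typescript
interface Diagnostic {
  message: string;
}

// Hypothetical notification shape: diagnostics tagged with the task that produced them.
interface TaskDiagnostics {
  taskId: string;
  uri: string;
  diagnostics: Diagnostic[];
}

class DiagnosticStore {
  private byTask: Record<string, Diagnostic[]> = {};

  private static key(taskId: string, uri: string): string {
    return taskId + "\n" + uri;
  }

  // Appends a batch; batches of the same task and file accumulate.
  publish(p: TaskDiagnostics): void {
    const key = DiagnosticStore.key(p.taskId, p.uri);
    this.byTask[key] = (this.byTask[key] || []).concat(p.diagnostics);
  }

  // All diagnostics a given task has reported for a file so far.
  get(taskId: string, uri: string): Diagnostic[] {
    return this.byTask[DiagnosticStore.key(taskId, uri)] || [];
  }
}
```

A new compiler run starts a new task, so its diagnostics start from an empty list without the server having to say so.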

[11] That's a failure, because we didn't even get to the testing part, I guess? But what if there are multiple targets and they can be built and tested independently, in parallel? What if, let's say, one target builds fine and is tested successfully, but another one breaks in the process of testing?

[12] Oh, now I remember that we already talked about cancellation and it turned out that our cancelled status code only exists because the original authors of the BSP spec weren't aware of the LSP cancellation mechanism or it didn't exist back then.
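
For reference, the LSP cancellation mechanism mentioned in [12] is a `$/cancelRequest` notification carrying the id of the request to cancel; a cancelled request is then answered with the error code -32800 (RequestCancelled). A minimal sketch of both sides, where `cancelRequest` and `wasCancelled` are hypothetical helper names:

```typescript
type RequestId = number | string;

interface CancelRequestNotification {
  jsonrpc: "2.0";
  method: "$/cancelRequest";
  params: { id: RequestId };
}

// Builds the notification that asks the server to cancel a pending request.
function cancelRequest(id: RequestId): CancelRequestNotification {
  return { jsonrpc: "2.0", method: "$/cancelRequest", params: { id: id } };
}

// Error code that LSP defines for a request that was cancelled.
const REQUEST_CANCELLED = -32800;

// True when a JSON-RPC response reports that the request was cancelled.
function wasCancelled(response: { error?: { code: number } }): boolean {
  return response.error !== undefined && response.error.code === REQUEST_CANCELLED;
}
```

With this mechanism the cancelled outcome is expressed as an error response, so a separate cancelled value in StatusCode looks redundant.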

[13] Regarding progress notifications: just a reminder that the LSP base protocol (on which we'd probably like to rebase BSP in version 3.0) contains a progress reporting mechanism.

agluszak · 2024-02-14T16:19:51Z

agluszak
Feb 14, 2024
Maintainer Author

Additional question. Task notifications have an optional eventTime field. What's the use for that? Why should one be using this timestamp instead of the moment when the notification came? Does it matter from the end user perspective? Is it realistically possible that the server will take >1s between registering an event and sending a notification about it? The client has to be prepared for the case when there's no eventTime field anyway, probably using the notification arrival time as a replacement.
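
The fallback described in the last sentence is a one-liner on the client side. A sketch, assuming `eventTime` is a timestamp in milliseconds since the epoch; `effectiveEventTime` is a hypothetical name:

```typescript
// Any task notification; only the optional timestamp matters here.
interface TimedNotification {
  eventTime?: number;
}

// Uses the server's timestamp when present, otherwise the arrival time.
function effectiveEventTime(n: TimedNotification, arrivalTime: number): number {
  return n.eventTime !== undefined ? n.eventTime : arrivalTime;
}
```

A client would typically pass `Date.now()` as `arrivalTime` at the moment the notification is received.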

0 replies

adpi2 · 2024-02-20T15:15:04Z

adpi2
Feb 20, 2024
Maintainer

Thanks @agluszak for this comprehensive discussion of the BSP test workflow.

[1] I am more in favor of the second option: the TestReport should be mandatory and it should be part of the TestResult response. For each target in the test request, the test response should contain a TestReport.

> What else can be a testing unit? What if you have multiple "testing units" in a build target? Will you get multiple TestReports? Or should there be a top level task anyway? (with its own TestReport?). I feel like this part of the specification tries to be too general.

To be flexible we should allow a build target to contain many test groups. But also to keep things simple we should only require a single test report on each build target. We can also have an optional test report on a test group (but I don't think we need one).

I don't like the test-task and test-start notifications because their names are ambiguous, and they are not well specified. I propose we replace them with the notions of test groups and tests (test-group-start, test-group-finish, test-start, test-finish). A test request can trigger some test groups and/or some tests; each test group can contain other test groups and/or some tests. The client can use this test group hierarchy to display the test execution.

Maybe the test-group-start notification should contain the list of inner tests and groups, so that we know beforehand how many sub-tasks are expected to run.

[2] It would make sense to report progress of a test group given that we know the total number of tests. But do we have any client that needs that?
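
If test-group-start did list its inner tests, the progress of a group could be derived on the client without any build/taskProgress notification. A sketch under that assumption; the `children` field and the `groupProgress` helper are hypothetical, and the result reuses the `total`, `progress` and `unit` fields of build/taskProgress:

```typescript
// Hypothetical payload of a test-group-start notification.
interface TestGroupStart {
  displayName: string;
  children: string[]; // task ids of the inner tests and groups
}

interface GroupProgress {
  total: number;
  progress: number;
  unit: string;
}

// Counts how many of the announced children have finished so far.
function groupProgress(group: TestGroupStart, finishedIds: string[]): GroupProgress {
  const done = group.children.filter((c) => finishedIds.indexOf(c) !== -1).length;
  return { total: group.children.length, progress: done, unit: "tests" };
}
```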

[3]

> build/taskStart would have both id and parents, build/taskProgress and build/taskFinish would only have id.

LGTM

[4] Looks like a detail to me. We can remove the field or we can specify that it should be the same as in the task-start notification.

[5]

> How is one status different from another? How can a test succeed, but the task containing it fail?

This is how I understand it:

- If the task succeeds, then the data field is not empty and it contains the test status (success, failure, ignored).
- If the task fails, it means something unexpected happened (e.g. the crash of the test runner) and the data field is empty. There is no test status because the test runner crashed. I expect this situation to happen very rarely.
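
Under this reading a client could separate the two statuses as follows. This is only a sketch of the interpretation above, not spec-mandated behaviour; `interpretTestFinish` and the `TestOutcome` type are hypothetical:

```typescript
// status is a StatusCode (1 = OK, 2 = error, 3 = cancelled);
// data.status is a TestStatus (1 = passed, 2 = failed, ...).
interface TestFinishParams {
  status: number;
  data?: { displayName: string; status: number };
}

type TestOutcome =
  | { kind: "result"; testStatus: number } // the test ran and produced a verdict
  | { kind: "task-failed"; statusCode: number } // e.g. the test runner crashed
  | { kind: "unspecified" }; // task OK but no data: not covered by this reading

function interpretTestFinish(p: TestFinishParams): TestOutcome {
  if (p.status === 1) {
    return p.data !== undefined
      ? { kind: "result", testStatus: p.data.status }
      : { kind: "unspecified" };
  }
  return { kind: "task-failed", statusCode: p.status };
}
```

A failing test from Scenario 2 then comes out as a `result` with `testStatus` 2, while a crashed runner comes out as `task-failed`.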

[6] Similarly this status should be success except if something unexpected happened such as the crash of the runner.

[7] Same comment as [4]

[8] The server is always right. But the client is free to ignore the report if it feels like it should re-compute it.

[10]

> Perhaps instead of this boolean switch (which, from my experience as an implementor, is hard to work with, because it requires some implicit, mutable state) we could have an obligatory taskId field.

I totally agree, the reset mechanism is not easy to implement on the server side. The taskId field would solve the issue quite nicely.

[11] This problem is a recurring one. It is also described in #655 and I proposed a solution in #204. The multi-target request is at the core of BSP and we should have a multi-target response to handle success and failures at the target level.

0 replies

mnoah1 · 2024-05-16T15:41:14Z

mnoah1
May 16, 2024

> I don't like the test-task and test-start notifications because their names are ambiguous, and they are not well specified. I propose we replace them with the notions of test groups and tests (test-group-start, test-group-finish, test-start, test-finish). A test request can trigger some test groups and/or some tests; each test group can contain other test groups and/or some tests. The client can use this test group hierarchy to display the test execution.

Since there is already a parent-child structure, is there a need for test-group and test to be two separate concepts? Isn't the fact that it's a test group implied by the existence of multiple children?

1 reply

adpi2 May 16, 2024
Maintainer

Yes that would work, if we agree that they have the same associated data on both start and finish notifications.
