Test task on Windows image 20230517.1 fails #7683

Closed

gioce90 opened this issue Jun 8, 2023 · 5 comments

gioce90 commented Jun 8, 2023

Description

Hi, I hope this is the right place to ask for help.

In Azure DevOps, I have this pipeline definition:

pool:
  vmImage: 'windows-2022'

#... stuff...

steps:

#... stuff...

- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '**/*.csproj'
    arguments: '--no-restore --configuration $(buildConfiguration) --collect "Code Coverage"' 

This pipeline has worked well for years; the tests usually complete in less than 2-3 minutes.
In the last 2-3 days, my company has noticed that the pipeline randomly fails because the 'dotnet test' task runs for longer than 60 minutes.
The strange thing is that for some runs on the same branch and the same commit this doesn't happen and the run succeeds.
After a short analysis, I noticed that the only difference is the agent machine configuration:

Runs with this configuration always succeed (the 'dotnet test' task is as short as usual):
[screenshot: agent configuration showing image version 20230508.3]

Runs with this configuration always fail (the 'dotnet test' task takes too long):
[screenshot: agent configuration showing image version 20230517.1]

So my opinion is that version 20230508.3 and older were fine, and that something is wrong with the new 20230517.1 version.

Here, in this project, I found the differences between these two versions; maybe it is something related to the new version of MSBuild, I don't know. Can you help me with that, maybe by prioritizing the release of a new image version?

Platforms affected

  • [x] Azure DevOps
  • [ ] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [ ] Ubuntu 20.04
  • [ ] Ubuntu 22.04
  • [ ] macOS 11
  • [ ] macOS 12
  • [ ] macOS 13
  • [ ] Windows Server 2019
  • [x] Windows Server 2022

Image version and build link

Image: windows-2022
Version: 20230517.1
Included Software: https://github.com/actions/runner-images/blob/win22/20230517.1/images/win/Windows2022-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20230517.1

Is it a regression?

20230508.3

Expected behavior

All of the pipeline's runs should succeed, because nothing has changed in the code or anywhere else (same branch, same commit).

Actual behavior

Runs of the pipeline based on the same branch and the same commit sometimes fail. The only difference is the Windows image version.

Repro steps

I just run the pipeline on the same code (same branch and commit).
When the agent machine mounts image version 20230508.3 (or older) it works, but when it mounts the new image version 20230517.1 it always fails because the tests run indefinitely.

@erik-bershel (Contributor)

@gioce90 hey there!
It might be the currently known bug in the Visual Studio Test Platform:
microsoft/vstest#4516

Can you provide some additional information for the investigation, such as repro steps and logs? Or you may try the proposed workaround. We'll wait for your response.

gioce90 (Author) commented Jun 12, 2023

Hi @erik-bershel. I tried to replace my DotNetCoreCLI@2 task with the two tasks suggested in the link you posted (VisualStudioTestPlatformInstaller@1 and VSTest@2).
Sadly, there is no difference: the VSTest task also runs indefinitely.

[screenshot: the VSTest task running indefinitely]

Moreover, the VSTest task raises some new kinds of errors, as you can see:

[screenshot: errors raised by the VSTest task]

(To be honest, I would rather keep using the DotNetCoreCLI@2 task instead of switching to VSTest.)

I don't know which repro steps you need; as already said, I just run an old commit on a branch that has always worked, but when the Azure DevOps pipeline agent uses image version 20230517.1 (or newer) the tests just run indefinitely.

If you need logs, I can send them privately.

gioce90 (Author) commented Jun 13, 2023

Upon further investigation, I was able to isolate the problematic portion of the code.
I removed all the test assemblies and patiently re-launched the Azure Pipeline multiple times (keep in mind that locally, in Visual Studio, the problem does not occur), adding each test assembly back one by one, and I was finally able to isolate the test assembly that runs forever.

Once I figured out which assembly it was, I spotted the problem in the code (a "sync over async" antipattern):

[screenshot: the offending test code blocking on .Result (left) and the fixed version (right)]
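The code in the screenshot isn't reproduced here; purely as an illustration, a minimal sketch of the kind of "sync over async" pattern being described (the service, method, and test names are invented):

using System.Threading.Tasks;
using Xunit;

// Hypothetical service standing in for the real code shown in the screenshot.
public class OrderService
{
    public async Task<string> GetOrderAsync(int id)
    {
        await Task.Delay(100);           // simulates asynchronous I/O
        return $"order-{id}";
    }
}

public class OrderServiceTests
{
    [Fact]
    public void GetOrder_ReturnsExpectedOrder()
    {
        var service = new OrderService();

        // "Sync over async": a synchronous test blocking on .Result.
        // If the awaited continuation needs the very thread that is blocked
        // here, the test deadlocks and the run hangs indefinitely.
        var order = service.GetOrderAsync(42).Result;

        Assert.Equal("order-42", order);
    }
}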

This is of course bad code, but it has been there since 2020 and so far it has never caused any problems.
I suspect that in the last few days we have been hitting deadlocks (on .Result) because the system has somehow switched from multi-threaded execution of the tests to single-threaded execution. It's just my guess, but as further evidence in its favour: we use xUnit as the test framework and, locally, I tried to force maxParallelThreads = 1 via configuration (see the sketch below). Doing so, I can reproduce the deadlock problem locally as well.
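For reference, a minimal sketch of one way to force xUnit onto a single worker thread (the "configuration" mentioned above is presumably xunit.runner.json; the assembly-level attribute below is an equivalent knob):

using Xunit;

// Limit xUnit to one parallel worker thread. The same effect can be achieved
// with "maxParallelThreads": 1 in xunit.runner.json. With a single thread,
// the blocking .Result call above has no spare thread on which to run its
// awaited continuation, which is the suspected deadlock.
[assembly: CollectionBehavior(MaxParallelThreads = 1)]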

Now I have fixed it (the right-hand side of the screenshot above) and re-run all the tests: the run finally succeeds.
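The fixed right-hand side of the screenshot isn't reproduced either; presumably it amounts to making the test asynchronous and awaiting instead of blocking, roughly like this (reusing the hypothetical OrderService from the sketch above):

using System.Threading.Tasks;
using Xunit;

public class OrderServiceTestsFixed
{
    [Fact]
    public async Task GetOrder_ReturnsExpectedOrder()
    {
        var service = new OrderService();

        // Await instead of blocking on .Result, so no thread is parked
        // waiting for a continuation that needs that same thread.
        var order = await service.GetOrderAsync(42);

        Assert.Equal("order-42", order);
    }
}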

The strange thing is that, as already said, we haven't changed anything in the last few months.
Something has probably changed in the tools that run the tests. I suspect that, unlike in my local environment and on Azure DevOps until a few weeks ago, the number of parallel threads is now forced to 1.
In that case (and I know this is probably not the place to ask), is there a way to find out what has changed?

Of course, I think we can close this issue, because it is not a problem specific to this project. Anyway, thank you for your time.

@erik-bershel (Contributor)

Hello @gioce90!
Thanks for the detailed description of the situation. Without access to the project's code, it would have been very difficult for me to reach the same conclusions; the problem did not arise on my test builds because the sample projects I used don't contain that kind of problematic code.
Regarding concurrency, I'll try to find out why, or at least where, it was changed and come back with an answer, in case anyone else runs into similar problems.

@erik-bershel (Contributor)

Hi again @gioce90!

After weeks of waiting for a response from the teams involved in developing the task you used, I have to admit that an answer to the question of which specific changes led to this result is not expected in the near future. I have asked my colleagues to try to conclude the investigation, but the outcome is not guaranteed.

If and when such a response is received, I will add it here for reference.
