Test task on Windows image 20230517.1 fails #7683

Closed

gioce90 opened this issue Jun 8, 2023 · 5 comments

gioce90 commented Jun 8, 2023

Description

Hi, I hope this is the right place to ask for help.

In Azure DevOps, I have this pipeline definition:

pool:
  vmImage: 'windows-2022'

#... stuff...

steps:

#... stuff...

- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '**/*.csproj'
    arguments: '--no-restore --configuration $(buildConfiguration) --collect "Code Coverage"' 

This pipeline has worked well for years; the tests usually complete in less than 2-3 minutes.
In the last 2-3 days, my company has noticed that the pipeline randomly fails because the 'dotnet test' task runs for longer than 60 minutes.
The strange thing is that for some runs on the same branch and the same commit this doesn't happen and the run succeeds.
After a short analysis, I noticed that the only difference is the agent machine configuration:

Runs with this configuration always succeed (the 'dotnet test' task is as short as usual):
[screenshot: agent configuration showing image version 20230508.3]

Runs with this configuration always fail (the 'dotnet test' task takes too long):
[screenshot: agent configuration showing image version 20230517.1]

So my opinion is that version 20230508.3 and older were fine, and that something is wrong with the new 20230517.1 version.

Here, in this project, I found the differences between these two versions; maybe it is something related to the new version of MSBuild, I don't know. Can you help me with that, maybe by prioritizing the release of a new image version?

Platforms affected

  • [x] Azure DevOps
  • [ ] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [ ] Ubuntu 20.04
  • [ ] Ubuntu 22.04
  • [ ] macOS 11
  • [ ] macOS 12
  • [ ] macOS 13
  • [ ] Windows Server 2019
  • [x] Windows Server 2022

Image version and build link

Image: windows-2022
Version: 20230517.1
Included Software: https://github.com/actions/runner-images/blob/win22/20230517.1/images/win/Windows2022-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20230517.1

Is it a regression?

20230508.3

Expected behavior

All of the pipeline's runs should succeed, because nothing has changed in the code or anywhere else (same branch, same commit).

Actual behavior

Runs of the pipeline based on the same branch and the same commit sometimes fail. The only difference is the Windows image version.

Repro steps

I just run the pipeline on the same code (same branch and commit).
When the agent machine mounts image version 20230508.3 (or older) it works, but when it mounts the new image version 20230517.1 it always fails because the tests run indefinitely.

@erik-bershel (Contributor)

@gioce90 hey there!
It might be the currently known bug in the Visual Studio Test Platform:
microsoft/vstest#4516

Can you provide some additional information for the investigation, such as repro steps and logs? Or you may try the proposed workaround. We'll wait for your response.

gioce90 (Author) commented Jun 12, 2023

Hi @erik-bershel. I tried to replace my DotNetCoreCLI@2 task with the two tasks suggested in the link you posted (VisualStudioTestPlatformInstaller@1 and VSTest@2).
Sadly, there is no difference: the VSTest task also runs indefinitely.

[screenshot: the VSTest task running indefinitely]

Moreover, the VSTest task raises some new kinds of errors, as you can see:

[screenshot: errors raised by the VSTest task]

(To be honest, I would rather keep using the DotNetCoreCLI@2 task instead of switching to VSTest.)

I don't know which repro steps you need; as already said, I just run an old commit on a branch that has always worked, but when the Azure DevOps pipeline agent uses image version 20230517.1 (or newer) the tests just run indefinitely.

If you need logs, I can send them privately.

gioce90 (Author) commented Jun 13, 2023

Upon further investigation, I was able to isolate the problematic portion of the code.
I removed all the test assemblies and patiently re-launched the Azure Pipeline multiple times (keep in mind that locally, in Visual Studio, the problem does not occur), adding each test assembly back one by one, and I was finally able to isolate the test assembly that runs forever.

Once I figured out which assembly it was, I spotted the problem in the code (a "sync over async" antipattern):

[screenshot: the offending test code blocking on .Result (left) and the fixed version (right)]
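The code in the screenshot isn't reproduced here; purely as an illustration, a minimal sketch of the kind of "sync over async" pattern being described (the service, method, and test names are invented):

using System.Threading.Tasks;
using Xunit;

// Hypothetical service standing in for the real code shown in the screenshot.
public class OrderService
{
    public async Task<string> GetOrderAsync(int id)
    {
        await Task.Delay(100);           // simulates asynchronous I/O
        return $"order-{id}";
    }
}

public class OrderServiceTests
{
    [Fact]
    public void GetOrder_ReturnsExpectedOrder()
    {
        var service = new OrderService();

        // "Sync over async": a synchronous test blocking on .Result.
        // If the awaited continuation needs the very thread that is blocked
        // here, the test deadlocks and the run hangs indefinitely.
        var order = service.GetOrderAsync(42).Result;

        Assert.Equal("order-42", order);
    }
}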

This is of course bad code, but it has been there since 2020 and so far it has never caused any problems.
I suspect that in the last few days we have been hitting deadlocks (on .Result) because the system has somehow switched from multi-threaded execution of the tests to single-threaded execution. It's just my guess, but as further evidence in its favour: we use xUnit as the test framework and, locally, I tried to force maxParallelThreads = 1 via configuration (see the sketch below). Doing so, I can reproduce the deadlock problem locally as well.
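For reference, a minimal sketch of one way to force xUnit onto a single worker thread (the "configuration" mentioned above is presumably xunit.runner.json; the assembly-level attribute below is an equivalent knob):

using Xunit;

// Limit xUnit to one parallel worker thread. The same effect can be achieved
// with "maxParallelThreads": 1 in xunit.runner.json. With a single thread,
// the blocking .Result call above has no spare thread on which to run its
// awaited continuation, which is the suspected deadlock.
[assembly: CollectionBehavior(MaxParallelThreads = 1)]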

Now I have fixed it (the right-hand side of the screenshot above) and re-run all the tests: the run finally succeeds.
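The fixed right-hand side of the screenshot isn't reproduced either; presumably it amounts to making the test asynchronous and awaiting instead of blocking, roughly like this (reusing the hypothetical OrderService from the sketch above):

using System.Threading.Tasks;
using Xunit;

public class OrderServiceTestsFixed
{
    [Fact]
    public async Task GetOrder_ReturnsExpectedOrder()
    {
        var service = new OrderService();

        // Await instead of blocking on .Result, so no thread is parked
        // waiting for a continuation that needs that same thread.
        var order = await service.GetOrderAsync(42);

        Assert.Equal("order-42", order);
    }
}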

The strange thing is that, as already said, we haven't changed anything in the last few months.
Something has probably changed in the tools that run the tests. I suspect that, unlike in my local environment and on Azure DevOps until a few weeks ago, the number of parallel threads is now forced to 1.
In that case (and I know this is probably not the place to ask), is there a way to find out what has changed?

Of course, I think we can close this issue, because it is not a problem specific to this project. Anyway, thank you for your time.

@erik-bershel (Contributor)

Hello @gioce90!
Thanks for the detailed description of the situation. Without access to the project's code, it would have been very difficult for me to reach the same conclusions; the problem did not arise on my test builds because the sample projects I used don't contain that kind of problematic code.
Regarding concurrency, I'll try to find out why, or at least where, it was changed and come back with an answer, in case anyone else runs into similar problems.

@erik-bershel (Contributor)

Hi again @gioce90!

After weeks of waiting for a response from the teams involved in developing the task you used, I have to admit that an answer to the question of which specific changes led to this result is not expected in the near future. I have asked my colleagues to try to conclude the investigation, but the outcome is not guaranteed.

If and when such a response is received, I will add it here for reference.
