Error joining Windows 10 and EFR to domain #579 is now an issue again #801

Open
jt0dd opened this issue Apr 19, 2022 · 17 comments · Fixed by xx4h/DetectionLab#8 · May be fixed by #896

Comments

@jt0dd

jt0dd commented Apr 19, 2022

#579

This issue was solved and closed, but a separate commenter and I seem to be experiencing it again.

[20:40] Current domain is set to 'workgroup'. Time to join the domain!
[20:40] My hostname is WIN10
[20:40] Joining the domain...
[20:40] First, set DNS to DC to join the domain...

__GENUS          : 2
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        :
__PROPERTY_COUNT : 1
__DERIVATION     : {}
__SERVER         :
__NAMESPACE      :
__PATH           :
ReturnValue      : 0
PSComputerName   :


__GENUS          : 2
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        :
__PROPERTY_COUNT : 1
__DERIVATION     : {}
__SERVER         :
__NAMESPACE      :
__PATH           :
ReturnValue      : 0
PSComputerName   :

[20:40] Now join the domain...
[20:40] Adding Win10 to the domain. Sometimes this step times out. If that happens, just run 'vagrant reload win10 --provision'
[20:40] Disabling Windows Updates and Windows Module Services




Stderr from the command:

powershell.exe : Add-Computer : Computer 'win10' failed to join domain 'windomain.local' from its current
    + CategoryInfo          : NotSpecified: (Add-Computer : ...om its current :String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
workgroup 'WORKGROUP' with following error message: The specified domain either does not exist or
could not be contacted.
At C:\vagrant\scripts\join-domain.ps1:42 char:3
+   Add-Computer -DomainName "windomain.local" -credential $DomainCred  ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (win10:String) [Add-Computer], InvalidOperationEx
   ception
    + FullyQualifiedErrorId : FailToJoinDomainFromWorkgroup,Microsoft.PowerShell.Commands.AddComp
   uterCommand

I have tried:

vagrant reload win10 --provision

@clong
Owner

clong commented Apr 20, 2022

I'm able to reproduce this about 1 in 20 times, but I have never been able to determine a root cause :(

@clong
Owner

clong commented Apr 29, 2022

@jt0dd Can you provide OS/Provider/etc

@faisal6me

The root cause could be a duplicated SID on win10. To solve this, generate a new SID on win10: go to C:\Windows\System32\Sysprep, run sysprep.exe, make sure Out-of-Box Experience (OOBE) is selected in the System Cleanup Action menu and Generalize is checked, then click OK. Then try rejoining the domain.

@clong
Owner

clong commented May 23, 2022

@faisal6me That shouldn't be the issue as I believe the domain join would fail with an error regarding duplicate SIDs. The Win10 image is sysprepped, plus a reboot solves this problem almost 100% of the time and a reboot doesn't have any effect on the SID.

@clong
Owner

clong commented May 23, 2022

@jt0dd Can you provide OS/Provider/etc

@jt0dd
Author

jt0dd commented May 23, 2022 via email

@clong
Owner

clong commented May 31, 2022

If anyone can reliably reproduce this, I'd love to test some potential fixes

@OpalSec

OpalSec commented Aug 18, 2022

I've been trying over the last few hours to get this to work, and it seems like I'm running into the same error. A full reboot of the physical host, a destroy and rebuild with Vagrant, and the recommended reload with --provision haven't helped.

Happy to be your guinea pig if there are specific scenarios you want me to test.

Running this on a Server 2019 host with Vagrant 2.3.0 and the latest commit from the master branch.

[04:19] My hostname is WIN10
[04:19] Joining the domain...
[04:19] First, set DNS to DC to join the domain...

__GENUS          : 2
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        :
__PROPERTY_COUNT : 1
__DERIVATION     : {}
__SERVER         :
__NAMESPACE      :
__PATH           :
ReturnValue      : 0
PSComputerName   :


__GENUS          : 2
__CLASS          : __PARAMETERS
__SUPERCLASS     :
__DYNASTY        : __PARAMETERS
__RELPATH        :
__PROPERTY_COUNT : 1
__DERIVATION     : {}
__SERVER         :
__NAMESPACE      :
__PATH           :
ReturnValue      : 0
PSComputerName   :

[04:19] Now join the domain...
[04:19] Adding Win10 to the domain. Sometimes this step times out. If that happens, just run 'vagrant reload win10 --provision'
[04:19] Disabling Windows Updates and Windows Module Services




Stderr from the command:

powershell.exe : Add-Computer : Computer 'win10' failed to join domain 'windomain.local' from its current
    + CategoryInfo          : NotSpecified: (Add-Computer : ...om its current :String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
workgroup 'WORKGROUP' with following error message: The specified domain either does not exist or
could not be contacted.
At C:\vagrant\scripts\join-domain.ps1:42 char:3
+   Add-Computer -DomainName "windomain.local" -credential $DomainCred  ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (win10:String) [Add-Computer], InvalidOperationEx
   ception
    + FullyQualifiedErrorId : FailToJoinDomainFromWorkgroup,Microsoft.PowerShell.Commands.AddComp
   uterCommand

@OpalSec

OpalSec commented Aug 19, 2022

Sorry, but I'm going to have to rescind that offer: in one last act of desperation I completely removed the cloned repo, re-cloned it, and rebuilt everything using the Vagrant scripts, which eventually worked.

I did come across a few unrelated issues, though, which I managed to work around. Initially the build failed because Defender flagged the join-domain PowerShell script as malicious; adding an exclusion for the scripts folder didn't help, but disabling Defender altogether did. The second error came when the vagrant-shell script tried to add Defender exclusions, but a simple reload with the --provision flag sorted that out.

Defender blocking the join-domain script

[06:24] My hostname is WIN10




Stderr from the command:

powershell.exe : At C:\vagrant\scripts\join-domain.ps1:1 char:1
    + CategoryInfo          : NotSpecified: (At C:\vagrant\s...in.ps1:1 char:1:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
+ # Purpose: Joins a Windows host to the windomain.local domain which w ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This script contains malicious content and has been blocked by your antivirus software.
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : ScriptContainedMaliciousContent

Errors adding Defender Exclusions

[00:51] Red Team tooling installation complete!




Stderr from the command:

powershell.exe : Set-MpPreference : Operation failed with the following error: 0x800106ba. Operation:
    + CategoryInfo          : NotSpecified: (Set-MpPreferenc...ba. Operation: :String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
MpPreference. Target: ConfigListExtension.
At C:\tmp\vagrant-shell.ps1:14 char:3
+   Set-MpPreference -ExclusionPath "C:\Tools"
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_MpPreference:root\Microsoft\...FT_MpPreference)
    [Set-MpPreference], CimException
    + FullyQualifiedErrorId : HRESULT 0x800106ba,Set-MpPreference

Set-MpPreference : Operation failed with the following error: 0x%1!x!
At C:\tmp\vagrant-shell.ps1:14 char:3
+   Set-MpPreference -ExclusionPath "C:\Tools"
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_MpPreference:root\Microsoft\...FT_MpPreference)
    [Set-MpPreference], CimException
    + FullyQualifiedErrorId : HRESULT 0x800106ba,Set-MpPreference

Add-MpPreference : Operation failed with the following error: 0x800106ba. Operation:
MpPreference. Target: ConfigListExtension.
At C:\tmp\vagrant-shell.ps1:15 char:3
+   Add-MpPreference -ExclusionPath "C:\Users\vagrant\AppData\Local\Tem ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_MpPreference:root\Microsoft\...FT_MpPreference)
    [Add-MpPreference], CimException
    + FullyQualifiedErrorId : HRESULT 0x800106ba,Add-MpPreference

Add-MpPreference : Operation failed with the following error: 0x%1!x!
At C:\tmp\vagrant-shell.ps1:15 char:3
+   Add-MpPreference -ExclusionPath "C:\Users\vagrant\AppData\Local\Tem ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_MpPreference:root\Microsoft\...FT_MpPreference)
    [Add-MpPreference], CimException
    + FullyQualifiedErrorId : HRESULT 0x800106ba,Add-MpPreference
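For anyone hitting the same Defender block, the "disable Defender altogether" workaround could be scripted roughly as follows. This is a minimal sketch, not part of DetectionLab: the function name `Disable-LabDefenderRealtime` is hypothetical, it assumes an elevated prompt on a throwaway lab VM, and Tamper Protection (if enabled) may block the change, which is why the function reads the setting back instead of assuming it worked.

```powershell
# Hypothetical helper, not part of DetectionLab. Intended for an elevated
# prompt on a disposable lab VM only.
function Disable-LabDefenderRealtime {
    [CmdletBinding()]
    param()

    # The Defender cmdlets only exist where the Defender module is installed.
    if (-not (Get-Command Set-MpPreference -ErrorAction SilentlyContinue)) {
        Write-Warning "Defender cmdlets are not available on this host; nothing changed."
        return $false
    }

    # Turn off Defender real-time protection.
    Set-MpPreference -DisableRealtimeMonitoring $true

    # Read the setting back: Tamper Protection or policy may have blocked the change.
    return [bool](Get-MpPreference).DisableRealtimeMonitoring
}
```

Once provisioning has finished, `Set-MpPreference -DisableRealtimeMonitoring $false` turns real-time protection back on.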

@jt0dd
Author

jt0dd commented Sep 25, 2022

For me personally, this experience made me lose every shred of confidence in Vagrant as a tool (it just seems entirely unreliable; a colleague and I went through probably five issues like this one trying to get this one project up and running), and I switched to Terraform. You should not have to go through this much "uncertain" trouble. Debugging is fine when building a complex infrastructure set, but bugs that people can't reproduce, and that happen randomly to some people and not others for no clear reason, are a serious problem in my view, and a dealbreaker.

(OpenStack + Terraform lets you do a lot of what Vagrant does, and yes, locally without needing to pay for cloud resources)

@jt0dd
Author

jt0dd commented Sep 25, 2022

I just realized, far too late, that you were waiting on me for details here; I abandoned Vagrant after opening this issue.

Device name: DESKTOP-4KS9BL7
Processor: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz 3.60 GHz
Installed RAM: 32.0 GB
System type: 64-bit operating system, x64-based processor

@juho-nurmi

While troubleshooting installation problems on a colleague's VirtualBox setup, I came across this same issue. Restarting or provisioning again didn't fix it, but I did get it fixed in a way that hasn't been mentioned yet.
Unfortunately, I did the troubleshooting remotely and, due to certain circumstances, I don't have logs to provide.

  • Using Wireshark, I found that the DNS query for dc.windomain.local was sent over the second (NATted) interface when join-domain.ps1 was run.
  • Both network interfaces had the same interface metric value, which led me to think, at the time, that win10 randomly chooses which interface to use.
  • I "solved" this by setting the interface metric (to a value of 15) in PowerShell, then manually running join-domain.ps1, and finally provisioning again. (Note: before setting the metric, I ran join-domain.ps1 manually and it failed with the same error message.)
    It might have been enough to set the metric and then provision, but I wasn't sure whether the new metric would persist across a reboot.
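The "same interface metric" observation can be checked on an affected VM. The sketch below is an illustration under stated assumptions, not code from this thread: the hypothetical `Get-LowestMetricInterfaces` function returns every interface that shares the lowest `InterfaceMetric`, and more than one result means the metric gives Windows no preference between them. It takes plain objects so the logic can be tried anywhere; on the VM you would feed it `Get-NetIPInterface` output as shown in the comment.

```powershell
# Hypothetical diagnostic helper: returns every interface that shares the lowest
# InterfaceMetric. More than one result means the metric alone gives Windows
# no preference between those interfaces.
function Get-LowestMetricInterfaces {
    param(
        [Parameter(Mandatory)]
        [object[]]$Interfaces
    )
    $lowest = ($Interfaces | Measure-Object -Property InterfaceMetric -Minimum).Minimum
    $Interfaces | Where-Object { $_.InterfaceMetric -eq $lowest }
}

# On the win10 VM (Windows only), inspect the real IPv4 interfaces:
#   $ifs = Get-NetIPInterface -AddressFamily IPv4 -ConnectionState Connected
#   Get-LowestMetricInterfaces -Interfaces $ifs | Format-Table ifIndex, InterfaceAlias, InterfaceMetric
```

If both the NAT and the host-only adapter come back, the tie described above is present on that VM.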

After the troubleshooting session, and while writing this, I tried to replicate the issue on my own setup, but I wasn't able to, and I'm starting to wonder whether I was just lucky.

My idea for replicating it was to remove win10 from the domain, set the static interface's metric to a higher value, and join the host to the domain again. I thought this would make the DNS query fail during the domain join and produce the same error message. However, that didn't work: there was a more specific route in the routing table, which probably took precedence over the default gateway / preferred interface.
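The "more specific route took precedence" observation follows from how Windows picks a route: the longest matching prefix wins, and only among equally specific routes does the lower combined metric (route metric plus interface metric) decide. That would explain why raising the static interface's metric did not break traffic to 192.168.56.102: the on-link /24 route is more specific than the default route. The following is a simplified model of that rule, written as a hypothetical `Select-Route` function over IPv4 route objects; it ignores next-hop, policy store, and other real-world details. On an actual VM, `Find-NetRoute -RemoteIPAddress 192.168.56.102` reports the route Windows really selects.

```powershell
# Simplified, hypothetical model of Windows IPv4 route selection:
# 1) keep routes whose DestinationPrefix contains the destination,
# 2) prefer the longest prefix,
# 3) break ties with the lowest RouteMetric + InterfaceMetric.

function ConvertTo-IPv4Number {
    param([Parameter(Mandatory)][string]$Address)
    $b = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    # Build the 32-bit value with doubles to avoid signed-integer overflow.
    [double]$b[0] * 16777216 + [double]$b[1] * 65536 + [double]$b[2] * 256 + [double]$b[3]
}

function Test-PrefixMatch {
    param([string]$Destination, [string]$Prefix)
    $network, $length = $Prefix.Split('/')
    # Number of addresses covered by the prefix, e.g. 256 for a /24.
    $blockSize = [math]::Pow(2, 32 - [int]$length)
    $destBlock = [math]::Floor((ConvertTo-IPv4Number $Destination) / $blockSize)
    $netBlock  = [math]::Floor((ConvertTo-IPv4Number $network) / $blockSize)
    $destBlock -eq $netBlock
}

function Select-Route {
    param(
        [Parameter(Mandatory)][string]$Destination,
        [Parameter(Mandatory)][object[]]$Routes
    )
    $Routes |
        Where-Object { Test-PrefixMatch -Destination $Destination -Prefix $_.DestinationPrefix } |
        Sort-Object -Property @{ Expression = { [int]$_.DestinationPrefix.Split('/')[1] }; Descending = $true },
                              @{ Expression = { $_.RouteMetric + $_.InterfaceMetric }; Ascending = $true } |
        Select-Object -First 1
}
```

With a default route on the NAT adapter and a 192.168.56.0/24 route on the host-only adapter, `Select-Route -Destination 192.168.56.102` picks the host-only route no matter how the interface metrics compare.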

Digging further into how Windows performs DNS queries, it also seems to be a much more complicated process than I thought (2).

In the end, I suspect the issue is related to which interface and DNS server are used to send the DNS query for dc.windomain.local during the domain join.

My suggested fix is to lower the interface metric (i.e., increase the preference) of the static interface and add a static entry for dc.windomain.local to the hosts file.
I would think that with these two changes you can "force" the correct interface to be used.

https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metric (1)
https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd197552(v=ws.10) (2)
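The two suggested changes might look roughly like this when scripted. It is a hedged sketch: the helper name `Add-HostsEntry` is hypothetical, 192.168.56.102 is the DC/DNS address used elsewhere in this thread, and the metric value 15 in the commented Windows-only lines is simply the value mentioned above (it only helps if it is lower than the NAT interface's metric). One caveat: a hosts entry only answers plain name lookups; a domain join also locates the DC through DNS SRV records, which the hosts file cannot supply, so the metric change is probably the part that matters.

```powershell
# Hypothetical helper: returns the hosts-file lines with "<IP><TAB><name>"
# appended, unless an uncommented line already maps that IP to that name.
function Add-HostsEntry {
    param(
        [string[]]$Lines = @(),
        [Parameter(Mandatory)][string]$IPAddress,
        [Parameter(Mandatory)][string]$HostName
    )
    if ($null -eq $Lines) { $Lines = @() }   # e.g. Get-Content on an empty file
    foreach ($line in $Lines) {
        # Drop comments, then split "IP name [aliases...]" on whitespace.
        $fields = @(($line -replace '#.*$', '').Trim() -split '\s+')
        if ($fields.Count -ge 2 -and $fields[0] -eq $IPAddress -and
            $fields[1..($fields.Count - 1)] -contains $HostName) {
            return $Lines   # already present: leave the file unchanged
        }
    }
    @($Lines) + "$IPAddress`t$HostName"
}

# Windows-only usage on the win10 VM, from an elevated prompt:
#   $idx = (Get-NetIPAddress -AddressFamily IPv4 |
#           Where-Object { $_.IPAddress -like '192.168.56.*' } |
#           Select-Object -First 1).InterfaceIndex
#   Set-NetIPInterface -InterfaceIndex $idx -AddressFamily IPv4 -InterfaceMetric 15
#   $hostsPath = "$env:SystemRoot\System32\drivers\etc\hosts"
#   Set-Content -Path $hostsPath -Value (Add-HostsEntry -Lines (Get-Content $hostsPath) -IPAddress '192.168.56.102' -HostName 'dc.windomain.local')
```

Because the helper is idempotent, it can safely run on every provisioning pass.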

@clong
Owner

clong commented Oct 1, 2022

@juho-nurmi wow, thank you so much for this information! That actually makes a LOT of sense and seems like the likely culprit here. I'll open a PR with this change implemented and we'll see how it goes!

@clong
Owner

clong commented Oct 1, 2022

f919a02

@wugglesworth

(Quoting @juho-nurmi's comment above.)

I can verify that after changing the metric value as you suggested, all I had to do was re-run the provisioner, without any other changes, and it worked.

@tsunamifi

I'm having this issue on logger and wef too, not just win10. I tried changing the metrics, with no luck. Any ideas?

@duytrinh-boop

@clong Hi!
I think I've been able to reliably reproduce the error.
I'm using Vagrant, with VirtualBox as the provider.

How to reproduce the error: add another win10 instance to the Vagrantfile and give it a new definition name, a new IP address, and a new hostname.
To do this, copy the config.vm.define "win10" do |cfg| block, then:

  1. Give it a new definition name, e.g. config.vm.define "win10-ws1" do |cfg|
  2. Give it a new IP, e.g. cfg.vm.network :private_network, ip: "192.168.56.114", gateway: "192.168.56.1", dns: "192.168.56.102". Remember to update the IP in the cfg.vm.provision "shell", path: "scripts/fix-second-network.ps1" line as well.
  3. Give it a new hostname, e.g. vb.name = "win10-ws1.windomain.local"
  4. Update line 40 of Vagrant\scripts\join-domain.ps1 to match the new hostname as well: ElseIf ($hostname -like "*win10*")

Proposed fix, based on wugglesworth's suggestions.

# Explanation of the script:
# 1. Get a list of all IPv4 addresses on the VM
# 2. Find the interface with an IP in the correct subnet, i.e. the same range as the DNS server (192.168.56.102)
# 3. Change the InterfaceMetric of this interface so it has higher priority (a lower metric) than all the others

# Step 1
$ips = Get-NetIPAddress -AddressFamily IPv4

# Step 2
# Take the first address in the 192.168.56.0/24 range and remember its interface index
$domainAddress = $ips | Where-Object { $_.IPAddress -like "192.168.56.*" } | Select-Object -First 1
if (-not $domainAddress) {
    throw "No IPv4 address in 192.168.56.0/24 found"
}
$correctInterfaceIndex = $domainAddress.InterfaceIndex

# Step 3
# Get the current InterfaceMetric of the interface with index $correctInterfaceIndex
$current_interface_metric = (Get-NetIPInterface -InterfaceIndex $correctInterfaceIndex -AddressFamily IPv4).InterfaceMetric
$current_interface_metric

# Lowest metric used by any other IPv4 interface. "Current metric minus 1" only wins
# when the metrics were tied, so go one below the lowest of the others instead.
$lowest_other_metric = (Get-NetIPInterface -AddressFamily IPv4 |
    Where-Object { $_.InterfaceIndex -ne $correctInterfaceIndex } |
    Measure-Object -Property InterfaceMetric -Minimum).Minimum
$new_metric = [math]::Max(1, [int]$lowest_other_metric - 1)

# Set the InterfaceMetric with higher priority and print the result
Set-NetIPInterface -InterfaceIndex $correctInterfaceIndex -AddressFamily IPv4 -InterfaceMetric $new_metric
(Get-NetIPInterface -InterfaceIndex $correctInterfaceIndex -AddressFamily IPv4).InterfaceMetric

This ps1 script works, but I haven't been able to reliably integrate it with the provisioning phase. Maybe just call the script in Vagrant\scripts\join-domain.ps1, line 40:

} ElseIf ($hostname -like "*win10*") {
  Write-Host "$('[{0:HH:mm}]' -f (Get-Date)) Adding Win10 to the domain."
  ### Debugging the Win10 domain join issue https://github.com/clong/DetectionLab/issues/801
  $tries = 0
  While ($tries -lt 3) {
    Try {
      $tries += 1
      & c:\vagrant\scripts\fix-failed-to-join-domain_interface-issue.ps1
      Write-Host "$('[{0:HH:mm}]' -f (Get-Date)) Try # $tries"
      Add-Computer -DomainName "windomain.local" -credential $DomainCred -OUPath "ou=Workstations,dc=windomain,dc=local"
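The join-domain.ps1 snippet above is an excerpt that stops inside the Try block. As a hedged sketch of where it seems to be heading, the retry-and-fix logic could be wrapped in a small reusable function. `Invoke-WithRetry`, its parameters, and the 30-second default delay are assumptions, not DetectionLab code; `-ErrorAction Stop` is added in the usage comment so that a failed Add-Computer is guaranteed to reach the catch block.

```powershell
# Hypothetical helper, not DetectionLab code: runs $Action up to $MaxTries
# times, sleeping between failed attempts, and rethrows the last error if
# every attempt fails.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxTries = 3,
        [int]$DelaySeconds = 30
    )
    for ($try = 1; $try -le $MaxTries; $try++) {
        try {
            Write-Host ("[{0:HH:mm}] Try # {1}" -f (Get-Date), $try)
            return & $Action
        }
        catch {
            if ($try -ge $MaxTries) { throw }   # out of attempts: surface the real error
            Write-Host ("[{0:HH:mm}] Attempt {1} failed: {2}" -f (Get-Date), $try, $_.Exception.Message)
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Windows-only usage inside join-domain.ps1 ($DomainCred and the fix script come from the excerpt above):
#   Invoke-WithRetry -Action {
#       & c:\vagrant\scripts\fix-failed-to-join-domain_interface-issue.ps1
#       Add-Computer -DomainName "windomain.local" -Credential $DomainCred `
#           -OUPath "ou=Workstations,dc=windomain,dc=local" -ErrorAction Stop
#   }
```

Rethrowing on the final attempt keeps the original Add-Computer error visible in the Vagrant output instead of hiding it behind a generic failure.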

security-companion added a commit to security-companion/DetectionLab that referenced this issue Dec 20, 2022
Based on clong#801 (comment)
so credit goes to him
xx4h added a commit to xx4h/DetectionLab that referenced this issue Mar 27, 2023
Set low interface metric for domain interface.

fixes clong#801
@xx4h xx4h linked a pull request Mar 27, 2023 that will close this issue
8 participants