Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ansible winrm connection hangs when connection is bad #83081

Open
1 task done
ChristianGeiger3 opened this issue Apr 18, 2024 · 5 comments
Open
1 task done

ansible winrm connection hangs when connection is bad #83081

ChristianGeiger3 opened this issue Apr 18, 2024 · 5 comments
Assignees
Labels
affects_2.16 bug This issue/PR relates to a bug.

Comments

@ChristianGeiger3
Copy link

ChristianGeiger3 commented Apr 18, 2024

Summary

Problem:

We discovered that Ansible somehow hangs when the internet connection is bad and no timeout exception was thrown.
Ansible will hang on some random tasks and not finish or fail.

see log: after the Warning '[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover' the process does not continue

Expectation:

If a task does not finish, a timeout error should be thrown and ansible should exit.

Test scenario

I set up an environment where I simulate a bad internet connection and check what ansible does.

  • Ubuntu 22.04
  • latest ansible
  • Bandwidth limiting Traffic Toll (https://github.com/cryzed/TrafficToll)
    • download: 1kbps
    • upload: 1kbps
    • download-minimum: 1kbps
    • upload-minimum: 1kbps
      sudo tt eth1 throttle.yaml --delay 0.5
  • some playbook (see below)

similar issues

#72991
#47255
#79016
ansible-collections/ansible.windows#516

possible solution

I am not familiar with WinRM, but I have created a solution that seems to work.
devel...ChristianGeiger3:ansible:bugfix/winrm_hang

If this could be a solution, can I create a PR?

Issue Type

Bug Report

Component Name

winrm.py

Ansible Version

ansible [core 2.16.6]
  config file = /home/vagrant/ansible.cfg
  configured module search path = ['/home/vagrant/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/vagrant/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/vagrant/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/vagrant/.local/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

Configuration

CALLBACKS_ENABLED(/home/vagrant/ansible.cfg) = ['xxxx']
CONFIG_FILE() = /home/vagrant/ansible.cfg

OS / Environment

host: ubuntu 22.04
target: win10

Steps to Reproduce

---
- name: Install
  hosts: all

  vars:
    template_file_path: "{{ run_directory_path }}/project/main.yaml"
  vars_files:
    - "{{ template_file_path }}"

  tasks:
    # These variables comes from the installer
    - name: Check variables from the installer
      ansible.builtin.validate_argument_spec:
        argument_spec: '{{required_data}}'
      vars:
        required_data:
          # from download_component role
          local_tmp_root_path:
            type: "str"
            required: true
            description: "The temporary root path. It is the path where the components temporary stored on the control node."

          target_path:
            type: "str"
            required: true
            description: "The root path on the target, where components will be stored."


          source_type:
            type: "str"
            required: true
            description: "The source type, where the content packages are stored."
            choices: ['upack']

          source_arg_0:
            type: "str"
            required: true
            description: "The first argument to be passed to the get_data plugin. This depends on the type of the get_data plugin selected by 'source_type' Upack-> ProGet api endpoint."
          
          source_arg_1:
            type: "str"
            required: true
            description: "The second argument to be passed to the get_data plugin. This depends on the type of the get_data plugin selected by 'source_type' Upack -> cache folder."
          
          # from install_component role
          run_directory_path:
            type: "str"
            required: true
            description: "The path where ansible runner runs."

          component_bundle_path:
            type: "str"
            required: true
            description: "The path to the component bundle. It is relative to the template."

          # from ipc_password role
          serial_number_machine:
            type: "str"
            required: true
            description: "The serial number of the machine. It is part of the id to save the password."
          
          password_service_url:
            type: "str"
            required: true
            description: "The url of the password service."

          password_service_username:
            type: "str"
            required: true
            description: "The user name of the password service."

          password_service_password:
            type: "str"
            required: true
            description: "The password of the password service."

    # Check if template file exists and is not empty
    - name: Check if template file exists
      ansible.builtin.stat:
        path: "{{ template_file_path }}"
      register: template_file_stat_results
      delegate_to: localhost

Expected Results

When an Task will not finish, an Timeout error should be thrown and ansible should finish

Actual Results

ansible-playbook [core 2.16.6]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/vagrant/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/vagrant/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/vagrant/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/vagrant/.local/bin/ansible-playbook
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True
Using /etc/ansible/ansible.cfg as config file
Loading collection edna.installer from /home/vagrant/.ansible/collections/ansible_collections/edna/installer
setting up inventory plugins
Loading collection ansible.builtin from 
host_list declined parsing /home/vagrant/Documents/edna-installer-template-development/runner/inventory/hosts as it did not pass its verify_file() method
script declined parsing /home/vagrant/Documents/edna-installer-template-development/runner/inventory/hosts as it did not pass its verify_file() method
auto declined parsing /home/vagrant/Documents/edna-installer-template-development/runner/inventory/hosts as it did not pass its verify_file() method
Parsed /home/vagrant/Documents/edna-installer-template-development/runner/inventory/hosts inventory source with yaml plugin
running playbook inside collection edna.installer
Loading collection ansible.windows from /home/vagrant/.ansible/collections/ansible_collections/ansible/windows
Loading callback plugin default of type stdout, v2.0 from /home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/callback/default.py
Loading callback plugin awx_display of type stdout, v2.0 from /usr/local/lib/python3.10/dist-packages/ansible_runner/display_callback/callback/awx_display.py
Attempting to use 'awx_display' callback.
Skipping callback 'awx_display', as we already have a stdout callback.
Attempting to use 'default' callback.
Skipping callback 'default', as we already have a stdout callback.
Attempting to use 'junit' callback.
Attempting to use 'minimal' callback.
Skipping callback 'minimal', as we already have a stdout callback.
Attempting to use 'oneline' callback.
Skipping callback 'oneline', as we already have a stdout callback.
Attempting to use 'tree' callback.

PLAYBOOK: start_installation.yaml **********************************************
Positional arguments: edna.installer.start_installation
verbosity: 6
connection: ssh
become_method: sudo
tags: ('all',)
inventory: ('/home/vagrant/Documents/yyyyyy/runner/inventory',)
extra_vars: ('@/home/vagrant/Documents/yyyyy/runner/env/extravars',)
forks: 5
1 plays in /home/vagrant/.ansible/collections/ansible_collections/edna/installer/playbooks/start_installation.yaml
Read vars_file '{{ template_file_path }}'
Read vars_file '{{ template_file_path }}'
Read vars_file '{{ template_file_path }}'

PLAY [Install] *****************************************************************
Read vars_file '{{ template_file_path }}'

TASK [Gathering Facts] *********************************************************
task path: /home/vagrant/.ansible/collections/ansible_collections/edna/installer/playbooks/start_installation.yaml:2
redirecting (type: modules) ansible.builtin.setup to ansible.windows.setup
Using module file /home/vagrant/.ansible/collections/ansible_collections/ansible/windows/plugins/modules/setup.ps1
Pipelining is enabled.
<192.168.1.12> ESTABLISH WINRM CONNECTION FOR USER: vagrant on PORT 5986 TO 192.168.1.12
<192.168.1.12> WINRM CONNECT: transport=ssl endpoint=https://192.168.1.12:5986/wsman
<192.168.1.12> WINRM OPEN SHELL: 3B1566BE-788A-4B13-8538-DFB14D2AE53A
EXEC (via pipeline wrapper)
<192.168.1.12> WINRM EXEC 'PowerShell' ['-NoProfile', '-NonInteractive', '-ExecutionPolicy', 'Unrestricted', '-EncodedCommand', 'xxxxxxx'
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover:
ConnectionError ('Connection aborted.', TimeoutError('The write operation timed
out'))
.......

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibot ansibot added bug This issue/PR relates to a bug. needs_triage Needs a first human triage before being processed. affects_2.16 labels Apr 18, 2024
@ansibot
Copy link
Contributor

ansibot commented Apr 18, 2024

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the component bot command.

@ChristianGeiger3
Copy link
Author

I added somedebug code to the winrm_get_command_output function.
It seems that the first call returns
stdout : b''
stderr : b'#< CLIXML\r\n'

and all following calls will have an 'WinRMOperationTimeoutError'.
So in this szenario the 'try_once' will not work

@bcoca bcoca removed the needs_triage Needs a first human triage before being processed. label Apr 18, 2024
@jborean93
Copy link
Contributor

Thanks for the info, I think I can understand what is happening here but trying to figure out the best way forward here will be hard. Unfortunately while your change might help in your case it is placed there to help in other scenarios I've come across. I've got a few ideas but need some time to figure out the best way forward.

When you applied the change to comment out try_once = False do you find that the task actually succeeds and the module output is returned or it just doesn't hang anymore and actually returns an error?

You could also try out the psrp connection plugin which is a lot more stable when it comes to writing the module payload. You might find it is better for your particular scenario.

@ChristianGeiger3
Copy link
Author

ChristianGeiger3 commented Apr 19, 2024

Hallo, when i remove the try_once=false the module/task fails

Here is the log

TASK [Gathering Facts] *********************************************************
task path: /home/vagrant/.ansible/collections/ansible_collections/edna/installer/playbooks/start_installation.yaml:2
redirecting (type: modules) ansible.builtin.setup to ansible.windows.setup
Using module file /home/vagrant/.ansible/collections/ansible_collections/ansible/windows/plugins/modules/setup.ps1
Pipelining is enabled.
<192.168.1.12> ESTABLISH WINRM CONNECTION FOR USER: vagrant on PORT 5986 TO 192.168.1.12
EXEC (via pipeline wrapper)
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover:
ConnectionError ('Connection aborted.', TimeoutError('The write operation timed
out'))

The full traceback is:
Traceback (most recent call last):
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/executor/task_executor.py", line 165, in run
    res = self._execute()
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/executor/task_executor.py", line 637, in _execute
    result = self._handler.run(task_vars=vars_copy)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/action/gather_facts.py", line 109, in run
    res = self._execute_module(module_name=fact_module, module_args=mod_args, task_vars=task_vars, wrap_async=False)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/action/__init__.py", line 1150, in _execute_module
    res = self._low_level_execute_command(cmd, sudoable=sudoable, in_data=in_data)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/action/__init__.py", line 1315, in _low_level_execute_command
    rc, stdout, stderr = self._connection.exec_command(cmd, in_data=in_data, sudoable=sudoable)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/connection/winrm.py", line 760, in exec_command
    return self._winrm_exec(cmd_parts[0], cmd_parts[1:], from_exec=True, stdin_iterator=stdin_iterator)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/connection/winrm.py", line 687, in _winrm_exec
    b_stderr = _parse_clixml(b_stderr)
  File "/home/vagrant/.local/lib/python3.10/site-packages/ansible/plugins/shell/powershell.py", line 50, in _parse_clixml
    clixml = ET.fromstring(current_element)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 1343, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
fatal: [192.168.1.12]: FAILED! => {
    "msg": "Unexpected failure during module execution: no element found: line 1, column 0",
    "stdout": ""
}

PLAY RECAP *********************************************************************
192.168.1.12               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

@jborean93
Copy link
Contributor

Thanks for sharing, I'll try to come up with a good solution that helps in your case as well as the existing one the code is trying to support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.16 bug This issue/PR relates to a bug.
Projects
None yet
Development

No branches or pull requests

4 participants