connection refused when running init -upgrade with provider cache #3123

Open
1 task done
norman-zon opened this issue May 7, 2024 · 9 comments

@norman-zon

Describe the bug

Running terragrunt init -upgrade with TERRAGRUNT_PROVIDER_CACHE=1 fails in CI with a connection refused error.

Steps To Reproduce

That's tricky. I could not reproduce in a minimal example. This only seems to happen in our production setup.

export TERRAGRUNT_PROVIDER_CACHE=1

# Print a warning in yellow.
print_warn() {
  echo -e "\033[0;33mWARN: $1\033[0m"
}

# Walk every directory that contains a lock file, skipping Terragrunt's own cache.
# Note: the for loop word-splits find's output, so paths must not contain whitespace.
for DIR in $(find "${{ inputs.dir }}" -name '.terraform.lock.hcl' ! -path '*/.terragrunt-cache/*' -exec dirname {} \; | sort); do
  if grep -q -e "^\s*skip\s*=\s*true$" "$DIR/terragrunt.hcl"; then
    print_warn "Skipping $DIR: 'skip = true' in terragrunt.hcl"
  else
    echo "DIR: $DIR"
    if ! terragrunt init -upgrade --terragrunt-working-dir "$DIR" --terragrunt-log-level debug; then
      print_warn "Failed to init terragrunt in $DIR"
      print_warn "Deleting .terraform.lock.hcl and retrying"
      rm -f "$DIR/.terraform.lock.hcl"
      terragrunt init -upgrade --terragrunt-working-dir "$DIR" --terragrunt-log-level debug
    fi
    terragrunt providers lock -platform=darwin_amd64 -platform=linux_amd64 -platform=darwin_arm64 --terragrunt-working-dir "$DIR" --terragrunt-log-level debug
  fi
done

The error:

│ Error: Failed to resolve provider packages
│ 
│ Could not resolve provider hashicorp/time: could not query provider
│ registry for registry.opentofu.org/hashicorp/time: the request failed after
│ 2 attempts, please try again later: Get
│ "http://127.0.0.1:44237/v1/providers//registry.opentofu.org/hashicorp/time/versions":
│ dial tcp 127.0.0.1:44237: connect: connection refused

Expected behavior

Terragrunt starts the cache server and is able to serve providers from there.

Nice to haves

Versions

  • Terragrunt version: 0.58.3
  • OpenTofu version: 1.7.0
  • Environment details (Ubuntu 20.04, Windows 10, etc.): GitHub Actions runner, based on Ubuntu 22.04

Additional context

I'm sorry I can't provide a reproducible example. Hopefully the logs are helpful.

@norman-zon norman-zon added the bug Something isn't working label May 7, 2024
@norman-zon
Author

I just tried run-all instead:

export TERRAGRUNT_PROVIDER_CACHE=1
export TERRAGRUNT_PARALLELISM=5
terragrunt run-all init -upgrade --terragrunt-working-dir ${{ inputs.dir }}
terragrunt run-all providers lock -platform=darwin_amd64 -platform=linux_amd64 -platform=darwin_arm64 --terragrunt-working-dir ${{ inputs.dir }}

with the same result

@levkohimins levkohimins self-assigned this May 7, 2024
@levkohimins
Contributor

levkohimins commented May 7, 2024

Hi @norman-zon, thanks for the full log, it's really helpful.

We are still working on improving Terragrunt Provider Cache, so critical errors in the logs are not yet obvious enough.
You receive the error connect: connection refused because the cache server was unable to verify the signature of some providers.

2024-05-07T12:48:14.4368305Z time=2024-05-07T12:48:14Z level=error msg=tofu invocation failed in /home/runner/_work/terraform-projects/terraform-projects/projects-team-interactive/base/environments/production/.terragrunt-cache/WDfGuyxxnvdter3UCVGXKlJ7Ylc/OuHMLorwoQH1IrPrCjcWG7kWxUQ prefix=[projects-team-interactive/base/environments/production] 
2024-05-07T12:48:14.4370690Z time=2024-05-07T12:48:14Z level=error msg=1 error occurred:
2024-05-07T12:48:14.4371255Z 	* authentication signature from unknown issuer

This is most likely because terragrunt now uses tofu by default. Unless you explicitly specify a registry host in the provider source, tofu downloads providers from its own registry, where some providers have invalid signatures.

There are two ways to solve the issue:

  1. Use terraform instead of tofu by setting TERRAGRUNT_TFPATH=terraform
  2. Explicitly specify the registry host in the provider source, for example:
terraform {
  required_providers {
    fastly = {
      source  = "registry.terraform.io/fastly/fastly"
    }
  }
}
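
To apply the second workaround, it helps to know which providers are still declared without a registry host. The following is a minimal sketch, not part of Terragrunt: `find_unqualified_sources` is a hypothetical helper that greps `*.tf` files for two-segment `source` values such as `fastly/fastly`. It is a heuristic and assumes each `source` attribute sits on its own line.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a Terragrunt command): list provider sources that
# omit the registry host, e.g. "fastly/fastly".
find_unqualified_sources() {
  local dir="${1:-.}"
  # A two-segment value has no registry host. Local module paths ("./x"),
  # three-segment registry modules and fully qualified sources
  # ("registry.terraform.io/ns/type") do not match this pattern.
  grep -rnE --include='*.tf' \
    --exclude-dir='.terragrunt-cache' --exclude-dir='.terraform' \
    '^[[:space:]]*source[[:space:]]*=[[:space:]]*"[A-Za-z0-9_-]+/[A-Za-z0-9_-]+"' \
    "$dir" || true   # no matches is not an error
}
```

Calling `find_unqualified_sources path/to/project` prints one `file:line:content` entry per match; each reported source can then be prefixed with `registry.terraform.io/`.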

@levkohimins
Contributor

[Screenshot: "1 tmuxinator local 2024-05-07 at 9 30 44 PM"]

@levkohimins
Contributor

levkohimins commented May 7, 2024

@norman-zon, please do not close this issue. I will change the signature verification so that a failure produces only a warning instead of a critical error.

@linear linear bot added the terragrunt label May 7, 2024
@norman-zon
Author

@levkohimins I do use tofu on purpose.
Also I noticed the warnings about the provider signatures. But I wonder why this happens for hashicorp owned providers, like hashicorp/time in my example above:

"http://127.0.0.1:44237/v1/providers//registry.opentofu.org/hashicorp/time/versions":
dial tcp 127.0.0.1:44237: connect: connection refused

Furthermore when trying to recreate the behaviour in a minimal example and only using hashicorp/time, no other providers, no modules, the caching worked:
3_Lock TG providers.txt

@levkohimins
Contributor

> @levkohimins I do use tofu on purpose. Also I noticed the warnings about the provider signatures. But I wonder why this happens for hashicorp owned providers, like hashicorp/time in my example above:
>
> "http://127.0.0.1:44237/v1/providers//registry.opentofu.org/hashicorp/time/versions":
> dial tcp 127.0.0.1:44237: connect: connection refused
>
> Furthermore when trying to recreate the behaviour in a minimal example and only using hashicorp/time, no other providers, no modules, the caching worked: 3_Lock TG providers.txt

I didn't find any errors in your logs. In any case, as I said earlier, either specify the full registry path in the source attribute (registry.terraform.io/*/*) to force tofu to use the providers from the terraform registry, or wait until this issue is fixed.

@norman-zon
Author

norman-zon commented May 8, 2024

I'm a little confused.

The error I am seeing is connection refused for hashicorp/time. I don't understand how this is connected to the missing provider signatures.

Here is an example debug log of a run using hashicorp/time and ns1-terraform/ns1, which is not signed, as the log indicates. This still does not cause a connection refused error, even though the provider cache is enabled.

This leads me to believe the cause for the connection refused is not the missing signature.

EDIT: The attached screenshot (not reproduced here) shows the missing signature reported as a warning, without resulting in an error.

@levkohimins
Contributor

levkohimins commented May 9, 2024

Hope this clears things up for you:

  • The cache server is responsible for verifying the signature. At the moment, if the signature can't be verified, it fails with the error authentication signature from unknown issuer, which is what I'm going to fix.
  • The signature is checked only for providers that have not yet been cached, so the error does not occur if the provider has been cached previously.
  • When the cache server is not available, Terraform returns the error connection refused.
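
The chain described above (signature check fails, cache server stops, client sees connection refused) can be checked against a saved debug log. The following is a rough sketch that assumes both messages appear verbatim in the same log file, as they do in the logs quoted in this thread. `explain_connection_refused` is a hypothetical helper, not a Terragrunt command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report whether a "connection refused" error in a
# Terragrunt debug log is accompanied by the signature verification failure.
explain_connection_refused() {
  local log_file="$1"
  if ! grep -q 'connect: connection refused' "$log_file"; then
    echo "no connection refused error found"
    return 0
  fi
  # The cache server reports the underlying failure with this message.
  if grep -q 'authentication signature from unknown issuer' "$log_file"; then
    echo "cache server failed signature verification"
  else
    echo "connection refused, cause not found in log"
  fi
}
```

For example, `explain_connection_refused terragrunt-debug.log` (the file name is a placeholder) prints one of the three messages. Because the signature is checked only for providers that are not yet cached, a run against a warm cache may show neither message.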

@levkohimins
Contributor

levkohimins commented May 23, 2024

Hi @norman-zon, could you confirm that the issue is resolved as of version v0.58.9?
