[BUG] - Azure deployment fails after upgrade on 2025.2.1rc2 #2964

Closed
viniciusdc opened this issue Feb 18, 2025 · 1 comment
Labels
area: nebari-cli area: user experience 👩🏻‍💻 impact: high 🟥 This issue affects most of the nebari users or is a critical issue provider: Azure type: bug 🐛 Something isn't working


viniciusdc commented Feb 18, 2025

Describe the bug

Due to recent changes in the Azure provider version (#2812), a few inner attributes from the cluster networking configuration were deprecated and removed, while some were entirely replaced.

This does not affect Nebari deployments per se, since the actual apply command calls tofu init under the hood. However, before deploying, to avoid misuse of specific attributes in our config, we run check_immutable_fields:

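def check_immutable_fields(self):
    # Load the previously deployed config from this stage's state, if any.
    nebari_config_state = self.get_nebari_config_state()
    if not nebari_config_state:
        return

def get_nebari_config_state(self) -> dict:
    directory = str(self.output_directory / self.stage_prefix)
    # Relies on `tofu show --json`; this is the call that fails after the upgrade.
    tf_state = opentofu.show(directory)
    nebari_config_state = None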
get_nebari_config_state depends on tofu show --json to load the state data that is used later in the checks. The main problem arises when a provider upgrade changes the resource schema, which is the case for this release. In this situation, the OpenTofu docs suggest running tofu refresh to upgrade the state beforehand, as seen below:

If you've updated providers that contain new schema versions since the state was written, the state needs to be upgraded before it can be displayed with show -json. If you are viewing a plan, it must be created without -refresh=false. If you are viewing a state file, run tofu refresh first.

Based on a quick look at the code, we have two options:

  • Add parsing/override logic to the upgrade command that fixes this release by manually updating the affected fields in the state files;
  • Update the logic around check_immutable_fields so that it refreshes the state before attempting to run tofu show.
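
The first option could look roughly like the sketch below. It works on an already-loaded state dictionary and renames attributes for one resource type. The helper name `rename_state_attributes` and the rename mapping are hypothetical, and the code assumes the usual state layout of `resources[].instances[].attributes`. The replacement name used in the usage example (`https_traffic_only_enabled`) is an assumption that should be checked against the azurerm upgrade guide before relying on it.

```python
import copy


def rename_state_attributes(state: dict, resource_type: str, renames: dict) -> dict:
    """Return a patched copy of `state` with attributes renamed.

    Hypothetical helper: only resources whose "type" equals
    `resource_type` are touched; the input dict is left unmodified.
    """
    patched = copy.deepcopy(state)
    for resource in patched.get("resources", []):
        if resource.get("type") != resource_type:
            continue
        for instance in resource.get("instances", []):
            attributes = instance.get("attributes") or {}
            for old, new in renames.items():
                if old in attributes:
                    # Move the stored value to the new name, but never
                    # overwrite a value already present under that name.
                    value = attributes.pop(old)
                    attributes.setdefault(new, value)
    return patched


# Usage (the mapping is an assumption, not taken from the issue):
# patched = rename_state_attributes(
#     state,
#     "azurerm_storage_account",
#     {"enable_https_traffic_only": "https_traffic_only_enabled"},
# )
```

A patch like this has to be maintained for every provider release that changes a schema, which is the main drawback of this option.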

To me, addressing the root cause of the issue is the better choice in this case, as it would also cover future provider updates without requiring us to maintain these patches in the upgrade command. However, there is a caveat: tofu show does not depend on any input variables, so the current code was not written to supply them, while tofu refresh requires the values of those inputs to be passed down and otherwise fails with a missing-variables error.
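
The refresh-before-show approach, including the caveat about input variables, could be sketched as follows. This is not Nebari's actual `opentofu` wrapper: the function name `show_state_with_refresh`, the `var_file` parameter, and the injectable `run` callable are all hypothetical, and the sketch assumes that a failing `tofu show -json` exits with a non-zero status. Note that `tofu refresh` writes to the state, so it is only attempted when the first `show` fails.

```python
import json
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Default runner: shell out to the tofu binary and capture its output.
    return subprocess.run(cmd, capture_output=True, text=True)


def show_state_with_refresh(
    directory: str,
    var_file: Optional[str] = None,
    run: Runner = _run,
) -> dict:
    """Return the parsed `tofu show -json` output, refreshing once on failure."""
    base = ["tofu", f"-chdir={directory}"]
    result = run([*base, "show", "-json"])
    if result.returncode != 0:
        # Assumption: the failure is caused by an outdated state schema.
        refresh = [*base, "refresh", "-input=false"]
        if var_file is not None:
            # refresh needs the stage's input variables; use an absolute
            # path, since -chdir changes how relative paths are resolved.
            refresh.append(f"-var-file={var_file}")
        refreshed = run(refresh)
        if refreshed.returncode != 0:
            raise RuntimeError(f"tofu refresh failed: {refreshed.stderr.strip()}")
        result = run([*base, "show", "-json"])
        if result.returncode != 0:
            raise RuntimeError(f"tofu show failed: {result.stderr.strip()}")
    return json.loads(result.stdout)
```

Passing the runner as a parameter keeps the retry logic testable without a real OpenTofu installation.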

Expected behavior

nebari deploy completes successfully after the upgrade.

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Create an Azure deployment using the latest release (2024.12.1). Then, after running nebari upgrade with the latest RC, run nebari deploy.

Command output

[tofu]: 
[tofu]: Initializing the backend...
[tofu]: Upgrading modules...
[tofu]: - terraform-state in modules/terraform-state
[tofu]: 
[tofu]: Initializing provider plugins...
[tofu]: - terraform.io/builtin/terraform is built in to OpenTofu
[tofu]: - Finding hashicorp/azurerm versions matching "4.7.0"...
[tofu]: - Installing hashicorp/azurerm v4.7.0...
[tofu]: - Installed hashicorp/azurerm v4.7.0 (signed, key ID 0C0AF313E5FD9F80)
[tofu]: 
[tofu]: Providers are signed by their developers.
[tofu]: If you'd like to know more about provider signing, you can read about it here:
[tofu]: https://opentofu.org/docs/cli/plugins/signing/
[tofu]: 
[tofu]: OpenTofu has made some changes to the provider dependency selections recorded
[tofu]: in the .terraform.lock.hcl file. Review those changes and commit them to your
[tofu]: version control system if they represent changes you intended to make.
[tofu]: 
[tofu]: OpenTofu has been successfully initialized!
[tofu]: 
[tofu]: You may now begin working with OpenTofu. Try running "tofu plan" to see
[tofu]: any changes that are required for your infrastructure. All OpenTofu commands
[tofu]: should now work.
[tofu]: 
[tofu]: If you ever set or change modules or backend configuration for OpenTofu,
[tofu]: rerun this command to reinitialize your working directory. If you forget, other
[tofu]: commands will detect it and remind you to do so if necessary.
[tofu]: Failed to marshal state to json: unsupported attribute "enable_https_traffic_only"
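
The last line of this output is the only sign of the schema mismatch. As a minimal sketch, a caller could detect it and pull out the offending attribute name like this; the function name `find_unsupported_attribute` is hypothetical, and the pattern is based only on the single error line shown above, so other wordings of the error would not match.

```python
import re
from typing import Optional

# Pattern taken from the error line in the output above.
_UNSUPPORTED_ATTR = re.compile(
    r'Failed to marshal state to json: unsupported attribute "([^"]+)"'
)


def find_unsupported_attribute(output: str) -> Optional[str]:
    """Return the attribute named in the marshal error, or None if absent."""
    match = _UNSUPPORTED_ATTR.search(output)
    return match.group(1) if match else None


line = '[tofu]: Failed to marshal state to json: unsupported attribute "enable_https_traffic_only"'
print(find_unsupported_attribute(line))  # enable_https_traffic_only
```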

Versions and dependencies used.

No response

Compute environment

Azure

Integrations

No response

Anything else?

No response

@viniciusdc viniciusdc added needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working labels Feb 18, 2025
@viniciusdc viniciusdc self-assigned this Feb 18, 2025
@viniciusdc viniciusdc added area: user experience 👩🏻‍💻 provider: Azure impact: high 🟥 This issue affects most of the nebari users or is a critical issue area: nebari-cli and removed needs: triage 🚦 Someone needs to have a look at this issue and triage labels Feb 18, 2025
viniciusdc commented

addressed by #2965

@github-project-automation github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management Feb 20, 2025