Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Aeza.net hosting provider #5299

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

cofob
Copy link

@cofob cofob commented May 17, 2024

Proposed Commit Message

Add support for Aeza.net hosting provider

Added a DataSource for the Aeza.net hosting provider, enabling
metadata and user data retrieval and instance provisioning via
cloud-init.

Additional Context

Test urls:

Test Steps

At the moment cloud-init support in our service is in draft state. We have tested the functionality of these changes.

Checklist

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you much @cofob for thinking of adding cloud-init support and driving this pull request to make cloud-init better on your platform. I've added some intiial review comments, suggestions and questions in this first pass. I'd like to see if we can include this support in our next 24.3 release milestone and we'll work with you to get this across the line.

cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
tools/ds-identify Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Show resolved Hide resolved
cloudinit/sources/helpers/aeza.py Outdated Show resolved Hide resolved
cloudinit/sources/helpers/aeza.py Outdated Show resolved Hide resolved
cloudinit/sources/helpers/aeza.py Outdated Show resolved Hide resolved
@blackboxsw
Copy link
Collaborator

Also a general security question here. Given that user-data and vendor-data likely contains potentially sensitive information and Aeza.net's instance metadata endpoint is providing content over http that appears to be accessible to anyone with the URL http://77.221.156.49/v1/cloudinit/3c0fff62-b1db-4165-a2b2-f819c5ddf83f/meta-data and UUIDs can be guessable, what sort of authentication layer do you have planned to avoid potential exposure of sensitive information to unwanted parties?

@cofob
Copy link
Author

cofob commented Jun 19, 2024

Also a general security question here. Given that user-data and vendor-data likely contains potentially sensitive information and Aeza.net's instance metadata endpoint is providing content over http that appears to be accessible to anyone with the URL http://77.221.156.49/v1/cloudinit/3c0fff62-b1db-4165-a2b2-f819c5ddf83f/meta-data and UUIDs can be guessable, what sort of authentication layer do you have planned to avoid potential exposure of sensitive information to unwanted parties?

The UUID is not guessable, as it always uses UUIDv4, in which 122 bits are random and using system-level random source. I believe that this is a sufficient layer of authorization.
And in order to accurately prevent cloud-init metadata information retrieval the service will be limited through rate-limit requests per second.

@cofob
Copy link
Author

cofob commented Jun 27, 2024

@blackboxsw can you please take a look?

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jul 12, 2024
@cofob
Copy link
Author

cofob commented Jul 12, 2024

@TheRealFalcon can you please take a look?

@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Jul 12, 2024
@canonical canonical deleted a comment from github-actions bot Jul 12, 2024
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pings on this Pull request and continued work here. PTO got in the way her . I think there are still a couple of open discussion points we should understand (mandatory metadata/user-data files) possibly the use of util.read_seeded and moving some unnecessary side-effects out of __init__

def _get_data(self):
system_uuid = dmi.read_dmi_data("system-uuid")

md = read_metadata(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way read_metadata is written, 404s in any of these route returns a None value.

I don't think we generally want to allow for absent or optional metadata or user-data endpoints to have a correctly discovered datasource.

For cloud-init to be certain it has contacted a viable instance metadata service that matches the discovered datasource type and not some bogus service that happened to be stood up at this public IP address, each datasource generally requires minimally meta-data and user-data files or components to exist (see DataSourceNoCloud as an example).

The meta-data should really be a yaml dict that minimally contains an unique instance-id which cloud-init caches and acts as an indicator to cloud-init that it has applied the desired configuration from the Aeza.net, if that instance-id changes, cloud-init then knows desired configuration has changed for this instance, and it needs to read and apply new configuration for this node. This includes re-creating SSH host keys, setting up default users or whatever other config operations are declared from meta-data/user-data files.

Other keys in meta-data that are optional:

  • local-hostname: the cloud platform's desired hostname for this node
  • cloud-name: if desired (it will fallback to the DataSourceAeza.dsname.lower())
  • public-keys: List of public keys to import for the default user
  • region: string representing the physical region
  • availability-zone: string representing separate AZ if applicable

Again best practice is to provide minimally instance-id that give the platform a knob to trigger a re-run of cloud-init across system boot if necessary due to mandated config/provisioning changes. The rest are optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meta-data should really be a yaml dict that minimally contains an unique instance-id which cloud-init caches

I think we might be being a tad too prescriptive here. At the end of the day, the metadata is useful so that the cloud can get the desired functionality from cloud-init. The form it takes doesn't really matter as long as the cloud-init datasource can scrape it and set the internal structures correctly. Cloud-init can even work just fine without any metadata, but I agree that in order to get cloud-init to do the right thing on snapshots and new instances, a cloud should really be giving us an instance-id...but at the end of the day I wouldn't block on it.

cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
doc/rtd/reference/datasources/aeza.rst Outdated Show resolved Hide resolved
tools/ds-identify Show resolved Hide resolved
doc/rtd/reference/datasources/aeza.rst Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Show resolved Hide resolved
cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
@blackboxsw
Copy link
Collaborator

Hi @cofob, I don't see that your github username has signed the cloud-init's Contributor License Agreement. So that I can enable CI on this PR and move toward merging this PR, would you mind reviewing and signing our CLA so that we may include this work in upstream cloud-init?

@blackboxsw blackboxsw added the CLA not signed The submitter of the PR has not (yet) signed the CLA label Jul 18, 2024
@cofob
Copy link
Author

cofob commented Jul 25, 2024

Hi @cofob, I don't see that your github username has signed the cloud-init's Contributor License Agreement. So that I can enable CI on this PR and move toward merging this PR, would you mind reviewing and signing our CLA so that we may include this work in upstream cloud-init?

I signed the CLA

@github-actions github-actions bot added the documentation This Pull Request changes documentation label Jul 25, 2024
@cofob cofob requested a review from blackboxsw July 25, 2024 14:32
@cofob
Copy link
Author

cofob commented Jul 25, 2024

@blackboxsw I have made the changes according to your comments.

@blackboxsw blackboxsw added CLA signed The submitter of the PR has signed the CLA and removed CLA not signed The submitter of the PR has not (yet) signed the CLA labels Jul 31, 2024
@cofob
Copy link
Author

cofob commented Aug 14, 2024

@blackboxsw can you review my changes please?

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update looks better thank you @cofob. I have a number of comments inline, one of which would be that we need to rebase this branch against tip of main and adapt the util.read_seeded to either expect an ignored _network_config return value as the 4th item of the 4-tuple, or add a new param to util.read_seeded to ignore_files=['network-config'] to avoid that round trip on your platform.

Below is a diff of all other inline changes/suggestions I've made in review

commit 1e5417d14fc44e868f5e87e01e707976b2c321ac (HEAD -> aeza)
Author: Chad Smith <[email protected]>
Date:   Thu Aug 15 17:45:27 2024 -0600

    review: move side-effects out of __init__ and into _get_data
    
    - factor side-effects out of __init__ and into _get_data
    - raise source.InvalidMetaDataError on unexpected format
    - update docs to mention other datasource config options retries/timeout
    - add unittests for ds_detect
    - use pytest instead of CiTestCase

diff --git a/cloudinit/sources/DataSourceAeza.py b/cloudinit/sources/DataSourceAeza.py
index 62f757095..fd43be43b 100644
--- a/cloudinit/sources/DataSourceAeza.py
+++ b/cloudinit/sources/DataSourceAeza.py
@@ -6,7 +6,7 @@
 
 import logging
 
-from cloudinit import sources, dmi, util
+from cloudinit import dmi, sources, util
 
 LOG = logging.getLogger(__name__)
 
@@ -24,45 +24,35 @@ class DataSourceAeza(sources.DataSource):
 
         self.ds_cfg = util.mergemanydict([self.ds_cfg, BUILTIN_DS_CONFIG])
 
-        url_params = self.get_url_params()
-        self.timeout_seconds = url_params.timeout_seconds
-        self.max_wait_seconds = url_params.max_wait_seconds
-        self.retries = url_params.num_retries
-        self.sec_between_retries = url_params.sec_between_retries
+    @staticmethod
+    def ds_detect():
+        return dmi.read_dmi_data("system-manufacturer") == "Aeza"
 
+    def _get_data(self):
         system_uuid = dmi.read_dmi_data("system-uuid")
-        self.metadata_address = (
+        metadata_address = (
             self.ds_cfg["metadata_url"].format(
                 id=system_uuid,
             )
             + "%s"
         )
-
-    @staticmethod
-    def ds_detect():
-        return dmi.read_dmi_data("system-manufacturer") == "Aeza"
-
-    def _get_data(self):
+        url_params = self.get_url_params()
         md, ud, vd = util.read_seeded(
-            self.metadata_address,
-            timeout=self.timeout_seconds,
-            retries=self.retries,
+            metadata_address,
+            timeout=url_params.timeout_seconds,
+            retries=url_params.num_retries,
         )
 
         if md is None:
-            LOG.warn(
-                "Failed to read metadata from %s",
-                self.metadata_address,
+            raise sources.InvalidMetaDataException(
+                f"Failed to read metadata from {metadata_address}",
             )
-            return False
         if not isinstance(md.get("instance-id"), str):
-            LOG.warn(
-                "Metadata does not contain instance-id",
+            raise sources.InvalidMetaDataException(
+                f"Metadata does not contain instance-id: {md}"
             )
-            return False
         if not isinstance(ud, bytes):
-            LOG.warn("Userdata is not bytes")
-            return False
+            raise sources.InvalidMetaDataException("Userdata is not bytes")
 
         self.metadata, self.userdata_raw, self.vendordata_raw = md, ud, vd
 
diff --git a/doc/rtd/reference/datasources/aeza.rst b/doc/rtd/reference/datasources/aeza.rst
index 767215594..430cf3fd2 100644
--- a/doc/rtd/reference/datasources/aeza.rst
+++ b/doc/rtd/reference/datasources/aeza.rst
@@ -21,6 +21,21 @@ Aeza's datasource can be configured as follows: ::
   Specifies the URL to retrieve the VPS meta-data. (optional)
   Default: ``http://77.221.156.49/v1/cloudinit/{id}/``
 
+* ``timeout``
+
+  The timeout value provided to ``urlopen`` for each individual http request.
+  This is used both when selecting a ``metadata_url`` and when crawling the
+  metadata service.
+
+  Default: 10
+
+* ``retries``
+
+  The number of retries that should be attempted for an http request. This
+  value is used only after ``metadata_url`` is selected.
+
+  Default: 5
+
 .. note::
    ``{id}`` in URLs is system-uuid DMI value.
 
diff --git a/tests/unittests/sources/test_aeza.py b/tests/unittests/sources/test_aeza.py
index 4f9e0363e..47d05675f 100644
--- a/tests/unittests/sources/test_aeza.py
+++ b/tests/unittests/sources/test_aeza.py
@@ -3,10 +3,14 @@
 # Author: Egor Ternovoy <[email protected]>
 #
 # This file is part of cloud-init. See LICENSE file for license information.
+import re
+
+import pytest
 
 from cloudinit import helpers, settings, util
-from cloudinit.sources import DataSourceAeza
-from tests.unittests.helpers import CiTestCase, mock
+from cloudinit.sources import DataSourceAeza as aeza
+from cloudinit.sources import InvalidMetaDataException
+from tests.unittests.helpers import mock
 
 METADATA = util.load_yaml(
     """---
@@ -25,42 +29,117 @@ runcmd:
 """
 
 
-class TestDataSourceAeza(CiTestCase):
-    """
-    Test reading the meta-data
-    """
+M_PATH = "cloudinit.sources.DataSourceAeza."
 
-    def setUp(self):
-        super(TestDataSourceAeza, self).setUp()
-        self.tmp = self.tmp_dir()
 
-    def get_ds(self):
-        distro = mock.MagicMock()
-        distro.get_tmp_exec_path = self.tmp_dir
-        ds = DataSourceAeza.DataSourceAeza(
-            settings.CFG_BUILTIN,
-            distro,
-            helpers.Paths({"run_dir": self.tmp}),
-        )
-        return ds
+class TestDataSourceAeza:
+    """Test Aeza.net reading instance-data"""
+
+    @pytest.mark.parametrize(
+        "system_manufacturer,expected",
+        (
+            pytest.param("Aeza", True, id="dmi_platform_match_aeza"),
+            pytest.param("aeza", False, id="dmi_platform_match_case_sensitve"),
+            pytest.param("Aezanope", False, id="dmi_platform_strict_match"),
+        ),
+    )
+    @mock.patch(f"{M_PATH}dmi.read_dmi_data")
+    def test_ds_detect(self, m_read_dmi_data, system_manufacturer, expected):
+        """Only strict case-senstiive match on DMI system-manfacturer Aeza"""
+        m_read_dmi_data.return_value = system_manufacturer
+        assert expected is aeza.DataSourceAeza.ds_detect()
 
+    @pytest.mark.parametrize(
+        "sys_cfg,expected_calls",
+        (
+            pytest.param(
+                {},
+                [
+                    mock.call(
+                        "http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
+                        timeout=10,
+                        retries=5,
+                    )
+                ],
+                id="default_sysconfig_ds_url_retry_and_timeout",
+            ),
+            pytest.param(
+                {
+                    "datasource": {
+                        "Aeza": {
+                            "timeout": 7,
+                            "retries": 8,
+                            "metadata_url": "https://somethingelse/",
+                        }
+                    }
+                },
+                [
+                    mock.call(
+                        "https://somethingelse/%s",
+                        timeout=7,
+                        retries=8,
+                    )
+                ],
+                id="custom_sysconfig_ds_url_retry_and_timeout_overrides",
+            ),
+        ),
+    )
     @mock.patch("cloudinit.util.read_seeded")
-    @mock.patch("cloudinit.sources.DataSourceAeza.DataSourceAeza.ds_detect")
+    @mock.patch(f"{M_PATH}dmi.read_dmi_data")
     def test_read_data(
         self,
-        m_ds_detect,
+        m_read_dmi_data,
         m_read_seeded,
+        sys_cfg,
+        expected_calls,
+        paths,
+        tmpdir,
     ):
-        m_ds_detect.return_value = True
+        m_read_dmi_data.return_value = "1dd9a779-uuid"
         m_read_seeded.return_value = (METADATA, USERDATA, VENDORDATA)
 
-        with self.allow_subp(True):
-            ds = self.get_ds()
-        ret = ds.get_data()
-        self.assertTrue(ret)
+        ds = aeza.DataSourceAeza(
+            sys_cfg=sys_cfg, distro=mock.Mock(), paths=paths
+        )
+        with mock.patch.object(ds, "ds_detect", return_value=True):
+            assert True is ds.get_data()
+
+        assert m_read_seeded.call_args_list == expected_calls
+        assert ds.get_public_ssh_keys() == METADATA.get("public-keys")
+        assert isinstance(ds.get_public_ssh_keys(), list)
+        assert ds.userdata_raw == USERDATA
+        assert ds.vendordata_raw == VENDORDATA
+
+    @pytest.mark.parametrize(
+        "metadata,userdata,error_msg",
+        (
+            ({}, USERDATA, "Metadata does not contain instance-id: {}"),
+            (
+                None,
+                USERDATA,
+                "Failed to read metadata from http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
+            ),
+            ({"instance-id": "yep"}, "non-bytes", "Userdata is not bytes"),
+        ),
+    )
+    @mock.patch("cloudinit.util.read_seeded")
+    @mock.patch(f"{M_PATH}dmi.read_dmi_data")
+    def test_not_detected_on_invalid_instance_data(
+        self,
+        m_read_dmi_data,
+        m_read_seeded,
+        metadata,
+        userdata,
+        error_msg,
+        paths,
+    ):
+        """Assert get_data returns False for unexpected format conditions."""
+        m_read_dmi_data.return_value = "1dd9a779-uuid"
+        m_read_seeded.return_value = (metadata, userdata, VENDORDATA)
 
-        self.assertTrue(m_read_seeded.called)
-        self.assertEqual(ds.get_public_ssh_keys(), METADATA.get("public-keys"))
-        self.assertIsInstance(ds.get_public_ssh_keys(), list)
-        self.assertEqual(ds.get_userdata_raw(), USERDATA)
-        self.assertEqual(ds.get_vendordata_raw(), VENDORDATA)
+        ds = aeza.DataSourceAeza(sys_cfg={}, distro=mock.Mock(), paths=paths)
+        with pytest.raises(
+            InvalidMetaDataException, match=re.escape(error_msg)
+        ):
+            with mock.patch.object(ds, "ds_detect", return_value=True):
+                ds.get_data()
diff --git a/tests/unittests/sources/test_common.py b/tests/unittests/sources/test_common.py
index 6a7b2d527..1e06b0efc 100644
--- a/tests/unittests/sources/test_common.py
+++ b/tests/unittests/sources/test_common.py
@@ -4,6 +4,7 @@ from unittest.mock import patch
 
 from cloudinit import importer, settings, sources, type_utils
 from cloudinit.sources import DataSource
+from cloudinit.sources import DataSourceAeza as Aeza
 from cloudinit.sources import DataSourceAkamai as Akamai
 from cloudinit.sources import DataSourceAliYun as AliYun
 from cloudinit.sources import DataSourceAltCloud as AltCloud
@@ -34,7 +35,6 @@ from cloudinit.sources import DataSourceUpCloud as UpCloud
 from cloudinit.sources import DataSourceVMware as VMware
 from cloudinit.sources import DataSourceVultr as Vultr
 from cloudinit.sources import DataSourceWSL as WSL
-from cloudinit.sources import DataSourceAeza as Aeza
 from tests.unittests import helpers as test_helpers
 
 DEFAULT_LOCAL = [

I can push this commit to your PR if you wish, and rebase against main if you like to ensure we minimally get the updated util.read_seeded changes, but I wanted to leave the decision to you as to what you would prefer to do? Do you want your platform to ignore network-config by default? If so, I can provide a separate PR if you like against tip of main to allow avoiding network-config requests from read_seeded that we could just rebase into this branch after review or we could incorporate that changeset into this branch.

What would you prefer?

Comment on lines 27 to 31
url_params = self.get_url_params()
self.timeout_seconds = url_params.timeout_seconds
self.max_wait_seconds = url_params.max_wait_seconds
self.retries = url_params.num_retries
self.sec_between_retries = url_params.sec_between_retries
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid side-effects in init and setting instance variables for values that are only used in a single function _get_data. We can move this into _get_data as locals and drop the unused sec_between_retries and max_wait_seconds

"""


class TestDataSourceAeza(CiTestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the unit test start here. Some things we hope to generally consider in unittests:

  • Across the board we are moving away from CITestCase as we are looking to retire that test class. We prefer instead to use more pytest instead of builtin unittest module. It gives us access to fixtures like tmpdir/caplog and parametrizatoin etc. Let's refactor this test a bit.
  • The unittest we have here are mocks ds_detect and doesn't exercise what ds_detect queries. Let's get some test coverage there to assert we are actually looking at the 'right' dmi value.

self.retries = url_params.num_retries
self.sec_between_retries = url_params.sec_between_retries

system_uuid = dmi.read_dmi_data("system-uuid")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect this value to change at all across system reboot? If so, __init__ isn't going to be called across system reboot due to datasource cache in /var/lib/cloud/instance/obj.pkl. We should likely be performing this inside _get_data to ensure it's called each boot.

cloudinit/sources/DataSourceAeza.py Outdated Show resolved Hide resolved
)

@staticmethod
def ds_detect():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs unittest coverage

)

if md is None:
LOG.warn(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where instance data isn't the expected format we generally raise sources.InvalidMetaDataException instead of logging a plain warning level message

return False
if not isinstance(md.get("instance-id"), str):
LOG.warn(
"Metadata does not contain instance-id",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's parametrize log values to tell us what we actually did see, intead of static logs that provide no further information.

Suggested change
"Metadata does not contain instance-id",
"Metadata does not contain instance-id: %s", md

)
return False
if not isinstance(ud, bytes):
LOG.warn("Userdata is not bytes")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't parametrize user-data and add it to logs generally because it's highly likely to contain sensitive information (passwords) which we don't want to persist in cloud-init.log if possible.

return dmi.read_dmi_data("system-manufacturer") == "Aeza"

def _get_data(self):
md, ud, vd = util.read_seeded(
Copy link
Collaborator

@blackboxsw blackboxsw Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A commit landed in tip of main that is incompatible with this changeset as read_seeded now returns a 4-tuple from util.read_seeded which looks for a supplemental/optional network-config file.We need to git rebase on this branch to refresh to latest commit on main and possibly ignore that network-config 4-tuple return value. I haven't rebased your PR against upstream/main and pushed my suggested changes as a commit because we may actually want to think about providing a supplemental parameter to allow read_seeded to ignore GETs of network-config since network-config isdisregared by this platform and we don't want to waste URL retries on an unexpected/irrelevant file.

@blackboxsw
Copy link
Collaborator

@cofob if the proposed changes work for you, I think the final thing we'd need (once we agree on the approach for how to address util.read_seeded) would be to have you test this version of cloud-init in your environment and paste the following:

  • captured output of sudo DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force on a working system to ensure ds-identify is working well.
  • Captured /var/log/cloud-init.log after running sudo cloud-init init --local and sudo cloud-init init
  • Captured output of sudo cloud-init query --all

@blackboxsw
Copy link
Collaborator

Ping @cofob. I think this PR needs a git rebase upstream/main to ensure it has visibility to the incompatible changes in util.read_seeded which surfaces a 4-tuple return value. Let us know how we should proceed with the previous change suggestions.

cloudinit/util.py Outdated Show resolved Hide resolved
cloudinit/util.py Outdated Show resolved Hide resolved
cloudinit/util.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Minor diff remaining to fix mypy, doc-check and ruff and dropped that extra unittest.

And lastly, can we get output of a successful run of the suggested commands against the latest commit to confirm successful datasource discovery?

diff --git a/cloudinit/sources/DataSourceAeza.py b/cloudinit/sources/DataSourceAeza.py
index d55dc16e0..40917efb3 100644
--- a/cloudinit/sources/DataSourceAeza.py
+++ b/cloudinit/sources/DataSourceAeza.py
@@ -52,9 +52,6 @@ class DataSourceAeza(sources.DataSource):
             raise sources.InvalidMetaDataException(
                 f"Metadata does not contain instance-id: {md}"
             )
-        if not isinstance(ud, bytes):
-            raise sources.InvalidMetaDataException("Userdata is not bytes")
-
         self.metadata, self.userdata_raw, self.vendordata_raw = md, ud, vd
 
         return True
diff --git a/doc/rtd/reference/datasources/aeza.rst b/doc/rtd/reference/datasources/aeza.rst
index 430cf3fd2..3b5b452a7 100644
--- a/doc/rtd/reference/datasources/aeza.rst
+++ b/doc/rtd/reference/datasources/aeza.rst
@@ -37,6 +37,6 @@ Aeza's datasource can be configured as follows: ::
   Default: 5
 
 .. note::
-   ``{id}`` in URLs is system-uuid DMI value.
+   ``{id}`` in URLs is ``system-uuid`` DMI value.
 
 .. _Aeza: https://wiki.aeza.net/cloud-init
diff --git a/tests/unittests/sources/test_aeza.py b/tests/unittests/sources/test_aeza.py
index a07cf43c3..0374e168a 100644
--- a/tests/unittests/sources/test_aeza.py
+++ b/tests/unittests/sources/test_aeza.py
@@ -7,7 +7,7 @@ import re
 
 import pytest
 
-from cloudinit import helpers, settings, util
+from cloudinit import util
 from cloudinit.sources import DataSourceAeza as aeza
 from cloudinit.sources import InvalidMetaDataException
 from tests.unittests.helpers import mock
@@ -121,7 +121,6 @@ class TestDataSourceAeza:
                 USERDATA,
                 "Failed to read metadata from http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
             ),
-            ({"instance-id": "yep"}, "non-bytes", "Userdata is not bytes"),
         ),
     )
     @mock.patch("cloudinit.util.read_seeded")

Comment on lines +55 to +56
if not isinstance(ud, bytes):
raise sources.InvalidMetaDataException("Userdata is not bytes")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we really expect bytes here from util.read_seeded for ud. self.userdata_raw certainly expects str. I think we can drop this check here as this is also not needed for DataSourceNoCloudNet when it reaches out to a remote URL for user-data files.

Suggested change
if not isinstance(ud, bytes):
raise sources.InvalidMetaDataException("Userdata is not bytes")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA signed The submitter of the PR has signed the CLA documentation This Pull Request changes documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants