diff --git a/docs/404.html b/docs/404.html
index 511515dec80..a853ff097c1 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -2,5 +2,5 @@

Page Not Found

We dug around, but couldn't find the page that you were looking for.

You could go back to our home page or use the search bar to find what you were looking for.

\ No newline at end of file
diff --git a/docs/_print/adopter/index.html b/docs/_print/adopter/index.html
index 41d864bb2e5..84d3394bca7 100644
--- a/docs/_print/adopter/index.html
+++ b/docs/_print/adopter/index.html
@@ -2,5 +2,5 @@

See who is using Gardener

Gardener adopters in production environments that have publicly shared details of their usage.

SAP: SAP uses Gardener to deploy and manage Kubernetes clusters at scale in a uniform way across infrastructures (AWS, Azure, GCP, Alicloud, as well as generic interfaces to OpenStack and vSphere). Workloads include Databases (SAP HANA Cloud), Big Data (SAP Data Intelligence), Kyma, many other cloud native applications, and diverse business workloads.
OVHcloud: Gardener can now be run by customers on the Public Cloud Platform of the leading European Cloud Provider OVHcloud.
ScaleUp Technologies: ScaleUp Technologies runs Gardener within their public OpenStack Clouds (Hamburg, Berlin, Düsseldorf). Their clients run all kinds of workloads on top of Gardener-maintained Kubernetes clusters, ranging from databases to Software-as-a-Service applications.
Finanz Informatik Technologie Services GmbH: Finanz Informatik Technologie Services GmbH uses Gardener to offer k8s as a service for customers in the financial industry in Germany. It is built on top of a “metal as a service” infrastructure implemented from scratch with k8s workloads in mind. The result is k8s on top of bare metal in minutes.
PingCAP: PingCAP TiDB is a cloud-native distributed SQL database with MySQL compatibility and one of the most popular open-source database projects, with 23.5K+ stars and 400+ contributors. Its sister project TiKV is a Cloud Native Interactive Landscape project. PingCAP envisioned their managed TiDB service, known as TiDB Cloud, to be multi-tenant, secure, cost-efficient, and compatible with different cloud providers, and they chose Gardener.
Beezlabs: Beezlabs uses Gardener to deliver its Intelligent Process Automation platform on multiple cloud providers and to reduce costs and lock-in risks.
b’nerd: b’nerd uses Gardener as the core technology for its own managed Kubernetes as a Service solution and operates multiple Gardener installations for several cloud hosting service providers.
STACKIT: STACKIT is a digital brand of Europe’s biggest retailer, the Schwarz Group, which includes Lidl, Kaufland, but also production and recycling companies. It uses Gardener to offer public and private Kubernetes as a service in its own data centers in Europe and aims to become the cloud provider for German and European small and mid-sized companies.
T-Systems: Supporting and managing multiple application landscapes on-premises and across different hyperscaler infrastructures can be painful. At T-Systems we use Gardener both for internal usage and to manage clusters for our customers. We love the openness of the project, the flexibility and the architecture that allows us to manage clusters around the world with only one team from one single pane of glass and to meet industry-specific certification standards. Sovereignty by design is another great value that the technology implicitly brings along.
23 Technologies: The German-based company 23 Technologies uses Gardener to offer an enterprise-class Kubernetes engine for industrial use cases as well as cloud service providers and offers managed and professional services for it. 23T is also the team behind okeanos.dev, a public service that can be used by anyone to try out Gardener.
B1 Systems GmbH: B1 Systems GmbH is an international provider of Linux & Open Source consulting, training, managed service & support. We were founded in 2004 and are based in Germany. Our team of 140 Linux experts offers tailor-made solutions based on cloud & container technologies, virtualization & high availability as well as monitoring, system & configuration management. B1 uses Gardener internally and also sets up solutions/environments for customers.
finleap connect GmbH: finleap connect GmbH is the leading independent Open Banking platform provider in Europe. It enables companies across a multitude of industries to provide the next generation of financial services by understanding how customers transact and interact. With its “full-stack” platform of solutions, finleap connect makes it possible for its clients to compliantly access the financial transactions data of customers, enrich said data with analytics tools, provide digital banking services and deliver high-quality, digital financial services products and services to customers. Gardener uniquely enables us to deploy our platform in Europe and across the globe in a uniform way on the providers preferred by our customers.
Codesphere: Codesphere is a Cloud IDE with integrated and automated deployment of web apps. It uses Gardener internally to manage clusters that host customer deployments and internal systems all over the world.
plusserver: plusserver combines its own cloud offerings with hyperscaler platforms to provide individually tailored multi-cloud solutions. The plusserver Kubernetes Engine (PSKE) based on Gardener reduces the complexity in managing multi-cloud environments and enables companies to orchestrate their containers and cloud-native applications across a variety of platforms such as plusserver’s pluscloud open or hyperscalers such as AWS, either by mouse click or via an API. With PSKE, companies remain vendor-independent and profit from guaranteed data sovereignty and data security due to GDPR-compliant cloud platforms in the certified plusserver data centers in Germany.
Fuga Cloud: Fuga Cloud uses Gardener as the basis for its Enterprise Managed Kubernetes (EMK), a platform that simplifies the management of your k8s and provides insight into usage and performance. The other Fuga Cloud services can be added with a mouse click, and the choice of another cloud provider is a negotiable option. Fuga Cloud stands for Digital Sovereignty, Data Portability and GDPR compatibility.
Metalstack Cloud: metalstack.cloud uses Gardener and is based on the open-source software metal-stack.io, which is developed for regulated financial institutions. The focus here is on the highest possible security and compliance conformity. This makes metalstack.cloud perfect for running enterprise-grade container applications and provides your workloads with the highest possible performance.
Cleura: Cleura uses Gardener to power its Container Orchestration Engine for Cleura Public Cloud and Cleura Compliant Cloud. Cleura Container Orchestration Engine simplifies the creation and management of Kubernetes clusters through their user-friendly Cleura Cloud Management Panel or API, allowing users to focus on deploying applications instead of maintaining the underlying infrastructure.
PITS Globale Datenrettungsdienste: PITS Globale Datenrettungsdienste is a data recovery company located in Germany specializing in recovering lost or damaged files from hard drives, solid-state drives, flash drives, and other storage media. Gardener is used to handle highly-loaded internal infrastructure and provide reliable, fully-managed K8s cluster solutions.

If you’re using Gardener and you aren’t on this list, submit a pull request!

\ No newline at end of file
diff --git a/docs/_print/community/index.html b/docs/_print/community/index.html
index 1ef1ee1241e..a3c9dd63105 100644
--- a/docs/_print/community/index.html
+++ b/docs/_print/community/index.html
@@ -14,7 +14,7 @@
 Gardener Google Group The recordings are published on the Gardener Project YouTube channel. Topic Speaker Date and Time Link Get more computing power in Gardener by overcoming Kubelet limitations with CRI-resource-manager Pawel Palucki, Alexander D. Kanevskiy October 20, 2022 Recording Summary Cilium / Isovalent Presentation Raymond de Jong October 6, 2022 Recording Summary Gardener Extension Development - From scratch to the gardener-extension-shoot-flux Jens Schneider, Lothar Gesslein June 9, 2022 Recording Summary Deploying and Developing Gardener Locally (Without Any External Infrastructure!) Tim Ebert, Rafael Franzke March 17, 2022 Recording Summary Gardenctl-v2 Holger Kosser, Lukas Gross, Peter Sutter February 17, 2022 Recording Summary Google Calendar">

Gardener Community

Follow - Engage - Contribute

Community Calls

Join our community calls to connect with other Gardener enthusiasts and watch cool presentations.

What content can you expect?

  • Gardener core developers roll out new information, share knowledge with the members and demonstrate new service capabilities.
  • Adopters and contributors share their use cases and experience and exchange ideas on future requirements.

If you want to receive updates, sign up here:

Topic | Speaker | Date and Time | Link
Get more computing power in Gardener by overcoming Kubelet limitations with CRI-resource-manager | Pawel Palucki, Alexander D. Kanevskiy | October 20, 2022 | Recording, Summary
Cilium / Isovalent Presentation | Raymond de Jong | October 6, 2022 | Recording, Summary
Gardener Extension Development - From scratch to the gardener-extension-shoot-flux | Jens Schneider, Lothar Gesslein | June 9, 2022 | Recording, Summary
Deploying and Developing Gardener Locally (Without Any External Infrastructure!) | Tim Ebert, Rafael Franzke | March 17, 2022 | Recording, Summary
Gardenctl-v2 | Holger Kosser, Lukas Gross, Peter Sutter | February 17, 2022 | Recording, Summary

Google Calendar

Presenting a Topic

If there is a topic you would like to present, message us in our #gardener Slack channel or get in touch with Jessica Katz.

Get in Touch

@GardenerProject Follow the latest project updates on Twitter
GitHub
diff --git a/docs/_print/contribute/docs/index.html b/docs/_print/contribute/docs/index.html
index 71a2c2c49a0..b4223dd4af2 100644
--- a/docs/_print/contribute/docs/index.html
+++ b/docs/_print/contribute/docs/index.html
@@ -10,7 +10,7 @@
 Contributions must be licensed under the Creative Commons Attribution 4.0 International License You need to sign the Contributor License Agreement. We are using CLA assistant providing a click-through workflow for accepting the CLA. For company contributors additionally the company needs to sign a corporate license agreement. See the following sections for details.">

This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Contributing Documentation

You are welcome to contribute documentation to Gardener.

The following rules govern documentation contributions:

  • Contributions must be licensed under the Creative Commons Attribution 4.0 International License
  • You need to sign the Contributor License Agreement. We use CLA assistant, which provides a click-through workflow for accepting the CLA. For company contributors, the company additionally needs to sign a corporate license agreement. See the following sections for details.

1 - Working with Images

Images on the website must contribute to the aesthetics and comprehensibility of the materials without compromising the experience of loading and browsing pages. That means crisp, clear images, a consistent layout and color scheme, sensible dimensions and aspect ratios, and fast, flicker-free loading (or at least the feeling of it), even on unreliable mobile networks and devices.

Image Production Guidelines

A good, detailed reference for optimal use of images for the web can be found at web.dev’s Fast Load Times topic. The following summarizes some key points plus suggestions for tool support.

You are strongly encouraged to use vector images (SVG) as much as possible. They scale seamlessly without compromising the quality and are easier to maintain.

If you are just starting with SVG authoring, here are some tool suggestions: Figma (online/Win/Mac), Sketch (Mac only).

For raster images (JPG, PNG, GIF), consider the following requirements and choose a tool that enables you to conform to them:

  • Be mindful about image size, the total page size and loading times.
  • Larger images (>10K) need to support progressive rendering. Consult your authoring tool’s documentation to find out if and how it supports that.
  • The site delivers the optimal media content format and size depending on the device screen size. You need to provide several variants (large screen, laptop, tablet, phone). Your authoring tool should be able to resize and resample images. Always save the largest size first and then downscale from it to avoid image quality loss.
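
The sizing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the device classes come from the list above, but the pixel widths in `VARIANT_WIDTHS`, the reading of ">10K" as 10 KiB, and the helper names `variant_sizes` and `needs_progressive` are hypothetical and not part of the site's tooling. The sketch computes every variant from the original (largest) dimensions, preserves the aspect ratio, and never upscales.

```python
from typing import Dict, Tuple

# Hypothetical target widths per device class. The guidelines name the
# classes but not the pixel values, so these numbers are assumptions.
VARIANT_WIDTHS: Dict[str, int] = {
    "large": 1600,
    "laptop": 1200,
    "tablet": 768,
    "phone": 480,
}

# Assumption: ">10K" is read as "more than 10 KiB".
PROGRESSIVE_THRESHOLD_BYTES = 10 * 1024


def variant_sizes(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return (width, height) per variant, each derived from the original size."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    sizes: Dict[str, Tuple[int, int]] = {}
    for name, target in VARIANT_WIDTHS.items():
        w = min(target, width)  # downscale only, never upscale
        h = max(1, round(height * w / width))  # keep the aspect ratio
        sizes[name] = (w, h)
    return sizes


def needs_progressive(size_bytes: int) -> bool:
    """True if a file of this size should support progressive rendering."""
    return size_bytes > PROGRESSIVE_THRESHOLD_BYTES
```

For example, `variant_sizes(3200, 1800)` yields a 1200x675 laptop variant and a 480x270 phone variant, both computed from the 3200x1800 original rather than from an already downscaled copy.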

If you are looking for a tool that conforms to those guidelines, IrfanView is a very good option.

Screenshots can be taken with whatever tool you have available. A simple Alt+PrtSc (Win), pasted into an image processing tool and saved, does the job. If you need to add emphasized steps (1, 2, 3) when you describe a process on a screenshot, you can use Snagit. Use red color and numbers. Mind the requirements for raster images laid out above.

Diagrams can be exported as PNG/JPG from a diagramming tool such as Visio or even PowerPoint. Pick whichever you are comfortable with to design the diagram and make sure you comply with the requirements for raster image production above. Diagrams produced as SVG are welcome too if your authoring tool supports exporting in that format. In any case, ensure that your diagrams “blend” with the content on the site: use the same color scheme and geometry style, and keep diagrams simple. The site also supports Mermaid diagrams produced with markdown and rendered as SVG. You don’t need special tools for them, but for more complex ones you might want to prototype your diagram with Mermaid’s online live editor before encoding it in your markdown. More tips on using Mermaid can be found in the Shortcodes documentation.

Using Images in Markdown

The standard for adding images to a topic is to use markdown’s ![caption](image-path). If the image is not showing properly, or if you wish to serve images close to their natural size and avoid scaling, then you can use HTML5’s <picture> tag.

Example:

<picture>
     <!-- default, laptop-width-L max 1200px -->
     <source srcset="https://github.tools.sap/kubernetes/documentation/tree/master/website/documentation/015-tutorials/my-guide/images/overview-XL.png"
diff --git a/docs/_print/docs/contribute/code/index.html b/docs/_print/docs/contribute/code/index.html
index cd29efea5e3..eefb39b5a48 100644
--- a/docs/_print/docs/contribute/code/index.html
+++ b/docs/_print/docs/contribute/code/index.html
@@ -10,7 +10,7 @@
 Contributions must be licensed under the Apache 2.0 License You need to sign the Contributor License Agreement. We are using CLA assistant providing a click-through workflow for accepting the CLA. For company contributors additionally the company needs to sign a corporate license agreement. See the following sections for details.">

This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Contributing Code

You are welcome to contribute code to Gardener in order to fix a bug or to implement a new feature.

The following rules govern code contributions:

  • Contributions must be licensed under the Apache 2.0 License
  • You need to sign the Contributor License Agreement. We use CLA assistant, which provides a click-through workflow for accepting the CLA. For company contributors, the company additionally needs to sign a corporate license agreement. See the following sections for details.

1 - Contributing Bigger Changes

Contributing Bigger Changes

Here are the guidelines you should follow when contributing larger changes to Gardener:

  • We strongly recommend writing a Gardener Enhancement Proposal (GEP) to establish a common understanding of what you want to achieve. This makes it easier for reviewers to understand the big picture.

  • Avoid proposing a big change in one single PR. Instead, split your work into multiple stages which are independently mergeable and create one PR for each stage. For example, if introducing a new API resource and its controller, these stages could be:

    • API resource types, including defaults and generated code.
    • API resource validation.
    • API server storage.
    • Admission plugin(s), if any.
    • Controller(s), including changes to existing controllers. Split this phase further into different functional subsets if appropriate.
  • If you realize later that changes to artifacts introduced in a previous stage are required, by all means make them and explain in the PR why they were needed.

  • Consider splitting a big PR further into multiple commits to allow for more focused reviews. For example, you could add unit tests / documentation in separate commits from the rest of the code. If you have to adapt your PR to review feedback, prefer doing that also in a separate commit to make it easier for reviewers to check how their feedback has been addressed.

  • To make the review process more efficient and avoid too many long discussions in the PR itself, ask for a “main reviewer” to be assigned to your change, then work with this person to make sure he or she understands it in detail, and agree together on any improvements that may be needed. If you can’t reach an agreement on certain topics, comment on the PR and invite other people to join the discussion.

  • Even if you have a “main reviewer” assigned, you may still get feedback from other reviewers. In general, these “non-main reviewers” are advised to focus more on the design and overall approach rather than the implementation details. Make sure that you address any concerns on this level appropriately.

2 - CI/CD

CI/CD

As an execution environment for CI/CD workloads, we use Concourse. We however abstract from the underlying “build executor” and instead offer a Pipeline Definition Contract, through which components declare their build pipelines as
diff --git a/docs/adopter/index.html b/docs/adopter/index.html
index 4e908c38060..1a8601f0c03 100644
--- a/docs/adopter/index.html
+++ b/docs/adopter/index.html
@@ -2,5 +2,5 @@

See who is using Gardener

Gardener adopters in production environments that have publicly shared details of their usage.

SAP: SAP uses Gardener to deploy and manage Kubernetes clusters at scale in a uniform way across infrastructures (AWS, Azure, GCP, Alicloud, as well as generic interfaces to OpenStack and vSphere). Workloads include Databases (SAP HANA Cloud), Big Data (SAP Data Intelligence), Kyma, many other cloud native applications, and diverse business workloads.
OVHcloud: Gardener can now be run by customers on the Public Cloud Platform of the leading European Cloud Provider OVHcloud.
ScaleUp Technologies: ScaleUp Technologies runs Gardener within their public OpenStack Clouds (Hamburg, Berlin, Düsseldorf). Their clients run all kinds of workloads on top of Gardener-maintained Kubernetes clusters, ranging from databases to Software-as-a-Service applications.
Finanz Informatik Technologie Services GmbH: Finanz Informatik Technologie Services GmbH uses Gardener to offer k8s as a service for customers in the financial industry in Germany. It is built on top of a “metal as a service” infrastructure implemented from scratch with k8s workloads in mind. The result is k8s on top of bare metal in minutes.
PingCAP: PingCAP TiDB is a cloud-native distributed SQL database with MySQL compatibility and one of the most popular open-source database projects, with 23.5K+ stars and 400+ contributors. Its sister project TiKV is a Cloud Native Interactive Landscape project. PingCAP envisioned their managed TiDB service, known as TiDB Cloud, to be multi-tenant, secure, cost-efficient, and compatible with different cloud providers, and they chose Gardener.
Beezlabs: Beezlabs uses Gardener to deliver its Intelligent Process Automation platform on multiple cloud providers and to reduce costs and lock-in risks.
b’nerd: b’nerd uses Gardener as the core technology for its own managed Kubernetes as a Service solution and operates multiple Gardener installations for several cloud hosting service providers.
STACKIT: STACKIT is a digital brand of Europe’s biggest retailer, the Schwarz Group, which includes Lidl, Kaufland, but also production and recycling companies. It uses Gardener to offer public and private Kubernetes as a service in its own data centers in Europe and aims to become the cloud provider for German and European small and mid-sized companies.
T-Systems: Supporting and managing multiple application landscapes on-premises and across different hyperscaler infrastructures can be painful. At T-Systems we use Gardener both for internal usage and to manage clusters for our customers. We love the openness of the project, the flexibility and the architecture that allows us to manage clusters around the world with only one team from one single pane of glass and to meet industry-specific certification standards. Sovereignty by design is another great value that the technology implicitly brings along.
23 Technologies: The German-based company 23 Technologies uses Gardener to offer an enterprise-class Kubernetes engine for industrial use cases as well as cloud service providers and offers managed and professional services for it. 23T is also the team behind okeanos.dev, a public service that can be used by anyone to try out Gardener.
B1 Systems GmbH: B1 Systems GmbH is an international provider of Linux & Open Source consulting, training, managed service & support. We were founded in 2004 and are based in Germany. Our team of 140 Linux experts offers tailor-made solutions based on cloud & container technologies, virtualization & high availability as well as monitoring, system & configuration management. B1 uses Gardener internally and also sets up solutions/environments for customers.
finleap connect GmbH: finleap connect GmbH is the leading independent Open Banking platform provider in Europe. It enables companies across a multitude of industries to provide the next generation of financial services by understanding how customers transact and interact. With its “full-stack” platform of solutions, finleap connect makes it possible for its clients to compliantly access the financial transactions data of customers, enrich said data with analytics tools, provide digital banking services and deliver high-quality, digital financial services products and services to customers. Gardener uniquely enables us to deploy our platform in Europe and across the globe in a uniform way on the providers preferred by our customers.
Codesphere: Codesphere is a Cloud IDE with integrated and automated deployment of web apps. It uses Gardener internally to manage clusters that host customer deployments and internal systems all over the world.
plusserver: plusserver combines its own cloud offerings with hyperscaler platforms to provide individually tailored multi-cloud solutions. The plusserver Kubernetes Engine (PSKE) based on Gardener reduces the complexity in managing multi-cloud environments and enables companies to orchestrate their containers and cloud-native applications across a variety of platforms such as plusserver’s pluscloud open or hyperscalers such as AWS, either by mouse click or via an API. With PSKE, companies remain vendor-independent and profit from guaranteed data sovereignty and data security due to GDPR-compliant cloud platforms in the certified plusserver data centers in Germany.
Fuga Cloud: Fuga Cloud uses Gardener as the basis for its Enterprise Managed Kubernetes (EMK), a platform that simplifies the management of your k8s and provides insight into usage and performance. The other Fuga Cloud services can be added with a mouse click, and the choice of another cloud provider is a negotiable option. Fuga Cloud stands for Digital Sovereignty, Data Portability and GDPR compatibility.
Metalstack Cloud: metalstack.cloud uses Gardener and is based on the open-source software metal-stack.io, which is developed for regulated financial institutions. The focus here is on the highest possible security and compliance conformity. This makes metalstack.cloud perfect for running enterprise-grade container applications and provides your workloads with the highest possible performance.
Cleura: Cleura uses Gardener to power its Container Orchestration Engine for Cleura Public Cloud and Cleura Compliant Cloud. Cleura Container Orchestration Engine simplifies the creation and management of Kubernetes clusters through their user-friendly Cleura Cloud Management Panel or API, allowing users to focus on deploying applications instead of maintaining the underlying infrastructure.
PITS Globale Datenrettungsdienste: PITS Globale Datenrettungsdienste is a data recovery company located in Germany specializing in recovering lost or damaged files from hard drives, solid-state drives, flash drives, and other storage media. Gardener is used to handle highly-loaded internal infrastructure and provide reliable, fully-managed K8s cluster solutions.

If you’re using Gardener and you aren’t on this list, submit a pull request!

See who is using Gardener

Gardener adopters in production environments that have publicly shared details of their usage.

teaser

SAPSAP uses Gardener to deploy and manage Kubernetes clusters at scale in a uniform way across infrastructures (AWS, Azure, GCP, Alicloud, as well as generic interfaces to OpenStack and vSphere). Workloads include Databases (SAP HANA Cloud), Big Data (SAP Data Intelligence), Kyma, many other cloud native applications, and diverse business workloads.
OVHcloudGardener can now be run by customers on the Public Cloud Platform of the leading European Cloud Provider OVHcloud.
ScaleUp TechnologiesScaleUp Technologies runs Gardener within their public OpenStack Clouds (Hamburg, Berlin, Düsseldorf). Their clients run all kinds of workloads on top of Gardener-maintained Kubernetes clusters ranging from databases to Software-as-a-Service applications.
Finanz Informatik Technologie Services GmbHFinanz Informatik Technologie Services GmbH uses Gardener to offer k8s as a service for customers in the financial industry in Germany. It is built on top of a “metal as a service” infrastructure implemented from scratch with k8s workloads in mind. The result is k8s on top of bare metal in minutes.
PingCAPPingCAP TiDB is a cloud-native distributed SQL database with MySQL compatibility, and one of the most popular open-source database projects - with 23.5K+ stars and 400+ contributors. Its sister project TiKV is a Cloud Native Interactive Landscape project. PingCAP envisioned their managed TiDB service, known as TiDB Cloud, to be multi-tenant, secure, cost-efficient, and compatible with different cloud providers, and they chose Gardener.
BeezlabsBeezlabs uses Gardener to deliver its Intelligent Process Automation platform on multiple cloud providers and to reduce costs and lock-in risks.
b’nerdb’nerd uses Gardener as the core technology for its own managed Kubernetes as a Service solution and operates multiple Gardener installations for several cloud hosting service providers.
STACKITSTACKIT is a digital brand of Europe’s biggest retailer, the Schwarz Group, which includes Lidl and Kaufland, but also production and recycling companies. It uses Gardener to offer public and private Kubernetes as a service in its own data centers in Europe and aims to become the cloud provider for German and European small and mid-sized companies.
T-SystemsSupporting and managing multiple application landscapes on-premises and across different hyperscaler infrastructures can be painful. At T-Systems we use Gardener both for internal usage and to manage clusters for our customers. We love the openness of the project, the flexibility and the architecture that allows us to manage clusters around the world with only one team from one single pane of glass and to meet industry-specific certification standards. Sovereignty by design is another great value that the technology implicitly brings along.
23 TechnologiesThe German-based company 23 Technologies uses Gardener to offer an enterprise-class Kubernetes engine for industrial use cases as well as cloud service providers and offers managed and professional services for it. 23T is also the team behind okeanos.dev, a public service that can be used by anyone to try out Gardener.
B1 Systems GmbHB1 Systems GmbH is an international provider of Linux & Open Source consulting, training, managed service & support. We were founded in 2004 and are based in Germany. Our team of 140 Linux experts offers tailor-made solutions based on cloud & container technologies, virtualization & high availability as well as monitoring, system & configuration management. B1 uses Gardener internally and also sets up solutions/environments for customers.
finleap connect GmbHfinleap connect GmbH is the leading independent Open Banking platform provider in Europe. It enables companies across a multitude of industries to provide the next generation of financial services by understanding how customers transact and interact. With its “full-stack” platform of solutions, finleap connect makes it possible for its clients to compliantly access the financial transactions data of customers, enrich said data with analytics tools, provide digital banking services and deliver high-quality, digital financial services products and services to customers. Gardener uniquely enables us to deploy our platform in Europe and across the globe in a uniform way on the providers preferred by our customers.
CodesphereCodesphere is a Cloud IDE with integrated and automated deployment of web apps. It uses Gardener internally to manage clusters that host customer deployments and internal systems all over the world.
plusserverplusserver combines its own cloud offerings with hyperscaler platforms to provide individually tailored multi-cloud solutions. The plusserver Kubernetes Engine (PSKE) based on Gardener reduces the complexity in managing multi-cloud environments and enables companies to orchestrate their containers and cloud-native applications across a variety of platforms such as plusserver’s pluscloud open or hyperscalers such as AWS, either by mouseclick or via an API. With PSKE, companies remain vendor-independent and profit from guaranteed data sovereignty and data security due to GDPR-compliant cloud platforms in the certified plusserver data centers in Germany.
Fuga CloudFuga Cloud uses Gardener as the basis for its Enterprise Managed Kubernetes (EMK), a platform that simplifies the management of your k8s and provides insight into usage and performance. The other Fuga Cloud services can be added with a mouse click, and the choice of another cloud provider is a negotiable option. Fuga Cloud stands for Digital Sovereignty, Data Portability and GDPR compatibility.
Metalstack Cloudmetalstack.cloud uses Gardener and is based on the open-source software metal-stack.io, which is developed for regulated financial institutions. The focus here is on the highest possible security and compliance conformity. This makes metalstack.cloud perfect for running enterprise-grade container applications and provides your workloads with the highest possible performance.
CleuraCleura uses Gardener to power its Container Orchestration Engine for Cleura Public Cloud and Cleura Compliant Cloud. Cleura Container Orchestration Engine simplifies the creation and management of Kubernetes clusters through their user-friendly Cleura Cloud Management Panel or API, allowing users to focus on deploying applications instead of maintaining the underlying infrastructure.
PITS Globale DatenrettungsdienstePITS Globale Datenrettungsdienste is a data recovery company located in Germany specializing in recovering lost or damaged files from hard drives, solid-state drives, flash drives, and other storage media. Gardener is used to handle highly-loaded internal infrastructure and provide reliable, fully-managed K8 cluster solutions.

If you’re using Gardener and you aren’t on this list, submit a pull request!

\ No newline at end of file diff --git a/docs/blog/2018/06.11-anti-patterns/index.html b/docs/blog/2018/06.11-anti-patterns/index.html index 87961a62aac..0e12bc694b1 100644 --- a/docs/blog/2018/06.11-anti-patterns/index.html +++ b/docs/blog/2018/06.11-anti-patterns/index.html @@ -6,7 +6,7 @@ Instead of running a root user, use RUN groupadd -r anygroup && useradd -r -g anygroup myuser to create a group and a user in it. Use the USER command to switch to this user.">

You are now ready to experiment with the admission-aws webhook server locally.

5.1.2.6 - Operations

Using the AWS provider extension with Gardener as operator

The core.gardener.cloud/v1beta1.CloudProfile resource declares a providerConfig field that is meant to contain provider-specific configuration. The core.gardener.cloud/v1beta1.Seed resource is structured similarly. Additionally, it allows configuring settings for the backups of the main etcds’ data of the shoot cluster control planes running in this seed cluster.

This document explains what is necessary to configure for this provider extension.

CloudProfile resource

This section describes what the configuration for CloudProfiles looks like for AWS and provides an example CloudProfile manifest with minimal configuration that you can use to allow the creation of AWS shoot clusters.

CloudProfileConfig

The cloud profile configuration contains information about the real machine image IDs in the AWS environment (AMIs). You have to map every version that you specify in .spec.machineImages[].versions here such that the AWS extension knows the AMI for every version you want to offer. @@ -11928,7 +12014,7 @@ } ] } -

5.1.2.6 - Usage

Using the AWS provider extension with Gardener as end-user

The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

This document describes what this configuration looks like for AWS and provides an example Shoot manifest with minimal configuration that you can use to create an AWS cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

Provider Secret Data

Every shoot cluster references a SecretBinding or a CredentialsBinding which itself references a Secret, and this Secret contains the provider credentials of your AWS account. +

5.1.2.7 - Usage

Using the AWS provider extension with Gardener as end-user

The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

This document describes what this configuration looks like for AWS and provides an example Shoot manifest with minimal configuration that you can use to create an AWS cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

Provider Secret Data

Every shoot cluster references a SecretBinding or a CredentialsBinding which itself references a Secret, and this Secret contains the provider credentials of your AWS account. This Secret must look as follows:

apiVersion: v1
 kind: Secret
 metadata:
@@ -12116,9 +12202,9 @@
 

The cloudControllerManager.featureGates field contains a map of explicitly enabled or disabled feature gates. For production usage it’s not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting the cluster stability. If you don’t want to configure anything for the cloudControllerManager, simply omit the key in the YAML specification.

The cloudControllerManager.useCustomRouteController controls if the custom routes controller should be enabled. -If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.

If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. +If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.
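As a minimal sketch of the setting described above, the fragment below opts out of Gardener managing the default storage / volume snapshot classes. It assumes the field lives in the AWS extension's ControlPlaneConfig, like the cloudControllerManager fields above. The apiVersion and kind shown here are assumptions that should be checked against your landscape.

```yaml
# Sketch only: apiVersion and kind are assumed, not taken from the text above.
apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
storage:
  # false: Gardener will not mark its storage / volume snapshot classes as default,
  # so another class can be marked as default without being overwritten.
  managedDefaultClass: false
```

Leaving the storage key out entirely keeps the default behavior (managedDefaultClass: true).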

If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. In this case, it is assumed that an IngressClass named alb is created by the user. -You can overwrite the name by setting loadBalancerController.ingressClassName.

Please note that currently only the “instance” mode is supported.

Examples for Ingress and Service managed by the AWS Load Balancer Controller:

  1. Prerequisites

Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

apiVersion: networking.k8s.io/v1
+You can overwrite the name by setting loadBalancerController.ingressClassName.

Please note that currently only the “instance” mode is supported.

Examples for Ingress and Service managed by the AWS Load Balancer Controller:

  1. Prerequisites

Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

apiVersion: networking.k8s.io/v1
 kind: IngressClass
 metadata:
   name: alb # default name if not specified by `loadBalancerController.ingressClassName`
@@ -12130,7 +12216,7 @@
   namespace: default
   name: echoserver
   annotations:
-    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/ingress/annotations/
+    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/
     alb.ingress.kubernetes.io/scheme: internet-facing
     alb.ingress.kubernetes.io/target-type: instance # target-type "ip" NOT supported in Gardener
 spec:
@@ -12145,11 +12231,11 @@
               name: echoserver
               port:
                 number: 80
-

For more details see AWS Load Balancer Documentation - Ingress Specification

  1. Service of Type LoadBalancer

This can be used to create a Network Load Balancer (NLB).

apiVersion: v1
+

For more details see AWS Load Balancer Documentation - Ingress Specification

  1. Service of Type LoadBalancer

This can be used to create a Network Load Balancer (NLB).

apiVersion: v1
 kind: Service
 metadata:
   annotations:
-    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/
+    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/
     service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance # target-type "ip" NOT supported in Gardener
     service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
   name: ingress-nginx-controller
@@ -12159,7 +12245,7 @@
   ...
   type: LoadBalancer
   loadBalancerClass: service.k8s.aws/nlb # mandatory to be managed by AWS Load Balancer Controller (otherwise the Cloud Controller Manager will act on it)
-

For more details see AWS Load Balancer Documentation - Network Load Balancer

⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.

WorkerConfig

The AWS extension supports encryption for volumes plus support for additional data volumes per machine. +

For more details see AWS Load Balancer Documentation - Network Load Balancer

⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.
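The following sketch shows how the annotation from the warning above could be combined with the Service example for an internal NLB. The Service name, selector, and ports are hypothetical. The annotation keys are the ones used in this document, and the scheme value internal is assumed to be what selects an internal load balancer.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-nlb-example # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance # target-type "ip" NOT supported in Gardener
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal # assumed value for an internal NLB
    # avoids the hairpinning effect described above
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  selector:
    app: example # hypothetical selector
  ports:
    - port: 80
      targetPort: 8080
```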

WorkerConfig

The AWS extension supports encryption for volumes plus support for additional data volumes per machine. For each data volume, you have to specify a name. By default (if not stated otherwise), all the disks (root & data volumes) are encrypted. Please make sure that your instance-type supports encryption. @@ -17818,7 +17904,7 @@ - Full-Snapshot-revision-0-600-192420 (Taken by compaction job)

Backward Compatibility

  1. Restoration : The changes to handle the newly proposed backup directory structure must be backward compatible with older structures at least for restoration, because we need to be able to restore from backups in the older structure. This includes the support for restoring from a backup without a metadata file if that is used in the actual implementation.
  2. Backup : For new snapshots (even on a backup containing the older structure), the new structure may be used. The new structure must be setup automatically including creating the base full snapshot.
  3. Garbage collection : The existing functionality of garbage collection of snapshots (full and incremental) according to the backup retention policy must be compatible with both old and new backup folder structure. I.e. the snapshots in the older backup structure must be retained in their own structure and the snapshots in the proposed backup structure should be retained in the proposed structure. Once all the snapshots in the older backup structure go out of the retention policy and are garbage collected, we can think of removing the support for older backup folder structure.

Note: The compactor will run in parallel to the current snapshotter process and work only if a full snapshot is already present in the store. By current design, a full snapshot will be taken if no full snapshot exists yet or the existing full snapshot is older than 24 hours. This is not a limitation but a design choice. As per the proposed design, the backup storage will contain both periodic full snapshots and periodic compacted snapshots. The restorer will pick up whichever base snapshot is the latest.

6.3.4 - 03 Scaling Up An Etcd Cluster

DEP-03: Scaling-up a single-node to multi-node etcd cluster deployed by etcd-druid

To mark a cluster for scale-up from single node to multi-node etcd, just patch the etcd custom resource’s .spec.replicas from 1 to 3 (for example).
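As a rough illustration, the patch amounts to changing a single field on the Etcd resource. The fragment below is a sketch: the resource name and namespace are hypothetical, and all other spec fields are omitted.

```yaml
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main    # hypothetical name
  namespace: default # hypothetical namespace
spec:
  replicas: 3 # previously 1; this change marks the cluster for scale-up
```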

Challenges for scale-up

  1. An etcd cluster with a single replica doesn’t have any peers, so no peer communication is required; hence the peer URL may or may not be TLS-enabled. However, while scaling up from a single-node etcd to a multi-node etcd, there will be a requirement to have peer communication between members of the etcd cluster. Peer communication is required for various reasons, for instance for members to sync up cluster state and data, and to perform leader election or any cluster-wide operation like removal or addition of a member. Hence, in a multi-node etcd cluster we need a TLS-enabled peer URL for peer communication.
  2. Providing the correct configuration to start new etcd members, as it is different from bootstrapping a cluster since these new etcd members will join an existing cluster.

Approach

We first went through the etcd documentation on update-advertise-peer-urls to find information about updating peer URLs. Interestingly, the etcd documentation mentions the following:

To update the advertise peer URLs of a member, first update it explicitly via member command and then restart the member.
 

But we can’t assume that the peer URL is not TLS-enabled for a single-node cluster, as it depends on the end-user. A user may or may not enable TLS for the peer URL of a single-node etcd cluster. So how do we detect whether peer URL TLS was enabled when the cluster is marked for scale-up?

Detecting if peerURL TLS is enabled or not

For this we use an annotation in member lease object member.etcd.gardener.cloud/tls-enabled set by backup-restore sidecar of etcd. As etcd configuration is provided by backup-restore, so it can find out whether TLS is enabled or not and accordingly set this annotation member.etcd.gardener.cloud/tls-enabled to either true or false in member lease object. -And with the help of this annotation and config-map values etcd-druid is able to detect whether there is a change in a peer URL or not.

Etcd-Druid helps in scaling up etcd cluster

Now that it is known whether the peer URL was TLS-enabled for the single-node etcd cluster, etcd-druid can use this information to take action:

  • If the peer URL was already TLS-enabled, then no action is required from etcd-druid. Etcd-druid can proceed with scaling up the cluster.
  • If the peer URL was not TLS-enabled, then etcd-druid has to intervene and make sure that the peer URL is TLS-enabled for the single node before marking the cluster for scale-up.

Action taken by etcd-druid to enable the peerURL TLS

  1. Etcd-druid will update the etcd-bootstrap config-map with new config like initial-cluster,initial-advertise-peer-urls etc. Backup-restore will detect this change and update the member lease annotation to member.etcd.gardener.cloud/tls-enabled: "true".
  2. In case the peer URL TLS has been changed to enabled: Etcd-druid will add tasks to the deployment flow:
    • Check if peer TLS has been enabled for existing StatefulSet pods, by checking the member leases for the annotation member.etcd.gardener.cloud/tls-enabled.
    • If peer TLS enablement is pending for any of the members, then check and patch the StatefulSet with the peer TLS volume mounts, if not already patched. This will cause a rolling update of the existing StatefulSet pods, which allows etcd-backup-restore to update the member peer URL in the etcd cluster.
    • Requeue this reconciliation flow until peer TLS has been enabled for all the existing etcd members.

After PeerURL is TLS enabled

After peer URL TLS has been enabled for the single-node etcd cluster, etcd-druid adds a scale-up annotation, gardener.cloud/scaled-to-multi-node, to the etcd StatefulSet and patches the StatefulSet’s .spec.replicas to 3 (for example). The StatefulSet controller will then bring up new pods (etcd with backup-restore as a sidecar). The etcd sidecar, i.e. backup-restore, will check whether this member is already part of a cluster. In case it is unable to check (maybe due to some network issues), backup-restore checks for the presence of the gardener.cloud/scaled-to-multi-node annotation on the etcd StatefulSet to detect a scale-up. If it determines that this is a scale-up, backup-restore first adds the new etcd member as a learner and then starts the etcd learner with the correct configuration. Once the learner is in sync with the etcd cluster leader, it is promoted to a voting member.

Providing the correct etcd config

As backup-restore detects that it’s a scale-up scenario, backup-restore sets initial-cluster-state to existing as this member will join an existing cluster and it calculates the rest of the config from the updated config-map provided by etcd-druid.

Sequence diagram

Future improvements:

The need to restart etcd pods twice will change in the future. Please refer to https://github.com/gardener/etcd-backup-restore/issues/538

6.3.5 - Add New Etcd Cluster Component

Add A New Etcd Cluster Component

etcd-druid defines an Operator which is responsible for creation, deletion and update of a resource that is created for an Etcd cluster. If you want to introduce a new resource for an Etcd cluster then you must do the following:

  • Add a dedicated package for the resource under component.

  • Implement Operator interface.

  • Define a new Kind for this resource in the operator Registry.

  • Every resource a.k.a Component needs to have the following set of default labels:

    • app.kubernetes.io/name - value of this label is the name of this component. Helper functions are defined here to create the name of each component using the parent Etcd resource. Please define a new helper function to generate the name of your resource using the parent Etcd resource.
    • app.kubernetes.io/component - value of this label is the type of the component. All component type label values are defined here where you can add an entry for your component.
    • In addition to the above component specific labels, each resource/component should have default labels defined on the Etcd resource. You can use GetDefaultLabels function.

    These labels are also part of recommended labels by kubernetes. +And with the help of this annotation and config-map values etcd-druid is able to detect whether there is a change in a peer URL or not.

    Etcd-Druid helps in scaling up etcd cluster

    Now that it is known whether the peer URL was TLS-enabled for the single-node etcd cluster, etcd-druid can use this information to take action:

    • If the peer URL was already TLS-enabled, then no action is required from etcd-druid. Etcd-druid can proceed with scaling up the cluster.
    • If the peer URL was not TLS-enabled, then etcd-druid has to intervene and make sure that the peer URL is TLS-enabled for the single node before marking the cluster for scale-up.

    Action taken by etcd-druid to enable the peerURL TLS

    1. Etcd-druid will update the {etcd.Name}-config config-map with new config like initial-cluster,initial-advertise-peer-urls etc. Backup-restore will detect this change and update the member lease annotation to member.etcd.gardener.cloud/tls-enabled: "true".
    2. In case the peer URL TLS has been changed to enabled: Etcd-druid will add tasks to the deployment flow:
      • Check if peer TLS has been enabled for existing StatefulSet pods, by checking the member leases for the annotation member.etcd.gardener.cloud/tls-enabled.
      • If peer TLS enablement is pending for any of the members, then check and patch the StatefulSet with the peer TLS volume mounts, if not already patched. This will cause a rolling update of the existing StatefulSet pods, which allows etcd-backup-restore to update the member peer URL in the etcd cluster.
      • Requeue this reconciliation flow until peer TLS has been enabled for all the existing etcd members.
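For illustration, a member lease that carries the annotation checked in the steps above might look like the following sketch. The lease name and namespace are hypothetical; only the annotation key and its value are taken from the text.

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: etcd-main-0  # hypothetical member lease name
  namespace: default # hypothetical namespace
  annotations:
    # set by the backup-restore sidecar; "false" means peer TLS enablement is still pending
    member.etcd.gardener.cloud/tls-enabled: "true"
```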

    After PeerURL is TLS enabled

    After peer URL TLS has been enabled for the single-node etcd cluster, etcd-druid adds a scale-up annotation, gardener.cloud/scaled-to-multi-node, to the etcd StatefulSet and patches the StatefulSet’s .spec.replicas to 3 (for example). The StatefulSet controller will then bring up new pods (etcd with backup-restore as a sidecar). The etcd sidecar, i.e. backup-restore, will check whether this member is already part of a cluster. In case it is unable to check (maybe due to some network issues), backup-restore checks for the presence of the gardener.cloud/scaled-to-multi-node annotation on the etcd StatefulSet to detect a scale-up. If it determines that this is a scale-up, backup-restore first adds the new etcd member as a learner and then starts the etcd learner with the correct configuration. Once the learner is in sync with the etcd cluster leader, it is promoted to a voting member.

    Providing the correct etcd config

    As backup-restore detects that it’s a scale-up scenario, backup-restore sets initial-cluster-state to existing as this member will join an existing cluster and it calculates the rest of the config from the updated config-map provided by etcd-druid.
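A minimal sketch of what the resulting configuration for a joining member could contain is shown below. It uses plain etcd configuration keys; the member names and peer URLs are hypothetical, and the actual file generated from the config-map may be structured differently.

```yaml
# Sketch of the etcd configuration for a member added during scale-up.
name: etcd-main-1 # hypothetical member name
# "existing" because the member joins a running cluster instead of bootstrapping a new one
initial-cluster-state: existing
# hypothetical peer URLs, derived from the updated config-map
initial-cluster: etcd-main-0=https://etcd-main-0.etcd-main-peer.default.svc:2380,etcd-main-1=https://etcd-main-1.etcd-main-peer.default.svc:2380
initial-advertise-peer-urls: https://etcd-main-1.etcd-main-peer.default.svc:2380
```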

    Sequence diagram

    Future improvements:

    The need to restart etcd pods twice will change in the future. Please refer to https://github.com/gardener/etcd-backup-restore/issues/538

6.3.5 - Add New Etcd Cluster Component

Add A New Etcd Cluster Component

etcd-druid defines an Operator which is responsible for creation, deletion and update of a resource that is created for an Etcd cluster. If you want to introduce a new resource for an Etcd cluster then you must do the following:

  • Add a dedicated package for the resource under component.

  • Implement Operator interface.

  • Define a new Kind for this resource in the operator Registry.

  • Every resource a.k.a Component needs to have the following set of default labels:

    • app.kubernetes.io/name - value of this label is the name of this component. Helper functions are defined here to create the name of each component using the parent Etcd resource. Please define a new helper function to generate the name of your resource using the parent Etcd resource.
    • app.kubernetes.io/component - value of this label is the type of the component. All component type label values are defined here where you can add an entry for your component.
    • In addition to the above component specific labels, each resource/component should have default labels defined on the Etcd resource. You can use GetDefaultLabels function.

    These labels are also part of recommended labels by kubernetes. NOTE: Constants for the label keys are already defined here.

  • Ensure that there is no wait introduced in any Operator method implementation in your component. In case there are multiple steps to be executed in a sequence then re-queue the event with a special error code in case there is an error or if the pre-conditions check to execute the next step are not yet satisfied.

  • All errors should be wrapped with a custom DruidError.

6.3.6 - Changing Api

Change the API

This guide provides detailed information on what needs to be done when the API needs to be changed.

etcd-druid API follows the same API conventions and guidelines which Kubernetes defines and adopts. The Kubernetes API Conventions as well as Changing the API topics already provide a good overview and general explanation of the basic concepts behind it. We adhere to the principles laid down by Kubernetes.

Etcd Druid API

The etcd-druid API is defined here.

Note: The current version of the API is v1alpha1. We are currently working on migration to the v1beta1 API.

Changing the API

If there is a need to make changes to the API, then one should do the following:

  • If new fields are added then ensure that these are added as optional fields. They should have the +optional comment and an omitempty JSON tag should be added against the field.
  • Ensure that all new or changed fields are well documented with doc-strings.
  • Incompatible API changes must not be made within the same version of the API. If there is a real necessity to introduce a backward-incompatible change, then a newer version of the API should be created and an API conversion webhook should be put in place to support more than one version of the API.
  • After the changes to the API are finalized, run make generate to ensure that the changes are also reflected in the CRD.
  • If necessary, implement or adapt the validation for the API.
  • If necessary, adapt the samples YAML manifests.
  • When opening a pull-request, always add a release note informing the end-users of the changes that are coming in.

Removing a Field

If one or more fields need to be removed permanently from the API, ensure the following:

  • Do not remove a field directly. First publish a deprecation notice and follow a well-defined deprecation period. Ensure that the release note in the pull request is properly categorized so that it is easily visible to end-users and clearly mentions which field(s) have been deprecated. Clearly suggest how clients need to adapt.
  • To give end-users sufficient time to adapt to the API changes, remove deprecated field(s) only once the deprecation period is over. It is generally recommended to do this in two stages:
    • First stage: Remove the code that refers to the deprecated fields. This ensures that the code no longer depends on the deprecated field(s).
    • Second stage: Remove the field from the API.

6.3.7 - Configure Etcd Druid

etcd-druid CLI Flags

The etcd-druid process can be started with the following command line flags.

Command line flags

Leader election

If you wish to set up etcd-druid in high-availability mode, leader election needs to be enabled to ensure that only one replica at a time services the incoming events and does the reconciliation.

| Flag | Description | Default |
| --- | --- | --- |
| enable-leader-election | Leader election provides the capability to select one replica as a leader where active reconciliation will happen. The other replicas will keep waiting for leadership change and not do active reconciliations. | false |
| leader-election-id | Name of the k8s lease object that leader election will use for holding the leader lock. By default etcd-druid will use lease resource lock for leader election which is also a natural usecase for leases and is also recommended by k8s. | “druid-leader-election” |
| leader-election-resource-lock | Deprecated: This flag will be removed in later version of druid. By default lease.coordination.k8s.io resources will be used for leader election resource locking for the controller manager. | “leases” |

Metrics

etcd-druid exposes a /metrics endpoint which can be scraped by tools like Prometheus. If the default metrics endpoint configuration is not suitable, consumers can change it via the following options.

| Flag | Description | Default |
| --- | --- | --- |
| metrics-bind-address | The IP address that the metrics endpoint binds to. | "" |
| metrics-port | The port used for the metrics endpoint. | 8080 |
| metrics-addr | The address that the metrics endpoint binds to. Deprecated: Please use --metrics-bind-address and --metrics-port instead. | “:8080” |

The metrics bind address is computed by joining the host and the port. By default, it evaluates to :8080.
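
The join can be illustrated with Go's standard library. This is a minimal sketch: the helper name metricsBindAddress is hypothetical, and the use of net.JoinHostPort is an assumption about how such a join is typically done, not a statement about the etcd-druid source.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsBindAddress joins a host (may be empty) and a port into the
// "host:port" form expected by an HTTP listener.
func metricsBindAddress(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// Defaults from the table above: empty bind address and port 8080.
	fmt.Println(metricsBindAddress("", 8080)) // :8080

	// An explicit bind address.
	fmt.Println(metricsBindAddress("127.0.0.1", 9090)) // 127.0.0.1:9090
}
```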

!!! tip Ensure that the metrics-port is also reflected in the etcd-druid deployment specification.

Webhook Server

etcd-druid provides the following CLI flags to configure the webhook server. These CLI flags are used to construct a new webhook.Server by configuring Options.

| Flag | Description | Default |
| --- | --- | --- |
| webhook-server-bind-address | The address that the webhook server will listen on. | "" |
| webhook-server-port | The port number on which the webhook server will serve. | 9443 |
| webhook-server-tls-server-cert-dir | The path to a directory containing the server’s TLS certificate and key (the files must be named tls.crt and tls.key respectively). | /etc/webhook-server-tls |

Etcd-Components Webhook

etcd-druid provisions and manages several Kubernetes resources, which we call etcd cluster components. To ensure that no accidental changes are made to these managed resources, a webhook checks manual changes made to any managed etcd cluster Kubernetes resource. It rejects most of these changes, with a few exceptions. The details on how to enable the etcd-components webhook, which resources are protected, and in which scenarios a change is allowed are documented here.

The following CLI flags are provided to configure the etcd-components webhook:

| Flag | Description | Default |
| --- | --- | --- |
| enable-etcd-components-webhook | Enable EtcdComponents Webhook to prevent unintended changes to resources managed by etcd-druid. | false |
| reconciler-service-account | The fully qualified name of the service account used by etcd-druid for reconciling etcd resources. If unspecified, the default service account mounted for etcd-druid will be used. | etcd-druid-service-account |
| etcd-components-exempt-service-accounts | In case there is a need to allow changes to Etcd resources from external controllers like vertical-pod-autoscaler then one must list the ServiceAccount that is used by each such controller. | "" |

Reconcilers

The following set of flags configures the reconcilers running within etcd-druid. To learn more about the different reconcilers, read this document.

Etcd Reconciler

| Flag | Description | Default |
| --- | --- | --- |
| etcd-workers | Number of workers spawned for concurrent reconciles of Etcd resources. | 3 |
| enable-etcd-spec-auto-reconcile | If true then automatically reconciles Etcd Spec. If false, waits for explicit annotation gardener.cloud/operation: reconcile to be placed on the Etcd resource to trigger reconcile. | false |
| disable-etcd-serviceaccount-automount | For each Etcd cluster a ServiceAccount is created which is used by the StatefulSet pods and tied to Role via RoleBinding. If false then pods running as this ServiceAccount will have the API token automatically mounted. You can consider disabling it if you wish to use Projected Volumes allowing one to set an expirationSeconds on the mounted token for better security. To use projected volumes ensure that you have set relevant kube-apiserver flags. Note: With Kubernetes version >=1.24 projected service account token is the default. This means that we no longer need this flag. Issue #872 has been raised to address this. | false |
| etcd-status-sync-period | Etcd.Status is periodically updated. This interval defines the status sync frequency. | 15s |
| etcd-member-notready-threshold | Threshold after which an etcd member is considered not ready if the status was unknown before. This is currently used to update EtcdMemberConditionStatus. | 5m |
| etcd-member-unknown-threshold | Threshold after which an etcd member is considered unknown. This is currently used to update EtcdMemberConditionStatus. | 1m |
| ignore-operation-annotation | Specifies whether to ignore or honour the annotation gardener.cloud/operation: reconcile on resources to be reconciled. Deprecated: please use --enable-etcd-spec-auto-reconcile instead. | false |
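
The interplay between enable-etcd-spec-auto-reconcile and the gardener.cloud/operation: reconcile annotation can be sketched as below. The function name shouldReconcileSpec is hypothetical, and the logic is a simplified reading of the flag descriptions above rather than the reconciler's actual code.

```go
package main

import "fmt"

const (
	// Annotation key and value as described in the flag table above.
	operationAnnotation = "gardener.cloud/operation"
	reconcileOperation  = "reconcile"
)

// shouldReconcileSpec reports whether the Etcd spec should be reconciled.
// With auto-reconcile enabled every spec change is reconciled; otherwise an
// explicit annotation on the Etcd resource is required.
func shouldReconcileSpec(autoReconcile bool, annotations map[string]string) bool {
	if autoReconcile {
		return true
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	return annotations[operationAnnotation] == reconcileOperation
}

func main() {
	annotated := map[string]string{operationAnnotation: reconcileOperation}

	fmt.Println(shouldReconcileSpec(true, nil))        // auto-reconcile on
	fmt.Println(shouldReconcileSpec(false, nil))       // off, no annotation
	fmt.Println(shouldReconcileSpec(false, annotated)) // off, annotation set
}
```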

Compaction Reconciler

| Flag | Description | Default |
| --- | --- | --- |
| enable-backup-compaction | Enable automatic compaction of etcd backups | false |
| compaction-workers | Number of workers that can be spawned for concurrent reconciles for compaction Jobs. The controller creates a backup compaction job if a certain etcd event threshold is reached. If compaction is enabled, the value for this flag must be greater than zero. | 3 |
| etcd-events-threshold | Defines the threshold in terms of total number of etcd events before a backup compaction job is triggered. | 1000000 |
| active-deadline-duration | Duration after which a running backup compaction job will be terminated. | 3h |
| metrics-scrape-wait-duration | Duration to wait for after compaction job is completed, to allow Prometheus metrics to be scraped. | 0s |

Etcd Copy-Backup Task & Secret Reconcilers

| Flag | Description | Default |
| --- | --- | --- |
| etcd-copy-backups-task-workers | Number of workers spawned for concurrent reconciles for EtcdCopyBackupTask resources. | 3 |
| secret-workers | Number of workers spawned for concurrent reconciles for secrets. | 10 |

Miscellaneous

| Flag | Description | Default |
| --- | --- | --- |
| feature-gates | A set of key=value pairs that describe feature gates for alpha/experimental features. Please check feature-gates for more information. | "" |
| disable-lease-cache | Disable cache for lease.coordination.k8s.io resources. | false |

6.3.8 - Contribution

Contributors Guide

etcd-druid is an actively maintained project which has organically evolved to be a mature and stable etcd operator. We welcome active participation from the community and to this end this guide serves as a good starting point.

Code of Conduct

All maintainers and contributors must abide by the Contributor Covenant. Real progress can only happen in a collaborative environment which fosters mutual respect, openness and disruptive innovation.

Developer Certificate of Origin

For legal reasons, contributors will be asked to accept a Developer Certificate of Origin (DCO) before they submit their first pull request to this project. This happens in an automated fashion during the submission process. We use the standard DCO text of the Linux Foundation.

License

Your contributions to etcd-druid must be licensed properly:

Contributing

etcd-druid uses GitHub to manage reviews of pull requests.

  • If you are looking to make your first contribution, follow Steps to Contribute.
  • If you have a trivial fix or improvement, go ahead and create an issue first followed by a pull request.
  • If you plan to do something more involved, first discuss your ideas by creating an issue. This will avoid unnecessary work and surely give you and us a good deal of inspiration.

Steps to Contribute

  • If you wish to contribute and have not done that in the past, then first try and filter the list of issues with label exp/beginner. Once you find the issue that interests you, add a comment stating that you would like to work on it. This is to prevent duplicated efforts from contributors on the same issue.
  • If you have questions about one of the issues please comment on them and one of the maintainers will clarify it.

We kindly ask you to follow the Pull Request Checklist to ensure reviews can happen accordingly.

Issues and Planning

We use GitHub issues to track bugs and enhancement requests. Please provide as much context as possible when you open an issue. The information you provide must be comprehensive enough for the assignee to understand and reproduce the behavior and to find related reports of that issue. Therefore, contributors may use but aren’t restricted to the issue template provided by the etcd-druid maintainers.

6.3.9 - Controllers

Controllers

etcd-druid is an operator to manage etcd clusters, and follows the Operator pattern for Kubernetes.

@@ -17843,7 +17929,7 @@

Make sure that you test the code after you have updated the dependencies!

6.3.12 - Etcd Cluster Components

Etcd Cluster Components

For every Etcd cluster that etcd-druid provisions, it deploys a set of resources. The following sections provide information on and a code reference for each such resource.

StatefulSet

StatefulSet is the primary Kubernetes resource that gets provisioned for an etcd cluster.

  • Replicas for the StatefulSet are derived from Etcd.Spec.Replicas in the custom resource.

  • Each pod comprises two containers:

    • etcd-wrapper : This is the main container which runs an etcd process.

    • etcd-backup-restore : This is a sidecar container which does the following:

      • Orchestrates the initialization of etcd. This includes validation of any existing etcd data directory and restoration in case of corrupt etcd data directory files for a single-member etcd cluster.
      • Periodically renews the member lease.
      • Optionally takes schedule and threshold based delta and full snapshots and pushes them to a configured object store.
      • Orchestrates scheduled etcd-db defragmentation.

      NOTE: This is not a complete list of functionalities offered out of etcd-backup-restore.

Code reference: StatefulSet-Component

For detailed information on each container you can visit the etcd-wrapper and etcd-backup-restore repositories.

ConfigMap

Every etcd member requires configuration with which it must be started. etcd-druid creates a ConfigMap which gets mounted onto the etcd-backup-restore container. etcd-backup-restore container will modify the etcd configuration and serve it to the etcd-wrapper container upon request.

Code reference: ConfigMap-Component

PodDisruptionBudget

An etcd cluster requires quorum for all write operations. Clients can additionally configure quorum-based reads to ensure linearizable reads (kube-apiserver’s etcd client is configured for linearizable reads and writes). In a cluster of size 3, only 1 member failure is tolerated. The failure tolerance of an etcd cluster with n replicas is computed as (n-1)/2, rounded down.

To ensure that no more etcd pods are evicted than the failure tolerance allows, etcd-druid creates a PodDisruptionBudget.

!!! note For a single node etcd cluster a PodDisruptionBudget will be created, however pdb.spec.minAvailable is set to 0, effectively disabling it.
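
The arithmetic behind quorum, failure tolerance and the resulting PodDisruptionBudget can be sketched as follows. The function names are hypothetical, and deriving minAvailable as replicas minus failure tolerance is an assumption that follows from the description above, not a quote of the etcd-druid implementation.

```go
package main

import "fmt"

// quorum is the minimum number of members that must agree on a write.
func quorum(replicas int) int { return replicas/2 + 1 }

// failureTolerance is the number of members that may fail while the cluster
// still has quorum: (n-1)/2 using integer division.
func failureTolerance(replicas int) int { return (replicas - 1) / 2 }

// pdbMinAvailable sketches the minAvailable value of the PodDisruptionBudget.
// For a single-node cluster it is 0, which effectively disables the budget.
func pdbMinAvailable(replicas int) int {
	if replicas <= 1 {
		return 0
	}
	return replicas - failureTolerance(replicas)
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("replicas=%d quorum=%d tolerance=%d minAvailable=%d\n",
			n, quorum(n), failureTolerance(n), pdbMinAvailable(n))
	}
}
```

For a 3-member cluster this yields a tolerance of 1 and a minAvailable of 2, so at most one pod can be voluntarily evicted at a time.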

Code reference: PodDisruptionBudget-Component

ServiceAccount

The etcd-backup-restore container, which runs as a sidecar in every etcd member, requires permissions to access resources like Lease, StatefulSet etc. A dedicated ServiceAccount is created per Etcd cluster for this purpose.

Code reference: ServiceAccount-Component

Role & RoleBinding

The etcd-backup-restore container, which runs as a sidecar in every etcd member, requires permissions to access resources like Lease, StatefulSet etc. A dedicated Role and RoleBinding are created and linked to the ServiceAccount created per Etcd cluster.

Code reference: Role-Component & RoleBinding-Component

Client & Peer Service

To enable clients to connect to an etcd cluster, a ClusterIP Client Service is created. To enable etcd members to talk to each other (for discovery, leader-election, raft consensus etc.), etcd-druid also creates a Headless Service.

Code reference: Client-Service-Component & Peer-Service-Component

Member Lease

Every member in an Etcd cluster has a dedicated Lease which signifies that the member is alive. It is the responsibility of the etcd-backup-restore sidecar container to periodically renew the lease.

!!! note Today the lease object is also used to indicate the member-ID and the role of the member in an etcd cluster. Possible roles are Leader and Member (which denotes that this is a member but not a leader). This will change in the future with the EtcdMember resource.

Code reference: Member-Lease-Component

Delta & Full Snapshot Leases

One of the responsibilities of the etcd-backup-restore container is to take periodic or threshold based snapshots (delta and full) of the etcd DB. Today etcd-backup-restore communicates the end-revision of the latest full/delta snapshots to the etcd-druid operator via leases.

etcd-druid creates two Lease resources, one for delta and another for full snapshots. This information is used by the operator to trigger snapshot-compaction jobs. Snapshot leases are also used to derive the health of backups, which gets updated in the Status subresource of every Etcd resource.

In the future these leases will be replaced by the EtcdMember resource.

Code reference: Snapshot-Lease-Component
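
How the two snapshot revisions can drive a compaction decision is sketched below. This is a simplified illustration under stated assumptions: the function name compactionNeeded is hypothetical, the revisions are passed in as plain integers (how they are stored on the Lease is not covered here), and the use of a greater-or-equal comparison against the etcd-events-threshold value is an assumption.

```go
package main

import "fmt"

// compactionNeeded reports whether the number of revisions accumulated
// between the latest full snapshot and the latest delta snapshot has reached
// the configured events threshold.
func compactionNeeded(fullSnapshotRevision, deltaSnapshotRevision, eventsThreshold int64) bool {
	return deltaSnapshotRevision-fullSnapshotRevision >= eventsThreshold
}

func main() {
	// Default value of the etcd-events-threshold flag.
	const threshold = 1000000

	// 1,498,000 revisions since the last full snapshot: compaction is due.
	fmt.Println(compactionNeeded(2000, 1500000, threshold))

	// Only 48,000 revisions since the last full snapshot: nothing to do yet.
	fmt.Println(compactionNeeded(2000, 50000, threshold))
}
```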

6.3.13 - Etcd Cluster Resource Protection

Etcd Cluster Resource Protection

etcd-druid provisions and manages Kubernetes resources (a.k.a. components) for each Etcd cluster. To ensure that each component’s specification is in line with the attributes configured in the Etcd custom resource, and to protect against unintended changes to any of these managed components, a Validating Webhook is employed.

Etcd Components Webhook is the validating webhook which prevents unintended UPDATE and DELETE operations on all managed resources. The following sections describe what is prohibited and under which specific conditions changes are permitted.

Configure Etcd Components Webhook

A prerequisite for enabling the validating webhook is to configure the Webhook Server. Additionally, you need to enable the Etcd Components validating webhook and optionally configure other options. You can look at all the options here.

What is allowed?

Modifications to managed resources under the following circumstances will be allowed:

  • Create and Connect operations are allowed and no validation is done.
  • Changes to a Kubernetes resource (e.g. StatefulSet, ConfigMap etc.) not managed by etcd-druid are allowed.
  • Changes to a resource whose Group-Kind is amongst the resources managed by etcd-druid but which does not have a parent Etcd resource are allowed.
  • It is possible that an operator wishes to explicitly disable etcd-component protection. This can be done by setting the druid.gardener.cloud/disable-etcd-component-protection annotation on an Etcd resource. If this annotation is present then changes to managed components will be allowed.
  • If the Etcd resource has a deletion timestamp set, indicating that it is marked for deletion and is waiting for etcd-druid to delete all managed resources, then deletion requests for all managed resources of this etcd cluster are allowed if:
    • The deletion request has come from a ServiceAccount associated with etcd-druid. If not explicitly specified via --reconciler-service-account then a default-reconciler-service-account will be assumed.
    • The deletion request has come from a ServiceAccount configured via --etcd-components-webhook-exempt-service-accounts.
  • Lease objects are periodically updated by each etcd member pod. A single ServiceAccount is created for all members. Update operation on Lease objects from this ServiceAccount is allowed.
  • If an active reconciliation is in-progress then only allow operations that are initiated by etcd-druid.
  • If no active reconciliation is currently in-progress, then allow updates to managed resource from ServiceAccounts configured via --etcd-components-webhook-exempt-service-accounts.
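
The rules above can be read as an ordered decision function. The sketch below is a simplified illustration, not the webhook's actual handler: the request and config types, their field names, the evaluation order, and the choice to deny anything not covered by a rule are all assumptions made for this example.

```go
package main

import "fmt"

// request is a hypothetical, simplified view of an admission request.
type request struct {
	Operation             string // "CREATE", "CONNECT", "UPDATE" or "DELETE"
	Kind                  string // kind of the target resource, e.g. "Lease"
	UserName              string // service account that sent the request
	MemberServiceAccount  string // service account created for the etcd members
	ManagedByDruid        bool   // managed kind with a parent Etcd resource
	ProtectionDisabled    bool   // disable-protection annotation set on the Etcd
	EtcdMarkedForDeletion bool   // parent Etcd has a deletion timestamp
	ReconcileInProgress   bool   // etcd-druid is actively reconciling
}

// config is a hypothetical holder for the relevant CLI flag values.
type config struct {
	ReconcilerServiceAccount string
	ExemptServiceAccounts    []string
}

func contains(items []string, s string) bool {
	for _, item := range items {
		if item == s {
			return true
		}
	}
	return false
}

// allowed applies the rules from the list above in order.
func allowed(cfg config, req request) bool {
	isDruid := req.UserName == cfg.ReconcilerServiceAccount
	isExempt := contains(cfg.ExemptServiceAccounts, req.UserName)
	switch {
	case req.Operation == "CREATE" || req.Operation == "CONNECT":
		return true
	case !req.ManagedByDruid:
		return true
	case req.ProtectionDisabled:
		return true
	case req.EtcdMarkedForDeletion && req.Operation == "DELETE":
		return isDruid || isExempt
	case req.Kind == "Lease" && req.Operation == "UPDATE" &&
		req.MemberServiceAccount != "" && req.UserName == req.MemberServiceAccount:
		return true
	case req.ReconcileInProgress:
		return isDruid
	default:
		return req.Operation == "UPDATE" && isExempt
	}
}

func main() {
	cfg := config{ReconcilerServiceAccount: "druid-sa", ExemptServiceAccounts: []string{"vpa-sa"}}
	manual := request{Operation: "UPDATE", Kind: "StatefulSet", UserName: "some-user", ManagedByDruid: true}
	exempt := request{Operation: "UPDATE", Kind: "StatefulSet", UserName: "vpa-sa", ManagedByDruid: true}
	fmt.Println("manual update allowed:", allowed(cfg, manual))
	fmt.Println("exempt update allowed:", allowed(cfg, exempt))
}
```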

6.3.14 - Etcd Druid Api

API Reference

Packages

druid.gardener.cloud/v1alpha1

Package v1alpha1 contains API Schema definitions for the druid v1alpha1 API group

Resource Types

BackupSpec

BackupSpec defines parameters associated with the full and delta snapshots of etcd.

Appears in:

FieldDescriptionDefaultValidation
port integerPort define the port on which etcd-backup-restore server will be exposed.
tls TLSConfig
image stringImage defines the etcd container image and tag
store StoreSpecStore defines the specification of object store provider for storing backups.
resources ResourceRequirementsResources defines compute Resources required by backup-restore container.
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
compactionResources ResourceRequirementsCompactionResources defines compute Resources required by compaction job.
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
fullSnapshotSchedule stringFullSnapshotSchedule defines the cron standard schedule for full snapshots.
garbageCollectionPolicy GarbageCollectionPolicyGarbageCollectionPolicy defines the policy for garbage collecting old backupsEnum: [Exponential LimitBased]
maxBackupsLimitBasedGC integerMaxBackupsLimitBasedGC defines the maximum number of Full snapshots to retain in Limit Based GarbageCollectionPolicy
All full snapshots beyond this limit will be garbage collected.
garbageCollectionPeriod DurationGarbageCollectionPeriod defines the period for garbage collecting old backups
deltaSnapshotPeriod DurationDeltaSnapshotPeriod defines the period after which delta snapshots will be taken
deltaSnapshotMemoryLimit QuantityDeltaSnapshotMemoryLimit defines the memory limit after which delta snapshots will be taken
deltaSnapshotRetentionPeriod DurationDeltaSnapshotRetentionPeriod defines the duration for which delta snapshots will be retained, excluding the latest snapshot set.
The value should be a string formatted as a duration (e.g., ‘1s’, ‘2m’, ‘3h’, ‘4d’)
Pattern: ^([0-9][0-9]*([.][0-9]+)?(s|m|h|d))+$
Type: string
compression CompressionSpecSnapshotCompression defines the specification for compression of Snapshots.
enableProfiling booleanEnableProfiling defines if profiling should be enabled for the etcd-backup-restore-sidecar
etcdSnapshotTimeout DurationEtcdSnapshotTimeout defines the timeout duration for etcd FullSnapshot operation
leaderElection LeaderElectionSpecLeaderElection defines parameters related to the LeaderElection configuration.
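
The validation pattern listed for deltaSnapshotRetentionPeriod can be tried out with Go's regexp package. This is a small sketch; the helper name isValidRetentionPeriod is hypothetical. Note that the pattern accepts a d (day) unit, which Go's time.ParseDuration does not, so a value that passes this check cannot always be parsed with the standard library directly.

```go
package main

import (
	"fmt"
	"regexp"
)

// retentionPattern is the validation pattern from the table above.
var retentionPattern = regexp.MustCompile(`^([0-9][0-9]*([.][0-9]+)?(s|m|h|d))+$`)

// isValidRetentionPeriod reports whether s matches the validation pattern.
func isValidRetentionPeriod(s string) bool {
	return retentionPattern.MatchString(s)
}

func main() {
	// "4d", "1h30m" and "1.5h" match; "10" (no unit), "d" (no number)
	// and "1w" (unsupported unit) do not.
	for _, v := range []string{"4d", "1h30m", "1.5h", "10", "d", "1w"} {
		fmt.Printf("%q valid=%t\n", v, isValidRetentionPeriod(v))
	}
}
```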

ClientService

ClientService defines the parameters of the client service that a user can specify

Appears in:

FieldDescriptionDefaultValidation
annotations object (keys:string, values:string)Annotations specify the annotations that should be added to the client service
labels object (keys:string, values:string)Labels specify the labels that should be added to the client service

CompactionMode

Underlying type: string

CompactionMode defines the auto-compaction-mode: ‘periodic’ or ‘revision’. ‘periodic’ for duration based retention and ‘revision’ for revision number based retention.

Validation:

  • Enum: [periodic revision]

Appears in:

FieldDescription
periodicPeriodic is a constant to set auto-compaction-mode ‘periodic’ for duration based retention.
revisionRevision is a constant to set auto-compaction-mode ‘revision’ for revision number based retention.

CompressionPolicy

Underlying type: string

CompressionPolicy defines the type of policy for compression of snapshots.

Validation:

  • Enum: [gzip lzw zlib]

Appears in:

FieldDescription
gzipGzipCompression is constant for gzip compression policy.
lzwLzwCompression is constant for lzw compression policy.
zlibZlibCompression is constant for zlib compression policy.

CompressionSpec

CompressionSpec defines parameters related to compression of Snapshots(full as well as delta).

Appears in:

FieldDescriptionDefaultValidation
enabled boolean
policy CompressionPolicyEnum: [gzip lzw zlib]

Condition

Condition holds the information about the state of a resource.

Appears in:

FieldDescriptionDefaultValidation
type ConditionTypeType of the Etcd condition.
status ConditionStatusStatus of the condition, one of True, False, Unknown.
lastTransitionTime TimeLast time the condition transitioned from one status to another.
lastUpdateTime TimeLast time the condition was updated.
reason stringThe reason for the condition’s last transition.
message stringA human-readable message indicating details about the transition.

ConditionStatus

Underlying type: string

ConditionStatus is the status of a condition.

Appears in:

| Field | Description |
| --- | --- |
| True | ConditionTrue means a resource is in the condition. |
| False | ConditionFalse means a resource is not in the condition. |
| Unknown | ConditionUnknown means Gardener can’t decide if a resource is in the condition or not. |
| Progressing | ConditionProgressing means the condition was seen true, failed but stayed within a predefined failure threshold. In the future, we could add other intermediate conditions, e.g. ConditionDegraded. |
| ConditionCheckError | ConditionCheckError is a constant for a reason in condition. |

ConditionType

Underlying type: string

ConditionType is the type of condition.

Appears in:

FieldDescription
ReadyConditionTypeReady is a constant for a condition type indicating that the etcd cluster is ready.
AllMembersReadyConditionTypeAllMembersReady is a constant for a condition type indicating that all members of the etcd cluster are ready.
BackupReadyConditionTypeBackupReady is a constant for a condition type indicating that the etcd backup is ready.
DataVolumesReadyConditionTypeDataVolumesReady is a constant for a condition type indicating that the etcd data volumes are ready.
SucceededEtcdCopyBackupsTaskSucceeded is a condition type indicating that a EtcdCopyBackupsTask has succeeded.
FailedEtcdCopyBackupsTaskFailed is a condition type indicating that a EtcdCopyBackupsTask has failed.

CrossVersionObjectReference

CrossVersionObjectReference contains enough information to let you identify the referred resource.

Appears in:

FieldDescriptionDefaultValidation
kind stringKind of the referent
name stringName of the referent
apiVersion stringAPI version of the referent

ErrorCode

Underlying type: string

ErrorCode is a string alias representing an error code that identifies an error.

Appears in:

Etcd

Etcd is the Schema for the etcds API

FieldDescriptionDefaultValidation
apiVersion stringdruid.gardener.cloud/v1alpha1
kind stringEtcd
metadata ObjectMetaRefer to Kubernetes API documentation for fields of metadata.
spec EtcdSpec
status EtcdStatus

EtcdConfig

EtcdConfig defines the configuration for the etcd cluster to be deployed.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| quota Quantity | Quota defines the etcd DB quota. | | |
| defragmentationSchedule string | DefragmentationSchedule defines the cron standard schedule for defragmentation of etcd. | | |
| serverPort integer | | | |
| clientPort integer | | | |
| image string | Image defines the etcd container image and tag | | |
| authSecretRef SecretReference | | | |
| metrics MetricsLevel | Metrics defines the level of detail for exported metrics of etcd, specify ’extensive’ to include histogram metrics. | | Enum: [basic extensive] |
| resources ResourceRequirements | Resources defines the compute Resources required by etcd container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | | |
| clientUrlTls TLSConfig | ClientUrlTLS contains the ca, server TLS and client TLS secrets for client communication to ETCD cluster | | |
| peerUrlTls TLSConfig | PeerUrlTLS contains the ca and server TLS secrets for peer communication within ETCD cluster. Currently, PeerUrlTLS does not require client TLS secrets for gardener implementation of ETCD cluster. | | |
| etcdDefragTimeout Duration | EtcdDefragTimeout defines the timeout duration for etcd defrag call | | |
| heartbeatDuration Duration | HeartbeatDuration defines the duration for members to send heartbeats. The default value is 10s. | | |
| clientService ClientService | ClientService defines the parameters of the client service that a user can specify | | |

EtcdCopyBackupsTask

EtcdCopyBackupsTask is a task for copying etcd backups from a source to a target store.

FieldDescriptionDefaultValidation
apiVersion stringdruid.gardener.cloud/v1alpha1
kind stringEtcdCopyBackupsTask
metadata ObjectMetaRefer to Kubernetes API documentation for fields of metadata.
spec EtcdCopyBackupsTaskSpec
status EtcdCopyBackupsTaskStatus

EtcdCopyBackupsTaskSpec

EtcdCopyBackupsTaskSpec defines the parameters for the copy backups task.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| sourceStore StoreSpec | SourceStore defines the specification of the source object store provider for storing backups. | | |
| targetStore StoreSpec | TargetStore defines the specification of the target object store provider for storing backups. | | |
| maxBackupAge integer | MaxBackupAge is the maximum age in days that a backup must have in order to be copied. By default all backups will be copied. | | |
| maxBackups integer | MaxBackups is the maximum number of backups that will be copied starting with the most recent ones. | | |
| waitForFinalSnapshot WaitForFinalSnapshotSpec | WaitForFinalSnapshot defines the parameters for waiting for a final full snapshot before copying backups. | | |

EtcdCopyBackupsTaskStatus

EtcdCopyBackupsTaskStatus defines the observed state of the copy backups task.

Appears in:

FieldDescriptionDefaultValidation
conditions Condition arrayConditions represents the latest available observations of an object’s current state.
observedGeneration integerObservedGeneration is the most recent generation observed for this resource.
lastError stringLastError represents the last occurred error.

EtcdMemberConditionStatus

Underlying type: string

EtcdMemberConditionStatus is the status of an etcd cluster member.

Appears in:

FieldDescription
ReadyEtcdMemberStatusReady indicates that the etcd member is ready.
NotReadyEtcdMemberStatusNotReady indicates that the etcd member is not ready.
UnknownEtcdMemberStatusUnknown indicates that the status of the etcd member is unknown.

EtcdMemberStatus

EtcdMemberStatus holds information about etcd cluster membership.

Appears in:

FieldDescriptionDefaultValidation
name stringName is the name of the etcd member. It is the name of the backing Pod.
id stringID is the ID of the etcd member.
role EtcdRoleRole is the role in the etcd cluster, either Leader or Member.
status EtcdMemberConditionStatusStatus of the condition, one of True, False, Unknown.
reason stringThe reason for the condition’s last transition.
lastTransitionTime TimeLastTransitionTime is the last time the condition’s status changed.

EtcdRole

Underlying type: string

EtcdRole is the role of an etcd cluster member.

Appears in:

FieldDescription
LeaderEtcdRoleLeader describes the etcd role Leader.
MemberEtcdRoleMember describes the etcd role Member.

EtcdSpec

EtcdSpec defines the desired state of Etcd

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| selector LabelSelector | selector is a label query over pods that should match the replica count. It must match the pod template’s labels. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors | | |
| labels object (keys:string, values:string) | | | |
| annotations object (keys:string, values:string) | | | |
| etcd EtcdConfig | | | |
| backup BackupSpec | | | |
| sharedConfig SharedConfig | | | |
| schedulingConstraints SchedulingConstraints | | | |
| replicas integer | | | |
| priorityClassName string | PriorityClassName is the name of a priority class that shall be used for the etcd pods. | | |
| storageClass string | StorageClass defines the name of the StorageClass required by the claim. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#class-1 | | |
| storageCapacity Quantity | StorageCapacity defines the size of persistent volume. | | |
| volumeClaimTemplate string | VolumeClaimTemplate defines the volume claim template to be created | | |

EtcdStatus

EtcdStatus defines the observed state of Etcd.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| observedGeneration integer | ObservedGeneration is the most recent generation observed for this resource. | | |
| etcd CrossVersionObjectReference | | | |
| conditions Condition array | Conditions represents the latest available observations of an etcd’s current state. | | |
| serviceName string | ServiceName is the name of the etcd service. Deprecated: this field will be removed in the future. | | |
| lastError string | LastError represents the last occurred error. Deprecated: Use LastErrors instead. | | |
| lastErrors LastError array | LastErrors captures errors that occurred during the last operation. | | |
| lastOperation LastOperation | LastOperation indicates the last operation performed on this resource. | | |
| clusterSize integer | Cluster size is the current size of the etcd cluster. Deprecated: this field will not be populated with any value and will be removed in the future. | | |
| currentReplicas integer | CurrentReplicas is the current replica count for the etcd cluster. | | |
| replicas integer | Replicas is the replica count of the etcd cluster. | | |
| readyReplicas integer | ReadyReplicas is the count of replicas being ready in the etcd cluster. | | |
| ready boolean | Ready is true if all etcd replicas are ready. | | |
| updatedReplicas integer | UpdatedReplicas is the count of updated replicas in the etcd cluster. Deprecated: this field will be removed in the future. | | |
| labelSelector LabelSelector | LabelSelector is a label query over pods that should match the replica count. It must match the pod template’s labels. Deprecated: this field will be removed in the future. | | |
| members EtcdMemberStatus array | Members represents the members of the etcd cluster | | |
| peerUrlTLSEnabled boolean | PeerUrlTLSEnabled captures the state of peer url TLS being enabled for the etcd member(s) | | |

GarbageCollectionPolicy

Underlying type: string

GarbageCollectionPolicy defines the type of policy for snapshot garbage collection.

Validation:

  • Enum: [Exponential LimitBased]

Appears in:

LastError

LastError stores details of the most recent error encountered for a resource.

Appears in:

FieldDescriptionDefaultValidation
code ErrorCodeCode is an error code that uniquely identifies an error.
description stringDescription is a human-readable message indicating details of the error.
observedAt TimeObservedAt is the time the error was observed.

LastOperation

LastOperation holds the information on the last operation done on the Etcd resource.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| type LastOperationType | Type is the type of last operation. | | |
| state LastOperationState | State is the state of the last operation. | | |
| description string | Description describes the last operation. | | |
| runID string | RunID correlates an operation with a reconciliation run. Every time an Etcd resource is reconciled (barring status reconciliation which is periodic), a unique ID is generated which can be used to correlate all actions done as part of a single reconcile run. Capturing this as part of LastOperation aids in establishing this correlation. This further helps in also easily filtering reconcile logs as all structured logs in a reconciliation run should have the runID referenced. | | |
| lastUpdateTime Time | LastUpdateTime is the time at which the operation was last updated. | | |

LastOperationState

Underlying type: string

LastOperationState is a string alias representing the state of the last operation.

Appears in:

FieldDescription
ProcessingLastOperationStateProcessing indicates that an operation is in progress.
SucceededLastOperationStateSucceeded indicates that an operation has completed successfully.
ErrorLastOperationStateError indicates that an operation is completed with errors and will be retried.
RequeueLastOperationStateRequeue indicates that an operation is not completed and either due to an error or unfulfilled conditions will be retried.

LastOperationType

Underlying type: string

LastOperationType is a string alias representing type of the last operation.

Appears in:

FieldDescription
CreateLastOperationTypeCreate indicates that the last operation was a creation of a new Etcd resource.
ReconcileLastOperationTypeReconcile indicates that the last operation was a reconciliation of the spec of an Etcd resource.
DeleteLastOperationTypeDelete indicates that the last operation was a deletion of an existing Etcd resource.

LeaderElectionSpec

LeaderElectionSpec defines parameters related to the LeaderElection configuration.

Appears in:

FieldDescriptionDefaultValidation
reelectionPeriod DurationReelectionPeriod defines the Period after which leadership status of corresponding etcd is checked.
etcdConnectionTimeout DurationEtcdConnectionTimeout defines the timeout duration for etcd client connection during leader election.

MetricsLevel

Underlying type: string

MetricsLevel defines the level ‘basic’ or ’extensive’.

Validation:

  • Enum: [basic extensive]

Appears in:

FieldDescription
basicBasic is a constant for metrics level basic.
extensiveExtensive is a constant for metrics level extensive.

SchedulingConstraints

SchedulingConstraints defines the different scheduling constraints that must be applied to the +‘periodic’ for duration based retention and ‘revision’ for revision number based retention.

Validation:

  • Enum: [periodic revision]

Appears in:

FieldDescription
periodicPeriodic is a constant to set auto-compaction-mode ‘periodic’ for duration based retention.
revisionRevision is a constant to set auto-compaction-mode ‘revision’ for revision number based retention.

CompressionPolicy

Underlying type: string

CompressionPolicy defines the type of policy for compression of snapshots.

Validation:

  • Enum: [gzip lzw zlib]

Appears in:

FieldDescription
gzipGzipCompression is constant for gzip compression policy.
lzwLzwCompression is constant for lzw compression policy.
zlibZlibCompression is constant for zlib compression policy.

CompressionSpec

CompressionSpec defines parameters related to compression of snapshots (full as well as delta).

Appears in:

FieldDescriptionDefaultValidation
enabled boolean
policy CompressionPolicyEnum: [gzip lzw zlib]

Condition

Condition holds the information about the state of a resource.

Appears in:

FieldDescriptionDefaultValidation
type ConditionTypeType of the Etcd condition.
status ConditionStatusStatus of the condition, one of True, False, Unknown.
lastTransitionTime TimeLast time the condition transitioned from one status to another.
lastUpdateTime TimeLast time the condition was updated.
reason stringThe reason for the condition’s last transition.
message stringA human-readable message indicating details about the transition.

ConditionStatus

Underlying type: string

ConditionStatus is the status of a condition.

Appears in:

FieldDescription
TrueConditionTrue means a resource is in the condition.
FalseConditionFalse means a resource is not in the condition.
UnknownConditionUnknown means Gardener can’t decide if a resource is in the condition or not.
ProgressingConditionProgressing means the condition was seen true, failed but stayed within a predefined failure threshold.
In the future, we could add other intermediate conditions, e.g. ConditionDegraded.
ConditionCheckErrorConditionCheckError is a constant for a reason in condition.

ConditionType

Underlying type: string

ConditionType is the type of condition.

Appears in:

Field | Description
Ready | ConditionTypeReady is a constant for a condition type indicating that the etcd cluster is ready.
AllMembersReady | ConditionTypeAllMembersReady is a constant for a condition type indicating that all members of the etcd cluster are ready.
BackupReady | ConditionTypeBackupReady is a constant for a condition type indicating that the etcd backup is ready.
DataVolumesReady | ConditionTypeDataVolumesReady is a constant for a condition type indicating that the etcd data volumes are ready.
Succeeded | EtcdCopyBackupsTaskSucceeded is a condition type indicating that an EtcdCopyBackupsTask has succeeded.
Failed | EtcdCopyBackupsTaskFailed is a condition type indicating that an EtcdCopyBackupsTask has failed.

CrossVersionObjectReference

CrossVersionObjectReference contains enough information to let you identify the referred resource.

Appears in:

FieldDescriptionDefaultValidation
kind stringKind of the referent
name stringName of the referent
apiVersion stringAPI version of the referent

ErrorCode

Underlying type: string

ErrorCode is a string alias representing an error code that identifies an error.

Appears in:

Etcd

Etcd is the Schema for the etcds API

FieldDescriptionDefaultValidation
apiVersion stringdruid.gardener.cloud/v1alpha1
kind stringEtcd
metadata ObjectMetaRefer to Kubernetes API documentation for fields of metadata.
spec EtcdSpec
status EtcdStatus

EtcdConfig

EtcdConfig defines the configuration for the etcd cluster to be deployed.

Appears in:

FieldDescriptionDefaultValidation
quota QuantityQuota defines the etcd DB quota.
snapshotCount integerSnapshotCount defines the number of applied Raft entries to hold in-memory before compaction.
More info: https://etcd.io/docs/v3.4/op-guide/maintenance/#raft-log-retention
defragmentationSchedule stringDefragmentationSchedule defines the cron standard schedule for defragmentation of etcd.
serverPort integer
clientPort integer
image stringImage defines the etcd container image and tag
authSecretRef SecretReference
metrics MetricsLevelMetrics defines the level of detail for exported metrics of etcd, specify ’extensive’ to include histogram metrics.Enum: [basic extensive]
resources ResourceRequirementsResources defines the compute Resources required by etcd container.
More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
clientUrlTls TLSConfigClientUrlTLS contains the ca, server TLS and client TLS secrets for client communication to ETCD cluster
peerUrlTls TLSConfigPeerUrlTLS contains the ca and server TLS secrets for peer communication within ETCD cluster
Currently, PeerUrlTLS does not require client TLS secrets for gardener implementation of ETCD cluster.
etcdDefragTimeout DurationEtcdDefragTimeout defines the timeout duration for etcd defrag call
heartbeatDuration DurationHeartbeatDuration defines the duration for members to send heartbeats. The default value is 10s.
clientService ClientServiceClientService defines the parameters of the client service that a user can specify

EtcdCopyBackupsTask

EtcdCopyBackupsTask is a task for copying etcd backups from a source to a target store.

FieldDescriptionDefaultValidation
apiVersion stringdruid.gardener.cloud/v1alpha1
kind stringEtcdCopyBackupsTask
metadata ObjectMetaRefer to Kubernetes API documentation for fields of metadata.
spec EtcdCopyBackupsTaskSpec
status EtcdCopyBackupsTaskStatus

EtcdCopyBackupsTaskSpec

EtcdCopyBackupsTaskSpec defines the parameters for the copy backups task.

Appears in:

Field | Description | Default | Validation
sourceStore StoreSpec | SourceStore defines the specification of the source object store provider for storing backups. | |
targetStore StoreSpec | TargetStore defines the specification of the target object store provider for storing backups. | |
maxBackupAge integer | MaxBackupAge is the maximum age in days that a backup must have in order to be copied. By default all backups will be copied. | |
maxBackups integer | MaxBackups is the maximum number of backups that will be copied starting with the most recent ones. | |
waitForFinalSnapshot WaitForFinalSnapshotSpec | WaitForFinalSnapshot defines the parameters for waiting for a final full snapshot before copying backups. | |
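
As an illustration of how maxBackupAge and maxBackups could interact, here is a minimal Python sketch. It is not the etcd-druid or etcd-backup-restore implementation: the function name, the (name, created_at) tuple shape, and the use of None for "no limit" are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def select_backups_to_copy(backups, now, max_backup_age_days=None, max_backups=None):
    """Pick backup names to copy, most recent first.

    backups: list of (name, created_at) tuples (hypothetical shape).
    None for either limit means "no limit", mirroring "by default all
    backups will be copied".
    """
    # Sort newest first so that a count limit keeps the most recent backups.
    ordered = sorted(backups, key=lambda b: b[1], reverse=True)
    if max_backup_age_days is not None:
        cutoff = now - timedelta(days=max_backup_age_days)
        ordered = [b for b in ordered if b[1] >= cutoff]
    if max_backups is not None:
        ordered = ordered[:max_backups]
    return [name for name, _ in ordered]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
backups = [
    ("full-a", now - timedelta(days=1)),
    ("full-b", now - timedelta(days=5)),
    ("full-c", now - timedelta(days=9)),
]
print(select_backups_to_copy(backups, now, max_backup_age_days=6, max_backups=1))
```

Applying the age filter before the count limit is a design choice of this sketch; the source does not state the order in which the two limits are evaluated.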

EtcdCopyBackupsTaskStatus

EtcdCopyBackupsTaskStatus defines the observed state of the copy backups task.

Appears in:

FieldDescriptionDefaultValidation
conditions Condition arrayConditions represents the latest available observations of an object’s current state.
observedGeneration integerObservedGeneration is the most recent generation observed for this resource.
lastError stringLastError represents the last occurred error.

EtcdMemberConditionStatus

Underlying type: string

EtcdMemberConditionStatus is the status of an etcd cluster member.

Appears in:

FieldDescription
ReadyEtcdMemberStatusReady indicates that the etcd member is ready.
NotReadyEtcdMemberStatusNotReady indicates that the etcd member is not ready.
UnknownEtcdMemberStatusUnknown indicates that the status of the etcd member is unknown.

EtcdMemberStatus

EtcdMemberStatus holds information about etcd cluster membership.

Appears in:

FieldDescriptionDefaultValidation
name stringName is the name of the etcd member. It is the name of the backing Pod.
id stringID is the ID of the etcd member.
role EtcdRoleRole is the role in the etcd cluster, either Leader or Member.
status EtcdMemberConditionStatusStatus of the condition, one of True, False, Unknown.
reason stringThe reason for the condition’s last transition.
lastTransitionTime TimeLastTransitionTime is the last time the condition’s status changed.

EtcdRole

Underlying type: string

EtcdRole is the role of an etcd cluster member.

Appears in:

FieldDescription
LeaderEtcdRoleLeader describes the etcd role Leader.
MemberEtcdRoleMember describes the etcd role Member.

EtcdSpec

EtcdSpec defines the desired state of Etcd

Appears in:

FieldDescriptionDefaultValidation
selector LabelSelectorselector is a label query over pods that should match the replica count.
It must match the pod template’s labels.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
labels object (keys:string, values:string)
annotations object (keys:string, values:string)
etcd EtcdConfig
backup BackupSpec
sharedConfig SharedConfig
schedulingConstraints SchedulingConstraints
replicas integer
priorityClassName stringPriorityClassName is the name of a priority class that shall be used for the etcd pods.
storageClass stringStorageClass defines the name of the StorageClass required by the claim.
More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#class-1
storageCapacity QuantityStorageCapacity defines the size of persistent volume.
volumeClaimTemplate stringVolumeClaimTemplate defines the volume claim template to be created

EtcdStatus

EtcdStatus defines the observed state of Etcd.

Appears in:

FieldDescriptionDefaultValidation
observedGeneration integerObservedGeneration is the most recent generation observed for this resource.
etcd CrossVersionObjectReference
conditions Condition arrayConditions represents the latest available observations of an etcd’s current state.
serviceName stringServiceName is the name of the etcd service.
Deprecated: this field will be removed in the future.
lastError stringLastError represents the last occurred error.
Deprecated: Use LastErrors instead.
lastErrors LastError arrayLastErrors captures errors that occurred during the last operation.
lastOperation LastOperationLastOperation indicates the last operation performed on this resource.
clusterSize integerCluster size is the current size of the etcd cluster.
Deprecated: this field will not be populated with any value and will be removed in the future.
currentReplicas integerCurrentReplicas is the current replica count for the etcd cluster.
replicas integerReplicas is the replica count of the etcd cluster.
readyReplicas integerReadyReplicas is the count of replicas being ready in the etcd cluster.
ready booleanReady is true if all etcd replicas are ready.
updatedReplicas integerUpdatedReplicas is the count of updated replicas in the etcd cluster.
Deprecated: this field will be removed in the future.
labelSelector LabelSelectorLabelSelector is a label query over pods that should match the replica count.
It must match the pod template’s labels.
Deprecated: this field will be removed in the future.
members EtcdMemberStatus arrayMembers represents the members of the etcd cluster
peerUrlTLSEnabled booleanPeerUrlTLSEnabled captures the state of peer url TLS being enabled for the etcd member(s)

GarbageCollectionPolicy

Underlying type: string

GarbageCollectionPolicy defines the type of policy for snapshot garbage collection.

Validation:

  • Enum: [Exponential LimitBased]

Appears in:

LastError

LastError stores details of the most recent error encountered for a resource.

Appears in:

FieldDescriptionDefaultValidation
code ErrorCodeCode is an error code that uniquely identifies an error.
description stringDescription is a human-readable message indicating details of the error.
observedAt TimeObservedAt is the time the error was observed.

LastOperation

LastOperation holds the information on the last operation done on the Etcd resource.

Appears in:

Field | Description | Default | Validation
type LastOperationType | Type is the type of last operation. | |
state LastOperationState | State is the state of the last operation. | |
description string | Description describes the last operation. | |
runID string | RunID correlates an operation with a reconciliation run. Every time an Etcd resource is reconciled (barring status reconciliation which is periodic), a unique ID is generated which can be used to correlate all actions done as part of a single reconcile run. Capturing this as part of LastOperation aids in establishing this correlation. This further helps in also easily filtering reconcile logs as all structured logs in a reconciliation run should have the runID referenced. | |
lastUpdateTime Time | LastUpdateTime is the time at which the operation was last updated. | |
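
The runID description above suggests filtering reconcile logs by run. A minimal Python sketch of that idea follows. It assumes the logs are JSON lines and that the run identifier is stored under a key named "runID"; both are assumptions for illustration, not a documented etcd-druid log format.

```python
import json


def filter_logs_by_run_id(lines, run_id, key="runID"):
    """Return parsed log entries whose run identifier equals run_id.

    The JSON-lines format and the "runID" key are assumptions.
    Lines that are not JSON objects are skipped.
    """
    matched = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not structured, ignore
        if isinstance(entry, dict) and entry.get(key) == run_id:
            matched.append(entry)
    return matched


sample_lines = [
    '{"runID": "abc", "msg": "start"}',
    "plain text line",
    '{"runID": "xyz", "msg": "other"}',
    '{"runID": "abc", "msg": "done"}',
    "[1, 2]",
]
for entry in filter_logs_by_run_id(sample_lines, "abc"):
    print(entry["msg"])
```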

LastOperationState

Underlying type: string

LastOperationState is a string alias representing the state of the last operation.

Appears in:

FieldDescription
ProcessingLastOperationStateProcessing indicates that an operation is in progress.
SucceededLastOperationStateSucceeded indicates that an operation has completed successfully.
ErrorLastOperationStateError indicates that an operation is completed with errors and will be retried.
RequeueLastOperationStateRequeue indicates that an operation is not completed and either due to an error or unfulfilled conditions will be retried.

LastOperationType

Underlying type: string

LastOperationType is a string alias representing type of the last operation.

Appears in:

FieldDescription
CreateLastOperationTypeCreate indicates that the last operation was a creation of a new Etcd resource.
ReconcileLastOperationTypeReconcile indicates that the last operation was a reconciliation of the spec of an Etcd resource.
DeleteLastOperationTypeDelete indicates that the last operation was a deletion of an existing Etcd resource.

LeaderElectionSpec

LeaderElectionSpec defines parameters related to the LeaderElection configuration.

Appears in:

FieldDescriptionDefaultValidation
reelectionPeriod DurationReelectionPeriod defines the Period after which leadership status of corresponding etcd is checked.
etcdConnectionTimeout DurationEtcdConnectionTimeout defines the timeout duration for etcd client connection during leader election.

MetricsLevel

Underlying type: string

MetricsLevel defines the level ‘basic’ or ‘extensive’.

Validation:

  • Enum: [basic extensive]

Appears in:

FieldDescription
basicBasic is a constant for metrics level basic.
extensiveExtensive is a constant for metrics level extensive.

SchedulingConstraints

SchedulingConstraints defines the different scheduling constraints that must be applied to the pod spec in the etcd statefulset. Currently supported constraints are Affinity and TopologySpreadConstraints.

Appears in:

FieldDescriptionDefaultValidation
affinity AffinityAffinity defines the various affinity and anti-affinity rules for a pod
that are honoured by the kube-scheduler.
topologySpreadConstraints TopologySpreadConstraint arrayTopologySpreadConstraints describes how a group of pods ought to spread across topology domains,
that are honoured by the kube-scheduler.

SecretReference

SecretReference defines a reference to a secret.

Appears in:

FieldDescriptionDefaultValidation
name stringname is unique within a namespace to reference a secret resource.
namespace stringnamespace defines the space within which the secret name must be unique.
dataKey stringDataKey is the name of the key in the data map containing the credentials.

SharedConfig

SharedConfig defines parameters shared and used by Etcd as well as backup-restore sidecar.

Appears in:

FieldDescriptionDefaultValidation
autoCompactionMode CompactionModeAutoCompactionMode defines the auto-compaction-mode:‘periodic’ mode or ‘revision’ mode for etcd and embedded-etcd of backup-restore sidecar.Enum: [periodic revision]
autoCompactionRetention stringAutoCompactionRetention defines the auto-compaction-retention length for etcd as well as for embedded-etcd of backup-restore sidecar.

StorageProvider

Underlying type: string

StorageProvider defines the type of object store provider for storing backups.

Appears in:

StoreSpec

StoreSpec defines parameters related to ObjectStore persisting backups

Appears in:

Field | Description | Default | Validation
container string | Container is the name of the container the backup is stored at. | |
prefix string | Prefix is the prefix used for the store. | |
provider StorageProvider | Provider is the name of the backup provider. | |
secretRef SecretReference | SecretRef is the reference to the secret that is used to connect to the backup store. | |

TLSConfig

TLSConfig holds the TLS configuration details.

Appears in:

FieldDescriptionDefaultValidation
tlsCASecretRef SecretReference
serverTLSSecretRef SecretReference
clientTLSSecretRef SecretReference

WaitForFinalSnapshotSpec

WaitForFinalSnapshotSpec defines the parameters for waiting for a final full snapshot before copying backups.

Appears in:

Field | Description | Default | Validation
enabled boolean | Enabled specifies whether to wait for a final full snapshot before copying backups. | |
timeout Duration | Timeout is the timeout for waiting for a final full snapshot. When this timeout expires, the copying of backups will be performed anyway. No timeout or 0 means wait forever. | |
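
The timeout semantics above (an expired timeout lets the copy proceed; no timeout or 0 waits forever) can be sketched as follows. This is an illustrative Python sketch, not the actual implementation: the function name, the polling approach, and the injectable now and sleep parameters are assumptions introduced so that the behaviour can be exercised without real waiting.

```python
import time


def wait_for_final_snapshot(snapshot_ready, timeout=None, poll_interval=1.0,
                            now=time.monotonic, sleep=time.sleep):
    """Poll until a final full snapshot is seen or the timeout expires.

    Returns True if the snapshot was observed, False if the timeout expired
    (in which case the caller would copy backups anyway).
    A timeout of None or 0 means wait forever.
    """
    deadline = None if not timeout else now() + timeout
    while True:
        if snapshot_ready():
            return True
        if deadline is not None and now() >= deadline:
            return False
        sleep(poll_interval)


# Fake clock so the example runs instantly.
clock = [0.0]


def fake_now():
    return clock[0]


def fake_sleep(seconds):
    clock[0] += seconds


timed_out = not wait_for_final_snapshot(lambda: False, timeout=3, poll_interval=1,
                                        now=fake_now, sleep=fake_sleep)
print("timed out:", timed_out)
```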

6.3.15 - etcd Network Latency

Network Latency analysis: sn-etcd-sz vs mn-etcd-sz vs mn-etcd-mz

This page captures the etcd cluster latency analysis for the scenarios below using the benchmark tool (built from the etcd benchmark tool).

sn-etcd-sz -> single-node etcd single zone (Only single replica of etcd will be running)

mn-etcd-sz -> multi-node etcd single zone (Multiple replicas of etcd pods will be running across nodes in a single zone)

mn-etcd-mz -> multi-node etcd multi zone (Multiple replicas of etcd pods will be running across nodes in multiple zones)

PUT Analysis

Summary

  • sn-etcd-sz latency is ~20% lower than mn-etcd-sz when the benchmark tool runs with a single client.
  • mn-etcd-sz latency is lower than mn-etcd-mz, but the difference is small (~+/-5%).
  • Compared to mn-etcd-sz, sn-etcd-sz latency is higher and gradually grows with more clients and larger value sizes.
  • Compared to mn-etcd-mz, mn-etcd-sz latency is higher and gradually grows with more clients and larger value sizes.
  • Compared to the follower, leader latency is lower in all cases when the benchmark tool runs with a single client.
  • Compared to the follower, leader latency is higher in all cases when the benchmark tool runs with multiple clients.

Sample commands:

# write to leader
 benchmark put --target-leader --conns=1 --clients=1 --precise \
@@ -18031,7 +18117,7 @@
   // MemberRefs contains references to all existing EtcdMember resources
   MemberRefs []CrossVersionObjectReference
 }
-
  1. In Etcd.Status resource API, PeerUrlTLSEnabled reflects the status of enabling TLS for peer communication across all etcd-members. Currently this field is not used anywhere. In this proposal, the authors have also proposed that each EtcdMember resource should capture the status of TLS enablement of peer URL. The authors propose to revisit the need to have this field under EtcdStatus.

Lifecycle of an EtcdMember

Creation

Druid creates an EtcdMember resource for every replica in etcd.Spec.Replicas during reconciliation of an etcd resource. For a fresh etcd cluster this is done prior to creation of the StatefulSet resource and for an existing cluster which has now been scaled-up, it is done prior to updating the StatefulSet resource.

Updation

All fields in EtcdMember.Status are only updated by the corresponding etcd-member. Druid only consumes the information published via EtcdMember resources.

Deletion

Druid is responsible for deletion of all existing EtcdMember resources for an etcd cluster. There are three scenarios where an EtcdMember resource will be deleted:

  1. Deletion of etcd resource.

  2. Scale down of an etcd cluster to 0 replicas due to hibernation of the k8s control plane.

  3. Transient scale down of an etcd cluster to 0 replicas to recover from a quorum loss.

The authors found no reason to retain EtcdMember resources when the etcd cluster is scaled down to 0 replicas, since the information contained in each EtcdMember resource would no longer represent the current state of each member and would thus be stale. Any controller in druid which acts upon the EtcdMember.Status could potentially take incorrect actions.

Reconciliation

The authors propose to introduce a new controller (let’s call it etcd-member-controller) which watches for changes to the EtcdMember resource(s). If a reconciliation of an Etcd resource is required as a result of a change in EtcdMember status, then this controller should enqueue an event and force a reconciliation via the existing etcd-controller, thus preserving the single-actor-principle constraint which ensures deterministic changes to etcd cluster resources.

NOTE: Further decisions w.r.t responsibility segregation will be taken during implementation and will not be documented in this proposal.

Stale EtcdMember Status Handling

It is possible that an etcd-member is unable to update its respective EtcdMember resource. The following are some of the implications that should be kept in mind while reconciling EtcdMember resources in druid:

  • Druid sees stale state transitions (this assumes that the backup-sidecar makes a best-effort attempt to update the state/sub-state in etcdMember.status.transitions). There is currently no implication other than an operator seeing a stale state.
  • dbSize and dbSizeInUse could not be updated. As a consequence, druid may continue to see a high value for dbSize - dbSizeInUse for an extended amount of time. Druid should ensure that it does not trigger repeated defragmentations.
  • If VolumeMismatches is stale, then druid should no longer attempt to recover by repeatedly restarting the pod.
  • A failed restoration was recorded last and further updates to this array failed. Druid should not repeatedly take full snapshots.
  • If snapshots.accumulatedDeltaSize could not be updated, then druid should not schedule repeated compaction Jobs.

Reference

6.3.17 - Feature Gates in Etcd-Druid

Feature Gates in Etcd-Druid

This page contains an overview of the various feature gates an administrator can specify on etcd-druid.

Overview

Feature gates are a set of key=value pairs that describe etcd-druid features. You can turn these features on or off by passing them to the --feature-gates CLI flag in the etcd-druid command.

The following tables are a summary of the feature gates that you can set on etcd-druid.

  • The “Since” column contains the etcd-druid release when a feature is introduced or its release stage is changed.
  • The “Until” column, if not empty, contains the last etcd-druid release in which you can still use a feature gate.
  • If a feature is in the Alpha or Beta state, you can find the feature listed in the Alpha/Beta feature gate table.
  • If a feature is stable you can find all stages for that feature listed in the Graduated/Deprecated feature gate table.
  • The Graduated/Deprecated feature gate table also lists deprecated and withdrawn features.

Feature Gates for Alpha or Beta Features

Feature | Default | Stage | Since | Until
UseEtcdWrapper | false | Alpha | 0.19 | 0.21
UseEtcdWrapper | true | Beta | 0.22 |

Feature Gates for Graduated or Deprecated Features

Feature | Default | Stage | Since | Until

Using a Feature

A feature can be in Alpha, Beta or GA stage.

Alpha feature

  • Disabled by default.
  • Might be buggy. Enabling the feature may expose bugs.
  • Support for the feature may be dropped at any time without notice.
  • The API may change in incompatible ways in a later software release without notice.
  • Recommended for use only in short-lived testing clusters, due to increased +
  1. In Etcd.Status resource API, PeerUrlTLSEnabled reflects the status of enabling TLS for peer communication across all etcd-members. Currently this field is not used anywhere. In this proposal, the authors have also proposed that each EtcdMember resource should capture the status of TLS enablement of peer URL. The authors propose to revisit the need to have this field under EtcdStatus.

Lifecycle of an EtcdMember

Creation

Druid creates an EtcdMember resource for every replica in etcd.Spec.Replicas during reconciliation of an etcd resource. For a fresh etcd cluster this is done prior to creation of the StatefulSet resource and for an existing cluster which has now been scaled-up, it is done prior to updating the StatefulSet resource.

Updation

All fields in EtcdMember.Status are only updated by the corresponding etcd-member. Druid only consumes the information published via EtcdMember resources.

Deletion

Druid is responsible for deletion of all existing EtcdMember resources for an etcd cluster. There are three scenarios where an EtcdMember resource will be deleted:

  1. Deletion of etcd resource.

  2. Scale down of an etcd cluster to 0 replicas due to hibernation of the k8s control plane.

  3. Transient scale down of an etcd cluster to 0 replicas to recover from a quorum loss.

The authors found no reason to retain EtcdMember resources when the etcd cluster is scaled down to 0 replicas, since the information contained in each EtcdMember resource would no longer represent the current state of each member and would thus be stale. Any controller in druid which acts upon the EtcdMember.Status could potentially take incorrect actions.

Reconciliation

The authors propose to introduce a new controller (let’s call it etcd-member-controller) which watches for changes to the EtcdMember resource(s). If a reconciliation of an Etcd resource is required as a result of a change in EtcdMember status, then this controller should enqueue an event and force a reconciliation via the existing etcd-controller, thus preserving the single-actor-principle constraint which ensures deterministic changes to etcd cluster resources.

NOTE: Further decisions w.r.t responsibility segregation will be taken during implementation and will not be documented in this proposal.

Stale EtcdMember Status Handling

It is possible that an etcd-member is unable to update its respective EtcdMember resource. The following are some of the implications that should be kept in mind while reconciling EtcdMember resources in druid:

  • Druid sees stale state transitions (this assumes that the backup-sidecar makes a best-effort attempt to update the state/sub-state in etcdMember.status.transitions). There is currently no implication other than an operator seeing a stale state.
  • dbSize and dbSizeInUse could not be updated. As a consequence, druid may continue to see a high value for dbSize - dbSizeInUse for an extended amount of time. Druid should ensure that it does not trigger repeated defragmentations.
  • If VolumeMismatches is stale, then druid should no longer attempt to recover by repeatedly restarting the pod.
  • A failed restoration was recorded last and further updates to this array failed. Druid should not repeatedly take full snapshots.
  • If snapshots.accumulatedDeltaSize could not be updated, then druid should not schedule repeated compaction Jobs.
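
To make the defragmentation point concrete, here is a hedged Python sketch of a guard that avoids acting on stale size information. Everything in it is hypothetical: the proposal text above does not define a status timestamp, a fragmentation threshold, or a minimum interval between defragmentations, so these appear only as illustrative parameters.

```python
def should_trigger_defragmentation(db_size, db_size_in_use, status_age_seconds,
                                   seconds_since_last_defrag=None,
                                   fragmentation_threshold_bytes=512 * 1024 * 1024,
                                   max_status_age_seconds=300,
                                   min_defrag_interval_seconds=3600):
    """Decide whether to defragment based on reported sizes (all knobs hypothetical).

    - If the reported status is older than max_status_age_seconds, treat it
      as stale and do nothing.
    - If a defragmentation ran recently, do not repeat it.
    - Otherwise defragment only when dbSize - dbSizeInUse reaches the threshold.
    """
    if status_age_seconds > max_status_age_seconds:
        return False  # stale status: the difference may no longer be real
    if (seconds_since_last_defrag is not None
            and seconds_since_last_defrag < min_defrag_interval_seconds):
        return False  # avoid repeated defragmentations
    return (db_size - db_size_in_use) >= fragmentation_threshold_bytes


GIB = 1024 ** 3
print(should_trigger_defragmentation(4 * GIB, 1 * GIB, status_age_seconds=30))
print(should_trigger_defragmentation(4 * GIB, 1 * GIB, status_age_seconds=900))
```

The same pattern (check freshness first, then rate-limit, then evaluate the signal) could apply to the other bullets, such as compaction jobs driven by snapshots.accumulatedDeltaSize.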

Reference

6.3.17 - Feature Gates in Etcd-Druid

Feature Gates in Etcd-Druid

This page contains an overview of the various feature gates an administrator can specify on etcd-druid.

Overview

Feature gates are a set of key=value pairs that describe etcd-druid features. You can turn these features on or off by passing them to the --feature-gates CLI flag in the etcd-druid command.

The following tables are a summary of the feature gates that you can set on etcd-druid.

  • The “Since” column contains the etcd-druid release when a feature is introduced or its release stage is changed.
  • The “Until” column, if not empty, contains the last etcd-druid release in which you can still use a feature gate.
  • If a feature is in the Alpha or Beta state, you can find the feature listed in the Alpha/Beta feature gate table.
  • If a feature is stable you can find all stages for that feature listed in the Graduated/Deprecated feature gate table.
  • The Graduated/Deprecated feature gate table also lists deprecated and withdrawn features.

Feature Gates for Alpha or Beta Features

Feature | Default | Stage | Since | Until

Feature Gates for Graduated or Deprecated Features

Feature | Default | Stage | Since | Until
UseEtcdWrapper | false | Alpha | 0.19 | 0.21
UseEtcdWrapper | true | Beta | 0.22 | 0.24
UseEtcdWrapper | true | GA | 0.25 |
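
As a rough illustration of the key=value form accepted by the --feature-gates flag, the Python sketch below parses such a string and overlays it on a table of defaults. The comma separator between pairs and the strict true/false values are assumptions borrowed from common Kubernetes-style flags; etcd-druid's actual parsing is done in Go and may behave differently.

```python
def parse_feature_gates(flag_value, defaults):
    """Parse "Name=true,Other=false" and overlay it on defaults.

    The comma separator and true/false literals are assumptions.
    Unknown gate names and malformed pairs raise ValueError.
    """
    gates = dict(defaults)
    for pair in flag_value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, raw = pair.partition("=")
        name = name.strip()
        raw = raw.strip().lower()
        if not sep or raw not in ("true", "false"):
            raise ValueError(f"malformed feature gate: {pair!r}")
        if name not in gates:
            raise ValueError(f"unknown feature gate: {name!r}")
        gates[name] = raw == "true"
    return gates


# Default taken from the table above (UseEtcdWrapper defaults to true since 0.22).
defaults = {"UseEtcdWrapper": True}
print(parse_feature_gates("UseEtcdWrapper=false", defaults))
```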

Using a Feature

A feature can be in Alpha, Beta or GA stage.

Alpha feature

  • Disabled by default.
  • Might be buggy. Enabling the feature may expose bugs.
  • Support for the feature may be dropped at any time without notice.
  • The API may change in incompatible ways in a later software release without notice.
  • Recommended for use only in short-lived testing clusters, due to increased risk of bugs and lack of long-term support.

Beta feature

  • Enabled by default.
  • The feature is well tested. Enabling the feature is considered safe.
  • Support for the overall feature will not be dropped, though details may change.
  • The schema and/or semantics of objects may change in incompatible ways in a subsequent beta or stable release. When this happens, we will provide instructions for migrating to the next version. This may require deleting, editing, and @@ -18401,7 +18487,7 @@ whenUnsatisfiable: DoNotSchedule

For a 3 member etcd-cluster, the above TopologySpreadConstraints will ensure that the members will be spread across zones (assuming there are 3 zones -> minDomains=3) and no two members will be on the same node.

Optimize Network Cost

In most cloud providers there is no network cost (ingress/egress) for any traffic that is confined within a single zone. For zonal failure tolerance, it becomes imperative to spread the Etcd cluster across zones within a region. Since Etcd cluster members are quite chatty (leader election, consensus building for writes, linearizable reads, etc.), this can add to the network cost.

One could evaluate using TopologyAwareRouting which reduces cross-zonal traffic thus saving costs and latencies.

!!! tip You can read about how it is done in Gardener here.

Metrics & Alerts

Monitoring etcd metrics is essential for fine tuning Etcd clusters. etcd already exports a lot of metrics. You can see the complete list of metrics that are exposed out of an Etcd cluster provisioned by etcd-druid here. It is also recommended that you configure an alert for etcd space quota alarms.

Hibernation

If you have a concept of hibernating Kubernetes clusters, then the following should be kept in mind:

  • Before you bring down the Etcd cluster, leverage the capability to take a full snapshot which captures the state of the etcd DB and stores it in the configured Object store. This ensures that when the cluster is woken up from hibernation it can restore from the last state with no data loss.
  • To save costs you should consider deleting the PersistentVolumeClaims associated to the StatefulSet pods. However, it must be ensured that you take a full snapshot as highlighted in the previous point.
  • When the cluster is woken up from hibernation then you should do the following (assuming prior to hibernation the cluster had a size of 3 members):
    • Start the Etcd cluster with 1 replica. Let it restore from the last full snapshot.
    • Once the cluster reports that it is ready, only then increase the replicas to its original value (e.g. 3). The other two members will start up each as learners and post learning they will join as voting members (Followers).

Reference

  • A nicely written blog post on High Availability and Zone Outage Toleration has a lot of recommendations that one can borrow from.

6.3.28 - Raising A Pr

Raising a Pull Request

We welcome active contributions from the community. This document details what needs to be done for us to consider a PR for review. Contributors should follow the guidelines in this document to minimize the time it takes to get the PR reviewed.

00-Prerequisites

In order to make code contributions you must set up your development environment. Follow the Prepare Dev Environment Guide for detailed instructions.

01-Raise an Issue

For every pull-request, it is mandatory to raise an Issue which should describe the problem in detail. We have created a few categories, each having its own dedicated template.

03-Prepare Code Changes

  • It is not recommended to create a branch on the main repository for raising pull-requests. Instead you must fork the etcd-druid repository and create a branch in the fork. You can follow the detailed instructions on how to fork a repository and set it up for contributions.

  • Ensure that you follow the coding guidelines while introducing new code.

  • If you are making changes to the API then please read Changing-API documentation.

  • If you are introducing new go mod dependencies then please read Dependency Management documentation.

  • If you are introducing a new Etcd cluster component then please read Add new Cluster Component documentation.

  • For guidance on testing, follow the detailed instructions here.

  • Before you submit your PR, please ensure that the following is done:

    • Run make check which will do the following:

      • Runs make format - this target will ensure a common formatting of the code and ordering of imports across all source files.
      • Runs make manifests - this target will re-generate manifests if there are any changes in the API.
      • Only when the above targets have run without errors will make check run linters against the code. The rules for the linter are configured here.
    • Ensure that all the tests pass by running the following make targets:

      • make test-unit - this target will run all unit tests.
      • make test-integration - this target will run all integration tests (controller level tests) using envtest framework.
      • make ci-e2e-kind or any of its variants - these targets will run etcd-druid e2e tests.

      !!! warning -Please ensure that after introduction of new code the code coverage does not reduce. An increase in code coverage is always welcome.

  • If you add new features, make sure that you create relevant documentation under /docs.

04-Raise a pull request

  • Create Work In Progress [WIP] pull requests only if you need a clarification or an explicit review before you can continue your work item.
  • Ensure that you have rebased your fork’s development branch with upstream main/master branch.
  • Squash all commits into a minimal number of commits.
  • Fill in the PR template with appropriate details and provide the link to the Issue for which a PR has been raised.
  • If your patch is not getting reviewed, or you need a specific person to review it, you can @-reply a reviewer asking for a review in the pull request or a comment.

05-Post review

  • If a reviewer requires you to change your commit(s), please test the changes again.
  • Amend the affected commit(s) and force push onto your branch.
  • Set respective comments in your GitHub review as resolved.
  • Create a general PR comment to notify the reviewers that your amendments are ready for another round of review.

06-Merging a pull request

  • Merge can only be done if the PR has approvals from at least 2 reviewers.
  • Add an appropriate release note detailing what is introduced as part of this PR.
  • Before merging the PR, ensure that you squash and then merge.

6.3.29 - Recovering Etcd Clusters

Recovery from Quorum Loss

In an Etcd cluster, quorum is a majority of nodes/members that must agree on updates to a cluster state before the cluster can authorise the DB modification. For a cluster with n members, quorum is (n/2)+1, using integer division. An Etcd cluster is said to have lost quorum when fewer than (n/2)+1 members are healthy, that is, when so many members are unhealthy or down that the remaining ones cannot form a majority for consensus building.

For a multi-node Etcd cluster quorum loss can either be Transient or Permanent.

Transient quorum loss

If quorum is lost through transient network failures (e.g. n/w partitions), spike in resource usage which results in OOM, etcd automatically and safely resumes (once the network recovers or the resource consumption has come down) and restores quorum. In other cases like transient power loss, etcd persists the Raft log to disk and replays the log to the point of failure and resumes cluster operation.

Permanent quorum loss

In case the quorum is lost due to hardware failures or disk corruption etc, automatic recovery is no longer possible and it is categorized as a permanent quorum loss.

Note: If one has capability to detect Failed nodes and replace them, then eventually new nodes can be launched and etcd cluster can recover automatically. But sometimes this is just not possible.

Recovery

At present, recovery from a permanent quorum loss is achieved by manually executing the steps listed in this section.

Note: In the near future etcd-druid will offer capability to automate the recovery from a permanent quorum loss via Out-Of-Band Operator Tasks. An operator only needs to ascertain that there is a permanent quorum loss and the etcd-cluster is beyond auto-recovery. Once that is established then an operator can invoke a task whose status an operator can check.

!!! warning +Please ensure that after introduction of new code the code coverage does not reduce. An increase in code coverage is always welcome.

  • If you add new features, make sure that you create relevant documentation under /docs.

  • 04-Raise a pull request

    • Create Work In Progress [WIP] pull requests only if you need a clarification or an explicit review before you can continue your work item.
    • Ensure that you have rebased your fork’s development branch with upstream main/master branch.
    • Squash all commits into a minimal number of commits.
    • Fill in the PR template with appropriate details and provide the link to the Issue for which a PR has been raised.
    • If your patch is not getting reviewed, or you need a specific person to review it, you can @-reply a reviewer asking for a review in the pull request or a comment.

    05-Post review

    • If a reviewer requires you to change your commit(s), please test the changes again.
    • Amend the affected commit(s) and force push onto your branch.
    • Set respective comments in your GitHub review as resolved.
    • Create a general PR comment to notify the reviewers that your amendments are ready for another round of review.

    06-Merging a pull request

    • Merge can only be done if the PR has approvals from at least 2 reviewers.
    • Add an appropriate release note detailing what is introduced as part of this PR.
    • Before merging the PR, ensure that you squash and then merge.

    6.3.29 - Recovering Etcd Clusters

    Recovery from Quorum Loss

    In an Etcd cluster, quorum is a majority of nodes/members that must agree on updates to a cluster state before the cluster can authorise the DB modification. For a cluster with n members, quorum is (n/2)+1, using integer division. An Etcd cluster is said to have lost quorum when fewer than (n/2)+1 members are healthy, that is, when so many members are unhealthy or down that the remaining ones cannot form a majority for consensus building.
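
The quorum arithmetic can be sketched in a few lines of Python. This is only an illustration of the (n/2)+1 formula with integer division; the function names `quorum`, `fault_tolerance` and `has_quorum` are hypothetical and not part of etcd or etcd-druid.

```python
def quorum(n: int) -> int:
    """Minimum number of healthy members needed for consensus: (n/2)+1 with integer division."""
    if n < 1:
        raise ValueError("a cluster needs at least one member")
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many members can be lost while the cluster still has quorum."""
    return n - quorum(n)


def has_quorum(n: int, healthy: int) -> bool:
    """True if the healthy members can still form a majority."""
    return healthy >= quorum(n)
```

A 3-member cluster has a quorum of 2 and tolerates the loss of 1 member. A 4-member cluster has a quorum of 3 and still tolerates only 1 failure, which is why odd cluster sizes are the usual choice.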

    For a multi-node Etcd cluster quorum loss can either be Transient or Permanent.

    Transient quorum loss

    If quorum is lost through transient network failures (e.g. n/w partitions) or there is a spike in resource usage which results in OOM, etcd automatically and safely resumes (once the network recovers or the resource consumption has come down) and restores quorum. In other cases like transient power loss, etcd persists the Raft log to disk and replays the log to the point of failure and resumes cluster operation.

    Permanent quorum loss

    In case the quorum is lost due to hardware failures or disk corruption etc, automatic recovery is no longer possible and it is categorized as a permanent quorum loss.

    Note: If one has capability to detect Failed nodes and replace them, then eventually new nodes can be launched and etcd cluster can recover automatically. But sometimes this is just not possible.

    Recovery

    At present, recovery from a permanent quorum loss is achieved by manually executing the steps listed in this section.

    Note: In the near future etcd-druid will offer capability to automate the recovery from a permanent quorum loss via Out-Of-Band Operator Tasks. An operator only needs to ascertain that there is a permanent quorum loss and the etcd-cluster is beyond auto-recovery. Once that is established then an operator can invoke a task whose status an operator can check.

    !!! warning Please note that manually restoring etcd can result in data loss. This guide is the last resort to bring an Etcd cluster up and running again.

    00-Identify the etcd cluster

    It is possible to shard the etcd cluster based on resource types using the --etcd-servers-overrides CLI flag of kube-apiserver. Any sharding results in more than one etcd-cluster.

    !!! info In gardener, each shoot control plane has two etcd clusters, etcd-events which only stores events and etcd-main - stores everything else except events.

    Identify the etcd-cluster which has a permanent quorum loss. Most of the resources of an etcd-cluster can be identified by its name. The resources of interest to recover from permanent quorum loss are: Etcd CR, StatefulSet, ConfigMap and PVC.

    To identify the ConfigMap resource use the following command:

     kubectl get sts <sts-name> -o jsonpath='{.spec.template.spec.volumes[?(@.name=="etcd-config-file")].configMap.name}'
     

    01-Prepare Etcd Resource to allow manual updates

    To ensure that only one actor (in this case an operator) makes changes to the Etcd resource and also to the Etcd cluster resources, the following must be done:

    Add the annotation to the Etcd resource:

    kubectl annotate etcd <etcd-name> -n <namespace> druid.gardener.cloud/suspend-etcd-spec-reconcile=
    @@ -18415,10 +18501,32 @@
     

    Delete all the member leases.

    kubectl delete lease <space separated lease names>
     # Alternatively you can use label selector. From v0.23.0 onwards leases will have common set of labels
     kubectl delete lease -l app.kubernetes.io/component=etcd-member-lease,app.kubernetes.io/part-of=<etcd-name> -n <namespace>
    -

    05-Modify ConfigMap

    Prerequisite to scale up etcd-cluster from 0->1 is to change initial-cluster in the ConfigMap. Assuming that prior to scale-down to 0, there were 3 members, the initial-cluster field would look like the following (assuming that the name of the etcd resource is etcd-main):

    # Initial cluster
    +

    05-Modify ConfigMap

    A prerequisite for scaling the etcd-cluster up from 0->1 is to change the fields initial-cluster, initial-advertise-peer-urls, and advertise-client-urls in the ConfigMap.

    Assuming that prior to scale-down to 0, there were 3 members:

    The initial-cluster field would look like the following (assuming that the name of the etcd resource is etcd-main):

    # Initial cluster
     initial-cluster: etcd-main-0=https://etcd-main-0.etcd-main-peer.default.svc:2380,etcd-main-1=https://etcd-main-1.etcd-main-peer.default.svc:2380,etcd-main-2=https://etcd-main-2.etcd-main-peer.default.svc:2380
    -

    Change the initial-cluster field to have only one member (in this case etc-main-0). After the change it should look like:

    # Initial cluster
    +

    Change the initial-cluster field to have only one member (in this case etcd-main-0). After the change it should look like:

    # Initial cluster
     initial-cluster: etcd-main-0=https://etcd-main-0.etcd-main-peer.default.svc:2380
    +
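
The edit to initial-cluster is a plain string operation, so it can be sketched in Python. This is a hedged illustration only: `single_member_initial_cluster` is a hypothetical helper, and it assumes the value is a comma-separated list of `name=peer-url` entries as in the example above. It does not touch the ConfigMap itself.

```python
def single_member_initial_cluster(initial_cluster: str, member: str) -> str:
    """Keep only the entry for `member` from a comma-separated name=url list."""
    for entry in initial_cluster.split(","):
        name, sep, url = entry.strip().partition("=")
        if sep and name == member:
            return f"{name}={url}"
    raise KeyError(f"member {member!r} not found in initial-cluster")


# The three-member value from the example, trimmed to etcd-main-0.
three_members = (
    "etcd-main-0=https://etcd-main-0.etcd-main-peer.default.svc:2380,"
    "etcd-main-1=https://etcd-main-1.etcd-main-peer.default.svc:2380,"
    "etcd-main-2=https://etcd-main-2.etcd-main-peer.default.svc:2380"
)
trimmed = single_member_initial_cluster(three_members, "etcd-main-0")
```

Raising an error for an unknown member name guards against writing an empty initial-cluster value by mistake.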

    The initial-advertise-peer-urls field would look like the following:

    # Initial advertise peer urls
    +initial-advertise-peer-urls:
    +  etcd-main-0:
    +  - http://etcd-main-0.etcd-main-peer.default.svc:2380
    +  etcd-main-1:
    +  - http://etcd-main-1.etcd-main-peer.default.svc:2380
    +  etcd-main-2:
    +  - http://etcd-main-2.etcd-main-peer.default.svc:2380
    +

    Change the initial-advertise-peer-urls field to have only one member (in this case etcd-main-0). After the change it should look like:

    # Initial advertise peer urls
    +initial-advertise-peer-urls:
    +  etcd-main-0:
    +  - http://etcd-main-0.etcd-main-peer.default.svc:2380
    +

    The advertise-client-urls field would look like the following:

    advertise-client-urls:
    +  etcd-main-0:
    +  - http://etcd-main-0.etcd-main-peer.default.svc:2379
    +  etcd-main-1:
    +  - http://etcd-main-1.etcd-main-peer.default.svc:2379
    +  etcd-main-2:
    +  - http://etcd-main-2.etcd-main-peer.default.svc:2379
    +

    Change the advertise-client-urls field to have only one member (in this case etcd-main-0). After the change it should look like:

    advertise-client-urls:
    +  etcd-main-0:
    +  - http://etcd-main-0.etcd-main-peer.default.svc:2379
     

    06-Scale up Etcd cluster to size 1

    kubectl scale sts <sts-name> -n <namespace> --replicas=1 
     

    07-Wait for Single-Member etcd cluster to be completely ready

    To check if the single-member etcd cluster is ready, check the status of the pod.

    kubectl get pods <etcd-name-0> -n <namespace>
     NAME            READY   STATUS    RESTARTS   AGE
    @@ -18426,7 +18534,7 @@
     

    If both containers report readiness (as seen above), then the etcd-cluster is considered ready.
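
As a small illustration of the readiness check, the READY column printed by `kubectl get pods` has the form `<ready>/<total>`. The Python sketch below interprets such a value; `all_containers_ready` is a hypothetical helper name, and the sketch only parses text that you have already obtained from kubectl.

```python
def all_containers_ready(ready_column: str) -> bool:
    """Return True if a READY value such as '2/2' shows every container as ready."""
    ready, sep, total = ready_column.strip().partition("/")
    if not sep or not ready.isdigit() or not total.isdigit():
        raise ValueError(f"unexpected READY value: {ready_column!r}")
    return int(total) > 0 and int(ready) == int(total)
```

For the etcd pod, which runs two containers, `all_containers_ready("2/2")` is true and `all_containers_ready("1/2")` is false.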

    08-Enable Etcd reconciliation and resource protection

    All manual changes are now done. We must now re-enable etcd-cluster resource protection and also enable reconciliation by etcd-druid by doing the following:

    kubectl annotate etcd <etcd-name> -n <namespace> druid.gardener.cloud/suspend-etcd-spec-reconcile-
     kubectl annotate etcd <etcd-name> -n <namespace> druid.gardener.cloud/disable-etcd-component-protection-
     

    09-Scale-up Etcd Cluster to 3 and trigger reconcile

    Scale etcd-cluster to its original size (we assumed 3 below).

    kubectl scale sts <sts-name> -n <namespace> --replicas=3
    -

    If etcd-druid has been set up with --enable-etcd-spec-auto-reconcile switched-off then to ensure reconciliation one must annotate Etcd resource with the following command:

    # Annotate etcd-test CR to reconcile
    +

    If etcd-druid has been set up with --enable-etcd-spec-auto-reconcile switched-off then to ensure reconciliation one must annotate Etcd resource with the following command:

    # Annotate etcd CR to reconcile
     kubectl annotate etcd <etcd-name> -n <namespace> gardener.cloud/operation="reconcile"
     

    10-Verify Etcd cluster health

    Check if all the member pods have both of their containers in Running state.

    kubectl get pods -n <namespace> -l app.kubernetes.io/part-of=<etcd-name>
     NAME            READY   STATUS    RESTARTS   AGE
    diff --git a/docs/docs/contribute/_print/index.html b/docs/docs/contribute/_print/index.html
    index 490d6358511..5e10fdd102c 100644
    --- a/docs/docs/contribute/_print/index.html
    +++ b/docs/docs/contribute/_print/index.html
    @@ -2,7 +2,7 @@
     

    This is the multi-page printable view of this section. +All

    This is the multi-page printable view of this section. Click here to print.

    Return to the regular view of this page.

    Contribute

    Contributors guides for code and documentation

    Contributing to Gardener

    Welcome

    Welcome to the Contributor section of Gardener. Here you can learn how it is possible for you to contribute your ideas and expertise to the project and have it grow even more.

    Prerequisites

    Before you begin contributing to Gardener, there are a couple of things you should become familiar with and complete first.

    Code of Conduct

    All members of the Gardener community must abide by the Contributor Covenant. Only by respecting each other can we develop a productive, collaborative community. diff --git a/docs/docs/contribute/code/cicd/index.html b/docs/docs/contribute/code/cicd/index.html index 0537df28836..7b59dbab8fd 100644 --- a/docs/docs/contribute/code/cicd/index.html +++ b/docs/docs/contribute/code/cicd/index.html @@ -10,7 +10,7 @@ Typical workloads encompass the execution of tests and builds of a variety of technologies, as well as building and publishing container images, typically containing build results.">

    1.2.7 - Usage

    Using the AWS provider extension with Gardener as end-user

    The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

    In this document we describe what this configuration looks like for AWS and provide an example Shoot manifest with minimal configuration that you can use to create an AWS cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

    Provider Secret Data

    Every shoot cluster references a SecretBinding or a CredentialsBinding which itself references a Secret, and this Secret contains the provider credentials of your AWS account. This Secret must look as follows:

    apiVersion: v1
     kind: Secret
     metadata:
    @@ -798,9 +884,9 @@
     

    The cloudControllerManager.featureGates contains a map of explicitly enabled or disabled feature gates. For production usage it’s not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting the cluster stability. If you don’t want to configure anything for the cloudControllerManager, simply omit the key in the YAML specification.

    The cloudControllerManager.useCustomRouteController controls if the custom routes controller should be enabled. -If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

    The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.

    If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. +If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

    The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.

    If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. In this case, it is assumed that an IngressClass named alb is created by the user. -You can overwrite the name by setting loadBalancerController.ingressClassName.

    Please note, that currently only the “instance” mode is supported.

    Examples for Ingress and Service managed by the AWS Load Balancer Controller:

    1. Prerequisites

    Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

    apiVersion: networking.k8s.io/v1
    +You can overwrite the name by setting loadBalancerController.ingressClassName.

    Please note, that currently only the “instance” mode is supported.

    Examples for Ingress and Service managed by the AWS Load Balancer Controller:

    1. Prerequisites

    Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

    apiVersion: networking.k8s.io/v1
     kind: IngressClass
     metadata:
       name: alb # default name if not specified by `loadBalancerController.ingressClassName`
    @@ -812,7 +898,7 @@
       namespace: default
       name: echoserver
       annotations:
    -    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/ingress/annotations/
    +    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/
         alb.ingress.kubernetes.io/scheme: internet-facing
         alb.ingress.kubernetes.io/target-type: instance # target-type "ip" NOT supported in Gardener
     spec:
    @@ -827,11 +913,11 @@
                   name: echoserver
                   port:
                     number: 80
    -

    For more details see AWS Load Balancer Documentation - Ingress Specification

    1. Service of Type LoadBalancer

    This can be used to create a Network Load Balancer (NLB).

    apiVersion: v1
    +

    For more details see AWS Load Balancer Documentation - Ingress Specification

    1. Service of Type LoadBalancer

    This can be used to create a Network Load Balancer (NLB).

    apiVersion: v1
     kind: Service
     metadata:
       annotations:
    -    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/
    +    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/
         service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance # target-type "ip" NOT supported in Gardener
         service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
       name: ingress-nginx-controller
    @@ -841,7 +927,7 @@
       ...
       type: LoadBalancer
       loadBalancerClass: service.k8s.aws/nlb # mandatory to be managed by AWS Load Balancer Controller (otherwise the Cloud Controller Manager will act on it)
    -

    For more details see AWS Load Balancer Documentation - Network Load Balancer

    ⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.

    WorkerConfig

    The AWS extension supports encryption for volumes plus support for additional data volumes per machine. +

    For more details see AWS Load Balancer Documentation - Network Load Balancer

    ⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.

    WorkerConfig

    The AWS extension supports encryption for volumes plus support for additional data volumes per machine. For each data volume, you have to specify a name. By default (if not stated otherwise), all the disks (root & data volumes) are encrypted. Please make sure that your instance-type supports encryption. diff --git a/docs/docs/extensions/container-runtime-extensions/_print/index.html b/docs/docs/extensions/container-runtime-extensions/_print/index.html index c740cd331df..d0cb3192f76 100644 --- a/docs/docs/extensions/container-runtime-extensions/_print/index.html +++ b/docs/docs/extensions/container-runtime-extensions/_print/index.html @@ -2,7 +2,7 @@

    This is the multi-page printable view of this section. +All

    This is the multi-page printable view of this section. Click here to print.

    Return to the regular view of this page.

    Container Runtime Extensions

    Gardener extensions for the supported container runtime interfaces

    1 - GVisor container runtime

    Gardener extension controller for the gVisor container runtime sandbox

    Gardener Extension for the gVisor Container Runtime Sandbox


    Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

    Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.


    How to use this Controller

    This controller operates on the ContainerRuntime resource in the extensions.gardener.cloud/v1alpha1 API group.

    It manages objects that are requesting (.spec.type=gvisor) to enable the gVisor container runtime sandbox for a worker pool of a shoot cluster.

    The ContainerRuntime can be configured in the shoot manifest in .spec.provider.workers[].cri.containerRuntimes. An example can be found here:

    kind: Shoot
    diff --git a/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/_print/index.html b/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/_print/index.html
    index 94eab651033..6dd182e2a31 100644
    --- a/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/_print/index.html
    +++ b/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/_print/index.html
    @@ -2,7 +2,7 @@
     

    This is the multi-page printable view of this section. +All

    This is the multi-page printable view of this section. Click here to print.

    Return to the regular view of this page.

    GVisor container runtime

    Gardener extension controller for the gVisor container runtime sandbox

    Gardener Extension for the gVisor Container Runtime Sandbox


    Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

    Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.


    How to use this Controller

    This controller operates on the ContainerRuntime resource in the extensions.gardener.cloud/v1alpha1 API group.

    It manages objects that are requesting (.spec.type=gvisor) to enable the gVisor container runtime sandbox for a worker pool of a shoot cluster.

    The ContainerRuntime can be configured in the shoot manifest in .spec.provider.workers[].cri.containerRuntimes. An example can be found here:

    kind: Shoot
    diff --git a/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/index.html b/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/index.html
    index ea8334d77e8..112f680f06c 100644
    --- a/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/index.html
    +++ b/docs/docs/extensions/container-runtime-extensions/gardener-extension-runtime-gvisor/index.html
    @@ -2,7 +2,7 @@
     

    You are now ready to experiment with the admission-aws webhook server locally.

    2.6 - Operations

    Using the AWS provider extension with Gardener as operator

    The core.gardener.cloud/v1beta1.CloudProfile resource declares a providerConfig field that is meant to contain provider-specific configuration. The core.gardener.cloud/v1beta1.Seed resource is structured similarly. Additionally, it allows configuring settings for the backups of the main etcds’ data of shoot cluster control planes running in this seed cluster.

    This document explains what is necessary to configure for this provider extension.

    CloudProfile resource

    In this section we describe what the configuration for CloudProfiles looks like for AWS and provide an example CloudProfile manifest with minimal configuration that you can use to allow creating AWS shoot clusters.

    CloudProfileConfig

    The cloud profile configuration contains information about the real machine image IDs in the AWS environment (AMIs). You have to map every version that you specify in .spec.machineImages[].versions here such that the AWS extension knows the AMI for every version you want to offer. @@ -610,7 +696,7 @@ } ] } -

    2.6 - Usage

    Using the AWS provider extension with Gardener as end-user

    The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

    In this document we describe what this configuration looks like for AWS and provide an example Shoot manifest with minimal configuration that you can use to create an AWS cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

    Provider Secret Data

    Every shoot cluster references a SecretBinding or a CredentialsBinding which itself references a Secret, and this Secret contains the provider credentials of your AWS account. +

    2.7 - Usage

    Using the AWS provider extension with Gardener as end-user

    The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

    In this document we describe what this configuration looks like for AWS and provide an example Shoot manifest with minimal configuration that you can use to create an AWS cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

    Provider Secret Data

    Every shoot cluster references a SecretBinding or a CredentialsBinding which itself references a Secret, and this Secret contains the provider credentials of your AWS account. This Secret must look as follows:

    apiVersion: v1
     kind: Secret
     metadata:
    @@ -798,9 +884,9 @@
     

    The cloudControllerManager.featureGates field contains a map of explicitly enabled or disabled feature gates. For production usage, it’s not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting the cluster stability. If you don’t want to configure anything for the cloudControllerManager, simply omit the key in the YAML specification.

    The cloudControllerManager.useCustomRouteController controls if the custom routes controller should be enabled. -If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

    The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.

    If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. +If enabled, it will add routes to the pod CIDRs for all nodes in the route tables for all zones.

    The storage.managedDefaultClass controls if the default storage / volume snapshot classes are marked as default by Gardener. Set it to false to mark another storage / volume snapshot class as default without Gardener overwriting this change. If unset, this field defaults to true.

    If the AWS Load Balancer Controller should be deployed, set loadBalancerController.enabled to true. In this case, it is assumed that an IngressClass named alb is created by the user. -You can overwrite the name by setting loadBalancerController.ingressClassName.

    Please note that currently only the “instance” mode is supported.

    Examples for Ingress and Service managed by the AWS Load Balancer Controller:

    1. Prerequisites

    Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

    apiVersion: networking.k8s.io/v1
    +You can overwrite the name by setting loadBalancerController.ingressClassName.

    Please note that currently only the “instance” mode is supported.
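
    The control plane settings described above can be combined as in the following minimal sketch. The field nesting follows the dotted names used in the text. The apiVersion and kind are assumptions based on the AWS provider extension's ControlPlaneConfig, and the ingress class name my-alb is purely hypothetical.

    ```yaml
    # Sketch only: a ControlPlaneConfig fragment, assumed to be placed under
    # .spec.provider.controlPlaneConfig of a Shoot.
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1 # assumption, not shown in this section
    kind: ControlPlaneConfig
    cloudControllerManager:
      # Adds routes to the pod CIDRs for all nodes in the route tables for all zones.
      useCustomRouteController: true
    storage:
      # false: Gardener does not mark its storage / volume snapshot classes as default,
      # so another class can be marked as default without being overwritten.
      managedDefaultClass: false
    loadBalancerController:
      enabled: true
      # Optional; if omitted, an IngressClass named "alb" is assumed.
      ingressClassName: my-alb # hypothetical name
    ```

    If loadBalancerController.ingressClassName is set as in this sketch, the IngressClass created by the user has to carry the same name.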

    Examples for Ingress and Service managed by the AWS Load Balancer Controller:

    1. Prerequisites

    Make sure you have created an IngressClass. For more details about parameters, please see AWS Load Balancer Controller - IngressClass

    apiVersion: networking.k8s.io/v1
     kind: IngressClass
     metadata:
       name: alb # default name if not specified by `loadBalancerController.ingressClassName`
    @@ -812,7 +898,7 @@
       namespace: default
       name: echoserver
       annotations:
    -    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/ingress/annotations/
    +    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/
         alb.ingress.kubernetes.io/scheme: internet-facing
         alb.ingress.kubernetes.io/target-type: instance # target-type "ip" NOT supported in Gardener
     spec:
    @@ -827,11 +913,11 @@
                   name: echoserver
                   port:
                     number: 80
    -

    For more details see AWS Load Balancer Documentation - Ingress Specification

    1. Service of Type LoadBalancer

    This can be used to create a Network Load Balancer (NLB).

    apiVersion: v1
    +

    For more details see AWS Load Balancer Documentation - Ingress Specification

    1. Service of Type LoadBalancer

    This can be used to create a Network Load Balancer (NLB).

    apiVersion: v1
     kind: Service
     metadata:
       annotations:
    -    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/
    +    # complete set of annotations: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/
         service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance # target-type "ip" NOT supported in Gardener
         service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
       name: ingress-nginx-controller
    @@ -841,7 +927,7 @@
       ...
       type: LoadBalancer
       loadBalancerClass: service.k8s.aws/nlb # mandatory to be managed by AWS Load Balancer Controller (otherwise the Cloud Controller Manager will act on it)
    -

    For more details see AWS Load Balancer Documentation - Network Load Balancer

    ⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.

    WorkerConfig

    The AWS extension supports encryption for volumes plus support for additional data volumes per machine. +

    For more details see AWS Load Balancer Documentation - Network Load Balancer

    ⚠️ When using Network Load Balancers (NLB) as internal load balancers, it is crucial to add the annotation service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false. Without this annotation, if a request is routed by the NLB to the same target instance from which it originated, the client IP and destination IP will be identical. This situation, known as the hairpinning effect, will prevent the request from being processed.
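
    As an illustration of the warning above, an internal NLB Service could look like the following sketch. The annotations for target type, target group attributes, and the loadBalancerClass are taken from the text. The scheme value internal, the Service name, the selector, and the ports are assumptions chosen for the example.

    ```yaml
    # Sketch only: internal NLB managed by the AWS Load Balancer Controller.
    apiVersion: v1
    kind: Service
    metadata:
      name: internal-echoserver # hypothetical name
      namespace: default
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance # target-type "ip" NOT supported in Gardener
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal # assumption: internal instead of internet-facing
        # Disables client IP preservation to avoid the hairpinning effect.
        service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
    spec:
      type: LoadBalancer
      loadBalancerClass: service.k8s.aws/nlb # mandatory to be managed by AWS Load Balancer Controller
      selector:
        app: echoserver # hypothetical label
      ports:
        - port: 80
          targetPort: 8080 # hypothetical container port
          protocol: TCP
    ```

    With client IP preservation disabled, the backend sees the load balancer's address as the source instead of the original client IP, so workloads that rely on the client address need another mechanism to obtain it.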

    WorkerConfig

    The AWS extension supports encryption for volumes plus support for additional data volumes per machine. For each data volume, you have to specify a name. By default (if not stated otherwise), all the disks (root & data volumes) are encrypted. Please make sure that your instance-type supports encryption. diff --git a/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/_print/index.html b/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/_print/index.html index 13b31cc4888..0ddaf48196f 100644 --- a/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/_print/index.html +++ b/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/_print/index.html @@ -2,7 +2,7 @@

    This is the multi-page printable view of this section. +All

    This is the multi-page printable view of this section. Click here to print.

    Return to the regular view of this page.

    Provider Alicloud

    Gardener extension controller for the Alibaba cloud provider

    Gardener Extension for Alicloud provider

    REUSE status CI Build status Go Report Card

    Project Gardener implements the automated management and operation of Kubernetes clusters as a service. diff --git a/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/deployment/index.html b/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/deployment/index.html index 2aae8a588aa..5cda4633ccc 100644 --- a/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/deployment/index.html +++ b/docs/docs/extensions/infrastructure-extensions/gardener-extension-provider-alicloud/deployment/index.html @@ -10,7 +10,7 @@ Virtual Garden is not used, i.e., the runtime Garden cluster is also the target Garden cluster. Automounted Service Account Token The easiest way to deploy the gardener-extension-admission-alicloud component will be to not provide kubeconfig at all. This way in-cluster configuration and an automounted service account token will be used. The drawback of this approach is that the automounted token will not be automatically rotated.">