
[Pending] use kind for starting minikube failure #2771

Open · wants to merge 8 commits into main

Conversation

kmrmt
Contributor

@kmrmt kmrmt commented Dec 10, 2024

Description

Fix the minikube start failure in the readreplica e2e test.

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced Kubernetes cluster management with new commands for volume snapshots.
    • Added configuration for a new read replica in the agent setup.
    • Introduced a new YAML file for Kind cluster setup.
    • New input parameter for GitHub Actions to specify Kind environment requirements.
    • New phony targets for managing Kind volume snapshots.
    • New command added for Docker CLI installation in the build process.
  • Improvements

    • Updated Docker installation process for better integration with the build environment.
    • Enhanced logging options for Minikube startup.
    • Increased timeout and sleep duration for pod readiness checks.
    • Improved specificity in .gitignore for Rust-related files.
    • Adjusted deployment commands for Vald read replicas.
    • Enhanced configuration for health checks and logging in the read replica setup.
  • Bug Fixes

    • Improved readiness checks for Kubernetes pods.
    • Commented out readiness checks for pod deployments in the Vald read replica workflow.
    • Updated logging level for better debugging in the read replica configuration.

Contributor

coderabbitai bot commented Dec 10, 2024

<details>
<summary>📝 Walkthrough</summary>

## Walkthrough

This pull request introduces significant updates to the development and testing infrastructure for a Kubernetes-based project. The modifications enhance cluster management, particularly for Kind (Kubernetes in Docker) environments. Key changes include the installation of the Docker CLI, configuration for volume snapshot support, updates to GitHub Actions workflows, and preparations for read replica deployments. The changes aim to streamline the setup and testing processes across various configuration files and build scripts.

## Changes

| File | Change Summary |
|------|----------------|
| `Makefile.d/e2e.mk` | Updated readiness checks for metrics-server and agent, replaced `minikube/restart` with `kind/vs/restart`. |
| `Makefile.d/minikube.mk` | Added Docker binary prerequisite, updated Minikube start command with Docker driver and logging options, enabled metrics-server addon. |
| `Makefile.d/tools.mk` | Added new `docker-cli/install` phony target for Docker CLI installation. |
| `dockers/ci/base/Dockerfile` | Added `sudo` package and Docker CLI installation step. |
| `dockers/dev/Dockerfile` | Integrated Docker CLI installation step. |
| `Makefile.d/kind.mk` | Added volume snapshot support, new targets for Kind cluster management, Docker binary dependency. |
| `.github/actions/setup-e2e/action.yaml` | Added `require_kind` input parameter and Kind environment setup step. |
| `.github/workflows/e2e.yaml` | Updated E2E job to use Kind, added read replica deployment step. |
| `.gitignore` | Refined Rust-related ignored files and added new ignore rules. |
| `k8s/debug/kind/e2e.yaml` | New Kubernetes Kind cluster configuration file. |
| `Makefile.d/k8s.mk` | Increased sleep duration before checking read replica pod readiness. |
| `.github/helm/values/values-readreplica.yaml` | Added read replica specific name and volume configuration. |
| `.github/actions/e2e-deploy-vald-readreplica/action.yaml` | Commented out readiness checks for pod deployment. |
| `.github/actions/e2e-deploy-vald/action.yaml` | Updated conditions for waiting on pod readiness. |

## Possibly related PRs

- **vdaas/vald#2639**: Both PRs touch the read replica deployment path. This PR modifies `Makefile.d/e2e.mk` around read replica deployment, while #2639 updates the `imagePullPolicy` for init containers; both affect the deployment and readiness of Vald components.

## Suggested labels

`type/refactoring`

## Suggested reviewers

- hlts2
- vankichi

</details>

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 actionlint (1.7.4)
.github/workflows/_detect-ci-container.yaml

could not read ".github/workflows/_detect-ci-container.yaml": open .github/workflows/_detect-ci-container.yaml: no such file or directory

.github/workflows/e2e.yaml

could not read ".github/workflows/e2e.yaml": open .github/workflows/e2e.yaml: no such file or directory

🔧 YAMLlint (1.35.1)
.github/actions/e2e-deploy-vald-readreplica/action.yaml

[Errno 2] No such file or directory: '.github/actions/e2e-deploy-vald-readreplica/action.yaml'

.github/actions/e2e-deploy-vald/action.yaml

[Errno 2] No such file or directory: '.github/actions/e2e-deploy-vald/action.yaml'

.github/actions/setup-e2e/action.yaml

[Errno 2] No such file or directory: '.github/actions/setup-e2e/action.yaml'

  • 3 others


@vdaas-ci
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 /label actions/e2e-deploy - run E2E deploy & integration test


cloudflare-workers-and-pages bot commented Dec 11, 2024

Deploying vald with Cloudflare Pages

Latest commit: 11d4bd3
Status: ✅  Deploy successful!
Preview URL: https://dc1203f9.vald.pages.dev
Branch Preview URL: https://bugfix-e2e-readreplica-minik.vald.pages.dev

View logs

@kmrmt kmrmt changed the title from "[WIP] fix minikube start failure" to "fix minikube start failure" Dec 11, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e06133d and 0473919.

⛔ Files ignored due to path filters (1)
  • hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (6)
  • .github/workflows/e2e.yaml (1 hunks)
  • Makefile.d/e2e.mk (1 hunks)
  • Makefile.d/minikube.mk (2 hunks)
  • Makefile.d/tools.mk (1 hunks)
  • dockers/ci/base/Dockerfile (2 hunks)
  • dockers/dev/Dockerfile (1 hunks)
🔇 Additional comments (7)
Makefile.d/minikube.mk (2)

28-30: LGTM: Explicit Docker driver configuration

The changes properly ensure Docker CLI availability and explicitly specify the Docker driver, which helps prevent driver selection issues during Minikube startup.


35-35: Verify metrics-server requirement

The addition of metrics-server addon might be related to the readreplica e2e test requirements.

✅ Verification successful

Metrics-server requirement is verified for readreplica e2e tests

Based on the analysis of the e2e test files, specifically in Makefile.d/e2e.mk, the metrics-server is indeed required for the readreplica e2e tests. The e2e/actions/run/readreplica target explicitly waits for the metrics-server to be available and ready before proceeding with the readreplica tests:

kubectl wait -n kube-system --for=condition=Available deployment/metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)
kubectl wait -n kube-system --for=condition=Ready pod -l k8s-app=metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)
kubectl wait -n kube-system --for=condition=ContainersReady pod -l k8s-app=metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)

Additionally, the readreplica configuration in k8s/readreplica/configmap.yaml shows metrics-related configurations, indicating that metrics monitoring is an integral part of the readreplica functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if readreplica tests depend on metrics-server

# Search for metrics-server usage in e2e tests
rg -l "metrics-server|metrics\.k8s\.io" --type yaml --type go | grep -i "e2e\|test"

# Check if readreplica manifests have resource requirements
rg -l "readreplica.*resources:" --type yaml

Length of output: 131


Script:

#!/bin/bash
# Let's try a broader search to understand the context better

# Search for metrics-server references in all files
rg -l "metrics-server"

# Search for any metrics-related configurations in test files
rg -l "metrics" test/ e2e/

# Look for resource requirements or monitoring in test files
rg -A 5 "resources:|monitoring" test/ e2e/

# Check if there are any readreplica test files
fd readreplica

Length of output: 732


Script:

#!/bin/bash
# Let's check the content of relevant files to understand the metrics-server requirement

# Check Makefile.d/e2e.mk for metrics-server usage context
cat Makefile.d/e2e.mk

# Check readreplica job manifests for resource requirements
rg -A 5 "resources:" charts/vald/templates/index/job/readreplica/
rg -A 5 "resources:" charts/vald-readreplica/

# Check if metrics-server is mentioned in readreplica related files
rg "metrics" k8s/readreplica/
rg "metrics" pkg/index/job/readreplica/

Length of output: 10509

dockers/ci/base/Dockerfile (2)

126-126: LGTM: Docker CLI installation

Adding docker-cli installation aligns with the Minikube prerequisites and supports the fix for Minikube start failure.


75-75: Verify sudo requirement

The addition of sudo package might be related to Docker operations. However, since we're running as root (USER root:root), verify if sudo is actually needed.

dockers/dev/Dockerfile (1)

140-140: LGTM: Consistent Docker CLI installation

Adding docker-cli installation in the dev environment maintains consistency with the CI environment and ensures Minikube works in both contexts.

Makefile.d/tools.mk (1)

264-272: LGTM: Docker CLI installation implementation

The implementation is well-structured and follows the established pattern for tool installations:

  • Uses official Docker repository
  • Handles architecture detection
  • Includes proper cleanup
  • Follows the same pattern as other tool installations
.github/workflows/e2e.yaml (1)

Line range hint 312-316: Verify the necessity of disabling k3d

The change to use Minikube exclusively (by disabling k3d) for this test might indicate an underlying issue that should be investigated.

Let's check if there are any k3d-specific configurations that might be causing issues:

✅ Verification successful

Let me gather more information about the usage of k3d and minikube in the e2e workflow.


Let me check the actual setup-k3d and setup-minikube actions to understand their configurations and potential differences.


Let me check the e2e workflow file to understand the specific test that's being modified and its requirements.


Let me check the specific test configuration and Makefile to understand the minikube setup.


The switch to Minikube for read replica test is justified and necessary

The read replica test specifically requires CSI Driver and Volume Snapshots support for testing the read replica functionality, as evidenced by the configuration in values-readreplica.yaml which uses:

  • storageClass: csi-hostpath-sc
  • snapshot_classname: "csi-hostpath-snapclass"
  • readreplica.enabled: true

Minikube is explicitly configured in minikube.mk to support these features with the commands:

minikube addons enable volumesnapshots
minikube addons enable csi-hostpath-driver

The comment in minikube.mk also confirms this: "Only use this for development related to Volume Snapshots. Usually k3d is faster."

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for k3d-specific configurations
rg -A 5 "k3d" .github/helm/values/
rg -A 5 "k3d" .github/actions/setup-e2e/

Length of output: 1407


Script:

#!/bin/bash
# Check for k3d and minikube usage in e2e workflow
rg -A 5 "setup-e2e" .github/workflows/e2e.yaml

# Check for any k3d or minikube related issues or discussions
rg -i "k3d|minikube" .github/ISSUE_TEMPLATE/
rg -i "k3d|minikube" .github/pull_request_template.md

Length of output: 2255


Script:

#!/bin/bash
# Check k3d and minikube setup actions
cat .github/actions/setup-k3d/action.yaml
cat .github/actions/setup-minikube/action.yaml

# Check for any related test failures or issues
rg -i "failed.*k3d|k3d.*failed" .github/workflows/
rg -i "failed.*minikube|minikube.*failed" .github/workflows/

Length of output: 4113


Script:

#!/bin/bash
# Check the specific e2e test configuration and its context
rg -B 10 -A 10 'require_k3d: "false"' .github/workflows/e2e.yaml

# Check if there are any minikube-specific configurations in the repository
rg -l "minikube" .

Length of output: 1049


Script:

#!/bin/bash
# Check the read replica test configuration
cat .github/helm/values/values-readreplica.yaml

# Check minikube configuration in Makefile
cat Makefile.d/minikube.mk

# Check if this is the only test using minikube
rg "require_minikube: \"true\"" .github/workflows/

Length of output: 3809

@vdaas-ci
Collaborator

Profile Report

| type | vald-agent-ngt | vald-lb-gateway | vald-discoverer | vald-manager-index |
|------|----------------|-----------------|-----------------|--------------------|
| cpu  | (profile image) | (profile image) | (profile image) | (profile image) |
| heap | (profile image) | (profile image) | (profile image) | (profile image) |

other images

Contributor

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 11, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/agent/core/ngt/service/ngt.go (1)

890-892: Return a closed channel instead of nil
Returning nil can be risky if any upstream code waits to receive from the channel. Consider returning a closed channel to avoid blocking or upstream panics.

Example refactor:

890     func (n *ngt) Start(ctx context.Context) <-chan error {
891         if n.isReadReplica {
-            return nil
+            c := make(chan error)
+            close(c) // signals no operations needed
+            return c
         }
         ...
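
The reasoning behind this suggestion is that a receive from a nil channel blocks forever, while a receive from a closed channel returns the zero value immediately. A minimal standalone sketch of the difference; `startStub` is an illustrative helper under that assumption, not Vald code:

```go
package main

import "fmt"

// startStub mimics the suggested refactor: instead of returning a nil
// channel for a read replica, return one that is already closed so
// receivers never block.
func startStub(isReadReplica bool) <-chan error {
	ch := make(chan error)
	if isReadReplica {
		close(ch) // signals "nothing to do" without blocking receivers
		return ch
	}
	return ch
}

func main() {
	// Receiving from the closed channel returns the zero value immediately.
	err, ok := <-startStub(true)
	fmt.Printf("err=%v ok=%v\n", err, ok)

	// A nil channel, by contrast, is never ready: this select always takes
	// the default branch because <-nilCh would block forever.
	var nilCh chan error
	select {
	case <-nilCh:
		fmt.Println("unreachable")
	default:
		fmt.Println("nil channel would block")
	}
}
```

Any caller that does `<-svc.Start(ctx)` without a nil check stays safe with the closed-channel variant, which is why the refactor is preferable.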
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee564f5 and 84c941b.

📒 Files selected for processing (1)
  • pkg/agent/core/ngt/service/ngt.go (1 hunks)


codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 16.93%. Comparing base (be30998) to head (11d4bd3).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
pkg/agent/core/ngt/service/ngt.go 0.00% 2 Missing and 1 partial ⚠️
hack/docker/gen/main.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2771      +/-   ##
==========================================
- Coverage   17.20%   16.93%   -0.27%     
==========================================
  Files         136      136              
  Lines       15044    15047       +3     
==========================================
- Hits         2588     2548      -40     
- Misses      12154    12199      +45     
+ Partials      302      300       -2     

☔ View full report in Codecov by Sentry.

@kmrmt kmrmt force-pushed the bugfix/e2e/readreplica-minikube-start-failure branch from 3358169 to a3fd605 Compare December 25, 2024 01:07
@kmrmt kmrmt force-pushed the bugfix/e2e/readreplica-minikube-start-failure branch from a3fd605 to 75d1062 Compare December 25, 2024 01:12
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
Makefile.d/kind.mk (3)

78-78: 🛠️ Refactor suggestion

Improve reliability of e2e.yaml modification

The current sed command might fail silently if:

  • host.docker.internal is not in /etc/hosts
  • The file pattern isn't found in e2e.yaml
-sed -e 's/apiServerAddress: "127.0.0.1"/apiServerAddress: "$(shell grep host.docker.internal /etc/hosts | cut -f1)"/' $(ROOTDIR)/k8s/debug/kind/e2e.yaml | kind create cluster --name $(NAME)-vs --config - 
+@HOST_IP=$$(grep host.docker.internal /etc/hosts | cut -f1); \
+if [ -z "$$HOST_IP" ]; then \
+	echo "host.docker.internal not found in /etc/hosts" >&2; exit 1; \
+fi; \
+sed -e "s/apiServerAddress: \"127.0.0.1\"/apiServerAddress: \"$$HOST_IP\"/" $(ROOTDIR)/k8s/debug/kind/e2e.yaml \
+	| kind create cluster --name $(NAME)-vs --config - \
+	|| (echo "Failed to create kind cluster from modified e2e.yaml" >&2; exit 1)

81-85: 🛠️ Refactor suggestion

Add verification for CRD and controller installation

The installation of CRDs and controller should be verified to ensure successful deployment.

+define verify_resource
+	@echo "Verifying $(1)..."
+	@kubectl wait --for=condition=established --timeout=60s crd/$(1) || (echo "Failed to install $(1)" && exit 1)
+endef

 kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/$(SNAPSHOTTER_VERSION)/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
+$(call verify_resource,volumesnapshotclasses.snapshot.storage.k8s.io)
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/$(SNAPSHOTTER_VERSION)/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
+$(call verify_resource,volumesnapshotcontents.snapshot.storage.k8s.io)
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/$(SNAPSHOTTER_VERSION)/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
+$(call verify_resource,volumesnapshots.snapshot.storage.k8s.io)
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/$(SNAPSHOTTER_VERSION)/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/$(SNAPSHOTTER_VERSION)/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
+kubectl -n kube-system wait --for=condition=ready pod -l app=snapshot-controller --timeout=60s || (echo "Snapshot controller not ready" && exit 1)

95-97: 🛠️ Refactor suggestion

Add readiness check for metrics-server

Replace the sleep command with a proper readiness check.

 @make k8s/metrics/metrics-server/deploy
 helm upgrade --install --set args={--kubelet-insecure-tls} metrics-server metrics-server/metrics-server -n kube-system
-sleep $(K8S_SLEEP_DURATION_FOR_WAIT_COMMAND)
+kubectl wait --for=condition=available deployment/metrics-server -n kube-system --timeout=60s || (echo "metrics-server not ready" && exit 1)
🧹 Nitpick comments (3)
Makefile.d/k8s.mk (2)

Line range hint 135-156: Improved deployment process with manifest dependency.

The k8s/vald/deploy target now properly depends on manifest generation and includes comprehensive resource deployment with error handling.

Consider adding a comment explaining why errors are permitted:

-	@echo "Permitting error because there's some cases nothing to apply"
+	@echo "Note: Errors are permitted as some resources might already exist or be unavailable in certain configurations"

Line range hint 239-270: Enhanced readreplica deployment with proper sequencing.

The k8s/vald-readreplica/deploy target includes several improvements:

  • Proper dependency on base deployment
  • Gateway cleanup and redeployment
  • Pod status verification and readiness checks

However, there's a potential race condition in the pod deletion wait.

Consider adding a verification step before proceeding with gateway deletion:

 	kubectl delete -f $(TEMP_DIR)/vald/templates/gateway || true
 	kubectl delete -f $(TEMP_DIR)/vald/templates/gateway/lb || true
+	# Verify no gateway pods are in terminating state before proceeding
+	@while kubectl get pods -l app=vald-lb-gateway 2>/dev/null | grep -q Terminating; do \
+		echo "Waiting for gateway pods to terminate..."; \
+		sleep 5; \
+	done
 	kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s
pkg/agent/core/ngt/service/ngt.go (1)

890-892: Add test coverage for read replica functionality.

The read replica handling code is not covered by tests. Consider adding test cases to verify the behavior.

Would you like me to help generate test cases for:

  1. Verifying that write operations are blocked on read replicas
  2. Ensuring read operations work correctly on read replicas
  3. Testing the early return in the Start method
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 891-892: pkg/agent/core/ngt/service/ngt.go#L891-L892
Added lines #L891 - L892 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3358169 and 75d1062.

⛔ Files ignored due to path filters (1)
  • hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (17)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml (1 hunks)
  • .github/actions/e2e-deploy-vald/action.yaml (2 hunks)
  • .github/actions/setup-e2e/action.yaml (2 hunks)
  • .github/helm/values/values-readreplica.yaml (2 hunks)
  • .github/workflows/_detect-ci-container.yaml (1 hunks)
  • .github/workflows/e2e.yaml (2 hunks)
  • .gitignore (1 hunks)
  • Makefile.d/e2e.mk (2 hunks)
  • Makefile.d/k8s.mk (5 hunks)
  • Makefile.d/kind.mk (2 hunks)
  • Makefile.d/minikube.mk (2 hunks)
  • Makefile.d/tools.mk (1 hunks)
  • charts/vald/templates/_helpers.tpl (1 hunks)
  • dockers/ci/base/Dockerfile (2 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • k8s/debug/kind/e2e.yaml (1 hunks)
  • pkg/agent/core/ngt/service/ngt.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • dockers/ci/base/Dockerfile
  • Makefile.d/minikube.mk
  • k8s/debug/kind/e2e.yaml
  • .github/actions/e2e-deploy-vald/action.yaml
  • Makefile.d/tools.mk
  • Makefile.d/e2e.mk
  • dockers/dev/Dockerfile
  • .github/actions/setup-e2e/action.yaml
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml
  • charts/vald/templates/_helpers.tpl
  • .github/helm/values/values-readreplica.yaml
  • .gitignore
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/agent/core/ngt/service/ngt.go

[warning] 891-892: pkg/agent/core/ngt/service/ngt.go#L891-L892
Added lines #L891 - L892 were not covered by tests

🔇 Additional comments (8)
.github/workflows/_detect-ci-container.yaml (1)

44-44: LGTM: Improved tag extraction by removing double quotes.

The addition of sed -e 's/"//g' ensures clean Docker image tag values by removing any double quotes, which is a good practice for tag handling.

.github/workflows/e2e.yaml (2)

338-338: Extended index creation wait duration for better reliability.

The wait duration for index creation has been increased from 3 minutes to 10 minutes. This change provides better reliability for the test by allowing more time for the index creation process to complete.


318-318: Infrastructure change: Switched from minikube to kind.

The change from require_minikube to require_kind aligns with the PR objective to fix the minikube start failure. Using Kind (Kubernetes in Docker) is a good choice as it's more lightweight and faster than minikube.

Run the following script to verify the infrastructure change is consistent across the codebase:

✅ Verification successful

The switch from minikube to kind is consistent across the codebase

The verification shows that there are no remaining references to minikube in any YAML files, while kind is properly configured in the codebase, particularly in:

  • k8s/debug/kind/e2e.yaml - Contains the kind cluster configuration for e2e tests
  • k8s/debug/kind/config.yaml - Contains the kind cluster configuration
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the switch from minikube to kind is consistent.

# Test 1: Check for any remaining minikube references
echo "Checking for remaining minikube references..."
rg -i "minikube" --type yaml

# Test 2: Verify kind configuration
echo "Verifying kind configuration..."
rg -i "kind" --type yaml

Length of output: 67507

Makefile.d/k8s.mk (2)

Line range hint 114-134: LGTM: Well-structured manifest generation target.

The new k8s/vald/manifests target properly generates Kubernetes manifests using Helm template with appropriate image configurations and CRDs.


273-273: LGTM: Proper dependency for readreplica deletion.

The k8s/vald-readreplica/delete target correctly depends on k8s/vald/delete ensuring proper cleanup order.

pkg/agent/core/ngt/service/ngt.go (1)

890-892: LGTM! Early return for read replicas prevents write operations.

The early return for read replicas is a good addition that prevents unnecessary index creation operations on read-only instances.

Run the following script to verify the read replica handling is consistent across the codebase:

✅ Verification successful

Early return for read replicas is consistently implemented across write operations

The early return for read replicas in the Start method is part of a consistent pattern of preventing write operations on read replicas. The codebase shows that:

  • The RegenerateIndexes method also has the same read replica check and returns ErrWriteOperationToReadReplica
  • Other write operations like Insert, Update, Delete, etc. are protected by the IsFlushing() check which prevents concurrent modifications
  • The read replica check is appropriately placed at the start of methods that modify the index
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify read replica handling across write operations
# Test: Search for write operations in read replica context

# Check for write operation methods that should handle read replicas
rg -A 5 'func.*\b(Insert|Update|Delete|Create|Save|Regenerate).*\('

# Check for isReadReplica checks
rg 'isReadReplica'

Length of output: 84523

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 891-892: pkg/agent/core/ngt/service/ngt.go#L891-L892
Added lines #L891 - L892 were not covered by tests

Makefile.d/kind.mk (2)

87-93: 🛠️ Refactor suggestion

Improve CSI driver installation robustness

The CSI driver installation process needs several improvements:

  1. Version should be a variable
  2. Error handling is missing
  3. Cleanup might fail silently
+CSI_DRIVER_VERSION ?= v1.15.0
+CSI_TEMP_DIR := $(shell mktemp -d)
+
+.PHONY: csi/cleanup
+csi/cleanup:
+	@if [ -d "$(CSI_TEMP_DIR)" ]; then \
+		rm -rf "$(CSI_TEMP_DIR)" || (echo "Failed to cleanup CSI temp directory" >&2; exit 1); \
+	fi

-mkdir -p $(TEMP_DIR)/csi-driver-hostpath \
-	&& curl -fsSL https://github.com/kubernetes-csi/csi-driver-host-path/archive/refs/tags/v1.15.0.tar.gz | tar zxf - -C $(TEMP_DIR)/csi-driver-hostpath --strip-components 1 \
-	&& cd $(TEMP_DIR)/csi-driver-hostpath \
-	&& deploy/kubernetes-latest/deploy.sh \
-	&& kubectl apply -f examples/csi-storageclass.yaml \
-	&& kubectl apply -f examples/csi-pvc.yaml \
-	&& rm -rf $(TEMP_DIR)/csi-driver-hostpath
+mkdir -p "$(CSI_TEMP_DIR)" || (echo "Failed to create temp directory" >&2; exit 1)
+curl -fsSL "https://github.com/kubernetes-csi/csi-driver-host-path/archive/refs/tags/$(CSI_DRIVER_VERSION).tar.gz" | \
+	tar zxf - -C "$(CSI_TEMP_DIR)" --strip-components 1 || (echo "Failed to download/extract CSI driver" >&2; $(MAKE) csi/cleanup; exit 1)
+cd "$(CSI_TEMP_DIR)" && \
+	./deploy/kubernetes-latest/deploy.sh || (echo "Failed to deploy CSI driver" >&2; $(MAKE) csi/cleanup; exit 1)
+kubectl apply -f "$(CSI_TEMP_DIR)/examples/csi-storageclass.yaml" || (echo "Failed to apply storage class" >&2; $(MAKE) csi/cleanup; exit 1)
+kubectl apply -f "$(CSI_TEMP_DIR)/examples/csi-pvc.yaml" || (echo "Failed to apply PVC" >&2; $(MAKE) csi/cleanup; exit 1)
+$(MAKE) csi/cleanup

Likely invalid or redundant comment.
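The cleanup concern above can also be addressed with a `trap`, which removes the temporary directory on any exit path, including mid-pipeline failures. A minimal sketch (directory name and placeholder step are illustrative, not the actual Makefile recipe):

```shell
#!/bin/bash
# Sketch: register cleanup once; the trap fires on any exit, including failures.
set -euo pipefail

TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT

# ... download / extract / deploy steps would go here ...
test -d "$TMP_DIR" && echo "workdir ready"
```

With this pattern, the explicit `$(MAKE) csi/cleanup` calls after each step become unnecessary when the recipe runs in a single shell.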


17-17: Verify the snapshotter version compatibility

The external-snapshotter version is pinned to v8.2.0. Let's verify this version's compatibility with the cluster version.

✅ Verification successful

External-snapshotter version v8.2.0 is appropriate for the setup

The external-snapshotter version v8.2.0 is the latest stable version and is appropriate for use with the current setup. The implementation includes:

  • Installation of all necessary CRDs and RBAC configurations
  • Deployment of the snapshot controller
  • Integration with CSI driver host path v1.15.0
  • Proper configuration of storage classes and PVCs
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check external-snapshotter version compatibility
# Get the latest stable version and compare with current version
LATEST=$(curl -s https://api.github.com/repos/kubernetes-csi/external-snapshotter/releases/latest | jq -r '.tag_name')
echo "Current version: v8.2.0"
echo "Latest version: $LATEST"

# Check compatibility matrix in README
curl -s https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/README.md | grep -A 5 "Compatibility Matrix"

Length of output: 370


Script:

#!/bin/bash
# Let's check the Kubernetes version used in the project and related configurations
# First, check if Kind version or K8s version is defined in the Makefile
grep -r "KIND_VERSION\|KUBERNETES_VERSION" .

# Also check for any Kind configuration files that might specify K8s version
fd -e yaml -e yml | xargs grep -l "kind:\|kubernetes:"

# Let's also check the README for any version information
cat README.md

Length of output: 33060


Script:

#!/bin/bash
# Let's check the Kind configuration and Kubernetes version used
fd -e yaml -e yml kind | xargs cat

# Also check if there are any version constraints or compatibility info in kind.mk
cat Makefile.d/kind.mk

Length of output: 4769


Script:

#!/bin/bash
# Let's check the e2e.yaml file to see the Kubernetes version used
cat k8s/debug/kind/e2e.yaml

# Also check the csi-driver-host-path version compatibility
curl -s https://raw.githubusercontent.com/kubernetes-csi/csi-driver-host-path/v1.15.0/README.md | grep -A 5 "Compatibility"

Length of output: 881
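The version comparison the scripts above perform by eye can be done mechanically with GNU `sort -V`, which orders semver-style tags. A small sketch (the `latest` value is hard-coded here as an assumption; in the scripts above it comes from the GitHub API):

```shell
#!/bin/bash
# Sketch: pick the higher of two semver tags with GNU sort -V.
current="v8.2.0"
latest="v8.2.0"   # assumed value; normally fetched from the releases API
newest="$(printf '%s\n%s\n' "$current" "$latest" | sort -V | tail -n 1)"
if [ "$newest" = "$current" ]; then
  echo "up to date"
else
  echo "newer release available: $latest"
fi
```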

@kmrmt kmrmt changed the title fix minikube start failure use kind for starting minikube failure Jan 15, 2025
@vankichi vankichi changed the title use kind for starting minikube failure [Pending] use kind for starting minikube failure Jan 27, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
.github/workflows/e2e.yaml (1)

319-325: 💡 Verification agent

🧩 Analysis chain

Verify readreplica deployment configuration.

The deployment configuration should be verified to ensure proper setup.

Run this script to check the configuration:


🏁 Script executed:

#!/bin/bash
# Description: Verify readreplica deployment configuration

# Check if values file exists and contains required configurations
test -f .github/helm/values/values-readreplica.yaml || (echo "Missing values file" && exit 1)

# Verify the values file contains required configurations
yq e '.agent.readReplica' .github/helm/values/values-readreplica.yaml

Length of output: 274


Action Required: Fix Missing Read Replica Configuration and CI Dependency

The deployment configuration for the Vald Read Replica is currently failing validation. Our verification script indicates two issues:

  • Missing values file: The file .github/helm/values/values-readreplica.yaml does not exist. This file is expected to contain the necessary configuration (specifically an .agent.readReplica key) for the deployment.
  • Missing dependency: The yq utility is not available in the CI environment, which prevents the script from properly parsing the configuration file.

Next Steps:

  • Add or restore the values file: Ensure that .github/helm/values/values-readreplica.yaml is committed to the repository with the required configuration settings.
  • Resolve the dependency: Update your CI environment to include the yq installation (or adjust the script to handle its absence) if you intend to use it for configuration validations.

Please address these issues and manually verify the changes to confirm the read replica deployment is correctly configured.

♻️ Duplicate comments (1)
Makefile.d/kind.mk (1)

87-93: 🛠️ Refactor suggestion

Improve CSI driver installation robustness.

The CSI driver installation process could be more robust:

  1. Version should be defined as a variable
  2. Error handling is needed for curl and tar operations
  3. Temporary directory handling could be improved

Apply this diff to improve robustness:

+CSI_DRIVER_VERSION ?= v1.15.0
+CSI_TEMP_DIR := $(shell mktemp -d)

-mkdir -p $(TEMP_DIR)/csi-driver-hostpath \
-  && curl -fsSL https://github.com/kubernetes-csi/csi-driver-host-path/archive/refs/tags/v1.15.0.tar.gz | tar zxf - -C $(TEMP_DIR)/csi-driver-hostpath --strip-components 1 \
+mkdir -p $(CSI_TEMP_DIR) || (echo "Failed to create temp directory" && exit 1)
+curl -fsSL https://github.com/kubernetes-csi/csi-driver-host-path/archive/refs/tags/$(CSI_DRIVER_VERSION).tar.gz | tar zxf - -C $(CSI_TEMP_DIR) --strip-components 1 || (echo "Failed to download/extract CSI driver" && exit 1)
+cd $(CSI_TEMP_DIR) \
🧹 Nitpick comments (1)
Makefile.d/k8s.mk (1)

137-156: Add error handling for kubectl apply commands.

While the comment permits errors, it would be better to handle specific error cases explicitly.

Consider adding error handling:

-kubectl apply -f $(TEMP_DIR)/vald/templates/manager/index || true
+if ! kubectl apply -f $(TEMP_DIR)/vald/templates/manager/index; then
+  echo "Warning: Failed to apply manager/index manifests, this may be expected if resources don't exist"
+fi
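The suggested warning-instead-of-abort behavior can be factored into a small wrapper function. In this sketch `kubectl` is stubbed out so it runs without a cluster (the stub always fails, simulating the "resources don't exist" case); the function name and path are illustrative:

```shell
#!/bin/bash
# Sketch of the suggested error handling; kubectl is stubbed so the
# example runs without a cluster (the stub always fails).
kubectl() { return 1; }

apply_or_warn() {
  if ! kubectl apply -f "$1"; then
    echo "Warning: Failed to apply $1, this may be expected if resources don't exist"
  fi
}

apply_or_warn "templates/manager/index"
```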
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75d1062 and 11d4bd3.

📒 Files selected for processing (11)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml (1 hunks)
  • .github/actions/e2e-deploy-vald/action.yaml (2 hunks)
  • .github/actions/setup-e2e/action.yaml (2 hunks)
  • .github/helm/values/values-readreplica.yaml (2 hunks)
  • .github/workflows/_detect-ci-container.yaml (1 hunks)
  • .github/workflows/e2e.yaml (2 hunks)
  • Makefile.d/e2e.mk (2 hunks)
  • Makefile.d/k8s.mk (5 hunks)
  • Makefile.d/kind.mk (2 hunks)
  • Makefile.d/minikube.mk (2 hunks)
  • Makefile.d/tools.mk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • .github/actions/setup-e2e/action.yaml
  • Makefile.d/minikube.mk
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml
  • Makefile.d/tools.mk
  • Makefile.d/e2e.mk
  • .github/workflows/_detect-ci-container.yaml
  • .github/actions/e2e-deploy-vald/action.yaml
  • .github/helm/values/values-readreplica.yaml
⏰ Context from checks skipped due to timeout of 90000ms (163)
  • GitHub Check: build / build (×158)
  • GitHub Check: build / dump-contexts-to-log (×5)
🔇 Additional comments (8)
Makefile.d/kind.mk (5)

17-17: LGTM: Version pinning for external-snapshotter.

The version is appropriately pinned to a specific release (v8.2.0) which helps maintain consistency and reproducibility.


23-23: LGTM: Docker dependency added.

Good practice to ensure Docker is available before running Kind operations.


78-78: Improve reliability of e2e.yaml modification.


81-85: Add verification for CRD and controller installation.


95-98: Add readiness check for metrics-server.

Makefile.d/k8s.mk (1)

114-134: LGTM: Clean manifest generation target.

Good separation of concerns by splitting manifest generation into a separate target.

.github/workflows/e2e.yaml (2)

318-318: LGTM: Migration to Kind for e2e tests.

Appropriate change to align with the new Kind-based infrastructure.


338-338: LGTM: Increased wait duration for index creation.

Good adjustment to allow more time for index creation in the readreplica scenario.

Comment on lines +254 to +270
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway/lb || true
kubectl get pods
kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s

kubectl apply -f $(TEMP_DIR)/vald-readreplica/templates
sleep 2
sleep 5

kubectl get pods
kubectl wait --for=condition=ready pod -l app=vald-agent --timeout=600s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=vald-readreplica --timeout=600s

kubectl apply -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl apply -f $(TEMP_DIR)/vald/templates/gateway/lb || true

kubectl get pods

Contributor


🛠️ Refactor suggestion

Add validation for pod deletion completion.

The current implementation could proceed before pods are fully terminated.

Add explicit checks:

 kubectl delete -f $(TEMP_DIR)/vald/templates/gateway || true
 kubectl delete -f $(TEMP_DIR)/vald/templates/gateway/lb || true
-kubectl get pods
-kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s
+if ! kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s; then
+  echo "Error: Timed out waiting for gateway pods to terminate"
+  exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway/lb || true
kubectl get pods
kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s
kubectl apply -f $(TEMP_DIR)/vald-readreplica/templates
sleep 2
sleep 5
kubectl get pods
kubectl wait --for=condition=ready pod -l app=vald-agent --timeout=600s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=vald-readreplica --timeout=600s
kubectl apply -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl apply -f $(TEMP_DIR)/vald/templates/gateway/lb || true
kubectl get pods
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl delete -f $(TEMP_DIR)/vald/templates/gateway/lb || true
if ! kubectl wait --for=delete pod -l app=vald-lb-gateway --timeout=600s; then
echo "Error: Timed out waiting for gateway pods to terminate"
exit 1
fi
kubectl apply -f $(TEMP_DIR)/vald-readreplica/templates
sleep 5
kubectl get pods
kubectl wait --for=condition=ready pod -l app=vald-agent --timeout=600s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=vald-readreplica --timeout=600s
kubectl apply -f $(TEMP_DIR)/vald/templates/gateway || true
kubectl apply -f $(TEMP_DIR)/vald/templates/gateway/lb || true
kubectl get pods
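The explicit timeout handling suggested above generalizes to a small polling helper. This is a sketch only: `true` stands in for the real readiness probe (e.g. a `kubectl wait` call), and the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: poll until a command succeeds or a deadline passes,
# mirroring the explicit timeout check suggested for `kubectl wait`.
wait_until() {  # wait_until <timeout_seconds> <command...>
  local deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Error: Timed out waiting for condition" >&2
      return 1
    fi
    sleep 1
  done
}

wait_until 5 true && echo "condition met"
```

Failing fast here (rather than `|| true`) keeps later steps from racing pods that never terminated.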

@vdaas-ci
Collaborator

Profile Report

type: vald-agent-ngt / vald-lb-gateway / vald-discoverer / vald-manager-index
cpu: (profile images per component)
heap: (profile images per component)
other images
