feat(skills): add eks-genai skill + Day 1 workflow #47

Open
jalawala wants to merge 1 commit into aws-samples:main from jalawala:feat/eks-genai-skill

jalawala commented Jun 2, 2026

Add an opinionated GenAI-on-EKS skill, a matching Day 1 steering workflow and command shim, and the full repo fan-out (catalogues, hub routing, sibling-map eval updates).

Skill — skills/eks-genai/:

  • SKILL.md (132 lines) + 12 references teaching the AWS-canonical 6-layer GenAI-on-EKS stack: compute (NVIDIA GPU vs AWS Neuron), cluster/scheduler (Karpenter, device plugins, EFA, Capacity Blocks), frameworks (JARK + vLLM + Ray Serve, Triton/Dynamo/KServe), storage (FSx Lustre, Mountpoint S3 CSI, EFS, S3 Vectors), observability (DCGM/Neuron Monitor + Prometheus/Grafana + AMP), and the LiteLLM AI gateway; plus distributed training, KV-cache tiering (LMCache), cost levers, agentic/RAG, a non-negotiable security baseline, the concrete validated stack (versions), and 5 worked use cases.
  • Grounded in the EKS AI/ML Best Practices guide + awslabs/ai-on-eks, and checked for currency against the GenAI-on-EKS NVIDIA workshop.

Steering:

  • steering/workflows/eks-genai.md (Day 1 - Build, advisory, 4 phases, STOP gates) - passes quick_validate clean (0/0).
  • /apex:eks-genai command shim.
  • Wired into steering/eks.md routing table + workflow index.

Evals + sibling fan-out — misc/evals/eks-genai/:

  • triggering.json (8 positives, 8 attributed negatives), evals.json (2 task prompts, 5 grader-checkable expectations each).
  • Added eks-genai to the SIBLING_MAP + a routing negative in the 4 neighbours (best-practices, design, build, platform-engineering).

Catalogues: README.md (skills, steering, slash-command tables) and skills/README.md detail block.

Gates: quick_validate PASS (0/0); make hygiene + check-evals-coverage PASS for all 10 skills.

Summary

If this PR adds or changes a skill

  • Ran /apex:new-skill (or walked the equivalent manual steps in CONTRIBUTING.md)
  • skills/<skill>/SKILL.md present and passes make validate-<skill> (run from misc/evals/)
  • misc/evals/<skill>/triggering.json authored (≥16 prompts; balanced positives and near-miss negatives)
  • misc/evals/<skill>/evals.json authored (≥2 realistic task prompts with ≥3 expectations each; every assertion tagged TODO: human review until tuned)
  • misc/evals/<skill>/README.md filled in — including the SIBLING_MAP block (or explicitly empty with rationale if the skill has no siblings)
  • For each neighbour: misc/evals/<neighbour>/SIBLING_MAP gained a bullet and its triggering.json gained the matching negatives (via update_sibling_map.py or hand-edit)
  • make init-evals-finalize SKILL=<skill> exits 0
  • make check-evals-coverage exits 0
  • Ran the update-docs skill and committed any resulting changes (regenerated wrappers/manifest, marker-block updates, prose edits)

See misc/evals/README.md for the capability catalogue and CONTRIBUTING.md for the full new-skill workflow.
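
The eval thresholds in the checklist above (≥16 triggering prompts with balanced positives and negatives, ≥2 task prompts with ≥3 expectations each) can be sketched as a small check. This is only an illustration, not the repo's `check-evals-coverage` implementation: the field names `should_trigger` and `expectations` are hypothetical, the actual schema of `triggering.json` and `evals.json` is not shown in this PR, and "balanced" is read here as equal counts, which is an assumption.

```python
from typing import Any

# Thresholds come from the PR checklist; everything else is illustrative.
MIN_TRIGGER_PROMPTS = 16
MIN_TASK_PROMPTS = 2
MIN_EXPECTATIONS = 3


def check_triggering(prompts: list[dict[str, Any]]) -> list[str]:
    """Return the problems found in a triggering.json-style prompt list.

    Assumes each entry carries a boolean `should_trigger` (hypothetical name).
    """
    problems = []
    positives = sum(1 for p in prompts if p.get("should_trigger") is True)
    negatives = sum(1 for p in prompts if p.get("should_trigger") is False)
    if len(prompts) < MIN_TRIGGER_PROMPTS:
        problems.append(f"only {len(prompts)} prompts, need >= {MIN_TRIGGER_PROMPTS}")
    # Assumption: "balanced" is treated as equal positive and negative counts.
    if positives != negatives:
        problems.append(f"unbalanced: {positives} positives vs {negatives} negatives")
    return problems


def check_evals(tasks: list[dict[str, Any]]) -> list[str]:
    """Return the problems found in an evals.json-style task list.

    Assumes each task carries an `expectations` list (hypothetical name).
    """
    problems = []
    if len(tasks) < MIN_TASK_PROMPTS:
        problems.append(f"only {len(tasks)} task prompts, need >= {MIN_TASK_PROMPTS}")
    for index, task in enumerate(tasks):
        count = len(task.get("expectations", []))
        if count < MIN_EXPECTATIONS:
            problems.append(f"task {index}: {count} expectations, need >= {MIN_EXPECTATIONS}")
    return problems


if __name__ == "__main__":
    # Mirrors the counts reported in this PR: 8 positives + 8 negatives,
    # and 2 task prompts with 5 expectations each.
    prompts = [{"prompt": f"p{i}", "should_trigger": i < 8} for i in range(16)]
    tasks = [{"prompt": f"t{i}", "expectations": ["e"] * 5} for i in range(2)]
    print(check_triggering(prompts), check_evals(tasks))
```

With the counts reported in this PR, both functions return an empty list; dropping prompts or expectations below the thresholds adds one message per violation.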

@devfloor9 devfloor9 self-requested a review June 2, 2026 07:51

utkarpun commented Jun 2, 2026

Linked: closes #6 (AI/ML | GenAI Reference). This PR delivers the genai skill that addresses the original ask.
