Complete AI prompt library for DevOps engineers. Covers Kubernetes manifests, Helm charts, GitHub Actions CI/CD, Terraform IaC, Docker multi-stage builds, monitoring with Prometheus and Grafana, security scanning, and production deployment patterns.
DevOps & Kubernetes in 2026: Infrastructure as Code at Scale
Modern DevOps is defined by reproducibility: every environment identical, every deployment automated, every configuration in version control. AI dramatically accelerates Kubernetes manifest writing, Terraform module creation, and CI/CD pipeline design when prompted with the right production requirements. These prompts are designed to produce infrastructure code that passes a production readiness review, not just configurations that deploy without errors.
1. Kubernetes Deployment with Production Best Practices
You are a senior Kubernetes engineer who has run production clusters at scale.
Write a complete Kubernetes Deployment manifest for a Node.js API service:
Deployment requirements:
- replicas: 3 (for HA)
- Strategy: RollingUpdate with maxUnavailable: 0, maxSurge: 1 (zero-downtime deployments)
- Resource requests: cpu: 100m, memory: 128Mi
- Resource limits: cpu: 500m, memory: 512Mi (REQUIRED; explain why missing limits cause node starvation)
- Image: pin by SHA digest (not a mutable tag) and set imagePullPolicy: Always
Probes (explain the difference before writing each):
- startupProbe: GET /health, failureThreshold: 30, periodSeconds: 10 (slow start tolerance)
- readinessProbe: GET /health, failureThreshold: 3, periodSeconds: 5 (traffic control)
- livenessProbe: GET /health, failureThreshold: 3, periodSeconds: 15 (restart trigger)
Security context (REQUIRED on every production deployment):
- runAsNonRoot: true, runAsUser: 1001
- readOnlyRootFilesystem: true
- allowPrivilegeEscalation: false
- capabilities: drop: [ALL]
Also write: Service (ClusterIP), HorizontalPodAutoscaler (min:3, max:10, CPU 70%), PodDisruptionBudget (minAvailable: 2).
Add a comment on every field explaining why it exists.
Why it works: Requesting comments on every field forces the AI to explain the production reasoning, so you get documentation that would normally take a Kubernetes expert's code review.
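The numbers in this prompt interact, so it can help to check them before pasting. The sketch below is illustrative only: the function names are made up for this example, the probe budget ignores initialDelaySeconds and timeouts, and the autoscaler rule is the documented core formula without the tolerance band or stabilization windows the real controller applies.

```python
import math


def probe_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a probe may keep failing before Kubernetes acts on it."""
    return failure_threshold * period_seconds


def rolling_update_bounds(replicas: int, max_unavailable: int, max_surge: int) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    return replicas - max_unavailable, replicas + max_surge


def hpa_desired_replicas(current_replicas: int, current_cpu_percent: float,
                         target_cpu_percent: float, min_replicas: int,
                         max_replicas: int) -> int:
    """Core HPA rule: ceil(current * currentMetric / targetMetric), then clamp.

    Simplification: tolerance and stabilization behaviour are omitted.
    """
    desired = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print("startup budget (s):", probe_budget_seconds(30, 10))
    print("liveness budget (s):", probe_budget_seconds(3, 15))
    print("rollout bounds (min available, max total):", rolling_update_bounds(3, 0, 1))
    print("replicas at 140% CPU:", hpa_desired_replicas(3, 140, 70, 3, 10))
```

With the values from the prompt, the startup probe tolerates roughly 300 seconds of slow start, the rollout never drops below 3 available pods while peaking at 4, and sustained CPU at twice the 70% target doubles the replica count from 3 to 6.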
2. Helm Chart for Microservice
You are a Helm chart expert.
Create a production Helm chart for a Node.js microservice:
Chart structure:
charts/my-service/
Chart.yaml (name, version, appVersion, description)
values.yaml (all defaults)
templates/
deployment.yaml
service.yaml
ingress.yaml
hpa.yaml
pdb.yaml
configmap.yaml
secret.yaml (sealed-secrets or external-secrets reference)
serviceaccount.yaml
_helpers.tpl (common labels, selector labels, name truncation)
values.yaml defaults:
- replicaCount: 3
- image.repository, image.tag (overridden per environment)
- resources.requests and resources.limits
- autoscaling.enabled, minReplicas, maxReplicas, targetCPU
- ingress.enabled, ingress.hosts, ingress.tls
- env: {} (map of environment variables from ConfigMap or Secret)
- probes: liveness and readiness enabled by default
Environment overrides (values-prod.yaml):
- Higher resource limits, more replicas, different image tag, production ingress host
Provide:
1. All template files with proper Go templating
2. NOTES.txt explaining post-install steps
3. Helm install command for staging and production
4. How to use helm diff before upgrading
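Two behaviours this prompt relies on are easier to review once you have seen them in miniature: name truncation in _helpers.tpl and the layering of values-prod.yaml over values.yaml. The sketch below mimics both in Python as an approximation. It assumes the helper follows the common scaffold pattern (truncate to 63 characters, then trim one trailing hyphen), and the merge ignores Helm's special handling of null values. All function names here are hypothetical.

```python
MAX_NAME_LENGTH = 63  # DNS label limit that many Kubernetes name fields must respect


def truncate_name(name: str) -> str:
    """Cut to 63 characters, then drop a single trailing hyphen left by the cut."""
    return name[:MAX_NAME_LENGTH].removesuffix("-")


def fullname(release_name: str, chart_name: str,
             name_override: str = "", fullname_override: str = "") -> str:
    """Approximate the usual 'fullname' helper found in _helpers.tpl."""
    if fullname_override:
        return truncate_name(fullname_override)
    name = name_override or chart_name
    if name in release_name:
        # Avoid names like "my-service-my-service".
        return truncate_name(release_name)
    return truncate_name(f"{release_name}-{name}")


def merge_values(base: dict, override: dict) -> dict:
    """Deep-merge override values onto defaults: maps merge, everything else is replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    defaults = {"replicaCount": 3, "image": {"repository": "my-service", "tag": "dev"}}
    prod = {"replicaCount": 6, "image": {"tag": "abc1234"}}
    print(fullname("staging", "my-service"))
    print(merge_values(defaults, prod))
```

The merge shows why values-prod.yaml only needs the keys that differ: image.repository survives from the defaults while image.tag and replicaCount are replaced.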
3. GitHub Actions: Full CI/CD Pipeline
You are a GitHub Actions expert building production CI/CD pipelines.
Write a complete GitHub Actions workflow for a Node.js microservice deployed to Kubernetes:
Trigger: push to main, PRs to main, manual dispatch (workflow_dispatch)
Jobs:
test (every push and PR):
- ubuntu-latest
- Setup Node 22, pnpm cache by pnpm-lock.yaml hash
- pnpm install --frozen-lockfile
- TypeScript compile check, ESLint, Prettier
- Jest unit + integration tests (Testcontainers PostgreSQL via Docker-in-Docker)
- Upload coverage to Codecov
- Fail if coverage drops below 80%
security (parallel to test):
- npm audit --audit-level=high
- Trivy image scan: scan the built image (not just the Dockerfile) for OS package vulnerabilities (CRITICAL severity fails the build)
- SARIF output uploaded to GitHub Security tab
build (after test passes, main branch only):
- Multi-stage Docker build with BuildKit cache
- Tag with: git SHA (immutable) + 'latest'
- Push to AWS ECR (OIDC auth, no long-lived secrets)
- Image size check: fail if > 200MB
deploy-staging (after build):
- Update Helm values: image.tag = git SHA
- helm upgrade --install my-service charts/my-service -f values-staging.yaml
- Wait for rollout: kubectl rollout status deployment/my-service
- Smoke test: curl staging.api.example.com/health and assert HTTP 200
deploy-production (manual approval gate, after deploy-staging):
- environment: production (with required reviewers in GitHub)
- Same helm upgrade to production namespace
- Slack notification: success with deploy URL, failure with rollback command
Secrets: use OIDC for AWS (no access keys), GitHub Environment Secrets for Slack.
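The coverage and image-size gates in this prompt are simple threshold checks, and it helps to know exactly what you expect the generated workflow to enforce. The following is a minimal sketch of that logic, not part of any GitHub Actions API. It assumes "200MB" means decimal megabytes and that the size in bytes comes from something like `docker image inspect --format '{{.Size}}'`. The function names and the repository string are hypothetical.

```python
from typing import Optional

COVERAGE_THRESHOLD_PERCENT = 80.0
MAX_IMAGE_BYTES = 200 * 1000 * 1000  # assumption: "200MB" read as decimal megabytes


def coverage_gate(coverage_percent: float) -> Optional[str]:
    """Return an error message if coverage is below the threshold, else None."""
    if coverage_percent < COVERAGE_THRESHOLD_PERCENT:
        return (f"coverage {coverage_percent:.1f}% is below "
                f"{COVERAGE_THRESHOLD_PERCENT:.0f}%")
    return None


def image_size_gate(size_bytes: int) -> Optional[str]:
    """Return an error message if the image exceeds the size budget, else None."""
    if size_bytes > MAX_IMAGE_BYTES:
        return f"image is {size_bytes} bytes, budget is {MAX_IMAGE_BYTES} bytes"
    return None


def image_tags(repository: str, git_sha: str) -> list[str]:
    """Immutable SHA tag first (used for deploys), then the moving 'latest' tag."""
    return [f"{repository}:{git_sha}", f"{repository}:latest"]


if __name__ == "__main__":
    failures = [msg for msg in (coverage_gate(82.5), image_size_gate(180_000_000)) if msg]
    print(image_tags("example/my-service", "abc1234"))
    # A CI step would exit non-zero here so the job fails.
    raise SystemExit(1 if failures else 0)
```

Only the SHA tag should be passed to Helm as image.tag, because the moving tag cannot be rolled back to a known build.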
4. Terraform Infrastructure
You are a Terraform expert for AWS infrastructure.
Write Terraform (1.8+) to provision a production Kubernetes application infrastructure on AWS:
Modules to create:
module/networking:
- VPC with public, private, and database subnets across 3 AZs
- NAT Gateway (one per AZ for HA)
- VPC flow logs to CloudWatch
module/eks:
- EKS cluster 1.30 with managed node groups
- Node groups: general (t3.medium, min:3, max:10), memory-optimised for databases
- IRSA (IAM Roles for Service Accounts) for pods to access AWS services
- aws-load-balancer-controller, cluster-autoscaler, external-dns via Helm
module/rds:
- PostgreSQL 16 Multi-AZ RDS instance
- Subnet group in private subnets
- Security group: only allow from EKS security group on 5432
- Automated backups 7 days, encryption at rest
- Parameter group for performance tuning (shared_buffers, work_mem)
module/elasticache:
- Redis 7 cluster mode disabled (single primary + replica)
- Private subnets only
- Auth token (password) stored in AWS Secrets Manager
Global:
- All resources tagged: Environment, Team, CostCenter, ManagedBy=terraform
- Remote state: S3 bucket + DynamoDB table for locking
- Variables with validation blocks for required values
- Outputs: cluster endpoint, RDS endpoint, ECR URLs
Output: all module files with variables.tf, main.tf, outputs.tf, and root configuration.
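Before asking for the networking module, it is worth sanity-checking that three subnet tiers across three AZs fit inside your VPC range. The sketch below uses Python's standard ipaddress module to carve equal blocks. The 10.0.0.0/16 range, the /20 subnet size, and the AZ names are assumptions chosen for illustration, and the allocation order (tier by tier) is just one reasonable layout.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str],
                 tiers: tuple[str, ...] = ("public", "private", "database"),
                 new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Assign one equally sized subnet per (tier, AZ) pair, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = len(tiers) * len(azs)
    if needed > len(blocks):
        raise ValueError(f"{vpc_cidr} only holds {len(blocks)} /{new_prefix} subnets, "
                         f"but {needed} are required")
    plan: dict[str, dict[str, str]] = {}
    index = 0
    for tier in tiers:
        plan[tier] = {}
        for az in azs:
            plan[tier][az] = str(blocks[index])
            index += 1
    return plan


if __name__ == "__main__":
    # Hypothetical region and AZ names.
    layout = plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
    for tier, subnets in layout.items():
        print(tier, subnets)
```

In Terraform itself, the built-in cidrsubnet function performs the same carving, so the generated module can be compared against a plan like this one.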
5. Monitoring with Prometheus & Grafana
You are a Kubernetes observability expert.
Set up production monitoring for a Node.js microservice on Kubernetes:
Prometheus scraping:
- ServiceMonitor (Prometheus Operator CRD) for the microservice
- Expose /metrics endpoint: express-prometheus-middleware for Node.js
- Custom metrics: http_request_duration_seconds (histogram), http_requests_total (counter by status), active_connections (gauge), background_job_duration_seconds
Prometheus Alert Rules (PrometheusRule CRD):
- Critical: error rate > 1% for 5 minutes (5xx / total)
- Critical: pod unavailable (all replicas down)
- Warning: P99 latency > 500ms for 10 minutes
- Warning: pod restarts > 3 in 5 minutes
- Info: deployment rollout in progress
Grafana Dashboard (JSON model for 4 panels):
- Request rate (requests per second by status code)
- Error rate percentage (5xx / total, red threshold at 1%)
- Latency percentiles (P50, P95, P99 on one graph)
- Infrastructure (CPU usage, memory usage, pod count)
AlertManager routing:
- Critical → PagerDuty (immediate)
- Warning → Slack #alerts channel
- Resolved notifications on both channels
Output: ServiceMonitor YAML, PrometheusRule YAML, AlertManager config, and Grafana dashboard JSON.
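The two headline alerts depend on how Prometheus derives an error ratio and a P99 from raw counters and histogram buckets. The sketch below approximates that arithmetic in Python so the thresholds can be reasoned about offline. It mirrors the linear interpolation that histogram_quantile performs within a bucket, but it is a simplification (no rate windows, no "for" duration), and the bucket boundaries and counts are invented sample data.

```python
import math


def error_ratio(errors_5xx: float, total: float) -> float:
    """Fraction of requests that returned a 5xx status."""
    return errors_5xx / total if total else 0.0


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (inf, total).
    Values are assumed to be spread evenly inside each bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls past the last finite bucket.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented sample: cumulative request counts per latency bucket, in seconds.
SAMPLE_BUCKETS = [(0.1, 900), (0.25, 980), (0.5, 995), (1.0, 1000), (math.inf, 1000)]

if __name__ == "__main__":
    p99 = histogram_quantile(0.99, SAMPLE_BUCKETS)
    print(f"P99 latency: {p99:.3f}s, warning fires: {p99 > 0.5}")
    print(f"error ratio: {error_ratio(12, 1000):.3f}, critical fires: {error_ratio(12, 1000) > 0.01}")
```

One practical consequence: the P99 estimate can never be more precise than the bucket boundaries, so the histogram needs a boundary at or near 0.5 seconds for the 500ms alert to be meaningful.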
6. Docker Security Hardening
You are a container security expert.
Write a security-hardened multi-stage Dockerfile for a Node.js API:
Stage 1 (deps):
- node:22-alpine as base
- Enable pnpm via corepack; set npm config loglevel=error, no-fund, no-audit (faster build)
- COPY package.json pnpm-lock.yaml
- RUN pnpm install --frozen-lockfile --prod (production only)
Stage 2 (build):
- RUN pnpm install --frozen-lockfile (includes dev deps for build)
- COPY source, run tsc
Stage 3 (production, most important):
- Base: gcr.io/distroless/nodejs22-debian12 (Google Distroless β no shell, no package manager, minimal CVE surface)
- Alternative if distroless is too restrictive: node:22-alpine with security updates applied
- COPY from deps stage: /app/node_modules
- COPY from build stage: /app/dist
- USER nonroot:nonroot (distroless built-in non-root user, UID 65532)
- EXPOSE port via ENV, not hardcoded
- No COPY of .env files, secrets, or source code
- HEALTHCHECK CMD ["/nodejs/bin/node", "-e", "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"]
Security scan: show Trivy command to scan the final image and interpret the report.
After Dockerfile: explain attack surface reduction, including why distroless images typically carry far fewer CVEs than ubuntu-based images.
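Interpreting the scan is easier when the report is machine-readable, for example from `trivy image --format json --output report.json IMAGE`. The sketch below tallies findings by severity and lists critical findings that already have a fix. It assumes the report follows Trivy's JSON layout with a top-level Results list whose entries may carry a Vulnerabilities list. The sample report, package names, and CVE IDs are placeholders, not real findings.

```python
from collections import Counter


def severity_counts(report: dict) -> Counter:
    """Count vulnerabilities per severity across all scan targets."""
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def fixable_criticals(report: dict) -> list[str]:
    """IDs of CRITICAL findings that list a fixed version (actionable today)."""
    ids = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") == "CRITICAL" and vuln.get("FixedVersion"):
                ids.append(vuln["VulnerabilityID"])
    return ids


# Placeholder report; in practice use json.load() on the file Trivy wrote.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "my-service (debian 12)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample",
                 "Severity": "CRITICAL", "FixedVersion": "1.2.4"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother",
                 "Severity": "HIGH", "FixedVersion": ""},
                {"VulnerabilityID": "CVE-0000-0003", "PkgName": "libthird",
                 "Severity": "CRITICAL", "FixedVersion": ""},
            ],
        },
        {"Target": "app/node_modules"},  # a target with no findings has no list
    ]
}

if __name__ == "__main__":
    print(dict(severity_counts(SAMPLE_REPORT)))
    print("fix now:", fixable_criticals(SAMPLE_REPORT))
```

Separating fixable from unfixable criticals keeps the build gate actionable: a finding with no upstream fix usually calls for a documented exception rather than a blocked release.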
7. Good vs Bad DevOps Prompts
| Task | Bad Prompt | Good Prompt |
|---|---|---|
| Kubernetes | "Deploy my app to Kubernetes" | "Write a Kubernetes Deployment for a Node.js API: 3 replicas, RollingUpdate maxUnavailable=0, resource requests cpu:100m/memory:128Mi and limits cpu:500m/memory:512Mi, readinessProbe + livenessProbe on /health, securityContext runAsNonRoot+readOnlyRootFilesystem+drop:ALL, HPA 3-10 pods at 70% CPU, PDB minAvailable:2." |
| CI/CD | "Set up GitHub Actions for my app" | "Write GitHub Actions for Node.js on EKS: test job (Jest+Testcontainers, 80% coverage gate), parallel security job (npm audit + Trivy), build job (Docker BuildKit + ECR push via OIDC), deploy-staging (Helm upgrade + rollout wait + smoke test), deploy-production (environment gate with required reviewers + Slack notification)." |
| Terraform | "Create AWS infrastructure" | "Write Terraform 1.8 modules for: VPC (3 AZ, public+private+DB subnets), EKS 1.30 (managed node groups, IRSA), RDS PostgreSQL 16 Multi-AZ (private subnets, encrypted), ElastiCache Redis 7 (auth token in Secrets Manager). S3+DynamoDB remote state. Variable validation blocks. Mandatory tags: Environment, Team, CostCenter." |
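The good prompts in the table share one shape: a role, a task, grouped requirements with concrete values, and an explicit output list. If you write many of them, that shape can be templated. The sketch below is one hypothetical way to do so. The function name and argument layout are inventions for this example, not part of any tool.

```python
def build_prompt(role: str, task: str,
                 requirements: dict[str, list[str]],
                 outputs: list[str]) -> str:
    """Assemble a structured prompt: role, task, grouped requirements, outputs."""
    lines = [f"You are {role}.", task]
    for heading, items in requirements.items():
        lines.append(f"{heading}:")
        lines.extend(f"- {item}" for item in items)
    if outputs:
        lines.append("Output: " + ", ".join(outputs) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        role="a senior Kubernetes engineer",
        task="Write a Kubernetes Deployment manifest for a Node.js API service:",
        requirements={
            "Deployment requirements": ["replicas: 3", "maxUnavailable: 0, maxSurge: 1"],
            "Security context": ["runAsNonRoot: true", "readOnlyRootFilesystem: true"],
        },
        outputs=["Deployment YAML", "Service YAML"],
    ))
```

Keeping requirements as data makes it easy to diff two prompts and see which concrete value changed.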