Full Table of Contents

Section 01 — The Silent Revolution of DevOps

And Why You Need to Lead It

1.1 3 AM. 47 Alerts.

1.2 DevOps as Human Glue

1.3 Why AI Fails Without Standardization

1.4 See the Transformation

1.5 What Makes This Guide Different

1.6 Critical Pain Points

1.7 What You Will Build

1.8 The Journey Begins

Section 02 — AI Fundamentals for DevOps

LLMs, Claude, MCP and Agent Frameworks

2.1 What Are Large Language Models (LLMs)

2.2 Claude, GPT and Other Models: When to Use Each

2.3 Critical Limitations You Need to Know

2.4 Model Context Protocol (MCP): The Bridge Between LLM and Tools

2.5 Agent Frameworks: LangChain, CrewAI and AutoGen

2.6 Architectural Decisions of This Guide

Section 03 — The Modern IDE for DevOps with AI

VS Code, Claude Code CLI and Environment Configuration

Setup & Configuration

3.1 The IDE as DevOps Command Center

3.2 VS Code + Claude Code: The Recommended Setup

3.3 Essential Extensions for DevOps

3.4 Configuration Files: .cursorrules, CLAUDE.md, settings.json

3.5 Starting Your Environment: claude init

AI Capabilities

3.6 Copilots, Chat and Agents in the IDE

3.7 MCP: The Model Context Protocol

3.8 Common Anti-Patterns in IDE AI Usage

3.9 Agent in Action - Initial IDE Setup

3.10 Claude Skills: Complete Guide

Advanced & MCP

3.11 Building Your Own MCP Server

3.12 Context7: Real-Time Documentation via MCP

3.13 Opus 4.7: Adaptive Thinking, Task Budgets and xhigh Effort

3.14 Hooks: Automation & Guardrails in Claude Code

3.15 AWS MCP Servers — From Ecosystem to Canonical

› 3.15.1 AWS MCP Server (GA, May 2026) — The New Canonical Entry Point

› 3.15.2 The awslabs Ecosystem — Specialized Servers Worth Keeping

› 3.15.3 Decision Tree — When to Use What

› 3.15.4 New Patterns Unlocked by GA

3.16 GCP MCP Servers — Google's Remote Approach

Section 04 — AI Agents for DevOps

From Concept to Practice with Specialized Subagents

Concepts

4.1 What Is (and Isn't) an AI Agent

4.2 Practical Difference: Prompt, Script, Copilot and Agent

4.3 Anatomy of a DevOps Agent

4.4 Types of Agents in the DevOps World

4.5 MCP Applied to Agents

Guardrails & Debugging

4.6 Minimum Guardrails for Production

4.7 When NOT to Use Agents

4.8 Anatomy of an Agent Execution (Step-by-Step)

4.9 Anti-Patterns: Where Agents Fail (Real Cases)

4.10 Debugging: When the Agent Makes Mistakes

Practice & Teams

4.11 Metrics: How to Measure Agent ROI

4.12 Agent in Action — Configuring the DevOps Agent

4.13 Specialized Subagents: Creating an AI Team

4.14 Agent Teams: Multiple Agents Working in Parallel

4.15 Managed Agents API — Anthropic's Stateful Alternative

4.16 Spec-Driven Agents: From Vibe Coding to Engineering Discipline

› 4.16.1 The Problem with Vibe Coding

› 4.16.2 The Three Pillars: Contracts, Agents, Runtime

› 4.16.3 The Canonical Workflow

› 4.16.4 Implementing in Claude Code

› 4.16.5 Landscape: Kiro vs Spec Kit vs Native Claude Code

› 4.16.6 When NOT to Use Spec-Driven

4.17 Headless Claude Code — From CLI to SDK to Embedded Orchestrators

› 4.17.1 The Claude Code as Library Pattern

› 4.17.2 The CLI Route — claude -p (Print Mode)

› 4.17.3 The SDK Route — Agent SDK (Python + TypeScript)

› 4.17.4 Three Orchestrator Patterns — Real-World Examples

› 4.17.5 CI Authentication and Cost Guardrails

› 4.17.6 When NOT to Use Headless Mode

Section 05 — Terraform with AI

Intelligent Infrastructure in Practice

Foundations & MCP

5.1 Terraform in the Real World (The Silent Pain)

5.2 Where AI Really Helps in Terraform

5.3 MCP Applied to Terraform: Choosing Your Architecture

5.4 Installation Guide: VS Code, Cursor and Claude Code CLI

5.5 Installing the Right MCPs

Practice & Safety

5.6 Guardrails Configuration: Protecting Production

5.7 The "End-to-End" Flow (Supervised)

5.8 Practical Walkthrough: From Zero to First Module (Generic, Cloud-Agnostic)

5.9 Workspace Safety: AI as Environment Guardian

› 5.9.5 Hard Guardrail IAM Plan-Only

5.10 FinOps: AI as Cost Analyst

Agents & Advanced

5.11 Agent in Action - The Infrastructure Architect

5.12 Terraform MCP Troubleshooting

5.13 Specialized Subagent: terraform-reviewer

5.14 Agent Teams: Multi-Module Infrastructure Refactoring

5.15 Beyond Terraform — CloudFormation & CDK with AWS MCP

› 5.15.6 When to Use call_aws vs Specialized awslabs Servers

5.16 Spec-Driven IaC: From Business Requirements to Modules

› 5.16.1 Why IaC Is the Killer App for Spec-Driven

› 5.16.2 The Three Pillars Applied to Terraform

› 5.16.3 Walkthrough: Multi-AZ VPC for PCI Environment

› 5.16.4 Drift Detection as Spec Validation

› 5.16.5 Tooling: IBM iac-spec-kit and Native Claude Code

5.17 From Monolith to Composable: Refactoring CLAUDE.md into Skills and Subagents

› 5.17.1 Diagnosis: Why Refactor

› 5.17.2 The Partition Rule: Always-On vs On-Demand

› 5.17.3 Walkthrough: Migrating 3 Concrete Patterns

› 5.17.4 The Final CLAUDE.md (~80 lines)

› 5.17.5 Final Directory Structure

› 5.17.6 When NOT to Refactor

Section 06 — AWS with AI: Deep Dive

MCP Server, IAM Hard Guardrail, CFN/CDK, Bedrock, and Least-Privilege

6.1 AWS MCP Server (deep dive)

6.2 Hard Guardrail Plan-Only IAM (duplicated from 5.9.5)

6.3 CloudFormation and CDK with AWS MCP

6.4 Claude Code on Bedrock

6.5 Least-Privilege IAM for AWS MCP

6.6 Closing ... future GCP and Azure

Section 07 — Kubernetes with AI

Operation, Policies and Intelligent Scaling

Core & MCP

7.1 Kubernetes: The Distributed Operating System

7.2 K8sGPT: From CLI to Continuous Monitoring

7.3 MCP for Kubernetes: Giving "Eyes" to the Agent

7.4 MCP Installation Verification

7.5 Real End-to-End Case: Payment Service Down

Policies & Scaling

7.6 Policies as Code: Kyverno + AI

7.7 Intelligent Autoscaling: KEDA + AI

7.8 Intelligent HPA/VPA: AI-Guided Configuration

7.9 Deployment Strategies: Canary and Blue-Green with AI

Agents & Advanced

7.10 Specialized Subagent: k8s-troubleshoot

7.12 Hosting MCP Servers on Kubernetes/EKS

7.13 AI-Assisted EKS/Kubernetes Upgrade

7.14 Agent Teams: Multi-Agent EKS Upgrade Validation

7.15 Karpenter + AI: Intelligent Node Scaling

7.16 GKE vs EKS — When Each Makes Sense

7.17 GKE MCP in Action — The 27 Tools in a Real Flow

7.18 Agent Team — Multi-Region GKE Provisioning

7.19 Spec-Driven Kubernetes: From Requirements to Manifests + Policy

› 7.19.1 The Manifests-Without-Intent Problem

› 7.19.2 The Three Pillars Applied to Kubernetes

› 7.19.3 Real Walkthrough: payment-service Deployment

› 7.19.4 The Spec as Dev ↔ Platform Contract

7.11 Chapter Conclusion

Section 08 — CI/CD with AI

Pipelines as Product

Foundations

8.1 Pipelines as Product

8.2 Intelligent Test Selection (Predictive Test Selection)

8.3 Pipeline Failure Auto-Triage

8.4 Supply Chain Security

8.5 Flakiness: Unstable Tests

Generation & Security

8.6 Pipeline Generation with AI

8.7 Pipeline Security

8.8 When NOT to Use AI in CI/CD

8.9 ROI of AI in CI/CD

8.10 Subagent: ci-security-analyst

Advanced & Automation

8.11 Final Configuration

8.12 End-to-End Practical Scenario

8.13 GitHub Actions with Claude Code

8.14 Claude Code SDK — Programmatic Automation

8.15 Conclusion

Section 09 — Observability and Incidents

From Signal Overload to Intelligent Action

Foundations

9.1 The Problem of Signal Overload

9.2 Logs, Metrics and Traces Correlation

9.3 AI Support for On-Call

9.4 MTTR Reduction with Assisted Decision

9.5 Resource Forecasting with Prophet

Architecture & Agents

9.6 Real Incident Practical Case

9.7 Incident Agent Architecture

9.8 Persona and Subagent Configuration

9.9 Limitations and When NOT to Use AI

9.10 Agent Teams: Automated War Room for P1 Incidents

MCP & Observability

9.11 CloudWatch MCP Server — Native AWS Observability

9.12 Cost Explorer MCP — Intelligent FinOps with AI

9.13 Grafana MCP + OpenTelemetry — Multi-Cloud Observability

9.14 Chapter Conclusion

9.15 SLOs as Specs: Auto-Generating Alerts and Dashboards

› 9.15.1 The SLO Document Gap

› 9.15.2 The Three Pillars Applied to Observability

› 9.15.3 Real Walkthrough: payment-service SLO

› 9.15.4 The Compounding Effect: SLO Changes Become Regenerable

Section 10 — Container and Kubernetes Security

Intelligent Vulnerability Triage

Detection

10.1 The Problem of Security at Scale

10.2 Intelligent Vulnerability Triage

10.3 Triage System Architecture

Action

10.4 Automated Prioritization: From Detection to Action

10.5 Secrets Management with AI

10.6 Specialized Subagent: security-auditor

Configuration

10.7 Agent Configuration (.cursorrules)

10.8 Limitations and When NOT to Use AI

10.9 Chapter Conclusion

Section 11 — FinOps: Cost Optimization with AI

Intelligent Cloud Cost Reduction

11.1 The Structural Problem of FinOps

11.2 Intelligent FinOps Architecture

11.3 Implementation: Essential Components

11.4 FinOps ROI with AI

11.5 Conclusion

Section 12 — Runbook RAG

Operational Knowledge Instantly Accessible

Foundations & Stack

12.1 The Problem of Distributed Documentation

12.2 Fundamentals: RAG, BM25 and Embeddings

12.3 Stack Selection: Detailed Comparison (2025-2026)

12.4 Step-by-Step Implementation with Qdrant + LlamaIndex

12.5 Integrating Multiple Sources: Jira, GitHub, Confluence, Slack

Integration & Metrics

12.6 Creating an Agent/MCP that Uses RAG (Anti-Hallucination)

12.7 Slack Integration for Quick Access

12.8 RAG Metrics and Evaluation

12.9 Limitations and Troubleshooting

12.10 Conclusion and Implementation Checklist

Section 13 — Security, Guardrails and Professional Use

Human-in-the-Loop and Responsibility

13.1 Why AI Without Limits Becomes Risk

13.2 Human-in-the-Loop in Practice

13.3 Simple Guardrails That Work

13.4 Responsibility Remains Human

13.5 Security Checklist

13.6 Claude Code on Amazon Bedrock: Enterprise Deployment for Regulated Industries

› 13.6.1 The Compliance Gap

› 13.6.2 Architecture: Bedrock-Native Inference

› 13.6.3 Senior Setup: From Zero to Production

› 13.6.4 The Mantle Endpoint

› 13.6.5 Real-World Pattern: Q2 Code (Banking)

› 13.6.6 What This Means for Your Career

13.7 AWS MCP Server: Least-Privilege IAM and Operational Guardrails

› 13.7.1 The Blast Radius Problem

› 13.7.2 Three Deployment Tiers

› 13.7.3 Ready-to-Use IAM Templates

› 13.7.4 Defense in Depth: Five Layers

› 13.7.5 Agent vs Human: IAM Condition Keys

› 13.7.6 CloudTrail and Real-Time Alarms

› 13.7.7 PreToolUse Anti-Destructive Hook in Claude Code

Section 14 — AI-Assisted GitOps

ArgoCD, Flux and Intelligent Automation

Foundations

14.1 What is GitOps (Recap)

14.2 Where AI Adds Value in GitOps Flow

14.3 Automated IaC PR Review

Practice

14.4 Manifest Generation with Claude

14.5 Drift Detection and Correction

14.6 Integration with ArgoCD and Flux

Automation

14.7 Production-Ready Prompts

14.8 Where to Place Prompts and How to Automate

14.9 Guardrails: What NOT to Automate

Section 15 — Governance and Organizational Adoption

How to Scale AI in DevOps at the Enterprise

15.1 The Problem of Uncoordinated Adoption

15.2 Governance Architecture: Repositories and Structure

15.3 CLAUDE.md: The Single Source of Truth

15.4 Adoption Architecture: People and Processes

Section 16 — Conclusion and the Future of DevOps with AI

Future Vision and Your Action Plan

16.0 What You've Seen in This Guide

16.1 What Changed (and What Didn't)

16.2 What's Coming in the Next 2-3 Years

16.3 How to Prepare (Practical Actions)

16.4 The Real Risks (Not the Hype)

16.5 The Inconvenient Truth

16.6 Your 6-Month Plan

16.7 The Final Principle

16.8 You're Ready. Start Tomorrow.

Part I: Fundamentals and Context

Section 01 — The Silent Revolution of DevOps

Section 02 — AI Fundamentals for DevOps

Section 03 — The Modern IDE for DevOps with AI

Setup & Configuration

AI Capabilities

Advanced & MCP

Section 04 — AI Agents for DevOps

Concepts

Guardrails & Debugging

Practice & Teams

Part II: Infrastructure as Code

Section 05 — Terraform with AI

Foundations & MCP

Practice & Safety

Agents & Advanced

Section 06 — AWS with AI: Deep Dive

Section 07 — Kubernetes with AI

Core & MCP

Policies & Scaling

Agents & Advanced

Part III: DevOps Practices with AI

Section 08 — CI/CD with AI

Foundations

Generation & Security

Advanced & Automation

Section 09 — Observability and Incidents

Foundations

Architecture & Agents

MCP & Observability

Part IV: Specializations

Section 10 — Container and Kubernetes Security

Detection

Action

Configuration

Section 11 — FinOps: Cost Optimization with AI

Section 12 — Runbook RAG

Foundations & Stack

Integration & Metrics

Part V: Governance and Adoption

Section 13 — Security, Guardrails and Professional Use

Section 14 — AI-Assisted GitOps

Foundations

Practice

Automation

Section 15 — Governance and Organizational Adoption

Part VI: Conclusion and Future

Section 16 — Conclusion and the Future of DevOps with AI