When AI Operates Your Infrastructure: Why Every Control Must Be Structural

An AI coding assistant deleted 2.5 years of production data in seconds. Every root cause was preventable — but not with the controls most teams have in place. Here is a five-layer model to help make destructive actions structurally impossible.


An AI coding assistant deleted 2.5 years of production data in seconds. The infrastructure was Terraform. The assistant was Claude Code. Every root cause was preventable — but not with the controls most teams have in place.

In March 2026, a developer named Alexey Grigorev used Claude Code to migrate a website’s infrastructure to AWS via Terraform. The agent was given broad permissions and began work without a valid state file. When the state file was later uploaded, the agent followed the reconciliation logic to its conclusion — issuing terraform destroy on a shared environment that contained a second, unrelated production website. The result: a full wipe of both sites, including an RDS database with 2.5 years of records and the backup snapshots that the operator had counted on for recovery.

Amazon Business Support restored the data within a day, but only because the cloud provider’s internal retention happened to still hold. That is not a recovery strategy. That is luck.

In his post-mortem, Grigorev described several corrective measures: applying delete protections, moving the state file to S3, and — crucially — stopping the agent from running Terraform commands altogether. He admitted he “over-relied on the AI agent to run Terraform commands.”

Most commentary framed this as a cautionary tale about AI agents gone wrong. It is more precisely a story about controls that were behavioural when they needed to be structural. Grigorev did not instruct the agent to destroy his production database. The agent arrived at terraform destroy as a logical step in its workflow. It was not malicious. It was not confused. It was doing exactly what Terraform does: converging current state to desired state, which — in the absence of a valid state file — meant destroying everything.

This is the core problem with AI-assisted infrastructure-as-code, and it is not specific to Claude Code, to Terraform, or to Grigorev’s setup. It is a property of the interaction between two systems: one that converges declaratively and one that executes literally.

This article covers an approach to that problem: the distinction between behavioural and structural controls, the five-layer defence model that helps prevent any single failure from cascading into data loss, and the specific engineering practices — from IAM deny policies to automated policy-as-code validation — that make the difference between a standard that is followed and a standard that is enforced.

Why AI-Assisted IaC Has a Unique Risk Profile

To understand why the Grigorev incident happened, we need to understand four properties of Terraform that interact badly with AI coding assistants.

Terraform is declarative and convergent. A terraform apply does not merely create resources. It converges current state to desired state. If the desired state has fewer resources than the current state, Terraform destroys the difference. This is not a bug — it is the fundamental design principle. But it means that a terraform apply can be a creation operation or a destruction operation, and the difference is only visible in the plan output.

State files are the ground truth. Terraform does not query AWS to find out what exists. It reads its state file. If the state file is missing, stale, or corrupt, Terraform believes nothing exists. An agent operating without a valid state file will produce a plan that creates duplicates or — when the state is later reconciled — destroys everything.

Destroy is indistinguishable from create in the command syntax. Both are terraform apply. The agent does not need to run a separate, obviously dangerous command. It runs the same command it runs for every change. The destruction is embedded in the plan, not in the command.
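To make that concrete, here is an illustrative fragment of what a destructive plan looks like. The command that produced it is the same terraform plan used for every routine change; only the output reveals the destruction (resource names are illustrative):

```text
  # aws_db_instance.prod will be destroyed
  - resource "aws_db_instance" "prod" {
      - identifier = "prod-db" -> null
      ...
    }

Plan: 0 to add, 0 to change, 1 to destroy.
```

The single summary line at the bottom is the only place the blast radius is visible before execution.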

AI agents execute literally. If an agent is asked to “clean up duplicate resources” and it has a destroy path available, it will take it. It does not understand that an RDS database represents 2.5 years of business records. It does not feel the weight of that. It has no muscle memory caution, no fear of consequences, no contextual knowledge that says “this is the one you do not touch.” It sees a Terraform state file, a desired configuration, and a convergence path. It follows the path.

A human engineer with the same access can cause the same damage. But a human is slowed by context: they know the database matters, they hesitate before running destructive commands, they double-check the environment. An AI agent has none of these brakes. The controls must therefore be structural — encoded in IAM policies, account boundaries, and automated gates — not behavioural. A behavioural control asks the agent to comply. A structural control makes non-compliance physically impossible.

That distinction — behavioural versus structural — is the central engineering insight that runs through everything that follows.

Behavioural Controls vs. Structural Controls

This is the taxonomy that most AI tooling governance frameworks get wrong.

A behavioural control sets expectations and relies on the agent choosing to comply. Examples: a CLAUDE.md file that lists forbidden commands, a system prompt that says “never run terraform destroy,” a policy document that asks engineers not to give the agent production access. Behavioural controls are useful for documentation, for setting norms, for communicating intent. They are not enforcement mechanisms.

A structural control operates at the system level and cannot be bypassed by the agent’s reasoning, a different prompt, a tool update, or a configuration change. Examples: an IAM role that lacks write permissions (the agent physically cannot apply or destroy because the AWS API rejects the call), a Service Control Policy that denies destructive actions on an entire AWS account, a CI pipeline gate that automatically rejects plans containing stateful resource deletions.

The failure mode of a behavioural control is that the agent reasons around it. “I know I’m not supposed to run terraform destroy, but the state file shows resources that shouldn’t exist, and the most logical way to clean them up is to destroy and recreate.” That is exactly what happened in the Grigorev incident. The agent did not ignore an instruction — it followed a logical path that happened to include a destructive operation.

The failure mode of a structural control is that the AWS API returns Access Denied. There is no reasoning around that. There should be no logical path that bypasses an IAM deny policy. The control operates at a layer the agent cannot reach and is not supposed to reach.

This does not mean behavioural controls are useless. A CLAUDE.md file that explicitly lists forbidden commands, explains the environment hierarchy, and includes an uncertainty principle (“if you are unsure whether an action is destructive, stop and ask the human”) materially reduces the probability of an AI agent taking a destructive action. It is a valuable first layer. But it cannot be the only layer, because it relies on the agent’s compliance — and compliance is a behavioural property, not a structural one.

The engineering principle: require at least one structural control that makes the destructive action impossible, in addition to any behavioural controls that make it unlikely. Both matter.

The Five Layers of Defence

At QualitaX, our approach is to implement five layers of protection for infrastructure managed with AI-assisted tooling. Each layer operates independently. No single failure — of a person, a tool, a policy, or a configuration — can cascade into production data loss.

Layer 1: IAM Deny Policies and Service Control Policies. This is the AWS API boundary — the outermost wall. A deny policy explicitly rejects destructive API calls (rds:DeleteDBInstance, rds:DeleteDBCluster, rds:DeleteDBSnapshot, s3:DeleteBucket, elasticache:DeleteReplicationGroup) on production resources. In a multi-account architecture, the SCP is attached to the production account’s organisational unit and applies unconditionally — everything in the production account is production by definition, so no tag-based conditions are needed. This is the single most important control in the entire framework. Terraform’s prevent_destroy can be removed by editing code. AWS deletion_protection can be toggled by an API call. A CLAUDE.md instruction can be ignored by a future tool. An IAM deny policy cannot be bypassed without AWS account administrator access.
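As a sketch — the exact action list should be tailored to your estate — such an SCP might look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDestructionOfStatefulResources",
      "Effect": "Deny",
      "Action": [
        "rds:DeleteDBInstance",
        "rds:DeleteDBCluster",
        "rds:DeleteDBSnapshot",
        "s3:DeleteBucket",
        "elasticache:DeleteReplicationGroup"
      ],
      "Resource": "*"
    }
  ]
}
```

Attached to the production organisational unit, the deny applies to every principal in the account — including the account’s own administrators — which is exactly the unconditional behaviour described above.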

Layer 2: Account-Level Isolation. Ideally, dev, staging, production, and backup exist as separate AWS accounts within an AWS Organization. Credentials issued in the dev account physically cannot enumerate, access, or modify resources in the production account. This is not directory-level isolation within a shared account — it is physical isolation at the AWS identity boundary. A mis-scoped terraform destroy in the dev account cannot wipe production because the dev account’s credentials do not resolve to production resources. The blast radius of any single credential compromise is bounded by the account boundary.

Layer 3: Policy-as-Code Validation. The CI pipeline runs automated plan validation — using OPA/Rego, terraform-compliance, or when required HashiCorp Sentinel — before any human sees the plan. The policy engine parses terraform show -json plan.tfplan and applies hard deny rules: any plan that destroys or replaces a stateful resource (aws_db_instance, aws_rds_cluster, aws_s3_bucket) is rejected. Any plan that weakens a protection setting (deletion_protection, skip_final_snapshot, backup_retention_period) on a production resource is rejected. Any plan that removes or modifies an IAM deny policy protecting production resources — the policy that guards the guard — is rejected. This layer exists because human plan review is necessary but not sufficient. A human reviewing a 1,000-line plan will miss a single - (destroy marker) hidden in the noise. The policy engine will not.

Layer 4: Terraform Lifecycle Protections. lifecycle { prevent_destroy = true } on production stateful resources causes the plan itself to fail if a destroy is attempted. deletion_protection = true on RDS instances causes the AWS API to reject deletion even if Terraform tries. skip_final_snapshot = false ensures a snapshot is taken if the instance is ever deleted. These are the Terraform-native controls that would have individually prevented the Grigorev data loss. In our framework, they are the fourth layer — not the first — because each can be weakened by editing code and submitting a PR. The layers above catch that PR before it reaches production.
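In Terraform, these three protections look roughly like this (resource name, engine, and sizes are illustrative, and required arguments such as storage and credentials are elided):

```hcl
resource "aws_db_instance" "prod" {
  identifier     = "prod-db"        # illustrative
  engine         = "postgres"
  instance_class = "db.t3.medium"   # illustrative

  deletion_protection       = true            # AWS API rejects DeleteDBInstance
  skip_final_snapshot       = false           # a snapshot is taken if ever deleted
  final_snapshot_identifier = "prod-db-final" # required when skip_final_snapshot = false
  backup_retention_period   = 30

  lifecycle {
    prevent_destroy = true  # the plan itself errors if a destroy is attempted
  }
}
```

Each line here can be reverted by a one-line PR — which is precisely why this is Layer 4 and not Layer 1.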

Layer 5: Human Execution Gate. No terraform apply runs without human review and explicit approval. The AI agent may run terraform plan and present the output. The human runs terraform apply after reviewing the plan. This is non-negotiable for all environments, including dev. The agent plans. The human applies. Always.

The defence-in-depth model means that for production data to be destroyed, all five layers must fail simultaneously: the IAM deny policy must be absent, the account isolation must be breached, the policy-as-code gate must be misconfigured, the Terraform lifecycle protections must be removed, and the human reviewer must approve the destruction. Each layer is designed to catch the failure of the layers below it.

The CLAUDE.md: A Behavioural Layer Done Right

We said behavioural controls are insufficient alone. That does not mean they are unimportant. A well-written CLAUDE.md file — the project-level instruction file that constrains Claude Code’s behaviour — materially reduces the probability of an agent attempting a destructive action, which reduces the load on the structural controls that catch it.

A CLAUDE.md template for infrastructure work should contain four sections.

Forbidden commands. An explicit, enumerated list of commands the agent must never run. Not “don’t do dangerous things” — a precise list: terraform apply, terraform destroy, terraform state rm, terraform state mv, terraform import, terraform force-unlock, terraform taint, and every aws CLI command containing delete, destroy, terminate, or remove. Specificity matters because the agent matches against concrete patterns, not abstract intentions.

Permitted commands. An equally explicit list of what the agent may do: terraform init, terraform plan, terraform validate, terraform fmt, terraform state list (read-only), terraform output (read-only), aws CLI read-only commands. The permitted list is as important as the forbidden list — it gives the agent a clear operating space rather than leaving it to infer what is safe.

The session start checklist. Before running any Terraform command: verify the correct environment directory, run terraform init to confirm backend connectivity, run terraform state list to confirm state is populated. If state is empty or init fails, stop. This directly addresses the Grigorev root cause — the agent operated without a valid state file. The checklist catches that condition before any plan is generated.

The uncertainty principle. “If you are uncertain whether an action is destructive, STOP and ask the human. Do not attempt to determine destructiveness programmatically or by reasoning about the command.” This closes the gap where the agent might reason that a particular operation is safe because it does not match the forbidden commands list — even though it has destructive side effects (a parameter change that triggers resource replacement, for instance). The instruction to stop on uncertainty, rather than proceed on best-effort reasoning, is a meaningful behavioural safeguard.
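Put together, a minimal CLAUDE.md skeleton covering the four sections might look like this (abridged — the real file should enumerate every command):

```markdown
## Forbidden commands — never run these
- terraform apply, terraform destroy
- terraform state rm, terraform state mv, terraform import
- terraform taint, terraform force-unlock
- any `aws` CLI command containing delete, destroy, terminate, or remove

## Permitted commands
- terraform init, terraform plan, terraform validate, terraform fmt
- terraform state list, terraform output (read-only)
- read-only `aws` CLI commands (describe-*, get-*, list-*)

## Session start checklist
1. Verify you are in the correct environment directory.
2. Run `terraform init` — if the backend is unreachable, STOP.
3. Run `terraform state list` — if state is empty, STOP.

## Uncertainty principle
If you are unsure whether an action is destructive, STOP and ask the human.
Do not reason your way to "this is probably safe".
```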

The CLAUDE.md is a behavioural control. It relies on the agent choosing to comply. But it is a well-designed behavioural control: specific, enumerated, and anchored by an uncertainty principle that defaults to stopping rather than proceeding. Combined with the structural controls above it, it creates a layered defence where the probability of a destructive action succeeding is vanishingly small.

Automated Policy-as-Code: The Gate Humans Cannot Match

Human plan review is a judgement-based process. A human reviewer reads a Terraform plan, evaluates whether the changes are intentional, checks for unexpected destroys, and approves or rejects. This is valuable and irreplaceable — human judgement catches context that no automated rule can express.

But human plan review has a structural weakness: humans get tired, and Terraform plans can be long. A plan that creates 40 resources and destroys 1 stateful resource looks, at a glance, like a creation plan. The single destroy marker is there — buried on line 847 of the plan output. A fresh reviewer on a Monday morning will catch it. The same reviewer at 5pm on a Friday, after three other PR reviews, may not.

Policy-as-code validation addresses this structural weakness. It is not a replacement for human review — it is a pre-filter that catches the categories of error that humans are worst at detecting: a single dangerous change in a large plan, a protection setting weakened from true to false, a stateful resource being replaced (which is destruction followed by creation — the data loss is identical).

The implementation uses terraform show -json plan.tfplan to produce a machine-readable plan, then evaluates Rego policies (or equivalent) against the JSON. The policies are specific:

# DENY: destruction of stateful resources
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_db_instance"
    resource.change.actions[_] == "delete"
    msg := sprintf("BLOCKED: Plan deletes stateful resource %s (%s). Requires POLICY-OVERRIDE with justification.", [resource.address, resource.type])
}

# DENY: weakening of deletion protection
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_db_instance"
    resource.change.actions[_] == "update"
    resource.change.after.deletion_protection == false
    resource.change.before.deletion_protection == true
    msg := sprintf(
        "BLOCKED: Plan weakens deletion_protection on %s from true to false.",
        [resource.address]
    )
}

The policy gate is a hard fail in the CI pipeline. A PR that triggers a deny rule cannot be merged — the branch protection rule blocks it at the GitHub level. The human reviewer never sees a plan that the policy engine has rejected, which means the human’s review time is spent on the changes that actually require judgement rather than on catching mechanical violations.

Policy overrides exist, because an automated gate that cannot be overridden will be circumvented. But the override mechanism is auditable: a # POLICY-OVERRIDE: <reason> annotation committed alongside the change, checked by the policy engine, and visible in the PR review. The override is a conscious, documented decision — not a silent bypass.

Immutable Backups: The Last Line That Cannot Be Deleted

The Grigorev incident had a secondary failure that is easy to overlook: the RDS snapshots were destroyed in the same operation that destroyed the instance. The operator had counted on snapshots as a recovery mechanism. The snapshots were co-located with the instance — same account, same region, same Terraform state. When the destroy ran, the snapshots went down with the instance.

This is a common architectural assumption that fails catastrophically: “our backups are in the same account as the thing being backed up.” In that architecture, any credential with sufficient permissions — including a CI/CD role, an AI agent role, or an over-permissioned operator role — can delete both the resource and its backups in a single operation. The backup provides zero protection against the threat model that matters most: a destructive action by an authorised actor (human or AI) with broad permissions.

The solution is two-part: cross-account replication and immutable storage.

Cross-account replication sends automated backup copies to a separate AWS account — the backup account — in a different region. No role in the production account can delete resources in the backup account. The blast radius of a production account compromise or a destructive Terraform operation is bounded by the account boundary. The backups survive because they exist in a place the destructive action cannot reach.

AWS Backup Vault Lock applies a WORM (Write Once Read Many) policy to the backup vault. Once the lock is engaged (after a short ChangeableForDays configuration window), the policy becomes immutable. No one — not the account root user, not an IAM administrator, not an AI agent with broad permissions, not even AWS Support — can delete or modify backups before the retention period expires. This is the only control that makes backups truly tamper-proof.
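In Terraform, engaging Vault Lock is a small amount of configuration (vault name and retention periods are illustrative):

```hcl
resource "aws_backup_vault" "immutable" {
  name = "prod-backup-vault" # illustrative name, created in the backup account
}

resource "aws_backup_vault_lock_configuration" "worm" {
  backup_vault_name   = aws_backup_vault.immutable.name
  changeable_for_days = 3   # after this window, the lock is permanent
  min_retention_days  = 35  # no principal can delete backups younger than this
  max_retention_days  = 365
}
```

The changeable_for_days window is the last moment to reconsider — once it elapses, not even the account root user can loosen the retention policy.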

The combination is the correct architecture for any team whose production data matters: backups replicated to a separate account where the production environment cannot reach them, stored in a vault that even the backup account’s administrator cannot delete from. This is the difference between “we have backups” and “we have backups that survive the worst-case scenario.” The worst case is not a disk failure — it is an authorised actor (human or AI) with broad permissions executing a destructive operation that reaches both the resource and its backups. Cross-account Vault Lock is an architecture where that operation fails.

For regulated industries — banking, healthcare, legal — the calculus is even starker. Immutable backups are not just good engineering; they are evidence that can be presented to a regulator that recovery capability cannot be tampered with by anyone inside the organisation. But the engineering rationale stands on its own: if your backups can be deleted by the same credentials that can delete your production database, they are not backups. They are a copy that shares a fate with the original.

The Break-Glass Procedure: Because Every Framework Needs an Exception Path

A control framework without a documented exception path is a control framework that will be bypassed undocumented during an emergency.

Production is down. The fix requires a Terraform apply. The normal approval chain — Technology Delivery Lead reviews the plan, approves, the engineer applies — cannot be completed within the incident’s recovery time target. What happens?

Without a documented procedure: the engineer either waits (extending the outage) or bypasses controls (creating an unauditable gap). Both outcomes are worse than a structured exception.

With a documented break-glass procedure: the engineer declares the emergency in an incident channel, receives authorisation from the next person in the escalation chain, executes the change with a saved plan file, logs every detail (who authorised, who executed, what was changed, when, why), revokes any temporary credentials within an hour, and conducts a post-incident review within 48 hours.

The break-glass procedure is not a weakening of controls. It is a recognition that controls exist in a world where production incidents happen, and that a controlled exception path — auditable, time-limited, subject to post-incident review — is safer than an uncontrolled bypass.

This applies to any team with production infrastructure, not just regulated industries. The difference between “we bypassed controls during an outage” and “we followed the break-glass procedure during an outage” is the difference between an audit gap and an auditable event. The first erodes trust in the control framework. The second demonstrates that the framework was designed for reality.

In regulated industries — banking, financial services, healthcare — break-glass procedures are a standard component of operational resilience frameworks, and the absence of one is itself a regulatory finding. But the engineering rationale does not require a regulator to be compelling: if your team has ever bypassed a control during an emergency without documenting it, you need a break-glass procedure.

The Principle That Connects All of This

Looking across the five layers — IAM deny policies, account isolation, policy-as-code gates, Terraform lifecycle protections, human execution gates — and the supporting practices — the CLAUDE.md template, the break-glass procedure, immutable backups — a single principle connects them.

Controls for AI-assisted infrastructure must be structural, not behavioural. A behavioural control asks the agent to comply. A structural control aims at making non-compliance physically impossible. Both have a role.

This principle extends beyond Terraform. Any system where an AI agent operates on consequential infrastructure — databases, networking, identity, storage — requires the same analysis: where are the behavioural controls, where are the structural controls, and what happens if every behavioural control fails simultaneously? If the answer is “the structural controls still hold,” the architecture is sound. If the answer is “there are no structural controls,” the architecture is one logical reasoning step away from a Grigorev incident.

The Grigorev incident was less a failure of AI than a failure of controls. The AI did exactly what it was designed to do: converge state. The controls did not prevent that convergence from being destructive. That is an engineering problem, and it has an engineering solution.

The solution is not to stop using AI for infrastructure. Claude Code and equivalent tools genuinely accelerate Terraform development — module creation, plan analysis, configuration review, documentation. The solution is to ensure that the AI can plan but never apply, that the plan is validated by both a machine and a human, and that even if every control fails, the production data survives in an immutable vault in a separate account that the destructive action cannot reach.

That is what structural governance looks like. Not a policy document. Not a responsible AI framework. An architectural decision — made before the first resource is created — that no single person, tool, or misconfiguration can delete production data.

QualitaX builds agentic systems for B2B businesses. If your AI agents operate on production infrastructure — or you want structural controls in place before they do — get in touch.