Failure Categories

12 categories used to classify AI agent failure modes.

The agent produced, committed, or deployed broken, destructive, or insecure code.

The agent's actions resulted in significant unexpected financial costs.

The agent transmitted sensitive data to external or unintended destinations.

Messages, emails, or data were sent to unintended recipients.

The agent confidently generated false information, fabricated APIs, URLs, or facts.

The agent permanently deleted files, database records, or storage objects.

Credentials, secrets, or sensitive data were exposed or mishandled.

The agent caused embarrassment, reputation damage, or interpersonal harm.

The agent entered an unrecoverable loop, causing resource exhaustion or runaway costs.

The agent took actions far beyond the intended scope of the task.

The agent's actions violated legal, regulatory, or policy requirements.

The agent fundamentally misinterpreted a clear instruction and acted on the wrong assumption.