Unrestricted Public Registry

Every AI Agent Failure, Documented.

A structured public ledger for AI agent incidents. Submit anonymously. Every case numbered, tagged, and searchable. Built so the next team doesn't make the same mistake.

26 Cases Filed · $3.4M Estimated Damage · 18 Agents Implicated
APM-0023 · Cursor · MODERATE · Apr 30, 2026

Cursor agent deleted .env file and committed empty replacement to git

I asked Cursor to clean up the project root directory. The agent identified .env as an unnecessary file (it wasn't tracked in git) and deleted it, then created an empty .env placeholder and committed it. All local environment variables were lost. The production deployment that ran immediately after had missing API keys and went down for 20 minutes before I noticed. I had to reconstruct the .env from memory and other team members' machines.
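One mitigation is a deny-list the agent's file tools consult before any destructive operation. A minimal sketch (the pattern list and `safe_delete` wrapper are hypothetical, not part of any agent's real API):

```python
from fnmatch import fnmatch

# Hypothetical deny-list: files an agent must never delete,
# even when they are untracked in git.
PROTECTED_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "id_rsa*"]

def is_protected(path: str) -> bool:
    """Return True if the path's basename matches a protected pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pat) for pat in PROTECTED_PATTERNS)

def safe_delete(path: str, delete_fn) -> bool:
    """Refuse to delete protected files; otherwise delegate to delete_fn."""
    if is_protected(path):
        return False  # blocked: require explicit human approval instead
    delete_fn(path)
    return True
```

The key point is that "not tracked in git" is often a signal a file is *more* important (secrets are untracked on purpose), so untracked status alone should never justify deletion.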

APM-0025 · Cursor · SEVERE · ~$8k · Apr 27, 2026

Cursor agent rewrote entire authentication module without being asked

A developer asked Cursor to 'clean up the login page styling'. The agent interpreted this as permission to refactor the entire authentication stack. It deleted the existing OAuth implementation, rewrote session management from scratch, and committed 47 files across 6 modules. The new code had subtle token validation bugs that only appeared in production. Rolling back took 4 hours and the incident caused 2 hours of user-facing login failures affecting 12,000 active users.

APM-0024 · Devin · CRITICAL · ~$15k · Apr 25, 2026

Devin deleted all feature branches after misreading cleanup instructions

A senior engineer asked Devin to 'clean up old stale branches in the repo'. Devin queried all branches, identified any branch without a commit in the last 30 days as stale, and deleted 34 branches — including 8 active feature branches that happened to not have recent commits because developers were on vacation. Three branches contained 2-3 weeks of work each with no remote backup. Git reflog recovery salvaged most code but two branches were irrecoverable. Estimated 6 developer-weeks of work at risk.
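Age alone is weak evidence of abandonment. A sketch of a safer staleness check (field names and thresholds are illustrative): a branch is auto-deletable only if it is stale *and* merged *and* backed up on the remote; everything else is escalated to a human.

```python
from datetime import datetime, timedelta

def classify_branches(branches, now, max_age_days=30):
    """Split stale branches into auto-deletable vs. needs-human-review.
    A branch with unmerged or un-backed-up work is never auto-deleted."""
    cutoff = now - timedelta(days=max_age_days)
    deletable, needs_review = [], []
    for b in branches:
        if b["last_commit"] >= cutoff:
            continue  # recently active: leave alone
        if b["merged"] and b["on_remote"]:
            deletable.append(b["name"])
        else:
            needs_review.append(b["name"])  # escalate, do not delete
    return deletable, needs_review
```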

APM-0004 · Claude · SEVERE · ~$11k · Apr 24, 2026

Claude agent booked 14 duplicate flights while attempting to reschedule one trip

A travel assistant built on Claude was given access to a booking API. The user asked it to reschedule an upcoming flight to a day earlier. The agent made repeated API calls — each time interpreting the previous booking as a failed attempt when it was actually confirmed. After 14 booking attempts, the user had 14 confirmed tickets on the same route totaling $11,200 in charges. The airline's API had no idempotency key and the agent had no retry deduplication logic. Refunds took 3 weeks.
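When the remote API offers no idempotency key, the client side can still deduplicate: one key per logical booking, and a local cache so retries return the prior confirmation instead of placing a new order. A minimal sketch (the `BookingClient` class and API shape are hypothetical):

```python
class BookingClient:
    """Client-side idempotency wrapper for a booking API: the same
    idempotency key always returns the cached confirmation, so a
    retry loop can never double-book."""
    def __init__(self, api_call):
        self._api_call = api_call
        self._completed = {}  # idempotency_key -> confirmation

    def book(self, flight: str, idempotency_key: str):
        if idempotency_key in self._completed:
            # Dedup: the earlier attempt already succeeded.
            return self._completed[idempotency_key]
        confirmation = self._api_call(flight, idempotency_key)
        self._completed[idempotency_key] = confirmation
        return confirmation
```

The agent's core error was treating an ambiguous response as failure; a persistent key per *intent* ("reschedule trip X") rather than per *attempt* makes repeated calls harmless.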

expensive-mistake · via @travel_dev_anon

APM-0005 · GitHub Copilot · CRITICAL · ~$120k · Apr 24, 2026

GitHub Copilot Workspace merged conflicting migrations that corrupted production schema

Two developers were working in parallel on database migrations using Copilot Workspace. Copilot auto-resolved the merge conflict between their migration files by combining both — resulting in a migration that ran ALTER TABLE statements in an order that violated foreign key constraints. The migration ran successfully in staging (empty DB) but caused a cascade of constraint violations in production when approximately 2.3 million rows failed to migrate. Database restore from backup took 6 hours of downtime.

APM-0026 · GPT-4 · CRITICAL · ~$50k · Apr 21, 2026

GPT-4 assistant sent draft legal notice to opposing counsel instead of internal team

A paralegal used a GPT-4 powered assistant to draft a legal notice for internal review. When asked to 'send it to the team for review', the assistant resolved 'the team' using the email thread context — which included opposing counsel from a recent email chain. The draft legal notice, containing settlement strategy and internal legal assessment, was sent to the opposing party's lawyers. The law firm had to immediately notify their client and the incident required emergency containment. Legal exposure was significant.

APM-0006 · Replit Agent · MODERATE · ~$3k · Apr 21, 2026

Replit agent spun up 40 concurrent workers and exhausted cloud budget in 3 hours

A developer asked the Replit agent to 'make the data processing pipeline faster using parallelism'. The agent refactored the pipeline to use 40 concurrent workers, each spawning a cloud function. The developer stepped away for lunch. When they returned 3 hours later, the pipeline had processed 4 datasets but had consumed $2,800 in cloud compute — exhausting the team's entire monthly budget. There were no cost guardrails configured and the agent had no built-in spend awareness.
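A spend guardrail belongs outside the agent, not inside its prompt. A minimal sketch (class name and figures are hypothetical) that caps concurrency and refuses work *before* the budget is exceeded:

```python
class BudgetGuard:
    """Tracks estimated spend and caps worker concurrency; raises before
    the budget would be exceeded rather than after the bill arrives."""
    def __init__(self, budget_usd: float, max_workers: int):
        self.budget_usd = budget_usd
        self.max_workers = max_workers
        self.spent = 0.0

    def check_workers(self, requested: int) -> int:
        """Clamp the agent's requested parallelism to the configured cap."""
        return min(requested, self.max_workers)

    def record(self, cost_usd: float) -> None:
        """Account for a unit of work; halt the pipeline at the limit."""
        if self.spent + cost_usd > self.budget_usd:
            raise RuntimeError("budget exhausted: halting pipeline")
        self.spent += cost_usd
```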

APM-0022 · Claude · SEVERE · ~$7k · Apr 19, 2026

Claude agent unsubscribed user from all email lists including critical security alerts

A user asked a Claude-powered email management agent to 'unsubscribe me from all the marketing emails I keep getting'. The agent processed all emails with 'unsubscribe' links in the footer — including cloud provider billing alerts, security incident notifications, domain expiry warnings, and two-factor authentication setup emails that used a similar footer format. Three weeks later, the user's domain expired (renewal notice had been missed) and they missed a critical security alert about unauthorized access to their AWS account.

APM-0021 · GPT-4 · SEVERE · ~$40k · Apr 19, 2026

GPT-4 powered chatbot revealed other users' order details due to context bleed

An e-commerce company deployed a GPT-4 customer service bot. Due to a prompt engineering error, the system prompt included a 'recent orders' context block that was shared across sessions and not properly isolated per user. When customers asked about their orders, the bot would sometimes reference order details from other users whose queries had been in the shared context window. Over 3 days, approximately 140 customers received responses containing another customer's name, address, or order details. GDPR breach notification was required.
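The underlying fix is structural: context must be stored and rendered per session, never in a shared prompt block. A minimal sketch of the isolation pattern (class and method names are hypothetical):

```python
class SessionContext:
    """Per-session context store: each user's order data lives under their
    own session id, so one session's prompt can never contain another's."""
    def __init__(self):
        self._store = {}  # session_id -> list of context entries

    def add(self, session_id: str, entry: str) -> None:
        self._store.setdefault(session_id, []).append(entry)

    def prompt_block(self, session_id: str) -> str:
        # Only this session's entries are ever rendered into the prompt.
        return "\n".join(self._store.get(session_id, []))
```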

APM-0017 · n8n AI Agent · MODERATE · Apr 19, 2026

n8n AI agent workflow looped invoice sending and billed client 91 times in one night

A freelancer built an n8n workflow with an AI agent node to automate invoice sending. The workflow was triggered by a webhook and included a 'confirm invoice was received' step that polled the client's email for a reply. Due to a logic error in the AI node's loop condition, the workflow kept resending the invoice every 3 minutes throughout the night when no reply was received. By morning, the client had received 91 invoices totaling $182,000 (91x the $2,000 invoice). The client's email system had flagged the sender as spam and blocked further communication.
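Any retry loop driven by an external condition ("did the client reply?") needs an explicit attempt cap with a human-escalation path. A minimal sketch of the termination logic an n8n loop node would also need (function names are hypothetical):

```python
def send_with_confirmation(send_fn, reply_received, max_attempts=3):
    """Resend at most max_attempts times; if no reply ever arrives,
    stop and escalate to a human instead of looping all night."""
    for attempt in range(1, max_attempts + 1):
        send_fn(attempt)
        if reply_received():
            return "confirmed"
    return "escalate"  # hard stop: never unbounded resends
```

Silence from the recipient is not evidence the message failed to send; a spam filter can make the loop's "retry on no reply" condition permanently true.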

APM-0008 · OpenAI Assistants API · SEVERE · ~$18k · Apr 19, 2026

OpenAI Assistants API agent recursively generated 8GB of log files in 20 minutes

An internal operations agent built on the Assistants API was tasked with diagnosing a slow database query. Its tool use included the ability to run shell commands on a bastion host. The agent decided to enable verbose query logging to diagnose the issue, then looped on 'check if the issue is resolved' — re-running the slow query and logging each attempt. After 20 minutes, 8GB of logs had been written to the /var partition, filling the disk. This caused the primary web server to stop accepting writes, resulting in a 40-minute outage.
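A disk-space precondition on any tool that enables verbose logging would have stopped this early. A minimal sketch (function name, path, and threshold are illustrative; `free_fn` is injectable so the check is testable):

```python
import shutil

def can_enable_verbose_logging(path="/var", min_free_bytes=5 * 1024**3,
                               free_fn=None):
    """Gate verbose logging on available disk space; refuse when the
    target partition is low. free_fn defaults to the real check."""
    free = free_fn(path) if free_fn else shutil.disk_usage(path).free
    return free >= min_free_bytes
```

The same precondition should re-run inside any diagnostic loop, since it is the loop itself, not the single action, that fills the disk.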

APM-0016 · AutoGPT · MODERATE · Apr 18, 2026

AutoGPT submitted 200 job applications on behalf of user without final confirmation

A user configured AutoGPT to help with job searching. They provided their resume, preferences, and LinkedIn credentials. The agent was told to 'apply to suitable software engineering roles'. Without any human-in-the-loop confirmation, AutoGPT applied to 200 positions over 48 hours — including senior roles the candidate was underqualified for, positions at the user's current employer's direct competitors (visible on LinkedIn), and two roles at companies where the user had previously been rejected. Several applications included a cover letter hallucinated with incorrect employment history.

social-blunder · via @jobseeker_anon

APM-0007 · Gemini · SEVERE · ~$25k · Apr 17, 2026

Gemini agent emailed entire customer database a test message with debug headers

A marketing engineer was testing a new email campaign integration with a Gemini-powered automation agent. They asked it to 'send a test email to verify the setup'. The agent, interpreting 'test the setup' literally, sent a test email to all 47,000 contacts in the connected CRM — each email containing visible debug headers including internal API keys, database table names, and the phrase '[DEBUG MODE] DO NOT SEND TO REAL USERS'. The team received over 300 complaint emails within the hour. GDPR notification procedures were triggered.
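A hard recipient allowlist in test mode makes this class of mistake impossible regardless of how the agent interprets the request. A minimal sketch (the domain name is hypothetical):

```python
SAFE_TEST_DOMAIN = "internal.example.com"  # hypothetical allowlisted domain

def filter_test_recipients(recipients):
    """In test mode, drop every address outside the safe internal domain
    so a 'test send' can never fan out to real customers."""
    suffix = "@" + SAFE_TEST_DOMAIN
    return [r for r in recipients if r.endswith(suffix)]
```

The filter belongs in the sending integration, below the agent, so no prompt interpretation can bypass it.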

APM-0014 · Zapier AI · SEVERE · ~$22k · Apr 15, 2026

Zapier AI agent added 15,000 random contacts to CRM from scraped LinkedIn data

A sales ops manager used Zapier's AI agent to 'find and add potential leads to the CRM'. The agent, connected to a web scraping integration, pulled 15,000 LinkedIn profiles matching a broad keyword search and bulk-imported them into Salesforce. The import overloaded the CRM's deduplication engine, corrupted 3,400 existing contact records, and triggered 15,000 automated onboarding emails to people who had never interacted with the company. LinkedIn's terms of service were violated and the company received a cease-and-desist letter.

APM-0020 · Perplexity · SEVERE · ~$500k · Apr 13, 2026

Perplexity research agent cited retracted paper as primary evidence in medical report

A clinical research team used a Perplexity-powered agent to compile a literature review on a new treatment protocol. The agent cited a 2019 paper as key supporting evidence for efficacy claims. The paper had been retracted in 2022 due to data fabrication, but Perplexity's index had not been updated to reflect the retraction. The literature review was included in a grant application submitted to NIH. The NIH review panel flagged the retracted citation, which called into question the entire application's rigor. The grant was denied and the team lost 6 months of work.

APM-0011 · LangChain Agent · CRITICAL · Apr 13, 2026

LangChain agent published internal pricing spreadsheet to public S3 bucket

A LangChain-based document processing agent was given access to both an internal SharePoint and an AWS S3 bucket used for public assets. A business analyst asked it to 'move the Q3 pricing docs to S3 so the sales team can access them easily'. The agent moved all documents with 'pricing' in the filename — including a master pricing strategy document and competitor analysis — to the public-facing S3 bucket with public-read ACL. The files were indexed by Google within 6 hours. A competitor found them via search.
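A destination guard between the agent and storage would have failed closed here. A minimal sketch (bucket names and marker keywords are hypothetical, and keyword matching is deliberately crude):

```python
PUBLIC_BUCKETS = {"acme-public-assets"}            # hypothetical bucket name
CONFIDENTIAL_MARKERS = ("pricing", "strategy", "competitor")

def allow_upload(bucket: str, filename: str) -> bool:
    """Refuse to place confidential-looking files in any public bucket;
    uploads to private destinations pass through unrestricted."""
    if bucket not in PUBLIC_BUCKETS:
        return True
    name = filename.lower()
    return not any(marker in name for marker in CONFIDENTIAL_MARKERS)
```

A filename heuristic is a last line of defense, not a classifier; the stronger fix is never granting one agent simultaneous write access to both internal storage and a public-read bucket.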

APM-0012 · Azure OpenAI · CRITICAL · ~$2.3M · Apr 13, 2026

Azure OpenAI agent cancelled all pending vendor purchase orders during 'cleanup'

An enterprise procurement agent built on Azure OpenAI was given access to the company's ERP system. A procurement manager asked it to 'clear out the old pending items cluttering up the dashboard'. The agent interpreted all purchase orders in 'pending' status older than 90 days as candidates for cancellation — and cancelled 847 purchase orders totaling $2.3M in vendor commitments. Many of these were legitimate long-lead-time orders for manufacturing components. Re-placing the orders reset delivery timelines by months and some vendors charged re-order fees.