AI Agent Security Weekly | Issue 08
The tool layer is no longer neutral.
Theme: The tool layer is no longer neutral.
AI coding tools helped developers ship faster. They also doubled the rate at which credentials hit public GitHub. The same mechanism. The same layer. Productivity and liability running on identical rails.
This issue confirms it for agents. The framework plugin that made your agent more capable was the exploit primitive. The security scanner that finds vulnerabilities in your codebase needs the same read access as the attacker. The kill switch that finally gives enterprises a halt mechanism had to be invented because no standard one existed. The frontier model accessed by a Discord group through a contractor’s credentials on launch day was not a safeguard failure. It was a vendor chain failure.
Four stories. Four demonstrations that the tool layer your agents run on is simultaneously the attack surface your agents expose. The productivity tax is real and it is structural.
This Issue
Semantic Kernel CVE-2026-25592 + CVE-2026-26030: Prompt Injection Became Shell Access
ServiceNow Shipped the Kill Switch. The PocketOS Database Is Why.
Anthropic Shipped Mythos. A Discord Group Accessed It the Same Day via a Contractor.
Anthropic’s Security Tool Is an Agent With Codebase Read Access to Everyone
Intro
The enabling mechanism and the exposure mechanism are running on the same rails.
Semantic Kernel exposed execution helpers to the model context and a single injected prompt became host-level RCE. Claude Security carries the same codebase read access as the attacker it was deployed to find. ServiceNow had to build a kill switch because no standard halt mechanism existed before this week. Mythos was accessed on launch day through a contractor’s credentials because the vendor chain had no gate.
Four incidents. One structural condition.
Semantic Kernel CVE-2026-25592 + CVE-2026-26030: Prompt Injection Became Shell Access
Microsoft disclosed two patched vulnerabilities in its Semantic Kernel agent framework on May 7, 2026. CVE-2026-25592 affected the .NET SDK’s SessionsPythonPlugin, a helper designed to let agents execute Python inside Azure Container Apps sandboxes. The DownloadFileAsync function was accidentally exposed to the model as a callable kernel function, with its localFilePath parameter accepting unsanitized input. A hostile prompt could steer the agent into writing a file to any path on the host, bypassing the sandbox entirely. CVE-2026-26030 hit the Python SDK’s InMemoryVectorStore filter. A single injected prompt was enough to trigger remote code execution, demonstrated by launching calc.exe on the host machine running the agent. Both vulnerabilities were confirmed exploitable via prompt injection alone, with no additional access required. Microsoft’s fix deployed four layers of escape-primitive elimination. The disclosures were published by Microsoft Security Blog on the same day.
This is the prompt injection = RCE confirmation the security community has been warning about for two years. The attack surface here is not the model. It is the framework’s decision to expose file-write and code-execution tools to the model context without validation at the tool boundary. Every enterprise running Semantic Kernel-based agents before .NET SDK 1.71.0 or Python SDK 1.39.4 was one injected document away from host-level compromise. The broader signal: agent framework authors are building tool layers faster than they are auditing what they expose to the model. The model is not the threat. What the model can call is the threat.
What it breaks
Any agent built on Semantic Kernel before .NET SDK 1.71.0 or Python SDK 1.39.4 was exploitable via a single injected prompt with no additional credentials or access required, making every document, email, or user message the agent touches a potential exploit delivery vehicle
The sandbox designed to contain agent Python execution was bypassed not by breaking the sandbox but by calling a host file-write function the framework exposed to the model without a parameter validation gate, confirming that sandbox perimeter assumptions are invalid when the framework surface above them is unaudited
Every agent framework that exposes tool plugins to the model context without parameter-level validation carries the same structural flaw class: the productivity feature is the attack primitive
Standard prompt injection detection operates at the model output layer and cannot intercept a tool call where the injected path parameter has already been accepted as valid input by the framework: no alert fires until the file has been written to the host
EU AI Act Article 15 requires robustness against adversarial inputs for high-risk AI systems: a framework vulnerability that allows a single prompt to achieve host-level file write with no additional access constitutes an adversarial robustness failure that cannot be remediated by policy alone
What to do
Threat: Ungated Tool Execution: Semantic Kernel exposed file-write and code execution helpers to the model context without path validation or a call-gate at the tool boundary, turning the framework’s own plugin system into the exploit primitive with no intermediate control between model output and host filesystem access.
Start here: upgrade Semantic Kernel .NET SDK to 1.71.0 and Python SDK to 1.39.4 immediately, then audit every tool registered in your agent framework against one question: if the model calls this with adversarial input, what is the worst-case outcome? Tool Execution Playbook, 10 questions before an agent can click, run, or call. Note as priority entry point: covers parameter validation at the tool boundary, not just the registered tool list.
Then: audit your tool registry for any function that accepts a path, filename, or shell-interpretable string from model output. Each of those parameters is a potential injection landing zone. Tool Registration and Integrity Playbook, 17 questions to verify the tool your agent calls is still the tool you approved.
Operator connection: any agent framework plugin that bridges model output to host filesystem or process execution is a trust boundary that must be treated as adversarially reachable. Framework defaults are not security contracts.
Red team action: run Garak with the promptinjection and tool-calling presets against every agent endpoint that uses Semantic Kernel plugins. A successful tool-call injection that reaches a filesystem path or execution surface confirms the boundary is exploitable in your deployment.
Signal Breakdown
Architecture layer: L6 Tool and Integration Layer
Attack surface: Agent framework plugin system exposing host file-write and code execution to model context without parameter validation
Playbooks: Tool Execution Playbook / Tool Registration and Integrity Playbook
Red team: Garak, promptinjection and tool-calling presets
Frameworks: LLM01 (Prompt Injection) / ASI02 (Agent Authentication and Authorization Failure)
Source: Microsoft Security Blog, 2026-05-07
The threat analysis walks it in five steps. You see the attack surface. You see which playbooks map to the gap. You get the red team test that confirms whether your setup is exposed. You get a verdict.
It has never been this fast to go from a news story to a closed gap.
Open the analysis, run the five steps, and see where your setup lands
ServiceNow Shipped the Kill Switch. The PocketOS Database Is Why.
On May 5-6, 2026, ServiceNow announced AI Control Tower enforcement additions at its Knowledge 2026 event. The headline feature: a single-action kill switch that revokes permissions, deactivates any agent in the enterprise, generates a P1 security incident ticket, and produces a complete audit trail simultaneously. The product spans discovery (auto-cataloging every AI asset across all enterprise systems), real-time behavioral monitoring (detecting hallucinations, policy violations, and scope drift), and enforcement (pause, redirect, or stop any agent in one action). ServiceNow CEO Bill McDermott framed the kill switch explicitly in response to the PocketOS incident and its class of failures: agents with irreversible-action authority and no reliable pre-execution gate. AI Control Tower was offered free for one year (stated value $2 million) to any enterprise deploying it. Enforcement features are entering Innovation Lab in May 2026, with GA expected August 2026. AI Agent Advisor and Intelligent Approvals reach GA in May.
The kill switch is the product that exists because enterprises shipped agents without one. What ServiceNow is selling is the governance layer that should have been required before any agent touched production data, and it is being released after the failures, not before. The deeper signal is what the product reveals: there is currently no standard mechanism to stop an agent mid-execution in enterprise environments. The kill switch being a product announcement in May 2026 means it was an open gap until now. Every enterprise that deployed agents before this month was running without a reliable halt mechanism. The audit trail feature is equally revealing. If enterprises needed to build audit trail generation into the kill switch, it means they were operating agents with no forensic record of what ran.
What it breaks
Every enterprise that deployed agents before May 2026 was operating without a standardized mid-execution halt mechanism: the PocketOS database deletion happened in nine seconds because no pre-execution gate existed, and that gap was industry-wide not instance-specific
The absence of a forensic audit trail for agent decisions is confirmed as the enterprise norm: ServiceNow building audit trail generation into a kill switch product means the capability was not present in the orchestration layers enterprises were already running
Behavioral monitoring for hallucinations, policy violations, and scope drift is not yet standard in enterprise agent deployments: the Control Tower launch establishes these as a category of missing control, not a category of enhanced control
Existing SIEM and SOAR tooling was not built to ingest agent behavioral signals: the detection gap is architectural because agent execution does not produce the structured event types those tools are calibrated to parse
EU AI Act Article 14 requires human oversight mechanisms for high-risk AI systems capable of consequential action: a production deployment of an agent with irreversible-action authority and no halt mechanism is a direct Article 14 compliance gap regardless of whether enforcement has begun
What to do
Threat: Irreversible Action Without Halt: agents operating in enterprise environments had no standard pre-execution gate or mid-execution stop. The only reliable intervention point was before deployment, and most organizations skipped it. The PocketOS deletion confirms the class: nine seconds, production database, no recovery path.
Start here: map every agent in your environment against two questions: can it be stopped mid-execution, and does stopping it produce a recoverable state? If the answer to either is no, that agent has no business touching production data. Kill Switch Playbook, 10 questions to test your ability to stop agents under pressure. Note as priority entry point: covers the gap between the halt capability existing on paper and it being reachable at machine speed.
Then: implement a halt-and-audit contract at the agent runtime layer before expanding scope. Audit Trail Playbook, 15 questions to capture evidence when decisions are non-deterministic.
Operator connection: do not wait for ServiceNow GA in August. Build your pre-execution approval gate now using whatever orchestration layer you already run. The audit trail requirement is a data architecture question: you need the ability to reconstruct what an agent decided, what it called, and what state changed after the fact.
Red team action: use Burp Suite to intercept agent action calls before they reach production targets and verify that a halt signal at the orchestration layer actually stops execution before state changes. If the agent completes the action before the halt propagates, your kill switch is a logging feature, not a control.
Signal Breakdown
Architecture layer: L3 Governance and Control Plane
Attack surface: Absence of standardized mid-execution halt, forensic audit trail, and behavioral monitoring in enterprise agent deployments
Playbooks: Kill Switch Playbook / Audit Trail Playbook
Red team: Burp Suite, halt propagation timing test against orchestration layer
Frameworks: ASI08 (Inadequate Human Oversight of AI Actions) / ASI04 (Excessive Agent Permissions)
Source: Fortune, 2026-05-06
Anthropic Shipped Mythos. A Discord Group Accessed It the Same Day via a Contractor.
On the day Anthropic publicly launched the Mythos Preview frontier model, a system capable of autonomous multi-step exploit chains and finding thousands of zero-days across OS and browser targets, a Discord group identified its API endpoint by guessing from Anthropic’s published naming conventions. The group then exploited credentials belonging to a contractor, accessed Mythos through a third-party vendor environment, and ran queries against the model before detection. Anthropic confirmed it is investigating and said no impact to core systems was found. Separately, the Mythos launch disclosed that the model autonomously found and exploited CVE-2026-4747, a 17-year-old FreeBSD RCE, from an unauthenticated position, with no human in the loop after the initial prompt. The model also demonstrated working exploit chains in 83% of first attempts across a controlled vulnerability set.
The attack surface for a frontier-capability AI model is not the model’s safety training. It is the third-party vendor environment it passes through on the way to the model. Anthropic’s core systems were untouched. The contractor’s credentials were the gap. This is the AI vendor supply chain problem stated plainly: your trust boundary is only as strong as the weakest vendor with access to your infrastructure. The secondary signal is about threat model compression. Mythos can autonomously exploit a 17-year-old FreeBSD RCE with no human involvement. The assumption that exploit development is a rare skill requiring specialized knowledge is now false. The bar for offensive AI capability has dropped to: know the naming convention and have a contractor’s credentials.
What it breaks
Frontier AI model access controls that do not extend through the full third-party vendor credential chain are structurally incomplete: Anthropic’s internal systems were not breached but the model was accessed because a contractor’s credentials were the only gate between the Discord group and Mythos
Endpoint naming conventions derived from public product announcements create a predictable attack surface that does not require any insider knowledge to enumerate: the Discord group guessed the URL without credentials before obtaining them
The threat model assumption that offensive AI capability requires expert researchers is now invalid at the Mythos capability tier: 83% first-attempt exploit success across known vulnerability classes is accessible to anyone who can reach the model
Standard API gateway monitoring is calibrated to known bad patterns such as rate limiting and injection signatures, not to novel query sequences from a newly launched model accessed through a vendor environment: detection of this breach came after the fact, not in real time
NIST AI RMF Govern 1.2 and EU AI Act Article 9 both require risk controls proportional to capability level: an organization running third-party integrations with a Mythos-class model without auditing the vendor credential chain is operating outside the risk control scope those frameworks require
What to do
Threat: Caller Spoofing: the third-party contractor environment through which Mythos was accessed had no gate validating whether the caller held current authorization from Anthropic before the model responded. The contractor credential was the only trust signal, and it was not verified against the access control list at the point of query.
Threat: MCP Supply Chain Poisoning: the vendor chain carrying the contractor credential had no validation step confirming that the credential represented a currently authorized party. Any vendor environment with access to a frontier model API is a trust boundary that must be audited as frequently as direct access credentials.
Start here: audit every vendor and contractor with credentials that touch your AI model infrastructure, especially for high-capability models. Caller Identity Playbook, 12 questions to verify who is actually calling your agent in A2A deployments. Note as priority entry point: the question set maps the contractor and vendor access path, not just first-party callers.
Operator connection: any API credential with access to a frontier model is a high-value target. Treat these credentials with the same access controls applied to production database credentials: short rotation cycle, MFA, and revocation on contractor offboarding confirmed by your IdP, not assumed.
Red team action: use your API gateway logs to map every vendor and contractor endpoint that successfully authenticated to any AI model API in the last 90 days. Cross-reference against current contractor roster. Any active credential belonging to an offboarded contractor is a confirmed gap.
Signal Breakdown
Architecture layer: L5 Agent Runtime
Attack surface: Third-party contractor credential chain as the only gate between public enumeration and frontier model access
Playbooks: Caller Identity Playbook
Red team: API gateway log analysis, contractor credential cross-reference against current roster
Frameworks: ASI09 (Supply Chain Vulnerabilities) / LLM06 (Excessive Agency)
Source: Anthropic Red Team, 2026-05-08
The threat analysis walks it in five steps. You see the attack surface. You see which playbooks map to the gap. You get the red team test that confirms whether your setup is exposed. You get a verdict.
It has never been this fast to go from a news story to a closed gap.
Open the analysis, run the five steps, and see where your setup lands
Anthropic’s Security Tool Is an Agent With Codebase Read Access to Everyone
Anthropic launched Claude Security in public beta for Claude Enterprise customers on May 1, 2026. The product is powered by Claude Opus 4.7 and designed for security teams to scan entire codebases for vulnerabilities and generate targeted patches. Unlike signature-based scanners, it traces data flows, reads source code, and examines cross-file component interactions, operating as an autonomous agent with broad codebase read permissions across every file and module it is directed at. CrowdStrike, Microsoft Security, Palo Alto Networks, SentinelOne, TrendAI, and Wiz are integrating Opus 4.7’s capabilities into existing enterprise security platforms. Hundreds of organizations have used the beta since February 2026, surfacing vulnerabilities existing tools had missed for years. Access for Claude Team and Max customers is coming.
The tool that finds vulnerabilities in your code is an agent with read access to all of it. The trust surface inversion is the story: you are solving an agent security problem by introducing a new agent with elevated access. Claude Security needs to see your full codebase to do its job. That means it needs authentication credentials, it needs to run in your environment, and it needs access to the same files an attacker would want. The question is not whether Claude Security works. The question is what happens when the agent scanning your code is itself the attack surface. This is not a criticism of the product. It is the unavoidable architecture of any agent-based security tool. The enterprises deploying this are making a calculated trust decision, and most of them are not modeling it as one.
What it breaks
Any organization granting Claude Security org-wide codebase read access for a single-repo scan has created an agent credential with an attack surface larger than the vulnerability it was deployed to find: the credential scope and the blast radius are determined at provisioning, not at scan time
The service account authenticating Claude Security to your codebase is now a high-value target equivalent to your most privileged CI/CD credential: compromising it gives an attacker the same read access the scanner holds, which is everything it was permitted to see
Enterprises deploying Claude Security across multiple repositories without scoping each scan to the minimum required surface are creating a persistent, broad-read agent identity that did not exist in their estate before the product was installed
Standard DLP and CASB tooling monitors known data egress patterns and cannot distinguish between Claude Security exfiltrating code as part of a scan and an attacker using a compromised Claude Security credential to exfiltrate the same code: the traffic looks identical
SOC 2 Type II and ISO 27001 both require documented access controls for systems with broad read access to production source code: a Claude Security deployment without a scoped permission policy and access log review cadence is an audit finding before the first scan completes
What to do
Threat: Inherited Privilege Escalation: Claude Security requires broad codebase read access to function, carrying the same access profile as the attacker it is deployed to detect. Any compromise of the agent’s credential chain exposes everything the agent was permitted to see, and the blast radius is defined by the provisioning decision, not the scan.
Start here: scope Claude Security’s permissions to the minimum codebase surface needed for each scan. Do not grant org-wide read access for a single-repo scan. Local Secrets Playbook, 15 questions to stop agents from reading what they should not. Note as priority entry point: covers the agent credential scope decision, not just secret detection in agent output.
Then: treat the agent’s credential chain as a high-value target. If Claude Security authenticates with a service account, that account is worth protecting at the same level as your most privileged CI/CD credential. Log and audit every file Claude Security accesses during a scan. Data Egress Playbook, 13 questions to detect data exfiltration through helpful external communication.
Operator connection: run Claude Security in a read-only, network-isolated environment where possible. The agent’s output (patch suggestions) can leave the environment. The agent itself should not be making outbound calls to anything except Anthropic’s API. Any outbound call to an unexpected destination during a scan is an anomaly that warrants investigation before the next scan runs.
Red team action: use your CASB tooling to capture and inspect the outbound traffic Claude Security generates during a scan. Verify it matches the expected Anthropic API call pattern. Any traffic to additional destinations confirms an egress gap in your isolation posture.
Signal Breakdown
Architecture layer: L6 Tool and Integration Layer
Attack surface: Autonomous security agent carrying broad codebase read access as a provisioned trust surface, with credential chain as the primary blast radius vector
Playbooks: Local Secrets Playbook / Data Egress Playbook
Red team: CASB outbound traffic inspection during scan, Anthropic API call pattern verification
Frameworks: ASI04 (Excessive Agent Permissions) / LLM02 (Sensitive Information Disclosure)
Source: SecurityWeek, 2026-05-01
The threat analysis walks it in five steps. You see the attack surface. You see which playbooks map to the gap. You get the red team test that confirms whether your setup is exposed. You get a verdict.
It has never been this fast to go from a news story to a closed gap.
Open the analysis, run the five steps, and see where your setup lands
TLDR
The tool layer is the attack surface. Semantic Kernel exposed execution helpers to the model context and a single injected prompt became host-level RCE. Claude Security carries the same codebase read access as the attacker it was deployed to find. ServiceNow had to build a kill switch because no standard halt mechanism existed before this week. Mythos was accessed on launch day through a contractor’s credentials because the vendor chain had no gate.
Four incidents. One structural condition. The enabling mechanism and the exposure mechanism are running on the same rails.
👉 AI Agent Posture Playbooks: 30+ structured assessments to map where your agent controls were built for humans, not agents. Self-directed. No vendor cycle.
👉 mrdecentralize.com: Posture assessment, red team methodology, and framework mapping for operators and security teams building on agentic AI.
👉 Follow for weekly analysis of real agent failures, control gaps, and what the frameworks are and are not catching.





