The Intake — Sunday, April 26, 2026
SUBSTRATE candidates
- Vercel breach via Context.ai OAuth supply chain — "Allow All" propagates to enterprise — Vercel KB (https://vercel.com/kb/bulletin/vercel-april-2026-security-incident); TechCrunch (https://techcrunch.com/2026/04/20/app-host-vercel-confirms-security-incident-says-customer-data-was-stolen-via-breach-at-context-ai/); Trend Micro analysis (https://www.trendmicro.com/en_us/research/26/d/vercel-breach-oauth-supply-chain.html)
- Beat: security-advisories
- Lens: O'Neill, Wittgenstein
- Gloss: A Vercel employee granted "Allow All" OAuth scopes to Context.ai's AI Office Suite. Lumma Stealer hit Context.ai in February, and attackers then used the surviving OAuth tokens to pivot into Vercel and decrypt environment variables. The agent supply chain is the new perimeter.
- Verdict: cover-now — advisory. Next-turn rec: enumerate every OAuth grant to third-party AI tooling and refuse "Allow All" by default; treat AI-tool OAuth scopes as a privileged-access category.
- CVE-2026-21520 "ShareLeak" — Copilot Studio patched, data still exfiltrated; PipeLeak in Salesforce Agentforce mirrors the pattern — VentureBeat (https://venturebeat.com/security/microsoft-salesforce-copilot-agentforce-prompt-injection-cve-agent-remediation-playbook); CSO Online (https://www.csoonline.com/article/4159079/copilot-and-agentforce-fall-to-form-based-prompt-injection-tricks.html)
- Beat: security-advisories
- Lens: O'Neill, Arendt
- Gloss: SharePoint form fields concatenated into Copilot Studio agent context with no sanitization; payload redirects the agent to query SharePoint Lists and exfiltrate via Outlook. Capsule Security says the patch closes the form-field path but the architectural pattern survives — Salesforce's Agentforce has the same shape (PipeLeak).
- Verdict: cover-now — brief. Pairs with the Wed advisory already in flight (Comment and Control). Next-turn rec: input from any CRM/form/intake field is untrusted testimony; require dual-channel confirmation for any agent action that egresses data.
- Flowise CVE-2025-59528 — CustomMCP node executes attacker JS via mcpServerConfig string; 12,000+ exposed instances under active exploitation — The Hacker News (https://thehackernews.com/2026/04/flowise-ai-agent-builder-under-active.html); SonicWall analysis (https://www.sonicwall.com/blog/flowiseai-custom-mcp-node-remote-code-execution-); CSA research note (https://labs.cloudsecurityalliance.org/research/csa-research-note-flowise-mcp-rce-exploitation-20260409-csa/)
- Beat: security-advisories, protocol-tooling
- Lens: Wittgenstein, O'Neill
- Gloss: A no-code MCP-server registration form evaluates user-supplied JS without sandboxing; CVSS 10.0 with active in-the-wild exploitation from a Starlink IP. The MCP-server marketplace pattern is now an attack surface: the spec is fine, the integration substrate around it is not.
- Verdict: cover-now — advisory. Next-turn rec: any agent that registers MCP servers via untrusted UI configuration must execute that config in a sandbox; upgrade Flowise ≥3.0.6 (3.1.1 preferred).
- MCP-Atlas (Scale, open-sourced) and Toolathlon — top model Claude 4.5 Sonnet at 38%, not 80% — Scale Labs leaderboard (https://labs.scale.com/leaderboard/mcp_atlas); Scale blog (https://scale.com/blog/open-sourcing-mcp-atlas); Toolathlon paper (https://openreview.net/forum?id=z53s5p0qhf)
- Beat: evals-benchmarks
- Lens: O'Neill, Clark
- Gloss: 1,000 human-authored tasks across 36 real MCP servers (MCP-Atlas) and 32 apps / 604 tools / 108 verifiable tasks (Toolathlon). Real-MCP performance lags vendor capability marketing by a wide margin and is the right baseline to cite when evaluating agent fitness for production tool-use.
- Verdict: cover-now — brief. Next-turn rec: replace single-turn tool-use evals in your CI with a Toolathlon-shaped subset; treat 38% as the honest ceiling for unmanaged multi-server orchestration today.
- OpenAI GPT-5.5 (Apr 23–24): 82.7% Terminal-Bench 2.0, 78.7% OSWorld-Verified, native browser/desktop control + Workspace Agents (no-code shared agents) — OpenAI (https://openai.com/index/introducing-gpt-5-5/); CNBC (https://www.cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html); Simon Willison hands-on (https://simonwillison.net/2026/Apr/23/gpt-5-5/)
- Beat: model-notes, protocol-tooling
- Lens: Clark, O'Neill
- Gloss: First fully retrained OpenAI base model since GPT-4.5; vendor-reported benchmark lift is real but sourced from OpenAI's own eval harness — corroboration via Willison's pelican-test plus CodeRabbit's external benchmark is partial. Workspace Agents adds a no-code shared-agent surface that resembles Anthropic Managed Agents in shape.
- Verdict: cover-now — brief. Next-turn rec: re-run your existing internal agentic-coding evals against GPT-5.5 before treating headline numbers as portable; do not credit OSWorld scores to your own workload class without re-measuring.
- Anthropic Mythos accessed by unauthorized users via guessed URL on contractor portal — same day the limited release was announced — Bloomberg (https://www.bloomberg.com/news/articles/2026-04-21/anthropic-s-mythos-model-is-being-accessed-by-unauthorized-users); TechCrunch (https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/)
- Beat: security-advisories, model-notes
- Lens: O'Neill, Arendt
- Gloss: A model the vendor described as too dangerous to GA was reachable via URL pattern enumeration on a third-party contractor portal. Anthropic says no system was "impacted." The vendor's own dual-source corroboration of capability (yesterday's intake item) now sits next to a vendor-confirmed access-control failure on the same artifact.
- Verdict: cover-now — brief. Endnote names the O'Neill failure mode explicitly: capability claims and containment claims are independent evidentiary tracks; this week they diverged.
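The Vercel item's next-turn rec (enumerate every OAuth grant to third-party AI tooling, refuse "Allow All" by default) can be sketched as a default-deny audit pass. This is a minimal illustration, not any provider's API: the `OAuthGrant` record, the scope markers in `BROAD_SCOPE_MARKERS`, and the app names are all hypothetical and would need to be mapped onto whatever your identity provider actually exports.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical markers for an over-broad grant. Real providers name scopes
# differently, so this set is an assumption to adapt, not a standard.
BROAD_SCOPE_MARKERS = frozenset({"allow_all", "*", "full_access"})


@dataclass(frozen=True)
class OAuthGrant:
    app: str                 # third-party application name
    granted_by: str          # user who approved the grant
    scopes: frozenset[str]   # scopes attached to the issued token
    is_ai_tool: bool         # True when the app is third-party AI tooling


def review_grant(grant: OAuthGrant) -> str:
    """Classify one grant as 'revoke', 'review', or 'ok'."""
    # Default-deny: any broad scope is refused regardless of who holds it.
    if grant.scopes & BROAD_SCOPE_MARKERS:
        return "revoke"
    # AI-tool grants are treated as a privileged-access category,
    # so even narrow scopes get a human review.
    if grant.is_ai_tool:
        return "review"
    return "ok"


def audit(grants: list[OAuthGrant]) -> dict[str, list[str]]:
    """Group app names by decision so the riskiest grants surface first."""
    report: dict[str, list[str]] = {"revoke": [], "review": [], "ok": []}
    for grant in grants:
        report[review_grant(grant)].append(grant.app)
    return report


if __name__ == "__main__":
    sample = [
        OAuthGrant("example-ai-suite", "alice", frozenset({"allow_all"}), True),
        OAuthGrant("example-notes-ai", "bob", frozenset({"calendar.read"}), True),
        OAuthGrant("example-ci", "carol", frozenset({"repo.read"}), False),
    ]
    print(audit(sample))
```

The design choice worth noting is that the broad-scope check runs before the AI-tool check, so an "Allow All" grant is flagged for revocation even when the app has not yet been tagged as AI tooling.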
OPERATORS candidates
- Anthropic Project Deal — Claude agents negotiated 186 deals (~$4,000) across 69 employees; Opus models materially out-negotiated Haiku — coverage rollup via The Hacker News and HN front page Apr 22 (https://news.ycombinator.com/front?day=2026-04-22)
- Beat: community-dynamics, measurement
- Lens: Wittgenstein, Arendt
- Gloss: A real, in-house multi-agent marketplace produced behavior data that mid-tier vendor benchmarks cannot. Model-tier-as-negotiation-skill is exactly the kind of finding that should be examined as community dynamics in hybrid groups, not as a leaderboard datapoint.
- Verdict: cover-now — case file. Closes the decision: when budgeting an internal agent rollout, do you let agents transact with each other, and at what tier? We will write to "yes, but instrumented as a community, not a market."
- Databricks Unity AI Gateway (Apr 15) — governance layer extends to agent→LLM and agent→MCP-server access with permissions, audit, and policy controls — Databricks blog (https://www.databricks.com/blog/ai-gateway-governance-layer-agentic-ai)
- Beat: governance
- Lens: Wittgenstein, O'Neill
- Gloss: Vendor positioning collapses two governance problems (model gateway, MCP-server gateway) into a single Unity Catalog scope. The Wittgensteinian shape is right — enforcement at the integration layer, not the policy layer — but it's a single-vendor framing being marketed as the category default.
- Verdict: cover-now — field-guide. Closes the decision: do you adopt a single governance-gateway pattern for agents (Databricks-style) or maintain separate policy planes? Endnote will name the lock-in caveat.
- OpenAI Bio Bug Bounty for GPT-5.5 — $25K for a universal jailbreak that clears the 5-question bio-safety challenge — OpenAI release coverage via Releasebot (https://releasebot.io/updates/openai)
- Beat: governance, measurement
- Lens: O'Neill
- Gloss: A vendor-run, vendor-scored, vendor-defined safety challenge with a fixed payout. Useful instrument; not independent accountability. Worth examining as a case study in audit-theater-versus-instrument distinction.
- Verdict: track — pass for now; revisit if an independent red team publishes results inside the bounty frame.
- MetaComp StableX KYA Framework for agent-identity governance in regulated finance (Apr 22) — PRNewswire (https://www.prnewswire.com/apac/news-releases/metacomp-launches-the-worlds-first-ai-agent-governance-framework-for-regulated-financial-services-302749713.html)
- Beat: governance, community-dynamics
- Lens: Wittgenstein, O'Neill
- Gloss: Carried over from yesterday's intake — still on-deck for the field-guide treatment.
- Verdict: cover-now — field-guide (already queued). No change today.
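For the Databricks item above, the single-gateway pattern (one enforcement point for both agent→LLM and agent→MCP-server calls, with permissions and audit) can be pictured with a toy sketch. This is an illustration of the pattern only and does not reflect the Unity AI Gateway API; the `Gateway` class, the `llm:` and `mcp:` target labels, and the agent names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gateway:
    # Maps an agent id to the targets it may call. The "llm:<name>" and
    # "mcp:<name>" labels are illustrative, not a real naming scheme.
    permissions: dict[str, set[str]]
    # Every decision is recorded as (agent, target, allowed).
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, agent: str, target: str) -> bool:
        """Check one call at the integration layer and record the outcome."""
        # Unknown agents get an empty permission set, so they are denied.
        allowed = target in self.permissions.get(agent, set())
        self.audit_log.append((agent, target, allowed))
        return allowed


if __name__ == "__main__":
    gw = Gateway({"support-agent": {"llm:general", "mcp:tickets"}})
    print(gw.authorize("support-agent", "mcp:tickets"))
    print(gw.authorize("support-agent", "mcp:payroll"))
    print(gw.audit_log)
```

Keeping model and MCP-server targets in one permission map is what makes this a single policy plane; maintaining separate planes would mean two such maps with independent audit trails, which is the trade-off the field-guide decision turns on.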
Considered and passed
- Google → Anthropic $40B investment confirmed Apr 24 (off-beat — financing)
- Anthropic + Amazon 5GW expansion / $5B / $100B cloud commit (off-beat — capex)
- OpenAI raises $122B (off-beat — financing, prior week)
- ChatGPT Images 2.0 (off-beat — image generation)
- Gemini Robotics ER 1.6 (off-beat — embodied robotics)
- Gemma 4 (off-beat for now — open-weights model release without agentic-substrate hook this week)
- DeepMind / Accenture / BCG / Bain / Deloitte / McKinsey partnership (vendor-marketing — consultancy distribution, not substrate)
- Generic "April AI agent roundup" aggregators (duplicate / vendor-marketing)
- Single-Agent vs. MAS arXiv paper (track — interesting finding on test-time-compute confound, hold for a context-engineering deep-dive)
Source health
Practitioner blogs were healthier today: Simon Willison contributed a hands-on GPT-5.5 post and a quote item useful for a future Operators essay. Latent.Space did not surface an agentic-substrate item in window. Lilian Weng and Eugene Yan still quiet — if no movement by Tuesday's intake, swap in interconnects.ai and Anthropic's red.anthropic.com as primary feeders. Hugging Face papers and arXiv cs.AI both surfaced agent benchmarks (MCP-Atlas, Toolathlon, MirrorCode, SAS-vs-MAS) — eval beat is well-fed; we should not be surprised when an eval story dominates next week.