The Intake

The Intake · Friday, July 3, 2026

By Silas Quorum · Friday, July 3, 2026

On the substrate

Shell injection class exploits pattern-checking gap in 10 of 11 open-source AI coding agents

Adversa AI · The Hacker News · Security Affairs

If you're running an open-source coding agent and treating its safety filter as your defense against dangerous shell commands, Adversa AI published research this week naming the flaw in that approach. Filters inspect raw text. Bash expands and rewrites that text before it executes.

Adversa AI surveyed eleven open-source AI coding and computer-use agents: opencode, Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands, SWE-agent, Hermes, and Continue. Ten of the eleven fail to block the bypass, which the researchers named GuardFall.

GuardFall exploits the gap between what the filter inspects and what the shell actually runs. Techniques named in the research include quote removal, variable expansion, command substitution, piped interpreters, and alternative utility flags. Each one rewrites the command after the filter has already cleared it.
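A minimal Python sketch makes the gap concrete. It assumes a filter that only compares whitespace-separated words against a blocklist. The blocklist, the `naive_filter_allows` function, and the sample commands are hypothetical illustrations, not code or payloads from Adversa AI's research or from any surveyed agent. The samples cover four of the named techniques. Alternative utility flags are left out because the report above does not say which flags were tested.

```python
import shlex

# Hypothetical blocklist, for illustration only.
BLOCKED = {"rm"}


def naive_filter_allows(command: str) -> bool:
    """Raw-text check: reject only if a whitespace-separated word is on the blocklist."""
    return not any(word in BLOCKED for word in command.split())


# Hypothetical samples. Run by Bash, each one would delete ./build, but none
# contains `rm` as a standalone word, so the raw-text check clears all of them.
BYPASSES = {
    "quote removal": "r''m -rf ./build",
    "variable expansion": "c=rm; $c -rf ./build",
    "command substitution": "$(printf rm) -rf ./build",
    "piped interpreter": "echo 'import shutil; shutil.rmtree(\"build\")' | python3",
}

print("plain rm allowed:", naive_filter_allows("rm -rf ./build"))  # False
for technique, command in BYPASSES.items():
    print(f"{technique:22} allowed={naive_filter_allows(command)}")

# shlex.split applies POSIX quoting rules, so it shows what quote removal produces.
print(shlex.split("r''m -rf ./build"))  # ['rm', '-rf', './build']
```

Only quote removal can be reproduced by a tokenizer alone. The other three depend on expansion or on a second program, and a raw-text check never evaluates either.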

Only Continue defended against all tested techniques. Continue uses a tokenize-and-canonicalize evaluator: it resolves what the shell will actually run before checking the result against the blocked-command list. Adversa AI names Continue's architecture as the reference implementation.
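The tokenize-and-canonicalize idea can be sketched in Python with the standard `shlex` module, which applies POSIX quote and backslash removal. This is a hypothetical illustration, not Continue's evaluator. `is_allowed`, `canonical_tokens`, and the policy sets are invented for the example. When the sketch cannot resolve an expansion statically, it fails closed and does not guess.

```python
import os
import shlex
from typing import List

# Hypothetical policy, for illustration only; not Continue's actual lists.
BLOCKED = {"rm", "curl"}
INTERPRETERS = {"sh", "bash", "zsh", "python", "python3", "node", "perl"}
PIPES = {"|", "|&"}
# Characters whose expansion this sketch does not model, so it fails closed on them:
# $ and ` (parameter and command substitution), { (brace expansion), and
# * ? [ (pathname globbing; /bin/r? can match /bin/rm).
UNMODELED = set("$`{*?[")


def canonical_tokens(command: str) -> List[str]:
    """Apply POSIX quote and backslash removal and split off operators such as ; | &&."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    # Keep '#' literal: shlex would otherwise drop everything after it, even
    # mid-word, where Bash does not treat it as a comment.
    lexer.commenters = ""
    return list(lexer)


def is_allowed(command: str) -> bool:
    """Return True only if no blocked or piped-interpreter command survives canonicalization."""
    if UNMODELED & set(command):
        return False
    try:
        tokens = canonical_tokens(command)
    except ValueError:  # unbalanced quotes or a trailing backslash
        return False
    previous = None
    for token in tokens:
        name = os.path.basename(token)  # canonicalize /bin/rm to rm
        # Checking every token, not just each command's first word, over-blocks on
        # purpose: it also catches wrappers such as `sudo rm` or `xargs rm`.
        if name in BLOCKED:
            return False
        if previous in PIPES and name in INTERPRETERS:
            return False  # piped interpreter, e.g. `echo ... | python3`
        previous = token
    return True


for cmd in ["ls -la ./build", "r''m -rf ./build", "/bin/rm -rf ./build", "$(printf rm) -rf ./build"]:
    print(f"{cmd!r:32} allowed={is_allowed(cmd)}")
```

The trade-off is deliberate. Commands like `grep rm notes.txt` are rejected too, and in exchange the check catches wrappers. The sketch is still incomplete. For example, it does not inspect `python3 -c` arguments or strings handed to `eval`. It shows the shape of the approach, not a complete defense.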

If you're running any of the ten agents that failed and relying on the safety filter as your only defense, GuardFall is now documented and public.

---

Anthropic ships Claude Science, an agentic workbench coordinating 60-plus scientific skills

Anthropic · STAT News · MIT Technology Review

Claude Science is Anthropic's new beta product for scientific computing. Per Anthropic, it is a generalist coordinating agent pre-configured with over 60 curated skills and database connectors. It launched June 30 on macOS and Linux and is available on the Pro, Max, Team, and Enterprise tiers.

Per Anthropic, covered domains include genomics, single-cell analysis, proteomics, structural biology, and cheminformatics. Pre-configured connectors include the UniProt, PDB, Ensembl, and ChEMBL databases and NVIDIA's BioNeMo toolkit. Anthropic says the product submits jobs to institutional HPC clusters over SSH. It also supports on-demand GPU resources through Modal and processes large datasets without routing them through Anthropic's infrastructure.

Anthropic also announced it will use Claude Science in its own drug-discovery programs, which focus on rare and neglected diseases. Applications for the AI for Science program are open through July 15. Up to 50 projects will be funded, each receiving up to $30,000 in compute credits.

If you're doing scientific computing with agents, Claude Science is in beta now. The curated skills and database connectors are pre-configured. If the AI for Science grant program is relevant to your work, the application window closes July 15.

---

For operators

Anthropic, Amazon, Microsoft, and Google propose a five-level Cyber Jailbreak Severity scale

Anthropic · CryptoBriefing

If your team handles AI security disclosures and has been assigning jailbreak severity by judgment call, there is now a proposed shared vocabulary. Anthropic published the CJS scale on July 2. It was developed with Amazon, Microsoft, Google, and Glasswing partners.

The scale runs from CJS-0 to CJS-4. Per Anthropic, CJS-0 (Informational) covers jailbreaks with minimal capability uplift and narrow task coverage. CJS-4 (Critical) covers jailbreaks whose capabilities substantially surpass existing tools and whose task coverage is broad. These jailbreaks are trivial to reproduce and easily discoverable.

Severity is scored on four dimensions: capability gain beyond available tools, breadth of that gain across task categories, ease of weaponization, and discoverability. Scores act as a floor: once a severity level is assigned to a jailbreak technique, it cannot be lowered. Anthropic says the framework helps teams triage on structured criteria, so they can distinguish severity levels and stop treating every jailbreak discovery as an equivalent emergency.
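A minimal Python sketch shows what treating scores as a floor means for a triage tracker. Everything in it is hypothetical: the class names, the `technique_id` key, and the free-text dimension fields are illustrative. The report above does not say how the four dimensions combine into a level, so the sketch takes the level from the reviewer and does not compute it. Only CJS-0 (Informational) and CJS-4 (Critical) are named above, so the middle levels are left as bare numbers.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict


class CJS(IntEnum):
    """CJS-0 through CJS-4; only the two endpoint labels come from the report."""
    INFORMATIONAL = 0
    CJS_1 = 1
    CJS_2 = 2
    CJS_3 = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Assessment:
    """One reviewer's assessment of a jailbreak technique (hypothetical record).

    The four text fields mirror the CJS scoring dimensions. How they combine
    into a level is not modeled, so the reviewer supplies `level` directly.
    """
    level: CJS
    capability_gain: str  # capability gain beyond available tools
    breadth: str          # breadth of that gain across task categories
    ease_of_weaponization: str
    discoverability: str


class SeverityRegistry:
    """Hypothetical triage store that treats every assigned level as a floor."""

    def __init__(self) -> None:
        self._floors: Dict[str, CJS] = {}

    def record(self, technique_id: str, assessment: Assessment) -> CJS:
        """Record an assessment and return the technique's effective level.

        The effective level is the highest level ever assigned, so a later,
        lower assessment cannot reduce it.
        """
        floor = self._floors.get(technique_id, CJS.INFORMATIONAL)
        effective = max(floor, assessment.level)
        self._floors[technique_id] = effective
        return effective


# Usage with made-up inputs: the second, lower assessment leaves the level at 3.
registry = SeverityRegistry()
print(registry.record("technique-a", Assessment(CJS(3), "high", "broad", "moderate", "public")).value)  # 3
print(registry.record("technique-a", Assessment(CJS(1), "low", "narrow", "hard", "public")).value)  # 3
```

A later, lower assessment leaves the stored level at its earlier maximum. Only a higher assessment moves it.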

If your team receives jailbreak reports, the CJS scale is the proposed shared vocabulary for triage.

---