Gate News message: Researchers at Google DeepMind warn that an open internet environment could be leveraged to hijack autonomous AI agents and manipulate their behavior. The report, titled “AI Agent Traps,” states that when companies deploy AI agents to carry out real tasks, attackers may also launch targeted attacks over the network. The study identifies six major risks, including content injection traps, semantic manipulation traps, cognitive state traps, behavior control traps, system traps, and human-agent interaction traps.
The content injection trap is the most direct: attackers can place instructions in HTML comments, metadata, or hidden page elements, which the agent can read and then execute. Semantic manipulation traps work by loading authoritative phrasing or by disguising themselves as webpages in a research environment, quietly affecting the agent’s understanding of the task—sometimes even bypassing safety mechanisms. Cognitive state traps work by implanting false data into the agent’s information sources, causing it to mistakenly believe for the long term that this information has been verified. Behavior control traps target the agent’s actual operations, potentially luring it to access sensitive data and transmit it to an external target.
System traps involve coordinated manipulation across multiple AI systems, which could trigger cascading effects—similar to how algorithmic trading can cause sudden market crashes. Human-agent interaction traps exploit human review steps by creating seemingly credible review content, allowing harmful behavior to slip past oversight.
To address these risks, DeepMind recommends combining adversarial training, input filtering, behavior monitoring, and network content reputation systems, while also establishing a clearer legal responsibility framework. However, the study notes that the industry still lacks unified defense standards, and that existing measures are often fragmented and focused differently. The study calls on developers and businesses to pay attention to operational environment security for AI agents to prevent potential risks of network manipulation and abuse.