How to mitigate the impact of rogue AI risks

In previous parts of this series on Rogue AI, we briefly examined what organizations can do to better manage risk across their entire AI attack surface. And we touched on ways to mitigate threats by creating trusted AI identities. We also mentioned the great work MIT is doing to map AI risks and OWASP is doing to propose effective mitigations for LLM vulnerabilities.

Now it’s time to fill in the missing pieces of the puzzle by describing how zero trust and layered defenses can protect against rogue AI threats.

Causal factors for rogue AI

LLM Vulnerability / Type of Rogue	Randomly	Subverted	Malicious
Excessive functionality	Misconfiguration of capabilities or guardrails	Abilities were changed or added directly or guardrails were bypassed	Functionality required for malicious targets
Excessive permissions	Authorization misconfiguration	Privileges escalated	Must acquire all privileges; none to start
Excessive autonomy	Misconfiguration of tasks that require human review	Human removed from the loop	Not under the defender’s control

The above causal factors can be used to identify and mitigate risks associated with fraudulent AI services. The first step is to properly configure the relevant AI services, which creates a foundation for security against all types of rogue AI by defining permitted behaviors. Protecting and remediating the points where known AI services come into contact with data or use tools primarily prevents subverted rogues, but can also combat other causes of accidents. Restricting AI systems to permissible data and tool usage as well as checking the content of inputs and outputs of AI systems are at the core of safe use.

Malicious villains can attack your organization from outside or act as AI malware in your environment. Many patterns used to detect malicious activity by cyber attackers can also be used to detect the activity of malicious villains. But as new abilities increase villain evasion, learning patterns for detection will not cover the unknown unknowns. In this case, machine behaviors need to be identified across devices, workloads, and network activity. In some cases, this is the only way to catch malicious villains.

Behavioral analysis can also detect other cases of excessive functionality, privileges, or autonomy. Anomalous activity on devices, workloads, and the network can be an early indicator of fraudulent AI activity, regardless of how it was caused.

Comprehensive protection for the entire OSI communications stack

However, for a more comprehensive approach, we need to consider defense in depth at each level of the OSI model, as follows:

Physically: Monitor processor usage (CPU, GPU, TPU, NPU, DPU) in cloud, endpoint and edge devices. This applies to AI-specific workload patterns, querying AI models (inference), and loading model parameters into memory close to AI-specific processing.

Data layer: Use MLOps/LLMOps versioning and verification to ensure models are not corrupted or replaced, and record hashes to identify models. Use software and AI model BOMs (SBoMs/MBoMs) to ensure that the AI service software and model are trustworthy.

Network: Limit AI services that can be reached externally and the tools and APIs that AI services can reach. Detect anomalous communication factors such as human-to-machine transitions and novel machine activities.

Transport: Consider rate limiting external AI services and scanning for anomalous packets.

Meeting: Include verification processes such as human-in-the-loop checks, especially when instantiating AI services. Use timeouts to limit session hijacking. Analyze user context authentications and detect anomalous sessions.

Application and presentation levels: Identify misconfigurations of functionality, permissions, and autonomy (according to the table above). Leverage protections for AI inputs and outputs, e.g. Such as cleaning personal (PII) and other sensitive information, offensive content, and instant injections or system jailbreaks. Restrict LLM agent tools according to an allowlist that restricts APIs and plugins and only allows clearly defined usage of well-known websites.

Rogue AI and the Zero Trust Maturity Model

The Zero Trust security architecture provides many tools to mitigate the risk of rogue AI. The Zero Trust Maturity Model was developed by the U.S. Cybersecurity and Infrastructure Security Agency (CISA) to support federal agencies’ efforts to comply with Executive Order (EO) 14028: Improving the Nation’s Cybersecurity. It reflects the seven principles of Zero Trust as outlined in NIST SP 800-207:

All data sources and computing power are considered resources.
All communication is secured regardless of network location.
Access to individual company resources is granted on a session basis.
Access to resources is determined by dynamic policies.
The Company monitors and measures the integrity and security status of all owned and related assets.
All resource authentication and authorization is dynamic and strictly enforced before access is granted.
The company collects as much information as possible about the current state of assets, network infrastructure and communications and uses this to improve its security posture.

To effectively mitigate rogue AI risk, organizations must reach the “advanced” level described in the CISA document:

“Where applicable, automated controls for lifecycle and allocation of configurations and policies with coordination across pillars; centralized visibility and identity control; Policy enforcement integrated across all pillars; responding to predefined remedial actions; Least privilege changes based on risk and posture assessments; and building company-wide awareness (including externally hosted resources).”

Recent Posts