GateBleed

Overview

GateBleed is a newly discovered microarchitectural side-channel affecting Intel’s Advanced Matrix Extensions (AMX) and potentially other AI accelerators, including NVIDIA’s A100 GPU.

It exploits the staged power-gating mechanism designed to improve performance-per-watt in AI workloads.

When AMX or similar accelerators wake from deep power states, they incur a large, repeatable latency penalty — up to ~20,000 CPU cycles — which can be measured remotely and used to infer sensitive information.

Microarchitectural side-channel attacks exploit the specific implementation of a processor’s instruction-set architecture (e.g., x86, ARM). Unlike traditional malware, these attacks leverage benign code paths that unintentionally reveal information about a victim’s workload through measurable side effects. The landmark examples—Spectre and Meltdown (2018)—show how performance optimizations can be a double-edged sword: enabling high performance while simultaneously exposing systems to new classes of attacks.

The end of Dennard scaling, the principle that transistors could keep shrinking while power density stayed roughly constant, has further complicated matters. To reduce power use, modern processors aggressively power-gate idle components; the resulting wake-up latency creates observable patterns that attackers can exploit to infer component usage. AI workloads, meanwhile, are power-intensive, so practitioners employ software-level model-gating optimizations such as early exiting and mixture-of-experts. GateBleed demonstrates that these optimizations introduce uneven hardware utilization, enabling privacy attacks against AI models; it is the first demonstration of data privacy being compromised through a power optimization. Unlike Spectre and Meltdown, which exploit fundamentally limited performance optimizations, power gating and model gating are likely to become even more widespread as the demand for ever-larger AI models grows in the post-Dennard era, increasing the potential for GateBleed attacks.
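
Model gating can be illustrated with a toy early-exit classifier: inputs the model is confident about leave through a cheaper path, so the amount of compute (and hence accelerator activity) depends on the input itself. The sketch below is purely illustrative; the layer function, confidence score, threshold, and layer count are invented for this example and do not come from the paper.

```python
EXIT_THRESHOLD = 0.85  # hypothetical confidence threshold

def layer(x):
    """Stand-in for one Transformer layer (one accelerator matmul)."""
    return [v * 0.5 + 0.1 for v in x]

def confidence(x):
    """Toy confidence score; a real model would use softmax probabilities."""
    return max(x) / (sum(abs(v) for v in x) + 1e-9)

def predict(x, num_layers=8):
    """Early-exit inference: stop as soon as the model is 'confident'.

    Returns (prediction, layers_used). layers_used is what leaks:
    it determines how much work the accelerator performs per input.
    """
    for depth in range(1, num_layers + 1):
        x = layer(x)
        if confidence(x) >= EXIT_THRESHOLD:
            return max(x), depth
    return max(x), num_layers

# Two inputs, two very different amounts of accelerator work:
_, easy_depth = predict([5.0, 0.1, 0.1])  # confident after one layer
_, hard_depth = predict([1.0, 1.0, 1.0])  # never confident; runs all layers
```

An observer who can tell how much accelerator work an inference performed learns which path the input took, even without seeing any model outputs.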

Our peer-reviewed paper, “GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI”, has been accepted at MICRO 2025.

📄 Read the preprint on arXiv


Key Findings

  • Power optimization backfire: Latency penalties from AMX’s cold-start wakeups create a strong timing signal.
  • First hardware-only membership inference attack: 81% accuracy (0.89 precision) on an early-exit Transformer without any access to logits or model outputs.
  • Broader AI privacy impact:
    • 100% accurate Mixture-of-Experts (MoE) routing inference
    • Leakage of secret control-flow decisions across OS, VM, and SGX boundaries
  • Cross-platform relevance: Preliminary tests confirm similar power-gating patterns on NVIDIA A100 GPUs.

How GateBleed Works

  1. Power Gating in AI Accelerators
    • To save energy, AMX units power down in stages between uses.
    • Waking from a deep power state takes significantly longer than from a shallow one.
  2. Measurable Latency Gap
    • The first matmul after wake-up pays a “cold-start” cost: ~20,000 cycles in AMX.
  3. Turning Latency into Leakage
    • By carefully scheduling and timing operations, attackers can determine if a model took a specific execution path (e.g., selected an MoE expert, exited early).
  4. Remote Feasibility
    • With our amplification gadget, the signal remains robust over a one-hop network and achieves a transmission rate 70,000× higher than NetSpectre's.
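
The timing step above reduces to a simple threshold distinguisher: probe the accelerator, and a fast probe means the victim's code already woke the unit. The sketch below simulates this with made-up latency distributions (the cold-start constant loosely mirrors the ~20,000-cycle penalty reported above; the warm latency, noise level, and function names are invented); a real attacker would read the timestamp counter around a probe matmul instead.

```python
import random
import statistics

random.seed(0)

COLD_PENALTY = 20_000  # approximate cold-start cost reported for AMX
WARM_LATENCY = 1_500   # hypothetical warm-path latency (cycles)
NOISE = 500            # hypothetical measurement noise (std dev)

def probe_latency(unit_was_idle):
    """Simulated latency of the attacker's probe operation, in cycles."""
    base = WARM_LATENCY + (COLD_PENALTY if unit_was_idle else 0)
    return random.gauss(base, NOISE)

def calibrate(trials=100):
    """Pick a threshold halfway between the cold and warm distributions."""
    cold = statistics.mean(probe_latency(True) for _ in range(trials))
    warm = statistics.mean(probe_latency(False) for _ in range(trials))
    return (cold + warm) / 2

def victim_used_amx(latency, threshold):
    """A fast probe means the victim's earlier code already woke the unit."""
    return latency < threshold

threshold = calibrate()
# Victim took the AMX path: the unit is awake, so the probe is fast.
guess_used = victim_used_amx(probe_latency(unit_was_idle=False), threshold)
# Victim skipped AMX: the probe pays the cold-start cost.
guess_idle = victim_used_amx(probe_latency(unit_was_idle=True), threshold)
```

Because the cold-start gap dwarfs the noise, a single probe is enough to classify the victim's path, which is what makes the signal usable even over a network hop.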

Impact

  • Cloud AI deployments: Multi-tenant environments with shared accelerators are especially vulnerable.
  • Privacy threats: Enables label leakage, membership inference, and watermark recovery without model outputs.
  • Mitigation challenges: Standard defenses like cache partitioning, timer fuzzing, and speculation barriers are ineffective.

Technical Resources