{"id":2,"date":"2025-08-12T16:59:18","date_gmt":"2025-08-12T16:59:18","guid":{"rendered":"https:\/\/research.ece.ncsu.edu\/gatebleed\/?page_id=2"},"modified":"2025-10-01T18:29:02","modified_gmt":"2025-10-01T18:29:02","slug":"sample-page","status":"publish","type":"page","link":"https:\/\/research.ece.ncsu.edu\/gatebleed\/","title":{"rendered":"Home"},"content":{"rendered":"\n<div class=\"wp-block-cover alignfull has-parallax wp-duotone-rgb15300-rgb20400-1\" style=\"margin-top:0;padding-top:48px;padding-right:48px;padding-bottom:48px;padding-left:48px;min-height:66vh;aspect-ratio:unset;\"><div class=\"wp-block-cover__image-background wp-image-12 size-large has-parallax\" style=\"background-position:50% 50%;background-image:url(https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/AdobeStock_721699984-1024x512.jpeg)\"><\/div><span aria-hidden=\"true\" class=\"wp-block-cover__background has-background-dim-40 has-background-dim\" style=\"background-color:#4c82ad\"><\/span><div class=\"wp-block-cover__inner-container is-layout-flow wp-block-cover-is-layout-flow\">\n<div class=\"wp-block-group is-content-justification-left\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-container-core-group-is-layout-7206f975 wp-block-group-is-layout-constrained\">\n<h1 class=\"wp-block-heading has-text-align-left has-white-color has-text-color\" style=\"font-size:86px;font-style:normal;font-weight:700;letter-spacing:0px;line-height:1;text-transform:uppercase\">Gatebleed<\/h1>\n\n\n\n<h2 class=\"wp-block-heading has-white-color has-text-color has-link-color wp-elements-168bea5a651e859e7bb99da7fe368561\">Exploiting Power Gating for Stealthy AI Timing Attacks<\/h2>\n\n\n\n<p class=\"has-white-color has-text-color has-link-color wp-elements-704365e90635f56ad1adf782f9512dda\">A new hardware side-channel that turns a power optimization into a high-signal timing leak \u2014 the first hardware-only membership inference attack on AI 
accelerators.<\/p>\n<\/div><\/div>\n\n\n\n<div style=\"height:72px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Overview<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-medium\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"300\" src=\"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5-300x300.png\" alt=\"\" class=\"wp-image-31\" srcset=\"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5-300x300.png 300w, https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5-150x150.png 150w, https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5-768x768.png 768w, https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5-350x350.png 350w, https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-content\/uploads\/sites\/44\/2025\/08\/image5.png 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n<\/div>\n\n\n<p>GateBleed is a newly discovered microarchitectural side-channel affecting Intel\u2019s Advanced Matrix Extensions (AMX) and potentially other AI accelerators.<\/p>\n\n\n\n<p>It exploits the staged power-gating mechanism designed to improve performance-per-watt in AI workloads.<\/p>\n\n\n\n<p>When AMX or similar accelerators wake from deep power states, they incur a large, repeatable latency penalty \u2014 up to ~20,000 CPU cycles \u2014 which can be measured remotely and used to infer sensitive information.<\/p>\n\n\n\n<p>Microarchitectural side-channel attacks exploit the specific implementation of a processor\u2019s instruction-set architecture (e.g., x86, ARM). Unlike traditional malware, these attacks leverage benign code paths that unintentionally reveal information about a victim\u2019s workload through measurable side effects. 
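<\/p>\n\n\n\n<p>In miniature, the mechanism can be sketched as follows (an illustrative toy of ours, not the paper's code): a simulated unit pays a wake-up penalty on its first use after idling, so an observer who can only measure latency still learns whether the unit was used recently.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>

```python
# Illustrative sketch only: a timing side channel in miniature.
# A toy unit pays a wake-up penalty on first use after idling, so an
# observer who can only measure latency learns whether it was used recently.
import time

class GatedUnit:
    # Stand-in for a power-gated accelerator.
    def __init__(self):
        self.warm = False

    def op(self):
        if not self.warm:
            time.sleep(0.005)   # cold start: wake-up penalty
            self.warm = True
        time.sleep(0.0001)      # the operation itself

def probe(unit, threshold_ns=2_000_000):
    # Time one op; latency above the threshold implies a cold start.
    t0 = time.perf_counter_ns()
    unit.op()
    dt = time.perf_counter_ns() - t0
    return 'cold' if dt > threshold_ns else 'warm'

unit = GatedUnit()
print(probe(unit))   # first use pays the wake-up penalty
print(probe(unit))   # unit is now warm
```

<\/code><\/pre>\n\n\n\n<p>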
The landmark examples\u2014Spectre and Meltdown (2018)\u2014show how performance optimizations can be a double-edged sword: enabling high performance while simultaneously exposing systems to new classes of attacks.<\/p>\n\n\n\n<p>The end of Dennard Scaling, the principle that shrinking transistors increases density without significantly raising power consumption, has further complicated matters. To reduce power use, modern processors aggressively power-gate idle components. However, the resulting wake-up latency creates observable patterns that attackers can exploit to infer component usage. Similarly, AI workloads are power-intensive. To cope, practitioners employ software-level model gating optimizations such as <em>early exiting<\/em> and <em>mixture-of-experts<\/em>. GateBleed demonstrates that these optimizations introduce uneven hardware utilization, enabling privacy attacks against AI models; GateBleed is the first demonstration of data privacy being compromised through power optimization. 
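<\/p>\n\n\n\n<p>As a minimal sketch of model gating (the entropy test, names, and the 0.5 threshold are ours for illustration, not the paper's implementation), an early-exit network stops at the first classifier head that is confident enough, so easy inputs use fewer layers, and less accelerator time, than hard ones.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>

```python
# Hypothetical sketch of entropy-based early exiting (names and the 0.5
# threshold are illustrative, not the paper's implementation).
import math

def entropy(probs):
    # Shannon entropy of a probability vector (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit_infer(layers, heads, x, threshold=0.5):
    # Run layers in order; stop once a classifier head is confident enough.
    used = 0
    for layer, head in zip(layers, heads):
        x = layer(x)
        used += 1
        probs = head(x)
        if threshold > entropy(probs):   # confident: skip remaining layers
            return probs, used
    return probs, used

# Toy two-layer model: an easy input exits after one layer, a hard one does not.
layers = [lambda x: x, lambda x: x]
easy = [lambda x: [0.99, 0.01]] * 2    # low-entropy head output
hard = [lambda x: [0.5, 0.5]] * 2      # high-entropy head output
print(early_exit_infer(layers, easy, 0)[1])   # 1 layer used
print(early_exit_infer(layers, hard, 0)[1])   # 2 layers used
```

<\/code><\/pre>\n\n\n\n<p>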
Unlike Spectre and Meltdown, which exploit fundamentally limited performance optimizations, power gating and model gating are likely to become even more widespread as the demand for ever-larger AI models grows in the post-Dennard Scaling era, increasing the potential for GateBleed attacks.<\/p>\n\n\n\n<p>Our peer-reviewed paper, \u201cGATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance &amp; Stealthy Attacks on AI\u201d, has been accepted at MICRO 2025.<\/p>\n\n\n\n<p><img decoding=\"async\" alt=\"\ud83d\udcc4\" src=\"https:\/\/fonts.gstatic.com\/s\/e\/notoemoji\/16.0\/1f4c4\/72.png\"> <a href=\"https:\/\/arxiv.org\/pdf\/2507.17033\" target=\"_blank\" rel=\"noreferrer noopener\">Read the preprint on arXiv<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Key Findings<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Power optimization backfire: Latency penalties from AMX\u2019s cold-start wakeups create a strong timing signal.<\/li>\n\n\n\n<li>First hardware-only membership attack: 81% accuracy (precision 0.89) on an early-exit Transformer without any access to logits or model outputs.<\/li>\n\n\n\n<li>Broader AI privacy impact:\n<ul class=\"wp-block-list\">\n<li>100% accurate Mixture-of-Experts (MoE) routing inference<\/li>\n\n\n\n<li>Leakage of secret control-flow decisions across OS, VM, and SGX boundaries<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Cross-platform relevance: Preliminary tests confirm similar power-gating patterns on NVIDIA A100 GPUs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">How GateBleed Works<\/h2>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>Power Gating in AI Accelerators\n<ul class=\"wp-block-list\">\n<li>To save energy, AMX units shut down between uses.<\/li>\n\n\n\n<li>Waking from a deep state takes significantly longer than from a shallow 
state.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Measurable Latency Gap\n<ul class=\"wp-block-list\">\n<li>The first matmul after wake-up pays a \u201ccold-start\u201d cost: ~20,000 cycles in AMX.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Turning Latency into Leakage\n<ul class=\"wp-block-list\">\n<li>By carefully scheduling and timing operations, attackers can determine if a model took a specific execution path (e.g., selected an MoE expert, exited early).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Remote Feasibility\n<ul class=\"wp-block-list\">\n<li>The signal remains robust over a one-hop network and achieves a 70,000\u00d7 higher transmission rate than NetSpectre using our amplification gadget.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Impact<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud AI deployments: Multi-tenant environments with shared accelerators are especially vulnerable.<\/li>\n\n\n\n<li>Privacy threats: Enables label leakage, membership inference, and watermark recovery without model outputs.<\/li>\n\n\n\n<li>Mitigation challenges: Standard defenses like cache partitioning, timer fuzzing, and speculation barriers are ineffective.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Technical Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><img decoding=\"async\" src=\"https:\/\/fonts.gstatic.com\/s\/e\/notoemoji\/16.0\/1f4c4\/72.png\" alt=\"\ud83d\udcc4\"> <a href=\"https:\/\/arxiv.org\/pdf\/2507.17033\" target=\"_blank\" rel=\"noreferrer noopener\">Research Paper (arXiv Preprint)<\/a><\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/fonts.gstatic.com\/s\/e\/notoemoji\/16.0\/1f5a5_fe0f\/72.png\" alt=\"\ud83d\udda5\ufe0f\"> Proof-of-Concept demo (MICRO artifact evaluation link \u2014 placeholder for now)<\/li>\n\n\n\n<li><img decoding=\"async\" 
src=\"https:\/\/fonts.gstatic.com\/s\/e\/notoemoji\/16.0\/1f517\/72.png\" alt=\"\ud83d\udd17\"> <a href=\"https:\/\/www.intel.com\/content\/www\/us\/en\/security-center\/default.html\" target=\"_blank\" rel=\"noreferrer noopener\">Intel PSIRT Guidance on Constant-Time Programming<\/a> (note: does not mitigate MoE variant)<\/li>\n<\/ul>\n\n\n\n\n<h3 class=\"wp-block-heading\">Artifact URL<\/h3>\n\n\n\n<p><a href=\"https:\/\/github.com\/jkalya\/gatebleed\">https:\/\/github.com\/jkalya\/gatebleed<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Badges Applied For<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Artifact Available<\/li>\n\n\n\n<li>Artifact Evaluated &#8211; Functional<\/li>\n\n\n\n<li>Results Reproduced<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hardware Dependencies (if any)<\/h3>\n\n\n\n<p>In our experiments, we used the Lenovo SRV-650 V3 server with UEFI version 3.14. The CPU we used was the Intel Xeon Gold 5420+ but any CPU with Intel AMX should work. If using a remote system in artifacts 03 and 04, you can use any Intel x86 system running any Linux distribution.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Software Dependency (if any)<\/h3>\n\n\n\n<p>We found the attack works best when using Linux kernel version 5.14.0-427. We found later versions of the kernel don&#8217;t show as many performance stages. We tested on both RHEL 9.4 and Ubuntu 22.04. To run 02_performance_stages_in_sgx, we need the Intel SGX SDK available at \\hyperlink{<a href=\"https:\/\/github.com\/intel\/linux-sgx%7D%7Bhttps:\/\/github.com\/intel\/linux-sgx%7D\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/intel\/linux-sgx}{https:\/\/github.com\/intel\/linux-sgx}<\/a>. We use Python 3.9.21 but any version of Python 3 should work. 
The remaining artifacts have no external dependencies.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key Results to be Reproduced<\/h3>\n\n\n\n<p>01 &#8211; Observation of the performance stages that constitute the backbone of the GATEBLEED attack<\/p>\n\n\n\n<p>02 &#8211; Observation of these performance stages even within an Intel SGX enclave, making GATEBLEED exploitable inside SGX<\/p>\n\n\n\n<p>03 &#8211; Remote GATEBLEED covert channel. The server sends a &#8220;0&#8221; by not running an AMX operation and a &#8220;1&#8221; by running one. The client then requests a server endpoint that unconditionally runs an AMX operation and measures the response time, inferring whether the server used AMX recently.<\/p>\n\n\n\n<p>04 &#8211; Remote GATEBLEED Spectre-v1 attack. The attacking client knows of a branch on the victim server that conditionally executes an AMX operation based on the value of a secret bit.<\/p>\n\n\n\n<p>05 &#8211; Custom implementation of a transformer model that uses AMX whenever possible for inference. End-to-end inference runtime varies based on whether inference started while AMX was warm or cold.<\/p>\n\n\n\n<p>06 &#8211; Membership inference attack on an entropy-based early-exiting transformer model<\/p>\n\n\n\n<p>07 &#8211; Membership inference attack on an early-exit CNN model<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Estimated Completion Time<\/h3>\n\n\n\n<p>40 minutes<\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>Overview GateBleed is a newly discovered microarchitectural side-channel affecting Intel\u2019s Advanced Matrix Extensions (AMX) and potentially other AI accelerators. 
It exploits the staged power-gating mechanism&#8230;<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"open","template":"page-landing.php","meta":{"footnotes":""},"class_list":["post-2","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/pages\/2","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/comments?post=2"}],"version-history":[{"count":15,"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/pages\/2\/revisions"}],"predecessor-version":[{"id":38,"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/pages\/2\/revisions\/38"}],"wp:attachment":[{"href":"https:\/\/research.ece.ncsu.edu\/gatebleed\/wp-json\/wp\/v2\/media?parent=2"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}