{"id":392,"date":"2026-06-23T14:29:47","date_gmt":"2026-06-23T14:29:47","guid":{"rendered":"https:\/\/research.ece.ncsu.edu\/brainspec\/?page_id=392"},"modified":"2026-06-23T17:11:24","modified_gmt":"2026-06-23T17:11:24","slug":"publications","status":"publish","type":"page","link":"https:\/\/research.ece.ncsu.edu\/brainspec\/?page_id=392","title":{"rendered":"Publications"},"content":{"rendered":"<form role=\"search\" method=\"get\" action=\"https:\/\/research.ece.ncsu.edu\/brainspec\/\" class=\"wp-block-search__button-outside wp-block-search__text-button wp-block-search\"    ><label class=\"wp-block-search__label\" for=\"wp-block-search__input-1\" >Search<\/label><div class=\"wp-block-search__inside-wrapper\" ><input class=\"wp-block-search__input\" id=\"wp-block-search__input-1\" placeholder=\"Search\" value=\"\" type=\"search\" name=\"s\" required \/><button aria-label=\"Search\" class=\"wp-block-search__button wp-element-button\" type=\"submit\" >Search<\/button><\/div><\/form>\n\n\n<h1 class=\"wp-block-heading\">GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance and Stealthy Attacks on AI<\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Joshua Kalyanapu, Darsh Asher, Farshad Dizani, Azam Ghanbari, Rosario Cammarota, Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">IEEE Micro Top Picks 2025<\/h5>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed-toppicks.pdf\" data-type=\"attachment\" data-id=\"401\">Top Picks Paper<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed.pdf\" data-type=\"attachment\" data-id=\"398\">Paper presented at MICRO 2025<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/MICRO25_GateBleed_Presentation.pdf\" 
data-type=\"attachment\" data-id=\"405\">Slides presented at MICRO 2025<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed_poster.pdf\" data-type=\"attachment\" data-id=\"408\">Poster presented at MICRO 2025<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/github.com\/jkalya\/gatebleed\" data-type=\"link\" data-id=\"https:\/\/github.com\/jkalya\/gatebleed\">Code<\/a><\/h6>\n\n\n\n<p class=\"wp-block-paragraph\">AI accelerators are being integrated directly into the CPU. This paper discloses GATEBLEED, a family of timing side channels caused by aggressive power gating of one such on-core accelerator\u2013Intel AMX. The attacks expose a four-axis shift in the microarchitectural attack threat model, not rooted in a bug but in a power optimization that makes efficient and seamless on-core AI acceleration possible: the target moves from leaking stored bytes to inferring AI training-data properties, specifically, hardware-observed membership inference attacks (MIAs) for the first time. The channel bypasses software defenses such as padding and confidence masking; it provides magnification that turns theoretical microarchitectural side channels, previously requiring controlled, low-traffic networks to be observable, into attacks with realistic leakage rates on production networks; it bypasses timer-coarsening defenses; it remains stealthy through a passive reset and low repetition requirement; and it escapes state-of-the-art anomaly detectors. 
Defense becomes a performance\u2013power\u2013privacy trilemma that requires rethinking how computer architects design the future of power-efficient, low-cost AI accelerators from the ground up.<\/p>\n\n\n\n<hr class=\"wp-block-separator alignfull has-alpha-channel-opacity\" \/>\n\n\n\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/cachemind.pdf\" data-type=\"attachment\" data-id=\"400\">CacheMind: From Miss Rates to Why-Natural-Language, Trace-Grounded Reasoning for Cache Replacement<\/a><\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Kaushal Mhapsekar, Azam Ghanbari, Bita Aslrousta, and Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">ASPLOS 2026<\/h5>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/cachemind.pdf\" data-type=\"attachment\" data-id=\"398\">Paper<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/CacheMind_slides.pdf\" data-type=\"attachment\" data-id=\"416\">Slides<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/CacheMind_Poster.pdf\" data-type=\"attachment\" data-id=\"415\">Poster<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/github.com\/kaushal1803\/cachemind\" data-type=\"link\" data-id=\"https:\/\/github.com\/kaushal1803\/cachemind\">Code<\/a><\/h6>\n\n\n\n<p class=\"wp-block-paragraph\">Cache replacement remains a challenging problem in CPU microarchitecture, often addressed using hand-crafted heuristics that limit cache performance. 
Cache data analysis requires parsing millions of trace entries with manual filtering, making the process slow and non-interactive.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To address this, we introduce CacheMind, a conversational tool that uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to enable semantic reasoning over cache traces. Architects can now ask natural-language questions such as <em>&#8220;Why is the memory access associated with PC X causing more evictions?&#8221;<\/em> and, for the first time, receive trace-grounded, human-readable answers linked to program semantics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To evaluate CacheMind, we present CacheMindBench, the first verified benchmark suite for LLM-based reasoning for the cache replacement problem. Using the Sieve retriever, CacheMind achieves 66.67% on 75 unseen trace-grounded questions and 84.80% on 25 unseen policy-specific reasoning tasks; with Ranger, it achieves 89.33% and 64.80% on the same evaluations. Additionally, with Ranger, CacheMind achieves 100% accuracy on 4 out of 6 categories in the trace-grounded tier of CacheMindBench. 
Compared to LlamaIndex (10% retrieval success), Sieve achieves 60% and Ranger achieves 90%, demonstrating that existing Retrieval-Augmented Generation (RAG) systems are insufficient for precise, trace-grounded microarchitectural reasoning.<\/p>\n\n\n\n<hr class=\"wp-block-separator alignfull has-alpha-channel-opacity\" \/>\n\n\n\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/featurebleed.pdf\" data-type=\"attachment\" data-id=\"399\">FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators<\/a><\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, and Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">IEEE CAL Jan.-June 2026<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">Backend enrichment is deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private features whose values influence inference behavior while remaining hidden from the API caller. This paper presents the first hardware-level backend retrieval data-stealing attack, showing that accelerator optimizations designed for performance can directly undermine data confidentiality and bypass state-of-the-art privacy defenses. Our attack, FeatureBleed, exploits zero-skipping in AI accelerators to infer private backend-retrieved features solely through end-to-end timing, without relying on power analysis, DVFS manipulation, or shared-cache side channels. We evaluate FeatureBleed on three datasets spanning medical and non-medical domains: Texas-100X (clinical records), OrganAMNIST (medical imaging), and Census-19 (socioeconomic data). 
We further evaluate FeatureBleed across three hardware backends (Intel AVX, Intel AMX, and NVIDIA A100) and three model architectures (DNNs, CNNs, and hybrid CNN\u2013MLP pipelines), demonstrating that the leakage generalizes across CPU and GPU accelerators, data modalities, and application domains, with an adversarial advantage of up to 98.87 pp. Finally, we identify the root cause of the leakage as sparsity-driven zero-skipping in modern hardware. We quantify the privacy\u2013performance\u2013power trade-off: disabling zero-skipping increases Intel AMX\u2019s per-operation energy by up to 25% and incurs 100% performance overhead. We propose a padding-based defense that masks timing leakage by equalizing responses to the worst-case execution time, achieving protection with only 7.24% average performance overhead and no additional power cost.<\/p>\n\n\n\n<hr class=\"wp-block-separator alignfull has-alpha-channel-opacity\" \/>\n\n\n\n<h1 class=\"wp-block-heading\">GateBleed: Exploiting On-Core Accelerator Power Gating for High Performance and Stealthy Attacks on AI<\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, and Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">IEEE MICRO 2025<\/h5>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed.pdf\" data-type=\"attachment\" data-id=\"398\">Paper<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/MICRO25_GateBleed_Presentation.pdf\" data-type=\"attachment\" data-id=\"405\">Slides<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed-toppicks.pdf\" data-type=\"attachment\" data-id=\"401\">Top Picks 
Paper<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/gatebleed_poster.pdf\" data-type=\"attachment\" data-id=\"408\">Poster<\/a><\/h6>\n\n\n\n<h6 class=\"wp-block-heading\"><a href=\"https:\/\/github.com\/jkalya\/gatebleed\" data-type=\"link\" data-id=\"https:\/\/github.com\/jkalya\/gatebleed\">Code<\/a><\/h6>\n\n\n\n<p class=\"wp-block-paragraph\">As power consumption from AI training and inference continues to increase, AI accelerators are being integrated directly into the CPU. Intel\u2019s Advanced Matrix Extensions (AMX) is one such example, debuting in the 4th Generation Intel Xeon Scalable CPU and delivering significant gains in performance\/watt and a lower memory offloading penalty. This paper discloses a timing side and covert channel, GateBleed, caused by the aggressive power gating utilized to keep the CPU within operating limits. This paper shows that the GateBleed side channel is a threat to AI privacy, as many ML models such as Transformers and CNNs make critical, computationally heavy decisions based on private values like confidence thresholds and routing logits. Timing delays from the selective powering down of AMX components mean that each matrix multiplication is a potential leakage point when executed on the AMX accelerator. This paper identifies over a dozen potential gadgets across popular ML libraries (Hugging Face, PyTorch, TensorFlow, etc.), revealing that they can leak sensitive and private information, including class labels and internal states. GateBleed poses a risk for local and remote timing inference, even under previous protective measures. GateBleed gadgets can also be used as a generic, high-performance, stealthy magnifier for microarchitectural attacks to bypass timer resolution coarsening defenses and create robust and realistic side channels in noisy environments, such as remote attacks on networks with high traffic. 
This paper shows that when a GateBleed gadget is used as a transmission channel for Spectre, it can leak arbitrary memory addresses of the victim with high performance (0.067 bps) and, for the first time, evade state-of-the-art microarchitectural attack detectors. This paper implements an end-to-end membership inference attack with 81% accuracy on a Transformer model optimized with Intel AMX and 99% accuracy on an early-exit CNN classifier. GateBleed achieves 0.89 precision while leaking expert choice in a Transformer mixture-of-experts (MoE) with 100% accuracy. These attacks do not rely on confidence scores or model outputs, but only on the execution time of attacker-controlled AMX instructions on the shared hardware accelerator with power gating. To the authors\u2019 knowledge, this is the first side-channel attack on AI privacy that exploits hardware accelerator power optimizations. The paper also suggests effective mitigations and measures their trade-off between power consumption and performance.<\/p>\n\n\n\n<hr class=\"wp-block-separator alignfull has-alpha-channel-opacity\" \/>\n\n\n\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/exploiting_intel_amx_power_gating.pdf\" data-type=\"link\" data-id=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/exploiting_intel_amx_power_gating.pdf\">Exploiting Intel AMX Power Gating<\/a><\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Joshua Kalyanapu, Farshad Dizani, Azam Ghanbari, Darsh Asher, and Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">IEEE CAL Jan.-June 2025<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">We identify a novel vulnerability in Intel AMX\u2019s dynamic power performance scaling, enabling NETLOKI, a stealthy and high-performance remote speculative attack that bypasses traditional cache defenses and leaks arbitrary addresses over a realistic network where other attacks 
fail. NETLOKI shows a 34,900% improvement in leakage rate over NetSpectre. We show that NETLOKI evades detection by three state-of-the-art microarchitectural attack detectors (EVAX, PerSpectron, RHMD) and that mitigating it via timer coarsening requires reducing the system\u2019s timer resolution to 10 us, 20,000x coarser than the standard 0.5 ns hardware timer. Finally, we analyze the root cause of the leakage and propose an effective defense. We show that the mitigation increases CPU power consumption by 12.33%.<\/p>\n\n\n\n<hr class=\"wp-block-separator alignfull has-alpha-channel-opacity\" \/>\n\n\n\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/research.ece.ncsu.edu\/brainspec\/wp-content\/uploads\/sites\/35\/2026\/06\/thor.pdf\" data-type=\"attachment\" data-id=\"397\">Thor: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX<\/a><\/h1>\n\n\n\n<h4 class=\"wp-block-heading\">Farshad Dizani, Azam Ghanbari, Joshua Kalyanapu, Darsh Asher, and Samira Mirbagher Ajorpaz<\/h4>\n\n\n\n<h5 class=\"wp-block-heading\">IEEE CAL Jan.-June 2025<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">The rise of on-chip accelerators signifies a major shift in computing, driven by the growing demands of artificial intelligence (AI) and specialized applications. These accelerators have gained popularity due to their ability to substantially boost performance, cut energy usage, lower total cost of ownership (TCO), and promote sustainability. Intel\u2019s Advanced Matrix Extensions (AMX) is one such on-chip accelerator, specifically designed for handling tasks involving large matrix multiplications commonly used in machine learning (ML) models, image processing, and other computationally heavy operations. In this paper, we introduce a novel value-dependent timing side-channel vulnerability in Intel AMX. 
By exploiting this weakness, we demonstrate a software-based, value-dependent timing side-channel attack capable of inferring the sparsity of neural network weights without requiring any knowledge of the confidence score, privileged access or physical proximity. Our attack method can fully recover the sparsity of weights assigned to 64 input elements within 50 minutes, which is 631% faster than the maximum leakage rate achieved in the Hertzbleed attack.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance and Stealthy Attacks on AI Joshua Kalyanapu, Darsh Asher, Farshad Dizani, Azam Ghanbari, Rosario Cammarota, Samira Mirbagher Ajorpaz IEEE Micro Top Picks 2025 Top Picks Paper Paper presented at MICRO 2025 Slides presented at MICRO 2025 Poster presented at MICRO 2025 Code AI accelerators are being [&hellip;]<\/p>\n","protected":false},"author":140,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-392","page","type-page","status-publish","hentry"],"blocksy_meta":[],"acf":[],"_links":{"self":[{"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/pages\/392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/users\/140"}],"replies":[{"embeddable":true,"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=392"}],"version-history":[{"count":8,"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/pages\/392\/revisions"}],"pre
decessor-version":[{"id":418,"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=\/wp\/v2\/pages\/392\/revisions\/418"}],"wp:attachment":[{"href":"https:\/\/research.ece.ncsu.edu\/brainspec\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}