RF-ASL Research
Radar-based methods and apparatus for communication and interpretation of sign languages
Inventors: S.Z. Gurbuz, A.C. Gurbuz, C. Crawford, and D.J. Griffin
- Invention Disclosure filed February 16, 2018.
- U.S. Provisional Patent Application #: 62834760, filed April 16, 2019.
- Non-Provisional Patent Application filed April 24, 2020.
- US Patent Application Publication #: US2020/0334452, Oct. 22, 2020 -> Link to Publication PDF
- US Patent Granted: 4/12/2022 -> Link to Patent
Emre Kurtoglu (2024, PhD, University of Alabama): Fully-Adaptive RF Sensing for Non-Intrusive ASL Recognition via Interactive Smart Environments
M. Mahbubur Rahman (2023, PhD, University of Alabama): Physics-Aware Deep Learning for Radar-Based Cyber-Physical Human Systems
Trevor Macks (2020, MS, University of Alabama): American Sign Language Recognition Using Adversarial Learning in a Multi-Frequency RF Sensor Network
S.Z. Gurbuz and E. Malaia, “Kinematic and Linguistic Interpretation of Human Motion via RF Signal Analysis,” in New Methodologies for Understanding Radar Data, Eds: Amit Kumar Mishra and Stefan Brüggenwirth, IET, 2021.
RF-ChessSIGN: Radar-enabled Human-Computer Interaction in a Real-Time Sign Language-Controlled Game
Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2025
K. Dehaan, E. Kurtoglu, S. Biswas, C. Kobek Pezzarossi, D. Griffin, C. Crawford, A.C. Gurbuz, E. Malaia, A. Glasser, R. Kushalnagar, S.Z. Gurbuz
[ PDF ] [ Link to Pub ]
Radio frequency (RF) sensors have gained enormous popularity in the past decade with advances in machine learning and small-package, high-sensitivity sensor design. Cost efficiency and robustness to factors such as lighting, skin color, or room layout make them ideal sensors for indoor monitoring applications such as human activity, hand gesture, and sign language recognition. Although there has been great interest in the research community in exploring the potential of radar for sign language recognition, most studies are conducted in controlled laboratory environments with experimental restrictions and instructions that limit user behavior, and consequently yield overoptimistic results. Moreover, these studies often require repetitive articulation of the signs of interest, which is burdensome for participants. In this work, we explore the real-world challenges of RF sensors by gamifying data collection and thereby freeing it from experimental restrictions. Specifically, we develop and deploy a radar-based, sign language-controlled chess game (RF-ChessSIGN) in which users move the chess pieces on the board by articulating the signs registered to a particular position. The developed approach has enabled an entertaining and sustainable way of collecting data, and revealed the ineffectiveness of datasets acquired under controlled experimental settings, which cannot capture the nuances and intricacies of natural signing.
Interactive Learning of Natural Sign Language with Radar
E. Kurtoglu, K. DeHaan, C. Kobek Pezzarossi, D. J. Griffin, C. Crawford, S. Z. Gurbuz
IET Radar, Sonar and Navigation, 2024
[ Link to Pub ] [ PDF ]
Over the past decade, there have been great advancements in radio frequency (RF) sensor technology for human-computer interaction applications, such as gesture recognition, and for human activity recognition more broadly. While there is a significant body of work on these topics, in most cases experimental data are acquired in controlled settings by directing participants as to which motions to articulate. However, especially for communicative motions, such as sign language, such directed datasets do not accurately capture natural, in-situ articulations. This results in a difference between the distributions of directed American Sign Language (ASL) and natural ASL, which severely degrades natural sign language recognition in real-world scenarios. To overcome these challenges and acquire more representative data for training deep models, this work develops an interactive gaming environment, ChessSIGN, which records video and radar data of participants as they play the game without any external direction. We investigate various ways of generating synthetic samples from directed ASL data, but show that ultimately such data does not offer much improvement over simply initializing with imagery from ImageNet. In contrast, we propose an interactive learning paradigm in which model training is shown to improve as more and more natural ASL samples are acquired and augmented with synthetic samples generated by a physics-aware generative adversarial network. We show that our proposed approach enables the recognition of natural ASL in a real-world setting, achieving an accuracy of 69% for 29 ASL signs, a 60% improvement over conventional training with directed ASL data.
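A rough sketch of the interactive learning loop described above is given below: each time a batch of natural in-game samples is collected, it is augmented with synthetic signatures and used to fine-tune the current model. The function names, the augment_fn stand-in for the physics-aware GAN, and all hyperparameters are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

def interactive_update(model, optimizer, new_samples, new_labels, augment_fn, epochs=3):
    """One illustrative round of interactive learning: fine-tune on newly collected
    natural samples plus synthetic augmentations generated from them."""
    loss_fn = nn.CrossEntropyLoss()
    x = torch.cat([new_samples, augment_fn(new_samples)])  # real + synthetic inputs
    y = torch.cat([new_labels, new_labels])                # synthetic samples keep real labels
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    return model

# Example with a dummy model and a trivial stand-in "augmentation"
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 29))   # 29 natural ASL signs
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x_new = torch.randn(16, 1, 64, 64)                            # stand-in spectrogram crops
y_new = torch.randint(0, 29, (16,))
interactive_update(model, opt, x_new, y_new,
                   augment_fn=lambda x: x + 0.01 * torch.randn_like(x))
```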
CV-SincNet: Learning Complex Sinc Filters from Raw Radar Data for Computationally Efficient Human Motion Recognition
IEEE Transactions on Radar Systems, vol. 1, pp. 493-504, 2023
S. Biswas, C. O. Ayna, S.Z. Gurbuz, and A.C. Gurbuz
[ Link to Pub ] [ PDF ]
The utilization of radio-frequency (RF) sensing in cyber-physical human systems, such as human-computer interfaces or smart environments, is an emerging application that requires real-time human motion recognition. However, current state-of-the-art radar-based recognition techniques rely on computing various RF data representations, such as range-Doppler or range-angle maps, micro-Doppler signatures, or higher-dimensional representations, which incur significant computational complexity. Consequently, classification of raw radar data has garnered increasing interest, yet remains limited in the accuracy that can be attained even for recognition of simple gross motor activities. To help address this challenge, this paper proposes a more interpretable complex-valued neural network design. Complex sinc filters are designed to learn frequency-based relationships directly from the complex raw radar data in the initial layer of the proposed model. The complex-valued sinc layer consists of windowed band-pass filters that learn the center frequency and bandwidth of each filter. A challenging RF dataset consisting of 100 words from American Sign Language (ASL) is selected to verify the model. About a 40% improvement in classification accuracy was achieved over the application of a 1D CNN to raw RF data, while an 8% improvement was achieved compared to real-valued SincNet. Our proposed approach achieved a 4% improvement in accuracy over that attained with a 2D CNN applied to micro-Doppler spectrograms, while also reducing the overall computational latency by 71%.
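The sketch below illustrates the underlying filter construction: windowed complex sinc band-pass filters, each parameterized by a center frequency and bandwidth, convolved with a raw complex I/Q stream. In CV-SincNet these parameters are learned by backpropagation; here they are fixed, and all values are illustrative assumptions rather than the published implementation.

```python
import numpy as np

def complex_sinc_bank(center_freqs, bandwidths, filt_len=129):
    """Windowed complex sinc band-pass filter bank (frequencies in cycles/sample).

    Each filter is a low-pass sinc of width bw, modulated to center frequency fc
    and tapered with a Hamming window. In CV-SincNet fc and bw are learnable;
    they are fixed here purely for illustration."""
    n = np.arange(filt_len) - (filt_len - 1) / 2              # symmetric sample index
    win = np.hamming(filt_len)
    bank = []
    for fc, bw in zip(center_freqs, bandwidths):
        lowpass = bw * np.sinc(bw * n)                        # ideal low-pass of bandwidth bw
        bandpass = lowpass * np.exp(1j * 2 * np.pi * fc * n)  # shift to center frequency fc
        bank.append(bandpass * win)
    return np.stack(bank)                                     # (num_filters, filt_len)

def apply_bank(raw_iq, bank):
    """Convolve a complex raw radar (I/Q) stream with each filter."""
    return np.stack([np.convolve(raw_iq, h, mode="same") for h in bank])

# Example: 4 filters spanning negative and positive Doppler frequencies
rng = np.random.default_rng(0)
raw_iq = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)  # stand-in raw I/Q
bank = complex_sinc_bank(center_freqs=[-0.2, -0.05, 0.05, 0.2],
                         bandwidths=[0.05, 0.05, 0.05, 0.05])
features = apply_bank(raw_iq, bank)   # (4, 1024) complex feature maps
```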
Boosting Multi-Target Recognition Performance with MIMO Radar-based Angular Subspace Projection and Multi-View DNN
IET Radar, Sonar and Navigation, vol. 17, no. 7, July 2023
E. Kurtoglu, S. Biswas, A.C. Gurbuz, and S.Z. Gurbuz
[ Link to Pub ] [ PDF ]
American Sign Language (ASL) recognition using radar has become an emerging research field, especially with the development of small-package, commercially available RF sensors. ASL signs are composed of a mixture of various hand movement types (e.g., circular, straight, and back-and-forth). While some signs are articulated with one hand, others are articulated using both hands. Separation of the return signals from the left and right hands can be quite useful for retrieving the individual characteristics of each hand's motion. However, rapid changes in the spatial positions of the hands, and the two hands being very close to each other, introduce challenging scenarios in which classical representations, such as the range-angle (RA) domain, cannot resolve the right and left hands as two separate targets. To address this challenge, and to enable separation not only of the signals of the left and right hands, but also of signals from multiple people in a room, we propose an angular subspace projection technique that can be used to provide novel perspectives to a multi-view DNN. Our results show increased ASL recognition performance as a result of the proposed approach.
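One way such an angular subspace projection could be formed is sketched below for a uniform linear array: steering vectors spanning an angular sector define a subspace, and array snapshots are projected onto it to emphasize returns from that sector (e.g., one hand or one person). The array geometry, sector bounds, and rank threshold are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def steering_vector(theta_deg, n_rx, d=0.5):
    """ULA steering vector for angle theta (degrees), element spacing d (wavelengths)."""
    n = np.arange(n_rx)
    return np.exp(1j * 2 * np.pi * d * n * np.sin(np.deg2rad(theta_deg)))

def angular_projector(sector_deg, n_rx, d=0.5, step=1.0):
    """Orthogonal projector onto the subspace spanned by a sector's steering vectors."""
    angles = np.arange(sector_deg[0], sector_deg[1] + step, step)
    A = np.stack([steering_vector(a, n_rx, d) for a in angles], axis=1)  # (n_rx, n_angles)
    U, s, _ = np.linalg.svd(A, full_matrices=False)   # orthonormal basis of the sector
    rank = int(np.sum(s > 1e-3 * s[0]))               # keep dominant directions only
    Us = U[:, :rank]
    return Us @ Us.conj().T                            # (n_rx, n_rx) projection matrix

# Example: isolate returns from a "left hand" sector (here, -40 to -5 degrees)
n_rx, n_samples = 4, 256
rng = np.random.default_rng(1)
snapshots = rng.standard_normal((n_rx, n_samples)) + 1j * rng.standard_normal((n_rx, n_samples))
P_left = angular_projector((-40, -5), n_rx)
left_only = P_left @ snapshots     # snapshots dominated by the chosen angular sector
```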
Effect of Kinematics and Fluency in Adversarial Synthetic Data Generation for ASL Recognition with RF Sensors
IEEE Transactions on Aerospace and Electronic Systems, Volume: 58, Iss. 4, August 2022
M.M. Rahman, E.A. Malaia, A.C. Gurbuz, D. Griffin, C. Crawford and S.Z. Gurbuz
* https://arxiv.org/abs/2201.00055 (Jan 2022)
[ Link to Pub ] [ PDF ]
RF sensors have recently been proposed as a new modality for sign language processing technology. They are non-contact, effective in the dark, and acquire a direct measurement of signing kinematics via exploitation of the micro-Doppler effect. First, this work provides an in-depth, comparative examination of the kinematic properties of signing as measured by RF sensors for both fluent ASL users and hearing imitation signers. Second, as ASL recognition techniques utilizing deep learning require a large amount of training data, this work examines the effect of signing kinematics and subject fluency on adversarial learning techniques for data synthesis. Two different approaches for synthetic training data generation are proposed: 1) adversarial domain adaptation to minimize the differences between imitation signing and fluent signing data, and 2) kinematically-constrained generative adversarial networks for accurate synthesis of RF signing signatures. The results show that the kinematic discrepancies between imitation signing and fluent signing are so significant that training on data directly synthesized from fluent RF signers offers greater performance (93% top-5 accuracy) than that produced by adaptation of imitation signing (88% top-5 accuracy) when classifying 100 ASL signs.
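For reference, one standard formulation of adversarial domain adaptation is a gradient-reversal domain discriminator, sketched below with imitation versus fluent signing as the two domains. This is a generic illustration under assumed input shapes and layer sizes, and may differ from the adaptation approach used in the paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainAdversarialModel(nn.Module):
    """Shared feature extractor feeding a sign classifier and a domain discriminator
    (imitation vs. fluent); the reversed gradient pushes features toward domain
    invariance. Sizes and input shapes are placeholders."""
    def __init__(self, num_signs=100, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, feat_dim), nn.ReLU())
        self.sign_head = nn.Linear(feat_dim, num_signs)
        self.domain_head = nn.Linear(feat_dim, 2)

    def forward(self, x, lamb=1.0):
        z = self.features(x)
        return self.sign_head(z), self.domain_head(GradReverse.apply(z, lamb))

# Example: a batch of stand-in spectrogram crops
x = torch.randn(8, 64, 64)
sign_logits, domain_logits = DomainAdversarialModel()(x, lamb=0.5)
```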
ASL Trigger Recognition in Mixed Activity/Signing Sequences for RF Sensor-Based User Interfaces
IEEE Transactions on Human Machine Systems, Volume: 52, Iss. 4, August 2022
E. Kurtoglu, A.C. Gurbuz, E.A. Malaia, D. Griffin, C. Crawford, and S.Z. Gurbuz
* https://arxiv.org/abs/2111.05480 (Dec. 2021)
[ Link to Pub ] [ PDF ]
The past decade has seen great advancements in speech recognition for control of interactive devices, personal assistants, and computer interfaces. However, Deaf and hard-of-hearing (HoH) individuals, whose primary mode of communication is sign language, cannot use voice-controlled interfaces. Although there has been significant work in video-based sign language recognition, video is not effective in the dark and has raised privacy concerns in the Deaf community when used in the context of human ambient intelligence. RF sensors have recently been proposed as a new modality that can be effective in circumstances where video is not. This paper considers the problem of recognizing a trigger sign (wake word) in the context of daily living, where gross motor activities are interwoven with signing sequences. The proposed approach exploits multiple RF data domain representations (time-frequency, range-Doppler, and range-angle) for sequential classification of mixed motion data streams. The recognition accuracy of signs with varying kinematic properties is compared and used to make recommendations on appropriate trigger sign selection for RF sensor-based user interfaces. The proposed approach achieves a trigger sign detection rate of 98.9% and a classification accuracy of 92% for 15 ASL words and 3 gross motor activities.
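The range-Doppler representation mentioned above is typically formed with two FFTs over an FMCW data cube; the sketch below shows one common formulation, with windowing and axis conventions chosen as assumptions rather than taken from the paper.

```python
import numpy as np

def range_doppler_map(iq_cube, n_range=None, n_doppler=None):
    """Range-Doppler map from an FMCW data cube of shape (num_chirps, samples_per_chirp).

    A range FFT along fast time (within each chirp) is followed by a Doppler FFT
    along slow time (across chirps); Hann windows reduce sidelobes."""
    num_chirps, num_samples = iq_cube.shape
    win_fast = np.hanning(num_samples)
    win_slow = np.hanning(num_chirps)
    rng_fft = np.fft.fft(iq_cube * win_fast[None, :], n=n_range, axis=1)     # range FFT
    dop_fft = np.fft.fftshift(                                               # Doppler FFT,
        np.fft.fft(rng_fft * win_slow[:, None], n=n_doppler, axis=0), axes=0)  # zero-centered
    return 20 * np.log10(np.abs(dop_fft) + 1e-12)   # (Doppler bins, range bins), in dB

# Example with a synthetic stand-in cube (128 chirps x 256 fast-time samples)
rng = np.random.default_rng(2)
cube = rng.standard_normal((128, 256)) + 1j * rng.standard_normal((128, 256))
rd_map = range_doppler_map(cube)
```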
Multi-Frequency RF Sensor Fusion for Word-Level Fluent ASL Recognition
IEEE Sensors Journal, Volume: 22, Iss. 12, pp. 11373-11381, June 2022
S.Z. Gurbuz, M.M. Rahman, E. Kurtoglu, E. Malaia, A.C. Gurbuz, D.J. Griffin, C. Crawford
[ Link to Pub ] [ PDF ]
Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAE). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) accuracy drops to 67%. For hearing imitation signers, signs are more separable, resulting in a 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased with use of the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures, and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing and is not effective for pre-training networks for fluent ASL classification. Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features, and thus offer significant performance gains in comparison to using a single sensor or feature-level fusion.
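A minimal sketch of what a multi-branch fusion network of this kind could look like is given below: one convolutional branch per sensor frequency, with branch embeddings concatenated before the classifier. The layer sizes, input shapes, and class count are placeholders, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class MultiFrequencyFusionNet(nn.Module):
    """Illustrative fusion network: one convolutional branch per RF sensor
    (e.g., 77 GHz, 24 GHz, 10 GHz spectrograms), with embeddings concatenated
    before classification. All dimensions are arbitrary placeholders."""

    def __init__(self, num_classes=20, num_sensors=3):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
                nn.Flatten(),
            )
        self.branches = nn.ModuleList([branch() for _ in range(num_sensors)])
        self.classifier = nn.Sequential(
            nn.Linear(num_sensors * 32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, spectrograms):
        # spectrograms: list of (batch, 1, H, W) tensors, one per sensor
        feats = [b(x) for b, x in zip(self.branches, spectrograms)]
        return self.classifier(torch.cat(feats, dim=1))

# Example forward pass with dummy spectrograms from three sensors
model = MultiFrequencyFusionNet()
dummy = [torch.randn(8, 1, 64, 64) for _ in range(3)]
logits = model(dummy)   # (8, 20)
```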
American Sign Language Recognition Using RF Sensing
IEEE Sensors Journal, vol. 21, iss. 3, Feb. 2021
* IEEE Early Access, September 7, 2020
S.Z. Gurbuz, A.C. Gurbuz, E. Malaia, D. Griffin, C. Crawford, M. Rahman, E. Kurtoglu, R. Aksu, T. Macks, R. Mdrafi
* https://arxiv.org/abs/2009.01224 (August 2020)
[ Link to Pub ] [ PDF ]
Methods: This paper proposes the use of RF sensors for HCI applications serving the Deaf community. A multi-frequency RF sensor network is used to acquire non-invasive, non-contact measurements of ASL signing irrespective of lighting conditions. The unique patterns of motion present in the RF data due to the micro-Doppler effect are revealed using time-frequency analysis with the Short-Time Fourier Transform. Linguistic properties of RF ASL data are investigated using machine learning (ML). Results: The information content, measured by fractal complexity, of ASL signing is shown to be greater than that of other upper body activities encountered in daily living. This can be used to differentiate daily activities from signing, while features from RF data show that imitation signing by non-signers is 99% differentiable from native ASL signing. Feature-level fusion of RF sensor network data is used to achieve 72.5% accuracy in classification of 20 native ASL signs. Implications: RF sensing can be used to study dynamic linguistic properties of ASL and design Deaf-centric smart environments for non-invasive, remote recognition of ASL. ML algorithms should be benchmarked on native, not imitation, ASL data.
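As a sketch of how the micro-Doppler signature referenced above can be computed, the snippet below applies a range FFT to an FMCW data cube, sums the strongest range bins into a slow-time signal, and forms a spectrogram with the Short-Time Fourier Transform. The window length, overlap, and range-bin selection are common defaults assumed here, not the paper's exact processing chain.

```python
import numpy as np
from scipy.signal import stft

def micro_doppler_spectrogram(iq_cube, prf, window_len=128, overlap=0.9):
    """Micro-Doppler spectrogram from an FMCW data cube (num_chirps, samples_per_chirp).

    A range FFT isolates the range bins containing the signer; the complex returns
    in those bins are summed into a slow-time signal and analyzed with the STFT."""
    range_profiles = np.fft.fft(iq_cube, axis=1)                 # range FFT (fast time)
    energy = np.sum(np.abs(range_profiles) ** 2, axis=0)         # energy per range bin
    target_bins = np.argsort(energy)[-8:]                        # crude target-bin selection
    slow_time = np.sum(range_profiles[:, target_bins], axis=1)   # complex slow-time signal
    f, t, S = stft(slow_time, fs=prf, nperseg=window_len,
                   noverlap=int(window_len * overlap), return_onesided=False)
    f = np.fft.fftshift(f)
    S = np.fft.fftshift(S, axes=0)                               # center zero Doppler
    return f, t, 20 * np.log10(np.abs(S) + 1e-12)

# Example with a synthetic stand-in cube (128 chirps x 256 fast-time samples)
rng = np.random.default_rng(3)
cube = rng.standard_normal((128, 256)) + 1j * rng.standard_normal((128, 256))
freqs, times, md_db = micro_doppler_spectrogram(cube, prf=1000.0)
```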
Capturing Motion: Using Radar to Build Better Sign Language Corpora
The Joint Int. Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)
Torino, Italy, May 20-25, 2024
E. Malaia, J. Borneman, and S.Z. Gurbuz
[ PDF ]
Sign language conveys information using a dynamic visual signal. Proficient signers rely on skill in processing and predicting motion information during sign language comprehension. Much current work in sign language corpora development relies on video data. However, from the perspective of information transfer in communication, video recordings are limited in their ability to capture the spatial and temporal frequencies of the sign language signal at sufficient resolution. In contrast, radar can capture 3D motion data at high temporal and spatial resolution, preserving depth articulations lost in 2D video. Radar's recording parameters can also be adapted in real time to optimize temporal resolution for rapid signing motions. Thus, radar recordings provide higher-fidelity corpora for analyzing linguistic features of sign languages and for creating smart environments that respond to signed input. Crucially, radar recordings uphold user privacy, capturing only the kinematic parameters of the communicative signal, as opposed to signer identity. Radar's resolution in capturing dynamic data from sign language production, and the privacy advantages it provides to users, make it uniquely suited for advancing sign language research through corpora development.
Complexity in Sign Languages: Linguistic and Dimensional Analysis of Information Transfer in Dynamic Visual Communication
Linguistics Vanguard, vol. 9, no. s1, 2023, pp. 121-131, published online in October 2022.
E.A. Malaia, J.D. Borneman, E. Kurtoglu, S.Z. Gurbuz, D. Griffin, C. Crawford, and A.C. Gurbuz
[ Link to Pub ] [ PDF ]
Sign languages are human communication systems that are equivalent to spoken language in their capacity for information transfer, but which use a dynamic visual signal for communication. Thus, linguistic metrics of complexity, which are typically developed for linear, symbolic linguistic representation (such as written forms of spoken languages), do not translate easily into sign language analysis. A comparison of physical signal metrics, on the other hand, is complicated by the higher dimensionality (spatial and temporal) of the sign language signal as compared to a speech signal (solely temporal). Here, we review a variety of approaches to operationalizing sign language complexity based on linguistic and physical data, and identify the approaches that allow for high-fidelity modeling of the data in the visual domain, while capturing linguistically relevant features of the sign language signal.
Performance Comparison of Radar and Video for American Sign Language Recognition
IEEE Radar Conference, New York City, NY, March 2022
M. M. Rahman, E. Kurtoglu, M. Taskin, K. Esme, A. C. Gurbuz, E. Malaia, and S. Z. Gurbuz
[ PDF ]
In the past decade, there has been great research in the development of American Sign Language (ASL)-enabled user interfaces and smart environments, especially using wearables, RGB and RGB-D video cameras, and radio frequency (RF) sensors. Each sensor modality provides distinct advantages and suffers from various limitations. Although each sensor modality has been studied for ASL recognition, a comparison of video- and RF-based sensing performance in terms of ASL recognition is not available. This study aims to compare word-level ASL recognition performance over the same 100 ASL glosses from both RF and video sensors. A top-5 accuracy of 93% was achieved using the RF micro-Doppler spectrogram representation in a convolutional neural network (CNN) classifier, whereas with video ASL data for the same 100 words a top-5 accuracy of 90% was achieved. This shows that radar has recognition performance comparable to video for ASL recognition.
A Linguistic Perspective on Radar Micro-Doppler Analysis of American Sign Language
IEEE International Radar Conference, April 2020
S.Z. Gurbuz, A.C. Gurbuz, E.A. Malaia, D.J. Griffin, C. Crawford, M.M. Rahman, et al.
[ PDF ]
Although users of American Sign Language (ASL) comprise a significant minority in the U.S. and Canada, people in the Deaf community have been unable to benefit from many new technologies, which depend upon vocalized speech and are designed for hearing individuals. While video has led to tremendous advances in ASL recognition, concerns over invasion of privacy have limited its use for in-home smart environments. This paper presents initial work on the use of RF sensors, which can protect user privacy, for the purpose of ASL recognition. New 2D/3D RF data representations and optical flow are presented. The fractal complexity of ASL is shown to be greater than that of daily activities, a relationship consistent with linguistic analysis conducted using video.
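Fractal complexity can be operationalized in several ways; as one common estimator, the sketch below computes the Higuchi fractal dimension of a 1D kinematic time series (for example, a micro-Doppler envelope). This is an illustrative measure and not necessarily the one used in the work above.

```python
import numpy as np

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1D signal: average curve length at multiple
    downsampling scales k, with the dimension given by the slope of log L(k)
    versus log(1/k)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    L = []
    for k in range(1, kmax + 1):
        Lk = []
        for m in range(k):
            idx = np.arange(m, N, k)          # subsampled series starting at offset m
            n_int = len(idx) - 1
            if n_int < 1:
                continue
            length = np.sum(np.abs(np.diff(x[idx]))) * (N - 1) / (n_int * k) / k
            Lk.append(length)
        L.append(np.mean(Lk))
    logk = np.log(1.0 / np.arange(1, kmax + 1))
    slope, _ = np.polyfit(logk, np.log(L), 1)
    return slope

# Example: FD of a stand-in 1D kinematic time series
rng = np.random.default_rng(4)
envelope = np.cumsum(rng.standard_normal(1024))
print(higuchi_fd(envelope, kmax=10))
```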