Sujan Kumar Gonugondla
Machine Learning Researcher
Pioneering the future of AI, one algorithm at a time.
About Me
I am a Machine Learning Scientist with interests in designing and implementing AI and machine learning algorithms that are both efficient and effective. At Amazon, I led the efficient inference efforts for Amazon Q Developer, an LLM-based coding assistant that empowers developers to write code more efficiently.
I hold a Ph.D. from the University of Illinois at Urbana-Champaign, where my research focused on enabling efficient machine learning at the edge. My doctoral work explored novel techniques for optimizing deep learning models and hardware architectures to enable real-time inference on resource-constrained devices.
Publications
BASS: Batched Attention-optimized Speculative Sampling
Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras (2024)
This paper proposes an effective way to implement speculative sampling in batched settings.
Token Alignment via Character Matching for Subword Completion
Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Robert Kwiatkowski, Ramesh Nallapati, Bing Xiang (2024)
This paper proposes a novel approach for token alignment in subword completion tasks, leveraging character-level matching techniques to improve accuracy and efficiency.
Bifurcated Attention for Single-Context Large-Batch Sampling
Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Liangfu Chen, Jiacheng Guo, Parminder Bhatia, et al. (2024)
This paper introduces bifurcated attention for single-context, large-batch sampling, along with generalized multi-group attention mechanisms, to improve the IO efficiency of attention computations in deep learning models.
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, et al. (2022)
This work presents a comprehensive evaluation of code generation models across multiple programming languages, providing insights into their performance and generalizability.
Greener yet Powerful: Taming Large Code Generation Models with Quantization
Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, et al. (2023)
This research explores the application of quantization techniques to large code generation models, aiming to reduce their environmental impact while maintaining performance.
IMPQ: Reduced Complexity Neural Networks Via Granular Precision Assignment
Sujan Kumar Gonugondla, Naresh R Shanbhag (2022)
IMPQ is a novel technique for reducing the complexity of neural networks by assigning granular precision to weights and activations, leading to more efficient networks.
Fundamental Limits on Energy-Delay-Accuracy of In-Memory Architectures in Inference Applications
Sujan K Gonugondla, Charbel Sakr, Hassan Dbouk, Naresh R Shanbhag (2022)
This work investigates the fundamental limits on energy-delay-accuracy tradeoffs in in-memory architectures for inference applications, providing insights for system design with analog computing architectures.
More Publications
For my complete list of publications and more on my research contributions, please visit my Google Scholar page.
In the News
Articles about my research and me:
Celebrating Our Graduates: Sujan Gonugondla
by Allie Arp in Coordinated Science Laboratory, 2020
CSL students awarded prestigious IEEE-SSCS Predoctoral Achievement Award
by Allie Arp in Coordinated Science Laboratory, 2020
To Speed Up AI, Mix Memory and Processing
by Katherine Bourzac in IEEE Spectrum, 2018
7 Ideas for AI Silicon from ISSCC
by Rick Merritt in EE Times, 2018
University of Illinois, Micron enhance speed and battery life of mobile, IoT devices with deep in-memory architecture
by Kim Gudeman in Coordinated Science Laboratory, 2018
Gonugondla wins Best Paper Award at IEEE Conference
by August Schiess in Coordinated Science Laboratory, 2016