Sujan Kumar Gonugondla

Machine Learning Researcher

Pioneering the future of AI, one algorithm at a time.

About Me

I am a Machine Learning Scientist with interests in designing and implementing AI and machine learning algorithms that are both efficient and effective. At Amazon, I led the efficient inference efforts for Amazon Q Developer, an LLM-based coding assistant that empowers developers to write code more efficiently.

I hold a Ph.D. from the University of Illinois at Urbana-Champaign, where my research focused on enabling efficient machine learning at the edge. My doctoral work explored novel techniques for optimizing deep learning models and hardware architectures to enable real-time inference on resource-constrained devices.

Publications

BASS: Batched Attention-optimized Speculative Sampling

Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras (2024)

This paper proposes an effective way to implement speculative sampling in batched settings.

Token Alignment via Character Matching for Subword Completion

Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Robert Kwiatkowski, Ramesh Nallapati, Bing Xiang (2024)

This paper proposes a novel approach for token alignment in subword completion tasks, leveraging character-level matching techniques to improve accuracy and efficiency.

Bifurcated Attention for Single-Context Large-Batch Sampling

Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Liangfu Chen, Jiacheng Guo, Parminder Bhatia, et al. (2024)

This paper introduces bifurcated attention for single-context large-batch sampling, along with generalized multi-group attention mechanisms, to improve the IO efficiency of attention computations in deep learning models.

Multi-lingual Evaluation of Code Generation Models

Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, et al. (2022)

This work presents a comprehensive evaluation of code generation models across multiple programming languages, providing insights into their performance and generalizability.

Greener yet Powerful: Taming Large Code Generation Models with Quantization

Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, et al. (2023)

This research explores the application of quantization techniques to large code generation models, aiming to reduce their environmental impact while maintaining performance.

IMPQ: Reduced Complexity Neural Networks Via Granular Precision Assignment

Sujan Kumar Gonugondla, Naresh R Shanbhag (2022)

IMPQ is a novel technique for reducing the complexity of neural networks by assigning granular precision to weights and activations, leading to more efficient networks.

Fundamental Limits on Energy-Delay-Accuracy of In-Memory Architectures in Inference Applications

Sujan K Gonugondla, Charbel Sakr, Hassan Dbouk, Naresh R Shanbhag (2022)

This work investigates the fundamental limits on energy-delay-accuracy tradeoffs in in-memory architectures for inference applications, providing insights for system design using analog computing architectures.

More Publications

For my complete list of publications and more on my research contributions, please visit my Google Scholar page.

In the News

Celebrating Our Graduates: Sujan Gonugondla

CSL students awarded prestigious IEEE-SSCS Predoctoral Achievement Award

To Speed Up AI, Mix Memory and Processing

7 Ideas for AI Silicon from ISSCC

University of Illinois, Micron enhance speed and battery life of mobile, IoT devices with deep in-memory architecture

Gonugondla wins Best Paper Award at IEEE Conference

Blog

Coming Soon