Sujan Kumar Gonugondla

Machine Learning Scientist at Amazon

Pioneering the future of AI, one algorithm at a time.

About Me

I am a Machine Learning Scientist interested in designing and implementing AI and machine learning algorithms that are both efficient and effective. At Amazon, I led the efficient inference efforts for Amazon CodeWhisperer, an LLM-based coding assistant that helps developers write code more efficiently.

I hold a Ph.D. from the University of Illinois at Urbana-Champaign, where my research focused on enabling efficient machine learning at the edge. My doctoral work explored novel techniques for optimizing deep learning models and hardware architectures to enable real-time inference on resource-constrained devices.

Publications

Token Alignment via Character Matching for Subword Completion

Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Robert Kwiatkowski, Ramesh Nallapati, Bing Xiang (2023)

This paper proposes a novel approach for token alignment in subword completion tasks, leveraging character-level matching techniques to improve accuracy and efficiency.

On IO-efficient attention mechanisms: Context-aware bifurcated attention and the generalized multi-group attention

Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Liangfu Chen, Jiacheng Guo, Parminder Bhatia, et al. (2023)

This paper introduces bifurcated attention for single-context, large-batch sampling, along with generalized multi-group attention, to improve the IO efficiency of attention computations in deep learning models.

Multi-lingual Evaluation of Code Generation Models

Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, et al. (2022)

This work presents a comprehensive evaluation of code generation models across multiple programming languages, providing insights into their performance and generalizability.

Greener yet Powerful: Taming Large Code Generation Models with Quantization

Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, et al. (2023)

This research explores the application of quantization techniques to large code generation models, aiming to reduce their environmental impact while maintaining performance.

IMPQ: Reduced Complexity Neural Networks Via Granular Precision Assignment

Sujan Kumar Gonugondla, Naresh R Shanbhag (2022)

IMPQ is a novel technique for reducing the complexity of neural networks by assigning granular precision to weights and activations, leading to more efficient neural networks.

Fundamental Limits on Energy-Delay-Accuracy of In-Memory Architectures in Inference Applications

Sujan K Gonugondla, Charbel Sakr, Hassan Dbouk, Naresh R Shanbhag (2022)

This work investigates the fundamental limits on energy-delay-accuracy tradeoffs in in-memory architectures for inference applications, providing insights for system design using analog computing architectures.

More Publications

For my complete list of publications and more about my research contributions, please visit my Google Scholar page.

In the News

Articles about me and my research:

Celebrating Our Graduates: Sujan Gonugondla

by Allie Arp in Coordinated Science Laboratory, 2020

CSL students awarded prestigious IEEE-SSCS Predoctoral Achievement Award

by Allie Arp in Coordinated Science Laboratory, 2020

To Speed Up AI, Mix Memory and Processing

by Katherine Bourzac in IEEE Spectrum, 2018

7 Ideas for AI Silicon from ISSCC

by Rick Merritt in EE Times, 2018

Gonugondla wins Best Paper Award at IEEE Conference

by August Schiess in Coordinated Science Laboratory, 2016

Blog

Coming Soon