Principal Engineer & Research Manager  ·  Intel

Saurav Sahay
AI Researcher & Leader

14+ years advancing safe, responsible, and intelligent AI systems.
Specializing in AI Safety, Agentic AI, Conversational AI, and Multimodal AI at Intel Labs. Ph.D., Georgia Institute of Technology.

14+
Years in AI
30+
Publications
5
Patents

About

Building AI that is powerful and trustworthy

I am a Principal Engineer and Research Engineering Manager at Intel's AI Innovation Group, where I lead research across two interconnected frontiers: Agentic AI systems — including cost-aware planning, routing, and orchestration for heterogeneous inference environments — and AI Safety, with active published research on efficient guardrail systems that make safe, responsible LLM deployment practical at scale. I also contribute to Intel's Responsible AI policy development and internal compliance programs.

Beyond Intel, I actively contribute to the MLCommons AI Risk & Reliability (AIRR) working group, co-authoring benchmarks and methodologies covering model security, jailbreak robustness, and agentic AI safety evaluation. My research spans the full spectrum of modern AI: from foundational NLP and dialog systems to LLM safety, bias detection, and enterprise AI adaptation for industrial environments.

I hold a Ph.D. in Computer Science from Georgia Institute of Technology, where my dissertation explored Socio-Semantic Conversational Information Access. Before Intel, I built healthcare AI at Siemens, worked on IBM's Watson (DeepQA) project, and co-founded a venture-backed healthcare AI startup. I serve on the program committees of ICML, NeurIPS, ICLR, ACL, EMNLP, and COLM.

Core Expertise

🛡️
AI Safety & Responsible AI
LLM safety benchmarking (MLCommons AIRR), jailbreak robustness methodology, RAI policy, bias detection, and production guardrail systems.
🤖
Agentic AI Systems
Multi-agent architectures with cost-aware planning, routing, and orchestration for heterogeneous inference. Agentic product maturity frameworks.
💬
Conversational AI
15+ years building dialog systems, NLU pipelines, and task-oriented agents across education, manufacturing, and assistive technology.
🧠
Large Language Models
Domain adaptation, PEFT, knowledge-enhanced LLMs, red-teaming for enterprise, and bias mitigation through model merging and fine-tuning.
👁️
Multimodal AI
Vision-language systems, multimodal fusion architectures, emotion understanding, and task guidance systems for industrial smart manufacturing.
🎓
Research Leadership
Managing distributed teams across US, Mexico, Germany, and Taiwan. University partnerships, grant writing experience, and mentorship.

Experience

Career Timeline

Principal Engineer & Research Engineering Manager Sept 2025 – Present
Multimodal Dialog & Interaction (MDI) · AI Innovation Group · Intel

Leading pathfinding research on Agentic AI systems: cost-aware planning, routing, guardrails, and orchestration for heterogeneous inference systems. Core contributor to Intel's Responsible AI policy development and internal compliance programs.

Contributor / Volunteer Jan 2024 – Present
MLCommons AI Risk & Reliability (AIRR) Working Group

Active contributor on LLM model evaluations, security (jailbreaks), and agentic benchmark development. Co-author of the AI Safety Benchmark v0.5 and the jailbreak robustness methodology pre-print.

Principal Engineer & AI Research Science Manager Feb 2024 – Aug 2025
Multimodal Dialog & Interaction Lab · Intelligent Systems Research Division · Intel Labs

Led a team on LLM applications spanning Responsible AI and Agentic AI: domain adaptation for enterprise data, agentic analytics for semiconductor manufacturing sensor data, and multimodal task guidance for industrial smart manufacturing.

Staff Scientist & Manager Feb 2019 – Jan 2024
Multimodal Dialog & Interaction Lab · Intelligent Systems Research Division · Intel Labs

Led a globally distributed team of scientists and contractors across the US, Mexico, Germany, and Taiwan. Delivered projects in education (multimodal dialog systems), manufacturing (vision-language systems), collaboration (multimodal meeting assistance), and assistive computing. Managed university-funded research on Few-Shot Learning and Dialog Systems. Core member of Intel's Responsible AI Council.

Senior Research Scientist Feb 2017 – Feb 2019
Anticipatory Computing Lab · Software & System Research Division · Intel Labs

Multimodal emotion understanding and dialog systems. Extended NLU and dialog management algorithms for the open-source Rasa platform. Led researchers and interns in a tech-lead capacity.

Research Scientist Oct 2012 – Feb 2017
Anticipatory Computing Lab · Software & System Research Division · Intel Labs

Developed the Cognitive Linguistics Information Platform featuring keyterm extraction, intent recognition, colloquial text normalization, knowledge-based missing information fulfillment, topic discovery, and sentiment analysis.

Research Scientist Aug 2011 – Oct 2012
Translational Informatics & Special Projects · Siemens Corporate Research · Princeton, NJ

Healthcare decision support, text analytics, semantic search, ontology-based reasoning, and data mining for patient-physician information systems.

CTO & Co-founder Aug 2010 – July 2011
Cobot Health Corporation · Georgia Tech VentureLab Spin-out

Founded a healthcare AI startup based on dissertation research with venture funding from Georgia Tech's VentureLab. Developed and deployed the Cobot Intelligent Assistant widget on a third-party platform.

Research Intern Summers 2005, 2006, 2010
IBM T.J. Watson Research Center · Hawthorne, NY & New Delhi, India

Contributed to the Watson (DeepQA) project with the medical team. Built biomedical semantic search and relation extraction systems. Customized the Slot Grammar Parser for medical ontologies and ontology-based semantic distances for improved answer-type detection.

Research

Selected Publications

Recent and representative work — spanning AI Safety, Agentic AI, LLMs, and Conversational AI. Full list on Google Scholar.

A Robust, Defensible, and Reproducible Methodology for Benchmarking Single-Turn Jailbreak Attacks on Large Language Models
Carsten Maple, Saurav Sahay, et al.
MLCommons · 2026 ↗ Paper
Agentic Product Maturity Ladder V0.1
Sean McGregor, Saurav Sahay, et al.
MLCommons · 2026 ↗ Paper
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Shaona Ghosh, Saurav Sahay, et al.
ArXiv 2503.05731 · 2025
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
Hua Farn, Hsuan Su, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee
EMNLP Findings · 2025 ↗ ArXiv
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
Ramesh Manuvinakurike, Emanuel Moss, Elizabeth Watkins, Saurav Sahay, et al.
HCXAI @ CHI · 2025 ↗ ArXiv
Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models
Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, et al.
NeurIPS 2024 · Red Teaming GenAI Workshop
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Bertie Vidgen, Saurav Sahay, et al.
ArXiv 2404.12241 · 2024
Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning
Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-yi Lee
NAACL · 2024
Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models
Hsuan Su, Cheng-Chu Farn, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee
ArXiv 2310.11079 · 2023
Low Rank Fusion based Transformers for Multimodal Sequences
Saurav Sahay, Eda Okur, Shachi H Kumar, Lama Nachman
ACL · Human Multimodal Language Workshop · 2020
Technology Solutions to Combat Online Harassment
George Kennedy, Andrew McCollough, Edward Dixon, Alexei Bastidas, John Ryan, Chris Loo, Saurav Sahay
ACL · Workshop on Abusive Language Online · 2017
View all 30+ publications on Google Scholar ↗

Patents

US 10,380,256 Technologies for Automated Context-Aware Media Curation — Nachman, Sahay, et al. (2019)
US 9,781,392 Facilitating Personal Assistance for Curation of Multimedia and Generation of Stories — Sahay, Nachman, et al. (2017)
WO2016105803 Hybrid Techniques for Sentiment Analysis — Pereg, Wasserblat, Sahay, et al. (2016)
US20180174244 Socially and Contextually Appropriate Recommendation Systems — Savage, Nachman, Sahay, Raffa (2015)
US 9,948,689 Online Social Persona Management — Savage, Wouhaybi, Nachman, Sahay (2014)

Program Committee & Community Service

ICML NeurIPS ICLR ACL EMNLP COLM MLCommons AIRR Intel Responsible AI Council

Technical Skills

Languages
Python Java C++ C Perl
Frameworks
PyTorch Transformers vLLM OpenVINO Ray Rasa
Tools
Docker Kubernetes Git Slurm

Thought Leadership

Writing & Commentary

On AI Safety, bias in language models, and responsible development of intelligent systems.

✍️
Understanding and Addressing Bias in Conversational AI
An in-depth exploration of the challenges in building fair, transparent, and safe conversational AI systems — covering sources of bias in language models, mitigation strategies, and practical approaches for responsible deployment. Published on Intel Technology Community.
Intel Technology Community Portal

Blog

Perspectives & Reflections

From Low-Rank Fusion to LoRA: When Old Ideas Find New Life

How a 2020 applied research paper on multimodal emotion understanding shares its mathematical DNA with today's biggest AI breakthroughs — and why the scientific community has more buried treasure than we think.

March 2025  ·  8 min read

In 2020, my co-authors and I at Intel Labs published a modest paper: Low Rank Fusion based Transformers for Multimodal Sequences. We weren't trying to change the world. We were trying to solve a practical problem: how do you get a model to understand human emotion by fusing what someone says, how they sound, and what their face does — without blowing up your parameter count?

Five years later, I find myself doing a double-take. The core mathematical ideas we used — low-rank factorization, cross-modal attention, parameter-efficient fusion — have become the backbone of techniques now powering the generative AI revolution. Not because of our paper, but because these ideas were always good. They just needed the right moment.

This blog isn't a victory lap. It's a reflection on how applied research plants seeds that sometimes bloom in unexpected places — and a call to dig deeper into the scientific literature for ideas whose time may have finally come.

The Problem We Were Solving

In 2020, "multimodal AI" meant classification, not generation. We weren't building chatbots that see and hear. We were building systems that could watch a YouTube video and tell you whether the speaker felt happy, sad, or angry — by jointly processing their facial expressions, vocal tone, and words.

The dominant approach at the time, the Multimodal Transformer (MulT), used nine parallel transformer models with pairwise cross-modal attention between every combination of modalities. It worked, but it was expensive — over a million parameters for a classification task.

Our question was simple: do you really need all of that?

Low-Rank Fusion: The Core Idea

The key insight was that the interaction space between modalities — the full tensor product of language, audio, and vision representations — is high-dimensional but has low intrinsic rank. You don't need to compute the entire thing.
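You can see the "high-dimensional but low intrinsic rank" phenomenon in a few lines of NumPy. This is a toy illustration with made-up dimensions, not our actual interaction tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 256 x 256 "interaction" matrix that is secretly rank-4:
# the product of a few latent factors, as in cross-modal interactions.
U = rng.standard_normal((256, 4))
V = rng.standard_normal((4, 256))
M = U @ V  # 65,536 entries, but only 4 * (256 + 256) = 2,048 free parameters

# The SVD reveals the low intrinsic rank: the spectrum collapses after index 4.
s = np.linalg.svd(M, compute_uv=False)
print(s[:6])                        # first 4 values are large, the rest ~0
print(np.linalg.matrix_rank(M))     # -> 4
```

A rank-4 approximation of this matrix is exact; for real interaction tensors the spectrum decays rather than hitting zero, but the cheap approximation still captures what matters.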

Figure 2 from our paper: Low Rank Matrix Factorization. Instead of computing the expensive full tensor product of the three modalities, the unimodal tensor sequences are decomposed into low-rank modality-specific factors that are multiplied together efficiently.

We used Low Rank Matrix Factorization (LMF) to approximate the full multimodal interaction tensor using compact, modality-specific factors. This fused representation then served as a hub — individual modalities would attend to it via cross-modal transformers to enrich their own representations.
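In code, the fusion step looks roughly like this — a NumPy sketch following Liu et al.'s LMF formulation, where the dimensions and names are illustrative rather than our actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_l, d_a, d_v = 300, 74, 35    # hypothetical language / audio / vision dims
d_out, rank = 128, 4           # fused dimension and decomposition rank

# One low-rank factor set per modality: shape (rank, d_m + 1, d_out).
# The +1 appends a constant 1 so unimodal and bimodal terms survive.
factors = {m: rng.standard_normal((rank, dim + 1, d_out)) * 0.01
           for m, dim in [("lang", d_l), ("audio", d_a), ("vision", d_v)]}

def lmf_fuse(z):
    """Low-rank multimodal fusion: elementwise product of per-modality
    projections, summed over rank — without ever materializing the full
    (d_l+1) x (d_a+1) x (d_v+1) interaction tensor."""
    fused = np.ones((rank, d_out))
    for m, x in z.items():
        x1 = np.append(x, 1.0)                        # append constant 1
        fused *= np.einsum("d,rdo->ro", x1, factors[m])
    return fused.sum(axis=0)                          # (d_out,)

h = lmf_fuse({"lang": rng.standard_normal(d_l),
              "audio": rng.standard_normal(d_a),
              "vision": rng.standard_normal(d_v)})
print(h.shape)  # (128,)
```

The appended 1 is doing quiet but important work: it keeps unimodal and pairwise terms alive inside the elementwise product, so the fused vector isn't purely trimodal.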

Figure 4 from our paper: the Low Rank Fusion Transformer (LMF-MulT), our most compact architecture. All three modalities attend to the LMF-fused signal through a single crossmodal transformer, followed by one self-attention transformer for prediction — far fewer transformers than competing approaches.

The result? Comparable performance to MulT with roughly half the parameters and 40% faster training. Not a breakthrough in accuracy — a breakthrough in efficiency.

The Mathematical Thread to LoRA

Here's where it gets interesting. In 2021, Hu et al. published LoRA: Low-Rank Adaptation of Large Language Models, and the technique quickly became the default way to fine-tune large models. The core idea? The weight update matrix during fine-tuning is high-dimensional but has low intrinsic rank. Instead of updating billions of parameters, you learn two small low-rank matrices.

Sound familiar?

We applied low-rank factorization to compress the multimodal fusion space. LoRA applies it to compress the weight update space. Different problem, same mathematical soul. The shared insight is that high-dimensional interactions in neural networks are often surprisingly low-rank — you can approximate them cheaply without losing what matters.
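For readers who haven't seen it, the LoRA update itself is only a few lines. This is a minimal NumPy sketch of the mechanism from Hu et al., with illustrative sizes, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 4096, 8                          # hypothetical hidden size and LoRA rank

W0 = rng.standard_normal((d, d))        # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-init

def lora_forward(x, alpha=16.0):
    # Frozen path plus scaled low-rank update: W0 x + (alpha/r) * B (A x).
    # Zero-initializing B means the model starts exactly at W0.
    return W0 @ x + (alpha / r) * (B @ (A @ x))

# Trainable parameters: 2*d*r for LoRA vs d*d for full fine-tuning.
full, lora = d * d, 2 * d * r
print(f"{lora / full:.4%} of full fine-tuning")  # prints "0.3906% of full fine-tuning"
```

Same move as the fusion work: replace a huge object (here the weight update, there the interaction tensor) with a product of small factors, and train only the factors.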

This wasn't a coincidence. The mathematical machinery traces back through Liu et al.'s 2018 LMF work, and further to classical matrix decomposition techniques. By 2024–2025, the circle closed further with Tensor LoRA methods (LoRTA, TT-LoRA) that use higher-order tensor decompositions — the very same family of techniques used in multimodal fusion research — to compress adaptation across layers and attention heads simultaneously.
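To make the factor-sharing idea concrete, here is its simplest CP-flavored form — a hedged sketch with invented dimensions; LoRTA and TT-LoRA use richer higher-order decompositions than this:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, r = 16, 512, 4   # layers, hidden size, rank — all illustrative

# Per-layer LoRA keeps an independent (A, B) pair for every layer.
per_layer = L * 2 * d * r

# A CP-style "tensor LoRA" shares one factor pair across all layers and
# keeps only a tiny per-layer mixing vector: Delta W[l] = B diag(g[l]) A.
A = rng.standard_normal((r, d)) * 0.01   # shared down-projection
B = rng.standard_normal((d, r)) * 0.01   # shared up-projection
g = rng.standard_normal((L, r))          # per-layer mixing weights

def delta_w(layer):
    """Reconstruct one layer's update from the shared factors (rank <= r)."""
    return B @ np.diag(g[layer]) @ A     # (d, d)

shared = 2 * d * r + L * r
print(per_layer, shared, round(per_layer / shared, 1))  # 65536 4160 15.8
```

Even this crude version cuts adapter parameters by more than an order of magnitude at these sizes, which is the same efficiency instinct that motivated LMF.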

From Multi-Tower to Single-Tower: A Journey We Started

In 2020, multimodal meant multi-tower. Separate encoders for each modality, with engineered bridges between them. Our work was already pushing toward consolidation — using a fused signal as a central hub to reduce the number of separate transformer stacks.

Today, the field has completed that journey. Models like Gemini, GPT-4o, and Claude process all modalities through a shared transformer backbone. Vision patches, audio tokens, and text tokens are projected into a unified embedding space, and a single attention mechanism handles the rest. Cross-modal reasoning isn't engineered anymore — it emerges from scale.

Our multi-tower architectures feel like artifacts now, but the impulse behind them — fewer towers, more fusion, less redundancy — was pointing in exactly the right direction.

The Bigger Point: Buried Treasure in the Literature

This is the part I care about most. Our paper wasn't foundational work. It was an applied contribution to multimodal sentiment analysis, building on ideas from Tsai et al. and Liu et al. But the mathematical principles we explored turned out to matter far beyond our specific problem.

This pattern repeats across AI research. Mixture of Experts was proposed in 1991 but didn't go mainstream until Mixtral in 2023. Attention mechanisms existed for years before the Transformer made them universal. Contrastive learning lived in metric learning papers long before CLIP made it transformative.

And the pattern continues today. Consider DeepSeek's recent Engram work (January 2025), which introduces conditional memory — a dedicated lookup mechanism for factual knowledge that complements the conditional computation of Mixture-of-Experts. Their key finding: given a fixed compute budget, the optimal architecture allocates ~75–80% of sparse capacity to dynamic computation (MoE) and ~20–25% to static memory lookup. The result is dramatic improvements on knowledge-intensive benchmarks with negligible throughput cost.

Engram is a perfect example of an idea that the broader community should pay attention to: separating what a model knows from how it thinks. It's the kind of architectural innovation — bringing structured domain knowledge into the learning process through dedicated mechanisms rather than forcing everything through the same compute pathway — that echoes years of prior work on knowledge integration, memory-augmented networks, and domain-adapted architectures.

How much more of this is sitting in workshop papers, applied research, and domain-specific venues, waiting for the right context to become relevant?

Looking Forward

The AI field moves fast, but the mathematical foundations evolve more slowly. Low-rank structure, efficient fusion, parameter sharing, domain-aware architectures — these aren't trends. They're principles. They were useful when we were classifying emotions from YouTube videos, and they're useful now that we're building models that see, hear, and generate.

If there's one takeaway, it's this: read more papers. Not just the ones at the top of the leaderboard, but the applied work, the workshop contributions, the domain-specific explorations. The next LoRA might already be published. It might just be waiting for its moment.


Saurav Sahay was a researcher at Intel Labs' Anticipatory Computing Lab. The original paper, "Low Rank Fusion based Transformers for Multimodal Sequences" (Sahay, Okur, Kumar, Nachman, 2020), is available on arXiv.

Education

Academic Background

Ph.D., Computer Science
Georgia Institute of Technology
2004 – 2011  ·  Atlanta, Georgia
Thesis: Socio-Semantic Conversational Information Access
M.S., Computer Science
Georgia Institute of Technology
2004 – 2009  ·  Atlanta, Georgia
Also completed coursework for the MS Bioinformatics program

Connect

Get in Touch

Open to conversations about AI Safety, Responsible AI, Agentic systems, research collaborations, and speaking opportunities.