Machine Learning Researcher
Soham Dan
I build reliable language and agentic systems with an emphasis on reasoning, robustness, memory, and efficient learning.
I am a Machine Learning Researcher at Scale AI working on LLM agents. Previously, I was a Senior Research Scientist at Microsoft working on Copilot and a Research Scientist at IBM Research. I completed my Ph.D. in Computer and Information Science at the University of Pennsylvania, where I was advised by Prof. Dan Roth in the Cognitive Computation Group. Before that, I earned my undergraduate degree in Computer Science from IIT Kharagpur, where I was awarded the President of India Gold Medal.

Research areas
Language models that are more dependable
My work spans LLM agents, reasoning and planning, long-context behavior, memory, multilingual modeling, robustness, and alignment. I am interested in systems that perform reliably outside narrow benchmark settings.
Current focus
Practical evaluation and control for agentic systems
I care about the gap between what models can do in demos and what they can do in sustained deployment. Closing that gap takes benchmarks, training algorithms, and evaluation methods that surface failure modes early.
Background
Industry and research across multiple settings
My research has spanned academia and industry, with work published in venues including ACL, NAACL, EMNLP, ICML, ICLR, CVPR, COLING, and AISTATS.
Selected recent work
Representative papers
Large Language Models can be Strong Self-Detoxifiers
How language models can reduce toxic behavior via their own generation and control mechanisms, with no external classifier required.
Multilingual Needle in a Haystack
Investigating the long-context behavior of multilingual LLMs and where their retrieval fidelity degrades across languages.
Larimar: LLMs with Episodic Memory Control
Memory control mechanisms for LLMs, with implications for persistent and agentic systems that need editable, attributable knowledge.
Generalized Planning in PDDL Domains with Pretrained LLMs
Using pretrained LLMs to synthesize generalized plans across PDDL domains, connecting symbolic planning and language modeling.
API-BLEND: Training and Benchmarking API LLMs
A comprehensive corpus for training and evaluating LLMs that invoke external tools and APIs in realistic settings.
Robust Generalization in Learning Regular Languages
How models generalize under robustness constraints, connecting formal language theory and the empirical behavior of neural learners.
Get in touch
Open to research conversations
If your work touches LLM agents, reasoning, evaluation, or dependable machine learning systems, the best starting points are Google Scholar, LinkedIn, and my CV.