Publications

For a full list of publications, see Google Scholar (*: Equal Contribution, ^: Joint Advising)

Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models
Amey Hengle, Prasoon Bajpai, Soham Dan^, Tanmoy Chakraborty^. NAACL 2025
AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation
Vaishnavi Pulavarthi, Deeksha Nandal, Soham Dan^, Debjit Pal^. Findings of NAACL 2025
CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
Nafis Neehal, Bowen Wang, Shayom Debopadhaya, Soham Dan, Keerthiram Murugesan, Vibha Anand, Kristin P. Bennett. Findings of NAACL 2025
Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel. ICLR 2025
On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning
Mauricio Gruppi, Soham Dan, Keerthiram Murugesan, Subhajit Chaudhury. COLING 2025

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurélie Lozano, Payel Das, Georgios Kollias. Findings of ACL 2024
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi and Luis A. Lastras. ACL 2024
Larimar: Large Language Models with Episodic Memory Control
Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Soham Dan, Pin-Yu Chen. ICML 2024
Language Guided Exploration for RL Agents in Text Environments
Hitesh Golchha, Sahil Yerawar, Dhruvesh Patel, Soham Dan and Keerthiram Murugesan. Findings of NAACL 2024
On the Generalization Capacity of Neural Networks During Generic Multimodal Reasoning
Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski and Murray Campbell. ICLR 2024
Generalized Planning in PDDL Domains with Pretrained Large Language Models
Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B Tenenbaum, Leslie Pack Kaelbling and Michael Katz. AAAI 2024

In and Out-of-Domain Text Adversarial Robustness via Label Smoothing
Yahan Yang*, Soham Dan*, Dan Roth and Insup Lee. ACL 2023
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos and Alexander Gray. Findings of ACL 2023
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
Pierre Gaillard, Aadirupa Saha, Soham Dan. AISTATS 2023
Two-Sample Tests for Inhomogeneous Random Graphs in L_r norm: Optimality and Asymptotics
Sayak Chatterjee, Dibyendu Saha, Soham Dan and Bhaswar B. Bhattacharya. AISTATS 2023

Understanding Robust Generalization in Learning Regular Languages
Soham Dan, Osbert Bastani and Dan Roth. ICML 2022
Cross-modal Map Learning for Vision and Language Navigation
Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, and Kostas Daniilidis. CVPR 2022

Few-Shot Novel Concept Learning for Semantic Parsing
Soham Dan, Osbert Bastani and Dan Roth. Findings of EMNLP 2021
Compositional Data and Task Augmentation for Instruction Following
Soham Dan*, Xinran Han* and Dan Roth. Findings of EMNLP 2021
On the Effects of Transformer Size on In- and Out-of-Domain Calibration
Soham Dan, Dan Roth. Findings of EMNLP 2021
Learning from Noisy Similar and Dissimilar Data
Soham Dan, Han Bao, Masashi Sugiyama. ECML-PKDD 2021
Variance Reduced Stochastic Proximal Algorithm for AUC Maximization
Soham Dan*, Dushyant Sahoo*. ECML-PKDD 2021
Generalization in Instruction Following Systems
Soham Dan, Michael Zhou, Dan Roth. NAACL 2021
Human-guided Collaborative Problem Solving: A Natural Language based Framework
Harsha Kokel, Mayukh Das, Rakibul Islam, Julia Bonn, Jon Cai, Soham Dan, Anjali Narayan-Chen, Prashant Jayannavar, Janardhan Doppa, Julia Hockenmaier, Sriraam Natarajan, Martha Palmer, and Dan Roth. ICAPS 2021 (demos)
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing
Kiante Brantley, Soham Dan, Iryna Gurevych, Ji-Ung Lee, Filip Radlinski, Hinrich Schütze, Edwin Simpson, and Lili Yu. InterNLP Workshop @ ACL 2021

A Locally Linear Procedure for Word Translation
Soham Dan, Hagai Taitelbaum, Jacob Goldberger. COLING 2020
Goodness-of-Fit Tests for Inhomogeneous Random Graphs
Soham Dan, Bhaswar B. Bhattacharya. ICML 2020
From Spatial Relations to Spatial Configurations
Soham Dan, Parisa Kordjamshidi, Julia Bonn, Archna Bhatia, Zheng Cai, Martha Palmer, Dan Roth. LREC 2020
Understanding Spatial Relations through Multiple Modalities
Soham Dan, Hangfeng He, Dan Roth. LREC 2020

AppTechMiner: Mining Applications and Techniques from Scientific Articles
Mayank Singh*, Soham Dan*, Sanyam Agarwal*, Pawan Goyal, Animesh Mukherjee. WOSP 2017
Towards Problem Solving Agents that Communicate and Learn
Anjali Narayan-Chen, Colin Graber, Mayukh Das, Md Rakibul Islam, Soham Dan, Sriraam Natarajan, Janardhan Rao Doppa, Julia Hockenmaier, Martha Palmer, Dan Roth. Proceedings of the First Workshop on Language Grounding for Robotics 2017

Identifying and Characterizing Truck Stops from GPS Data
Russel Aziz, Manav Kedia, Soham Dan, Sayantan Basu, Sudeshna Sarkar, Sudeshna Mitra, Pabitra Mitra. ICDM 2016

Segmenting Highway Network Based on Speed Profiles
Russel Aziz, Manav Kedia, Soham Dan, Sudeshna Sarkar, Sudeshna Mitra, Pabitra Mitra. IEEE ITSC 2015

Techniques for Generating a Psychographic Profile
Kokil Jaidka, Vamsi Krishna Bokam, Soham Dan, Atanu R. Sinha, Yogesh Singh.