Researcher · AI / ML / CompBio

Tuan Q. Dinh

Postdoc · UCSF & Maze Therapeutics

I build and study foundation models — with a focus on what happens after pre-training: alignment, adaptation, hallucination, and generalization to new domains. Currently applying these ideas to computational biology (proteins, genetics) as a postdoc at UCSF and Maze Therapeutics. Ph.D. from UW–Madison with Kangwook Lee, where I worked on modular and robust machine learning systems.

■  What I'm working on

Right now I'm thinking a lot about how protein language models encode functional information — and whether that signal is recoverable enough to drive variant effect prediction at clinical scale. It's a surprisingly hard problem: the models are powerful but their uncertainty is poorly calibrated.

More broadly, I'm interested in continual learning and domain adaptation: how to keep a model useful as the world (or the data) changes, without forgetting what it already knows.


May'26 Happy to receive complimentary registration for ICML 2026 as "Technical Reviewer: Gold"
Mar'26 VESM accepted to Nature Methods: unsupervised co-distillation of protein LMs for variant effect prediction
May'25 Two collaborative works (with UCLA and UCSF) on protein LMs accepted to Cell Systems
May'25 TabFlex accepted to ICML 2025 (Spotlight) — scaling tabular learning with linear attention

Nature Methods 2026
VESM: Compressing the collective knowledge of ESMs
A co-distillation framework that teaches protein LMs to learn from each other, achieving state-of-the-art variant effect prediction on clinical genetics benchmarks without structural or alignment data.
Protein LM · Distillation · Unsupervised Learning
Paper & code
NeurIPS 2023
LLMs of Code Fail on Buggy Completions
Revealed a systematic blind spot: code LLMs confidently complete buggy code even when prompted to be cautious. Built a benchmark exposing this failure across GPT-4, Codex, and others.
LLM Evaluation · Code Generation · Hallucination
Paper & code
NeurIPS 2022
LIFT: Language-Interfaced Fine-Tuning
Serializing non-language tasks as natural language and fine-tuning a language model on them works surprisingly well — opening a path toward unified LLM-based ML pipelines across tabular and structured data.
LLM · Fine-tuning · Tabular
Paper & code

Nature Methods 2026
Tuan Dinh, et al.
Protein Language Model · Co-Distillation
ICML 2025 (Spotlight)
Yuchen Zeng*, Tuan Dinh*, et al.
Tabular · Linear Attention
NeurIPS 2023
Tuan Dinh, et al. (Amazon Science)
LLM · Code · Hallucination
NeurIPS 2022
Tuan Dinh*, Yuchen Zeng*, Ruisu Zhang*, et al.
LLM · Fine-tuning
EMNLP 2022 (Findings)
Tuan Dinh*, Jy-yong Sohn*, et al.
Multimodal · Word Alignment
ICML 2021 (Oral)
Tuan Dinh, Kangwook Lee.
MLSys · GAN

Full list on Google Scholar. Includes 2 US patents in deep learning optimization and inverse graphics.


Currently reading

Alternating between ML papers and whatever I can finish on the train or shuttle. Favorite so far: The Three-Body Problem. Also reading a lot on mechanistic interpretability.

On poetry

I read and sometimes translate old poems — searching for ones written for a specific moment. One I keep returning to:

"I chase the mist where whispers lie, The wise take wing beneath the sky."

Background

Originally from Hue, Vietnam. The name is Tuan, pronounced [tʰwɑ̃n]. In Vietnamese, name order is the reverse of standard English, and the given name is the primary one.