Researcher · AI / ML / CompBio

Tuan Q. Dinh

Postdoc · UCSF & Maze Therapeutics

I build and study foundation models — with a focus on what happens after pre-training: alignment, adaptation, hallucination, and generalization to new domains. Currently applying these ideas to computational biology (proteins, genetics) as a postdoc at UCSF and Maze Therapeutics. Ph.D. from UW–Madison with Kangwook Lee, where I worked on modular and robust machine learning systems.

Google Scholar · GitHub · Email · CV

What I'm working on

Right now I'm thinking a lot about how protein language models encode functional information — and whether that signal is recoverable enough to drive variant effect prediction at clinical scale. It's a surprisingly hard problem: the models are powerful but their uncertainty is poorly calibrated.

More broadly, I'm interested in continual learning and domain adaptation: how to keep a model useful as the world (or the data) changes, without forgetting what it already knows.

News

May'26 Happy to receive complimentary registration for ICML 2026 as a "Technical Reviewer: Gold"

Mar'26 VESM accepted to Nature Methods: unsupervised co-distillation of protein LMs for variant effect prediction

May'25 Two collaborative works (with UCLA and UCSF) on protein LMs were accepted to Cell Systems

May'25 TabFlex accepted to ICML 2025 (Spotlight) — scaling tabular learning with linear attention

Selected Projects

Nature Methods 2026

VESM: Compressing the collective knowledge of ESMs

A co-distillation framework that teaches protein LMs to learn from each other, achieving state-of-the-art variant effect prediction on clinical genetics benchmarks without structural or alignment data.

Protein LM · Distillation · Unsupervised Learning

Paper & code

NeurIPS 2023

LLMs of Code Fail on Buggy Completions

Revealed a systematic blind spot: code LLMs confidently complete buggy code even when prompted to be cautious. Built a benchmark exposing this failure across GPT-4, Codex, and others.

LLM Evaluation · Code Generation · Hallucination

Paper & code

NeurIPS 2022

LIFT: Language-Interfaced Fine-Tuning

Serializing non-language tasks as natural language and fine-tuning a language model on them works surprisingly well — opening a path toward unified LLM-based ML pipelines across tabular and structured data.

LLM · Fine-tuning · Tabular

Paper & code

Selected Publications

Nature Methods 2026

VESM: Compressing the collective knowledge of ESM into a single protein language model

Tuan Dinh, et al.

Protein Language Model · Co-Distillation

Paper · Code

ICML 2025 (Spotlight)

TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Yuchen Zeng*, Tuan Dinh*, et al.

Tabular · Linear Attention

Paper · Code

NeurIPS 2023

Large Language Models of Code Fail at Completing Code with Potential Bugs

Tuan Dinh, et al. (Amazon Science)

LLM · Code · Hallucination

Paper · Code

NeurIPS 2022

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

Tuan Dinh*, Yuchen Zeng*, Ruisu Zhang*, et al.

LLM · Fine-tuning

Paper · Code

EMNLP 2022 (Findings)

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Tuan Dinh*, Jy-yong Sohn*, et al.

Multimodal · Word Alignment

Paper · Code

ICML 2021 (Oral)

Coded-InvNet for Resilient Prediction Serving Systems

Tuan Dinh, Kangwook Lee.

MLSys · GAN

arXiv · Code

IEEE TPAMI 2020

Group Difference Testing on Graph Structured Data from GANs: Applications in Neuroimaging

Tuan Dinh, et al.

GAN · Medical Imaging

Full list on Google Scholar. Includes 2 US patents in deep learning optimization and inverse graphics.

A bit more

Currently reading

Alternating between ML papers and whatever I can finish on the train or shuttle. Favorite: The Three-Body Problem, plus a lot of reading on mechanistic interpretability.

On poetry

I read and sometimes translate old poems — searching for ones written for a specific moment. One I keep returning to:

"I chase the mist where whispers lie, The wise take wing beneath the sky."

Background

Originally from Hue, Vietnam. The name is Tuan, pronounced [tʰwɑ̃n]. In Vietnamese, name order is the reverse of standard English (family name first), and the given name is the primary one.

Contact

tuan.quang.dinh@proton.me