Hello, World!

I am Tuan, a postdoc fellow working on foundation models and their development for scientific discovery, especially for proteins and human genetics. I am jointly supervised by Prof. Vasilis Ntranos at UCSF and the Data Sciences group at Maze Therapeutics.

Education: I obtained my Ph.D. in Computer Sciences (minor in Statistics) with Prof. Kangwook Lee, studying modular neural networks built on pre-trained models. Previously, I completed my M.S. with Prof. Vikas Singh, studying GANs for graph-structured data, and my B.E. with Prof. Tru Cao, studying AI systems for disease forecasting and healthcare.


My research interests are AI/ML and AI4Science. My current foci are language models and modular deep learning, with applied research in computational biology.

09.23: BugComp is accepted to NeurIPS 2023.
05.23: Joined UCSF as a Postdoc Scholar!
03.23: Defended Ph.D. Thanks Kangwook and everyone!

Deep Learning with Foundation Models

Link Topic Title Summary Github
NeurIPS'23 LLM Large Language Models of Code Fail at Completing Code with Potential Bugs summary code
TL;DR: LLMs may fail drastically at completing functional code when potential bugs (aka anti-flow patterns) exist in the context.
EMNLP'22 (Findings) Multimodal Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment summary code
TL;DR: Text-image correlation (via CLIP embeddings) can be efficiently utilized with static embeddings for robust word translation.
NeurIPS'22 LLM LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks summary code
TL;DR: Pretrained LLMs, via a language interface, can be useful for learning non-language tasks, e.g., tabular data classification.
ICMLW'22 GAN, PEFT Improved Input Reprogramming for GAN Conditioning summary code
TL;DR: Pretrained GANs can be efficiently repurposed (without modification) to conditionally generate samples in their support.
ICML'21 (Oral) MLSys, GAN Coded-InvNet for Resilient Prediction Serving Systems summary code
TL;DR: Coded-InvNet is a coded computation method combined with image-to-image translation to improve resilience of MLSS.
TPAMI'20 GAN, Medical Imaging Performing Group Difference Testing on Graph Structured Data from GANs: Analysis and Applications in Neuroimaging code
TL;DR: Analyzing when GAN-generated data can yield conclusions similar to those from the training data in scientific or biomedical studies.
AAAI'20 (Oral) Optimization, GAN The Promise of Conditional Gradient Methods for Training Deep Models code
TL;DR: Conditional gradient methods can be used to train deep networks faster, with provably better generalization guarantees.

AI for Science and Healthcare

Link Topic Title Summary Github
MobiCom'22 Healthcare PROS: an Efficient Pattern-Driven Compressive Sensing Framework for Low-Power Biopotential-based Wearables with On-chip Intelligence code
MobiSys'21 Healthcare WAKE: A Behind-the-ear Wearable System for Microsleep Detection
IEEE TMC'21 Healthcare Detection of Microsleep Events with a Behind-the-ear Wearable System
Oxford Journal'18 Epidemiology Forecasting Dengue Incidences: Statistical and Dynamic Models
CTAD'17 Medical Imaging Graph Imputation techniques for estimating amyloid positivity from longitudinal cognitive and MRI measurements for efficient secondary prevention trials
ACIIDS'16 (Oral) Epidemiology Forecasting the Magnitude of Dengue in Southern Vietnam


Patents

Link Topic Title
US 11087525 AI Framework, Inverse Graphics Unsupervised learning of three dimensional visual alphabet
US 16186121 Algorithm, Training Framework Training System for Artificial Neural Networks Having a Global Weight Constrainer