Homepage - Chenghao Zhang

Chenghao Zhang

张程皓

M.S. Student, School of Software
Tsinghua University

chenghao24(at)mails.tsinghua.edu.cn

About Me

I am a master's student in Software Engineering at Tsinghua University, expected to graduate in 2027. I also received my B.Eng. in Software Engineering from Tsinghua University.

My interests center on LLM-driven representation learning: how to use the semantic understanding and world knowledge of LLMs/VLMs to learn transferable, compressible user/item embeddings for retrieval and recommendation.

I am particularly interested in representation alignment under weak or implicit supervision, and in jointly training embeddings with contrastive objectives and SFT/generative objectives so that they retain semantic knowledge while absorbing collaborative signals from retrieval and recommendation. My earlier work also studies disentangled representation learning and cross-domain generalization for graph and molecular tasks.

Education

Tsinghua University

School of Software

M.S. in Software Engineering

Aug. 2024 - Jun. 2027 (expected)

Tsinghua University

School of Software

B.Eng. in Software Engineering

Sep. 2020 - Jun. 2024

Experience

Tsinghua University

Graduate Student Association

Lead, Information Service Center

2024 - 2025

Honors & Awards

Tsinghua Outstanding Student Cadre

2025

Comprehensive Excellence Scholarship

2024-2025

Technology Innovation Scholarship

2023

Academic Excellence Scholarship

2021-2022

Tsinghua Challenge Cup, Third Prize

2023

News

2026

Paper on few-shot unsupervised domain adaptation for graph-level anomaly detection accepted to AAAI 2026.

Jan 01

2025

Seeking research internship opportunities in LLM algorithms; expected to graduate in 2027.

Dec 01

Paper on cross-domain few-shot molecular property prediction appeared in Frontiers of Computer Science.

Jan 01

Selected Publications & Manuscripts

UBioRec: A Unified LLM Framework for Multi-Scenario User Understanding and Recommendation

Chenghao Zhang, et al.

Manuscript under review 2026 (third author)

Proposes UBioRec, a unified LLM framework that uses structured User Biographies to consolidate multi-scenario user understanding and recommendation into a single model, with adaptive token compression for efficient serving and a 6.5x QPS speedup.

CSGR: Compressed Source Gradient Replay for Scalable Self-Supervised Retriever Training

Chenghao Zhang, et al.

Manuscript under review 2026

Introduces Compressed Source Gradient Replay (CSGR), an annotation-free retriever training framework that compresses source memories and replays gradients exactly. It scales NTP-supervised training from an 8-candidate pool on a single GPU to a distributed pool of 256 candidates while matching or improving retrieval quality on CoIR.

Disentangled Generation-Based Prototypical Alignment for Few-Shot Unsupervised Domain Adaptation in Graph-Level Anomaly Detection

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

AAAI Conference on Artificial Intelligence (AAAI) 2026

Introduces DGPA to mitigate performance degradation in cross-domain few-shot graph-level anomaly detection, improving average AUROC by 5.72pp over the strongest baseline.

[Paper] [DOI] [PDF] [Resources]

Factor-wise Disentangled Contrastive Learning for Cross-domain Few-shot Molecular Property Prediction

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

Frontiers of Computer Science 2025

Studies factor-wise disentangled contrastive learning for cross-domain few-shot molecular property prediction and improves average ROC-AUC by 1.53pp.

[Paper] [DOI] [PDF]

Geometry-Guided Domain Generalization for Monocular 3D Object Detection

F. Yang, H. Chen, Y. He, S. Zhao, Chenghao Zhang, K. Ni, G. Ding

AAAI Conference on Artificial Intelligence (AAAI) 2024

Uses geometry priors to improve cross-domain generalization for monocular 3D object detection. I contributed to implementing and integrating the attention module.

[Paper] [Project] [DOI] [PDF] [Code]

Research Focus

LLM/VLM Embeddings

Learning transferable and compressible user/item embeddings that bring model semantics into retrieval, recommendation, and user understanding tasks.

Weakly Supervised Alignment

Supervising embedding learning without explicit relevance labels using LLM/VLM evidence reading, answer utility, and behavior-derived signals.

Semantic and Collaborative Signals

Jointly training embeddings with contrastive objectives and SFT/generative objectives so they retain world knowledge while fitting collaborative distributions.

Disentanglement and Transfer

Studying factor-wise disentangled representations, domain transfer, and long-tail or cold-start generalization in graph, molecular, and multi-domain tasks.