Chenghao Zhang
张程皓
M.S. Student, School of Software
Tsinghua University
About Me

I am a master's student in Software Engineering at Tsinghua University, expected to graduate in 2027. I also received my B.Eng. in Software Engineering from Tsinghua University.

My interests center on LLM-driven representation learning: how to use the semantic understanding and world knowledge of LLMs/VLMs to learn transferable, compressible user/item embeddings for retrieval and recommendation.

I am particularly interested in representation alignment under weak or implicit supervision, and in jointly training embeddings with contrastive objectives and SFT/generative objectives so that they retain semantic knowledge while absorbing collaborative signals from retrieval and recommendation. My earlier work also studies disentangled representation learning and cross-domain generalization for graph and molecular tasks.

Education
  • Tsinghua University
    School of Software
    M.S. in Software Engineering
    Aug. 2024 - Jun. 2027 (expected)
  • Tsinghua University
    School of Software
    B.Eng. in Software Engineering
    Sep. 2020 - Jun. 2024
Experience
  • Tsinghua University
    Graduate Student Association
    Lead, Information Service Center
    2024 - 2025
Honors & Awards
  • Tsinghua Outstanding Student Cadre
    2025
  • Comprehensive Excellence Scholarship
    2024-2025
  • Technology Innovation Scholarship
    2023
  • Academic Excellence Scholarship
    2021-2022
  • Tsinghua Challenge Cup, Third Prize
    2023
News
  • Paper on few-shot unsupervised domain adaptation for graph-level anomaly detection accepted to AAAI 2026.
    Jan 01, 2026
  • Seeking research internship opportunities in LLM algorithms; expected to graduate in 2027.
    Dec 01, 2025
  • Paper on cross-domain few-shot molecular property prediction appeared in Frontiers of Computer Science.
    Jan 01, 2025
Selected Publications & Manuscripts
UBioRec: A Unified LLM Framework for Multi-Scenario User Understanding and Recommendation

Chenghao Zhang, et al.

Manuscript under review 2026 (third author)

Proposes UBioRec, a unified LLM framework that uses structured User Biographies to consolidate multi-scenario user understanding and recommendation into a single model, with adaptive token compression for efficient serving and a 6.5x QPS speedup.

CSGR: Compressed Source Gradient Replay for Scalable Self-Supervised Retriever Training

Chenghao Zhang, et al.

Manuscript under review 2026

Introduces Compressed Source Gradient Replay (CSGR), an annotation-free retriever training framework that compresses source memories and replays gradients exactly, scaling NTP-supervised training from an 8-candidate single-GPU pool to a distributed pool of 256 while matching or improving CoIR retrieval quality.

Disentangled Generation-Based Prototypical Alignment for Few-Shot Unsupervised Domain Adaptation in Graph-Level Anomaly Detection

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

AAAI Conference on Artificial Intelligence (AAAI) 2026

Introduces DGPA to mitigate performance degradation in cross-domain few-shot graph-level anomaly detection, improving average AUROC by 5.72pp over the strongest baseline.

Factor-wise Disentangled Contrastive Learning for Cross-domain Few-shot Molecular Property Prediction

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

Frontiers of Computer Science 2025

Studies factor-wise disentangled contrastive learning for cross-domain few-shot molecular property prediction and improves average ROC-AUC by 1.53pp.

Geometry-Guided Domain Generalization for Monocular 3D Object Detection

F. Yang, H. Chen, Y. He, S. Zhao, Chenghao Zhang, K. Ni, G. Ding

AAAI Conference on Artificial Intelligence (AAAI) 2024

Uses geometry priors to improve cross-domain generalization for monocular 3D object detection. I contributed to implementing and integrating the attention module.

Research Focus
LLM/VLM Embeddings

Learning transferable and compressible user/item embeddings that bring model semantics into retrieval, recommendation, and user understanding tasks.

Weakly Supervised Alignment

Supervising embedding learning without explicit relevance labels, relying instead on LLM/VLM evidence reading, answer utility, and behavior-derived signals.

Semantic and Collaborative Signals

Jointly training embeddings with contrastive objectives and SFT/generative objectives so they retain world knowledge while fitting collaborative distributions.

Disentanglement and Transfer

Studying factor-wise disentangled representations, domain transfer, and long-tail or cold-start generalization in graph, molecular, and multi-domain tasks.