张程皓
Chenghao Zhang
Master's student in Software Engineering, School of Software
Tsinghua University

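I am a master's student in Software Engineering at the School of Software, Tsinghua University, expected to graduate in 2027. I also completed my undergraduate studies at the School of Software, Tsinghua University.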

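My interests center on LLM-driven representation learning: how to use the semantic understanding and world knowledge of LLMs/VLMs to learn user/item embeddings that are transferable, compressible, and usable for retrieval and recommendation.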

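I focus on representation alignment under weak supervision or without explicit annotations, and on jointly training contrastive objectives with SFT/generative objectives, so that embeddings retain semantic knowledge while absorbing the collaborative signals found in recommendation and retrieval. My earlier work also covered disentangled representation learning and cross-domain generalization for graph and molecular tasks.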

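Education
  • Tsinghua University
    School of Software
    Master's in Software Engineering
    Aug 2024 - Jun 2027 (expected)
  • Tsinghua University
    School of Software
    Bachelor's in Software Engineering
    Sep 2020 - Jun 2024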
Experience
  • Tsinghua University
    Graduate Student Union
    Head of the Information Service Center
    2024 - 2025
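Honors and Awards
  • Outstanding Student Leader, Tsinghua University
    2025
  • Comprehensive Excellence Scholarship
    2024-2025
  • Science and Technology Innovation Scholarship
    2023
  • Academic Excellence Scholarship
    2021-2022
  • Third Prize, Tsinghua University Challenge Cup
    2023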
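Internship
Kuaishou | Foundation LLM and Applications Department | LLM Algorithm Intern
2025.12 - 2026.05

Focus: representation learning for multiple item types in short-video recommendation, covering business objects such as products, short videos, and live-stream rooms.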

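  • My responsibilities: adapted a shared item encoder built on Qwen2.5-Omni-3B / CREM, attaching separate task heads for each business object type; owned data fetching, the training/inference code, and building and deploying the container image for the LLM item embedding inference service.
  • Training objective: used an LLM to judge the relevance of items within user history sequences, constructing tens of millions of relevant pairs; used InfoNCE to align LLM representations with the recommendation-domain distribution, combined with cross-entropy supervision on millions of QA samples to preserve semantic understanding.
  • Collaboration scope: I owned the LLM embedding and alignment training pipeline; feature integration into the downstream SIM GSU / main recommendation pipeline, traffic strategy, and final release were handled together with the recommendation pipeline team.
  • Results: in experiments after the LLM item embedding feature was integrated, offline AUC improved by 0.45pp and core business metrics improved by 1.88%. The project was publicly recognized as an outstanding case in the Community Science line.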
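The training objective above pairs a contrastive alignment loss (InfoNCE over relevant item pairs) with a language-modeling cross-entropy loss on QA data. The following is a minimal pure-Python sketch of that kind of joint objective, not the project's actual code: it assumes cosine similarity, in-batch negatives, a hypothetical temperature of 0.05, and a hypothetical weight `lam` on the QA term; all function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(x)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def info_nce(queries, keys, temperature=0.05):
    """In-batch InfoNCE (assumed setup): queries[i] and keys[i] are a relevant
    item pair; every other keys[j] in the batch acts as a negative."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], keys[j]) / temperature for j in range(n)]
        total += log_sum_exp(logits) - logits[i]  # -log softmax of the positive
    return total / n


def token_cross_entropy(token_logits, targets):
    """Mean next-token cross entropy; token_logits[t] holds vocabulary logits
    for position t and targets[t] is the gold token index."""
    total = 0.0
    for logits, y in zip(token_logits, targets):
        total += log_sum_exp(logits) - logits[y]
    return total / len(targets)


def joint_loss(queries, keys, token_logits, targets, lam=0.5, temperature=0.05):
    """Contrastive alignment plus a weighted QA cross-entropy term (lam is hypothetical)."""
    return info_nce(queries, keys, temperature) + lam * token_cross_entropy(token_logits, targets)
```

With well-aligned pairs, for example `queries = [[1.0, 0.0], [0.0, 1.0]]` and `keys = [[0.9, 0.1], [0.1, 0.9]]`, the InfoNCE term is close to zero, while swapping the two keys makes it large. The cross-entropy term acts as a regularizer that keeps the encoder from drifting away from its language understanding while the contrastive term pulls embeddings toward the recommendation-domain distribution.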
News
2026
Paper on few-shot unsupervised domain adaptation for graph-level anomaly detection accepted to AAAI 2026.
Jan 01
2025
Seeking research internship opportunities in LLM algorithms and graduating in 2027.
Dec 01
Paper on cross-domain few-shot molecular property prediction appeared in Frontiers of Computer Science.
Jan 01
Papers and Manuscripts
UBioRec: A Unified LLM Framework for Multi-Scenario User Understanding and Recommendation

Chenghao Zhang, et al.

Manuscript under review 2026 (third author)

Proposes UBioRec, a unified LLM framework that uses structured User Biographies to consolidate multi-scenario user understanding and recommendation into a single model, with adaptive token compression for efficient serving and a 6.5x QPS speedup.

CSGR: Compressed Source Gradient Replay for Scalable Self-Supervised Retriever Training

Chenghao Zhang, et al.

Manuscript under review 2026

Introduces Compressed Source Gradient Replay (CSGR), an annotation-free retriever training framework that compresses source memories and replays gradients exactly, scaling NTP-supervised training from an 8-candidate single-GPU pool to a distributed pool of 256 candidates while matching or improving CoIR retrieval quality.

Disentangled Generation-Based Prototypical Alignment for Few-Shot Unsupervised Domain Adaptation in Graph-Level Anomaly Detection

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

AAAI Conference on Artificial Intelligence (AAAI) 2026

Introduces DGPA to mitigate performance degradation in cross-domain few-shot graph-level anomaly detection, improving average AUROC by 5.72pp over the strongest baseline.

Factor-wise Disentangled Contrastive Learning for Cross-domain Few-shot Molecular Property Prediction

Z. Ni, Chenghao Zhang, H. Wan, X. Zhao

Frontiers of Computer Science 2025

Studies factor-wise disentangled contrastive learning for cross-domain few-shot molecular property prediction and improves average ROC-AUC by 1.53pp.

Geometry-Guided Domain Generalization for Monocular 3D Object Detection

F. Yang, H. Chen, Y. He, S. Zhao, Chenghao Zhang, K. Ni, G. Ding

AAAI Conference on Artificial Intelligence (AAAI) 2024

Uses geometry priors to improve cross-domain generalization for monocular 3D object detection. I contributed to implementing and integrating the attention module.

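Research Themes
LLM/VLM Embeddings

Learning transferable, compressible user/item embeddings so that the semantic knowledge of large models can feed into retrieval, recommendation, and user understanding tasks.

Weakly Supervised Representation Alignment

In settings without explicit relevance labels, supervising embedding learning with LLM/VLM evidence reading, QA utility, and behavior-derived signals.

Fusing Semantic and Collaborative Signals

Jointly training contrastive and SFT/generative objectives so that embeddings both retain world knowledge and stay close to the collaborative distributions of recommendation and retrieval.

Disentanglement and Cross-Domain Generalization

Studying factor-level disentangled representations, cross-domain transfer, and long-tail/cold-start generalization in graph, molecular, and multi-domain tasks.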