Yehui Tang (唐业辉)

I am a senior researcher at Huawei Noah's Ark Lab, working with Yunhe Wang. Before that, I obtained my PhD from the School of Artificial Intelligence at Peking University, supervised by Prof. Chao Xu. During my PhD, I studied neural architecture design and model compression. Currently, I focus on developing powerful large language models, with parameter scales ranging from billions to trillions.

yehui.tang@huawei.com  /  yhtang@pku.edu.cn  /  Google Scholar

Recruitment: I am seeking highly self-motivated employees and interns with strong coding skills and a deep interest in Large Language Models. Please feel free to send me your resume!

News

  • 05/2025, We release Pangu Ultra MoE (718B) and Pangu Pro MoE (72B).

  • 04/2025, Serving as an Area Chair for NeurIPS 2025.

  • 02/2024, We release the technical report of PanGu-π Pro, a tiny language model (1.5B) that can be easily deployed on edge devices. [technical report]
  • 12/2023, We release PanGu-π, a new LLM architecture. [technical report]
  • 03/2022, Five papers accepted by CVPR 2022.
Recent Projects

    Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity

    Technical Report | Synced 机器之心

  • A hardware-friendly MoE model with 72B total parameters (16B activated per token). It achieves high inference efficiency on Ascend 300I Duo and 800I A2 NPUs.

  • MoGE (Mixture of Grouped Experts) architecture: experts are partitioned into groups and each token activates a balanced number of experts in every group, making it both effective and efficient (see the sketch after this list).

  • Strong performance: tied for first place on SuperCLUE among models with fewer than 100B total parameters.
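
    Below is a minimal PyTorch sketch of the grouped top-k routing idea behind MoGE: experts are split into equal groups and each token selects the same number of experts in every group, so per-group (and hence per-device) load stays balanced. All names and sizes here are illustrative, and the real router's load-balancing losses and parallelism logic are omitted.

    import torch

    def moge_route(logits: torch.Tensor, n_groups: int, k_per_group: int):
        """logits: (n_tokens, n_experts) router scores."""
        n_tokens, n_experts = logits.shape
        assert n_experts % n_groups == 0
        group_size = n_experts // n_groups
        grouped = logits.view(n_tokens, n_groups, group_size)
        # Pick the top-k experts independently inside each group, so every
        # group receives exactly the same number of activated experts.
        weights, idx = grouped.topk(k_per_group, dim=-1)
        # Map group-local indices back to global expert ids.
        offsets = torch.arange(n_groups).view(1, n_groups, 1) * group_size
        expert_ids = (idx + offsets).flatten(1)            # (n_tokens, n_groups * k)
        gates = torch.softmax(weights.flatten(1), dim=-1)  # normalize selected scores
        return expert_ids, gates

    # Toy usage: 4 tokens routed over 64 experts in 8 groups, 1 expert per group.
    logits = torch.randn(4, 64)
    ids, gates = moge_route(logits, n_groups=8, k_per_group=1)
    print(ids.shape, gates.shape)  # torch.Size([4, 8]) torch.Size([4, 8])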

    Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs

    Technical Report | QbitAI 量子位

  • A powerful MoE model with 718B total parameters (39B activated per token).

    Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs

    Technical Report | Synced 机器之心

  • A 135B dense LLM trained on 8192 Ascend NPUs.

  • Competitive with DeepSeek-R1, whose sparse architecture contains far more parameters.

    PanGu-π Pro: Powerful Tiny Language Models (1B, 1.5B, 3B) for Edge Devices

    Technical Report | Synced 机器之心

    We introduce PanGu-π Pro, a family of powerful tiny language models (1B, 1.5B, 3B) that can be easily deployed on edge devices. Through an empirical investigation, we propose four strategies to improve performance:

  • Compact Tokenizer: efficient coverage of the corpus.

  • Architecture Tweak: a better trade-off between depth and width.

  • Parameter Inheritance: powerful knowledge inherited from larger LLMs (see the sketch after this list).

  • Multiple-Round Training: memory reinforcement of tiny models.
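
    As referenced above, here is a minimal PyTorch sketch of the parameter-inheritance idea: initialize a tiny model from a larger trained LLM by keeping a subset of layers and the most important neurons. The selection criteria (uniform layer choice, L1-norm ranking) and the inherit helper are illustrative simplifications, not the report's exact procedure.

    import torch

    def inherit(big_state: dict, keep_layers: list, small_dim: int) -> dict:
        """Build a small state dict from a large one (toy version)."""
        small_state = {}
        for new_idx, old_idx in enumerate(keep_layers):
            w = big_state[f"layers.{old_idx}.weight"]  # (d_big, d_big)
            # Rank output/input channels by L1 norm, keep the top `small_dim`,
            # and sort so the relative neuron order is preserved.
            rows = w.abs().sum(dim=1).topk(small_dim).indices.sort().values
            cols = w.abs().sum(dim=0).topk(small_dim).indices.sort().values
            small_state[f"layers.{new_idx}.weight"] = w[rows][:, cols]
        return small_state

    # Toy usage: shrink a 4-layer, 8-dim model to 2 layers and 4 dims.
    big = {f"layers.{i}.weight": torch.randn(8, 8) for i in range(4)}
    small = inherit(big, keep_layers=[0, 3], small_dim=4)
    print({k: tuple(v.shape) for k, v in small.items()})
    # {'layers.0.weight': (4, 4), 'layers.1.weight': (4, 4)}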

Selected Publications

    The complete list of publications can be found on Google Scholar (8,000+ citations).

  • Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity.
    Core contributors: Yehui Tang, Xiaosong Li, Fangcheng Liu, Wei Guo, Hang Zhou, Yaoyuan Wang, et al.
    Technical Report

  • Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs.
    Core contributors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, et al.
    Technical Report

  • Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition.
    Core contributor
    Technical Report

  • Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs.
    Core contributor
    Technical Report

  • Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs.
    Core contributors: Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, et al.
    Technical Report

  • Mixture of Lookup Experts.
    Shibo Jie, Yehui Tang#, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng#, Yunhe Wang#
    ICML 2025 Spotlight | paper | code

  • Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning.
    Zhenni Bi, Kai Han, Chuanjian Liu, Yehui Tang#, Yunhe Wang#
    ICML 2025 | paper | code

  • MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers.
    Ning Ding*, Yehui Tang*, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang
    NeurIPS 2024 | paper

  • PanGu-π Pro: Rethinking Optimization and Architecture for Tiny Language Models
    Yehui Tang, Kai Han, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Li, Shangling Jui, Yunhe Wang
    paper

  • PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
    Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
    paper

  • One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
    Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
    NeurIPS 2023 Highlight | paper

  • Masked Image Modeling with Local Multi-Scale Reconstruction
    Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han
    CVPR 2023 Highlight | paper | code

  • Network Expansion for Practical Training Acceleration
    Ning Ding, Yehui Tang, Kai Han, Chao Xu, Yunhe Wang
    CVPR 2023 | paper | code

  • GhostNetV2: Enhance Cheap Operation with Long-Range Attention.
    Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang
    NeurIPS 2022 Spotlight | paper | code | MindSpore code

  • An Image Patch is a Wave: Quantum Inspired Vision MLP (WaveMLP).
    Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang
    CVPR 2022 Oral | paper

  • Patch Slimming for Efficient Vision Transformers.
    Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao
    CVPR 2022 | paper

  • Hire-MLP: Vision MLP via Hierarchical Rearrangement.
    Jianyuan Guo*, Yehui Tang*, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
    CVPR 2022 (* equal contribution) | paper

  • CMT: Convolutional Neural Networks Meet Vision Transformers.
    Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
    CVPR 2022 | paper

  • Source-Free Domain Adaptation via Distribution Estimation
    Ning Ding, Yixing Xu, Yehui Tang, Chao Xu, Yunhe Wang, Dacheng Tao
    CVPR 2022

  • Augmented Shortcuts for Vision Transformers
    Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang
    NeurIPS 2021 | paper | MindSpore code

  • Manifold Regularized Dynamic Network Pruning
    Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu
    CVPR 2021 | paper | code | MindSpore code

  • SCOP: Scientific Control for Reliable Neural Network Pruning
    Yehui Tang, Yunhe Wang, Yixing Xu, Dacheng Tao, Chunjing Xu, Chao Xu, Chang Xu
    NeurIPS 2020 | paper | code

  • A Semi-Supervised Assessor of Neural Architectures
    Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu
    CVPR 2020 | paper

  • Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
    Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu
    AAAI 2020 | paper | code

  • Reborn Filters: Pruning Convolutional Neural Networks with Limited Data
    Yehui Tang, Shan You, Chang Xu, Jin Han, Chen Qian, Boxin Shi, Chao Xu, Changshui Zhang
    AAAI 2020 | paper

  • Homogeneous Architecture Augmentation for Neural Predictor
    Yuqiao Liu*, Yehui Tang*, Yanan Sun
    ICCV 2021 (* equal contribution) | paper

  • Learning Frequency Domain Approximation for Binary Neural Networks
    Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
    NeurIPS 2021 | paper | MindSpore code

  • ReNAS: Relativistic Evaluation of Neural Architecture Search
    Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu
    CVPR 2021 Oral | paper | MindSpore code

  • Neuromorphic Camera Guided High Dynamic Range Imaging
    Jin Han, Chu Zhou, Peiqi Duan, Yehui Tang, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi
    CVPR 2020 | paper

  • Frequency Domain Compact 3D Convolutional Neural Networks
    Hanting Chen, Yunhe Wang, Han Shu, Yehui Tang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu
    CVPR 2020 | paper

  • A Survey on Vision Transformer
    Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
    IEEE T-PAMI 2022 | paper

Services

  • Area Chair of NeurIPS 2025.

  • Senior Program Committee member of IJCAI 2021.

  • Program Committee member for top-tier conferences including NeurIPS, ICML, ICLR, CVPR, ICCV, AAAI, etc.

  • Journal reviewer for IEEE T-PAMI, IEEE T-NNLS, Pattern Recognition, Neurocomputing, etc.

Awards

  • 2020, President's PhD Scholarship, Peking University.

  • 2020, National Scholarship (top 1%), Chinese Ministry of Education.

  • 2020, Pacemaker to Merit Student (top 1%), Peking University.

  • 2016, National Scholarship (top 1%), Chinese Ministry of Education.

  • 2015, National Scholarship (top 1%), Chinese Ministry of Education.

  • This website is based on the source code shared by Dr. Yunhe Wang. Thanks.