Recruitment: I am seeking highly self-motivated employees and interns who possess excellent coding skills and have a profound interest in Large Language Models. Please feel free to send me your resume!
News
05/2025, We release Pangu Ultra MoE 718B and Pangu Pro MoE 72B.
04/2025, Serving as an Area Chair for NeurIPS 2025.
02/2024, We release the technical report of PanGu-π Pro, a tiny language model (1.5B) that can be easily deployed on edge devices. [technical report]
12/2023, We release PanGu-π, a new LLM architecture. [technical report]
03/2022, 5 papers are accepted by CVPR 2022.
Recent Projects
Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity
Technical Report | Synced (机器之心)
A hardware-friendly MoE model with 72B total parameters (16B activated per token). It achieves high inference efficiency on Ascend 300I Duo and 800I A2 NPUs.
MoGE (Mixture of Grouped Experts) model architecture: effective and efficient (a toy routing sketch follows below).
High performance: it ties for first place on SuperCLUE among models with fewer than 100B total parameters.
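Illustrative sketch: the snippet below is my own toy illustration of group-balanced routing in the spirit of MoGE, not the released implementation; the expert count, group count, experts activated per group, and tensor shapes are assumptions chosen for the example.

```python
import torch

def grouped_topk_routing(hidden, router_weight, num_groups, k_per_group):
    """Route every token to k experts inside each expert group."""
    logits = hidden @ router_weight                        # [tokens, num_experts]
    tokens, num_experts = logits.shape
    group_size = num_experts // num_groups
    grouped = logits.view(tokens, num_groups, group_size)
    # Top-k is taken independently inside every group, so each group (and hence
    # each device hosting a group) serves the same number of experts per token.
    scores, local_idx = grouped.topk(k_per_group, dim=-1)  # [tokens, groups, k]
    offsets = torch.arange(num_groups).view(1, num_groups, 1) * group_size
    expert_idx = (local_idx + offsets).flatten(1)          # global expert ids
    gates = torch.softmax(scores.flatten(1), dim=-1)       # weights over kept experts
    return expert_idx, gates

# Assumed toy configuration: 64 routed experts in 8 groups, 1 expert activated per group.
h = torch.randn(4, 512)                                    # 4 tokens, hidden size 512
w = torch.randn(512, 64)
expert_idx, gates = grouped_topk_routing(h, w, num_groups=8, k_per_group=1)
print(expert_idx.shape, gates.shape)                       # torch.Size([4, 8]) for both
```

Because every token activates the same number of experts in each group, placing one group per device keeps the activated workload balanced across devices by construction.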
Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs
Technical Report | QbitAI (量子位)
A powerful MoE model with 718B total parameters (39B activated per token).
Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs
Technical Report | Synced (机器之心)
A 135B dense LLM trained on 8192 Ascend NPUs.
It delivers performance competitive with DeepSeek-R1, whose sparse architecture contains far more parameters.
PanGu-π Pro: Powerful Tiny Language Models (1B, 1.5B, 3B) for Edge Devices
Technical Report | Synced (机器之心)
We introduce PanGu-π Pro, a family of powerful tiny language models (1B, 1.5B, 3B) that can be easily deployed on edge devices. Through an empirical investigation, we propose four strategies to improve performance (a toy sketch of parameter inheritance follows the list):
Compact Tokenizer: efficient coverage of the corpus.
Architecture Tweak: better trade-off between depth and width.
Parameter Inheritance: powerful knowledge inherited from larger LLMs.
Multiple-Round Training: memory reinforcement for tiny models.
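To make the list concrete, here is a toy sketch of the parameter-inheritance idea only: a small model is initialized by copying a subset of layers from a larger pretrained checkpoint and truncating each weight matrix to the smaller width. The parameter names, the evenly spaced layer-selection rule, and the slicing scheme are my own assumptions for illustration; the actual PanGu-π Pro procedure is described in the technical report.

```python
import torch

def inherit_from_large(large_state, num_large_layers, num_small_layers, small_dim):
    """Initialize a tiny model's state dict from a larger pretrained one (toy version)."""
    # Keep evenly spaced layers of the large model (an assumed selection rule).
    keep = torch.linspace(0, num_large_layers - 1, num_small_layers).round().long().tolist()
    small_state = {}
    for new_idx, old_idx in enumerate(keep):
        for name in ("attn.qkv.weight", "mlp.fc.weight"):   # hypothetical parameter names
            w = large_state[f"layers.{old_idx}.{name}"]
            # Truncate each weight matrix to the tiny model's hidden width.
            small_state[f"layers.{new_idx}.{name}"] = w[:small_dim, :small_dim].clone()
    return small_state

# Toy example: a 4-layer, 64-dim "large" model inherited into a 2-layer, 32-dim one.
large = {
    f"layers.{i}.{n}": torch.randn(64, 64)
    for i in range(4)
    for n in ("attn.qkv.weight", "mlp.fc.weight")
}
small = inherit_from_large(large, num_large_layers=4, num_small_layers=2, small_dim=32)
print(sorted(small.keys()))
```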
Selected Publications
The complete list of articles can be found on
Google Scholar (8000+ citations).
Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity.
Core contributor: Yehui Tang, Xiaosong Li, Fangcheng Liu, Wei Guo, Hang Zhou, Yaoyuan Wang et al.
Technical Report
Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs.
Core contributor: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo et al.
Technical Report
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition.
Core contributor
Technical Report
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs.
Core contributor
Technical Report
Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs.
Core contributor: Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo et al.
paper
Mixture of Lookup Experts.
Shibo Jie, Yehui Tang#, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng#, Yunhe Wang#
ICML 2025 Spotlight | paper | code
Forest-of-thought: Scaling test-time compute for enhancing LLM reasoning.
Zhenni Bi, Kai Han, Chuanjian Liu, Yehui Tang#, Yunhe Wang#
ICML 2025 | paper | code
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers.
Ning Ding*, Yehui Tang*, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Liao Heng, Yunhe Wang
NeurIPS 2024 | paper
PanGu-π Pro: Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Li, Shangling Jui, Yunhe Wang
paper
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
paper
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
NeurIPS 2023 Highlight | paper
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han
CVPR 2023 Highlight | paper | code
Network Expansion for Practical Training Acceleration
Ning Ding, Yehui Tang, Kai Han, Chao Xu, Yunhe Wang
CVPR 2023 | paper | code
GhostNetV2: Enhance Cheap Operation with Long-Range Attention.
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang
NeurIPS 2022 Spotlight | paper | code | MindSpore code
An Image Patch is a Wave: Quantum Inspired Vision MLP (WaveMLP).
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang
CVPR 2022 Oral | paper
Patch Slimming for Efficient Vision Transformers.
Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao
CVPR 2022 | paper
Hire-MLP: Vision MLP via Hierarchical Rearrangement.
Jianyuan Guo*, Yehui Tang*, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
CVPR 2022 (* equal contribution) | paper
CMT: Convolutional Neural Networks Meet Vision Transformers.
Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
CVPR 2022 | paper
Source-Free Domain Adaptation via Distribution Estimation
Ning Ding, Yixing Xu, Yehui Tang, Chao Xu, Yunhe Wang, Dacheng Tao
CVPR 2022
Augmented Shortcuts for Vision Transformers
Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang
NeurIPS 2021 | paper | MindSpore code
Manifold Regularized Dynamic Network Pruning
Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu
CVPR 2021 | paper | code | MindSpore code
SCOP: Scientific Control for Reliable Neural Network Pruning
Yehui Tang, Yunhe Wang, Yixing Xu, Dacheng Tao, Chunjing Xu, Chao Xu, Chang Xu
NeurIPS 2020 | paper | code
A Semi-Supervised Assessor of Neural Architectures
Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu
CVPR 2020 | paper
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu
AAAI 2020 | paper | code
Reborn Filters: Pruning Convolutional Neural Networks with Limited Data
Yehui Tang, Shan You, Chang Xu, Jin Han, Chen Qian, Boxin Shi, Chao Xu, Changshui Zhang
AAAI 2020 | paper
Homogeneous Architecture Augmentation for Neural Predictor
Yuqiao Liu*, Yehui Tang*, Yanan Sun
ICCV 2021 (* equal contribution) | paper
Learning Frequency Domain Approximation for Binary Neural Networks
Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
NeurIPS 2021 | paper | MindSpore code
ReNAS: Relativistic Evaluation of Neural Architecture Search
Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu
CVPR 2021 Oral | paper | MindSpore code
Neuromorphic Camera Guided High Dynamic Range Imaging
Jin Han, Chu Zhou, Peiqi Duan, Yehui Tang, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi
CVPR 2020 | paper
Frequency Domain Compact 3D Convolutional Neural Networks
Hanting Chen, Yunhe Wang, Han Shu, Yehui Tang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu
CVPR 2020 | paper
A Survey on Vision Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
IEEE T-PAMI 2022 | paper
Services
Area Chair of NeurIPS 2025.
Senior Program Committee member of IJCAI 2021.
Program Committee member of top-tier conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, AAAI, etc.
Journal reviewer for IEEE T-PAMI, IEEE T-NNLS, Pattern Recognition, Neurocomputing, etc.
Awards
2020, President's PhD Scholarship, Peking University.
2020, National Scholarship (top 1%), Chinese Ministry of Education.
2020, Pacemaker to Merit Student (top 1%), Peking University.
2016, National Scholarship (top 1%), Chinese Ministry of Education.
2015, National Scholarship (top 1%), Chinese Ministry of Education.
This website is based on the source code shared by
Dr. Yunhe Wang. Thanks.