Recruitment: I am seeking highly self-motivated employees and interns who possess excellent coding skills and have a profound interest in Large Language Models. Please feel free to send me your resume!
News
05/2025, We release Pangu Ultra MoE 718B and Pangu Pro MoE 72B.
04/2025, Serving as an Area Chair for NeurIPS 2025.
02/2024, We release the technical report of PanGu-π Pro, a tiny language model (1.5B) that can be easily deployed on edge devices. [technical report]
12/2023, We release PanGu-π, a new LLM architecture. [technical report]
03/2022, 5 papers are accepted by CVPR 2022.
Recent Projects
Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity
Technical Report | Synced (机器之心)
A hardware-friendly MoE model with 72B total parameters (16B activated per token). It achieves high inference efficiency on Ascend 300I Duo and 800I A2 NPUs.
MoGE (Mixture of Grouped Experts) model architecture: effective and efficient (a toy routing sketch follows below).
High performance: it ties for first place on SuperCLUE among models with fewer than 100B total parameters.
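Illustrative sketch: the snippet below is my own toy illustration of group-balanced routing in the spirit of MoGE, not the released implementation; the expert count, group count, experts activated per group, and tensor shapes are assumptions chosen for the example.

```python
import torch

def grouped_topk_routing(hidden, router_weight, num_groups, k_per_group):
    """Route every token to k experts inside each expert group."""
    logits = hidden @ router_weight                        # [tokens, num_experts]
    tokens, num_experts = logits.shape
    group_size = num_experts // num_groups
    grouped = logits.view(tokens, num_groups, group_size)
    # Top-k is taken independently inside every group, so each group (and hence
    # each device hosting a group) serves the same number of experts per token.
    scores, local_idx = grouped.topk(k_per_group, dim=-1)  # [tokens, groups, k]
    offsets = torch.arange(num_groups).view(1, num_groups, 1) * group_size
    expert_idx = (local_idx + offsets).flatten(1)          # global expert ids
    gates = torch.softmax(scores.flatten(1), dim=-1)       # weights over kept experts
    return expert_idx, gates

# Assumed toy configuration: 64 routed experts in 8 groups, 1 expert activated per group.
h = torch.randn(4, 512)                                    # 4 tokens, hidden size 512
w = torch.randn(512, 64)
expert_idx, gates = grouped_topk_routing(h, w, num_groups=8, k_per_group=1)
print(expert_idx.shape, gates.shape)                       # torch.Size([4, 8]) for both
```

Because every token activates the same number of experts in each group, placing one group per device keeps the activated workload balanced across devices by construction.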
Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs
Technical Report | QbitAI (量子位)
A powerful MoE model with 718B total parameters (39B activated per token).
Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs
Technical Report | Synced (机器之心)
A 135B dense LLM trained on 8192 Ascend NPUs.
It delivers performance competitive with DeepSeek-R1, whose sparse architecture contains far more parameters.
PanGu-π Pro: Powerful Tiny Language Models (1B, 1.5B, 3B) for Edge Devices
Technical Report | Synced (机器之心)
We introduce PanGu-π Pro, a family of powerful tiny language models (1B, 1.5B, 3B) that can be easily deployed on edge devices. Through an empirical investigation, we propose four strategies to improve performance (a toy sketch of parameter inheritance follows the list):
Compact Tokenizer: efficient coverage of the corpus.
Architecture Tweak: better trade-off between depth and width.
Parameter Inheritance: powerful knowledge inherited from larger LLMs.
Multiple-Round Training: memory reinforcement for tiny models.
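To make the list concrete, here is a toy sketch of the parameter-inheritance idea only: a small model is initialized by copying a subset of layers from a larger pretrained checkpoint and truncating each weight matrix to the smaller width. The parameter names, the evenly spaced layer-selection rule, and the slicing scheme are my own assumptions for illustration; the actual PanGu-π Pro procedure is described in the technical report.

```python
import torch

def inherit_from_large(large_state, num_large_layers, num_small_layers, small_dim):
    """Initialize a tiny model's state dict from a larger pretrained one (toy version)."""
    # Keep evenly spaced layers of the large model (an assumed selection rule).
    keep = torch.linspace(0, num_large_layers - 1, num_small_layers).round().long().tolist()
    small_state = {}
    for new_idx, old_idx in enumerate(keep):
        for name in ("attn.qkv.weight", "mlp.fc.weight"):   # hypothetical parameter names
            w = large_state[f"layers.{old_idx}.{name}"]
            # Truncate each weight matrix to the tiny model's hidden width.
            small_state[f"layers.{new_idx}.{name}"] = w[:small_dim, :small_dim].clone()
    return small_state

# Toy example: a 4-layer, 64-dim "large" model inherited into a 2-layer, 32-dim one.
large = {
    f"layers.{i}.{n}": torch.randn(64, 64)
    for i in range(4)
    for n in ("attn.qkv.weight", "mlp.fc.weight")
}
small = inherit_from_large(large, num_large_layers=4, num_small_layers=2, small_dim=32)
print(sorted(small.keys()))
```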
Selected Publications
The complete list of articles can be found on
Google Scholar (8000+ citations).
Pangu Pro MoE (72B): Mixture of Grouped Experts for Efficient Sparsity.
Core contributor: Yehui Tang, Xiaosong Li, Fangcheng Liu, Wei Guo, Hang Zhou, Yaoyuan Wang et al.
Technical Report
Pangu Ultra MoE (718B): How to Train Your Big MoE on Ascend NPUs.
Core contributor: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo et al.
Technical Report
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition.
Core contributor
Technical Report
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs.
Core contributor
Technical Report
Pangu Ultra (135B): Pushing the Limits of Dense Large Language Models on Ascend NPUs.
Core contributor: Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo et al.
paper
Mixture of Lookup Experts.
Shibo Jie, Yehui Tang#, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng#, Yunhe Wang#
ICML 2025 Spotlight | paper | code
Forest-of-thought: Scaling test-time compute for enhancing LLM reasoning.
Zhenni Bi, Kai Han, Chuanjian Liu, Yehui Tang#, Yunhe Wang#
ICML 2025 | paper | code
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers.
Ning Ding*, Yehui Tang*, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Liao Heng, Yunhe Wang
NeurIPS 2024 | paper
PanGu-π Pro: Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Li, Shangling Jui, Yunhe Wang
paper
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao
paper
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
NeurIPS 2023 Highlight | paper
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han
CVPR 2023 Highlight | paper | code
Network Expansion for Practical Training Acceleration
Ning Ding, Yehui Tang, Kai Han, Chao Xu, Yunhe Wang
CVPR 2023 | paper | code
GhostNetV2: Enhance Cheap Operation with Long-Range Attention.
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang
NeurIPS 2022 Spotlight | paper | code | MindSpore code
An Image Patch is a Wave: Quantum Inspired Vision MLP (WaveMLP).
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang
CVPR 2022 Oral | paper
Patch Slimming for Efficient Vision Transformers.
Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, Dacheng Tao
CVPR 2022 | paper
Hire-MLP: Vision MLP via Hierarchical Rearrangement.
Jianyuan Guo*, Yehui Tang*, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
CVPR 2022 (* equal contribution) | paper
CMT: Convolutional Neural Networks Meet Vision Transformers.
Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
CVPR 2022 | paper
Source-Free Domain Adaptation via Distribution Estimation
Ning Ding, Yixing Xu, Yehui Tang, Chao Xu, Yunhe Wang, Dacheng Tao
CVPR 2022
Augmented Shortcuts for Vision Transformers
Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang
NeurIPS 2021 | paper | MindSpore code
Manifold Regularized Dynamic Network Pruning
Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu
CVPR 2021 | paper | code | MindSpore code
SCOP: Scientific Control for Reliable Neural Network Pruning
Yehui Tang, Yunhe Wang, Yixing Xu, Dacheng Tao, Chunjing Xu, Chao Xu, Chang Xu
NeurIPS 2020 | paper | code
A Semi-Supervised Assessor of Neural Architectures
Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu
CVPR 2020 | paper
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu
AAAI 2020 | paper | code
Reborn Filters: Pruning Convolutional Neural Networks with Limited Data
Yehui Tang, Shan You, Chang Xu, Jin Han, Chen Qian, Boxin Shi, Chao Xu, Changshui Zhang
AAAI 2020 | paper
Homogeneous Architecture Augmentation for Neural Predictor
Yuqiao Liu*, Yehui Tang*, Yanan Sun
ICCV 2021 (* equal contribution) | paper
Learning Frequency Domain Approximation for Binary Neural Networks
Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
NeurIPS 2021 | paper | MindSpore code
ReNAS: Relativistic Evaluation of Neural Architecture Search
Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu
CVPR 2021 Oral | paper | MindSpore code
Neuromorphic Camera Guided High Dynamic Range Imaging
Jin Han, Chu Zhou, Peiqi Duan, Yehui Tang, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi
CVPR 2020 | paper
Frequency Domain Compact 3D Convolutional Neural Networks
Hanting Chen, Yunhe Wang, Han Shu, Yehui Tang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu
CVPR 2020 | paper
A Survey on Vision Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
IEEE T-PAMI 2022 | paper
Services
Area Chair of NeurIPS 2025.
Senior Program Committee member of IJCAI 2021.
Program Committee member of top-tier conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, AAAI, etc.
Journal reviewer for IEEE T-PAMI, IEEE T-NNLS, Pattern Recognition, Neurocomputing, etc.
Awards
2020, President's PhD Scholarship, Peking University.
2020, National Scholarship (top 1%), Chinese Ministry of Education.
2020, Pacemaker to Merit Student (top 1%), Peking University.
2016, National Scholarship (top 1%), Chinese Ministry of Education.
2015, National Scholarship (top 1%), Chinese Ministry of Education.
This website is based on the source code shared by
Dr. Yunhe Wang. Thanks.