Publications
Featured Publications
Pre-Training with Whole Word Masking for Chinese BERT
- Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol.29. 2021.
- ESI Highly Cited Papers in Engineering by Clarivate™.
- Top-25 Downloaded Papers in IEEE Signal Processing Society (2021-2023).
- IEEE Signal Processing Society Best Paper Award (2025).
PDF
Bib
IEEE Xplore
Chinese-BERT-wwm
MacBERT
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
- Yiming Cui, Ziqing Yang, Xin Yao. arXiv pre-print: 2304.08177. 2023.
- The open-source projects ranked 1st on GitHub Trending.
PDF Bib arXiv Chinese-LLaMA-Alpaca Chinese-LLaMA-Alpaca-2 Chinese-LLaMA-Alpaca-3
Attention-over-Attention Neural Networks for Reading Comprehension
- Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, Guoping Hu. ACL 2017. 2017.
- This paper was selected as one of the Most Influential ACL 2017 Papers (Top 11) by Paper Digest.
PDF
Bib
Slides
ACL Anthology
2026
TBA
2025
- Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams. Yiming Cui, Xin Yao, Yuxuan Qin, Xin Li, Shijin Wang, Guoping Hu. Communications Chemistry, Vol. 8, Nature Portfolio. 2025. Bib arXiv Nature
- You Might Not Need Attention Diagonals. Yiming Cui, Xin Yao, Shijin Wang, Guoping Hu. IEEE Signal Processing Letters, Vol. 32. 2025. Bib IEEE Xplore
- ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Scientific Chart Understanding. Xingqi Wang, Yiming Cui, Xin Yao, Shijin Wang, Guoping Hu, Xiaoyu Qin. arXiv pre-print: 2509.17481. Bib arXiv Project Page GitHub
- Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation. Tianhao Niu, Yiming Cui, Baoxin Wang, Xiao Xu, Xin Yao, Qingfu Zhu, Dayong Wu, Shijin Wang, Wanxiang Che. EMNLP 2025. Bib ACL Anthology GitHub
- Natural Language Processing: A Large Language Model Approach (自然语言处理：基于大语言模型的方法). Wanxiang Che, Jiang Guo, Yiming Cui. Publishing House of Electronics Industry. JD (Online Store) PDF (Front Matter) Bib
2024
- Self-Evolving GPT: A Lifelong Autonomous Experiential Learner. Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin. ACL 2024. Bib ACL Anthology GitHub
- Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral. Yiming Cui, Xin Yao. arXiv pre-print: 2403.01851. 2024. Bib arXiv GitHub
2023
- Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca. Yiming Cui*, Ziqing Yang, Xin Yao. arXiv pre-print: 2304.08177. Bib arXiv GitHub GitHub
- Gradient-based Intra-attention Pruning on Pre-trained Language Models. Ziqing Yang, Yiming Cui, Xin Yao, Shijin Wang. ACL 2023. Bib ACL Anthology GitHub
- IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning. Zihang Xu, Ziqing Yang, Yiming Cui, Shijin Wang. ACL 2023 (Findings). Bib ACL Anthology GitHub
- MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model. Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang. arXiv pre-print: 2304.00717. Bib arXiv GitHub
2022
- LERT: A Linguistically-motivated Pre-trained Language Model. Yiming Cui, Wanxiang Che, Shijin Wang, Ting Liu. arXiv pre-print: 2211.05344. Bib arXiv GitHub
- Visualizing Attention Zones in Machine Reading Comprehension Models. Yiming Cui, Wei-Nan Zhang, Ting Liu. STAR Protocols, Vol.3. Bib
- Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models. Yiming Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhigang Chen, Shijin Wang. iScience, Vol.25. Bib GitHub
- ExpMRC: Explainability Evaluation for Machine Reading Comprehension. Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang. Heliyon, Vol.8. Bib Leaderboard GitHub
- Teaching Machines to Read, Answer and Explain. Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol.30. Bib arXiv IEEE Xplore
- PERT: Pre-training BERT with Permuted Language Model. Yiming Cui, Ziqing Yang, Ting Liu. arXiv pre-print: 2203.06906. Bib arXiv GitHub
- Interactive Gated Decoder for Machine Reading Comprehension. Yiming Cui, Wanxiang Che, Ziqing Yang, Ting Liu. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Vol.21. Bib ACM DL
- A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation. Wei-Nan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, Ting Liu. ACM Transactions on Information Systems (TOIS). Bib ACM DL
- TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models. Ziqing Yang, Yiming Cui, Zhigang Chen. ACL 2022 (System Demonstration). Bib ACL Anthology GitHub
- Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training. Ziqing Yang, Yiming Cui, Zhigang Chen, Shijin Wang. arXiv pre-print: 2202.13654. Bib arXiv
- CINO: A Chinese Minority Pre-trained Language Model. Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, Zhigang Chen. COLING 2022. Bib ACL Anthology GitHub
- HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity. Zihang Xu, Ziqing Yang, Yiming Cui, Zhigang Chen. SemEval 2022. "Best Paper Honorable Mention Award" at SemEval-2022. Bib ACL Anthology GitHub
- HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection. Zheng Chu, Ziqing Yang, Yiming Cui, Zhigang Chen, Ming Liu. SemEval 2022. Bib ACL Anthology
- Augmented and challenging datasets with multi-step reasoning and multi-span questions for Chinese judicial reading comprehension. Qingye Meng, Ziyue Wang, Hang Chen, Xianzhen Luo, Baoxin Wang, Zhipeng Chen, Yiming Cui, Dayong Wu, Zhigang Chen, Shijin Wang. AI Open, Vol.3. Bib ScienceDirect
2021
- Pre-Training with Whole Word Masking for Chinese BERT. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol.29. ESI Highly Cited Papers. IEEE SPS Best Paper Award (2025). Bib IEEE Xplore GitHub
- Adversarial Training for Machine Reading Comprehension with Virtual Embeddings. Ziqing Yang, Yiming Cui, Chenglei Si, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu. *SEM 2021. Bib ACL Anthology
- Memory Augmented Sequential Paragraph Retrieval for Multi-hop Question Answering. Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu. arXiv pre-print: 2102.03741. Bib arXiv
- Natural Language Processing: A Pre-trained Model Approach (自然语言处理：基于预训练模型的方法). Wanxiang Che, Jiang Guo, Yiming Cui. Publishing House of Electronics Industry. Top 1% Highly Cited Book (2019-2023) by CNKI. JD (Online Store) PDF (Front Matter) Bib
- Benchmarking Robustness of Machine Reading Comprehension Models. Chenglei Si, Ziqing Yang, Yiming Cui, Wentao Ma, Ting Liu, Shijin Wang. ACL 2021 (Findings). Bib ACL Anthology GitHub
- Bilingual Alignment Pre-training for Zero-shot Cross-lingual Transfer. Ziqing Yang, Wentao Ma, Yiming Cui, Jiani Ye, Wanxiang Che, Shijin Wang. MRQA 2021. Bib ACL Anthology
2020
- Discriminative Sentence Modeling for Story Ending Prediction. Yiming Cui, Wanxiang Che, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu. AAAI 2020. Bib AAAI Publisher
- A Sentence Cloze Dataset for Chinese Machine Reading Comprehension. Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu. COLING 2020. Bib ACL Anthology Leaderboard GitHub
- Revisiting Pre-Trained Models for Chinese Natural Language Processing. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu. EMNLP 2020 (Findings). Most Influential EMNLP 2020 Papers (Top 11) by Paper Digest. Bib ACL Anthology GitHub
- CharBERT: Character-aware Pre-trained Language Model. Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu. COLING 2020. Bib ACL Anthology GitHub
- Is Graph Structure Necessary for Multi-hop Question Answering? Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu. EMNLP 2020. Bib ACL Anthology
- Conversational Word Embedding for Retrieval-based Dialog System. Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu. ACL 2020. Bib Slides ACL Anthology GitHub
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu. ACL 2020 (System Demonstration). Bib Slides ACL Anthology GitHub
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting. Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu. EMNLP 2020. Bib ACL Anthology GitHub
2019
- Cross-Lingual Machine Reading Comprehension. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu. EMNLP 2019. Bib Slides ACL Anthology GitHub
- A Span-Extraction Dataset for Chinese Machine Reading Comprehension. Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu. EMNLP 2019. Bib ACL Anthology Leaderboard GitHub
- Contextual Recurrent Units for Cloze-style Reading Comprehension. Yiming Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu. arXiv pre-print: 1911.05960. Bib arXiv
- Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions. Zhipeng Chen, Yiming Cui, Wentao Ma, Shijin Wang, Guoping Hu. AAAI 2019. Bib Slides AAAI Publisher
- TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots. Wentao Ma, Yiming Cui, Nan Shao, Su He, Wei-Nan Zhang, Ting Liu, Shijin Wang, Guoping Hu. CoNLL 2019. Bib Slides ACL Anthology GitHub
- Improving Machine Reading Comprehension via Adversarial Training. Ziqing Yang, Yiming Cui, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu. arXiv pre-print: 1911.03614. Bib arXiv
- Exploiting Persona Information for Diverse Generation of Conversational Responses. Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu. IJCAI 2019. Bib IJCAI Publisher
- CJRC: A Reliable Human-Annotated Benchmark DataSet for Chinese Judicial Reading Comprehension. Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, Heng Wang, Zhiyuan Liu. CCL 2019. Bib arXiv GitHub
2018
- Dataset for the First Evaluation on Chinese Machine Reading Comprehension. Yiming Cui, Ting Liu, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu. LREC 2018. Bib ACL Anthology GitHub
- Context-Sensitive Generation of Open-Domain Conversational Responses. Wei-Nan Zhang, Yiming Cui, Yifa Wang, Qingfu Zhu, Lingzhi Li, Lianqiang Zhou, Ting Liu. COLING 2018. Bib ACL Anthology
- HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension. Zhipeng Chen, Yiming Cui*, Wentao Ma, Shijin Wang, Ting Liu, Guoping Hu. arXiv pre-print: 1803.05655. Bib arXiv
- A Car Manual Question Answering System based on Neural Network (基于神经网络的汽车说明书问答系统). Le Qi, Yu Zhang, Wentao Ma, Yiming Cui, Shijin Wang, Ting Liu. Journal of Shanxi University (Natural Science Edition). Bib
2017
-
Attention-over-Attention Neural Networks for Reading Comprehension. Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, Guoping Hu. ACL 2017. Most Influential ACL 2017 Papers (Top 11) by Paper Digest. ๐ Bib ๐ชง Slides ACL Anthology
-
The Brilliant Chinese Achievements in SQuAD Challenge (ๆฏๅฆ็ฆSQuADๆๆ่ต็ไธญๅฝไบฎไธฝๆฆๅ). Yiming Cui, Ting Liu, Shijin Wang, Zhipeng Chen, Wentao Ma, Guoping Hu. Communication of CCF. ๐ Bib
-
Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution. Ting Liu, Yiming Cui, Qingyu Yin, Wei-Nan Zhang, Shijin Wang, Guoping Hu. ACL 2017. ๐ Bib ๐ชง Slides
ACL Anthology
2016
- Consensus Attention-based Neural Networks for Chinese Reading Comprehension. Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu. COLING 2016. Bib Slides ACL Anthology GitHub
- LSTM Neural Reordering Feature for Statistical Machine Translation. Yiming Cui, Shijin Wang, Jianfeng Li. NAACL 2016. Bib Slides ACL Anthology
2015 and earlier
- Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT. Yiming Cui, Conghui Zhu, Xiaoning Zhu, Tiejun Zhao. arXiv pre-print: 1512.00170. Bib arXiv
- Context-extended Phrase Reordering Model for Pivot-based Statistical Machine Translation. Xiaoning Zhu, Tiejun Zhao, Yiming Cui, Conghui Zhu. IALP 2015. Bib IEEE Xplore
- The USTC Machine Translation System for IWSLT2014. Shijin Wang, Yuguang Wang, Jianfeng Li, Yiming Cui, Lirong Dai. IWSLT 2014. Bib ACL Anthology
- Phrase Table Combination Deficiency Analyses in Pivot-based SMT. Yiming Cui, Conghui Zhu, Xiaoning Zhu, Tiejun Zhao, Dequan Zheng. NLDB 2013. Bib Springer
- The HIT-LTRC Machine Translation System for IWSLT 2012. Xiaoning Zhu, Yiming Cui, Conghui Zhu, Tiejun Zhao, Hailong Cao. IWSLT 2012. Bib Slides ACL Anthology