CMRC 2018

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

What is CMRC 2018?

CMRC 2018 is a Chinese Machine Reading Comprehension dataset that was used in The Second Evaluation Workshop on Chinese Machine Reading Comprehension. Specifically, CMRC 2018 is a span-extraction reading comprehension dataset, similar to SQuAD. Besides the regular training, development, and test sets, we also include a challenge set that requires comprehensive reasoning over multiple sentences and is therefore far more difficult.
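Because the data follows the SQuAD layout, a plain JSON walk is enough to reach the question-answer spans. The sketch below is illustrative only: the file name is a placeholder and the field names are assumed to mirror SQuAD, so check the released files for the exact schema.

```python
import json

# Placeholder file name; point this at the released training file.
with open("cmrc2018_train.json", encoding="utf-8") as f:
    data = json.load(f)["data"]

for article in data:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            question = qa["question"]
            for answer in qa["answers"]:
                # Each answer is a span of the context: its text plus the
                # character offset where the span starts.
                span_text = answer["text"]
                start = answer["answer_start"]
                print(question, span_text, start)
```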

Paper and BibTeX: [Cui et al., EMNLP 2019]

Getting Started

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

You may also be interested in a quick baseline system based on a pre-trained language model (such as BERT).
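For illustration, the sketch below runs extractive QA with the Hugging Face transformers question-answering pipeline. The checkpoint name is a placeholder for whatever Chinese model you have fine-tuned on CMRC 2018; it is not an official baseline.

```python
from transformers import pipeline

# Hypothetical checkpoint name: substitute a Chinese BERT/RoBERTa model
# fine-tuned on CMRC 2018 (the pipeline API itself is standard).
qa = pipeline("question-answering", model="your-org/chinese-bert-finetuned-cmrc2018")

result = qa(
    question="北京是哪个国家的首都？",  # "Which country's capital is Beijing?"
    context="北京是中华人民共和国的首都，也是全国的政治和文化中心。",
)
print(result["answer"], result["score"])  # extracted span and its confidence
```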

Official Submission

To preserve the integrity of test results, we do not release the test and challenge sets to the public. Instead, we require you to upload your model to CodaLab so that we can run it on the test and challenge sets for you. You can follow the instructions on CodaLab (the process is similar to a SQuAD submission). See the Submission Tutorial.

Have Questions?

Ask us questions at our GitHub repository or at cmrc2018 [at] 126 [dot] com.

Leaderboard

The CMRC 2018 challenge set requires comprehensive reasoning over multiple clues in the passage while keeping the original span-extraction format, making it far more challenging than the test set. Will your system surpass humans on this task?
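The leaderboard below reports exact match (EM) and F1 over the predicted spans. As a rough sketch of what those numbers mean, the functions below compute EM and a character-level F1, the natural granularity for Chinese; the official evaluation script additionally normalizes text and takes the maximum over multiple reference answers, so treat this as an approximation, not the scoring code.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 only if the predicted span is identical to the reference span.
    return float(prediction == reference)

def char_f1(prediction: str, reference: str) -> float:
    # Character-level overlap F1 between predicted and reference spans.
    common = Counter(prediction) & Counter(reference)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

print(exact_match("北京", "北京"), char_f1("北京市", "北京"))
```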

Rank | Date | Model | Organization | Test EM | Test F1 | Challenge EM | Challenge F1
---- | ---- | ----- | ------------ | ------- | ------- | ------------ | ------------
- | - | Human Performance | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 92.400 | 97.914 | 90.382 | 95.248
🥇 1 | Dec 8, 2020 | MacBERT-large-extData-v2 (single model) | AISpeech | 80.409 | 93.768 | 36.706 | 66.905
🥈 2 | Nov 12, 2020 | MacBERT-large-extData (single model) | AISpeech | 77.998 | 92.882 | 38.492 | 67.109
🥉 3 | Nov 3, 2020 | RoBERTa-wwm-ext-large-extData (single model) | AISpeech | 76.997 | 92.171 | 32.540 | 63.597
4 | May 1, 2020 | MacBERT-large (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020] | 74.786 | 90.693 | 31.923 | 60.177
5 | Jan 22, 2021 | ESPReader-large (single model) | Shanghai Jiao Tong University | 77.201 | 91.476 | 30.357 | 58.396
6 | Oct 14, 2019 | RoBERTa-wwm-ext-large (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 74.198 | 90.604 | 31.548 | 60.074
7 | Nov 5, 2019 | RoBERTa-wwm-ext-large (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 76.588 | 91.566 | 30.159 | 57.229
8 | Nov 1, 2019 | ALBERT-xlarge (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 75.220 | 90.922 | 28.770 | 57.864
9 | July 5, 2021 | Macbert-QueryRestoration (single model) | xiaobing.ai | 77.140 | 91.425 | 27.976 | 56.160
10 | Nov 1, 2019 | RoBERTa-large (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 76.118 | 90.949 | 26.587 | 54.594
11 | May 1, 2020 | MacBERT-base (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020] | 73.179 | 89.486 | 30.159 | 54.039
12 | May 1, 2019 | Dual BERT (w/ SQuAD) (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 73.600 | 90.200 | 27.800 | 55.200
13 | Jan 22, 2021 | ESPReader-base (single model) | Shanghai Jiao Tong University | 75.567 | 89.926 | 26.587 | 53.129
14 | June 25, 2021 | MacBERT-large-fact-evidence (single model) | Anonymous | 72.952 | 89.374 | 26.389 | 55.009
15 | Nov 1, 2019 | ALBERT-large (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 73.667 | 90.175 | 27.976 | 51.622
16 | Aug 19, 2019 | XLNet-mid (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 69.300 | 89.200 | 29.100 | 55.800
17 | May 11, 2020 | UER (single model) | Tencent Oteam (https://github.com/dbiir/UER-py) | 78.795 | 91.992 | 24.206 | 46.951
18 | Oct 22, 2020 | RoBERTa-wwm-ext-large (single model) | AISpeech | 74.954 | 90.286 | 23.214 | 52.138
19 | Nov 1, 2019 | ERNIE 1.0 (maxlen=512) (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 73.320 | 89.621 | 26.190 | 50.976
20 | Nov 5, 2019 | RoBERTa-wwm-ext-base (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 73.892 | 89.748 | 25.198 | 50.759
21 | Sep 10, 2019 | RoBERTa-wwm-ext (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 72.600 | 89.400 | 26.200 | 51.000
22 | June 14, 2022 | MacBERT-base (single model) | TrustNow | 74.259 | 89.247 | 22.421 | 50.785
23 | Nov 5, 2019 | BERT-wwm-ext (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 73.238 | 88.784 | 22.421 | 46.430
24 | Jul 30, 2019 | BERT-wwm-ext (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 71.400 | 87.700 | 24.000 | 47.300
25 | May 1, 2019 | Dual BERT (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 70.400 | 88.100 | 23.800 | 47.900
26 | Nov 5, 2019 | XLNet-mid (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 66.517 | 86.099 | 24.008 | 52.859
27 | Jun 20, 2019 | BERT-wwm (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 70.500 | 87.400 | 21.000 | 47.000
28 | Nov 1, 2019 | BERT-base (google) (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 69.724 | 87.174 | 21.230 | 44.861
29 | Sep 17, 2018 | Z-Reader (single model) | ZhuiYi | 74.178 | 88.145 | 13.889 | 37.422
30 | Sep 17, 2018 | MCA-Reader (ensemble) | BISTU | 71.175 | 88.090 | 15.476 | 37.104
31 | Mar 27, 2019 | P-Reader (single model) | swjtu_PF | 65.189 | 84.386 | 15.079 | 39.583
32 | Sep 17, 2018 | RCEN (ensemble) | 6ESTATES PTE LTD | 68.662 | 85.743 | 15.278 | 34.479
33 | Sep 17, 2018 | MCA-Reader (single model) | BISTU | 68.335 | 85.707 | 13.690 | 33.964
34 | Sep 17, 2018 | GM-Reader (ensemble) | BUPT CIST | 64.045 | 83.046 | 15.675 | 37.315
35 | Sep 17, 2018 | OmegaOne (ensemble) | Fudan University | 66.272 | 82.788 | 12.103 | 30.859
36 | Sep 17, 2018 | RCEN (single model) | 6ESTATES PTE LTD | 64.576 | 83.136 | 10.516 | 30.994
37 | Sep 17, 2018 | GM-Reader (single model) | BUPT CIST | 60.470 | 80.035 | 13.690 | 33.990
38 | Sep 17, 2018 | OmegaOne (single model) | Fudan University | 64.188 | 81.539 | 10.119 | 29.716
39 | Mar 28, 2021 | Xlingual-base42 (single model) | KEG | 65.189 | 83.661 | 0.922 | 15.995
40 | Sep 17, 2018 | R-NET (single model) | ShanXi University | 50.112 | 73.353 | 9.921 | 29.324
41 | Nov 1, 2019 | ALBERT-tiny (single model) | ChineseGLUE Team (https://github.com/CLUEBenchmark) | 53.687 | 75.738 | 6.746 | 25.600
42 | Sep 17, 2018 | SXU-Reader (single model) | ShanXi University | 44.270 | 70.673 | 6.548 | 28.116
43 | Sep 17, 2018 | Unnamed System (single model) | THUIR & ILPS | 44.883 | 66.859 | 7.341 | 22.317
44 | Sep 17, 2018 | Unnamed System (single model) | USST NLP | 37.916 | 63.502 | 5.159 | 18.687
45 | Sep 17, 2018 | SXU-Reader (ensemble) | ShanXi University | 46.210 | 70.482 | 0.000 | 0.000
46 | Sep 17, 2018 | Unnamed System (single model) | Wuhan University | 22.288 | 46.774 | 2.183 | 21.587
47 | Sep 17, 2018 | Unnamed System (single model) | LittleBai - OpenMindClub | 10.848 | 37.231 | 0.397 | 9.498
48 | Sep 17, 2018 | Unnamed System (single model) | JSPI-POAL | 0.449 | 34.224 | 2.579 | 20.078
49 | May 31, 2021 | XLQA (single model) | Anonymous | 1.961 | 32.438 | 0.198 | 8.687