CMRC 2018 is a Chinese Machine Reading Comprehension dataset that was used in the Second Evaluation Workshop on Chinese Machine Reading Comprehension. Specifically, CMRC 2018 is a span-extraction reading comprehension dataset similar to SQuAD. Besides the regular training, development, and test sets, we also include a challenge set that requires comprehensive reasoning over multiple sentences and is far more difficult.
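As a quick illustration of the span-extraction format, here is a minimal sketch of inspecting one training example, assuming the dataset is mirrored on the Hugging Face Hub under the `cmrc2018` identifier with a SQuAD-style schema (`context`, `question`, `answers`); field names may differ in other mirrors.

```python
# Sketch: inspect one CMRC 2018 training example.
# Assumes a Hugging Face Hub mirror named "cmrc2018" with a SQuAD-style schema.
from datasets import load_dataset

cmrc = load_dataset("cmrc2018")          # splits: train / validation / test
example = cmrc["train"][0]

print(example["context"][:100])          # passage text
print(example["question"])               # question about the passage
print(example["answers"])                # {"text": [...], "answer_start": [...]}
```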
Paper: [Cui et al., EMNLP 2019] · BibTeX: [Cui et al., EMNLP 2019]

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):
You may also be interested in a quick baseline system based on pre-trained language models (such as BERT).
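For orientation, below is a minimal sketch of such a baseline using the Hugging Face `transformers` question-answering pipeline. The checkpoint name is a placeholder, not the official baseline; substitute any Chinese BERT-style model fine-tuned on CMRC 2018.

```python
# Sketch of a span-extraction baseline with the transformers QA pipeline.
# "your-org/chinese-bert-finetuned-cmrc2018" is a hypothetical checkpoint name;
# replace it with any Chinese BERT-style model fine-tuned on CMRC 2018.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="your-org/chinese-bert-finetuned-cmrc2018",
    tokenizer="your-org/chinese-bert-finetuned-cmrc2018",
)

context = "苏轼，字子瞻，号东坡居士，北宋著名文学家、书法家、画家。"
question = "苏轼的号是什么？"

result = qa(question=question, context=context)
print(result["answer"], result["score"])  # extracted span and model confidence
```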
To preserve the integrity of test results, we do not release the test and challenge sets to the public. Instead, we require you to upload your model onto CodaLab so that we can run it on the test and challenge sets for you. You can follow the instructions on CodaLab (the process is similar to SQuAD submission). Submission Tutorial
Ask us questions at our GitHub repository or at cmrc2018 [at] 126 [dot] com.
The CMRC 2018 challenge set requires comprehensive reasoning over multiple clues in the passage while keeping the original span-extraction format, making it far more challenging than the test set. Will your system surpass human performance on this task?
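The leaderboard below reports exact match (EM) and F1. As a rough guide, here is an illustrative re-implementation in the spirit of the official scorer (not the official script): EM checks whether the predicted span equals a gold answer, and F1 measures overlap, computed per character for Chinese.

```python
# Illustrative EM / F1, in the spirit of the official CMRC 2018 scorer
# (not the official script): overlap for Chinese is counted per character.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip() == gold.strip())

def char_f1(prediction: str, gold: str) -> float:
    pred_chars, gold_chars = list(prediction.strip()), list(gold.strip())
    common = Counter(pred_chars) & Counter(gold_chars)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)

# A prediction is scored against every gold answer; the best score counts.
golds = ["东坡居士"]
print(max(exact_match("东坡居士", g) for g in golds))   # 1.0
print(max(char_f1("号东坡居士", g) for g in golds))      # partial credit
```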
| Rank | Date | Model | Team | Test EM | Test F1 | Challenge EM | Challenge F1 |
|---|---|---|---|---|---|---|---|
| | | Human Performance | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 92.400 | 97.914 | 90.382 | 95.248 |
| 1 🥇 | Dec 8, 2020 | MacBERT-large-extData-v2 (single model) | AISpeech | 80.409 | 93.768 | 36.706 | 66.905 |
| 2 🥈 | Nov 12, 2020 | MacBERT-large-extData (single model) | AISpeech | 77.998 | 92.882 | 38.492 | 67.109 |
| 3 🥉 | Nov 3, 2020 | RoBERTa-wwm-ext-large-extData (single model) | AISpeech | 76.997 | 92.171 | 32.540 | 63.597 |
| 4 | May 1, 2020 | MacBERT-large (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020] | 74.786 | 90.693 | 31.923 | 60.177 |
| 5 | Jan 22, 2021 | ESPReader-large (single model) | Shanghai Jiao Tong University | 77.201 | 91.476 | 30.357 | 58.396 |
| 6 | Oct 14, 2019 | RoBERTa-wwm-ext-large (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 74.198 | 90.604 | 31.548 | 60.074 |
| 7 | Nov 5, 2019 | RoBERTa-wwm-ext-large (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 76.588 | 91.566 | 30.159 | 57.229 |
| 8 | Nov 1, 2019 | ALBERT-xlarge (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 75.220 | 90.922 | 28.770 | 57.864 |
| 9 | July 5, 2021 | Macbert-QueryRestoration (single model) | xiaobing.ai | 77.140 | 91.425 | 27.976 | 56.160 |
| 10 | Nov 1, 2019 | RoBERTa-large (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 76.118 | 90.949 | 26.587 | 54.594 |
| 11 | May 1, 2020 | MacBERT-base (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020] | 73.179 | 89.486 | 30.159 | 54.039 |
| 12 | May 1, 2019 | Dual BERT (w/ SQuAD) (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 73.600 | 90.200 | 27.800 | 55.200 |
| 13 | Jan 22, 2021 | ESPReader-base (single model) | Shanghai Jiao Tong University | 75.567 | 89.926 | 26.587 | 53.129 |
| 14 | June 25, 2021 | MacBERT-large-fact-evidence (single model) | Anonymous | 72.952 | 89.374 | 26.389 | 55.009 |
| 15 | Nov 1, 2019 | ALBERT-large (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 73.667 | 90.175 | 27.976 | 51.622 |
| 16 | Aug 19, 2019 | XLNet-mid (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 69.300 | 89.200 | 29.100 | 55.800 |
| 17 | May 11, 2020 | UER (single model) | Tencent Oteam https://github.com/dbiir/UER-py | 78.795 | 91.992 | 24.206 | 46.951 |
| 18 | Oct 22, 2020 | RoBERTa-wwm-ext-large (single model) | AISpeech | 74.954 | 90.286 | 23.214 | 52.138 |
| 19 | Nov 1, 2019 | ERNIE 1.0 (maxlen=512) (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 73.320 | 89.621 | 26.190 | 50.976 |
| 20 | Nov 5, 2019 | RoBERTa-wwm-ext-base (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 73.892 | 89.748 | 25.198 | 50.759 |
| 21 | Sep 10, 2019 | RoBERTa-wwm-ext (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 72.600 | 89.400 | 26.200 | 51.000 |
| 22 | June 14, 2022 | MacBERT-base (single model) | TrustNow | 74.259 | 89.247 | 22.421 | 50.785 |
| 23 | Nov 5, 2019 | BERT-wwm-ext (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 73.238 | 88.784 | 22.421 | 46.430 |
| 24 | Jul 30, 2019 | BERT-wwm-ext (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 71.400 | 87.700 | 24.000 | 47.300 |
| 25 | May 1, 2019 | Dual BERT (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019] | 70.400 | 88.100 | 23.800 | 47.900 |
| 26 | Nov 5, 2019 | XLNet-mid (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 66.517 | 86.099 | 24.008 | 52.859 |
| 27 | Jun 20, 2019 | BERT-wwm (single model) | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019] | 70.500 | 87.400 | 21.000 | 47.000 |
| 28 | Nov 1, 2019 | BERT-base (google) (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 69.724 | 87.174 | 21.230 | 44.861 |
| 29 | Sep 17, 2018 | Z-Reader (single model) | ZhuiYi | 74.178 | 88.145 | 13.889 | 37.422 |
| 30 | Sep 17, 2018 | MCA-Reader (ensemble) | BISTU | 71.175 | 88.090 | 15.476 | 37.104 |
| 31 | Mar 27, 2019 | P-Reader (single model) | swjtu_PF | 65.189 | 84.386 | 15.079 | 39.583 |
| 32 | Sep 17, 2018 | RCEN (ensemble) | 6ESTATES PTE LTD | 68.662 | 85.743 | 15.278 | 34.479 |
| 33 | Sep 17, 2018 | MCA-Reader (single model) | BISTU | 68.335 | 85.707 | 13.690 | 33.964 |
| 34 | Sep 17, 2018 | GM-Reader (ensemble) | BUPT CIST | 64.045 | 83.046 | 15.675 | 37.315 |
| 35 | Sep 17, 2018 | OmegaOne (ensemble) | Fudan University | 66.272 | 82.788 | 12.103 | 30.859 |
| 36 | Sep 17, 2018 | RCEN (single model) | 6ESTATES PTE LTD | 64.576 | 83.136 | 10.516 | 30.994 |
| 37 | Sep 17, 2018 | GM-Reader (single model) | BUPT CIST | 60.470 | 80.035 | 13.690 | 33.990 |
| 38 | Sep 17, 2018 | OmegaOne (single model) | Fudan University | 64.188 | 81.539 | 10.119 | 29.716 |
| 39 | Mar 28, 2021 | Xlingual-base42 (single model) | KEG | 65.189 | 83.661 | 0.922 | 15.995 |
| 40 | Sep 17, 2018 | R-NET (single model) | ShanXi University | 50.112 | 73.353 | 9.921 | 29.324 |
| 41 | Nov 1, 2019 | ALBERT-tiny (single model) | ChineseGLUE Team https://github.com/CLUEBenchmark | 53.687 | 75.738 | 6.746 | 25.600 |
| 42 | Sep 17, 2018 | SXU-Reader (single model) | ShanXi University | 44.270 | 70.673 | 6.548 | 28.116 |
| 43 | Sep 17, 2018 | Unnamed System (single model) | THUIR & ILPS | 44.883 | 66.859 | 7.341 | 22.317 |
| 44 | Sep 17, 2018 | Unnamed System (single model) | USST NLP | 37.916 | 63.502 | 5.159 | 18.687 |
| 45 | Sep 17, 2018 | SXU-Reader (ensemble) | ShanXi University | 46.210 | 70.482 | 0.000 | 0.000 |
| 46 | Sep 17, 2018 | Unnamed System (single model) | Wuhan University | 22.288 | 46.774 | 2.183 | 21.587 |
| 47 | Sep 17, 2018 | Unnamed System (single model) | LittleBai - OpenMindClub | 10.848 | 37.231 | 0.397 | 9.498 |
| 48 | Sep 17, 2018 | Unnamed System (single model) | JSPI-POAL | 0.449 | 34.224 | 2.579 | 20.078 |
| 49 | May 31, 2021 | XLQA (single model) | Anonymous | 1.961 | 32.438 | 0.198 | 8.687 |