CMRC 2018

What is CMRC 2018?

CMRC 2018 is a Chinese machine reading comprehension dataset that was used in the Second Evaluation Workshop on Chinese Machine Reading Comprehension. Specifically, it is a span-extraction reading comprehension dataset similar to SQuAD. Besides the regular training, development, and test sets, we also include a challenge set that requires comprehensive reasoning over multiple sentences and is far more difficult.
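To make "span-extraction" concrete, here is a minimal sketch of what a SQuAD-style record could look like and how an answer is recovered as a substring of the passage. The field names (`context`, `question`, `answers`, `text`, `answer_start`) follow the SQuAD convention and are an assumption here, and the passage itself is a made-up illustration rather than a record from the dataset; check the downloaded files for the actual layout.

```python
# Hypothetical record in a SQuAD-style layout. Field names and the passage
# are illustrative assumptions, not copied from the CMRC 2018 files.
sample = {
    # "Harbin Institute of Technology is located in Harbin, Heilongjiang."
    "context": "哈尔滨工业大学位于黑龙江省哈尔滨市。",
    # "Which city is Harbin Institute of Technology located in?"
    "question": "哈尔滨工业大学位于哪个城市?",
    # The answer is a contiguous span of the context, stored as the answer
    # text plus its character offset into the context.
    "answers": [{"text": "哈尔滨市", "answer_start": 13}],
}


def extract_span(context: str, answer: dict) -> str:
    """Slice the answer span out of the context using its character offset."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return context[start:end]


def is_consistent(record: dict) -> bool:
    """Check that every stored answer really is a span of the context."""
    return all(
        extract_span(record["context"], ans) == ans["text"]
        for ans in record["answers"]
    )


if __name__ == "__main__":
    print(extract_span(sample["context"], sample["answers"][0]))
    print(is_consistent(sample))
```

A consistency check like `is_consistent` is a cheap sanity test to run after loading any span-extraction data, since a model trained on misaligned offsets learns the wrong spans.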

Paper and BibTeX: [Cui et al., EMNLP 2019]

Getting Started

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

Download CMRC 2018 Dataset

You may also be interested in a quick baseline system based on a pre-trained language model (such as BERT).

Get Baseline Code

Official Submission

To preserve the integrity of test results, we do not release the test and challenge sets to the public. Instead, we require you to upload your model to CodaLab so that we can run it on the test and challenge sets for you. You can follow the instructions on CodaLab, which are similar to those for SQuAD submissions. Submission Tutorial

Have Questions?

Ask us questions at our GitHub repository or at cmrc2018 [at] 126 [dot] com.

Leaderboard

The CMRC 2018 challenge set requires comprehensive reasoning over multiple clues in the passage while keeping the original span-extraction format, which makes it far more challenging than the test set. Will your system surpass human performance on this task?

Rank	Model	Test EM	Test F1	Challenge EM	Challenge F1
	Human Performance Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019]	92.400	97.914	90.382	95.248
🥇 Dec 8, 2020	MacBERT-large-extData-v2 (single model) AISpeech	80.409	93.768	36.706	66.905
🥈 Nov 12, 2020	MacBERT-large-extData (single model) AISpeech	77.998	92.882	38.492	67.109
🥉 Nov 3, 2020	RoBERTa-wwm-ext-large-extData (single model) AISpeech	76.997	92.171	32.540	63.597
4 May 1, 2020	MacBERT-large (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020]	74.786	90.693	31.923	60.177
5 Jan 22, 2021	ESPReader-large (single model) Shanghai Jiao Tong University	77.201	91.476	30.357	58.396
6 Oct 14, 2019	RoBERTa-wwm-ext-large (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019]	74.198	90.604	31.548	60.074
7 Nov 5, 2019	RoBERTa-wwm-ext-large (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	76.588	91.566	30.159	57.229
8 Nov 1, 2019	ALBERT-xlarge (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	75.220	90.922	28.770	57.864
9 July 5, 2021	Macbert-QueryRestoration (single model) xiaobing.ai	77.140	91.425	27.976	56.160
10 Nov 1, 2019	RoBERTa-large (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	76.118	90.949	26.587	54.594
11 May 1, 2020	MacBERT-base (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., Findings of EMNLP 2020]	73.179	89.486	30.159	54.039
12 May 1, 2019	Dual BERT (w/ SQuAD) (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019]	73.600	90.200	27.800	55.200
13 Jan 22, 2021	ESPReader-base (single model) Shanghai Jiao Tong University	75.567	89.926	26.587	53.129
14 June 25, 2021	MacBERT-large-fact-evidence (single model) Anonymous	72.952	89.374	26.389	55.009
15 Nov 1, 2019	ALBERT-large (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	73.667	90.175	27.976	51.622
16 Aug 19, 2019	XLNet-mid (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019]	69.300	89.200	29.100	55.800
17 May 11, 2020	UER (single model) Tencent Oteam https://github.com/dbiir/UER-py	78.795	91.992	24.206	46.951
18 Oct 22, 2020	RoBERTa-wwm-ext-large (single model) AISpeech	74.954	90.286	23.214	52.138
19 Nov 1, 2019	ERNIE 1.0 (maxlen=512) (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	73.320	89.621	26.190	50.976
20 Nov 5, 2019	RoBERTa-wwm-ext-base (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	73.892	89.748	25.198	50.759
21 Sep 10, 2019	RoBERTa-wwm-ext (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019]	72.600	89.400	26.200	51.000
22 June 14, 2022	MacBERT-base (single model) TrustNow	74.259	89.247	22.421	50.785
23 Nov 5, 2019	BERT-wwm-ext (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	73.238	88.784	22.421	46.430
24 Jul 30, 2019	BERT-wwm-ext (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019]	71.400	87.700	24.000	47.300
25 May 1, 2019	Dual BERT (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., EMNLP 2019]	70.400	88.100	23.800	47.900
26 Nov 5, 2019	XLNet-mid (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	66.517	86.099	24.008	52.859
27 Jun 20, 2019	BERT-wwm (single model) Joint Laboratory of HIT and iFLYTEK Research [Cui et al., 2019]	70.500	87.400	21.000	47.000
28 Nov 1, 2019	BERT-base (google) (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	69.724	87.174	21.230	44.861
29 Sep 17, 2018	Z-Reader (single model) ZhuiYi	74.178	88.145	13.889	37.422
30 Sep 17, 2018	MCA-Reader (ensemble) BISTU	71.175	88.090	15.476	37.104
31 Mar 27, 2019	P-Reader (single model) swjtu_PF	65.189	84.386	15.079	39.583
32 Sep 17, 2018	RCEN (ensemble) 6ESTATES PTE LTD	68.662	85.743	15.278	34.479
33 Sep 17, 2018	MCA-Reader (single model) BISTU	68.335	85.707	13.690	33.964
34 Sep 17, 2018	GM-Reader (ensemble) BUPT CIST	64.045	83.046	15.675	37.315
35 Sep 17, 2018	OmegaOne (ensemble) Fudan University	66.272	82.788	12.103	30.859
36 Sep 17, 2018	RCEN (single model) 6ESTATES PTE LTD	64.576	83.136	10.516	30.994
37 Sep 17, 2018	GM-Reader (single model) BUPT CIST	60.470	80.035	13.690	33.990
38 Sep 17, 2018	OmegaOne (single model) Fudan University	64.188	81.539	10.119	29.716
39 Mar 28, 2021	Xlingual-base42 (single model) KEG	65.189	83.661	0.922	15.995
40 Sep 17, 2018	R-NET (single model) ShanXi University	50.112	73.353	9.921	29.324
41 Nov 1, 2019	ALBERT-tiny (single model) ChineseGLUE Team https://github.com/CLUEBenchmark	53.687	75.738	6.746	25.600
42 Sep 17, 2018	SXU-Reader (single model) ShanXi University	44.270	70.673	6.548	28.116
43 Sep 17, 2018	Unnamed System (single model) THUIR & ILPS	44.883	66.859	7.341	22.317
44 Sep 17, 2018	Unnamed System (single model) USST NLP	37.916	63.502	5.159	18.687
45 Sep 17, 2018	SXU-Reader (ensemble) ShanXi University	46.210	70.482	0.000	0.000
46 Sep 17, 2018	Unnamed System (single model) Wuhan University	22.288	46.774	2.183	21.587
47 Sep 17, 2018	Unnamed System (single model) LittleBai - OpenMindClub	10.848	37.231	0.397	9.498
48 Sep 17, 2018	Unnamed System (single model) JSPI-POAL	0.449	34.224	2.579	20.078
49 May 31, 2021	XLQA (single model) Anonymous	1.961	32.438	0.198	8.687
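
The EM (exact match) and F1 columns above can be illustrated with a small sketch. It assumes a SQuAD-style definition adapted to Chinese by comparing individual characters instead of whitespace tokens, and it takes the best score over all reference answers for each question. This is only an approximation for intuition: the official evaluation script may normalize punctuation or measure overlap differently, so scores from this sketch are not guaranteed to match the leaderboard. The function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted span equals the reference exactly, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 (assumed definition, not the official script)."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    if not pred_chars or not ref_chars:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_chars == ref_chars)
    # Multiset intersection counts characters shared by both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list, references: list) -> tuple:
    """Return (EM, F1) as percentages.

    `references[i]` is the list of acceptable answers for question i;
    each question is credited with its best-matching reference.
    """
    em_total = 0.0
    f1_total = 0.0
    for pred, refs in zip(predictions, references):
        em_total += max(exact_match(pred, ref) for ref in refs)
        f1_total += max(char_f1(pred, ref) for ref in refs)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n


if __name__ == "__main__":
    preds = ["哈尔滨市", "哈尔滨"]
    refs = [["哈尔滨市"], ["哈尔滨市", "哈市"]]
    em, f1 = score(preds, refs)
    print(f"EM={em:.3f} F1={f1:.3f}")
```

Because F1 gives partial credit for overlapping characters while EM does not, a system can post a high F1 and a much lower EM, which is the pattern visible in the challenge-set columns.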

CMRC 2018

A Span-Extraction Dataset for Chinese Machine Reading Comprehension