CMRC 2019

A Sentence Cloze Dataset for Chinese Machine Reading Comprehension

What is CMRC 2019?

CMRC 2019 is a Chinese Machine Reading Comprehension dataset that was used in The Third Evaluation Workshop on Chinese Machine Reading Comprehension. Specifically, it is a sentence cloze-style machine reading comprehension dataset that aims to evaluate sentence-level inference ability.
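
To make the sentence cloze setting concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the official data format: the English passage, the `[BLANKk]` marker syntax, and the names `passage`, `candidates`, `answers`, and `fill_blanks` are all hypothetical. It shows a passage with blanks, a candidate list that includes one fake candidate, and an answer list that maps each blank to a candidate.

```python
import re

# Hypothetical example; the layout below is an assumption, not the official format.
passage = (
    "Tom woke up late. [BLANK1] He ran to the bus stop. "
    "[BLANK2] So he walked to school."
)
candidates = [
    "The bus had already left.",  # index 0
    "He skipped breakfast.",      # index 1
    "The bus arrived early.",     # index 2: a fake candidate that fits no blank
]
# answers[i] is the candidate index chosen for blank number i + 1
answers = [1, 0]


def fill_blanks(passage, candidates, answers):
    """Replace each [BLANKk] marker with the candidate chosen for blank k."""
    def substitute(match):
        blank_number = int(match.group(1))
        return candidates[answers[blank_number - 1]]

    return re.sub(r"\[BLANK(\d+)\]", substitute, passage)


filled = fill_blanks(passage, candidates, answers)
print(filled)
```

A system for this kind of task has to produce the `answers` list itself, which requires judging whether each candidate sentence fits its surrounding context.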

CMRC 2019 paper
[Cui et al., COLING 2020]

Getting Started

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

You may also be interested in a quick baseline system based on a pre-trained language model (such as BERT).

Official Submission

To preserve the integrity of test results, we do not release the test and challenge sets to the public. Instead, we require you to upload your model to CodaLab so that we can run it on the test and challenge sets for you. You can follow the instructions on CodaLab (the process is similar to a SQuAD or CMRC 2018 submission). See also: Submission Tutorial

Have Questions?

Ask us questions at our GitHub repository or at cmrc2019 [at] 126 [dot] com.

Leaderboard

CMRC 2019 contains fake candidates that the machine must distinguish from the correct ones before filling the blanks in the passage. Will your system surpass human performance on this task?

Rank | Date | Model | Organization | QAC | PAC
- | - | Human Performance | Joint Laboratory of HIT and iFLYTEK Research [Cui et al., COLING 2020] | 95.326 | 75.000
1 | 2019/10/19 | bert_scp_spm (ensemble) | PINGAN-GammaLab | 90.054 | 57.600
2 | 2019/10/19 | mojito system (ensemble) | SFTech | 85.990 | 41.800
3 | 2019/10/19 | CMRC 2019 MULTIPLE BERT (ensemble) | Six Estates (https://www.6estates.com) | 82.590 | 32.200
4 | 2019/10/19 | DA-BERT (ensemble) | Anonymous | 84.447 | 27.600
5 | 2019/10/19 | nkyzhangyi_cmrc_v2 (ensemble) | CICC | 79.562 | 26.600
6 | 2019/10/19 | MRC-ZZ SYSTEM (single model) | Harbin Institute of Technology & Hanyi Fonts | 78.780 | 26.600
7 | 2019/10/19 | MB-Reader (ensemble) | ECUST | 76.319 | 15.600
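
The QAC and PAC columns are not defined on this page. Assuming they denote question-level accuracy (the share of individual blanks filled correctly) and passage-level accuracy (the share of passages in which every blank is filled correctly), the two scores could be computed roughly as follows. This is a sketch under that assumption, not the official evaluation script; the function name `qac_pac` and the toy inputs are hypothetical.

```python
def qac_pac(predictions, references):
    """Return (QAC, PAC) as percentages.

    predictions and references are lists of passages; each passage is a list
    of candidate indices, one per blank. Assumed definitions:
      QAC = correctly filled blanks / all blanks
      PAC = passages with every blank correct / all passages
    """
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")

    total_blanks = 0
    correct_blanks = 0
    correct_passages = 0
    for pred, ref in zip(predictions, references):
        if len(pred) != len(ref):
            raise ValueError("each passage needs one prediction per blank")
        matches = sum(p == r for p, r in zip(pred, ref))
        total_blanks += len(ref)
        correct_blanks += matches
        if matches == len(ref):
            correct_passages += 1

    qac = 100.0 * correct_blanks / total_blanks
    pac = 100.0 * correct_passages / len(references)
    return qac, pac


# Toy data: two passages, five blanks in total.
references = [[1, 0], [2, 0, 1]]
predictions = [[1, 0], [2, 1, 0]]
qac, pac = qac_pac(predictions, references)
print(f"QAC={qac:.3f} PAC={pac:.3f}")
```

Under these definitions a single wrong blank fails the whole passage for PAC, which would be consistent with the PAC scores in the table being much lower than the QAC scores.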