NLPTEA 2020: Chinese Grammatical Error Detection Based on BERT Model

1. Model:

In 2020, the Chinese Grammatical Error Diagnosis (CGED) shared task was held at NLP-TEA. This work applies a BERT-based model to sentences written by Chinese as a Foreign Language (CFL) learners.

Link: github.com/NYCU-NLP/NL…

A sentence is classified either as correct or as containing one of the following four error types:

Four error types are defined as redundant words (denoted as a capital “R”), missing words (“M”), word selection errors (“S”), and word ordering errors (“W”).

Word Redundant Error (R)
Word Missing Error (M)
Word Selection Error (S)
Word Disorder Error (W)
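
To make the four labels concrete, here is a minimal sketch of how annotated error spans could be turned into per-character tags for a sequence-labeling model. The BIO scheme, the function name `spans_to_bio`, and the 1-based inclusive span offsets are assumptions made for illustration; the notes above do not state how the paper encodes its labels.

```python
from typing import List, Tuple

# The four error types from the task definition.
ERROR_TYPES = {
    "R": "redundant word",
    "M": "missing word",
    "S": "word selection error",
    "W": "word ordering error",
}


def spans_to_bio(sentence: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) error spans into one BIO tag per character.

    Assumption: start and end are 1-based and inclusive.
    Characters outside every span get the tag "O" (no error).
    """
    tags = ["O"] * len(sentence)
    for start, end, err in spans:
        if err not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {err}")
        if not (1 <= start <= end <= len(sentence)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start - 1] = f"B-{err}"      # first character of the span
        for i in range(start, end):       # remaining characters of the span
            tags[i] = f"I-{err}"
    return tags


# Toy placeholder string, not real learner data.
print(spans_to_bio("abcdef", [(2, 3, "R"), (5, 5, "M")]))
```

A sentence with no spans yields only "O" tags, which corresponds to the "correct" class.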

2. Datasets (4):

CGED (shared task dataset)
HSK (Chinese as a second language data)
Lang8 (Mandarin learner corpus)
School (primary and secondary school homework corpus, closed-source)

3. Findings:

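Augmenting the training data with the primary and secondary school corpus, whose errors are of the same kind as the target errors, helps improve model performance.
A skewed distribution of sample proportions hinders model performance.
False Positive Rate (FPR): the rate at which correct sentences are wrongly flagged as erroneous matters a great deal for this task.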
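
As a quick illustration of the FPR point, here is a minimal sketch that computes a sentence-level false positive rate as FP / (FP + TN), i.e. the share of correct sentences that the model flags as erroneous. The function name and the boolean input format are assumptions made for this example, not something taken from the paper.

```python
from typing import Sequence


def false_positive_rate(gold_has_error: Sequence[bool],
                        pred_has_error: Sequence[bool]) -> float:
    """Sentence-level FPR = FP / (FP + TN).

    gold_has_error[i] is True if sentence i really contains an error;
    pred_has_error[i] is True if the model flags sentence i.
    Returns 0.0 when there are no correct sentences in the gold data.
    """
    if len(gold_has_error) != len(pred_has_error):
        raise ValueError("gold and prediction lengths differ")
    pairs = list(zip(gold_has_error, pred_has_error))
    fp = sum(1 for g, p in pairs if not g and p)      # correct sentence, flagged
    tn = sum(1 for g, p in pairs if not g and not p)  # correct sentence, not flagged
    if fp + tn == 0:
        return 0.0
    return fp / (fp + tn)


# Toy labels: four correct sentences (one wrongly flagged) and one erroneous one.
gold = [False, False, False, False, True]
pred = [True, False, False, False, True]
print(false_positive_rate(gold, pred))  # 0.25
```

Only sentences that are correct in the gold data enter the denominator, so a model that flags everything reaches an FPR of 1.0 no matter how many real errors it catches.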

4. Case:

5. Additional notes

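NLPTEA-CGED 2014: binary classification first (correct vs. erroneous), then fine-grained classification into the four error types.
NLPTEA-CGED 2015: identify the range in which an error occurs.
NLPTEA-CGED 2016: more than one error per sentence, plus the HSK corpus.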

Feel free to add related resources in the comments...