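Refined Evaluation for End-to-End Grammatical Error Correction Using an Alignment-Based Approach (open-writing-evaluation.github.io/)
- Aims to reproduce and improve on the results of existing evaluation tools such as errant, addressing the challenge posed by sentence boundary detection discrepancies during text preprocessing.
- We also propose a potential multilingual errant and use it to present GEC results for Chinese and Korean.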
1. Model (alignment-based errant)
Our work adapts errant by incorporating an alignment-based preprocessing approach.
- Our primary contribution is the refined alignment process, which addresses discrepancies in sentence boundaries between gold-standard and system-generated results.
Unlike recursive editing for MT (juejin.cn/post/749156… ), this work uses a jointly preprocessed (jp) algorithm, which relies on alignment to preprocess sentence boundaries and tokenization between source and target.
jp-errant: 1) uses stanza for improved Chinese word segmentation and POS tagging, and 2) adheres to the original errant conventions (Missing, Unnecessary, and Replacement) for consistent multilingual grammatical error annotation.
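As a rough illustration of the Missing / Unnecessary / Replacement convention, here is a minimal Python sketch that derives M/U/R edits from a token-level alignment. It uses difflib.SequenceMatcher from the standard library as a stand-in aligner. That is an assumption made for illustration only, not the alignment that errant or jp-errant actually uses, and the function name extract_edits and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags onto the errant-style operation labels.
OP_LABEL = {"insert": "M", "delete": "U", "replace": "R"}


def extract_edits(source_tokens, target_tokens):
    """Return (label, src_start, src_end, correction) tuples.

    Hypothetical helper: difflib is only a stand-in for a real aligner.
    """
    matcher = SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged span, no edit to record
        correction = " ".join(target_tokens[j1:j2])
        edits.append((OP_LABEL[tag], i1, i2, correction))
    return edits


source = "She go to very school every day".split()
target = "She goes to school nearly every day".split()
for edit in extract_edits(source, target):
    print(edit)
```

Each tuple carries the operation label, the token span in the source, and the corrected text, which is roughly the information an M2-style edit line records.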
2. Background
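- Changes in tokenization can introduce grammatical morphemes that do not exist in the gold-standard sentence, and vice versa. Grammatical morphemes here are function words such as prepositions and conjunctions; their counterpart is content words (lexical words).
- For example, can’t can be tokenized as either "can not" or "ca n’t".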
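- Edit distance (Jaro-Winkler distance): the original Jaro similarity counts the characters that match between two sequences in a forward pass; the Jaro-Winkler distance adds a prefix scaling factor p for a given common prefix length l. In a similar way, this work introduces a suffix scaling factor to balance the weights.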
- Gale and Church (1993) proposed a length-based algorithm for sentence alignment (the Gale and Church algorithm).
- Write & Improve (W&I) corpus (Bryant et al., 2019).
- The distribution of grammatical errors (Zeng et al., 2024).
- Extended Gale-Church algorithm (EGC) and the jp algorithm (JP).
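To make the Jaro-Winkler item above concrete, here is a minimal Python sketch of the standard Jaro similarity and its prefix-scaled Winkler extension. The defaults p = 0.1 and a maximum prefix length of 4 are the commonly used values and are assumptions here. The suffix-scaled variant introduced in the paper is not reproduced, because these notes do not give its formula.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the common prefix length l (capped)."""
    sim = jaro(s1, s2)
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == max_prefix:
            break
        l += 1
    return sim + l * p * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix boost only rewards agreement at the start of the strings, which is the asymmetry a suffix factor is meant to balance.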
3. Evaluation metrics
To measure the performance and reliability of GEC systems, there are several evaluation metrics:
- M2 (Dahlmeier and Ng, 2012)
- GLEU (Napoles et al., 2015)
- errant (Bryant et al., 2017; Bryant, 2019)
- PT M2 (Gong et al., 2022)
errant (ERRor ANnotation Toolkit) is currently the de facto standard for evaluating GEC.
- Supports both overall performance analysis and analysis by specific error category.
- Has already been adapted to multiple languages, including German, Chinese, and Korean, among others (Boyd, 2018; Hinson et al., 2020; Zhang et al., 2022; Sonawane et al., 2020; Belkebir and Habash, 2021; Náplava et al., 2022; Katinskaia et al., 2022; Yoon et al., 2023).
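Edit-based metrics such as M2 and errant report precision, recall, and F0.5 over edits. The sketch below shows that computation in Python under simplifying assumptions: edits are plain tuples compared by exact match, there is a single reference, and the function name f_beta and the example edits are hypothetical. It is not the scoring code of either tool.

```python
def f_beta(gold_edits, hyp_edits, beta=0.5):
    """Return (precision, recall, F_beta) for exact-match edit sets."""
    gold, hyp = set(gold_edits), set(hyp_edits)
    tp = len(gold & hyp)   # edits proposed by the system that are in the reference
    fp = len(hyp - gold)   # proposed edits that are not in the reference
    fn = len(gold - hyp)   # reference edits the system missed
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * precision + recall
    score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, score


gold = [("R", 1, 2, "goes"), ("U", 3, 4, ""), ("M", 5, 5, "nearly"), ("R", 6, 7, "days")]
hyp = [("R", 1, 2, "goes"), ("U", 3, 4, ""), ("M", 2, 2, "the")]
print(f_beta(gold, hyp))
```

With beta = 0.5, precision is weighted twice as heavily as recall, so a system that proposes fewer but more reliable corrections scores higher.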
4. Tools
- ChERRANT for Chinese (Zhang et al., 2022)
- KAGAS for Korean (Korean Automatic Grammatical error Annotation System; Yoon et al., 2023)
- LTP (github.com/HIT-SCIR/lt…)
- stanza (Qi et al., 2020): stanfordnlp.github.io/stanza/perf…
5. Datasets
- Chinese L2 GEC dataset: github.com/HillZhang19…
- Korean L2 GEC dataset: github.com/soyoung97/S…
References
- Chinese L2 grammatical error detection model: huggingface.co/HillZhang/r… (Chinese BART-large model (Zhang et al., 2023) trained on the HSK and Lang8 datasets.)
- Korean grammatical error correction model: huggingface.co/Soyoung97/g…
- various transformer models tested (Omelianchuk et al., 2020): github.com/grammarly/g…
- The model fine-tuned on the cleaned English LANG-8 corpus: huggingface.co/Unbabel/gec…