25 ICCL Numerical Error Detection Tasks for Language Models

Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models

1. Model:

A numerical error detection task for analyzing the numerical capabilities of language models.

Link: github.com/cogma/BeNED…

Dataset: BeNEDect (understand numerical values). The data sources are as follows:

It contains 4 error categories:

2. Current status

Although GPT-3.5, GPT-4, and Llama 3 performed well on the numerical error detection task, their accuracy was still not as high as that of humans, indicating room for improvement.

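The gap is especially clear for numerical errors that require arithmetic calculation and expert knowledge.
LLMs are more likely than humans to misjudge correct numbers as errors.
Slight changes to the prompt have a large effect on the results, so the models are not robust.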
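
To make the last two observations concrete, here is a minimal sketch of how one might measure them. It is not the paper's evaluation code: the `Example` class, the `evaluate` and `prompt_spread` helpers, and the toy sentences are all hypothetical and are not taken from BeNEDect. The sketch assumes each example carries a binary label saying whether its number was corrupted, and that a detector returns `True` when it flags an error.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str        # passage containing the number under test (toy data)
    has_error: bool  # hypothetical gold label: True if the number is wrong


def evaluate(predictions, examples):
    """Return (accuracy, false-positive rate).

    The false-positive rate is the share of correct numbers that the
    detector wrongly flagged as errors.
    """
    assert len(predictions) == len(examples)
    hits = sum(p == ex.has_error for p, ex in zip(predictions, examples))
    on_clean = [p for p, ex in zip(predictions, examples) if not ex.has_error]
    fpr = sum(on_clean) / len(on_clean) if on_clean else 0.0
    return hits / len(examples), fpr


def prompt_spread(accuracy_by_prompt):
    """Max minus min accuracy across prompt variants; larger means less robust."""
    values = list(accuracy_by_prompt.values())
    return max(values) - min(values)


examples = [
    Example("A week has 7 days.", False),
    Example("A week has 9 days.", True),
    Example("Water boils at 100 degrees Celsius at sea level.", False),
    Example("A triangle has 4 sides.", True),
]
# Toy detector output: it also flags the first, correct number.
predictions = [True, True, False, True]

accuracy, fpr = evaluate(predictions, examples)
```

Reporting the false-positive rate separately from accuracy is what exposes the "misjudges correct numbers" behaviour, and running `evaluate` once per prompt wording and passing the accuracies to `prompt_spread` gives a single rough number for prompt sensitivity.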

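3. Findings

Numerical NLP Tasks: compared with ordinary error detection, BeNEDect focuses on analyzing an LLM's numerical common sense, calculation ability, and memorization ability.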

Comments adding related resources are welcome...