25 ICCL Numerical Error Detection Tasks for Language Models


Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models

1. Model:

A numerical error detection task that analyzes the numerical capabilities of language models.
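To make the task format concrete, here is a minimal sketch of how a single detection instance could be built: take a sentence, optionally corrupt one number in it, and ask a model whether the text contains a numerical error. The function `make_instance` and the "multiply by 10" corruption rule are assumptions made purely for illustration; they are not the construction procedure used for BeNEDect.

```python
import re

# Hypothetical illustration only: this is NOT how BeNEDect was built.
INTEGER = re.compile(r"\d+")

def make_instance(sentence, inject_error):
    """Return (text, label); label is 1 if a number was corrupted, else 0."""
    match = INTEGER.search(sentence)
    if match is None or not inject_error:
        # Nothing to corrupt, or we want a clean (negative) example.
        return sentence, 0
    value = int(match.group())
    # Assumed corruption rule: shift by one order of magnitude.
    # Zero is mapped to 1 so the corrupted number always differs.
    corrupted = value * 10 if value != 0 else 1
    text = sentence[:match.start()] + str(corrupted) + sentence[match.end():]
    return text, 1

clean = make_instance("Mount Everest is 8849 meters tall.", inject_error=False)
wrong = make_instance("Mount Everest is 8849 meters tall.", inject_error=True)
print(clean)
print(wrong)
```

A detector (human or LLM) would then be shown only the text and asked to predict the label.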

Link: github.com/cogma/BeNED…

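Dataset: BeNEDect (tests whether models understand numerical values). Its data sources are as follows:

[Figure: BeNEDect data sources]

It covers 4 error categories:

[Figure: the 4 error categories]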

2. Current status

Although GPT-3.5, GPT-4, and Llama 3 performed well on the numerical error detection task, their accuracy was still not as high as that of humans, indicating room for improvement.

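  • The gap is largest for numerical errors that require arithmetic calculation or expert knowledge.
  • LLMs are more likely than humans to misjudge correct numbers as errors.
  • Slight changes to the prompt have a large effect on the results, so the models are not robust.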
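The last two points can be quantified with two simple metrics. The sketch below is an assumption about how one might measure them, not the evaluation code from the paper: `false_positive_rate` is the share of correct numbers that get flagged as errors, and `prompt_agreement` is the share of instances on which every prompt variant gives the same prediction. The labels and predictions are made-up toy data, not results from BeNEDect.

```python
def false_positive_rate(gold, pred):
    """Share of correct numbers (gold == 0) that were flagged as errors (pred == 1)."""
    flags_on_correct = [p for g, p in zip(gold, pred) if g == 0]
    if not flags_on_correct:
        return 0.0
    return sum(flags_on_correct) / len(flags_on_correct)

def prompt_agreement(preds_by_prompt):
    """Share of instances on which all prompt variants give the same prediction."""
    per_instance = list(zip(*preds_by_prompt.values()))
    if not per_instance:
        return 1.0
    unanimous = sum(1 for votes in per_instance if len(set(votes)) == 1)
    return unanimous / len(per_instance)

# Toy data (hypothetical): 1 = "contains a numerical error", 0 = "correct".
gold = [0, 0, 1, 1, 0, 0]
preds_by_prompt = {
    "prompt_a": [0, 1, 1, 1, 0, 0],
    "prompt_b": [0, 0, 1, 0, 1, 0],
}

fpr_a = false_positive_rate(gold, preds_by_prompt["prompt_a"])
fpr_b = false_positive_rate(gold, preds_by_prompt["prompt_b"])
agreement = prompt_agreement(preds_by_prompt)
print(fpr_a, fpr_b, agreement)
```

In this toy example both prompts have the same false positive rate, yet they agree on only half of the instances, which shows why accuracy-style metrics alone can hide prompt sensitivity.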

3. Findings

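  • Numerical NLP Tasks: compared with ordinary error detection, BeNEDect focuses on analyzing LLMs' numerical commonsense, calculation ability, and memorization ability.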

Feel free to add related resources in the comments...