ACL 2023 FACTIFY-5WQA: 5W Aspect-based Fact Verification


FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering

github.com/ankuranii/a…

  • Journalists follow an established practice for fact-checking, verifying the so-called 5Ws (Mott, 1942), (Stofer et al., 2009), (Silverman, 2020), (Su et al., 2019), (Smarts, 2017).
  • A false claim is very likely to have some truth in it, some correct information. This work aims to detect ‘exactly where the lie lies’. (Which aspect?)

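Motivation

  • Existing fact-checking systems focus on numerical scores to assess veracity, which are hard for humans to interpret. Human fact-checkers, by contrast, usually follow a series of logical steps to verify a claim and reach a verdict.

  • Existing fact-checking websites likewise give only verdicts such as half true, half false, false, or pants on fire. An aspect-level explanation system is therefore needed (one that says which aspect is true and which is false), so that each aspect can be verified separately and human fact-checkers can be assisted. (Fact checking demands aspect-based explainability.)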

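  • Contemporary practices (tedious and time-consuming):

    • Research and fact-checking: consult reliable information sources such as news services, academic studies, and government data
    • Interviews and expert opinions: consult experts
    • Cross-checking with multiple sources: cross-validate the claim
    • Verifying the credibility of sources: check how trustworthy each source is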

Proposed method (5W: who, what, when, where, and why)

Question-answer-based fact explainability: each claim is decomposed into its constituent elements.

  • Semantic role labeling (SRL) is used to locate the 5Ws, with off-the-shelf tools such as (i) Stanford SRL (Manning et al., 2014) and (ii) AllenNLP (AllenNLP, 2020). PropBank (Palmer et al., 2005) arguments are mapped to 5W semantic roles.
  • A masked language model is used to generate the QA pairs.
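
The two steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the label-to-5W table, the helper names `extract_5w` and `mask_aspect`, the `[MASK]` token, and the example claim are all assumptions made for the sketch, and the SRL output is hard-coded rather than produced by a real tool.

```python
# Illustrative mapping from PropBank-style labels to 5W roles.
# Assumption for this sketch; the paper's exact mapping may differ.
PROPBANK_TO_5W = {
    "ARG0": "who",
    "ARG1": "what",
    "ARGM-TMP": "when",
    "ARGM-LOC": "where",
    "ARGM-CAU": "why",
}


def extract_5w(srl_args):
    """Map (label, span) pairs from any SRL tool to 5W aspects.

    Only the first span seen for each 5W role is kept.
    """
    aspects = {}
    for label, span in srl_args:
        role = PROPBANK_TO_5W.get(label)
        if role is not None and role not in aspects:
            aspects[role] = span
    return aspects


def mask_aspect(claim, span, mask_token="[MASK]"):
    """Replace the first occurrence of the span with a mask token."""
    return claim.replace(span, mask_token, 1)


# Hypothetical claim with hand-written SRL output.
claim = "Alice opened a shop in Paris on Monday"
srl_args = [
    ("ARG0", "Alice"),
    ("V", "opened"),
    ("ARG1", "a shop"),
    ("ARGM-LOC", "in Paris"),
    ("ARGM-TMP", "on Monday"),
]

aspects = extract_5w(srl_args)
masked = {role: mask_aspect(claim, span) for role, span in aspects.items()}

for role, text in masked.items():
    print(f"{role}: {text}  (answer: {aspects[role]})")
```

Each masked claim pairs with the removed span as its answer, which is the kind of input a masked language model or question generator would then turn into a natural-language question.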

Datasets and baselines

  • FACTIFY-5WQA (391,041 facts along with relevant 5W QAs) github.com/ankuranii/a…
  • FEVER (Thorne et al., 2018b), HoVer (Jiang et al., 2020), VITC (Schuster et al., 2021), FaVIQ (Park et al., 2021), Factify 1.0 (Patwa et al., 2022) and Factify 2.0 (Mishra et al., 2022)

  • 5W QA pair generation

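  • Dataset construction

    • Maximize linguistic variation while keeping the sentences syntactically correct.
    • Since no established metric for dissimilarity exists, the inverse of BLEU (Papineni et al., 2002) is used to measure dissimilarity.
  • Other fact-checking datasets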

    • FEVER (Thorne et al., 2018a), LIAR (Wang, 2017), PolitiFact (Garg and Sharma, 2020), FaVIQ (Kwiatkowski et al., 2019), HoVer (Jiang et al., 2020), X-Fact (Gupta and Srikumar, 2021), CREAK (Onoe et al., 2021), FEVEROUS (Aly et al., 2021)
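
The inverse-BLEU dissimilarity mentioned under dataset construction can be sketched as follows. This is a simplified illustration under stated assumptions: BLEU is computed at sentence level with n-grams up to length 2 (standard BLEU typically uses up to 4), whitespace tokenization, and no smoothing, and the reciprocal is defined as infinity when BLEU is zero. The function names are hypothetical and the paper's exact configuration is not given in these notes.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU: clipped n-gram precision with a brevity penalty.

    Assumption: max_n=2 and no smoothing, to keep the sketch short.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def dissimilarity(candidate, reference):
    """Inverse BLEU: larger values mean the paraphrase differs more."""
    score = bleu(candidate, reference)
    return math.inf if score == 0.0 else 1.0 / score


original = "the cat sat on the mat"
paraphrase = "the cat is on the mat"
print(dissimilarity(original, original))
print(dissimilarity(paraphrase, original))
```

An identical sentence gets the minimum dissimilarity of 1, and the value grows as the paraphrase shares fewer n-grams with the original.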

Human evaluation of QA generation (random sampling)

  • For the evaluation purpose, a random sample of 3000 data points was selected for annotation.

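Summary

  • It is generally accepted that a minimum edit distance greater than 2 is a desirable property in natural language generation systems.
  • In-context learning is worth trying.
  • The method cannot generate more complex questions.
  • Going further, more abstract QA pairs could be designed for multimodal fact verification tasks.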
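
The edit-distance criterion in the first summary point can be checked with a standard Levenshtein computation. The sketch below is an assumption-laden illustration: it measures distance over whitespace tokens (the notes do not say whether the threshold applies to words or characters), and `is_sufficiently_varied` is a hypothetical helper name.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def is_sufficiently_varied(original, paraphrase, threshold=2):
    """True when the token-level edit distance exceeds the threshold.

    Assumption: distance is measured over whitespace tokens.
    """
    return edit_distance(original.split(), paraphrase.split()) > threshold


print(is_sufficiently_varied("the cat sat on the mat", "the cat is on the mat"))
print(is_sufficiently_varied("a b c d", "w x y z"))
```

The first pair differs by a single token and fails the check, while the second differs in all four tokens and passes.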