EMNLP 2024: On Fake News Detection with LLM Enhanced Semantics Mining

  • LLM-enhanced semantic features (news, entities, topics)
  • Generalized Page-Rank model
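  • Learning criterion for mining the local and global semantics
  • Can LLMs effectively produce news representations for fake news detection?
  • When fake news mimics the linguistic style of real news, methods that simply apply LLMs fail.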

GAP


  • Irregular co-occurrence of meaningful entities in fake news on specific topics.

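  • Simply applying news embeddings from LLMs is ineffective for fake news detection. The underlying goal is to capture the __deviating patterns__ of fake news: the high-level semantics among named entities and topics, which reveal these deviating patterns, have been ignored.
    "Irregular co-occurrence" means that, in fake news on a given topic, meaningful entities (people, places, events, etc.) appear together irregularly or inconsistently. The way or frequency in which these entities appear departs from the expected, conventional pattern, which can signal confused or misleading information. Consider a fake article on a topic that repeatedly mentions an unrelated person or event rarely seen in genuine coverage of that topic. That is irregular co-occurrence, and it can be an important cue for identifying fake news.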

  • The paper is organized around two questions:

  1. P1. How can we apply LLMs to explore high-level news semantics? (summarized topic-to-graph)
  2. P2. How can we identify the irregular semantics in fake news? (local and global news semantics)
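To make "irregular co-occurrence" concrete, here is a toy sketch. It is not the paper's method: the paper mines these patterns through graph propagation. The sketch scores an article by the mean negative pointwise mutual information (PMI) of its entity pairs. PMI is estimated on a reference corpus of real news. The PMI formulation, the smoothing constant `eps` and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(docs):
    """Document frequencies of single entities and entity pairs in a reference corpus."""
    single, pair = Counter(), Counter()
    for ents in docs:
        uniq = sorted(set(ents))
        single.update(uniq)
        pair.update(combinations(uniq, 2))  # pairs stored in sorted order
    return single, pair, len(docs)


def pmi(a, b, single, pair, n, eps=1e-6):
    """Pointwise mutual information of two entities; eps smooths unseen pairs."""
    a, b = sorted((a, b))
    p_ab = (pair[(a, b)] + eps) / n
    p_a = (single[a] + eps) / n
    p_b = (single[b] + eps) / n
    return math.log(p_ab / (p_a * p_b))


def irregularity(ents, single, pair, n):
    """Mean negative PMI over an article's entity pairs (higher = more unusual)."""
    pairs = list(combinations(sorted(set(ents)), 2))
    if not pairs:
        return 0.0
    return -sum(pmi(a, b, single, pair, n) for a, b in pairs) / len(pairs)


# Toy reference corpus of "real" articles, each reduced to its entity list.
real = [["WHO", "vaccine", "Geneva"], ["WHO", "vaccine"], ["CDC", "vaccine"], ["WHO", "Geneva"]]
single, pair, n = cooccurrence_stats(real)
```

A pair such as `["Geneva", "CDC"]` never occurs together in the reference corpus, so it scores far higher than the familiar `["WHO", "vaccine"]`. This is the kind of deviation that P2 asks the model to pick up.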

Idea

We propose a topic model together with a set of specially designed prompts to extract topics and real entities from LLMs and model the relations among news, entities, and topics as a heterogeneous graph to facilitate investigating news semantics. We then propose a Generalized Page-Rank model and a consistent learning criterion for mining the local and global semantics centered on each news piece through the adaptive propagation of features across the graph.
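The heterogeneous graph can be sketched as a single adjacency matrix over news, entity and topic nodes. This is a minimal NumPy sketch under stated assumptions. The node layout `[news | entities | topics]`, the choice of news-entity and news-topic edges only, and the helper names are hypothetical. In the paper, the entities and topics come from LLM prompts and a topic model, and the node features come from the LLM.

```python
import numpy as np


def build_hetero_graph(news_entities, news_topics, n_entities, n_topics):
    """Undirected adjacency over nodes laid out as [news | entities | topics].

    Assumed edge types: news-entity (news mentions entity) and news-topic
    (news belongs to topic).
    """
    n_news = len(news_entities)
    size = n_news + n_entities + n_topics
    A = np.zeros((size, size))
    for i, ents in enumerate(news_entities):
        for e in ents:
            j = n_news + e
            A[i, j] = A[j, i] = 1.0
    for i, topics in enumerate(news_topics):
        for t in topics:
            j = n_news + n_entities + t
            A[i, j] = A[j, i] = 1.0
    return A


def normalize(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


news_entities = [[0, 1], [1, 2], [2]]  # toy: entity ids mentioned by each news item
news_topics = [[0], [0], [1]]          # toy: topic id(s) assigned to each news item
A = build_hetero_graph(news_entities, news_topics, n_entities=3, n_topics=2)
A_norm = normalize(A)
```

News items never link to each other directly here. Two news pieces interact only through shared entities or topics, which is what lets propagation surface entity/topic-level semantics.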

  • Generalized PageRank (GPR): adaptive propagation of features across the graph.
  • Global and local semantics mining: a small number of propagation steps (2) captures local semantics; a larger number (20) captures global semantics.
  • $L_{ce}$: cross-entropy loss on labeled data; $L_{con}$: KL-divergence loss on unlabeled data.
  • $\hat{y}_i = 1$: fake; otherwise: real.
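The bullets above can be sketched in NumPy. GPR propagation computes $Z = \sum_{k=0}^{K} \gamma_k \hat{A}^k H$. It is run once with $K=2$ (local) and once with $K=20$ (global). Cross-entropy is applied to labeled nodes and a KL term to unlabeled ones. Several details are assumptions, not the paper's specification:
- the $\gamma_k$ are fixed to PPR-style weights $\alpha(1-\alpha)^k$, whereas a trainable GPR model would learn them;
- the KL term compares the global and local predictions;
- $L_{ce}$ uses the average of the two views;
- the weight `lam` is hypothetical.

```python
import numpy as np


def normalize(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d[:, None] * d[None, :]


def gpr_propagate(A_norm, H, K, alpha=0.1):
    """Z = sum_{k=0}^{K} gamma_k A_norm^k H with fixed gamma_k = alpha (1 - alpha)^k."""
    gammas = alpha * (1 - alpha) ** np.arange(K + 1)
    X, Z = H, gammas[0] * H
    for k in range(1, K + 1):
        X = A_norm @ X  # one more hop of propagation
        Z = Z + gammas[k] * X
    return Z


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, y):
    """L_ce: mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))


def kl_div(p, q):
    """Mean row-wise KL(p || q)."""
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))


# Toy setup: 6-node ring graph, 2-class logits per node (class 1 = fake).
rng = np.random.default_rng(0)
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A_norm = normalize(A)
H = rng.normal(size=(n, 2))  # stand-in for per-node logits computed from LLM features

p_local = softmax(gpr_propagate(A_norm, H, K=2))    # local semantics
p_global = softmax(gpr_propagate(A_norm, H, K=20))  # global semantics

labeled, y = np.array([0, 1, 2]), np.array([0, 1, 0])
unlabeled = np.array([3, 4, 5])
lam = 1.0  # hypothetical weight of the consistency term
p_mix = (p_local + p_global) / 2
loss = cross_entropy(p_mix[labeled], y) + lam * kl_div(p_global[unlabeled], p_local[unlabeled])
y_hat = p_mix.argmax(axis=1)  # 1 = fake, otherwise real
```

In this sketch the KL term encourages the two propagation depths to agree on nodes without labels. That is one plausible reading of a "consistent learning criterion".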


Datasets

MM COVID, ReCOVery, MC Fake, LIAR, PAN2020

Experimental Results


  • Pairwise t-test at a 95% confidence level (α = 0.05).
  • Discussion of potential data contamination (the LLM may have seen evaluation data during pre-training).
  • Silhouette score (cluster-quality metric)
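The two evaluation tools above can be sketched in a few lines of NumPy. Assumptions:
- the t-test is paired over matched runs (for example, the same seeds for both methods);
- only critical values for 4 and 9 degrees of freedom are tabulated;
- inputs are toy numbers, not results from the paper.

In practice `scipy.stats.ttest_rel` and `sklearn.metrics.silhouette_score` do this properly.

```python
import numpy as np

# Two-sided t critical values at alpha = 0.05, from a standard t-table.
T_CRIT_95 = {4: 2.776, 9: 2.262}


def paired_t_test(a, b, crit_table=T_CRIT_95):
    """Paired t statistic on per-run scores; True if significant at alpha = 0.05."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, abs(t) > crit_table[n - 1]


def silhouette_score(X, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); singleton clusters get s(i) = 0."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1
        if n_same == 0:
            continue
        a = D[i, same].sum() / n_same  # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        m = max(a, b)
        s[i] = (b - a) / m if m > 0 else 0.0
    return s.mean()
```

Values near 1 mean the learned representations form well-separated groups, for example fake versus real news. Values near 0 mean the groups overlap.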
