Understanding the ELMo Model


Deep contextualized word representations

(Figure from the original BERT paper)

The best way to understand ELMo:

  1. (1) complex characteristics of word use (e.g., syntax and semantics)

Explanation in the BERT paper: "ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding research along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual representation of each token is the concatenation of the left-to-right and right-to-left representations." ELMo trains a language model with two-layer LSTMs in both the left-to-right and right-to-left directions; compared with GPT, this adds bidirectional semantic information (which is also the main point BERT argues against GPT).
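For reference, the biLM in the ELMo paper (Peters et al., 2018) jointly maximizes the log-likelihood of the forward and backward language models, and the layer-\(j\) contextual representation of token \(k\) is the concatenation of the two directions' hidden states:

```latex
% Joint log-likelihood of the forward and backward LMs (Peters et al., 2018)
\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1};\, \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s)
             + \log p(t_k \mid t_{k+1}, \ldots, t_N;\, \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)

% Layer-j contextual representation of token k: concatenation of both directions
h_{k,j}^{LM} = \big[\, \overrightarrow{h}_{k,j}^{LM} \,;\, \overleftarrow{h}_{k,j}^{LM} \,\big]
```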

  2. (2) how these uses vary across linguistic contexts (i.e., polysemy: the same word takes different meanings in different contexts)

In addition, ELMo addresses the polysemy problem by using the contextual hidden states: rather than taking only the top layer's hidden state, it trains a linear combination of the hidden states from all layers.
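A minimal sketch of that layer-weighting step (the "scalar mix" of Eq. 1 in the ELMo paper), assuming the per-layer hidden states are already stacked into one tensor; the class name, shapes, and usage below are illustrative, not the original implementation:

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Learned linear combination of biLM layers:
    ELMo_k = gamma * sum_j softmax(w)_j * h_{k,j}  (Peters et al., 2018, Eq. 1)."""
    def __init__(self, num_layers: int):
        super().__init__()
        # One scalar weight per layer plus a global scale gamma; both are task-specific.
        self.w = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, dim); each layer is already the
        # concatenation of the forward and backward LSTM hidden states.
        s = torch.softmax(self.w, dim=0)                     # normalized layer weights
        mixed = (s.view(-1, 1, 1, 1) * layer_states).sum(0)  # weighted sum over layers
        return self.gamma * mixed                            # (batch, seq_len, dim)

# Illustrative usage with random "hidden states" (token-embedding layer + 2 LSTM layers).
if __name__ == "__main__":
    states = torch.randn(3, 8, 20, 1024)        # (layers, batch, tokens, dim)
    elmo_repr = ScalarMix(num_layers=3)(states)
    print(elmo_repr.shape)                      # torch.Size([8, 20, 1024])
```

Because the weights are learned per downstream task, each task can emphasize different layers (e.g., lower layers for syntax-heavy tasks, higher layers for semantics).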

A good reference: blog.csdn.net/qq_42791848…