Text Generation with LSTM


Background reading:

  1. PyTorch: text generation with a Bi-LSTM

  2. How to create a poet / writer using Deep Learning (Text Generation using Python)?

  3. Essentials of Deep Learning: Introduction to Long Short Term Memory (for LSTM theory)

  4. A PyTorch example of LSTM-based text generation

  5. PyTorch LSTM: Text Generation Tutorial

  6. Text generation on "Anna Karenina": building an LSTM model with TensorFlow

All of the above is background reading, and some of it comes with good code. Next, a hands-on code exercise following the 磐创AI (PanChuang AI) article:

Download the source data

Project Gutenberg is a library of over 60,000 free eBooks


The preprocessing of the text data:

After cleaning, the final text keeps only the 27 characters in `letters` (26 letters plus the space).

Characters cannot be fed to the model directly; they need to be converted to corresponding numeric values.

def create_dictionary(text):
    char_to_idx = dict()
    idx_to_char = dict()
    idx = 0
    for char in text:
        if char not in char_to_idx:
            # build both dictionaries: char -> idx and idx -> char
            char_to_idx[char] = idx
            idx_to_char[idx] = char
            idx += 1
    return char_to_idx, idx_to_char

A quick test:

This simply numbers the 27 characters. Honestly, I think this could be done entirely by hand: there are only 26 letters plus one space character, so there is no need to traverse the whole document. My guess is the author worried that the full document of 439,522 characters might not actually contain every one of those 27 characters.
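To make the check concrete, here is a runnable version on a short string (the sample text "hello world" is my own; the article runs it on the whole book):

```python
def create_dictionary(text):
    # Assign each character an index in order of first appearance.
    char_to_idx = dict()
    idx_to_char = dict()
    idx = 0
    for char in text:
        if char not in char_to_idx:
            char_to_idx[char] = idx
            idx_to_char[idx] = char
            idx += 1
    return char_to_idx, idx_to_char

char_to_idx, idx_to_char = create_dictionary("hello world")
# 'h' -> 0, 'e' -> 1, 'l' -> 2, 'o' -> 3, ' ' -> 4, 'w' -> 5, 'r' -> 6, 'd' -> 7
```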

Sequence generation

In the article the sliding window is 4. (Though if the space character is included, shouldn't it be 5 characters?)

def build_sequences(text, char_to_idx, window):
    # (The article's snippet starts inside the function body;
    # this signature is my reconstruction.)
    x = list()
    y = list()

    for i in range(len(text)):
        try:
            # take a window of characters from the text
            # and convert it to its idx representation
            sequence1 = text[i:i + window]
            sequence = [char_to_idx[char] for char in sequence1]

            # the target is the character right after the window,
            # converted to its idx representation
            target1 = text[i + window]
            target = char_to_idx[target1]

            # save the sequence and the target
            x.append(sequence)
            y.append(target)

        except IndexError:
            # the last `window` positions have no target character
            pass

    x = np.array(x)
    y = np.array(y)

    return x, y
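A small demo of the windowing logic (the name `build_sequences` is my own label for the snippet above, and the toy text and vocabulary are made up):

```python
import numpy as np

def build_sequences(text, char_to_idx, window):
    # Slide a window over the text: each run of `window` characters
    # is one input sample, and the character right after it is the target.
    x, y = [], []
    for i in range(len(text) - window):
        x.append([char_to_idx[c] for c in text[i:i + window]])
        y.append(char_to_idx[text[i + window]])
    return np.array(x), np.array(y)

char_to_idx = {c: i for i, c in enumerate("abcd ")}  # a=0, b=1, c=2, d=3, ' '=4
x, y = build_sequences("abcd abcd", char_to_idx, window=4)
# x.shape == (5, 4); x[0] is [0, 1, 2, 3] ("abcd") and y[0] is 4 (' ')
```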

My understanding: the input is the preceding four characters (sequence), and the output is the prediction of the next character (target).

In the end x is an N×4 matrix, where N is roughly the total length of text (strictly, the length minus the window size, since the last few positions have no target character).

The Bi-LSTM recurrent neural network

The approach passes each character sequence through an embedding layer, which produces a vector representation for every element of the sequence, giving an embedded character sequence. Each element of the embedded sequence is then fed to the Bi-LSTM layer. Next, the outputs of the two LSTMs that make up the Bi-LSTM (the forward LSTM and the backward LSTM) are concatenated. Each forward+backward concatenated vector is then passed to an LSTM layer, and the last hidden state of that layer is fed to a linear layer. The final linear layer uses a Softmax activation, so it represents the probability of each character.

The key difference between a standard LSTM and a Bi-LSTM is that a Bi-LSTM consists of 2 LSTMs, usually called the "forward LSTM" and the "backward LSTM".

First, let's look at how the constructor of the TextGenerator class is built.

Snippet 4: the constructor of the text generator class


import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGenerator(nn.ModuleList):

    def __init__(self, args, vocab_size):
        super(TextGenerator, self).__init__()

        self.batch_size = args.batch_size
        self.hidden_dim = args.hidden_dim
        self.input_size = vocab_size
        self.num_classes = vocab_size

        self.sequence_len = args.window
    
        # Dropout
        self.dropout = nn.Dropout(0.25)
    
        # Embedding layer
        self.embedding = nn.Embedding(self.input_size, self.hidden_dim, padding_idx=0)
    
        # Bi-LSTM
        # forward and backward
        self.lstm_cell_forward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)
        self.lstm_cell_backward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)
    
        # LSTM layer
        self.lstm_cell = nn.LSTMCell(self.hidden_dim * 2, self.hidden_dim * 2)
    
        # Linear layer
        self.linear = nn.Linear(self.hidden_dim * 2, self.num_classes)


Running PyTorch in Jupyter for the first time:

Following the warning's suggestion (ipywidgets.readthedocs.io/en/stable/u…), install ipywidgets:

pip install ipywidgets  -i https://pypi.douban.com/simple
jupyter nbextension enable --py widgetsnbextension


I am using a virtual environment in Jupyter.


After running the PyTorch import again, the warning is gone.

Meanwhile, lines 20 and 21 define the two LSTMCells that make up the Bi-LSTM (forward and backward); the line numbers refer to the listing in the original article. Line 24 defines the LSTMCell that is fed with the output of the Bi-LSTM. Note that this cell's hidden state is twice the size, because the outputs of the Bi-LSTM are concatenated. The linear layer defined on line 27 is later passed through a softmax function.
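The dimension bookkeeping can be checked outside the class with a standalone shape walk-through (the sizes batch=8, hidden=32, window=4, vocab=27 are my own picks):

```python
import torch
import torch.nn as nn

batch, hidden, window, vocab = 8, 32, 4, 27

embedding = nn.Embedding(vocab, hidden)
cell_fwd = nn.LSTMCell(hidden, hidden)          # forward LSTM cell
cell_bwd = nn.LSTMCell(hidden, hidden)          # backward LSTM cell
cell_top = nn.LSTMCell(hidden * 2, hidden * 2)  # fed the concatenation, hence * 2
linear = nn.Linear(hidden * 2, vocab)

x = torch.randint(0, vocab, (batch, window))    # a batch of character-index windows
out = embedding(x).view(window, batch, -1)      # [window, batch, hidden]

hf, cf = torch.zeros(batch, hidden), torch.zeros(batch, hidden)
hb, cb = torch.zeros(batch, hidden), torch.zeros(batch, hidden)
ht, ct = torch.zeros(batch, hidden * 2), torch.zeros(batch, hidden * 2)

forward, backward = [], []
for t in range(window):                         # unfold the forward direction
    hf, cf = cell_fwd(out[t], (hf, cf))
    forward.append(hf)
for t in reversed(range(window)):               # unfold the backward direction
    hb, cb = cell_bwd(out[t], (hb, cb))
    backward.append(hb)

for f, b in zip(forward, backward):             # concatenated states feed the top LSTM
    ht, ct = cell_top(torch.cat((f, b), 1), (ht, ct))

logits = linear(ht)                             # [batch, vocab]: one score per character
```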


Complete code

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGenerator(nn.ModuleList):
   def __init__(self, args, vocab_size):
      super(TextGenerator, self).__init__()
      
      self.batch_size = args.batch_size
      self.hidden_dim = args.hidden_dim
      self.input_size = vocab_size
      self.num_classes = vocab_size
      self.sequence_len = args.window
      
      # Dropout
      self.dropout = nn.Dropout(0.25)
      
      # Embedding layer
      self.embedding = nn.Embedding(self.input_size, self.hidden_dim, padding_idx=0)
      
      # Bi-LSTM
      # Forward and backward
      self.lstm_cell_forward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)
      self.lstm_cell_backward = nn.LSTMCell(self.hidden_dim, self.hidden_dim)
      
      # LSTM layer
      self.lstm_cell = nn.LSTMCell(self.hidden_dim * 2, self.hidden_dim * 2)
      
      # Linear layer
      self.linear = nn.Linear(self.hidden_dim * 2, self.num_classes)
      
   def forward(self, x): # Snippet 5: weight initialization
   
      # Bi-LSTM
      # hs = [batch_size x hidden_size]
      # cs = [batch_size x hidden_size]
      hs_forward = torch.zeros(x.size(0), self.hidden_dim)
      cs_forward = torch.zeros(x.size(0), self.hidden_dim)
      hs_backward = torch.zeros(x.size(0), self.hidden_dim)
      cs_backward = torch.zeros(x.size(0), self.hidden_dim)
      
      # LSTM
      # hs = [batch_size x (hidden_size * 2)]
      # cs = [batch_size x (hidden_size * 2)]
      hs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)
      cs_lstm = torch.zeros(x.size(0), self.hidden_dim * 2)

      # Weights initialization
      torch.nn.init.kaiming_normal_(hs_forward)
      torch.nn.init.kaiming_normal_(cs_forward)
      torch.nn.init.kaiming_normal_(hs_backward)
      torch.nn.init.kaiming_normal_(cs_backward)
      torch.nn.init.kaiming_normal_(hs_lstm)
      torch.nn.init.kaiming_normal_(cs_lstm)

      # From idx to embedding
      out = self.embedding(x)
      
      # Prepare the shape for LSTM Cells
      out = out.view(self.sequence_len, x.size(0), -1)
      
      forward = []
      backward = []
      
      # Unfolding Bi-LSTM
      # Forward
      for i in range(self.sequence_len):
         hs_forward, cs_forward = self.lstm_cell_forward(out[i], (hs_forward, cs_forward))
         forward.append(hs_forward)
         
      # Backward
      for i in reversed(range(self.sequence_len)):
         hs_backward, cs_backward = self.lstm_cell_backward(out[i], (hs_backward, cs_backward))
         backward.append(hs_backward)

      # LSTM
      for fwd, bwd in zip(forward, backward):
         input_tensor = torch.cat((fwd, bwd), 1)
         hs_lstm, cs_lstm = self.lstm_cell(input_tensor, (hs_lstm, cs_lstm))

      # Last hidden state is passed through a linear layer
      out = self.linear(hs_lstm)

      return out
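To connect the output back to characters: the class returns raw logits, and the softmax mentioned earlier is applied afterwards. A minimal sketch with made-up numbers (the tiny 4-character vocabulary and the `idx_to_char` mapping here are hypothetical):

```python
import torch
import torch.nn.functional as F

idx_to_char = {0: ' ', 1: 'a', 2: 'b', 3: 'c'}   # hypothetical tiny vocabulary

logits = torch.tensor([[0.1, 2.5, 0.3, -1.0]])   # [batch=1, vocab=4], e.g. model output
probs = F.softmax(logits, dim=1)                 # probabilities summing to 1
next_idx = torch.argmax(probs, dim=1).item()     # greedy choice (sampling is also common)
next_char = idx_to_char[next_idx]                # -> 'a'
```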
      

Sorry, I could not follow what comes after this. I need to properly understand the LSTM model first; the rest of the code flow lost me.