Train a Small Model Yourself


ChatGPT has already shipped GPT-4o, so why would you still train a small model yourself?

  • If you can train a small model yourself, it shows you have genuinely understood the Transformer
  • Some scenarios simply call for a small model: small and focused, just as microcontrollers still have their own market
  • Training the small model is not the goal in itself; the goal is to walk through the basic workflow of training a large model

To read or write the code that trains a small model, you first need some background knowledge: the Transformer architecture, machine learning, linear algebra, and so on.

If you can't follow all of it, that's fine too: just run it and see what comes out.

On to the code.

!pip install numpy requests torch tiktoken matplotlib pandas

import os
import requests
import math
import tiktoken
import torch
import torch.nn as nn
from torch.nn import functional as F

# Hyperparameters
batch_size = 4  # How many sequences per training batch
context_length = 16  # Length of the token chunk in each sequence
d_model = 64  # The size of our model token embeddings
num_blocks = 8  # Number of transformer blocks
num_heads = 4  # Number of heads in Multi-head attention
learning_rate = 1e-3  # 0.001
dropout = 0.1  # Dropout rate
max_iters = 5000  # Total number of training iterations <- change this to a smaller number for testing
eval_interval = 50  # How often to evaluate
eval_iters = 20  # Number of iterations to average for evaluation
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # Use GPU if it's available.
TORCH_SEED = 1337
torch.manual_seed(TORCH_SEED)

# Load training data
if not os.path.exists('data/sales_textbook.txt'):
    os.makedirs('data', exist_ok=True)  # make sure the data directory exists before writing
    url = 'https://huggingface.co/datasets/goendalf666/sales-textbook_for_convincing_and_selling/raw/main/sales_textbook.txt'
    with open('data/sales_textbook.txt', 'w') as f:
        f.write(requests.get(url).text)

with open('data/sales_textbook.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Use TikToken (the same cl100k_base encoding used by GPT-3.5/GPT-4) to tokenize the source text
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(text)
max_token_value = max(tokenized_text) + 1  # vocabulary size we need: the largest token id plus one
tokenized_text = torch.tensor(tokenized_text, dtype=torch.long, device=device)  # put tokenized text into a tensor

# Split into train and validation sets (90% / 10%)
split_idx = int(len(tokenized_text) * 0.9)
train_data = tokenized_text[:split_idx]
val_data = tokenized_text[split_idx:]


# Define Feed Forward Network
class FeedForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_model = d_model
        self.dropout = dropout
        self.ffn = nn.Sequential(
            nn.Linear(in_features=self.d_model, out_features=self.d_model * 4),
            nn.ReLU(),
            nn.Linear(in_features=self.d_model * 4, out_features=self.d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.ffn(x)


# Define Scaled Dot Product Attention (a single attention head)
class Attention(nn.Module):
    def __init__(self, head_size: int):
        super().__init__()
        self.d_model = d_model
        self.head_size = head_size
        self.context_length = context_length
        self.dropout = dropout

        self.key_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.query_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.value_layer = nn.Linear(in_features=self.d_model, out_features=self.head_size, bias=False)
        self.register_buffer('tril', torch.tril(
            torch.ones((self.context_length, self.context_length))))  # Lower triangular causal mask
        self.dropout_layer = nn.Dropout(self.dropout)

    def forward(self, x):
        B, T, C = x.shape  # Batch size, Time steps (current context_length), Channels (dimensions)
        assert T <= self.context_length
        assert C == self.d_model
        q = self.query_layer(x)
        k = self.key_layer(x)
        v = self.value_layer(x)

        # Scaled dot product attention: Q @ K^T / sqrt(d_k)
        weights = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        # Apply the causal mask so each position only attends to earlier positions
        weights = weights.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        weights = F.softmax(input=weights, dim=-1)
        weights = self.dropout_layer(weights)

        # Apply dot product attention: weights @ V
        out = weights @ v
        return out


class MultiHeadAttention(nn.Module):
    def __init__(self, head_size: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_size = head_size
        self.d_model = d_model
        self.context_length = context_length
        self.dropout = dropout

        self.heads = nn.ModuleList([Attention(head_size=self.head_size) for _ in range(self.num_heads)])
        self.projection_layer = nn.Linear(in_features=self.d_model, out_features=self.d_model)
        self.dropout_layer = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        out = self.projection_layer(out)
        out = self.dropout_layer(out)
        return out


class TransformerBlock(nn.Module):

    def __init__(self, num_heads: int):
        super().__init__()
        self.d_model = d_model
        self.context_length = context_length
        self.head_size = d_model // num_heads  # d_model should be divisible by num_heads
        self.num_heads = num_heads
        self.dropout = dropout

        self.multi_head_attention_layer = MultiHeadAttention(head_size=self.head_size)
        self.feed_forward_layer = FeedForward()
        self.layer_norm_1 = nn.LayerNorm(normalized_shape=self.d_model)
        self.layer_norm_2 = nn.LayerNorm(normalized_shape=self.d_model)

    def forward(self, x):
        # Note: this pre-norm ordering differs from the original Transformer paper (post-norm)
        # The order here is: LayerNorm -> Multi-head attention -> LayerNorm -> Feed forward
        x = x + self.multi_head_attention_layer(self.layer_norm_1(x))  # Residual connection
        x = x + self.feed_forward_layer(self.layer_norm_2(x))  # Residual connection
        return x


class TransformerLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_model = d_model
        self.context_length = context_length
        self.num_heads = num_heads
        self.num_blocks = num_blocks
        self.dropout = dropout
        self.max_token_value = max_token_value
        # Set up the token embedding look-up table
        self.token_embedding_lookup_table = nn.Embedding(num_embeddings=self.max_token_value + 1, embedding_dim=self.d_model)

        # Run all the transformer blocks
        # Different from the original paper, here we add a final layer norm after all the blocks
        self.transformer_blocks = nn.Sequential(*(
                [TransformerBlock(num_heads=self.num_heads) for _ in range(self.num_blocks)] +
                [nn.LayerNorm(self.d_model)]
        ))
        self.language_model_out_linear_layer = nn.Linear(in_features=self.d_model, out_features=self.max_token_value)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # Set up the position embedding look-up table,
        # following the same approach as the original Transformer paper (sine and cosine functions)
        position_encoding_lookup_table = torch.zeros(self.context_length, self.d_model)
        position = torch.arange(0, self.context_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, self.d_model, 2).float() * (-math.log(10000.0) / self.d_model))
        position_encoding_lookup_table[:, 0::2] = torch.sin(position * div_term)
        position_encoding_lookup_table[:, 1::2] = torch.cos(position * div_term)
        # Slice position_encoding_lookup_table from (context_length, d_model) down to (T, d_model)
        position_embedding = position_encoding_lookup_table[:T, :].to(device)
        x = self.token_embedding_lookup_table(idx) + position_embedding
        x = self.transformer_blocks(x)
        # The "logits" are the output values of our model before applying softmax
        logits = self.language_model_out_linear_layer(x)

        if targets is not None:
            B, T, C = logits.shape
            logits_reshaped = logits.view(B * T, C)
            targets_reshaped = targets.view(B * T)
            loss = F.cross_entropy(input=logits_reshaped, target=targets_reshaped)
        else:
            loss = None
        return logits, loss

    def generate(self, idx, max_new_tokens):
        # idx is a (B, T) array of indices in the current context
        for _ in range(max_new_tokens):
            # Crop idx to the max size of our positional embeddings table
            idx_crop = idx[:, -self.context_length:]
            # Get predictions
            logits, loss = self(idx_crop)
            # Get the last time step from logits where the dimensions of the logits are (B, T, C)
            logits_last_timestep = logits[:, -1, :]
            # Apply softmax to get probabilities
            probs = F.softmax(input=logits_last_timestep, dim=-1)
            # Sample from the probability distribution
            idx_next = torch.multinomial(input=probs, num_samples=1)
            # Append the sampled index idx_next to idx
            idx = torch.cat((idx, idx_next), dim=1)
        return idx


# Initialize the model
model = TransformerLanguageModel()
model = model.to(device)


# Get a random batch of inputs (x) and next-token targets (y)
def get_batch(split: str):
    data = train_data if split == 'train' else val_data
    idxs = torch.randint(low=0, high=len(data) - context_length, size=(batch_size,))
    x = torch.stack([data[idx:idx + context_length] for idx in idxs]).to(device)
    y = torch.stack([data[idx + 1:idx + context_length + 1] for idx in idxs]).to(device)
    return x, y


# Estimate the average train/validation loss over a few batches
@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'valid']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            x_batch, y_batch = get_batch(split)
            logits, loss = model(x_batch, y_batch)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out


# Use the AdamW optimizer
optimizer = torch.optim.AdamW(params=model.parameters(), lr=learning_rate)
tracked_losses = list()
for step in range(max_iters):
    if step % eval_iters == 0 or step == max_iters - 1:  # evaluate every eval_iters steps (eval_interval is defined above but unused)
        losses = estimate_loss()
        tracked_losses.append(losses)
        print('Step:', step, 'Training Loss:', round(losses['train'].item(), 3), 'Validation Loss:',
              round(losses['valid'].item(), 3))

    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# Save the model state dictionary
torch.save(model.state_dict(), 'model-ckpt.pt')

# Generate
model.eval()
start = 'The salesperson'
start_ids = encoding.encode(start)
x = (torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...])
y = model.generate(x, max_new_tokens=100)
print('---------------')
print(encoding.decode(y[0].tolist()))
print('---------------')

Output of the run:
Step: 0 Training Loss: 11.663 Validation Loss: 11.716
Step: 20 Training Loss: 10.297 Validation Loss: 10.478
Step: 40 Training Loss: 8.867 Validation Loss: 9.022
Step: 60 Training Loss: 7.346 Validation Loss: 7.613
Step: 80 Training Loss: 6.878 Validation Loss: 7.297
Step: 100 Training Loss: 6.659 Validation Loss: 7.208
Step: 120 Training Loss: 6.544 Validation Loss: 7.104
Step: 140 Training Loss: 6.325 Validation Loss: 7.199
Step: 160 Training Loss: 6.34 Validation Loss: 6.684
Step: 180 Training Loss: 6.154 Validation Loss: 6.89
Step: 200 Training Loss: 6.202 Validation Loss: 6.673
Step: 220 Training Loss: 6.045 Validation Loss: 6.761
Step: 240 Training Loss: 5.871 Validation Loss: 6.497
Step: 260 Training Loss: 5.957 Validation Loss: 6.347
Step: 280 Training Loss: 5.679 Validation Loss: 6.389
Step: 300 Training Loss: 5.816 Validation Loss: 6.603
Step: 320 Training Loss: 5.415 Validation Loss: 6.496
Step: 340 Training Loss: 5.32 Validation Loss: 6.1
Step: 360 Training Loss: 5.206 Validation Loss: 6.222
Step: 380 Training Loss: 5.403 Validation Loss: 6.451
Step: 400 Training Loss: 5.317 Validation Loss: 5.937
Step: 420 Training Loss: 5.159 Validation Loss: 6.033
Step: 440 Training Loss: 5.153 Validation Loss: 6.333
Step: 460 Training Loss: 5.232 Validation Loss: 6.001
Step: 480 Training Loss: 5.126 Validation Loss: 6.067
Step: 500 Training Loss: 5.123 Validation Loss: 6.044
Step: 520 Training Loss: 4.966 Validation Loss: 5.676
Step: 540 Training Loss: 4.774 Validation Loss: 6.023
Step: 560 Training Loss: 4.792 Validation Loss: 6.079
Step: 580 Training Loss: 4.743 Validation Loss: 5.722
Step: 600 Training Loss: 4.818 Validation Loss: 5.686
Step: 620 Training Loss: 4.675 Validation Loss: 5.741
Step: 640 Training Loss: 4.805 Validation Loss: 6.014
Step: 660 Training Loss: 4.81 Validation Loss: 5.758
Step: 680 Training Loss: 4.727 Validation Loss: 5.723
Step: 700 Training Loss: 4.737 Validation Loss: 5.792
Step: 720 Training Loss: 4.609 Validation Loss: 5.761
Step: 740 Training Loss: 5.018 Validation Loss: 5.705
Step: 760 Training Loss: 4.906 Validation Loss: 5.721
Step: 780 Training Loss: 4.791 Validation Loss: 5.779
Step: 800 Training Loss: 4.467 Validation Loss: 5.881
Step: 820 Training Loss: 4.443 Validation Loss: 5.502
Step: 840 Training Loss: 4.567 Validation Loss: 5.832
Step: 860 Training Loss: 4.577 Validation Loss: 5.956
Step: 880 Training Loss: 4.55 Validation Loss: 5.583
Step: 900 Training Loss: 4.478 Validation Loss: 5.465
Step: 920 Training Loss: 4.237 Validation Loss: 5.674
Step: 940 Training Loss: 4.462 Validation Loss: 5.427
Step: 960 Training Loss: 4.323 Validation Loss: 5.632
Step: 980 Training Loss: 4.323 Validation Loss: 5.711
Step: 1000 Training Loss: 4.304 Validation Loss: 5.374
Step: 1020 Training Loss: 4.295 Validation Loss: 5.597
Step: 1040 Training Loss: 4.312 Validation Loss: 5.54
Step: 1060 Training Loss: 4.351 Validation Loss: 5.456
Step: 1080 Training Loss: 4.128 Validation Loss: 5.524
Step: 1100 Training Loss: 4.285 Validation Loss: 5.44
Step: 1120 Training Loss: 4.359 Validation Loss: 5.447
Step: 1140 Training Loss: 4.276 Validation Loss: 5.527
Step: 1160 Training Loss: 4.179 Validation Loss: 5.415
Step: 1180 Training Loss: 4.057 Validation Loss: 5.42
Step: 1200 Training Loss: 4.238 Validation Loss: 5.296
Step: 1220 Training Loss: 3.979 Validation Loss: 5.535
Step: 1240 Training Loss: 4.145 Validation Loss: 5.417
Step: 1260 Training Loss: 4.093 Validation Loss: 5.34
Step: 1280 Training Loss: 4.173 Validation Loss: 5.361
Step: 1300 Training Loss: 3.876 Validation Loss: 5.449
Step: 1320 Training Loss: 3.941 Validation Loss: 5.343
Step: 1340 Training Loss: 4.172 Validation Loss: 5.335
Step: 1360 Training Loss: 3.757 Validation Loss: 5.173
Step: 1380 Training Loss: 4.106 Validation Loss: 5.207
Step: 1400 Training Loss: 3.975 Validation Loss: 5.349
Step: 1420 Training Loss: 4.11 Validation Loss: 5.224
Step: 1440 Training Loss: 3.915 Validation Loss: 5.341
Step: 1460 Training Loss: 4.05 Validation Loss: 5.302
Step: 1480 Training Loss: 3.927 Validation Loss: 5.487
Step: 1500 Training Loss: 3.952 Validation Loss: 5.191
Step: 1520 Training Loss: 4.182 Validation Loss: 5.066
Step: 1540 Training Loss: 3.851 Validation Loss: 5.205
Step: 1560 Training Loss: 4.062 Validation Loss: 5.039
Step: 1580 Training Loss: 3.848 Validation Loss: 4.952
Step: 1600 Training Loss: 3.94 Validation Loss: 5.343
Step: 1620 Training Loss: 3.78 Validation Loss: 5.243
Step: 1640 Training Loss: 3.814 Validation Loss: 5.364
Step: 1660 Training Loss: 3.979 Validation Loss: 5.25
Step: 1680 Training Loss: 3.717 Validation Loss: 5.067
Step: 1700 Training Loss: 3.681 Validation Loss: 5.574
Step: 1720 Training Loss: 3.753 Validation Loss: 5.119
Step: 1740 Training Loss: 3.584 Validation Loss: 5.335
Step: 1760 Training Loss: 3.819 Validation Loss: 4.949
Step: 1780 Training Loss: 3.823 Validation Loss: 4.921
Step: 1800 Training Loss: 3.795 Validation Loss: 5.031
Step: 1820 Training Loss: 3.54 Validation Loss: 5.292
Step: 1840 Training Loss: 4.003 Validation Loss: 4.95
Step: 1860 Training Loss: 3.759 Validation Loss: 4.86
Step: 1880 Training Loss: 3.871 Validation Loss: 5.262
Step: 1900 Training Loss: 3.791 Validation Loss: 4.975
Step: 1920 Training Loss: 3.768 Validation Loss: 5.329
Step: 1940 Training Loss: 3.689 Validation Loss: 5.011
Step: 1960 Training Loss: 3.52 Validation Loss: 4.926
Step: 1980 Training Loss: 3.648 Validation Loss: 5.128
Step: 2000 Training Loss: 3.696 Validation Loss: 5.011
Step: 2020 Training Loss: 3.756 Validation Loss: 5.086
Step: 2040 Training Loss: 3.835 Validation Loss: 4.961
Step: 2060 Training Loss: 3.626 Validation Loss: 5.27
Step: 2080 Training Loss: 3.751 Validation Loss: 5.27
Step: 2100 Training Loss: 3.856 Validation Loss: 4.967
Step: 2120 Training Loss: 3.76 Validation Loss: 4.968
Step: 2140 Training Loss: 3.678 Validation Loss: 4.971
Step: 2160 Training Loss: 3.759 Validation Loss: 4.821
Step: 2180 Training Loss: 3.504 Validation Loss: 5.243
Step: 2200 Training Loss: 3.85 Validation Loss: 5.345
Step: 2220 Training Loss: 3.74 Validation Loss: 5.287
Step: 2240 Training Loss: 3.66 Validation Loss: 5.219
Step: 2260 Training Loss: 3.684 Validation Loss: 5.101
Step: 2280 Training Loss: 3.523 Validation Loss: 4.998
Step: 2300 Training Loss: 3.628 Validation Loss: 5.237
Step: 2320 Training Loss: 3.545 Validation Loss: 5.442
Step: 2340 Training Loss: 3.428 Validation Loss: 5.192
Step: 2360 Training Loss: 3.658 Validation Loss: 5.11
Step: 2380 Training Loss: 3.592 Validation Loss: 5.14
Step: 2400 Training Loss: 3.573 Validation Loss: 5.069
Step: 2420 Training Loss: 3.414 Validation Loss: 4.745
Step: 2440 Training Loss: 3.459 Validation Loss: 5.28
Step: 2460 Training Loss: 3.678 Validation Loss: 5.044
Step: 2480 Training Loss: 3.409 Validation Loss: 4.935
Step: 2500 Training Loss: 3.484 Validation Loss: 5.054
Step: 2520 Training Loss: 3.659 Validation Loss: 5.335
Step: 2540 Training Loss: 3.423 Validation Loss: 5.333
Step: 2560 Training Loss: 3.57 Validation Loss: 5.237
Step: 2580 Training Loss: 3.57 Validation Loss: 4.961
Step: 2600 Training Loss: 3.67 Validation Loss: 5.023
Step: 2620 Training Loss: 3.451 Validation Loss: 4.958
Step: 2640 Training Loss: 3.542 Validation Loss: 5.144
Step: 2660 Training Loss: 3.474 Validation Loss: 5.076
Step: 2680 Training Loss: 3.482 Validation Loss: 4.937
Step: 2700 Training Loss: 3.428 Validation Loss: 5.087
Step: 2720 Training Loss: 3.377 Validation Loss: 5.171
Step: 2740 Training Loss: 3.404 Validation Loss: 4.779
Step: 2760 Training Loss: 3.2 Validation Loss: 5.077
Step: 2780 Training Loss: 3.28 Validation Loss: 5.184
Step: 2800 Training Loss: 3.138 Validation Loss: 5.165
Step: 2820 Training Loss: 3.374 Validation Loss: 5.091
Step: 2840 Training Loss: 3.29 Validation Loss: 5.2
Step: 2860 Training Loss: 3.375 Validation Loss: 5.022
Step: 2880 Training Loss: 3.45 Validation Loss: 4.919
Step: 2900 Training Loss: 3.465 Validation Loss: 5.134
Step: 2920 Training Loss: 3.457 Validation Loss: 5.227
Step: 2940 Training Loss: 3.322 Validation Loss: 4.94
Step: 2960 Training Loss: 3.203 Validation Loss: 5.068
Step: 2980 Training Loss: 3.372 Validation Loss: 4.924
Step: 3000 Training Loss: 3.512 Validation Loss: 5.071
Step: 3020 Training Loss: 3.469 Validation Loss: 4.782
Step: 3040 Training Loss: 3.343 Validation Loss: 5.275
Step: 3060 Training Loss: 3.201 Validation Loss: 4.854
Step: 3080 Training Loss: 3.313 Validation Loss: 5.037
Step: 3100 Training Loss: 3.41 Validation Loss: 4.707
Step: 3120 Training Loss: 3.201 Validation Loss: 5.013
Step: 3140 Training Loss: 3.344 Validation Loss: 4.895
Step: 3160 Training Loss: 3.307 Validation Loss: 4.915
Step: 3180 Training Loss: 3.186 Validation Loss: 4.955
Step: 3200 Training Loss: 3.262 Validation Loss: 5.005
Step: 3220 Training Loss: 3.331 Validation Loss: 4.845
Step: 3240 Training Loss: 3.301 Validation Loss: 5.017
Step: 3260 Training Loss: 3.529 Validation Loss: 4.58
Step: 3280 Training Loss: 3.269 Validation Loss: 4.887
Step: 3300 Training Loss: 3.1 Validation Loss: 5.046
Step: 3320 Training Loss: 3.239 Validation Loss: 4.825
Step: 3340 Training Loss: 3.341 Validation Loss: 5.413
Step: 3360 Training Loss: 3.288 Validation Loss: 4.929
Step: 3380 Training Loss: 3.315 Validation Loss: 5.259
Step: 3400 Training Loss: 3.19 Validation Loss: 4.979
Step: 3420 Training Loss: 3.237 Validation Loss: 5.082
Step: 3440 Training Loss: 3.168 Validation Loss: 5.336
Step: 3460 Training Loss: 3.305 Validation Loss: 5.259
Step: 3480 Training Loss: 3.142 Validation Loss: 4.798
Step: 3500 Training Loss: 3.179 Validation Loss: 5.061
Step: 3520 Training Loss: 3.238 Validation Loss: 5.056
Step: 3540 Training Loss: 3.171 Validation Loss: 4.955
Step: 3560 Training Loss: 3.141 Validation Loss: 4.828
Step: 3580 Training Loss: 3.154 Validation Loss: 4.858
Step: 3600 Training Loss: 3.245 Validation Loss: 5.185
Step: 3620 Training Loss: 3.076 Validation Loss: 4.518
Step: 3640 Training Loss: 3.208 Validation Loss: 4.755
Step: 3660 Training Loss: 3.343 Validation Loss: 4.94
Step: 3680 Training Loss: 3.109 Validation Loss: 4.749
Step: 3700 Training Loss: 3.137 Validation Loss: 4.929
Step: 3720 Training Loss: 3.105 Validation Loss: 4.806
Step: 3740 Training Loss: 3.053 Validation Loss: 4.917
Step: 3760 Training Loss: 3.379 Validation Loss: 4.991
Step: 3780 Training Loss: 3.278 Validation Loss: 5.268
Step: 3800 Training Loss: 3.11 Validation Loss: 5.2
Step: 3820 Training Loss: 3.049 Validation Loss: 5.134
Step: 3840 Training Loss: 3.182 Validation Loss: 4.849
Step: 3860 Training Loss: 2.989 Validation Loss: 5.004
Step: 3880 Training Loss: 3.27 Validation Loss: 4.796
Step: 3900 Training Loss: 3.007 Validation Loss: 4.805
Step: 3920 Training Loss: 3.151 Validation Loss: 4.856
Step: 3940 Training Loss: 3.125 Validation Loss: 4.832
Step: 3960 Training Loss: 3.058 Validation Loss: 4.629
Step: 3980 Training Loss: 3.031 Validation Loss: 4.963
Step: 4000 Training Loss: 3.118 Validation Loss: 4.976
Step: 4020 Training Loss: 3.152 Validation Loss: 4.949
Step: 4040 Training Loss: 3.049 Validation Loss: 5.054
Step: 4060 Training Loss: 3.065 Validation Loss: 5.069
Step: 4080 Training Loss: 3.193 Validation Loss: 5.184
Step: 4100 Training Loss: 2.92 Validation Loss: 5.0
Step: 4120 Training Loss: 3.167 Validation Loss: 4.822
Step: 4140 Training Loss: 3.117 Validation Loss: 4.895
Step: 4160 Training Loss: 3.153 Validation Loss: 5.004
Step: 4180 Training Loss: 3.213 Validation Loss: 4.874
Step: 4200 Training Loss: 2.952 Validation Loss: 4.93
Step: 4220 Training Loss: 3.089 Validation Loss: 5.009
Step: 4240 Training Loss: 2.934 Validation Loss: 5.001
Step: 4260 Training Loss: 3.035 Validation Loss: 5.085
Step: 4280 Training Loss: 2.786 Validation Loss: 4.974
Step: 4300 Training Loss: 3.009 Validation Loss: 4.948
Step: 4320 Training Loss: 2.893 Validation Loss: 5.033
Step: 4340 Training Loss: 2.859 Validation Loss: 4.889
Step: 4360 Training Loss: 3.022 Validation Loss: 4.746
Step: 4380 Training Loss: 2.983 Validation Loss: 5.146
Step: 4400 Training Loss: 3.125 Validation Loss: 4.891
Step: 4420 Training Loss: 3.003 Validation Loss: 5.253
Step: 4440 Training Loss: 2.952 Validation Loss: 5.039
Step: 4460 Training Loss: 3.043 Validation Loss: 4.736
Step: 4480 Training Loss: 2.811 Validation Loss: 5.291
Step: 4500 Training Loss: 2.927 Validation Loss: 4.883
Step: 4520 Training Loss: 2.983 Validation Loss: 4.685
Step: 4540 Training Loss: 3.092 Validation Loss: 4.898
Step: 4560 Training Loss: 3.034 Validation Loss: 4.876
Step: 4580 Training Loss: 3.036 Validation Loss: 5.188
Step: 4600 Training Loss: 2.715 Validation Loss: 4.858
Step: 4620 Training Loss: 3.009 Validation Loss: 5.125
Step: 4640 Training Loss: 2.923 Validation Loss: 4.92
Step: 4660 Training Loss: 2.869 Validation Loss: 4.923
Step: 4680 Training Loss: 2.809 Validation Loss: 5.075
Step: 4700 Training Loss: 3.002 Validation Loss: 5.103
Step: 4720 Training Loss: 2.921 Validation Loss: 5.054
Step: 4740 Training Loss: 2.81 Validation Loss: 5.074
Step: 4760 Training Loss: 2.951 Validation Loss: 5.228
Step: 4780 Training Loss: 2.919 Validation Loss: 4.913
Step: 4800 Training Loss: 2.953 Validation Loss: 5.215
Step: 4820 Training Loss: 3.022 Validation Loss: 4.832
Step: 4840 Training Loss: 2.766 Validation Loss: 5.119
Step: 4860 Training Loss: 2.898 Validation Loss: 5.103
Step: 4880 Training Loss: 2.977 Validation Loss: 4.885
Step: 4900 Training Loss: 3.036 Validation Loss: 5.128
Step: 4920 Training Loss: 2.913 Validation Loss: 4.799
Step: 4940 Training Loss: 2.966 Validation Loss: 4.863
Step: 4960 Training Loss: 2.723 Validation Loss: 4.828
Step: 4980 Training Loss: 2.752 Validation Loss: 4.666
Step: 4999 Training Loss: 2.828 Validation Loss: 5.13
---------------
The salesperson, the customer, the salesperson can effectively gather information, and ultimately increasing the likelihood of the sale.
1. Be mindful of reinforcing of-ended questions are identified persuasive and manipulative use can significantly impact, demonstrating genuine interest requires a deeper level that we have successfully suit their approach to share their responses. Some customers, while some the root cause for them, showcasing patterns, sales professionals can meet their concerns and increase their requirements.
When faced with your product or service is not just about attacks but crafted situations
---------------
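
The loss values printed above are also collected in the tracked_losses list, which the script never plots even though matplotlib is installed. Below is a minimal sketch for visualizing the curve, assuming it runs in the same session right after the training loop (so tracked_losses, eval_iters and max_iters are still in scope):

import matplotlib.pyplot as plt

# Steps at which estimate_loss() was called, matching the condition in the training loop
eval_steps = [s for s in range(max_iters) if s % eval_iters == 0 or s == max_iters - 1]
train_curve = [losses['train'].item() for losses in tracked_losses]
valid_curve = [losses['valid'].item() for losses in tracked_losses]

plt.plot(eval_steps, train_curve, label='train loss')
plt.plot(eval_steps, valid_curve, label='validation loss')
plt.xlabel('step')
plt.ylabel('cross-entropy loss')
plt.legend()
plt.show()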

View or run it online: colab.research.google.com/drive/1hvgn…
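
The training script saves the weights to model-ckpt.pt, so they can be reloaded later without retraining. Here is a minimal sketch, assuming the class definitions, tokenizer and hyperparameters from the script above (including max_token_value, which depends on the same tokenized text) are available in the current session:

model = TransformerLanguageModel()
model.load_state_dict(torch.load('model-ckpt.pt', map_location=device))
model = model.to(device)
model.eval()

prompt = 'The salesperson'  # any short prompt works here
idx = torch.tensor(encoding.encode(prompt), dtype=torch.long, device=device)[None, ...]
with torch.no_grad():
    generated = model.generate(idx, max_new_tokens=100)
print(encoding.decode(generated[0].tolist()))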

Originality notice: this article is my own original work, first published on AI ONES wuxiongwei.com. If you repost it, please keep a link to this article. Thank you.