Learning AI


Embedding techniques (personally, I see them as a means of dimensionality reduction)

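Embeddings are mainly used in deep learning models in two ways:

  1. As an embedding layer built into the model, converting high-dimensional sparse features into low-dimensional dense features (as in Wide&Deep, DeepFM, and similar models);
  2. As pretrained embedding feature vectors, concatenated with other feature vectors and fed together as model input for training (as in FNN).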
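
As a rough illustration of the two uses above, here is a sketch that needs only NumPy. The vocabulary size, embedding width, random weight table, and extra numeric features are all made-up values for demonstration; in a real model the table would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000  # hypothetical number of distinct category values
embed_dim = 8      # hypothetical width of the dense vector

# Embedding table: one dense row per category (random here, learned in practice)
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Use 1: sparse one-hot input -> dense low-dimensional vector
category_id = 42
one_hot = np.zeros(vocab_size)
one_hot[category_id] = 1.0

dense_via_matmul = one_hot @ embedding_table     # one-hot times the table
dense_via_lookup = embedding_table[category_id]  # equivalent row lookup

# Use 2: concatenate an embedding vector with other feature vectors
other_features = np.array([0.5, 1.0, 3.2])       # hypothetical numeric features
model_input = np.concatenate([dense_via_lookup, other_features])

print(dense_via_matmul.shape, model_input.shape)
```

The matrix product and the row lookup give the same vector, which is why an embedding layer is usually implemented as a lookup instead of a multiplication with a mostly-zero input.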
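
The example below turns each row of a table into a short text and encodes it with a sentence-transformers model:

import pandas as pd  # DataFrame manipulation
import numpy as np  # linear algebra
from sentence_transformers import SentenceTransformer

# The input file is semicolon-separated
df = pd.read_csv("data/train.csv", sep=";")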
# -------------------- First Step --------------------
# Turn one row into a text description of that record
def compile_text(x):
    text = f"""Age: {x['age']},
Housing loan: {x['housing']},
Job: {x['job']},
Marital: {x['marital']},
Education: {x['education']},
Default: {x['default']},
Balance: {x['balance']},
Personal loan: {x['loan']},
Contact: {x['contact']}
"""
    return text

# One sentence per row
sentences = df.apply(compile_text, axis=1).tolist()
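# -------------------- Second Step --------------------
# Encode every sentence into a dense vector
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
output = model.encode(
    sentences=sentences,
    show_progress_bar=True,
    normalize_embeddings=True,  # scale each vector to unit length
)
# One row per sentence, one column per embedding dimension
df_embedding = pd.DataFrame(output)
df_embedding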
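
Because `normalize_embeddings=True` scales each vector to unit length, the dot product of two rows equals their cosine similarity. A minimal sketch of that property follows. The `fake_embeddings` array is random stand-in data, not real model output, so the snippet runs without downloading the model; its 384 columns match the output width of paraphrase-MiniLM-L6-v2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the encoder output: 5 rows, 384 dimensions (random, not real)
fake_embeddings = rng.normal(size=(5, 384))

# Scale every row to unit length, as normalize_embeddings=True does
fake_embeddings /= np.linalg.norm(fake_embeddings, axis=1, keepdims=True)

# For unit-length rows, the dot product is the cosine similarity
similarity = fake_embeddings @ fake_embeddings.T  # shape (5, 5)

# Rank rows by similarity to row 0; position 0 is row 0 itself, so take position 1
most_similar_to_first = int(np.argsort(-similarity[0])[1])

print(similarity.shape, most_similar_to_first)
```

With real data, `df_embedding.to_numpy()` would take the place of `fake_embeddings`.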