Embedding techniques (personally, I see them as a form of dimensionality reduction)
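The main uses of embeddings in deep learning models, large language models included, are:
- As an embedding layer built into the model, converting high-dimensional sparse features into low-dimensional dense features (as in models such as Wide&Deep and DeepFM);
- As pretrained embedding feature vectors that are concatenated with other feature vectors and fed together into the model as training input (as in FNN).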
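The two uses listed above can be sketched with NumPy. This is only an illustration under made-up assumptions, not code from any of the models named: the vocabulary size, the embedding width, the random `embedding_table`, and the `other_features` array are all hypothetical. It shows that an embedding layer amounts to a row lookup (equivalent to multiplying a one-hot vector by a weight matrix), and that a pretrained vector can be concatenated with other features to form a model input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, chosen only for illustration.
vocab_size, embed_dim = 1000, 8
# Random values stand in for weights that a real model would learn.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Use 1: embedding layer, sparse one-hot vector -> dense vector.
category_id = 42
one_hot = np.zeros(vocab_size)
one_hot[category_id] = 1.0

dense_via_matmul = one_hot @ embedding_table   # 1000-dim sparse -> 8-dim dense
dense_via_lookup = embedding_table[category_id]  # same result, without the multiply

# Use 2: a pretrained embedding concatenated with other features.
pretrained_vec = dense_via_lookup
other_features = np.array([0.3, 1.0, 0.0])  # hypothetical numeric features
model_input = np.concatenate([pretrained_vec, other_features])

print(np.allclose(dense_via_matmul, dense_via_lookup))
print(model_input.shape)
```

The lookup form is what frameworks typically use in practice, since it avoids materializing the one-hot vector at all.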
import pandas as pd  # dataframe manipulation
import numpy as np  # linear algebra
from sentence_transformers import SentenceTransformer

df = pd.read_csv("data/train.csv", sep=";")

# -------------------- First Step --------------------
# Turn each row into a short text description that the sentence encoder can read.
def compile_text(x):
    text = (
        f"Age: {x['age']},\n"
        f"Housing loan: {x['housing']},\n"
        f"Job: {x['job']},\n"
        f"Marital: {x['marital']},\n"
        f"Education: {x['education']},\n"
        f"Default: {x['default']},\n"
        f"Balance: {x['balance']},\n"
        f"Personal loan: {x['loan']},\n"
        f"Contact: {x['contact']}\n"
    )
    return text

sentences = df.apply(compile_text, axis=1).tolist()

# -------------------- Second Step --------------------
# Encode every description into a dense vector; normalize_embeddings=True
# rescales each vector to unit length.
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
output = model.encode(sentences=sentences,
                      show_progress_bar=True,
                      normalize_embeddings=True)

# One row per input sentence, one column per embedding dimension.
df_embedding = pd.DataFrame(output)
df_embedding  # in a notebook, this displays the resulting table
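Because `normalize_embeddings=True` is passed to `encode`, each embedding row should have unit length, so the cosine similarity between two rows reduces to a plain dot product. The sketch below illustrates that property with random stand-in vectors rather than real model output; the array names `raw` and `emb`, the 5 x 8 shape, and the helper `cosine` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for model output: 5 "sentences", 8 dimensions (arbitrary sizes).
raw = rng.normal(size=(5, 8))

# Rescale each row to unit length, mimicking normalized embeddings.
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# For unit-length rows, the matrix of dot products is the cosine similarity matrix.
similarity = emb @ emb.T


def cosine(a, b):
    """Cosine similarity computed the long way, for comparison."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Index of the row most similar to row 0, skipping row 0 itself.
most_similar_to_first = int(np.argsort(-similarity[0])[1])

print(np.isclose(similarity[0, 1], cosine(raw[0], raw[1])))
print(most_similar_to_first)
```

With real embeddings, the same `emb @ emb.T` product would apply to `output` or `df_embedding.to_numpy()`, which is convenient for nearest-neighbour lookups or as input to a clustering step.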