继舆情系统的设计之后,我们需要基于其舆情数据进行销量预测和智能补货的数据建模。于是有此系列
舆情系列:
目录
系统概述
1.1 核心问题
大宗商品供应链企业面临的库存管理困境:
| 问题维度 | 表现形式 | 影响 | 预期改善 |
|---|---|---|---|
| 需求波动 | 订单量剧烈变化 (+50% ~ -30%) | 库存积压或缺货 | 预测精度>85% |
| 舆情驱动 | 地缘风险、供给中断导致采购激增 | 被动应对,库存失控 | 提前7天预警 |
| 价格波动 | 商品期货价格日波幅10%+ | 定价滞后,利润被侵蚀 | 收益提升12-18% |
| 供应链断裂 | 港口罢工、制裁导致补货延迟 | 应急成本高、计划无效 | 多源补货,成本-8% |
1.2 系统架构概览
舆情指标输入 ──┐
历史销量数据 ──┤
商品期货价格 ──┼──→ [销量预测模型] ──→ [预测订单]
库存水位 ─────┤ (XGBoost)
天气/季节性 ──┘
预测订单 ─┐
库存成本 ─┼──→ [智能补货决策] ──→ [采购方案]
供应约束 ─┤ (动态规划) [供应商选择]
└──→ [动态定价引擎] ──→ [价格策略]
(强化学习) [收益优化]
输出: Doris决策支持 → BI仪表板 → 交易/物流系统
需求分析与商业价值
2.1 与舆情系统的数据融合
从MGIR系统获取的关键指标用于预测补正:
# 舆情风险指标对库存决策的影响
舆情输入:
├─ LDI (物流延迟指数) 0-10
│ └─ 当LDI>7: 建议增加安全库存 +30%
│ └─ 供应商交期 +5-10天
│
├─ CRI (合规风险指数) 0-10
│ └─ 当CRI>6: 激活多源补货策略
│ └─ 关税成本 +2-8%
│
├─ GRI (地缘政治指数) 0-10
│ └─ 当GRI>7且持续: 预判采购商囤积行为
│ └─ 需求激增预测 +15-40%
│
└─ FRI (金融风险指数) 0-10
└─ 当FRI>7: 供应商信用评估下调
└─ 结算周期风险 +20%
预测修正公式:
预测需求_修正 = 基础预测 × (1 + ∑威胁系数) × (1 - 机会抵消)
威胁系数计算:
= (LDI/10) × 0.15 + (CRI/10) × 0.12 + (GRI/10) × 0.20 + (FRI/10) × 0.08
范围: 0-0.55 (55%的需求增长可能)
2.2 商业价值量化
投资成本:
├─ 基础设施 (GPU服务器, 存储): ¥150万
├─ 人力 (数据科学家, 工程师): ¥100万/年
└─ 维护运营: ¥50万/年
总投资: ¥300万 (第一年)
预期收益 (年度):
├─ 库存优化
│ ├─ 减少积压成本: ¥500万 (库存周转期↓20%)
│ ├─ 降低缺货损失: ¥300万 (缺货率↓25%)
│ └─ 资金释放: ¥1000万 (库存资金↓15%)
│
├─ 定价优化
│ ├─ 动态定价收益: ¥800万 (毛利↑8%)
│ └─ 需求侧管理: ¥400万 (高价期销量↑)
│
└─ 供应链效率
├─ 采购成本节省: ¥300万 (供应商竞争)
└─ 仓储运输优化: ¥200万
总收益: ¥3500万/年
ROI: 11.7倍 (第二年ROI 12倍+)
销量预测模型
3.1 预测架构
输入特征矩阵:
├─ 时间特征 (8维)
│ ├─ 日期、星期几、月份、季度
│ ├─ 是否假期、假期前后天数
│ ├─ 年度日期序号、周数
│
├─ 历史销量特征 (12维)
│ ├─ 过去7/14/30天平均销量
│ ├─ 同比增长率 (YoY)
│ ├─ 环比增长率 (MoM)
│ ├─ 销量方差、峰值、谷值
│ ├─ 销量趋势 (线性回归斜率)
│
├─ 库存特征 (6维)
│ ├─ 当前库存量、库存率 (库存/日均需求)
│ ├─ 库存龄 (最旧商品存放天数)
│ ├─ 库存周转率、超期库存率
│
├─ 价格特征 (5维)
│ ├─ 商品期货价格、价格涨跌幅
│ ├─ 价格同比、近30天波幅
│ ├─ 本公司当前报价与市场价差
│
├─ 舆情特征 (8维) ⭐ 关键
│ ├─ LDI (物流延迟指数)
│ ├─ CRI (合规风险指数)
│ ├─ GRI (地缘政治指数)
│ ├─ FRI (金融风险指数)
│ ├─ 近7天新增高风险事件数
│ ├─ 风险趋势 (风险指数变化率)
│ ├─ 需求侧情感指数 (采购商社交媒体/邮件情感)
│ ├─ 舆情突发指标 (新闻事件数/异常度)
│
├─ 市场竞争特征 (4维)
│ ├─ 竞争对手价格、库存水位
│ ├─ 市场集中度、供应充分率
│
└─ 采购商维度特征 (6维)
├─ 采购商类型、规模、地理位置
├─ 采购商历史订单行为
├─ 采购商风险承受度、信用等级
3.2 XGBoost模型设计
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TimeSeriesSplit
import warnings
warnings.filterwarnings('ignore')
class DemandForecastingModel:
"""
基于XGBoost的大宗商品销量预测模型
融合舆情数据的需求预测能力
"""
def __init__(self, commodity_type: str, region: str):
self.commodity_type = commodity_type # 钢材、石油、谷物等
self.region = region # 采购商地区
self.model = None
self.scaler = StandardScaler()
self.feature_importance = None
def prepare_features(self, data_df: pd.DataFrame) -> pd.DataFrame:
"""
特征工程: 从原始数据生成模型输入特征
输入:
data_df: 包含销量、价格、舆情等时间序列数据
columns: [date, sales, inventory, price, ldi, cri, gri, fri, ...]
输出:
feature_matrix: (样本数, 43维特征)
"""
df = data_df.copy().sort_values('date').reset_index(drop=True)
# ===== 时间特征 =====
df['date'] = pd.to_datetime(df['date'])
df['dayofweek'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter
df['dayofyear'] = df['date'].dt.dayofyear
df['weekofyear'] = df['date'].dt.isocalendar().week
# 假期标记 (示例: 中国假期)
holiday_dates = [
'2025-01-29', '2025-01-30', '2025-01-31', '2025-02-01', '2025-02-02', # 春节
'2025-04-04', '2025-04-05', '2025-04-06', # 清明
'2025-06-10', # 端午
'2025-09-17', # 中秋
'2025-10-01', '2025-10-02', '2025-10-03', '2025-10-04', '2025-10-05', '2025-10-06', '2025-10-07' # 国庆
]
df['is_holiday'] = df['date'].isin(pd.to_datetime(holiday_dates)).astype(int)
df['days_to_holiday'] = df['date'].apply(
lambda x: min([abs((x - pd.to_datetime(h)).days) for h in holiday_dates]) if holiday_dates else 365
)
# ===== 历史销量特征 (滚动统计) =====
for window in [7, 14, 30]:
df[f'sales_ma_{window}'] = df['sales'].rolling(window=window, min_periods=1).mean()
df[f'sales_std_{window}'] = df['sales'].rolling(window=window, min_periods=1).std().fillna(0)
df[f'sales_pct_change_{window}'] = df['sales'].pct_change(periods=window).fillna(0)
# 同比增长 (与1年前对比)
if len(df) > 365:
df['yoy_growth'] = (df['sales'] - df['sales'].shift(365)) / (df['sales'].shift(365) + 1)
df['yoy_growth'] = df['yoy_growth'].fillna(0)
else:
df['yoy_growth'] = 0
# 趋势特征 (近30天销量的线性趋势)
def calc_trend(x):
if len(x) < 2:
return 0
y = np.array(x)
x_range = np.arange(len(y))
z = np.polyfit(x_range, y, 1)
return z[0] # 斜率
df['sales_trend_30'] = df['sales'].rolling(window=30, min_periods=2).apply(
lambda x: calc_trend(x), raw=False
).fillna(0)
# ===== 库存特征 =====
df['inventory_pct'] = df['inventory'] / (df['sales_ma_30'] + 1) # 库存占需求天数
df['inventory_age_days'] = df['inventory'].rolling(window=30, min_periods=1).apply(
lambda x: np.sum(x) / (np.sum(x) + 0.01) if np.sum(x) > 0 else 0
)
df['inventory_turnover_30'] = df['sales_ma_30'] / (df['inventory'] + 1)
# ===== 价格特征 =====
df['price_pct_change'] = df['price'].pct_change().fillna(0)
df['price_ma_7'] = df['price'].rolling(window=7).mean().fillna(method='bfill')
df['price_volatility_30'] = df['price'].rolling(window=30).std().fillna(0)
# 本公司报价与市场价差 (假设有benchmark_price列)
if 'benchmark_price' in df.columns:
df['price_vs_market'] = (df['price'] - df['benchmark_price']) / (df['benchmark_price'] + 1)
else:
df['price_vs_market'] = 0
# ===== 舆情特征 (MGIR指标) ⭐ =====
for col in ['ldi', 'cri', 'gri', 'fri']:
if col in df.columns:
# 原始指标值
df[f'{col}_normalized'] = df[col] / 10.0 # 归一化到0-1
# 指标变化率
df[f'{col}_change'] = df[col].diff().fillna(0)
# 高风险事件触发 (指标突增)
df[f'{col}_spike'] = (df[f'{col}_change'] > df[f'{col}_change'].std() * 2).astype(int)
# 舆情综合压力指数 (权重组合)
if all(col in df.columns for col in ['ldi', 'cri', 'gri', 'fri']):
df['sentiment_pressure_index'] = (
0.25 * (df['ldi'] / 10) + # 物流延迟影响采购延迟
0.20 * (df['cri'] / 10) + # 合规风险影响成本
0.35 * (df['gri'] / 10) + # 地缘政治驱动囤积行为 (权重最高)
0.20 * (df['fri'] / 10) # 金融风险影响订单确定性
)
else:
df['sentiment_pressure_index'] = 0
# 舆情驱动的需求异常指标
if 'gri' in df.columns:
df['demand_surge_signal'] = (df['gri'] > 6).astype(int)
# ===== 市场特征 =====
if 'competitor_price' in df.columns:
df['price_competitiveness'] = (df['competitor_price'] - df['price']) / (df['competitor_price'] + 1)
# ===== 采购商维度特征 =====
if 'buyer_segment' in df.columns: # 采购商等级: A/B/C
buyer_multipliers = {'A': 1.2, 'B': 1.0, 'C': 0.8}
df['buyer_segment_weight'] = df['buyer_segment'].map(buyer_multipliers)
# ===== 填补缺失值 =====
df = df.fillna(method='bfill').fillna(method='ffill').fillna(0)
return df
def train(self, df: pd.DataFrame, target_col: str = 'sales',
test_size: int = 90):
"""
时间序列交叉验证训练
参数:
df: 特征化后的数据框
target_col: 目标列名
test_size: 测试集大小 (天数)
"""
feature_cols = [col for col in df.columns if col not in ['date', 'sales', 'price', 'inventory']]
X = df[feature_cols].values
y = df[target_col].values
# 标准化特征
X_scaled = self.scaler.fit_transform(X)
# 时间序列交叉验证 (避免数据泄漏)
tscv = TimeSeriesSplit(n_splits=5)
cv_scores = []
for train_idx, val_idx in tscv.split(X_scaled):
X_train, X_val = X_scaled[train_idx], X_scaled[val_idx]
y_train, y_val = y[train_idx], y[val_idx]
# XGBoost配置 (针对回归任务优化)
params = {
'objective': 'reg:squarederror',
'booster': 'gbtree',
'max_depth': 7,
'eta': 0.1,
'subsample': 0.8,
'colsample_bytree': 0.8,
'min_child_weight': 5,
'lambda': 1.0, # L2正则
'alpha': 0.1, # L1正则
'tree_method': 'gpu_hist', # GPU加速
'gpu_id': 0,
'random_state': 42
}
# 训练
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
model = xgb.train(
params,
dtrain,
num_boost_round=200,
evals=[(dval, 'validation')],
early_stopping_rounds=20,
verbose_eval=False
)
# 验证集评估
y_pred = model.predict(dval)
mae = np.mean(np.abs(y_pred - y_val))
mape = np.mean(np.abs((y_pred - y_val) / (y_val + 1)))
cv_scores.append({'mae': mae, 'mape': mape})
# 使用全量数据进行最终训练
X_train_full = X_scaled[:len(X_scaled) - test_size]
y_train_full = y[:len(y) - test_size]
dtrain_full = xgb.DMatrix(X_train_full, label=y_train_full)
self.model = xgb.train(
params,
dtrain_full,
num_boost_round=200,
verbose_eval=False
)
# 提取特征重要度
self.feature_importance = self.model.get_score(importance_type='weight')
print(f"Cross-validation Results:")
print(f" Avg MAE: {np.mean([s['mae'] for s in cv_scores]):.2f}")
print(f" Avg MAPE: {np.mean([s['mape'] for s in cv_scores]):.2%}")
return self
def predict(self, df: pd.DataFrame, forecast_days: int = 7) -> pd.DataFrame:
"""
预测未来N天的销量
参数:
df: 包含最新特征的数据框 (最后一行为当前日期)
forecast_days: 预测天数
返回:
forecast_df: 包含预测值和置信区间的数据框
"""
predictions = []
forecast_df = df.copy()
for _ in range(forecast_days):
# 准备预测特征
feature_cols = [col for col in df.columns if col not in ['date', 'sales', 'price', 'inventory']]
X_latest = forecast_df.iloc[-1:][feature_cols].values
X_scaled = self.scaler.transform(X_latest)
# 预测
dmatrix = xgb.DMatrix(X_scaled)
pred = self.model.predict(dmatrix)[0]
# 应用下界 (销量不能为负)
pred = max(0, pred)
predictions.append(pred)
# 更新下一天的特征 (用于滚动预测)
# 这里简化处理: 假设销量按预测值更新
new_row = forecast_df.iloc[-1].copy()
new_row['date'] = pd.to_datetime(new_row['date']) + pd.Timedelta(days=1)
new_row['sales'] = pred
# 更新滚动特征
new_row['sales_ma_7'] = forecast_df.tail(7)['sales'].mean()
forecast_df = pd.concat([forecast_df, pd.DataFrame([new_row])], ignore_index=True)
# 构建预测结果DataFrame
result = pd.DataFrame({
'forecast_date': pd.date_range(start=df['date'].iloc[-1] + pd.Timedelta(days=1), periods=forecast_days),
'predicted_sales': predictions,
'lower_bound': [max(0, p * 0.85) for p in predictions], # 下界置信区间
'upper_bound': [p * 1.15 for p in predictions], # 上界置信区间
'confidence': [0.85 + 0.1 * (1 - i / forecast_days) for i in range(forecast_days)] # 长期预测置信度递减
})
return result
def get_feature_importance_report(self, top_n: int = 15) -> pd.DataFrame:
"""
获取特征重要度报告 (用于理解模型决策)
"""
if self.feature_importance is None:
return pd.DataFrame()
importance_df = pd.DataFrame(
list(self.feature_importance.items()),
columns=['feature', 'importance']
).sort_values('importance', ascending=False).head(top_n)
# 标记舆情特征
importance_df['is_sentiment'] = importance_df['feature'].str.contains('ldi|cri|gri|fri|sentiment')
return importance_df
# ===== 使用示例 =====
if __name__ == '__main__':
# 加载数据 (包含销量、价格、舆情指标等)
df = pd.read_csv('/data/commodity_sales_with_sentiment.csv')
# 创建钢材华东地区的预测模型
model = DemandForecastingModel(commodity_type='steel', region='east_china')
# 特征工程
df_features = model.prepare_features(df)
# 训练模型
model.train(df_features, target_col='sales', test_size=90)
# 获取特征重要度
print("\n特征重要度 (Top 15):")
importance = model.get_feature_importance_report()
print(importance)
# 预测未来7天
forecast = model.predict(df_features, forecast_days=7)
print("\n未来7天销量预测:")
print(forecast)
3.3 模型精度与舆情敏感性分析
def analyze_sentiment_impact(model_output: pd.DataFrame,
sentiment_df: pd.DataFrame) -> dict:
"""
分析舆情特征对预测的影响程度
"""
# 基于特征重要度的舆情贡献度
importance = model.get_feature_importance_report(top_n=50)
sentiment_features = importance[importance['is_sentiment']]
total_importance = importance['importance'].sum()
sentiment_contribution = sentiment_features['importance'].sum() / total_importance
return {
'sentiment_contribution_ratio': sentiment_contribution, # 舆情特征的总重要度占比
'key_sentiment_features': sentiment_features.head(5).to_dict('records'),
'model_interpretation': f"""
模型中舆情特征 (LDI/CRI/GRI/FRI) 对销量预测的贡献度为 {sentiment_contribution:.1%}。
其中:
- GRI (地缘政治指数) 影响最大: 驱动采购商囤积行为
- LDI (物流延迟指数) 其次: 影响补货紧迫性
- CRI (合规风险指数) 影响成本结构
- FRI (金融风险指数) 影响订单确定性
当舆情压力指数 > 0.6 时,模型会自动调高需求预测 (+15% ~ +40%)
"""
}
智能补货决策引擎
4.1 补货模型架构
库存当前状态 ─┐
补货成本 ─────┼──→ [动态规划优化] ──→ [补货方案]
补货时间 ─────┤ (Integer LP) ├─ 补货数量
销量预测 ─────┤ ├─ 补货时机
供应约束 ─────┤ ├─ 供应商选择
舆情风险 ─────┘
目标函数:
min(库存成本 + 缺货成本 + 供应商多源溢价)
约束条件:
- 库存 ≥ 安全库存 (risk-adjusted)
- 补货量 ≥ MOQ (最小订购量)
- 交期约束 (考虑舆情延迟)
- 供应商容量约束
- 现金流约束 (支付能力)
4.2 PuLP实现
from pulp import *
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
class IntelligentReplenishmentEngine:
"""
智能补货决策引擎
- 整合舆情指标优化安全库存
- 多供应商模式应对断裂风险
- 考虑资金成本与缺货成本的权衡
"""
def __init__(self, commodity_code: str, buyer_geo_key: str):
self.commodity_code = commodity_code # 商品代码
self.buyer_geo_key = buyer_geo_key # 采购商地理位置
self.problem = None
self.decision_vars = {}
def calculate_safety_stock(self,
avg_demand: float,
demand_std: float,
lead_time_days: int,
service_level: float = 0.95,
sentiment_indices: dict = None) -> float:
"""
计算风险调整后的安全库存
基础安全库存公式 (EOQ变体):
SS = Z_α × σ_demand × sqrt(L)
舆情调整:
- LDI增高 → lead_time增加 → SS↑
- GRI增高 → 需求波动增加 → SS↑
- CRI增高 → 供应不确定↑ → SS↑
"""
# 服务水平对应的标准差倍数
z_scores = {0.90: 1.28, 0.95: 1.645, 0.99: 2.33}
z = z_scores.get(service_level, 1.645)
# 基础安全库存
base_ss = z * demand_std * np.sqrt(lead_time_days)
# 舆情调整因子
adjustment_factor = 1.0
if sentiment_indices:
# LDI增加的配送时间
if 'ldi' in sentiment_indices:
ldi_adjusted_lead_time = lead_time_days * (1 + sentiment_indices['ldi'] / 10 * 0.3)
base_ss = z * demand_std * np.sqrt(ldi_adjusted_lead_time)
# GRI增加的需求不确定性
if 'gri' in sentiment_indices:
gri_factor = sentiment_indices['gri'] / 10 # 0-1
demand_std_adjusted = demand_std * (1 + gri_factor * 0.4)
base_ss = z * demand_std_adjusted * np.sqrt(lead_time_days)
# CRI增加的供应风险
if 'cri' in sentiment_indices:
cri_factor = sentiment_indices['cri'] / 10
adjustment_factor *= (1 + cri_factor * 0.2)
safety_stock = base_ss * adjustment_factor
return safety_stock
def optimize_replenishment(self,
current_inventory: float,
forecast_demand: np.ndarray, # 未来30天的日需求
holding_cost_per_unit: float, # 库存成本 (¥/单位/天)
stockout_cost_per_unit: float, # 缺货成本 (¥/单位)
suppliers: list, # 供应商列表
sentiment_indices: dict = None) -> dict:
"""
使用整数线性规划求解最优补货方案
参数示例:
current_inventory: 100 (吨)
forecast_demand: [50, 55, 45, ..., 60] (30天)
holding_cost: 50 (¥/吨/天)
stockout_cost: 2000 (¥/吨)
suppliers: [
{
'name': '供应商A',
'unit_price': 3000, # ¥/吨
'lead_time': 5, # 天
'capacity': 500, # 吨/月
'moq': 50, # 最小订购量
'quality_score': 0.95,
'reliability': 0.9
},
...
]
"""
days = len(forecast_demand)
n_suppliers = len(suppliers)
# 创建LP问题
prob = LpProblem("Replenishment_Optimization", LpMinimize)
# 决策变量
# x[i][j]: 从供应商j在第i天的订购量 (吨)
x = [[LpVariable(f"order_day{i}_supplier{j}", lowBound=0, cat='Continuous')
for j in range(n_suppliers)] for i in range(days)]
# h[i]: 第i天末的库存量 (吨)
h = [LpVariable(f"inventory_day{i}", lowBound=0, cat='Continuous') for i in range(days)]
# s[i]: 第i天的缺货量 (吨)
s = [LpVariable(f"shortage_day{i}", lowBound=0, cat='Continuous') for i in range(days)]
# y[i][j]: 是否从供应商j在第i天订购 (二元变量, 用于MOQ约束)
y = [[LpVariable(f"order_flag_day{i}_supplier{j}", cat='Binary')
for j in range(n_suppliers)] for i in range(days)]
# ===== 目标函数 =====
# 最小化: 库存持有成本 + 缺货损失 + 订货次数溢价 (多源溢价)
cost_holding = lpSum([h[i] * holding_cost_per_unit for i in range(days)])
cost_shortage = lpSum([s[i] * stockout_cost_per_unit for i in range(days)])
# 多源溢价 (从>1个供应商订购时增加成本5%)
multi_source_premium = lpSum([
y[i][j] * suppliers[j]['unit_price'] * 0.05 * lpSum([x[i][j_] for j_ in range(n_suppliers)])
for i in range(days) for j in range(n_suppliers)
])
prob += cost_holding + cost_shortage + multi_source_premium, "Total_Cost"
# ===== 约束条件 =====
# 1. 库存平衡约束
for i in range(days):
if i == 0:
# 第一天: 当前库存 + 订单 - 需求 = 库存 + 缺货
prob += (
current_inventory +
lpSum([x[i][j] for j in range(n_suppliers)]) -
forecast_demand[i] == h[i] + s[i],
f"Balance_Day{i}"
)
else:
# 后续天: 前日库存 + 订单 - 需求 = 库存 + 缺货
prob += (
h[i-1] +
lpSum([x[i][j] for j in range(n_suppliers)]) -
forecast_demand[i] == h[i] + s[i],
f"Balance_Day{i}"
)
# 2. 最小订购量 (MOQ) 约束
for i in range(days):
for j in range(n_suppliers):
moq = suppliers[j]['moq']
# 如果订购,则订购量 >= MOQ
prob += (
x[i][j] >= moq * y[i][j],
f"MOQ_Day{i}_Supplier{j}"
)
# 最大订购量 (供应商容量)
prob += (
x[i][j] <= suppliers[j]['capacity'] * y[i][j],
f"Capacity_Day{i}_Supplier{j}"
)
# 3. 安全库存约束 (风险调整)
avg_demand = np.mean(forecast_demand)
demand_std = np.std(forecast_demand)
# 计算风险调整的安全库存
safety_stock = self.calculate_safety_stock(
avg_demand, demand_std,
lead_time_days=suppliers[0]['lead_time'], # 使用最快的供应商
sentiment_indices=sentiment_indices
)
# 平均库存 >= 安全库存
avg_inventory = lpSum([h[i] for i in range(days)]) / days
prob += avg_inventory >= safety_stock, "Safety_Stock_Constraint"
# 4. 供应商可靠性约束 (考虑舆情风险)
for i in range(days):
for j in range(n_suppliers):
# 高风险环境下,不完全依赖单一供应商
if sentiment_indices and sentiment_indices.get('cri', 0) > 6:
# CRI>6时,单个供应商订购量不超过预测需求的60%
prob += (
x[i][j] <= forecast_demand[i] * 0.6,
f"Multi_Source_Diversity_Day{i}_Supplier{j}"
)
# 5. 缺货容忍度约束 (特定采购商)
# 采购商等级A: 不允许缺货 (s[i]=0)
# 采购商等级B: 缺货<平均需求的5%
# 采购商等级C: 缺货<平均需求的10%
buyer_grades = {'A': 0.0, 'B': 0.05, 'C': 0.10}
buyer_grade = 'B' # 默认B级
max_shortage_ratio = buyer_grades.get(buyer_grade, 0.05)
prob += (
lpSum([s[i] for i in range(days)]) <= avg_demand * max_shortage_ratio * days,
"Shortage_Tolerance"
)
# ===== 求解 =====
prob.solve(PULP_CBC_CMD(msg=0))
# ===== 提取结果 =====
replenishment_plan = []
for i in range(days):
day_orders = []
for j in range(n_suppliers):
order_qty = x[i][j].varValue
if order_qty and order_qty > 0.1: # 订购量>0.1单位
day_orders.append({
'supplier': suppliers[j]['name'],
'quantity': round(order_qty, 2),
'unit_price': suppliers[j]['unit_price'],
'total_cost': round(order_qty * suppliers[j]['unit_price'], 2),
'lead_time': suppliers[j]['lead_time']
})
if day_orders: # 只记录有订单的天
replenishment_plan.append({
'day': i + 1,
'orders': day_orders,
'inventory_end_of_day': round(h[i].varValue, 2),
'shortage': round(s[i].varValue, 2)
})
# 总成本分析
total_cost = value(prob.objective)
holding_cost_actual = sum([h[i].varValue * holding_cost_per_unit for i in range(days)])
shortage_cost_actual = sum([s[i].varValue * stockout_cost_per_unit for i in range(days)])
return {
'status': LpStatus[prob.status],
'optimization_successful': LpStatus[prob.status] == 'Optimal',
'total_cost': round(total_cost, 2),
'cost_breakdown': {
'holding_cost': round(holding_cost_actual, 2),
'shortage_cost': round(shortage_cost_actual, 2),
'multi_source_premium': round(total_cost - holding_cost_actual - shortage_cost_actual, 2)
},
'replenishment_plan': replenishment_plan,
'summary': {
'total_orders_days': len(replenishment_plan),
'avg_inventory_level': round(np.mean([h[i].varValue for i in range(days)]), 2),
'total_shortage': round(sum([s[i].varValue for i in range(days)]), 2),
'safety_stock_calculated': round(safety_stock, 2)
}
}
# ===== 使用示例 =====
if __name__ == '__main__':
engine = IntelligentReplenishmentEngine(
commodity_code='STEEL_REBAR',
buyer_geo_key='CHINA_EAST_001'
)
# 供应商配置
suppliers = [
{
'name': '鞍山供应商A',
'unit_price': 3000,
'lead_time': 3,
'capacity': 500,
'moq': 50,
'quality_score': 0.95,
'reliability': 0.92
},
{
'name': '宝山供应商B',
'unit_price': 3100,
'lead_time': 5,
'capacity': 400,
'moq': 40,
'quality_score': 0.90,
'reliability': 0.85
},
{
'name': '日本进口供应商C',
'unit_price': 3500,
'lead_time': 20,
'capacity': 200,
'moq': 100,
'quality_score': 0.98,
'reliability': 0.95
}
]
# 舆情数据
sentiment = {
'ldi': 5, # 物流延迟中等
'cri': 6, # 合规风险高
'gri': 3, # 地缘政治风险低
'fri': 4 # 金融风险中等
}
# 预测需求 (30天)
forecast = np.array([
50, 55, 45, 60, 52, 48, 65, 70, 55, 50,
58, 62, 51, 49, 67, 72, 58, 53, 61, 68,
54, 50, 59, 64, 52, 47, 63, 69, 55, 51
])
# 求解
result = engine.optimize_replenishment(
current_inventory=100,
forecast_demand=forecast,
holding_cost_per_unit=50,
stockout_cost_per_unit=2000,
suppliers=suppliers,
sentiment_indices=sentiment
)
print("优化结果:")
print(f" 优化状态: {result['status']}")
print(f" 总成本: ¥{result['total_cost']}")
print(f" 其中: 库存成本 ¥{result['cost_breakdown']['holding_cost']}, " +
f"缺货成本 ¥{result['cost_breakdown']['shortage_cost']}")
print(f" 采购计划: 共{result['summary']['total_orders_days']}天有订单")
动态定价与收益优化
5.1 定价模型
基于货运动态定价模型,扩展应用到大宗商品的销售定价:
class DynamicPricingEngine:
"""
大宗商品动态定价引擎
综合考虑:
- 需求预测
- 库存压力
- 竞争价格
- 舆情风险
- 价格弹性
"""
def __init__(self, commodity_code: str):
self.commodity_code = commodity_code
def estimate_price_elasticity(self, historical_data: pd.DataFrame) -> float:
"""
估算商品的价格弹性系数 (Price Elasticity of Demand)
公式: E = (ΔQ / Q) / (ΔP / P)
大宗商品特性:
- 钢材: E ≈ -0.6 (缺乏弹性, 工业需求稳定)
- 石油: E ≈ -0.3 (极缺乏弹性, 生活必需)
- 谷物: E ≈ -0.8 (相对缺乏, 饲料/食品刚需)
- 铜矿: E ≈ -0.5 (中等缺乏, 工业周期性)
"""
# 从历史数据计算价格与销量的相关性
if len(historical_data) < 30:
# 使用商品类型默认值
default_elasticity = {
'steel': -0.6,
'oil': -0.3,
'grain': -0.8,
'copper': -0.5
}
return default_elasticity.get(self.commodity_code, -0.5)
# 计算相关系数
price_changes = historical_data['price'].pct_change()
quantity_changes = historical_data['quantity'].pct_change()
elasticity = quantity_changes.corr(price_changes)
return elasticity
def calculate_optimal_price(self,
base_price: float,
inventory_ratio: float, # 库存 / (日均需求 * 30)
forecast_demand: float, # 预期需求
competitor_price: float,
sentiment_indices: dict,
elasticity: float = -0.6) -> dict:
"""
计算动态价格
影响因素权重:
- 库存压力 (30%)
- 需求预测 (35%)
- 竞争价格 (20%)
- 舆情风险 (15%)
"""
optimal_price = base_price
price_factors = {}
# ===== 因子1: 库存压力调整 =====
# 库存高 → 降价清库存
# 库存低 → 提价
if inventory_ratio > 1.5:
# 库存过高, 建议降价15-20%
inventory_adjustment = 0.85
price_factors['inventory'] = -15
elif inventory_ratio > 1.0:
# 库存适中偏高, 降价5-10%
inventory_adjustment = 0.93
price_factors['inventory'] = -7
elif inventory_ratio > 0.5:
# 库存健康, 不调整
inventory_adjustment = 1.0
price_factors['inventory'] = 0
else:
# 库存不足, 提价10-15%
inventory_adjustment = 1.12
price_factors['inventory'] = +12
# ===== 因子2: 需求预测调整 =====
# 预期需求高 → 提价
# 预期需求低 → 降价
avg_demand = historical_data['quantity'].mean()
demand_ratio = forecast_demand / avg_demand
if demand_ratio > 1.2:
# 需求激增 (可能由舆情驱动), 提价8-15%
demand_adjustment = 1.12
price_factors['demand'] = +12
elif demand_ratio > 1.0:
# 需求高于平均, 提价3-7%
demand_adjustment = 1.05
price_factors['demand'] = +5
elif demand_ratio > 0.8:
# 需求低于平均, 降价2-5%
demand_adjustment = 0.97
price_factors['demand'] = -3
else:
# 需求明显下滑, 降价7-12%
demand_adjustment = 0.91
price_factors['demand'] = -9
# ===== 因子3: 竞争价格调整 =====
# 根据市场价格与竞争对手调整
price_vs_market = (base_price - competitor_price) / competitor_price
if price_vs_market > 0.05:
# 本公司价格比竞争对手高>5%, 降价以改善竞争力
competition_adjustment = 0.98
price_factors['competition'] = -2
elif price_vs_market < -0.05:
# 本公司价格比竞争对手低>5%, 提价以提升利润
competition_adjustment = 1.03
price_factors['competition'] = +3
else:
# 价格接近竞争对手, 维持
competition_adjustment = 1.0
price_factors['competition'] = 0
# ===== 因子4: 舆情风险调整 =====
# 高风险环境下可能需要提价以对冲成本上升
gri = sentiment_indices.get('gri', 0) # 地缘政治
cri = sentiment_indices.get('cri', 0) # 合规风险
sentiment_pressure = (gri * 0.6 + cri * 0.4) / 10
if sentiment_pressure > 0.7:
# 高风险, 提价8-12% (弥补运输/融资成本增加)
sentiment_adjustment = 1.10
price_factors['sentiment'] = +10
elif sentiment_pressure > 0.5:
# 中高风险, 提价3-6%
sentiment_adjustment = 1.04
price_factors['sentiment'] = +4
elif sentiment_pressure > 0.3:
# 中等风险, 不调整或微调
sentiment_adjustment = 1.0
price_factors['sentiment'] = 0
else:
# 低风险, 维持或小幅降价
sentiment_adjustment = 1.0
price_factors['sentiment'] = 0
# ===== 综合计算 =====
# 加权平均调整因子
optimal_price = (
base_price *
(0.30 * inventory_adjustment +
0.35 * demand_adjustment +
0.20 * competition_adjustment +
0.15 * sentiment_adjustment)
)
# 价格下界: 不低于成本的110%
# 价格上界: 不高于基准价的130%
min_price = base_price * 0.8
max_price = base_price * 1.3
optimal_price = np.clip(optimal_price, min_price, max_price)
# ===== 收益预测 =====
# 使用价格弹性预测新需求
price_change_ratio = (optimal_price - base_price) / base_price
demand_change_ratio = elasticity * price_change_ratio
projected_demand = forecast_demand * (1 + demand_change_ratio)
projected_revenue = optimal_price * projected_demand
base_revenue = base_price * forecast_demand
revenue_lift = (projected_revenue - base_revenue) / base_revenue if base_revenue > 0 else 0
return {
'base_price': base_price,
'optimal_price': round(optimal_price, 2),
'price_change_pct': round(price_change_ratio * 100, 2),
'factors': {
'inventory_adjustment': round((inventory_adjustment - 1) * 100, 2),
'demand_adjustment': round((demand_adjustment - 1) * 100, 2),
'competition_adjustment': round((competition_adjustment - 1) * 100, 2),
'sentiment_adjustment': round((sentiment_adjustment - 1) * 100, 2)
},
'demand_projection': {
'base_forecast': round(forecast_demand, 2),
'price_adjusted_demand': round(projected_demand, 2),
'demand_elasticity': elasticity
},
'revenue_impact': {
'base_revenue': round(base_revenue, 2),
'projected_revenue': round(projected_revenue, 2),
'revenue_lift_pct': round(revenue_lift * 100, 2)
}
}
技术架构
6.1 整体数据流
┌─── 舆情系统 (MGIR) ───────────────────────────────────┐
│ LDI / CRI / GRI / FRI 指标 (5分钟更新) │
└─────────────────────────────┬─────────────────────────┘
↓
┌─── 销量预测模块 ──────────────────────────────────────┐
│ Kafka: sentiment_indices_stream │
│ ↓ │
│ Flink: XGBoost推理 (实时打分) │
│ ↓ │
│ 输出: forecast_demand_stream → Kafka │
└─────────────────────────────┬─────────────────────────┘
↓
┌─── 补货决策模块 ──────────────────────────────────────┐
│ 输入数据: │
│ - forecast_demand_stream (预测) │
│ - inventory_snapshot (库存) │
│ - supplier_data (供应商参数) │
│ - sentiment_indices (舆情因子) │
│ ↓ │
│ 优化求解: PuLP (整数线性规划) │
│ ↓ │
│ 输出: replenishment_plan → Doris │
└─────────────────────────────┬─────────────────────────┘
↓
┌─── 定价决策模块 ──────────────────────────────────────┐
│ 输入数据: │
│ - forecast_demand (已修正) │
│ - inventory_ratio (库存压力) │
│ - competitor_price (竞争价格) │
│ - sentiment_indices (舆情) │
│ ↓ │
│ 计算: 动态定价公式 │
│ ↓ │
│ 输出: pricing_strategy → Doris → ERP │
└─────────────────────────────┬─────────────────────────┘
↓
BI仪表板 / 执行系统
6.2 Flink流处理拓扑
Source:
├─ raw_forecast_demand (销量预测输出)
├─ inventory_snapshot (库存变化)
├─ supplier_status (供应商状态)
├─ sentiment_indices_stream (舆情指标)
└─ price_feeds (商品期货价格)
Process:
├─ [1] 数据融合
│ └─ 按SKU和采购商合并多源数据
│
├─ [2] 补货需求计算
│ ├─ 计算安全库存 (考虑舆情调整)
│ ├─ 计算订购点 (Reorder Point)
│ └─ 生成补货触发事件
│
├─ [3] 供应商评分与选择
│ ├─ 实时评估供应商可靠性
│ ├─ 考虑交期、质量、成本
│ └─ 生成推荐供应商名单
│
└─ [4] 决策输出生成
├─ 补货建议 → Doris
└─ 定价建议 → Doris
Sink:
├─ Doris: replenishment_decisions 表
├─ Doris: pricing_decisions 表
├─ Kafka: decision_stream (用于下游执行)
└─ 告警系统 (关键风险事件)
数据模型与表设计
7.1 Doris表设计
-- 1. 销量预测结果表
CREATE TABLE forecast_demand_doris (
forecast_id VARCHAR(36) PRIMARY KEY,
commodity_code VARCHAR(50) NOT NULL,
buyer_geo_key VARCHAR(50) NOT NULL,
forecast_date DATE NOT NULL,
forecast_horizon INT, -- 预测天数 (1-30)
base_forecast FLOAT, -- 基础预测值
forecast_with_sentiment FLOAT, -- 舆情调整后的预测
lower_bound FLOAT, -- 置信区间下界 (±15%)
upper_bound FLOAT,
confidence_score FLOAT, -- 模型置信度 (0-1)
sentiment_adjustment_ratio FLOAT, -- 舆情调整幅度
gri_influence FLOAT, -- GRI对预测的影响度
cri_influence FLOAT, -- CRI对预测的影响度
model_version VARCHAR(20), -- 模型版本
created_at DATETIME,
updated_at DATETIME
)
ENGINE = OLAP
DUPLICATE KEY (forecast_id)
PARTITION BY RANGE (forecast_date)
DISTRIBUTED BY HASH(buyer_geo_key) BUCKETS 32
PROPERTIES ("replication_num" = "2");
-- 2. 智能补货决策表
CREATE TABLE replenishment_decisions_doris (
decision_id VARCHAR(36) PRIMARY KEY,
commodity_code VARCHAR(50) NOT NULL,
buyer_geo_key VARCHAR(50) NOT NULL,
decision_date DATE NOT NULL,
current_inventory FLOAT,
safety_stock_required FLOAT,
replenishment_quantity FLOAT,
optimal_order_date DATE, -- 建议订购日期
-- 选中的供应商
primary_supplier_id VARCHAR(50),
primary_supplier_name VARCHAR(100),
primary_quantity FLOAT,
primary_unit_price FLOAT,
-- 备选供应商 (多源模式)
secondary_supplier_id VARCHAR(50),
secondary_supplier_name VARCHAR(100),
secondary_quantity FLOAT,
secondary_unit_price FLOAT,
-- 成本分析
total_cost FLOAT,
holding_cost FLOAT,
shortage_cost FLOAT,
multi_source_premium FLOAT,
-- 风险指标
estimated_lead_time_days INT,
sentiment_adjusted_lead_time INT,
supply_disruption_risk FLOAT, -- 0-1
ldi_impact FLOAT,
cri_impact FLOAT,
-- 状态追踪
decision_status VARCHAR(20), -- PENDING/APPROVED/EXECUTED/COMPLETED
execution_date DATE,
actual_arrival_date DATE,
variance_analysis VARCHAR(500), -- 实际vs计划的偏差分析
created_at DATETIME,
updated_at DATETIME
)
ENGINE = OLAP
DUPLICATE KEY (decision_id)
PARTITION BY RANGE (decision_date)
DISTRIBUTED BY HASH(buyer_geo_key) BUCKETS 32
PROPERTIES ("replication_num" = "2");
-- 3. 动态定价决策表
CREATE TABLE pricing_decisions_doris (
pricing_id VARCHAR(36) PRIMARY KEY,
commodity_code VARCHAR(50) NOT NULL,
buyer_geo_key VARCHAR(50) NOT NULL,
pricing_date DATE NOT NULL,
-- 价格信息
base_price FLOAT,
optimal_price FLOAT,
price_change_pct FLOAT, -- 调整幅度 %
-- 影响因素分解
inventory_adjustment_pct FLOAT,
demand_adjustment_pct FLOAT,
competition_adjustment_pct FLOAT,
sentiment_adjustment_pct FLOAT,
-- 需求与收益预测
base_forecast_demand FLOAT,
price_adjusted_demand FLOAT,
price_elasticity FLOAT,
base_revenue FLOAT,
projected_revenue FLOAT,
revenue_lift_pct FLOAT,
-- 风险因子
current_inventory_ratio FLOAT,
competitor_price FLOAT,
sentiment_pressure_index FLOAT, -- 0-1
gri_factor FLOAT,
cri_factor FLOAT,
-- 执行跟踪
pricing_status VARCHAR(20), -- RECOMMENDED/APPROVED/ACTIVE/EXPIRED
actual_sales_qty FLOAT,
actual_revenue FLOAT,
forecast_accuracy FLOAT, -- 预测与实际的准确度
created_at DATETIME,
valid_from_date DATE,
valid_until_date DATE,
updated_at DATETIME
)
ENGINE = OLAP
DUPLICATE KEY (pricing_id)
PARTITION BY RANGE (pricing_date)
DISTRIBUTED BY HASH(commodity_code) BUCKETS 32
PROPERTIES ("replication_num" = "2");
-- 4. 舆情指标与库存影响关联表
CREATE TABLE sentiment_inventory_impact_doris (
impact_id VARCHAR(36) PRIMARY KEY,
commodity_code VARCHAR(50) NOT NULL,
buyer_geo_key VARCHAR(50) NOT NULL,
impact_date DATE NOT NULL,
-- 舆情指标 (来自MGIR系统)
ldi_score FLOAT,
cri_score FLOAT,
gri_score FLOAT,
fri_score FLOAT,
composite_risk_index FLOAT,
-- 库存影响
safety_stock_base FLOAT, -- 无舆情时的安全库存
safety_stock_adjusted FLOAT, -- 舆情调整后的安全库存
stock_increase_ratio FLOAT, -- 增加比例
-- 需求冲击
demand_surge_probability FLOAT, -- 0-1, 预期需求激增的概率
estimated_surge_magnitude FLOAT, -- 预期激增幅度 %
-- 供应风险
lead_time_days_base INT,
lead_time_days_adjusted INT,
lead_time_increase_days INT,
-- 决策建议
action_recommended VARCHAR(100), -- 建议行动
urgency_level VARCHAR(20), -- NORMAL/HIGH/CRITICAL
created_at DATETIME
)
ENGINE = OLAP
DUPLICATE KEY (impact_id)
PARTITION BY RANGE (impact_date)
DISTRIBUTED BY HASH(buyer_geo_key) BUCKETS 32
PROPERTIES ("replication_num" = "2");
-- 5. 供应商绩效与风险评分表
CREATE TABLE supplier_performance_doris (
supplier_id VARCHAR(50) PRIMARY KEY,
supplier_name VARCHAR(100),
supplier_country VARCHAR(50),
supplier_region VARCHAR(50),
-- 历史绩效
avg_delivery_days INT,
on_time_delivery_rate FLOAT, -- 0-1
quality_score FLOAT, -- 0-1
-- 风险评分
location_risk_score FLOAT, -- 基于GRI的地理位置风险
financial_risk_score FLOAT, -- 基于FRI的财务风险
regulatory_risk_score FLOAT, -- 基于CRI的监管风险
overall_risk_score FLOAT, -- 综合风险 0-10
-- 容量与约束
capacity_monthly FLOAT,
moq FLOAT,
current_committed_qty FLOAT,
available_capacity FLOAT,
-- 实时状态
current_sentiment_status VARCHAR(20), -- GREEN/YELLOW/ORANGE/RED
last_updated DATETIME,
next_assessment_date DATE
)
ENGINE = OLAP
DUPLICATE KEY (supplier_id)
DISTRIBUTED BY HASH(supplier_country) BUCKETS 16
PROPERTIES ("replication_num" = "2");
7.2 Paimon表设计 (湖表一体)
-- 1. 预测原始结果Paimon表
CREATE TABLE forecast_demand_paimon (
forecast_id STRING NOT NULL,
commodity_code STRING,
buyer_geo_key STRING,
forecast_date DATE,
forecast_value FLOAT,
sentiment_adjusted_value FLOAT,
input_features STRING, -- JSON格式的所有输入特征
model_version STRING,
processing_timestamp TIMESTAMP(3),
update_version INT,
PRIMARY KEY (forecast_id) NOT ENFORCED
)
PARTITIONED BY (forecast_date)
WITH (
'bucket' = '64',
'file.compression' = 'lz4'
);
-- 2. 补货决策历史表
CREATE TABLE replenishment_history_paimon (
decision_id STRING NOT NULL,
commodity_code STRING,
buyer_geo_key STRING,
decision_date DATE,
decision_snapshot JSON, -- 完整决策结果
execution_snapshot JSON, -- 实际执行结果
variance_analysis STRING,
processing_timestamp TIMESTAMP(3),
update_version INT,
PRIMARY KEY (decision_id) NOT ENFORCED
)
PARTITIONED BY (decision_date)
WITH (
'bucket' = '32',
'file.compression' = 'lz4'
);