Day 13: Cost Accounting: Extracting API Token Costs in a Real Environment


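Today's goal: LLM vendors report token consumption in entirely different formats (some in usage.total_tokens, some split into separate input and output counts, some even tucked into HTTP headers), so we will write a robust, polymorphic extraction function: the Universal Token Extractor. Whichever model you integrate with later, the real monetary cost of every AI hunt gets recorded in Splunk, with the aim of making LLM spend fully auditable.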


💻 Architecture Outline: How Will We Refactor the "Billing Engine" Today?

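  1. Drop the brittle lookup: remove the earlier approach that relied solely on response_json.get("usage", {}).get("total_tokens", 0).
  2. Walk a multi-vendor compatibility chain:
  • OpenAI / Alibaba / DeepSeek family: read usage.total_tokens.
  • Anthropic / Claude family: compute usage.input_tokens + usage.output_tokens on the fly.
  • Gateway / proxy family: read fields such as x-token-usage from the HTTP response headers.
  3. Graceful degradation: even if a vendor overhauls its API and extraction fails, a try-except fallback returns 0, so a billing failure can never crash the core security blocking workflow.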
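
Before the full listing, the three cases in the outline can be sketched in isolation. The sample payloads below are illustrative assumptions that keep only the fields of interest, and pick_total_tokens is a hypothetical name for this sketch, not the function used in the add-on.

```python
# Illustrative response shapes only; real payloads carry many more fields.
SAMPLE_OPENAI_STYLE = {
    "usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}
}
SAMPLE_ANTHROPIC_STYLE = {"usage": {"input_tokens": 120, "output_tokens": 30}}
# Header name taken from the outline above; the value is made up.
SAMPLE_GATEWAY_HEADERS = {"X-Token-Usage": "150"}


def pick_total_tokens(body, headers):
    """Return the first usable token count, or 0 if nothing matches."""
    try:
        usage = body.get("usage") or {}
        # Case 1: the vendor already reports a total.
        if usage.get("total_tokens") is not None:
            return int(usage["total_tokens"])
        # Case 2: the vendor reports split counts that must be summed.
        for in_key, out_key in (("prompt_tokens", "completion_tokens"),
                                ("input_tokens", "output_tokens")):
            if in_key in usage and out_key in usage:
                return int(usage[in_key]) + int(usage[out_key])
        # Case 3: a gateway or proxy reports usage in a response header.
        for name, value in headers.items():
            if "token-usage" in name.lower():
                return int(value)
    except (AttributeError, TypeError, ValueError):
        pass  # graceful degradation: fall through to the safe default
    return 0


print(pick_total_tokens(SAMPLE_OPENAI_STYLE, {}))                        # 150
print(pick_total_tokens(SAMPLE_ANTHROPIC_STYLE, {}))                     # 150
print(pick_total_tokens({}, SAMPLE_GATEWAY_HEADERS))                     # 150
print(pick_total_tokens({"usage": None}, {"x-token-usage": "n/a"}))      # 0
```

The last call shows the fallback: a null usage block and an unparsable header value both end in 0 instead of an exception.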

💻 Hands-On: Day 13 FinOps Billing Edition, Full Code

Open the Define & Test editor in Add-on Builder and replace the existing code with the following:

import os
import sys
import time
import datetime
import json
import uuid
import requests
import splunklib.client as client
import splunklib.results as results

# ==========================================
# HELPER 1: Execute AI Generated SPL
# ==========================================
def execute_ai_spl(helper, service, spl_query):
    """
    Execute SPL generated by AI and return the raw result data.
    """
    spl_query = spl_query.strip()
    # Force the 'search' prefix to prevent syntax errors
    if not spl_query.startswith("search") and not spl_query.startswith("|"):
        spl_query = "search " + spl_query
        
    kwargs_oneshot = {"output_mode": "json"}
    helper.log_info(f"[Agentic Engine] Executing SPL: {spl_query}")
    
    try:
        search_results = service.jobs.oneshot(spl_query, **kwargs_oneshot)
        reader = results.JSONResultsReader(search_results)
        result_data = [res for res in reader if isinstance(res, dict)]
        helper.log_info(f"[Agentic Engine] SUCCESS: Found {len(result_data)} events.")
        return result_data
    except Exception as e:
        helper.log_error(f"[Agentic Engine] FAILED execution: {str(e)}")
        return []

# ==========================================
# HELPER 2: Fetch Real Logs (M-ATH Concept)
# ==========================================
def fetch_rare_logs(helper, service, target_index):
    """
    Fetch the most recent rare/anomalous logs from the target index to feed the AI.
    """
    helper.log_info("Fetching real rare logs for analysis...")
    # Fetching fresh data. Use cluster only if CPU permits, otherwise use head.
    spl = f"search index={target_index} | head 5 | table _raw"
    
    try:
        results_data = execute_ai_spl(helper, service, spl)
        if not results_data:
            return None
        
        # Extract the _raw strings and join them into a single text payload
        raw_logs = [item.get("_raw", "") for item in results_data if "_raw" in item]
        payload = "\n".join(raw_logs)

        # =========================================================================
        # Context Distillation (Payload Truncation)
        # Prevents massive Splunk logs from blowing up the LLM Context Window
        # =========================================================================
        MAX_CHARS = 6000 # Roughly equals 1500 Tokens
        if len(payload) > MAX_CHARS:
            helper.log_info(f"Payload too large ({len(payload)} chars). Truncating to {MAX_CHARS}...")
            # Slice the string and append a clear signal for the LLM
            payload = payload[:MAX_CHARS] + "\n\n...[TRUNCATED DUE TO CONTEXT LIMITS. ANALYZE AVAILABLE DATA ONLY.]..."
        # =========================================================================
            
        return payload
    except Exception as e:
        helper.log_error(f"Failed to fetch rare logs: {str(e)}")
        return None

# =========================================================================
# [DAY 13 NEW]: Universal Token Extractor (FinOps Cost Tracking)
# =========================================================================
def extract_token_usage(helper, response_json, response_headers):
    """
    Robustly extract token usage across different LLM providers and API gateways.
    Ensures FinOps tracking never crashes the main thread.
    """
    try:
        # Strategy 1: OpenAI / DeepSeek / DashScope format with a precomputed total.
        # "or {}" guards against a missing or null usage block.
        usage = response_json.get("usage") or {}
        if usage.get("total_tokens") is not None:
            return int(usage["total_tokens"])

        # Strategy 2: split counts. OpenAI-style responses use prompt/completion,
        # Anthropic-style responses use input/output.
        if "prompt_tokens" in usage and "completion_tokens" in usage:
            return int(usage["prompt_tokens"]) + int(usage["completion_tokens"])
        if "input_tokens" in usage and "output_tokens" in usage:
            return int(usage["input_tokens"]) + int(usage["output_tokens"])

        # Strategy 3: API gateway / proxy headers (e.g. x-token-usage).
        # Iterating over items() keeps the lookup correct even for a plain dict.
        for key, value in response_headers.items():
            lowered = key.lower()
            if "token-usage" in lowered or "x-ratelimit-usage" in lowered:
                return int(value)
                
    except Exception as e:
        helper.log_error(f"[FinOps Warning] Failed to parse token usage correctly: {str(e)}")
    
    # Graceful degradation: Return 0 if extraction fails, ensuring pipeline survival
    return 0

# ==========================================
# HELPER 3: The LLM API Connector
# ==========================================
# Added dynamic 'max_tokens' parameter to function signature
def call_llm_api(helper, api_key, base_url, model, system_prompt, user_prompt, max_tokens):
    """
    Establish real HTTP connection to the LLM API and return the JSON response.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        # JSON mode for OpenAI-compatible endpoints; providers without support may reject or ignore it
        "response_format": {"type": "json_object"},
        
        # API-level cap on generated tokens (Token Circuit Breaker)
        "max_tokens": max_tokens
    }
    
    # Ensure URL formatting is correct
    endpoint = base_url if base_url.endswith("/chat/completions") else f"{base_url.rstrip('/')}/chat/completions"
    
    try:
        helper.log_info(f"Initiating network request to LLM API: {endpoint} (Max Tokens: {max_tokens})")
        # 120s timeout ensures deep-thinking models (CoT) have enough time
        response = requests.post(endpoint, headers=headers, json=payload, timeout=120)
        response.raise_for_status() 
        
        response_json = response.json()
        llm_content = response_json["choices"][0]["message"]["content"]
        
        # =========================================================================
        # [DAY 13 MODIFIED]: Call the Universal Token Extractor
        # =========================================================================
        total_tokens = extract_token_usage(helper, response_json, response.headers)
        helper.log_info(f"API Call Success. FinOps Tracked: {total_tokens} tokens consumed.")
        
        return llm_content, total_tokens
        
    except requests.exceptions.RequestException as e:
        helper.log_error(f"Network error during API call: {str(e)}")
        raise

# ==========================================
# MAIN WORKFLOW: The Autonomous Agent
# ==========================================
def collect_events(helper, ew):
    """
    The Ultimate Live Workflow.
    Features: Real API Integration, Unix Epoch Time injection, Anti-Hallucination, Truncation, and FinOps Tracking.
    """
    helper.log_info("PEAK AI Hunter: LIVE MODE INITIALIZED.")
    cycle_start_time = time.time()
    
    # Generate a unique Session ID to stitch the flattened logs together
    hunt_session_id = str(uuid.uuid4())

    try:
        # 1. Acquire Splunk Service Session
        session_key = getattr(helper, 'session_key', None) or getattr(helper._input_definition, 'metadata', {}).get('session_key')
        if not session_key:
            raise ValueError("Failed to acquire session_key.")
        service = client.Service(token=session_key)
        
        # 2. Acquire Global Setup Configurations (API credentials)
        api_key = helper.get_global_setting("api_key")
        base_url = helper.get_global_setting("base_url")
        model_name = helper.get_global_setting("model_name")
        target_index = helper.get_output_index() or "main"

        if not api_key or not base_url:
             raise ValueError("API Key or Base URL is missing in Global Settings.")

        # ==========================================
        # PHASE 1: PREPARE (Real LLM Call for Blueprint)
        # ==========================================
        rare_logs_payload = fetch_rare_logs(helper, service, target_index)
        if not rare_logs_payload:
            helper.log_info("No anomalous logs found to analyze. Terminating cycle early gracefully.")
            return

        # Prompt Distillation - Forcing extreme conciseness
        sys_prompt_prepare = "You are a Senior Threat Hunter. You MUST reply in JSON format. Be extremely concise. No pleasantries. Schema requires: 'analysis' (string) and 'hypotheses' (array of objects). Each hypothesis must have 'hypothesis_id', 'ABLE' (Actor, Behavior, Location, Evidence), 'spl_round_1_validation', and 'spl_round_2_drilldown'."
        
        # ANTI-HALLUCINATION FIX: Forcing the LLM to strictly use {target_index} parameter
        usr_prompt_prepare = f"Analyze these real, rare logs from our environment:\n{rare_logs_payload}\n\nGenerate exactly 2 hunting hypotheses. CRITICAL: For 'spl_round_1_validation' and 'spl_round_2_drilldown', you MUST strictly start your queries with 'search index={{target_index}}'. Do NOT guess or use real index names! Output ONLY JSON format."

        helper.log_info("Triggering LLM for Prepare Phase...")
        # Pass max_tokens=1500 for generating SPLs
        blueprint_text, prep_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_prepare, usr_prompt_prepare, max_tokens=1500)
        
        ai_hunting_plan = json.loads(blueprint_text.strip())
        hypotheses = ai_hunting_plan.get("hypotheses", [])

        # Write Plan to Splunk IMMEDIATELY (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(), # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id, 
                "event_type": "PEAK_Plan", 
                "timestamp": round(time.time(), 3), 
                "content": ai_hunting_plan
            }, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 2: EXECUTE (Agentic Splunk Query Loop)
        # ==========================================
        all_hunt_evidence = []
        for i, hyp in enumerate(hypotheses):
            hyp_start = time.time()
            spl_r1 = hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index)
            spl_r2 = hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index)
            
            r1_hits = len(execute_ai_spl(helper, service, spl_r1))
            r2_hits = len(execute_ai_spl(helper, service, spl_r2))
            
            all_hunt_evidence.append({
                "hypothesis_id": hyp.get("hypothesis_id", i+1),
                "threat_behavior": hyp.get('ABLE', {}).get('Behavior', 'Unknown'),
                "round_1_hit_count": r1_hits,
                "round_2_hit_count": r2_hits,
                "execution_duration_sec": round(time.time() - hyp_start, 2)
            })

        # Write Evidence to Splunk IMMEDIATELY (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(), # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id, 
                "event_type": "PEAK_Evidence", 
                "timestamp": round(time.time(), 3), 
                "content": all_hunt_evidence
            }, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 3: ACT (Real LLM Call for Final Report)
        # ==========================================
        # Concise prompt for Act Phase (Limits summary length)
        sys_prompt_act = "You are a Security Director. Output ONLY valid JSON. Keep summaries under 30 words. Keys: 'executive_summary', 'threat_qualification' (Benign/Suspicious/Confirmed), 'risk_score' (0-100), 'recommended_alert_spl'."
        usr_prompt_act = f"Here is the quantitative execution evidence collected by our agent:\n{json.dumps(all_hunt_evidence)}\n\nBased on these hit counts, qualify the threat, assign a risk score, and generate an alert SPL. Reply in JSON format."

        helper.log_info("Triggering LLM for Act Phase...")
        # Pass max_tokens=800 since this is just a short summary
        report_text, act_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_act, usr_prompt_act, max_tokens=800)
        
        try:
            final_report = json.loads(report_text.strip())
        except json.JSONDecodeError as e:
            helper.log_error(f"Act Phase returned invalid or truncated JSON ({e}). Engaging fallback.")
            final_report = {"executive_summary": "LLM output truncated.", "risk_score": -1, "raw": report_text}

        # Write Final Report to Splunk (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(), # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id, 
                "event_type": "PEAK_Final_Report", 
                "timestamp": round(time.time(), 3), 
                "total_tokens_used": prep_tokens + act_tokens, 
                "content": final_report
            }, ensure_ascii=False)
        ))

        helper.log_info(f"LIVE CYCLE COMPLETE. Time: {round(time.time() - cycle_start_time, 2)}s. Session ID: {hunt_session_id}")

    except Exception as e:
        helper.log_error(f"FATAL Pipeline Crash: {str(e)}")


💵 Geek Check: Turning Tokens into Real Money

The code is written, and LLM spend is no longer a black box. Now let's play actuary in Splunk.

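  1. Save the code in AOB and click Test to run the full workflow once.
  2. Go back to the Splunk Search view. This time the panel adds a computed field that converts tokens directly into a dollar cost (Cost USD). For estimation, we assume a flat rate of about $0.002 per 1,000 tokens, taken as an average price for GPT-4o-mini or Qwen.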
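
The conversion in step 2 is plain arithmetic and can be checked outside Splunk first. The sketch below assumes the same flat rate of $0.002 per 1,000 tokens; estimate_cost_usd is a hypothetical helper name, and the token count in the example is made up.

```python
def estimate_cost_usd(total_tokens, usd_per_1k_tokens=0.002):
    """Convert a token count into an estimated USD cost at a flat blended rate."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    # Same formula as the dashboard: (tokens / 1000) * rate, rounded to 6 decimals.
    return round((total_tokens / 1000) * usd_per_1k_tokens, 6)


# Example with a made-up count: 2300 tokens at $0.002 per 1K tokens.
print(estimate_cost_usd(2300))  # 0.0046
```

A flat rate is a simplification: vendors that price input and output tokens differently would need the two counts tracked separately, which the single total recorded here does not allow.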

Run the following finance-oriented dashboard SPL:

index=main sourcetype="_json" (event_type="PEAK_Plan" OR event_type="PEAK_Evidence" OR event_type="PEAK_Final_Report")
| spath
| stats 
    min(timestamp) as Start_Time_Epoch,
    max(timestamp) as End_Time_Epoch,
    latest(content.risk_score) as Risk_Score,
    latest(content.executive_summary) as Summary,
    sum(content{}.round_1_hit_count) as Total_R1_Hits,
    sum(content{}.round_2_hit_count) as Total_R2_Hits,
    sum(total_tokens_used) as Total_Tokens
  by session_id
| eval Execution_Time_Sec = round(End_Time_Epoch - Start_Time_Epoch, 2)
| eval Start_Time = strftime(Start_Time_Epoch, "%Y-%m-%d %H:%M:%S")
| eval Cost_USD = "$" . tostring(round((Total_Tokens / 1000) * 0.002, 6))
| sort - Start_Time_Epoch
| table Start_Time, session_id, Risk_Score, Total_R1_Hits, Total_R2_Hits, Execution_Time_Sec, Total_Tokens, Cost_USD, Summary

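🎯 Acceptance check: look at the second-to-last column of the table. Cost_USD, prefixed with $, shows how much those few seconds of LLM thinking cost the company. With this compatibility logic in place, the cost table keeps tracking spend when your security team switches LLM vendors, as long as the new vendor reports usage in one of the handled formats; a session whose extraction fell back to 0 shows up as zero cost and is worth checking in the logs. That is where enterprise-grade development ends up: the business loop is closed and the costs are transparent.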