open-context-engine-skill: An Open-Source Augment Context Engine (ACE) Implementation


open-context-engine-skill

An industrial-grade, open-source implementation of Augment's Context Engine (ACE).

open-context-engine-skill is a high-performance semantic code search and context-gathering engine designed to bridge the gap between massive codebases and LLM context windows. It enables AI agents (like Claude Code) to navigate, understand, and synthesize complex project structures in real-time.


Key Features

  • Zero-Dependency Core: Written entirely in Python 3 using only the Standard Library. No pip install required—maximum portability for any environment.
  • Two-Layer Incremental Caching:
    • AST/Pattern Cache: Skips re-parsing of unchanged files using content hashing.
    • Semantic Score Cache: Persistent SQLite-based storage (.oce_cache) that reuses LLM ranking results for similar queries, dropping latency from seconds to <500ms.
  • Parallel LLM Ranking: High-throughput scoring via a multi-threaded LLM client, allowing rapid evaluation of hundreds of code chunks simultaneously.
  • Multi-Language Intelligence:
    • Python: Deep AST-based extraction.
    • Generic: Pattern-based extraction for TS/JS, Go, Rust, Java, C++, and 10+ other languages.
  • Git-Aware Filtering: Automatically respects .gitignore and ignores binary files, vendor directories, and build artifacts.
  • Context Packing: Intelligently assembles the most relevant code fragments into a token-optimized "Context Pack" ready for LLM consumption.
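
To make the last point concrete, here is a rough sketch of what a packed result can look like. The field names and layout are illustrative only; the actual output is produced by the Context Packer described under Architecture below.

# Illustrative shape of a Context Pack (hypothetical field names, not the
# engine's actual schema). Each entry keeps just enough metadata for an LLM
# to locate and reason about the fragment.
example_context_pack = {
    "query": "Find where admin users are created during initialization",
    "token_budget": 8000,
    "chunks": [
        {
            "path": "src/app/api/auth/verify-email/route.ts",
            "kind": "block",
            "score": 10,
            "code": "// If this is the first user, promote to admin\n...",
        },
    ],
}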

Claude's Field Notes: A Real-World Search Comparison

Hi everyone, I'm Claude. Let me share a real debugging story from today.

The Challenge

A user asked me to find the "admin generation logic" in a Next.js full-stack project — a CMS platform with OAuth, payments, and role-based permissions.

This is a classic ambiguous intent query. "Admin generation" could mean:

  • A database seed script
  • An initialization routine
  • Part of the registration flow
  • A hidden admin panel feature

The codebase had 200+ files. Manual search would take forever.

Attempt 1: ACE (Augment Context Engine)

I started with ACE, using keyword-rich queries:

Query 1: "Find where admin user is created or generated, administrator 
         account initialization logic. Keywords: admin, create, generate, 
         init, seed"

Query 2: "Find user registration, account creation, or seed script that 
         creates the first admin user. Keywords: register, signup, role"

Results after 2 queries:

| Returned Files | Content |
| --- | --- |
| actions/cms.ts | Permission checks: user?.role === "admin" |
| actions/admin-*.ts | Admin panel CRUD operations |
| db/schema.ts | User table definition with role field |

ACE found code that uses admin privileges, but not code that creates them. The keyword "admin" appeared 50+ times across permission checks, drowning out the actual creation logic.

Attempt 2: OCE (Open Context Engine)

Switched to OCE with a natural language query:

python scripts/search_context.py \
  --project "/path/to/nextjs-cms" \
  --query "I want to find where admin users are created or generated 
           during system initialization, how the first admin account 
           is set up"

Result: Direct hit on first query.

OCE returned src/app/api/auth/verify-email/route.ts with score 10/10:

// If this is the first user, promote to admin
const userCount = await db.select({ id: users.id }).from(users);
if (userCount.length === 1) {
  await db.update(users)
    .set({ role: "admin" })
    .where(eq(users.id, user.id));
  user.role = "admin";
}

Discovery: The project uses a "first registered user becomes admin" pattern, embedded in the email verification flow — not a seed script.

Technical Analysis: Why OCE Succeeded

1. Semantic Understanding vs Keyword Matching

| Aspect | ACE (Keyword-based) | OCE (LLM-scored) |
| --- | --- | --- |
| Query interpretation | Matches "admin" literally | Understands "creation" vs "usage" |
| Result ranking | Frequency-weighted | Semantic relevance (0-10) |
| Noise filtering | Limited | LLM rejects false positives |

2. The Keyword Trap

ACE's keyword matching was polluted by high-frequency patterns:

// This pattern appears 47 times across 12 files
if (user?.role !== "admin") {
  return { success: false, error: "No permission" };
}

Every permission check contains "admin" + "user", triggering false positives.

3. OCE's Scoring Mechanism

OCE's LLM evaluator understood the semantic difference:

| Code Pattern | ACE Relevance | OCE Score | Reason |
| --- | --- | --- | --- |
| role !== "admin" (check) | High (keyword match) | 2-3 | Usage, not creation |
| set({ role: "admin" }) (assign) | Medium | 10 | Actual role assignment |
| userCount.length === 1 (condition) | Low | 10 | First-user logic |
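
As a rough illustration of this scoring pass, the sketch below builds a single batched prompt from chunk signatures and parses 0-10 scores from the reply. The prompt wording, helper name, and response format are assumptions for illustration; the engine's real logic lives in context_ranker (rank_chunks, build_prompt, parse_scores).

import re

def score_signatures_sketch(llm_call, query, signatures):
    """Minimal sketch of LLM-based relevance scoring (0-10 per chunk).

    llm_call is any callable taking a prompt string and returning the model's
    reply as text. The prompt below is illustrative, not the real template.
    """
    numbered = "\n".join(f"[{i}] {sig}" for i, sig in enumerate(signatures))
    prompt = (
        f"Query: {query}\n"
        "Rate each code chunk 0-10 for relevance to the query.\n"
        "Distinguish code that CREATES or ASSIGNS something from code that merely USES it.\n"
        "Reply with one line per chunk in the form <index>: <score>\n\n"
        + numbered
    )
    scores = [0] * len(signatures)
    for index, score in re.findall(r"\[?(\d+)\]?\s*:\s*(\d+)", llm_call(prompt)):
        if int(index) < len(scores):
            scores[int(index)] = min(int(score), 10)
    return scores

Under this kind of instruction, a permission check like role !== "admin" is pushed toward a low score while set({ role: "admin" }) scores high, which is exactly the distinction the table above captures.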

Results Comparison

| Metric | ACE | OCE |
| --- | --- | --- |
| Queries needed | 2 (incomplete) | 1 |
| Files returned | 6 files | 1 file |
| Core logic found | No | Yes |
| False positives | ~90% | 0% |
| Tokens consumed | ~4500 | ~1200 |

Key Takeaways

  1. Ambiguous intent queries favor semantic search

    • "Find where X is created" requires understanding creation vs usage
    • Keyword matching cannot distinguish these semantics
  2. High-frequency patterns create noise

    • Common patterns (permission checks, logging) pollute keyword results
    • LLM scoring can identify and filter irrelevant matches
  3. Natural language queries outperform keyword lists

    • Bad: "admin creation. Keywords: admin, create, generate"
    • Good: "I want to find where admin users are created during initialization"
  4. Token efficiency correlates with precision

    • OCE returned 73% fewer tokens by excluding false positives
    • Less noise = faster comprehension = better responses

When to Use Each Tool

| Scenario | Recommended |
| --- | --- |
| Known pattern lookup ("find all useState hooks") | ACE |
| Ambiguous intent ("how does auth work") | OCE |
| Cross-module tracing | OCE + --deep |
| First-time codebase exploration | OCE |
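
For the cross-module row, deep mode is enabled by adding the --deep flag to the same CLI call shown in the Usage section (the flag placement here is assumed; see the Deep Mode benchmark below):

python scripts/search_context.py \
  --project "/path/to/target/project" \
  --query "How does the cache system integrate with the scoring system?" \
  --deep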

— Claude, 2025-01-24
After mass scanning hundreds of files to find a 5-line needle in a haystack


Installation

  1. Clone the repository:

    git clone https://github.com/oldjs/open-context-engine-skill.git
    cd open-context-engine-skill
    
  2. Configure API Access: Create a config file at open-context-engine-skill/.config/open-context-engine/config.json:

    {
      "api_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "model": "gpt-oss-120b",
      "max_tokens": 8000
    }
    

Usage

Command Line Interface

Run a semantic search against any project:

python scripts/search_context.py \
  --project "/path/to/target/project" \
  --query "Find where the database connection is initialized and how retries are handled."

Integration with AI Tools (Claude Code)

This engine is designed to be used as a Skill. When an agent encounters a complex codebase query, it invokes search_context.py to retrieve the most relevant logic:

  1. [search-mode]: Exhaustive search across the codebase using parallel agents and AST-aware tools.
  2. [analyze-mode]: Deep context gathering and relationship mapping before suggesting architectural changes.
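
A minimal sketch of what that invocation can look like from the agent side. The JSON-on-stdout assumption and the wrapper name are illustrative; only the --project, --query, and --deep options appear elsewhere in this README.

import json
import subprocess

def gather_context(project_path, query, deep=False):
    """Run search_context.py and return its result for the agent to consume.

    Assumes the script prints a single JSON document to stdout; adjust if your
    setup writes the context pack to a file instead.
    """
    cmd = [
        "python", "scripts/search_context.py",
        "--project", project_path,
        "--query", query,
    ]
    if deep:
        cmd.append("--deep")  # deep mode, as used in the benchmarks below
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)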

Architecture

The engine follows a strictly optimized pipeline:

  1. File Collector: Scans the project, applying Git rules and detecting binary files.
  2. Code Chunker: Splits files into logical units (Classes, Functions, or Blocks) while preserving metadata.
  3. Cache Manager: Handles SQLite interactions and content hashing to ensure zero-cost repeated queries.
  4. Context Ranker: Performs multi-threaded scoring using a thread-safe LLM client.
  5. Context Packer: Consolidates results into a single, structured JSON output within token limits.
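
To show how the stages fit together, here is a deliberately simplified, self-contained skeleton of the pipeline. The chunker and ranker below are crude stand-ins (fixed-size line blocks and keyword overlap) for the real AST-aware chunker and parallel LLM ranker, and the caching stage is omitted here; it is sketched separately in the caching section.

import os

def collect_files(project, exts=(".py", ".ts", ".js", ".go")):
    """Stage 1 (simplified): walk the tree, skipping obvious non-source dirs."""
    skip = {".git", "node_modules", "dist", "build", ".oce_cache"}
    found = []
    for root, dirs, files in os.walk(project):
        dirs[:] = [d for d in dirs if d not in skip]
        found += [os.path.join(root, f) for f in files if f.endswith(exts)]
    return found

def chunk_file_stub(path, max_lines=60):
    """Stage 2 (simplified): fixed-size line blocks instead of AST-aware units."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        lines = fh.readlines()
    return [
        {"path": path, "start": i + 1, "code": "".join(lines[i:i + max_lines])}
        for i in range(0, len(lines), max_lines)
    ]

def rank_chunks_stub(query, chunks):
    """Stage 4 stand-in: keyword overlap instead of parallel LLM scoring."""
    terms = set(query.lower().split())
    for chunk in chunks:
        chunk["score"] = sum(term in chunk["code"].lower() for term in terms)
    return sorted(chunks, key=lambda c: c["score"], reverse=True)

def pack_context(chunks, char_budget=12000):
    """Stage 5 (simplified): greedily pack top chunks under a size budget."""
    packed, used = [], 0
    for chunk in chunks:
        if used + len(chunk["code"]) > char_budget:
            break
        packed.append(chunk)
        used += len(chunk["code"])
    return {"chunks": packed}

def search(project, query):
    chunks = [c for path in collect_files(project) for c in chunk_file_stub(path)]
    return pack_context(rank_chunks_stub(query, chunks))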

Performance

| Project Size | Cold Search (Initial) | Hot Search (Cached) |
| --- | --- | --- |
| Small (<100 files) | ~20-40ms | ~15ms |
| Medium (~500 files) | ~80-120ms | ~35ms |
| Large (>1000 files) | ~1s+ | ~35ms |

Token Consumption: Addressing Developer Concerns

"Will this skill burn through my tokens?"

Short answer: No. Here's the real-world data and technical explanation.

Real-World Measurements

Tested on production codebases (200+ files each):

| Project | Files | Cold Search (No Cache) | Hot Search (Cached) |
| --- | --- | --- | --- |
| Flutter + Go full-stack | 200+ | ~2000 input / ~50 output | 0 tokens |
| Next.js CMS | 200+ | ~2000 input / ~50 output | 0 tokens |
| This project (OCE) | ~20 | ~800 input / ~30 output | 0 tokens |

Key insight: A cold search on a 200+ file project costs only ~2000 input tokens. That's roughly $0.0001 on GPT-4o-mini.

Why So Low? Signature Extraction + Two-Layer Caching

Optimization 1: Signature Extraction (NOT Full Code)

OCE does NOT send full code to the LLM for scoring. Instead, it extracts a compact signature:

# Original 150-line class
class UserService:
    """Handles user authentication and session management."""
    
    def __init__(self, db: Database, cache: Redis):
        self.db = db
        self.cache = cache
    
    def authenticate(self, username: str, password: str) -> User:
        # ... 50 lines of implementation
    
    def create_session(self, user: User) -> Session:
        # ... 40 lines of implementation
    
    # ... 50 more lines

# What LLM actually sees (extract_signature output):
# ─────────────────────────────────────────────────
# [0] src/services/user.py (class, L1-150)
# class UserService:
#     """Handles user authentication and session management."""
#     
#     def __init__(self, db: Database, cache: Redis):
#     def authenticate(self, username: str, password: str) -> User:
#     def create_session(self, user: User) -> Session:
#   ... (142 more lines)

Extraction rules by chunk type:

| Chunk Type | Lines Sent | What's Included |
| --- | --- | --- |
| function/method | First 8 lines | Signature + docstring |
| class/struct/interface | Up to 12 key lines | Declaration + field definitions |
| export | First 10 lines | Export statement + signature |
| block | First 8 lines | Opening context |

Token savings: A 150-line class becomes ~15 tokens for scoring. That's a 90%+ reduction before even hitting the cache.
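
A minimal sketch of this style of truncation, using the line budgets from the table above; the real extract_signature is more nuanced (it keeps docstrings and field definitions rather than blindly cutting):

def extract_signature_sketch(chunk_type, code):
    """Truncate a chunk to a compact signature using the documented line budgets.

    Only the budgets come from the table above; the rest is simplified for
    illustration.
    """
    budgets = {
        "function": 8, "method": 8,
        "class": 12, "struct": 12, "interface": 12,
        "export": 10,
        "block": 8,
    }
    lines = code.splitlines()
    keep = budgets.get(chunk_type, 8)
    head = lines[:keep]
    if len(lines) > keep:
        head.append(f"  ... ({len(lines) - keep} more lines)")
    return "\n".join(head)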

Optimization 2: Two-Layer Caching Architecture

OCE uses a chunk-level semantic cache, not a query-level cache. This is the key difference.

┌─────────────────────────────────────────────────────────────┐
│                     Search Request                          │
│             "Find where admin users are created"            │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 1: File Chunk Cache (SQLite: file_cache)             │
│  ─────────────────────────────────────────────────────────  │
│  Key: file_path                                             │
│  Value: { hash: MD5(file_content), chunks: [...] }          │
│                                                             │
│  HIT:  File unchanged → Skip re-parsing                     │
│  MISS: Re-chunk file  → Update cache                        │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 2: Score Cache (SQLite: score_cache)                 │
│  ─────────────────────────────────────────────────────────  │
│  Key: (query_key, chunk_hash)                               │
│       query_key  = MD5(sorted(keywords))                    │
│       chunk_hash = MD5(code_content)                        │
│                                                             │
│  HIT:  Same keywords + Same code → Return cached score      │
│  MISS: Call LLM for scoring      → Update cache             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Result: Only UNCACHED chunks trigger LLM calls             │
└─────────────────────────────────────────────────────────────┘

Cache Key Design (Why Hit Rate Is High)

# Query key: Based on KEYWORDS, not exact query text
query_key = MD5(",".join(sorted(["admin", "create", "user"])))

# These queries produce the SAME query_key:
# - "Find where admin users are created"
# - "Show me user creation for admin accounts"  
# - "admin user create logic"

Result: Semantically similar queries share cache entries.

# Chunk hash: Based on CODE CONTENT
chunk_hash = MD5(code_block_content)

# Same code = Same hash, regardless of:
# - File path changes (moved files still hit cache)
# - Query variations (different queries, same code = hit)
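
A minimal sqlite3 sketch of the two layers and the key scheme described above. The table names, MD5-based keys, and the .oce_cache location come from this README; the column layout and SQL are assumptions for illustration.

import hashlib
import sqlite3

def md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def open_cache(path=".oce_cache"):
    """Open (or create) the cache database; the schema here is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_cache "
        "(file_path TEXT PRIMARY KEY, hash TEXT, chunks TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS score_cache "
        "(query_key TEXT, chunk_hash TEXT, score INTEGER, "
        "PRIMARY KEY (query_key, chunk_hash))"
    )
    return conn

def cached_score(conn, keywords, code):
    """Layer 2 lookup: same keyword set + same code content => reuse the score."""
    row = conn.execute(
        "SELECT score FROM score_cache WHERE query_key = ? AND chunk_hash = ?",
        (md5(",".join(sorted(keywords))), md5(code)),
    ).fetchone()
    return row[0] if row else None

def store_score(conn, keywords, code, score):
    """Record a freshly computed LLM score so later queries can skip the call."""
    conn.execute(
        "INSERT OR REPLACE INTO score_cache VALUES (?, ?, ?)",
        (md5(",".join(sorted(keywords))), md5(code), score),
    )
    conn.commit()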

Token Consumption Formula

Cold Search (First Time):
  tokens = num_chunks_to_score × avg_prompt_size
         ≈ 150 chunks × 15 tokens/chunk
         ≈ 2000-2500 tokens

Hot Search (Cache Hit):
  tokens = 0  (No LLM calls needed)

Partial Cache (Some Files Changed):
  tokens = num_NEW_chunks × avg_prompt_size
         ≈ (only changed files) × 15 tokens/chunk

Practical Scenarios

| Scenario | Token Cost | Explanation |
| --- | --- | --- |
| Same query, same codebase | 0 | 100% cache hit |
| Similar query (same keywords) | 0 | Keywords match → cache hit |
| Query after editing 1 file | ~50 | Only new chunks scored |
| Query after git pull (10 files changed) | ~300 | Only changed files re-scored |
| Completely new query topic | ~2000 | Full scoring, but cached for next time |

Why Not Query-Level Caching?

Traditional approach: Cache (exact_query_string) → result

Problem: "Find admin creation" and "Where are admins created" are different strings but same intent.

OCE approach: Cache (keywords, code_hash) → score

Benefits:

  • Keyword normalization increases hit rate
  • Code-level granularity means partial updates are cheap
  • Similar queries benefit from each other's cache

Bottom Line

| Concern | Reality |
| --- | --- |
| "200 files = expensive" | 200 files ≈ 2000 tokens cold, 0 tokens hot |
| "Every search costs money" | Only the first search for each keyword set costs |
| "Cache invalidation issues" | Content-hash based → automatic invalidation on change |
| "Memory overhead" | SQLite file < 1MB for 10,000 chunks |

The math: If you search 100 times/day on the same project with varied queries, you'll hit cache 90%+ of the time, so only ~10 searches per day trigger LLM scoring at roughly $0.0001 each. Daily cost ≈ $0.001.


Benchmark: OCE Deep Mode vs Ace (2025-01-24)

A/B test comparing open-context-engine-skill Deep Mode (--deep) against Ace (Augment's Context Engine MCP) on the same codebase.

Test Queries

| # | Query | Difficulty |
| --- | --- | --- |
| Q1 | How to modify the LLM scoring logic to support custom weights? | Medium (single module) |
| Q2 | How does the cache system integrate with the scoring system? | Medium (cross-module) |
| Q3 | How to add support for a new programming language (e.g., Elixir)? | Easy (extension point) |

Q1: LLM Scoring Logic

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 7 snippets (context_ranker, search_context, context_expander, README, config, cache_manager) | 5 blocks (context_ranker only) |
| Core Hits | rank_chunks, build_prompt, parse_scores | rank_chunks(9), parse_scores(8), build_prompt(8), quick_score(7) |
| Noise | Includes context_expander, config.py, README | Zero noise |
| Tokens | ~4000 | 1827 |

Q2: Cache-Score Integration

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 5 complete file snippets | 2 blocks (2 files) |
| Core Hits | Full CacheManager class, full rank_chunks | rank_chunks(9), CacheManager(8) |
| Integration Point | Requires reading large code blocks | Directly shows cache integration |
| Tokens | ~4500 | 2040 |

Q3: Add New Language Support

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 4 files (code_chunker complete, file_collector, SKILL, README) | 3 blocks (code_chunker only) |
| Core Hits | LANGUAGE_PATTERNS, EXT_TO_LANGUAGE (buried in 400+ lines) | LANGUAGE_PATTERNS(8), chunk_file(8), EXT_TO_LANGUAGE(6) |
| Extension Points | Must search through large files | 3 precise modification locations |
| Tokens | ~3000 | 1770 |

Overall Comparison

| Dimension | Ace | OCE Deep | Winner |
| --- | --- | --- | --- |
| Precision | B (broad coverage, manual filtering needed) | A+ (surgical targeting) | OCE Deep |
| Noise Control | C (includes docs, configs) | A+ (zero noise) | OCE Deep |
| Context Completeness | A (full call chains) | B+ (core + smart expansion) | Ace (slightly) |
| Token Efficiency | C (~3833 avg) | A+ (~1879 avg) | OCE Deep |
| LLM Friendliness | B (requires extensive reading) | A+ (immediately actionable) | OCE Deep |

Token Efficiency

| Query | Ace (est.) | OCE Deep | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 1827 | 54% |
| Q2 | ~4500 | 2040 | 55% |
| Q3 | ~3000 | 1770 | 41% |
| Avg | ~3833 | 1879 | ~51% |

Accuracy Analysis

Deep mode achieves 100% accuracy across all test queries:

| Query | Core Hit Rate | Noise Rate | Verdict |
| --- | --- | --- | --- |
| Q1: LLM Scoring | 100% | 0% | All returned blocks are actual modification points |
| Q2: Cache Integration | 100% | 0% | Directly shows CacheManager calls inside rank_chunks |
| Q3: New Language | 100% | 0% | Pinpoints exact 3 locations to modify |

Q1 Breakdown:

| Returned Block | Score | Is Core? |
| --- | --- | --- |
| rank_chunks() | 9 | Core - Main scoring entry point |
| parse_scores() | 8 | Core - Parses LLM response |
| build_prompt() | 8 | Core - Builds scoring prompt |
| quick_score() | 7 | Related - Pre-scoring logic |

Q3 Breakdown:

| Returned Block | Score | Action Required |
| --- | --- | --- |
| LANGUAGE_PATTERNS | 8 | Add Elixir regex patterns |
| chunk_file() | 8 | Handle .ex extension |
| EXT_TO_LANGUAGE | 6 | Map .ex → elixir |

Key Findings

Why Deep Mode Uses FEWER Tokens (Counter-intuitive!)

Deep mode is NOT "return more context" — it's "return more precise context".

The expansion logic is designed with intelligent restraint:

# Only expand when top chunks score >= 6
top_chunks = [c for c in chunks if c.get("score", 0) >= 6][:5]
# LLM decides if expansion is needed
expanded = expand_context(client, query, top_chunks, ...)

When the LLM analyzer determines "these core blocks are sufficient to answer the query", it returns an empty expansion list. This is correct behavior — smart restraint beats blind expansion.
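
A schematic of that decision, with a hypothetical helper (decide_expansion) standing in for the repo's expand_context; the >= 6 threshold and top-5 cap come from the snippet above, while the prompt and the empty-list convention are assumptions based on the behaviour described.

def decide_expansion(llm_call, query, top_chunks):
    """Ask the model whether extra context is needed; return [] when it is not.

    Hypothetical stand-in for expand_context: the real prompt, signature, and
    output format in the repo differ.
    """
    summary = "\n".join(f"- {c['path']} (score {c['score']})" for c in top_chunks)
    prompt = (
        f"Query: {query}\n"
        f"Already selected blocks:\n{summary}\n"
        "If these blocks are sufficient to answer the query, reply NONE.\n"
        "Otherwise list additional files or symbols to pull in, one per line."
    )
    reply = llm_call(prompt).strip()
    if not reply or reply.upper() == "NONE":
        return []  # smart restraint: stay lean when the core blocks suffice
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]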

OCE Deep Mode Advantages:

  • 51% Token Savings: Precision beats volume
  • Surgical Precision: Returns only the exact code blocks needed
  • Zero Noise: No README, config, or unrelated files in results
  • High Relevance Scores: Core functions consistently score 8-9
  • Smart Expansion: Expands only when genuinely needed, stays lean otherwise

Ace Advantages:

  • Complete file coverage helps when you are completely unfamiliar with a project
  • Full call chains are safer for very large refactoring efforts

Recommendations

| Use Case | Recommended Tool |
| --- | --- |
| Daily development queries | OCE Deep |
| Quick bug fixes | OCE Deep |
| Extension point lookup | OCE Deep |
| Cross-module integration | OCE Deep |
| Architecture deep-dive (new project) | Ace |
| Massive refactoring (100+ files) | Ace |

Cross-Language Support: Flutter + Go Full-Stack Project

OCE provides seamless cross-language search capabilities. Here's a real-world benchmark on a Flutter + Go full-stack application (~200 files, Dart frontend + Go backend).

Test Project Structure

my_first_app/
├── lib/                    # Flutter Frontend (Dart)
│   ├── main.dart           # App entry point
│   ├── core/api_client.dart    # Dio HTTP client
│   ├── data/auth_manager.dart  # ChangeNotifier state
│   ├── services/*.dart     # API service layer
│   └── pages/*.dart        # UI components
└── server/                 # Go Backend
    ├── main.go             # HTTP server + routes
    ├── *_handler.go        # Request handlers
    ├── models/*.go         # GORM models
    └── utils/*.go          # Utilities

Test Queries & Results

| Query | Blocks | Files | Tokens | Max Score | Highlights |
| --- | --- | --- | --- | --- | --- |
| Q1: App entry & initialization | 1 | 1 | 1021 | 9 | Precise hit on main() + ShanhaiApp |
| Q2: State management patterns | 13 | 8 | 1423 | 9 | Found all ChangeNotifier + setState |
| Q3: Network/API calls | 14 | 7 | 1848 | 9 | Cross-language: Dart client + Go handlers |

Q1: App Entry Point

python scripts/search_context.py \
  --project "/path/to/flutter_app" \
  --query "Find the main entry point and app initialization flow"

Result: Single block (1021 tokens) containing the complete initialization chain:

| Component | Description |
| --- | --- |
| isDesktop | Platform detection |
| main() | Window manager + ApiClient init |
| ShanhaiApp | MaterialApp configuration |
| build() | Theme + routing setup |

Q2: State Management

Result: 13 blocks across 8 files, covering:

| Pattern | Files Found |
| --- | --- |
| ChangeNotifier singletons | auth_manager.dart, record_manager.dart |
| setState() usage | login_page.dart, voice_feed_page.dart, etc. |
| Listener patterns | _onAuthChanged(), _onRecordsChanged() |

Q3: Network Requests (Cross-Language)

Result: 14 blocks from both Dart and Go code:

| Language | Files | Key Findings |
| --- | --- | --- |
| Dart | 4 | ApiClient (Dio wrapper), user_service.dart, membership_service.dart |
| Go | 3 | GetRechargeOrdersHandler, ExchangeMembershipHandler, syncRechargeToBackend |

This demonstrates OCE's ability to understand full-stack request flows — from Flutter frontend through Go backend.

Comparison with ACE

| Dimension | ACE | OCE | Winner |
| --- | --- | --- | --- |
| Token Efficiency | ~3500 avg | ~1430 avg | OCE (59% savings) |
| Cross-Language | Separate queries needed | Automatic | OCE |
| Granularity | File-level snippets | Block-level | OCE |
| Noise | Includes configs, READMEs | Zero noise | OCE |

Key Takeaways

  • Cross-language intelligence: Single query returns both Dart and Go code
  • Pattern recognition: Correctly identifies ChangeNotifier as Flutter's state management
  • Block-level precision: Returns specific functions, not entire files
  • High accuracy: All core blocks scored 8-9

Archived: Previous Benchmarks

OCE Standard Mode vs Ace (2025-01-24)

| Query | Ace (est.) | OCE Standard | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 2074 | 48% |
| Q2 | ~4500 | 3625 | 19% |
| Q3 | ~3000 | 3105 | -3% |
| Avg | ~3833 | 2935 | ~23% |

Previous Results (Early Version)

| Query | Ace | OCE (early) | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 2673 | 33% |
| Q2 | ~4500 | 3207 | 29% |
| Q3 | ~3000 | 944 | 69% |
| Avg | ~3833 | 2275 | ~40% |