open-context-engine-skill: An Open-Source Augment Context Engine (ACE) Implementation


open-context-engine-skill

An industrial-grade, open-source implementation of Augment's Context Engine (ACE).

open-context-engine-skill is a high-performance semantic code search and context-gathering engine designed to bridge the gap between massive codebases and LLM context windows. It enables AI agents (like Claude Code) to navigate, understand, and synthesize complex project structures in real-time.


Key Features

  • Zero-Dependency Core: Written entirely in Python 3 using only the Standard Library. No pip install required—maximum portability for any environment.
  • Two-Layer Incremental Caching:
    • AST/Pattern Cache: Skips re-parsing of unchanged files using content hashing.
    • Semantic Score Cache: Persistent SQLite-based storage (.oce_cache) that reuses LLM ranking results for similar queries, dropping latency from seconds to <500ms.
  • Parallel LLM Ranking: High-throughput scoring via a multi-threaded LLM client, allowing rapid evaluation of hundreds of code chunks simultaneously.
  • Multi-Language Intelligence:
    • Python: Deep AST-based extraction.
    • Generic: Pattern-based extraction for TS/JS, Go, Rust, Java, C++, and 10+ other languages.
  • Git-Aware Filtering: Automatically respects .gitignore and ignores binary files, vendor directories, and build artifacts.
  • Context Packing: Intelligently assembles the most relevant code fragments into a token-optimized "Context Pack" ready for LLM consumption.
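
To make the last point concrete, here is a rough sketch of what a packed result can look like. The field names and layout are illustrative only; the actual output is produced by the Context Packer described under Architecture below.

# Illustrative shape of a Context Pack (hypothetical field names, not the
# engine's actual schema). Each entry keeps just enough metadata for an LLM
# to locate and reason about the fragment.
example_context_pack = {
    "query": "Find where admin users are created during initialization",
    "token_budget": 8000,
    "chunks": [
        {
            "path": "src/app/api/auth/verify-email/route.ts",
            "kind": "block",
            "score": 10,
            "code": "// If this is the first user, promote to admin\n...",
        },
    ],
}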

Claude's Field Notes: A Real-World Search Comparison

Hi everyone, I'm Claude. Let me share a real debugging story from today.

The Challenge

A user asked me to find the "admin generation logic" in a Next.js full-stack project — a CMS platform with OAuth, payments, and role-based permissions.

This is a classic ambiguous intent query. "Admin generation" could mean:

  • A database seed script
  • An initialization routine
  • Part of the registration flow
  • A hidden admin panel feature

The codebase had 200+ files. Manual search would take forever.

Attempt 1: ACE (Augment Context Engine)

I started with ACE, using keyword-rich queries:

Query 1: "Find where admin user is created or generated, administrator 
         account initialization logic. Keywords: admin, create, generate, 
         init, seed"

Query 2: "Find user registration, account creation, or seed script that 
         creates the first admin user. Keywords: register, signup, role"

Results after 2 queries:

| Returned Files | Content |
| --- | --- |
| actions/cms.ts | Permission checks: user?.role === "admin" |
| actions/admin-*.ts | Admin panel CRUD operations |
| db/schema.ts | User table definition with role field |

ACE found code that uses admin privileges, but not code that creates them. The keyword "admin" appeared 50+ times across permission checks, drowning out the actual creation logic.

Attempt 2: OCE (Open Context Engine)

Switched to OCE with a natural language query:

python scripts/search_context.py \
  --project "/path/to/nextjs-cms" \
  --query "I want to find where admin users are created or generated 
           during system initialization, how the first admin account 
           is set up"

Result: Direct hit on first query.

OCE returned src/app/api/auth/verify-email/route.ts with score 10/10:

// If this is the first user, promote to admin
const userCount = await db.select({ id: users.id }).from(users);
if (userCount.length === 1) {
  await db.update(users)
    .set({ role: "admin" })
    .where(eq(users.id, user.id));
  user.role = "admin";
}

Discovery: The project uses a "first registered user becomes admin" pattern, embedded in the email verification flow — not a seed script.

Technical Analysis: Why OCE Succeeded

1. Semantic Understanding vs Keyword Matching

| Aspect | ACE (Keyword-based) | OCE (LLM-scored) |
| --- | --- | --- |
| Query interpretation | Matches "admin" literally | Understands "creation" vs "usage" |
| Result ranking | Frequency-weighted | Semantic relevance (0-10) |
| Noise filtering | Limited | LLM rejects false positives |

2. The Keyword Trap

ACE's keyword matching was polluted by high-frequency patterns:

// This pattern appears 47 times across 12 files
if (user?.role !== "admin") {
  return { success: false, error: "No permission" };
}

Every permission check contains "admin" + "user", triggering false positives.

3. OCE's Scoring Mechanism

OCE's LLM evaluator understood the semantic difference:

| Code Pattern | ACE Relevance | OCE Score | Reason |
| --- | --- | --- | --- |
| role !== "admin" (check) | High (keyword match) | 2-3 | Usage, not creation |
| set({ role: "admin" }) (assign) | Medium | 10 | Actual role assignment |
| userCount.length === 1 (condition) | Low | 10 | First-user logic |
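
As a rough illustration of this scoring pass, the sketch below builds a single batched prompt from chunk signatures and parses 0-10 scores from the reply. The prompt wording, helper name, and response format are assumptions for illustration; the engine's real logic lives in context_ranker (rank_chunks, build_prompt, parse_scores).

import re

def score_signatures_sketch(llm_call, query, signatures):
    """Minimal sketch of LLM-based relevance scoring (0-10 per chunk).

    llm_call is any callable taking a prompt string and returning the model's
    reply as text. The prompt below is illustrative, not the real template.
    """
    numbered = "\n".join(f"[{i}] {sig}" for i, sig in enumerate(signatures))
    prompt = (
        f"Query: {query}\n"
        "Rate each code chunk 0-10 for relevance to the query.\n"
        "Distinguish code that CREATES or ASSIGNS something from code that merely USES it.\n"
        "Reply with one line per chunk in the form <index>: <score>\n\n"
        + numbered
    )
    scores = [0] * len(signatures)
    for index, score in re.findall(r"\[?(\d+)\]?\s*:\s*(\d+)", llm_call(prompt)):
        if int(index) < len(scores):
            scores[int(index)] = min(int(score), 10)
    return scores

Under this kind of instruction, a permission check like role !== "admin" is pushed toward a low score while set({ role: "admin" }) scores high, which is exactly the distinction the table above captures.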

Results Comparison

| Metric | ACE | OCE |
| --- | --- | --- |
| Queries needed | 2 (incomplete) | 1 |
| Files returned | 6 files | 1 file |
| Core logic found | No | Yes |
| False positives | ~90% | 0% |
| Tokens consumed | ~4500 | ~1200 |

Key Takeaways

  1. Ambiguous intent queries favor semantic search

    • "Find where X is created" requires understanding creation vs usage
    • Keyword matching cannot distinguish these semantics
  2. High-frequency patterns create noise

    • Common patterns (permission checks, logging) pollute keyword results
    • LLM scoring can identify and filter irrelevant matches
  3. Natural language queries outperform keyword lists

    • Bad: "admin creation. Keywords: admin, create, generate"
    • Good: "I want to find where admin users are created during initialization"
  4. Token efficiency correlates with precision

    • OCE returned 73% fewer tokens by excluding false positives
    • Less noise = faster comprehension = better responses

When to Use Each Tool

| Scenario | Recommended |
| --- | --- |
| Known pattern lookup ("find all useState hooks") | ACE |
| Ambiguous intent ("how does auth work") | OCE |
| Cross-module tracing | OCE + --deep |
| First-time codebase exploration | OCE |
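
For the cross-module row, deep mode is enabled by adding the --deep flag to the same CLI call shown in the Usage section (the flag placement here is assumed; see the Deep Mode benchmark below):

python scripts/search_context.py \
  --project "/path/to/target/project" \
  --query "How does the cache system integrate with the scoring system?" \
  --deep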

— Claude, 2025-01-24
After mass scanning hundreds of files to find a 5-line needle in a haystack


Installation

  1. Clone the repository:

    git clone https://github.com/oldjs/open-context-engine-skill.git
    cd open-context-engine-skill
    
  2. Configure API Access: Create a config file at open-context-engine-skill/.config/open-context-engine/config.json:

    {
      "api_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "model": "gpt-oss-120b",
      "max_tokens": 8000
    }
    

Usage

Command Line Interface

Run a semantic search against any project:

python scripts/search_context.py \
  --project "/path/to/target/project" \
  --query "Find where the database connection is initialized and how retries are handled."

Integration with AI Tools (Claude Code)

This engine is designed to be used as a Skill. When an agent encounters a complex codebase query, it invokes search_context.py to retrieve the most relevant logic:

  1. [search-mode]: Exhaustive search across the codebase using parallel agents and AST-aware tools.
  2. [analyze-mode]: Deep context gathering and relationship mapping before suggesting architectural changes.
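
A minimal sketch of what that invocation can look like from the agent side. The JSON-on-stdout assumption and the wrapper name are illustrative; only the --project, --query, and --deep options appear elsewhere in this README.

import json
import subprocess

def gather_context(project_path, query, deep=False):
    """Run search_context.py and return its result for the agent to consume.

    Assumes the script prints a single JSON document to stdout; adjust if your
    setup writes the context pack to a file instead.
    """
    cmd = [
        "python", "scripts/search_context.py",
        "--project", project_path,
        "--query", query,
    ]
    if deep:
        cmd.append("--deep")  # deep mode, as used in the benchmarks below
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)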

Architecture

The engine follows a strictly optimized pipeline:

  1. File Collector: Scans the project, applying Git rules and detecting binary files.
  2. Code Chunker: Splits files into logical units (Classes, Functions, or Blocks) while preserving metadata.
  3. Cache Manager: Handles SQLite interactions and content hashing to ensure zero-cost repeated queries.
  4. Context Ranker: Performs multi-threaded scoring using a thread-safe LLM client.
  5. Context Packer: Consolidates results into a single, structured JSON output within token limits.
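
To show how the stages fit together, here is a deliberately simplified, self-contained skeleton of the pipeline. The chunker and ranker below are crude stand-ins (fixed-size line blocks and keyword overlap) for the real AST-aware chunker and parallel LLM ranker, and the caching stage is omitted here; it is sketched separately in the caching section.

import os

def collect_files(project, exts=(".py", ".ts", ".js", ".go")):
    """Stage 1 (simplified): walk the tree, skipping obvious non-source dirs."""
    skip = {".git", "node_modules", "dist", "build", ".oce_cache"}
    found = []
    for root, dirs, files in os.walk(project):
        dirs[:] = [d for d in dirs if d not in skip]
        found += [os.path.join(root, f) for f in files if f.endswith(exts)]
    return found

def chunk_file_stub(path, max_lines=60):
    """Stage 2 (simplified): fixed-size line blocks instead of AST-aware units."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        lines = fh.readlines()
    return [
        {"path": path, "start": i + 1, "code": "".join(lines[i:i + max_lines])}
        for i in range(0, len(lines), max_lines)
    ]

def rank_chunks_stub(query, chunks):
    """Stage 4 stand-in: keyword overlap instead of parallel LLM scoring."""
    terms = set(query.lower().split())
    for chunk in chunks:
        chunk["score"] = sum(term in chunk["code"].lower() for term in terms)
    return sorted(chunks, key=lambda c: c["score"], reverse=True)

def pack_context(chunks, char_budget=12000):
    """Stage 5 (simplified): greedily pack top chunks under a size budget."""
    packed, used = [], 0
    for chunk in chunks:
        if used + len(chunk["code"]) > char_budget:
            break
        packed.append(chunk)
        used += len(chunk["code"])
    return {"chunks": packed}

def search(project, query):
    chunks = [c for path in collect_files(project) for c in chunk_file_stub(path)]
    return pack_context(rank_chunks_stub(query, chunks))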

Performance

| Project Size | Cold Search (Initial) | Hot Search (Cached) |
| --- | --- | --- |
| Small (<100 files) | ~20-40ms | ~15ms |
| Medium (~500 files) | ~80-120ms | ~35ms |
| Large (>1000 files) | ~1s+ | ~35ms |

Token Consumption: Addressing Developer Concerns

"Will this skill burn through my tokens?"

Short answer: No. Here's the real-world data and technical explanation.

Real-World Measurements

Tested on production codebases (200+ files each):

| Project | Files | Cold Search (No Cache) | Hot Search (Cached) |
| --- | --- | --- | --- |
| Flutter + Go full-stack | 200+ | ~2000 input / ~50 output | 0 tokens |
| Next.js CMS | 200+ | ~2000 input / ~50 output | 0 tokens |
| This project (OCE) | ~20 | ~800 input / ~30 output | 0 tokens |

Key insight: A cold search on a 200+ file project costs only ~2000 input tokens. That's roughly $0.0001 on GPT-4o-mini.

Why So Low? Signature Extraction + Two-Layer Caching

Optimization 1: Signature Extraction (NOT Full Code)

OCE does NOT send full code to the LLM for scoring. Instead, it extracts a compact signature:

# Original 150-line class
class UserService:
    """Handles user authentication and session management."""
    
    def __init__(self, db: Database, cache: Redis):
        self.db = db
        self.cache = cache
    
    def authenticate(self, username: str, password: str) -> User:
        # ... 50 lines of implementation
    
    def create_session(self, user: User) -> Session:
        # ... 40 lines of implementation
    
    # ... 50 more lines

# What LLM actually sees (extract_signature output):
# ─────────────────────────────────────────────────
# [0] src/services/user.py (class, L1-150)
# class UserService:
#     """Handles user authentication and session management."""
#     
#     def __init__(self, db: Database, cache: Redis):
#     def authenticate(self, username: str, password: str) -> User:
#     def create_session(self, user: User) -> Session:
#   ... (142 more lines)

Extraction rules by chunk type:

| Chunk Type | Lines Sent | What's Included |
| --- | --- | --- |
| function/method | First 8 lines | Signature + docstring |
| class/struct/interface | Up to 12 key lines | Declaration + field definitions |
| export | First 10 lines | Export statement + signature |
| block | First 8 lines | Opening context |

Token savings: A 150-line class becomes ~15 tokens for scoring. That's a 90%+ reduction before even hitting the cache.
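
A minimal sketch of this style of truncation, using the line budgets from the table above; the real extract_signature is more nuanced (it keeps docstrings and field definitions rather than blindly cutting):

def extract_signature_sketch(chunk_type, code):
    """Truncate a chunk to a compact signature using the documented line budgets.

    Only the budgets come from the table above; the rest is simplified for
    illustration.
    """
    budgets = {
        "function": 8, "method": 8,
        "class": 12, "struct": 12, "interface": 12,
        "export": 10,
        "block": 8,
    }
    lines = code.splitlines()
    keep = budgets.get(chunk_type, 8)
    head = lines[:keep]
    if len(lines) > keep:
        head.append(f"  ... ({len(lines) - keep} more lines)")
    return "\n".join(head)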

Optimization 2: Two-Layer Caching Architecture

OCE uses a chunk-level semantic cache, not a query-level cache. This is the key difference.

┌─────────────────────────────────────────────────────────────┐
│                     Search Request                          │
│             "Find where admin users are created"            │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 1: File Chunk Cache (SQLite: file_cache)             │
│  ─────────────────────────────────────────────────────────  │
│  Key: file_path                                             │
│  Value: { hash: MD5(file_content), chunks: [...] }          │
│                                                             │
│  HIT:  File unchanged → Skip re-parsing                     │
│  MISS: Re-chunk file  → Update cache                        │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 2: Score Cache (SQLite: score_cache)                 │
│  ─────────────────────────────────────────────────────────  │
│  Key: (query_key, chunk_hash)                               │
│       query_key  = MD5(sorted(keywords))                    │
│       chunk_hash = MD5(code_content)                        │
│                                                             │
│  HIT:  Same keywords + Same code → Return cached score      │
│  MISS: Call LLM for scoring      → Update cache             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  Result: Only UNCACHED chunks trigger LLM calls             │
└─────────────────────────────────────────────────────────────┘

Cache Key Design (Why Hit Rate Is High)

# Query key: Based on KEYWORDS, not exact query text
query_key = MD5(",".join(sorted(["admin", "create", "user"])))

# These queries produce the SAME query_key:
# - "Find where admin users are created"
# - "Show me user creation for admin accounts"  
# - "admin user create logic"

Result: Semantically similar queries share cache entries.

# Chunk hash: Based on CODE CONTENT
chunk_hash = MD5(code_block_content)

# Same code = Same hash, regardless of:
# - File path changes (moved files still hit cache)
# - Query variations (different queries, same code = hit)
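
A minimal sqlite3 sketch of the two layers and the key scheme described above. The table names, MD5-based keys, and the .oce_cache location come from this README; the column layout and SQL are assumptions for illustration.

import hashlib
import sqlite3

def md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def open_cache(path=".oce_cache"):
    """Open (or create) the cache database; the schema here is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_cache "
        "(file_path TEXT PRIMARY KEY, hash TEXT, chunks TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS score_cache "
        "(query_key TEXT, chunk_hash TEXT, score INTEGER, "
        "PRIMARY KEY (query_key, chunk_hash))"
    )
    return conn

def cached_score(conn, keywords, code):
    """Layer 2 lookup: same keyword set + same code content => reuse the score."""
    row = conn.execute(
        "SELECT score FROM score_cache WHERE query_key = ? AND chunk_hash = ?",
        (md5(",".join(sorted(keywords))), md5(code)),
    ).fetchone()
    return row[0] if row else None

def store_score(conn, keywords, code, score):
    """Record a freshly computed LLM score so later queries can skip the call."""
    conn.execute(
        "INSERT OR REPLACE INTO score_cache VALUES (?, ?, ?)",
        (md5(",".join(sorted(keywords))), md5(code), score),
    )
    conn.commit()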

Token Consumption Formula

Cold Search (First Time):
  tokens = num_chunks_to_score × avg_prompt_size
         ≈ 150 chunks × 15 tokens/chunk
         ≈ 2000-2500 tokens

Hot Search (Cache Hit):
  tokens = 0  (No LLM calls needed)

Partial Cache (Some Files Changed):
  tokens = num_NEW_chunks × avg_prompt_size
         ≈ (only changed files) × 15 tokens/chunk

Practical Scenarios

| Scenario | Token Cost | Explanation |
| --- | --- | --- |
| Same query, same codebase | 0 | 100% cache hit |
| Similar query (same keywords) | 0 | Keywords match → cache hit |
| Query after editing 1 file | ~50 | Only new chunks scored |
| Query after git pull (10 files changed) | ~300 | Only changed files re-scored |
| Completely new query topic | ~2000 | Full scoring, but cached for next time |

Why Not Query-Level Caching?

Traditional approach: Cache (exact_query_string) → result

Problem: "Find admin creation" and "Where are admins created" are different strings but same intent.

OCE approach: Cache (keywords, code_hash) → score

Benefits:

  • Keyword normalization increases hit rate
  • Code-level granularity means partial updates are cheap
  • Similar queries benefit from each other's cache

Bottom Line

| Concern | Reality |
| --- | --- |
| "200 files = expensive" | 200 files ≈ 2000 tokens cold, 0 tokens hot |
| "Every search costs money" | Only the first search for each keyword set costs |
| "Cache invalidation issues" | Content-hash based → automatic invalidation on change |
| "Memory overhead" | SQLite file < 1MB for 10,000 chunks |

The math: If you search 100 times/day on the same project with varied queries, you'll hit cache 90%+ of the time, so only ~10 searches per day trigger LLM scoring at roughly $0.0001 each. Daily cost ≈ $0.001.


Benchmark: OCE Deep Mode vs Ace (2025-01-24)

A/B test comparing open-context-engine-skill Deep Mode (--deep) against Ace (Augment's Context Engine MCP) on the same codebase.

Test Queries

| # | Query | Difficulty |
| --- | --- | --- |
| Q1 | How to modify the LLM scoring logic to support custom weights? | Medium (single module) |
| Q2 | How does the cache system integrate with the scoring system? | Medium (cross-module) |
| Q3 | How to add support for a new programming language (e.g., Elixir)? | Easy (extension point) |

Q1: LLM Scoring Logic

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 7 snippets (context_ranker, search_context, context_expander, README, config, cache_manager) | 5 blocks (context_ranker only) |
| Core Hits | rank_chunks, build_prompt, parse_scores | rank_chunks(9), parse_scores(8), build_prompt(8), quick_score(7) |
| Noise | Includes context_expander, config.py, README | Zero noise |
| Tokens | ~4000 | 1827 |

Q2: Cache-Score Integration

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 5 complete file snippets | 2 blocks (2 files) |
| Core Hits | Full CacheManager class, full rank_chunks | rank_chunks(9), CacheManager(8) |
| Integration Point | Requires reading large code blocks | Directly shows cache integration |
| Tokens | ~4500 | 2040 |

Q3: Add New Language Support

| Dimension | Ace | OCE Deep |
| --- | --- | --- |
| Files Returned | 4 files (code_chunker complete, file_collector, SKILL, README) | 3 blocks (code_chunker only) |
| Core Hits | LANGUAGE_PATTERNS, EXT_TO_LANGUAGE (buried in 400+ lines) | LANGUAGE_PATTERNS(8), chunk_file(8), EXT_TO_LANGUAGE(6) |
| Extension Points | Must search through large files | 3 precise modification locations |
| Tokens | ~3000 | 1770 |

Overall Comparison

| Dimension | Ace | OCE Deep | Winner |
| --- | --- | --- | --- |
| Precision | B (broad coverage, manual filtering needed) | A+ (surgical targeting) | OCE Deep |
| Noise Control | C (includes docs, configs) | A+ (zero noise) | OCE Deep |
| Context Completeness | A (full call chains) | B+ (core + smart expansion) | Ace (slightly) |
| Token Efficiency | C (~3833 avg) | A+ (~1879 avg) | OCE Deep |
| LLM Friendliness | B (requires extensive reading) | A+ (immediately actionable) | OCE Deep |

Token Efficiency

| Query | Ace (est.) | OCE Deep | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 1827 | 54% |
| Q2 | ~4500 | 2040 | 55% |
| Q3 | ~3000 | 1770 | 41% |
| Avg | ~3833 | 1879 | ~51% |

Accuracy Analysis

Deep mode achieves 100% accuracy across all test queries:

| Query | Core Hit Rate | Noise Rate | Verdict |
| --- | --- | --- | --- |
| Q1: LLM Scoring | 100% | 0% | All returned blocks are actual modification points |
| Q2: Cache Integration | 100% | 0% | Directly shows CacheManager calls inside rank_chunks |
| Q3: New Language | 100% | 0% | Pinpoints exact 3 locations to modify |

Q1 Breakdown:

| Returned Block | Score | Is Core? |
| --- | --- | --- |
| rank_chunks() | 9 | Core - Main scoring entry point |
| parse_scores() | 8 | Core - Parses LLM response |
| build_prompt() | 8 | Core - Builds scoring prompt |
| quick_score() | 7 | Related - Pre-scoring logic |

Q3 Breakdown:

| Returned Block | Score | Action Required |
| --- | --- | --- |
| LANGUAGE_PATTERNS | 8 | Add Elixir regex patterns |
| chunk_file() | 8 | Handle .ex extension |
| EXT_TO_LANGUAGE | 6 | Map .ex → elixir |

Key Findings

Why Deep Mode Uses FEWER Tokens (Counter-intuitive!)

Deep mode is NOT "return more context" — it's "return more precise context".

The expansion logic is designed with intelligent restraint:

# Only expand when top chunks score >= 6
top_chunks = [c for c in chunks if c.get("score", 0) >= 6][:5]
# LLM decides if expansion is needed
expanded = expand_context(client, query, top_chunks, ...)

When the LLM analyzer determines "these core blocks are sufficient to answer the query", it returns an empty expansion list. This is correct behavior — smart restraint beats blind expansion.
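
A schematic of that decision, with a hypothetical helper (decide_expansion) standing in for the repo's expand_context; the >= 6 threshold and top-5 cap come from the snippet above, while the prompt and the empty-list convention are assumptions based on the behaviour described.

def decide_expansion(llm_call, query, top_chunks):
    """Ask the model whether extra context is needed; return [] when it is not.

    Hypothetical stand-in for expand_context: the real prompt, signature, and
    output format in the repo differ.
    """
    summary = "\n".join(f"- {c['path']} (score {c['score']})" for c in top_chunks)
    prompt = (
        f"Query: {query}\n"
        f"Already selected blocks:\n{summary}\n"
        "If these blocks are sufficient to answer the query, reply NONE.\n"
        "Otherwise list additional files or symbols to pull in, one per line."
    )
    reply = llm_call(prompt).strip()
    if not reply or reply.upper() == "NONE":
        return []  # smart restraint: stay lean when the core blocks suffice
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]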

OCE Deep Mode Advantages:

  • 51% Token Savings: Precision beats volume
  • Surgical Precision: Returns only the exact code blocks needed
  • Zero Noise: No README, config, or unrelated files in results
  • High Relevance Scores: Core functions consistently score 8-9
  • Smart Expansion: Expands only when genuinely needed, stays lean otherwise

Ace Advantages:

  • Complete file coverage helps when you are completely unfamiliar with a project
  • Full call chains are safer for very large refactoring efforts

Recommendations

| Use Case | Recommended Tool |
| --- | --- |
| Daily development queries | OCE Deep |
| Quick bug fixes | OCE Deep |
| Extension point lookup | OCE Deep |
| Cross-module integration | OCE Deep |
| Architecture deep-dive (new project) | Ace |
| Massive refactoring (100+ files) | Ace |

Cross-Language Support: Flutter + Go Full-Stack Project

OCE provides seamless cross-language search capabilities. Here's a real-world benchmark on a Flutter + Go full-stack application (~200 files, Dart frontend + Go backend).

Test Project Structure

my_first_app/
├── lib/                    # Flutter Frontend (Dart)
│   ├── main.dart           # App entry point
│   ├── core/api_client.dart    # Dio HTTP client
│   ├── data/auth_manager.dart  # ChangeNotifier state
│   ├── services/*.dart     # API service layer
│   └── pages/*.dart        # UI components
└── server/                 # Go Backend
    ├── main.go             # HTTP server + routes
    ├── *_handler.go        # Request handlers
    ├── models/*.go         # GORM models
    └── utils/*.go          # Utilities

Test Queries & Results

| Query | Blocks | Files | Tokens | Max Score | Highlights |
| --- | --- | --- | --- | --- | --- |
| Q1: App entry & initialization | 1 | 1 | 1021 | 9 | Precise hit on main() + ShanhaiApp |
| Q2: State management patterns | 13 | 8 | 1423 | 9 | Found all ChangeNotifier + setState |
| Q3: Network/API calls | 14 | 7 | 1848 | 9 | Cross-language: Dart client + Go handlers |

Q1: App Entry Point

python scripts/search_context.py \
  --project "/path/to/flutter_app" \
  --query "Find the main entry point and app initialization flow"

Result: Single block (1021 tokens) containing the complete initialization chain:

| Component | Description |
| --- | --- |
| isDesktop | Platform detection |
| main() | Window manager + ApiClient init |
| ShanhaiApp | MaterialApp configuration |
| build() | Theme + routing setup |

Q2: State Management

Result: 13 blocks across 8 files, covering:

| Pattern | Files Found |
| --- | --- |
| ChangeNotifier singletons | auth_manager.dart, record_manager.dart |
| setState() usage | login_page.dart, voice_feed_page.dart, etc. |
| Listener patterns | _onAuthChanged(), _onRecordsChanged() |

Q3: Network Requests (Cross-Language)

Result: 14 blocks from both Dart and Go code:

| Language | Files | Key Findings |
| --- | --- | --- |
| Dart | 4 | ApiClient (Dio wrapper), user_service.dart, membership_service.dart |
| Go | 3 | GetRechargeOrdersHandler, ExchangeMembershipHandler, syncRechargeToBackend |

This demonstrates OCE's ability to understand full-stack request flows — from Flutter frontend through Go backend.

Comparison with ACE

| Dimension | ACE | OCE | Winner |
| --- | --- | --- | --- |
| Token Efficiency | ~3500 avg | ~1430 avg | OCE (59% savings) |
| Cross-Language | Separate queries needed | Automatic | OCE |
| Granularity | File-level snippets | Block-level | OCE |
| Noise | Includes configs, READMEs | Zero noise | OCE |

Key Takeaways

  • Cross-language intelligence: Single query returns both Dart and Go code
  • Pattern recognition: Correctly identifies ChangeNotifier as Flutter's state management
  • Block-level precision: Returns specific functions, not entire files
  • High accuracy: All core blocks scored 8-9

Archived: Previous Benchmarks

OCE Standard Mode vs Ace (2025-01-24)

| Query | Ace (est.) | OCE Standard | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 2074 | 48% |
| Q2 | ~4500 | 3625 | 19% |
| Q3 | ~3000 | 3105 | -3% |
| Avg | ~3833 | 2935 | ~23% |

Previous Results (Early Version)

| Query | Ace | OCE (early) | Savings |
| --- | --- | --- | --- |
| Q1 | ~4000 | 2673 | 33% |
| Q2 | ~4500 | 3207 | 29% |
| Q3 | ~3000 | 944 | 69% |
| Avg | ~3833 | 2275 | ~40% |