April 6, 2026 · 14 min read

Building a Sub-200ms VS Code Extension

Engineering · Performance

PRISM updates its analysis panel on every cursor movement. Not on save. Not on a keyboard shortcut. On every cursor movement. The full pipeline (identifying what class and method the cursor is in, resolving the inheritance hierarchy, computing the MRO, classifying method statuses) completes in under 200 milliseconds.

The constraint that shapes everything

200 milliseconds is the threshold for perceived instantaneity in UI interactions. Research from the Nielsen Norman Group and Google's RAIL model both identify ~200ms as the upper bound for a response to feel "immediate." Beyond that, users start to notice the delay. For a tool that fires on every cursor movement (potentially dozens of times per second when holding an arrow key), the analysis pipeline has to be fast enough that you never perceive it.

This constraint drove three architectural decisions: a persistent subprocess, aggressive AST caching, and debounced cursor events.

Decision 1: Persistent subprocess

PRISM's analysis engine is written in Python (to use Python's own ast module for parsing), so the TypeScript extension needs a way to talk to it. We considered three options:

  • HTTP server: Spawn a local HTTP server, send requests via fetch. Adds ~10-30ms per request for TCP setup, HTTP framing, and serialization on both sides.
  • Per-request spawning: Spawn a new Python process for each cursor movement. Process startup (loading the interpreter, importing modules) takes 50-150ms. Already half the budget before any analysis starts.
  • Persistent subprocess: Spawn one Python process on activation. Keep it alive. Send JSON lines to stdin, read JSON lines from stdout. No network overhead, no startup cost.

The persistent subprocess wins on overhead: one write to stdin, one read from stdout, both through an OS pipe. Measured cost: roughly 1-2ms per round trip for the IPC itself.

typescript
// extension.ts
import { spawn } from "child_process";

const backend = spawn("python3", ["analyze.py"], {
  cwd: backendDir,
  stdio: ["pipe", "pipe", "pipe"],
});

// Send a request: one JSON line to stdin
function sendRequest(payload: object) {
  backend.stdin.write(JSON.stringify(payload) + "\n");
}

// Read responses: buffer until newline, parse JSON
let buffer = "";
backend.stdout.on("data", (chunk) => {
  buffer += chunk.toString();
  let idx;
  while ((idx = buffer.indexOf("\n")) !== -1) {
    const line = buffer.slice(0, idx);
    buffer = buffer.slice(idx + 1);
    handleResponse(JSON.parse(line));
  }
});

Decision 2: AST caching by mtime

Parsing a Python file into an AST takes 5-15ms for a typical file (a few hundred lines) and can reach 50ms+ for large files. Since the user is moving their cursor, not editing, the file hasn't changed. Parsing the same file again is wasted work.

PRISM maintains a module-level cache in the Python backend: a dictionary mapping file paths to (mtime, ast.Module) tuples. Before parsing, we check os.path.getmtime(). If it matches the cached mtime, we skip parsing entirely.

python
# analyze.py
import ast
import os

_ast_cache: dict[str, tuple[float, ast.Module]] = {}

def get_ast(file_path: str) -> ast.Module:
    mtime = os.path.getmtime(file_path)
    cached = _ast_cache.get(file_path)
    if cached and cached[0] == mtime:
        return cached[1]
    with open(file_path) as f:
        tree = ast.parse(f.read(), filename=file_path)
    _ast_cache[file_path] = (mtime, tree)
    return tree

On the first request, every file in the hierarchy gets parsed (one-time cost). On subsequent cursor movements in the same file, parsing is skipped for all cached files. The cache is invalidated per-file when the extension's file watcher detects a save.

Decision 3: 80ms debounce

When a user holds an arrow key, VS Code fires onDidChangeTextEditorSelection many times per second. Sending a backend request for every event would flood the subprocess and waste cycles analyzing intermediate cursor positions the user has already moved past.

PRISM debounces cursor events with an 80ms delay. When a selection change fires, we start a timer. If another change fires before the timer completes, we reset it. Only when the cursor is still for 80ms do we send the request. This reduces backend load by 10-20x during rapid cursor movement while keeping perceived latency low: 80ms debounce + <120ms analysis = <200ms total.
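The debounce itself is only a few lines. A minimal sketch using Node's setTimeout (the function names here are illustrative, not PRISM's actual API):

```typescript
// Debounce cursor events: only send a request after 80ms of stillness.
const DEBOUNCE_MS = 80;
let pending: ReturnType<typeof setTimeout> | undefined;

function onSelectionChange(send: () => void): void {
  // Every new event resets the timer, so only the final position
  // (where the cursor actually rests) triggers a backend request.
  if (pending !== undefined) clearTimeout(pending);
  pending = setTimeout(send, DEBOUNCE_MS);
}
```

Holding an arrow key fires onSelectionChange repeatedly, but only the last call survives long enough for its timer to complete.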

The analysis pipeline: what happens in those 120ms

Once a request reaches the Python backend, four modules execute in sequence:

Request JSON (file, line, col, workspace)
        |
        v
  ast_parser.find_cursor_context()    ~2ms (cached) / ~10ms (cold)
  -> class_name, base_names, methods
        |
        v
  resolver.build_hierarchy()           ~5ms (cached) / ~40ms (cold)
  -> { ClassName: { file, line, bases, methods } }
        |
        v
  mro.compute_mro()                    ~1ms
  -> ["DeepSpeedEstimator", "LightningTrainer", "DLEstimatorBase"]
        |
        v
  shadow_detector.compute_statuses()   ~1ms
  -> [{ name, defined_in, status: owns|overrides|shadowed }]
        |
        v
  Response JSON -> stdout

The cold path (first request, nothing cached) takes 50-80ms total. The warm path (cursor moving within the same hierarchy) takes 4-8ms. The debounce absorbs the difference: by the time the 80ms debounce fires, the cache is almost certainly warm.

Workspace indexing: the resolver's secret weapon

The most expensive operation is resolver.build_hierarchy() on the cold path. It needs to find where each base class is defined, which means searching the workspace. Without an index, this requires scanning every .py file in the project.

PRISM builds a workspace index on first request: a mapping from class name to (file, line) for every class in the workspace. This index is cached by workspace root and invalidated incrementally when files are saved, created, or deleted. The extension's file watcher sends invalidation messages to the subprocess:

typescript
// extension.ts
const watcher = vscode.workspace.createFileSystemWatcher("**/*.py");
const invalidate = (uri: vscode.Uri) => {
  backend.stdin.write(
    JSON.stringify({ type: "invalidate", file: uri.fsPath }) + "\n"
  );
};
watcher.onDidChange(invalidate);
watcher.onDidCreate(invalidate);
watcher.onDidDelete(invalidate);

On the backend, an invalidation message removes the specific file from the AST cache and workspace index, so the next request re-parses only that file, not the entire workspace.

Request IDs and stale response handling

Because requests and responses are asynchronous (the backend might take 50ms to respond), fast cursor movement can produce out-of-order responses. If the user moves from line 10 to line 50 quickly, the response for line 10 might arrive after the response for line 50.

PRISM solves this with monotonically increasing request IDs. Every request includes a request_id. The extension tracks the last sent ID. When a response arrives, it checks: is this response's ID equal to the last sent ID? If not, the response is stale and gets dropped. Only the most recent response is rendered.
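Sketched out, the staleness check is a single comparison (the names here are illustrative):

```typescript
// Monotonically increasing request IDs; anything but the latest is stale.
let lastSentId = 0;

function buildRequest(payload: Record<string, unknown>): string {
  lastSentId += 1;
  return JSON.stringify({ ...payload, request_id: lastSentId }) + "\n";
}

function isCurrent(response: { request_id: number }): boolean {
  // Drop any response that doesn't match the most recently sent request.
  return response.request_id === lastSentId;
}
```

A response for line 10 arriving after the line 50 request was sent fails the check and is never rendered.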

Measuring and monitoring

Every response includes timing data. The extension logs it to the PRISM output channel:

text
// Typical output channel entries:
[PRISM] analysis: 6ms     (warm cache, simple hierarchy)
[PRISM] analysis: 47ms    (cold cache, 4-level hierarchy)
[PRISM] analysis: 183ms   (first request, workspace indexing)
[PRISM] SLOW: 312ms       (very large file, deep hierarchy)

The SLOW threshold (>200ms) helps identify performance regressions. In practice, the only time we see >200ms is on the very first request in a large workspace, when the index is being built. After that, requests stay consistently under 50ms.
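The SLOW tagging can be a thin formatting helper around the reported timing (the format strings follow the output above; the function itself is a sketch, not PRISM's actual code):

```typescript
// Flag any analysis over the 200ms budget as a potential regression.
const SLOW_THRESHOLD_MS = 200;

function formatTiming(ms: number): string {
  return ms > SLOW_THRESHOLD_MS
    ? `[PRISM] SLOW: ${ms}ms`
    : `[PRISM] analysis: ${ms}ms`;
}
```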

The webview rendering budget

The 200ms budget includes rendering. After the TypeScript extension receives the JSON response, it posts a message to the webview panel. The webview's JavaScript handler reconstructs the DOM: signal bar, MRO chain cards, method pills. This rendering step takes 5-15ms for a typical response.

We optimized this by avoiding full DOM rebuilds. The webview maintains a reference to each element and patches it incrementally. If only the cursor method changed (same class, same hierarchy), only the method pill highlighting is updated. The MRO chain cards are not redrawn.
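The decision of how much to redraw can be sketched as a three-way diff against the previous panel state (the state shape and helper name are hypothetical):

```typescript
interface PanelState {
  className: string;
  mro: string[];
  cursorMethod: string;
}

let current: PanelState | undefined;

// Decide how much DOM work a new response requires.
function diffKind(next: PanelState): "full" | "pill-only" | "none" {
  if (
    !current ||
    current.className !== next.className ||
    current.mro.join(",") !== next.mro.join(",")
  ) {
    return "full"; // hierarchy changed: rebuild the MRO chain cards
  }
  if (current.cursorMethod !== next.cursorMethod) {
    return "pill-only"; // same hierarchy: only move the method highlight
  }
  return "none";
}
```

During normal navigation within a class, almost every update takes the "pill-only" path, which is where the 5-15ms rendering figure comes from.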

Lessons learned

  • Set a performance budget before writing code. 200ms shaped every decision. Without that budget, we would have built something slower and tried to optimize later, which rarely works.
  • Cache at every layer. AST cache, workspace index, hierarchy cache. Each layer reduces the typical-path cost by an order of magnitude.
  • Debounce aggressively. 80ms feels instant to the user but prevents 90% of unnecessary backend requests.
  • Use request IDs to handle concurrency. Async communication needs a mechanism to drop stale responses.
  • Measure everything. The output channel timing lets us catch regressions before users report them.

If you're building a VS Code extension that needs real-time analysis, the persistent subprocess + cache + debounce pattern is a solid foundation. The per-request overhead is minimal, and the cache makes repeated queries nearly free.
