fix-errors

When fixing an unhandled error from the telemetry dashboard, the issue typically contains an error message, a stack trace, hit count, and affected user count.

Approach

Do NOT fix at the crash site

The error manifests at a specific line in the stack trace, but the fix almost never belongs there. Fixing at the crash site (e.g., adding a typeof guard in a revive() function, swallowing the error with a try/catch, or returning a fallback value) only masks the real problem. The invalid data still flows through the system and will cause failures elsewhere.
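
To make the masking problem concrete, here is a minimal sketch. The names UriLike, reviveMasked, and describeResource are hypothetical and invented for illustration; they are not the real URI code. The sketch shows that a guard at the crash site only moves the failure to the next consumer, where it surfaces as a less informative error.

```typescript
// Hypothetical shape standing in for UriComponents.
interface UriLike {
	scheme: string;
	path: string;
}

// Crash-site "fix": returns undefined for malformed input instead of throwing.
function reviveMasked(data: unknown): UriLike | undefined {
	if (typeof data !== 'object' || data === null) {
		return undefined; // the invalid value is hidden, not fixed
	}
	return data as UriLike;
}

// A downstream consumer that, like most callers, assumes a valid value.
function describeResource(uri: UriLike | undefined): string {
	// The non-null assertions mirror callers that never expected undefined.
	return `${uri!.scheme}:${uri!.path}`;
}

// The producer sent a string instead of an object; the guard hides that fact.
function demo(): string {
	try {
		return describeResource(reviveMasked('file:///tmp/a.txt'));
	} catch (err) {
		return err instanceof Error ? err.name : 'unknown';
	}
}
```

In this sketch demo() reports a generic TypeError raised by the consumer, and the invalid string that caused it no longer appears anywhere in the error.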

Trace the data flow upward through the call stack

Read each frame in the stack trace from bottom to top. For each frame, understand:

What data is being passed and what is expected
Where that data originated (IPC message, extension API call, storage, user input, etc.)
Whether the data could have been corrupted or malformed at that point

The goal is to find the producer of invalid data, not the consumer that crashes on it.

When the producer cannot be identified from the stack alone

Sometimes the stack trace only shows the receiving/consuming side (e.g., an IPC server handler). The sending side is in a different process and not in the stack. In this case:

Enrich the error message at the consuming site with diagnostic context: the type of the invalid data, a truncated representation of its value, and which operation/command received it. This information flows into the error telemetry dashboard automatically via the unhandled error pipeline.
Do NOT silently swallow the error — let it still throw so it remains visible in telemetry, but with enough context to identify the sender in the next telemetry cycle.
Consider adding the same enrichment to the low-level validation function that throws (e.g., include the invalid value in the error message) so the telemetry captures it regardless of call site.
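
The enrichment steps above can be sketched as a small wrapper around a consuming handler. This is an illustration under stated assumptions: describeValue and withDiagnosticContext are hypothetical helpers, not existing functions in the codebase, and whether to throw a new error or rethrow the original object depends on how the telemetry pipeline groups errors.

```typescript
// Hypothetical helper: summarizes an unknown value as "type=..., value=..."
// with the value truncated so the message stays bounded.
function describeValue(value: unknown, maxLength: number = 100): string {
	let text: string;
	try {
		const json: string | undefined = typeof value === 'string' ? value : JSON.stringify(value);
		text = json === undefined ? String(value) : json;
	} catch {
		// JSON.stringify throws on circular structures; fall back to String().
		text = String(value);
	}
	const truncated = text.length > maxLength ? text.substring(0, maxLength) + '...' : text;
	return `type=${value === null ? 'null' : typeof value}, value=${truncated}`;
}

// Hypothetical wrapper for a consuming handler: adds the command name and a
// summary of the received argument, then throws again so the error is not swallowed.
function withDiagnosticContext<T>(command: string, arg: unknown, handler: (arg: unknown) => T): T {
	try {
		return handler(arg);
	} catch (err) {
		const original = err instanceof Error ? err.message : String(err);
		throw new Error(`[${command}] ${original} (${describeValue(arg)})`);
	}
}
```

A handler wrapped this way still fails, but the message now names the command and shows the type and a truncated form of the offending argument.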

When the producer IS identifiable

Fix the producer directly:

Validate or sanitize data before sending it over IPC / storing it / passing it to APIs
Ensure serialization/deserialization preserves types correctly (e.g., URI objects should serialize as UriComponents objects, not as strings)
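
A producer-side check along these lines might look like the sketch below. UriComponentsLike is a simplified, hypothetical stand-in for the real UriComponents shape, and prepareUriArg is an invented helper; the point is only that the sender rejects a stringified URI before it crosses a process boundary.

```typescript
// Hypothetical, simplified stand-in for the UriComponents shape.
interface UriComponentsLike {
	scheme: string;
	path: string;
}

// Type guard: true only for plain objects carrying string scheme and path fields.
function isUriComponentsLike(value: unknown): value is UriComponentsLike {
	if (typeof value !== 'object' || value === null) {
		return false;
	}
	const candidate = value as Record<string, unknown>;
	return typeof candidate['scheme'] === 'string' && typeof candidate['path'] === 'string';
}

// Hypothetical producer-side helper: call it before sending a URI argument.
function prepareUriArg(value: unknown, command: string): UriComponentsLike {
	if (!isUriComponentsLike(value)) {
		// Fail at the producer, where the stack trace points at the real culprit.
		throw new Error(`[${command}] refusing to send non-UriComponents value: type=${typeof value}`);
	}
	// Copy only the known fields so the payload stays a plain, serializable object.
	return { scheme: value.scheme, path: value.path };
}
```

Because the check runs on the sending side, a failure here produces a stack trace that includes the producer, which is exactly the frame missing from the consumer-side trace.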

Example

Given a stack trace like:

    at _validateUri (uri.ts)       ← validation throws
    at new Uri (uri.ts)            ← constructor
    at URI.revive (uri.ts)         ← revive assumes valid UriComponents
    at SomeChannel.call (ipc.ts)   ← IPC handler receives arg from another process

Wrong fix: Add a typeof guard in URI.revive to return undefined for non-object input. This silences the error but the caller still expects a valid URI and will fail later.

Right fix (when producer is unknown): Enrich the error at the IPC handler level and in _validateUri itself to include the actual invalid value, so telemetry reveals what data is being sent and from where. Example:

    // In the IPC handler: validate before revive
    function reviveUri(data: UriComponents | URI | undefined | null, context: string): URI {
        if (data && typeof data !== 'object') {
            throw new Error(`[Channel] Invalid URI data for '${context}': type=${typeof data}, value=${String(data).substring(0, 100)}`);
        }
        // ...
    }

    // In _validateUri: include the scheme value
    throw new Error(`[UriError]: Scheme contains illegal characters. scheme:"${ret.scheme.substring(0, 50)}" (len:${ret.scheme.length})`);

Right fix (when producer is known): Fix the code that sends malformed data. For example, if an authentication provider passes a stringified URI instead of a UriComponents object to a logger creation call, fix that call site to pass the proper object.

Understanding error construction before fixing

Before proposing any fix, always find and read the code that constructs the error. Search the codebase for the error class name or a unique substring of the error message. The construction code reveals:

What conditions trigger the error — thresholds, validation checks, state assertions
What classifications or categories the error encodes — the error may have subtypes that require different fix strategies
What the error's parameters mean — numeric values, ratios, or flags embedded in the message often encode diagnostic context
Whether the error is actionable — some errors are threshold-based warnings where the threshold may be legitimately exceeded by design

Use this understanding to determine the correct fix strategy. The construction code is the source of truth — do NOT assume what the error means from its message alone.

Example: Listener leak errors

Searching for ListenerLeakError leads to src/vs/base/common/event.ts, where the construction code reveals:

    const kind = topCount / listenerCount > 0.3 ? 'dominated' : 'popular';
    const error = new ListenerLeakError(kind, message, topStack);

Reading this code tells you:

The error has two categories based on a ratio
Dominated (ratio > 30%): one code path accounts for a large share of the listeners → that code path is the problem, fix its disposal
Popular (ratio ≤ 30%): many diverse code paths each contribute a few listeners → the identified stack trace is NOT the root cause; it is just the most frequent stack among many. Investigate the emitter and its aggregate subscribers instead
For popular leaks: do NOT remove caching/pooling/reuse patterns that appear in the top stack — they exist to solve other problems. If the aggregate count is by design (e.g., many menus subscribing to a shared context key service), close the issue as "not planned"

This analysis came from reading the construction code, not from memorized rules about listener leaks.
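
As a worked illustration of the ratio check quoted above, the sketch below classifies a set of per-stack listener counts. StackCount and classifyLeak are hypothetical names; only the 0.3 threshold and the 'dominated' / 'popular' labels come from the quoted construction code, and the real implementation may gather its counts differently.

```typescript
type LeakKind = 'dominated' | 'popular';

// Hypothetical record: how many listeners were registered from one stack.
interface StackCount {
	stack: string;
	count: number;
}

// Mirrors the quoted check: topCount / listenerCount > 0.3 ? 'dominated' : 'popular'.
function classifyLeak(stacks: StackCount[]): { kind: LeakKind; topStack: string; ratio: number } {
	let listenerCount = 0;
	let top: StackCount | undefined;
	for (const entry of stacks) {
		listenerCount += entry.count;
		// Keep the first stack with the highest count as the "top" stack.
		if (top === undefined || entry.count > top.count) {
			top = entry;
		}
	}
	if (top === undefined || listenerCount === 0) {
		throw new Error('No listeners recorded; nothing to classify');
	}
	const ratio = top.count / listenerCount;
	const kind: LeakKind = ratio > 0.3 ? 'dominated' : 'popular';
	return { kind, topStack: top.stack, ratio };
}
```

With counts of 80 and 20 the top stack holds 80% of the listeners and the leak is dominated; with four stacks of 25 each the top stack holds 25% and the leak is popular, so the reported stack is only one contributor among several.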

Guidelines

Prefer enriching error messages over adding try/catch guards
Truncate any user-controlled values included in error messages (to avoid PII and keep messages bounded)
Do not change the behavior of shared utility functions (like URI.revive) in ways that affect all callers — fix at the specific call site or producer
Run the relevant unit tests after making changes
Check for compilation errors via the build task before declaring work complete
