parser-development

Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.



Prerequisites

  • Install required tools: just install-tools

  • Understand the language syntax you're implementing

  • Read crates/biome_parser/CONTRIBUTING.md for detailed concepts

Common Workflows

Create Grammar for New Language

Create a .ungram file in xtask/codegen/ (e.g., html.ungram):

```ungram
// html.ungram
//
// Legend:
//   Name =               -- non-terminal definition
//   'ident'              -- token (terminal)
//   A B                  -- sequence
//   A | B                -- alternation
//   A*                   -- zero or more repetition
//   A?                   -- zero or one repetition
//   (A (',' A)* ','?)    -- repetition with separator and optional trailing comma
//   label:A              -- suggested name for field

HtmlRoot = HtmlElement*

HtmlElement =
    '<' tag_name: HtmlName attributes: HtmlAttributeList '>'
    children: HtmlElementList
    '<' '/' close_tag_name: HtmlName '>'

HtmlAttributeList = AnyHtmlAttribute*

AnyHtmlAttribute =
    HtmlSimpleAttribute
    | HtmlBogusAttribute

HtmlSimpleAttribute = name: HtmlName '=' value: HtmlString

// Error recovery node
HtmlBogusAttribute = SyntaxElement*
```

Naming conventions:

  • Prefix all nodes with the language name: HtmlElement, CssRule

  • Unions start with Any: AnyHtmlAttribute

  • Error recovery nodes use Bogus: HtmlBogusAttribute

  • Lists end with List: HtmlAttributeList

  • Lists are mandatory (never optional) and empty by default
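
Taken together, these conventions yield declarations like the following (a hypothetical CSS-flavored sketch; the rule names are illustrative, not Biome's actual grammar):

```ungram
// A mandatory list of rules; empty files produce an empty list node
CssRuleList = AnyCssRule*

// Unions start with Any and include the error-recovery variant
AnyCssRule =
    CssQualifiedRule
    | CssAtRule
    | CssBogusRule

// Bogus nodes absorb tokens that failed to parse as a rule
CssBogusRule = SyntaxElement*
```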

Generate Parser from Grammar

```shell
# Generate for a specific language
just gen-grammar html

# Generate for multiple languages
just gen-grammar html css

# Generate all grammars
just gen-grammar
```

This creates:

  • biome_html_syntax/src/generated/: node definitions

  • biome_html_factory/src/generated/: node construction helpers

  • Parser skeleton files (you'll implement the actual parsing logic)

Implement a Lexer

Create lexer/mod.rs in your parser crate:

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};

pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;

    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, _context: Self::LexContext) -> Self::Kind {
        // Implement token scanning logic; read_next_token is your scanner entry point
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }

    // Implement other required methods...
}
```
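
The skeleton above is all about a cursor that walks the source and yields one token kind per call. Here is a dependency-free toy sketch of that pattern (ToyLexer and TokenKind are made-up names for illustration, not Biome APIs):

```rust
// A simplified, standalone illustration of the cursor pattern the Lexer
// trait formalizes: a byte position advances over the source, and each
// call to advance() produces the next token kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    LAngle,
    RAngle,
    Slash,
    Ident,
    Whitespace,
    Eof,
    Error,
}

struct ToyLexer<'source> {
    source: &'source str,
    position: usize,
}

impl<'source> ToyLexer<'source> {
    fn new(source: &'source str) -> Self {
        Self { source, position: 0 }
    }

    fn advance(&mut self) -> TokenKind {
        let bytes = self.source.as_bytes();
        let Some(&byte) = bytes.get(self.position) else {
            return TokenKind::Eof;
        };
        match byte {
            b'<' => { self.position += 1; TokenKind::LAngle }
            b'>' => { self.position += 1; TokenKind::RAngle }
            b'/' => { self.position += 1; TokenKind::Slash }
            b if b.is_ascii_whitespace() => {
                // Coalesce a run of whitespace into a single trivia token
                while self.position < bytes.len() && bytes[self.position].is_ascii_whitespace() {
                    self.position += 1;
                }
                TokenKind::Whitespace
            }
            b if b.is_ascii_alphabetic() => {
                // Consume an identifier: a letter followed by alphanumerics
                while self.position < bytes.len() && bytes[self.position].is_ascii_alphanumeric() {
                    self.position += 1;
                }
                TokenKind::Ident
            }
            // Unknown bytes become error tokens so lexing always makes progress
            _ => { self.position += 1; TokenKind::Error }
        }
    }
}

fn main() {
    let mut lexer = ToyLexer::new("<div>");
    let mut kinds = Vec::new();
    loop {
        let kind = lexer.advance();
        if kind == TokenKind::Eof { break; }
        kinds.push(kind);
    }
    println!("{kinds:?}");
    // <div> lexes to [LAngle, Ident, RAngle]
}
```

The key property, which the real trait shares, is that every call advances the position by at least one byte (even for unknown input), so the lexer can never loop forever.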

Implement Token Source

```rust
use crate::lexer::HtmlLexer;
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::lexer::BufferedLexer;
use biome_parser::token_source::TokenSourceWithBufferedLexer;

pub(crate) struct HtmlTokenSource<'src> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'src>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}
```

Write Parse Rules

Example: Parsing an if statement:

```rust
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;

fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test: return Absent if we're not at `if`
    if !p.at(T![if]) {
        return Absent;
    }

    let m = p.start();

    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);

    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);

    p.expect(T![')']);
    parse_block_statement(p).or_add_diagnostic(p, expected_block);

    // Parse the optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }

    Present(m.complete(p, JS_IF_STATEMENT))
}
```

Parse Lists with Error Recovery

Use ParseSeparatedList for comma-separated lists:

```rust
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at the array's closing bracket or end of file
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax<Self::ParsedElement>,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]],
            ),
            expected_array_element,
        )
    }

    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}
```

Implement Error Recovery

Error recovery wraps invalid tokens in BOGUS nodes:

```rust
// Recovery set includes:
// - List terminator tokens (e.g. ']', '}')
// - Statement terminators (e.g. ';')
// - List separators (e.g. ',')
let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)
```

Handle Conditional Syntax

For syntax only valid in certain contexts (e.g., strict mode):

```rust
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }

    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);

    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // Mark the statement as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p),
        )
    });

    Present(conditional.or_invalid_to_bogus(p))
}
```

Test Parser

Create test files under tests/:

```
crates/biome_html_parser/tests/
├── html_specs/
│   ├── ok/
│   │   ├── simple_element.html
│   │   └── nested_elements.html
│   └── error/
│       ├── unclosed_tag.html
│       └── invalid_syntax.html
└── html_test.rs
```

Run tests:

```shell
cd crates/biome_html_parser
cargo test
```
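
Biome parser crates typically generate one test per fixture file via a macro. A sketch of html_test.rs, modeled on existing parser crates (the tests_macros::gen_tests invocation and the spec_test::run helper are assumptions; check a crate like biome_js_parser for the exact shape):

```rust
// html_test.rs -- a sketch; verify against an existing Biome parser crate.
mod spec_test;

mod ok {
    //! Fixtures that must parse without diagnostics
    tests_macros::gen_tests! {"tests/html_specs/ok/**/*.html", crate::spec_test::run, "ok"}
}

mod error {
    //! Fixtures that must produce diagnostics (and recover)
    tests_macros::gen_tests! {"tests/html_specs/error/**/*.html", crate::spec_test::run, "error"}
}
```

Each generated test parses its fixture and snapshots the resulting syntax tree and diagnostics, so adding a test is usually just adding a file.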

Tips

  • Presence test: Always return Absent if the first token doesn't match; never advance the parser before returning Absent

  • Required vs optional: Use p.expect() for required tokens, p.eat() for optional ones

  • Missing markers: Use .or_add_diagnostic() for required nodes to add missing markers and errors

  • Error recovery: Include list terminators, separators, and statement boundaries in recovery sets

  • Bogus nodes: Check grammar for which BOGUS_* node types are valid in your context

  • Checkpoints: Use p.checkpoint() to save state and p.rewind() if parsing fails

  • Lookahead: Use p.at() to check tokens, p.nth_at() for lookahead beyond current token

  • Lists are mandatory: Always create the list node even when it's empty; call parse_list() unconditionally (a list is never Absent)

Common Patterns

```rust
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token (adds an error if missing)
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node (adds an error if missing)
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}
```

References

  • Full guide: crates/biome_parser/CONTRIBUTING.md

  • Grammar examples: xtask/codegen/*.ungram

  • Parser examples: crates/biome_js_parser/src/syntax/

  • Error recovery: Search for ParseRecoveryTokenSet in existing parsers
