
SARIF Parsing Best Practices

Safety Notice

This listing is imported from skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.

Install the "sarif-parsing" skill with: npx skills add 5dlabs/cto/5dlabs-cto-sarif-parsing


Parse, analyze, and process SARIF files from static analysis tools such as CodeQL and Semgrep.

When to Use

  • Reading or interpreting static analysis scan results in SARIF format

  • Aggregating findings from multiple security tools

  • Deduplicating or filtering security alerts

  • Extracting specific vulnerabilities from SARIF files

  • Integrating SARIF data into CI/CD pipelines

  • Converting SARIF output to other formats

SARIF Structure Overview

SARIF 2.1.0 is the current OASIS standard:

sarifLog
├── version: "2.1.0"
└── runs[] (array of analysis runs)
    ├── tool
    │   ├── driver
    │   │   ├── name (required)
    │   │   ├── version
    │   │   └── rules[] (rule definitions)
    │   └── extensions[] (plugins)
    ├── results[] (findings)
    │   ├── ruleId
    │   ├── level (error/warning/note)
    │   ├── message.text
    │   ├── locations[]
    │   │   └── physicalLocation
    │   │       ├── artifactLocation.uri
    │   │       └── region (startLine, startColumn, etc.)
    │   ├── fingerprints{}
    │   └── partialFingerprints{}
    └── artifacts[] (scanned files metadata)
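A minimal walk over this structure using only the standard library (the file name is illustrative; per the spec, level defaults to "warning" when omitted):

import json

with open("results.sarif") as f:
    log = json.load(f)

for run in log["runs"]:
    tool_name = run["tool"]["driver"]["name"]
    for result in run.get("results", []):
        phys = result.get("locations", [{}])[0].get("physicalLocation", {})
        print(
            tool_name,
            result.get("ruleId"),
            result.get("level", "warning"),  # spec default when omitted
            phys.get("artifactLocation", {}).get("uri"),
            phys.get("region", {}).get("startLine"),
        )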

Why Fingerprinting Matters

Without stable fingerprints, you can't track findings across runs:

  • Baseline comparison: "Is this a new finding or did we see it before?"

  • Regression detection: "Did this PR introduce new vulnerabilities?"

  • Suppression: "Ignore this known false positive in future runs"
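As a concrete illustration, these checks usually key on partialFingerprints. The key name below matches what CodeQL emits (primaryLocationLineHash); the values are made up:

# Illustrative result; fingerprint key follows CodeQL's output, value is invented
result = {
    "ruleId": "js/sql-injection",
    "level": "error",
    "partialFingerprints": {"primaryLocationLineHash": "39fa2ee980eb94b0:1"},
}

# Baseline comparison: the same fingerprint means the same finding seen before
baseline_fps = {("js/sql-injection", "39fa2ee980eb94b0:1")}
key = (result["ruleId"], result["partialFingerprints"]["primaryLocationLineHash"])
print("new finding" if key not in baseline_fps else "known finding, suppress")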

Tool Selection Guide

Use Case                      Tool         Installation
Quick CLI queries             jq           brew install jq / apt install jq
Python scripting (simple)     pysarif      pip install pysarif
Python scripting (advanced)   sarif-tools  pip install sarif-tools
.NET applications             SARIF SDK    NuGet package
JavaScript/Node.js            sarif-js     npm package

Quick Analysis with jq

# Pretty print the file
jq '.' results.sarif

# Count total findings
jq '[.runs[].results[]] | length' results.sarif

# List all rule IDs triggered
jq '[.runs[].results[].ruleId] | unique' results.sarif

# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif

# Get findings with file locations
jq '.runs[].results[] | {
  rule: .ruleId,
  message: .message.text,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif

# Filter by severity and get count per rule
jq '[.runs[].results[] | select(.level == "error")]
  | group_by(.ruleId)
  | map({rule: .[0].ruleId, count: length})' results.sarif

Python with sarif-tools

from sarif import loader

# Load a single file
sarif_data = loader.load_sarif_file("results.sarif")

# Or load multiple files
sarif_set = loader.load_sarif_files(["tool1.sarif", "tool2.sarif"])

# Get summary report
report = sarif_data.get_report()

# Get histogram by severity
errors = report.get_issue_type_histogram_for_severity("error")
warnings = report.get_issue_type_histogram_for_severity("warning")

# Filter results
high_severity = [r for r in sarif_data.get_results() if r.get("level") == "error"]

sarif-tools CLI commands:

# Summary of findings
sarif summary results.sarif

# List all results with details
sarif ls results.sarif

# Get results by severity
sarif ls --level error results.sarif

# Diff two SARIF files (find new/fixed issues)
sarif diff baseline.sarif current.sarif

# Convert to other formats
sarif csv results.sarif > results.csv
sarif html results.sarif > report.html

Aggregating Multiple SARIF Files

import json
from pathlib import Path

def aggregate_sarif_files(sarif_paths: list[str]) -> dict:
    """Combine multiple SARIF files into one."""
    aggregated = {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": []
    }

    for path in sarif_paths:
        with open(path) as f:
            sarif = json.load(f)
            aggregated["runs"].extend(sarif.get("runs", []))

    return aggregated

def deduplicate_results(sarif: dict) -> dict:
    """Remove duplicate findings based on fingerprints."""
    seen_fingerprints = set()

    for run in sarif["runs"]:
        unique_results = []
        for result in run.get("results", []):
            # Use partialFingerprints or create key from location
            if result.get("partialFingerprints"):
                fp = tuple(sorted(result["partialFingerprints"].items()))
            elif result.get("fingerprints"):
                fp = tuple(sorted(result["fingerprints"].items()))
            else:
                # Fallback: create fingerprint from rule + location
                loc = result.get("locations", [{}])[0]
                phys = loc.get("physicalLocation", {})
                fp = (
                    result.get("ruleId"),
                    phys.get("artifactLocation", {}).get("uri"),
                    phys.get("region", {}).get("startLine")
                )

            if fp not in seen_fingerprints:
                seen_fingerprints.add(fp)
                unique_results.append(result)

        run["results"] = unique_results

    return sarif
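Putting the two together (the input file names are illustrative):

merged = aggregate_sarif_files(["codeql.sarif", "semgrep.sarif"])
merged = deduplicate_results(merged)

with open("merged.sarif", "w") as f:
    json.dump(merged, f, indent=2)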

CI/CD Integration

GitHub Actions

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

- name: Check for high severity
  run: |
    HIGH_COUNT=$(jq '[.runs[].results[] | select(.level == "error")] | length' results.sarif)
    if [ "$HIGH_COUNT" -gt 0 ]; then
      echo "Found $HIGH_COUNT high severity issues"
      exit 1
    fi

Fail on New Issues

from sarif import loader

def check_for_regressions(baseline: str, current: str) -> int:
    """Return count of new issues not in baseline."""
    baseline_data = loader.load_sarif_file(baseline)
    current_data = loader.load_sarif_file(current)

    baseline_fps = {get_fingerprint(r) for r in baseline_data.get_results()}
    new_issues = [r for r in current_data.get_results()
                  if get_fingerprint(r) not in baseline_fps]

    return len(new_issues)
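The snippet calls a get_fingerprint helper it never defines. A minimal sketch, reusing the fallback strategy from deduplicate_results above (a hypothetical helper, not part of sarif-tools):

def get_fingerprint(result: dict) -> tuple:
    """Stable identity for a result: prefer tool-provided fingerprints."""
    if result.get("partialFingerprints"):
        return tuple(sorted(result["partialFingerprints"].items()))
    if result.get("fingerprints"):
        return tuple(sorted(result["fingerprints"].items()))
    # Fallback: rule + location
    phys = result.get("locations", [{}])[0].get("physicalLocation", {})
    return (
        result.get("ruleId"),
        phys.get("artifactLocation", {}).get("uri"),
        phys.get("region", {}).get("startLine"),
    )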

Common Pitfalls and Solutions

Path Normalization Issues

from urllib.parse import unquote
from pathlib import Path

def normalize_path(uri: str, base_path: str = "") -> str:
    """Normalize SARIF artifact URI to a consistent path."""
    # Remove file:// prefix if present
    if uri.startswith("file://"):
        uri = uri[7:]

    # URL decode
    uri = unquote(uri)

    # Handle relative paths
    if not Path(uri).is_absolute() and base_path:
        uri = str(Path(base_path) / uri)

    return str(Path(uri))
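For example, with a hypothetical CI checkout path:

print(normalize_path("file:///home/runner/work/repo/src/app.py"))
# /home/runner/work/repo/src/app.py
print(normalize_path("src%20dir/app.py", base_path="/home/runner/work/repo"))
# /home/runner/work/repo/src dir/app.py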

Safe Data Access

def safe_get_location(result: dict) -> tuple[str, int]:
    """Safely extract file and line from a result."""
    try:
        loc = result.get("locations", [{}])[0]
        phys = loc.get("physicalLocation", {})
        file_path = phys.get("artifactLocation", {}).get("uri", "unknown")
        line = phys.get("region", {}).get("startLine", 0)
        return file_path, line
    except (IndexError, KeyError, TypeError):
        return "unknown", 0

Key Principles

  • Validate first: Check SARIF structure before processing

  • Handle optionals: Many fields are optional; use defensive access

  • Normalize paths: Tools report paths differently; normalize early

  • Fingerprint wisely: Combine multiple strategies for stable deduplication

  • Stream large files: Use ijson or similar for 100MB+ files (see the sketch below)
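For the last point, a minimal streaming sketch with the third-party ijson library; the "runs.item.results.item" prefix yields each result without materializing the whole log:

import ijson  # pip install ijson

def iter_results(path: str):
    """Yield results one at a time instead of loading the whole file."""
    with open(path, "rb") as f:
        yield from ijson.items(f, "runs.item.results.item")

# Example: count errors in a very large SARIF file with flat memory use
error_count = sum(1 for r in iter_results("results.sarif") if r.get("level") == "error")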

Resources

  • OASIS SARIF 2.1.0 Specification: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html

  • GitHub SARIF Support: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning

  • SARIF Validator: https://sarifweb.azurewebsites.net/Validation

Attribution

Based on trailofbits/skills sarif-parsing skill.

