bio-vcf

Toolkit for VCF/BCF variant file analysis: calculate statistics, filter variants, and export as JSON. Designed for WGS/WES sequencing result inspection and quality control.

Install skill "bio-vcf" with this command: npx skills add dakesan/cc-dnawork-plugin/dakesan-cc-dnawork-plugin-bio-vcf

VCF Toolkit

Quick Start

Install

uv pip install pysam typer

Basic Usage

1. Get VCF statistics

python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1

2. Filter to high-quality variants only and write a new VCF

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output high_quality.vcf \
  --min-qual 30 \
  --min-dp 10

3. Export the filtered variants as JSON (≤100 entries)

python scripts/inspect_vcf.py \
  --vcf high_quality.vcf \
  --chrom chr1 \
  --output chr1.json

Scripts

inspect_vcf.py - VCF Inspection & JSON Export

Extract variants from a VCF file for a specific chromosome or region and export them as JSON.

Required Arguments

  • --vcf PATH: Input VCF file path

  • --chrom TEXT or --region TEXT: One of the two is required

      • --chrom: Entire chromosome (e.g., chr1)

      • --region: Specific region (e.g., chr1:1000000-2000000)
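The two selector forms above can be told apart with a small parser. This is a minimal sketch, not the code used by inspect_vcf.py: `parse_region` is a hypothetical helper, and it assumes the coordinates are passed through unchanged.

```python
from typing import Optional, Tuple


def parse_region(text: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split 'chr1' or 'chr1:1000000-2000000' into (chrom, start, end).

    Hypothetical helper for illustration; coordinates are kept as written.
    """
    chrom, sep, span = text.strip().partition(":")
    if not chrom:
        raise ValueError(f"missing chromosome name in {text!r}")
    if not sep:
        # Whole-chromosome form, as passed to --chrom.
        return chrom, None, None
    # Tolerate thousands separators such as chr1:1,000,000-2,000,000.
    start_text, dash, end_text = span.replace(",", "").partition("-")
    if not dash:
        raise ValueError(f"expected START-END in {text!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid interval in {text!r}")
    return chrom, start, end


print(parse_region("chr1"))
print(parse_region("chr1:1000000-2000000"))
```

Contig names that themselves contain a colon (some HLA contigs do) are not handled by this sketch.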

Optional Arguments

Output:

  • --output PATH: JSON output path (default: stdout)

Filter Conditions:

  • --min-qual FLOAT: Minimum quality score (QUAL >= X)

  • --min-dp INT: Minimum depth (INFO/DP >= X)

  • --min-af FLOAT: Minimum allele frequency (INFO/AF >= X)

  • --max-af FLOAT: Maximum allele frequency (INFO/AF <= X)

  • --pass-only / --all-filters: Keep PASS variants only (default) / include variants with any FILTER value

Limits:

  • --max-variants INT: Maximum variant count (default: 100)

  • --force: Ignore the entry limit (allows large JSON output)

Output Format (JSON)

{
  "num_variants": 45,
  "samples": ["sample1", "sample2"],
  "variants": [
    {
      "chrom": "chr1",
      "pos": 12345,
      "id": "rs123456",
      "ref": "A",
      "alts": ["G"],
      "qual": 100.0,
      "filter": ["PASS"],
      "info": { "DP": 50, "AF": [0.5], "AC": [25] },
      "samples": {
        "sample1": {"GT": "0/1", "DP": 25, "GQ": 99},
        "sample2": {"GT": "0/0", "DP": 25, "GQ": 99}
      }
    }
  ]
}
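For downstream use, the exported JSON can be read with the standard library. The sketch below assumes the layout shown above (a top-level "samples" list and per-variant "samples" objects with a "GT" string); `count_non_reference_calls` is a hypothetical helper, and the embedded record is a trimmed copy of the documented example.

```python
import json
from collections import Counter

# Trimmed copy of the documented output, used here as sample input.
EXAMPLE = """
{
  "num_variants": 1,
  "samples": ["sample1", "sample2"],
  "variants": [
    {
      "chrom": "chr1", "pos": 12345, "id": "rs123456",
      "ref": "A", "alts": ["G"], "qual": 100.0, "filter": ["PASS"],
      "info": {"DP": 50, "AF": [0.5], "AC": [25]},
      "samples": {
        "sample1": {"GT": "0/1", "DP": 25, "GQ": 99},
        "sample2": {"GT": "0/0", "DP": 25, "GQ": 99}
      }
    }
  ]
}
"""


def count_non_reference_calls(payload: dict) -> Counter:
    """Count, per sample, the variants whose GT carries a non-reference allele."""
    counts = Counter({name: 0 for name in payload["samples"]})
    for variant in payload["variants"]:
        for name, call in variant["samples"].items():
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = (call.get("GT") or "").replace("|", "/").split("/")
            if any(allele not in ("0", ".", "") for allele in alleles):
                counts[name] += 1
    return counts


payload = json.loads(EXAMPLE)
print(dict(count_non_reference_calls(payload)))
```

With a real export, replace `json.loads(EXAMPLE)` with `json.load(open("chr1.json"))`.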

vcf_stats.py - VCF Statistics

Calculate comprehensive statistics from VCF files and output as JSON. Includes variant counts, quality distributions, depth distributions, and allele frequency statistics.

Arguments

Required:

  • --vcf PATH: Input VCF file path

Optional:

  • --chrom TEXT: Chromosome specification (default: all chromosomes)

  • --region TEXT: Region specification (e.g., chr1:1000-2000)

  • --output PATH: JSON output path (default: stdout)

Output Content (JSON)

  • total_variants: Total variant count

  • filter_counts: Breakdown by filter (PASS, LowQual, etc.)

  • variant_types: Breakdown by variant type (SNP, insertion, deletion)

  • chrom_counts: Variant count per chromosome

  • quality_stats: Quality score statistics (min, max, mean, median)

  • depth_stats: Depth statistics (INFO/DP)

  • allele_frequency_stats: Allele frequency statistics (INFO/AF)
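To make the variant_types and quality_stats fields concrete, here is one way such values could be derived. This is a simplified sketch and not necessarily how vcf_stats.py computes them: `classify` and `summarize` are hypothetical names, classification uses allele lengths only, and symbolic alleles such as `<DEL>` or `*` are not handled.

```python
import statistics
from collections import Counter
from typing import Iterable, Optional


def classify(ref: str, alt: str) -> str:
    """Label one REF/ALT pair by comparing allele lengths (simplified)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-nucleotide substitutions


def summarize(values: Iterable[Optional[float]]) -> dict:
    """Return min, max, mean, and median, skipping missing (None) values."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


pairs = [("A", "G"), ("A", "ATG"), ("ATG", "A")]
print(Counter(classify(ref, alt) for ref, alt in pairs))
print(summarize([10.0, None, 30.0, 50.0]))
```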

Usage Examples

Calculate statistics for chr1

python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1

Calculate statistics for all chromosomes (output to JSON file)

python scripts/vcf_stats.py --vcf variants.vcf.gz --output stats.json

Calculate statistics for a specific region

python scripts/vcf_stats.py --vcf variants.vcf.gz --region chr1:10000-20000

filter_vcf.py - VCF Filtering

Filter VCF files by quality, depth, and allele frequency criteria. Output filtered variants as a new VCF file.

Arguments

Required:

  • --vcf PATH: Input VCF file path

  • --output PATH: Output VCF file path

Optional:

  • --chrom TEXT: Chromosome specification

  • --region TEXT: Region specification (e.g., chr1:1000-2000)

  • --min-qual FLOAT: Minimum quality score

  • --min-dp INT: Minimum depth (INFO/DP)

  • --min-af FLOAT: Minimum allele frequency (INFO/AF)

  • --max-af FLOAT: Maximum allele frequency (INFO/AF)

  • --pass-only: PASS variants only (default: False)
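The filter options above combine as a logical AND. The sketch below shows one plausible reading of those thresholds on a dictionary shaped like the inspect_vcf.py JSON records. It is not the filter_vcf.py implementation: `passes` is a hypothetical helper, and two behaviors are assumptions, namely that a missing QUAL, DP, or AF fails an active threshold, and that a multi-allelic site is kept if any ALT allele meets the AF bound.

```python
from typing import Optional


def passes(
    record: dict,
    min_qual: Optional[float] = None,
    min_dp: Optional[int] = None,
    min_af: Optional[float] = None,
    max_af: Optional[float] = None,
    pass_only: bool = False,
) -> bool:
    """Return True if the record satisfies every threshold that is set."""
    info = record.get("info", {})
    if pass_only and record.get("filter") != ["PASS"]:
        return False
    qual = record.get("qual")
    if min_qual is not None and (qual is None or qual < min_qual):
        return False
    depth = info.get("DP")
    if min_dp is not None and (depth is None or depth < min_dp):
        return False
    # AF holds one value per ALT allele; assumption: any allele may satisfy the bound.
    afs = info.get("AF") or []
    if min_af is not None and not any(af >= min_af for af in afs):
        return False
    if max_af is not None and not any(af <= max_af for af in afs):
        return False
    return True


record = {"qual": 100.0, "filter": ["PASS"], "info": {"DP": 50, "AF": [0.5]}}
print(passes(record, min_qual=30, min_dp=10, pass_only=True))
print(passes(record, max_af=0.01))
```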

Usage Examples

Extract chr1 PASS variants only

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output chr1_pass.vcf \
  --chrom chr1 \
  --pass-only

Extract high-quality variants (QUAL >= 30, DP >= 10)

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output high_quality.vcf \
  --min-qual 30 \
  --min-dp 10

Extract rare variants (AF <= 0.01)

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output rare_variants.vcf \
  --max-af 0.01

Workflow Examples

Example 1: Comprehensive Variant Analysis Workflow

Combine all three scripts for complete VCF analysis:

Step 1: Calculate overall statistics

python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1 --output stats.json

Step 2: Filter high-quality variants to new VCF

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output high_quality.vcf \
  --chrom chr1 \
  --min-qual 30 \
  --min-dp 10 \
  --pass-only

Step 3: Export filtered variants as JSON for downstream analysis

python scripts/inspect_vcf.py \
  --vcf high_quality.vcf \
  --chrom chr1 \
  --output chr1_filtered.json
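The three steps can also be driven from Python. The sketch below only assembles the command lines documented in this workflow and runs them in order; `build_pipeline` and `run_pipeline` are hypothetical names, and the script paths assume the working directory is the skill root.

```python
import subprocess
import sys
from typing import List


def build_pipeline(vcf: str, chrom: str, min_qual: float = 30, min_dp: int = 10) -> List[List[str]]:
    """Return the stats, filter, and export commands as argument lists."""
    python = sys.executable
    stats = [python, "scripts/vcf_stats.py", "--vcf", vcf,
             "--chrom", chrom, "--output", "stats.json"]
    filt = [python, "scripts/filter_vcf.py", "--vcf", vcf,
            "--output", "high_quality.vcf", "--chrom", chrom,
            "--min-qual", str(min_qual), "--min-dp", str(min_dp), "--pass-only"]
    export = [python, "scripts/inspect_vcf.py", "--vcf", "high_quality.vcf",
              "--chrom", chrom, "--output", f"{chrom}_filtered.json"]
    return [stats, filt, export]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


for command in build_pipeline("variants.vcf.gz", "chr1"):
    print(" ".join(command))
```

Calling `run_pipeline(build_pipeline("variants.vcf.gz", "chr1"))` executes the steps; passing argument lists rather than a shell string avoids quoting problems with unusual file names.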

Example 2: Rare Variant Discovery

Identify and export rare variants from a specific region:

Filter rare variants (AF <= 0.01)

python scripts/filter_vcf.py \
  --vcf variants.vcf.gz \
  --output rare.vcf \
  --region chr17:41196312-41277500 \
  --max-af 0.01

Export as JSON for analysis

python scripts/inspect_vcf.py \
  --vcf rare.vcf \
  --region chr17:41196312-41277500 \
  --output brca1_rare.json

Error Handling

Variant Count Exceeds Limit

$ python scripts/inspect_vcf.py --vcf huge.vcf --chrom chr1 --output out.json

Error: VCF contains 1,234+ variants after filtering (limit: 100).

Suggestions:

  • Apply more restrictive filters: --min-qual, --min-dp, --pass-only
  • Specify a genomic region: --region chr1:1000-2000
  • Override limit with --force (warning: may produce very large JSON)
  • Use bcftools directly for large-scale processing

Current filter conditions: --chrom chr1 --pass-only

Solutions:

  • Apply more restrictive filters: --min-qual 30 , --min-dp 10

  • Narrow down the region: --region chr1:1000000-1100000

  • Override limit with --force (use cautiously)
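An entry limit of this kind can be enforced without reading the whole file. The sketch below is an assumption about the general technique, not the actual inspect_vcf.py logic: `take_limited` and `TooManyVariants` are hypothetical names, and the function stops one record past the limit instead of counting everything.

```python
from itertools import islice
from typing import Iterable, List


class TooManyVariants(Exception):
    """Raised when more records than the allowed maximum are found."""


def take_limited(variants: Iterable[dict], max_variants: int = 100, force: bool = False) -> List[dict]:
    """Collect records, refusing to exceed max_variants unless force is set."""
    if force:
        return list(variants)
    # Read one record past the limit so an overflow is detected early.
    head = list(islice(variants, max_variants + 1))
    if len(head) > max_variants:
        raise TooManyVariants(
            f"more than {max_variants} variants after filtering; "
            "narrow the filters or region, or pass --force"
        )
    return head


records = ({"pos": i} for i in range(250))
try:
    take_limited(records)
except TooManyVariants as error:
    print(f"Error: {error}")
```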

Missing Chromosome/Region Specification

$ python scripts/inspect_vcf.py --vcf variants.vcf --output out.json

Error: Either --chrom or --region must be specified.

Solutions:

  • Add --chrom chr1 or --region chr1:1000-2000 to the command

Best Practices

  1. Always Specify Chromosome or Region

Always specify chromosome or region when using inspect_vcf.py to avoid processing entire VCF files inefficiently.

❌ Bad: No chromosome specified

python scripts/inspect_vcf.py --vcf variants.vcf

✅ Good: Chromosome specified

python scripts/inspect_vcf.py --vcf variants.vcf --chrom chr1

  2. Apply Additional Filters for Efficiency

Combine quality and depth filters with the default PASS-only filtering to shrink the result set and stay within the JSON entry limit.

✅ Good: Multiple filters applied

python scripts/inspect_vcf.py \
  --vcf variants.vcf \
  --chrom chr1 \
  --min-qual 30 \
  --min-dp 10

  3. Respect 100-Entry Limit for JSON Export

Use inspect_vcf.py for small datasets only. Pre-filter large VCF files with filter_vcf.py or bcftools before JSON export.

Pre-filter large datasets with bcftools (the -r option needs a bgzip-compressed, indexed input)

bcftools view -i 'QUAL>=30 && INFO/DP>=10' -r chr1:1000000-2000000 variants.vcf.gz > filtered.vcf

Then export to JSON

python scripts/inspect_vcf.py --vcf filtered.vcf --chrom chr1 --output filtered.json

  4. Use --force Cautiously

Use --force only when necessary. JSON files with thousands of entries can become several MB to tens of MB in size.

When to Use vcf-toolkit vs bcftools

Task                       | vcf-toolkit       | bcftools
Small dataset JSON export  | ✅ inspect_vcf.py | -
Large-scale filtering      | filter_vcf.py     | ✅ bcftools view
Complex filter expressions | -                 | ✅ bcftools
VCF-to-VCF conversion      | filter_vcf.py     | ✅ bcftools
Variant statistics         | ✅ vcf_stats.py   | ✅ bcftools stats

Recommended Workflow:

  • Pre-filter large datasets with bcftools or filter_vcf.py

  • Export filtered results to JSON with inspect_vcf.py for detailed inspection

  • Perform downstream analysis in Python/R using JSON output

Related Skills

  • pysam - BAM/CRAM alignment file operations

  • sequence-io - FASTA/FASTQ sequence file operations

  • blast-search - BLAST homology search

  • blat-api-searching - BLAT genome mapping

Troubleshooting

VCF File Too Large

Specify a narrower region or pre-filter with bcftools before JSON export.

Specify narrower region

python scripts/inspect_vcf.py --vcf variants.vcf --region chr1:1000000-1100000

Pre-filter with bcftools

bcftools view -i 'QUAL>=50' variants.vcf | python scripts/inspect_vcf.py --vcf - --chrom chr1

Index Error

Chromosome and region queries on a compressed VCF file need a tabix index. Compress the file with bgzip, then index it with tabix.

Compress with bgzip

bgzip variants.vcf

Create tabix index

tabix -p vcf variants.vcf.gz

Use indexed VCF

python scripts/inspect_vcf.py --vcf variants.vcf.gz --chrom chr1
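Before running a region query, a quick check for the index file can save a failed run. This is a minimal sketch with a hypothetical helper, `find_index`; it relies on tabix writing the index next to the input as `<file>.tbi` (or `<file>.csi` when a CSI index is requested).

```python
from pathlib import Path
from typing import Optional


def find_index(vcf_path: str) -> Optional[Path]:
    """Return the path of an adjacent .tbi or .csi index, or None if absent."""
    for suffix in (".tbi", ".csi"):
        candidate = Path(vcf_path + suffix)
        if candidate.exists():
            return candidate
    return None


index = find_index("variants.vcf.gz")
if index is None:
    print("No index found; run: tabix -p vcf variants.vcf.gz")
else:
    print(f"Using index {index}")
```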

Include Non-PASS Variants

Use --all-filters flag to include all variants regardless of FILTER field.

python scripts/inspect_vcf.py --vcf variants.vcf --chrom chr1 --all-filters
