Clair3 Variant Calling
Basic Usage
# ONT variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path="${CONDA_PREFIX}/bin/models/ont" \
    --output=clair3_output
# PacBio HiFi variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=hifi \
    --model_path="${CONDA_PREFIX}/bin/models/hifi" \
    --output=clair3_output

# Output: clair3_output/merge_output.vcf.gz
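Clair3 expects both the BAM and the reference FASTA to be samtools-indexed before it runs. A minimal pre-flight sketch in Python, assuming the index files sit next to their inputs under the usual samtools names (`.bai` and `.fai`); the helper name `missing_inputs` is hypothetical and not part of Clair3:

```python
from pathlib import Path


def missing_inputs(bam, reference):
    """Return the input and index files that do not exist yet."""
    bam, reference = Path(bam), Path(reference)
    missing = [str(p) for p in (bam, reference) if not p.exists()]

    # samtools index writes sample.bam.bai; sample.bai is also commonly used.
    bam_indexes = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    if not any(p.exists() for p in bam_indexes):
        missing.append(str(bam_indexes[0]))

    # samtools faidx writes reference.fasta.fai next to the FASTA.
    fai = Path(str(reference) + ".fai")
    if not fai.exists():
        missing.append(str(fai))
    return missing
```

An empty list means the call can proceed; otherwise run `samtools index` or `samtools faidx` for whatever is reported. CSI indexes are not checked in this sketch.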
Platform-Specific Models
| Platform | Model | Recommended Coverage |
|----------|-------|----------------------|
| ONT R10 | r1041_e82_400bps_sup_v430 | 30-60x |
| ONT R9 | r941_prom_sup_g5014 | 30-60x |
| PacBio HiFi | hifi | 20-40x |
| PacBio CLR | Use PEPPER-Margin-DeepVariant | - |
# List available models
ls "${CONDA_PREFIX}/bin/models/"

# Specify exact model (--platform is still required)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --platform=ont \
    --model_path="${CONDA_PREFIX}/bin/models/r1041_e82_400bps_sup_v430" \
    --output=clair3_out \
    --threads=32
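When runs are scripted, the platform-to-model mapping can live in one place. A minimal sketch, assuming the conda layout used in this document (`${CONDA_PREFIX}/bin/models/<model>`); the dictionary keys and the `resolve_model_path` helper are hypothetical names, and the model names are copied from the table in this section:

```python
# Keys are illustrative labels; values are the model directory names listed above.
MODELS = {
    "ont_r10": "r1041_e82_400bps_sup_v430",
    "ont_r9": "r941_prom_sup_g5014",
    "hifi": "hifi",
}


def resolve_model_path(chemistry, conda_prefix):
    """Return the model directory for a chemistry label, or raise ValueError."""
    try:
        model = MODELS[chemistry]
    except KeyError:
        known = ", ".join(sorted(MODELS))
        raise ValueError(f"unknown chemistry {chemistry!r}; expected one of: {known}")
    return f"{conda_prefix}/bin/models/{model}"
```

Failing early on an unknown label avoids launching a long run with a model path that does not exist.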
Key Parameters
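| Parameter | Description |
|-----------|-------------|
| --platform | Sequencing platform: ont, hifi, or ilmn |
| --model_path | Path to the trained model directory |
| --bed_fn | Restrict calling to regions in a BED file |
| --include_all_ctgs | Call on all contigs (not just chr1-22,X,Y) |
| --no_phasing_for_fa | Skip WhatsHap phasing during full-alignment calling (experimental) |
| --gvcf | Output gVCF format |
| --qual | Variants with QUAL above this value are marked PASS, the rest LowQual (default: 2) |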
Region-Specific Calling
# Call variants in specific regions
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --bed_fn=target_regions.bed \
    --threads=32 \
    --platform=ont \
    --model_path="${CONDA_PREFIX}/bin/models/ont" \
    --output=clair3_targeted
# Call on non-human genomes (all contigs)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --include_all_ctgs \
    --threads=32 \
    --platform=hifi \
    --model_path="${CONDA_PREFIX}/bin/models/hifi" \
    --output=clair3_all_contigs
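The file passed to `--bed_fn` is a plain BED file: tab-separated, with 0-based, half-open intervals. A small sketch for writing one from 1-based inclusive coordinates (the convention used in VCF and in samtools-style region strings); the function name `write_bed` is an illustrative choice:

```python
def write_bed(regions, path):
    """Write (chrom, start, end) tuples to a BED file.

    Input coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    with open(path, "w") as fh:
        for chrom, start, end in regions:
            if start < 1 or end < start:
                raise ValueError(f"invalid region: {chrom}:{start}-{end}")
            fh.write(f"{chrom}\t{start - 1}\t{end}\n")
```

For example, `write_bed([("chr1", 1000, 2000)], "target_regions.bed")` writes the single line `chr1`, `999`, `2000` separated by tabs.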
gVCF Output
# Generate gVCF for joint calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --gvcf \
    --threads=32 \
    --platform=ont \
    --model_path="${CONDA_PREFIX}/bin/models/ont" \
    --output=clair3_gvcf
# Merge per-sample gVCFs into one cohort file
# (bcftools merge combines records; it does not re-genotype across samples)
bcftools merge sample1.g.vcf.gz sample2.g.vcf.gz -Oz -o cohort.vcf.gz
Phased Variant Calling
# Output phased variants (Clair3 phases reads internally, so a regular aligned BAM is sufficient)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --enable_phasing \
    --longphase_for_phasing \
    --threads=32 \
    --platform=ont \
    --model_path="${CONDA_PREFIX}/bin/models/ont" \
    --output=clair3_phased
Quality Filtering
# Filter by quality score
bcftools view -i 'QUAL>20' clair3_output/merge_output.vcf.gz -Oz -o filtered.vcf.gz

# Filter by genotype quality
bcftools view -i 'GQ>30' clair3_output/merge_output.vcf.gz -Oz -o high_gq.vcf.gz

# SNPs only
bcftools view -v snps clair3_output/merge_output.vcf.gz -Oz -o snps.vcf.gz

# Indels only
bcftools view -v indels clair3_output/merge_output.vcf.gz -Oz -o indels.vcf.gz
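Before settling on a threshold, it can help to see how many SNPs and indels would survive a given QUAL cutoff. The following is a rough standard-library sketch, not a replacement for bcftools: the typing rules in `classify` are simplified assumptions and will differ from bcftools on multi-allelic or mixed records.

```python
import gzip


def classify(ref, alts):
    """Rough variant typing; simpler than bcftools' own rules."""
    if any(a.startswith("<") or a in ("*", ".") for a in alts):
        return "other"  # symbolic or missing ALT, e.g. <NON_REF> in gVCFs
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if all(len(a) == len(ref) for a in alts):
        return "other"  # same-length substitution (MNP)
    return "indel"


def count_variants(vcf_gz, min_qual=20.0):
    """Count records with QUAL > min_qual, grouped by rough type."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    with gzip.open(vcf_gz, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # header lines
            fields = line.rstrip("\n").split("\t")
            ref, alts, qual = fields[3], fields[4].split(","), fields[5]
            if qual == "." or float(qual) <= min_qual:
                continue
            counts[classify(ref, alts)] += 1
    return counts
```

Calling `count_variants("clair3_output/merge_output.vcf.gz", min_qual=20)` returns a dictionary with `snp`, `indel`, and `other` counts for records above the cutoff.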
Python Wrapper
import os
import subprocess
from pathlib import Path


def run_clair3(bam, reference, output_dir, platform='ont', model_path=None,
               threads=32, bed=None, gvcf=False, include_all_ctgs=False):
    """Run Clair3 and return the path to the merged VCF."""
    if model_path is None:
        # Fall back to the conda-installed model named after the platform
        conda_prefix = os.environ.get('CONDA_PREFIX', '')
        model_path = f'{conda_prefix}/bin/models/{platform}'

    cmd = [
        'run_clair3.sh',
        f'--bam_fn={bam}',
        f'--ref_fn={reference}',
        f'--threads={threads}',
        f'--platform={platform}',
        f'--model_path={model_path}',
        f'--output={output_dir}'
    ]
    if bed:
        cmd.append(f'--bed_fn={bed}')
    if gvcf:
        cmd.append('--gvcf')
    if include_all_ctgs:
        cmd.append('--include_all_ctgs')

    subprocess.run(cmd, check=True)
    return Path(output_dir) / 'merge_output.vcf.gz'


def filter_variants(vcf, output, min_qual=20, variant_type=None):
    """Filter a VCF by QUAL (and optionally variant type), then index it."""
    cmd = ['bcftools', 'view', '-i', f'QUAL>{min_qual}']
    if variant_type:
        cmd.extend(['-v', variant_type])
    cmd.extend([vcf, '-Oz', '-o', output])
    subprocess.run(cmd, check=True)
    subprocess.run(['bcftools', 'index', '-t', output], check=True)
    return output
Example
vcf = run_clair3('sample.bam', 'ref.fa', 'clair3_out', platform='hifi', threads=48)
snps = filter_variants(str(vcf), 'snps_q20.vcf.gz', min_qual=20, variant_type='snps')
Comparison with Other Callers
| Caller | Best For | Speed | Accuracy |
|--------|----------|-------|----------|
| Clair3 | ONT/HiFi germline | Fast | High |
| DeepVariant | HiFi, Illumina | Medium | Very high |
| PEPPER-DV | ONT (integrated) | Slow | Very high |
| Longshot | ONT SNPs | Fast | Good |
Troubleshooting
| Issue | Solution |
|-------|----------|
| Missing model | Download from Clair3 releases or use conda models |
| Low call rate | Check coverage; lower the --qual threshold |
| Slow performance | Increase --threads or use --bed_fn for targeted calling |
| Missing variants on non-human genomes | Use --include_all_ctgs |
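For the "check coverage" step, one option is to summarize the output of `samtools depth`, which prints tab-separated chromosome, position, and depth columns. A minimal sketch, assuming that three-column layout for a single BAM; the `mean_depth` helper is a hypothetical name:

```python
def mean_depth(depth_lines):
    """Mean depth over the positions reported by `samtools depth`.

    Without `-a`, samtools depth omits zero-coverage positions, so this
    is the mean over covered positions only.
    """
    total = 0
    n = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        total += int(depth)
        n += 1
    return total / n if n else 0.0
```

It can be fed from a pipe, for example by passing `sys.stdin` after `samtools depth sample.bam`, and the result compared with the recommended coverage ranges listed earlier.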
Docker Usage
# Using Docker
docker run -v /data:/data \
    hkubal/clair3:latest \
    /opt/bin/run_clair3.sh \
    --bam_fn=/data/sample.bam \
    --ref_fn=/data/reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=/data/clair3_output
# Singularity (same in-container script path as the Docker image)
singularity exec clair3.sif /opt/bin/run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=clair3_output
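The Docker invocation can be assembled in Python in the same way as the native call. This sketch only builds the argument list and does not run it; the image name, script path, and model directory are taken from the example above, while `docker_clair3_cmd` and its parameters are hypothetical names:

```python
def docker_clair3_cmd(host_dir, bam, reference, output, platform="ont",
                      threads=32, image="hkubal/clair3:latest"):
    """Build a docker command list; file names are relative to host_dir."""
    mount = "/data"  # container-side mount point, as in the example above
    return [
        "docker", "run", "-v", f"{host_dir}:{mount}",
        image,
        "/opt/bin/run_clair3.sh",
        f"--bam_fn={mount}/{bam}",
        f"--ref_fn={mount}/{reference}",
        f"--threads={threads}",
        f"--platform={platform}",
        f"--model_path=/opt/models/{platform}",
        f"--output={mount}/{output}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes the command easy to log or inspect first.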
Related Skills
- variant-calling/bcftools-basics - VCF manipulation
- variant-calling/filtering-best-practices - Quality filtering
- long-read-sequencing/long-read-qc - Input quality control
- long-read-sequencing/long-read-alignment - Mapping with minimap2