bio-copy-number-gatk-cnv


Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "bio-copy-number-gatk-cnv" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-copy-number-gatk-cnv

GATK CNV Workflow

Somatic CNV Workflow Overview

  1. PreprocessIntervals → intervals.interval_list
  2. CollectReadCounts → sample.counts.hdf5
  3. CreateReadCountPanelOfNormals → pon.hdf5
  4. DenoiseReadCounts → sample.denoised.tsv
  5. CollectAllelicCounts → sample.allelicCounts.tsv
  6. ModelSegments → sample.modelFinal.seg
  7. CallCopyRatioSegments → sample.called.seg

Step 1: Preprocess Intervals

For WES/targeted

gatk PreprocessIntervals \
    -R reference.fa \
    -L targets.interval_list \
    --bin-length 0 \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O preprocessed.interval_list

For WGS

gatk PreprocessIntervals \
    -R reference.fa \
    --bin-length 1000 \
    --padding 0 \
    -O wgs.interval_list
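A quick sanity check after preprocessing is to count how many intervals (padded targets or bins) were written. Picard-style interval lists start with SAM-style `@` header lines. The sketch below assumes that layout and counts the remaining non-blank lines. The function name `count_intervals` is hypothetical.

```shell
# Count intervals in a Picard-style interval list:
# drop '@' header lines and blank lines, count what is left.
count_intervals() {
    grep -v -e '^@' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}
```

Usage: `count_intervals preprocessed.interval_list`. For WGS bins, expect a count on the order of the genome length divided by `--bin-length`.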

Step 2: Collect Read Counts

For each sample

gatk CollectReadCounts \
    -R reference.fa \
    -I sample.bam \
    -L preprocessed.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O sample.counts.hdf5
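Step 2 is repeated for every tumor and normal BAM. Below is a minimal loop sketch under a few assumptions: one BAM per sample, named `<sample>.bam`, and the interval list from Step 1. The `run` wrapper and the `DRY_RUN` switch are hypothetical helpers. By default they print each command; with `DRY_RUN=0` they execute it.

```shell
# Print (default) or execute a command. DRY_RUN is a hypothetical switch:
# DRY_RUN=0 runs the command, anything else just echoes it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Collect read counts for each BAM passed after the reference and intervals.
collect_counts() (
    ref=$1 intervals=$2
    shift 2
    for bam in "$@"; do
        sample=$(basename "$bam" .bam)   # sample name = BAM file name minus .bam
        run gatk CollectReadCounts \
            -R "$ref" \
            -I "$bam" \
            -L "$intervals" \
            --interval-merging-rule OVERLAPPING_ONLY \
            -O "${sample}.counts.hdf5"
    done
)
```

Usage: `collect_counts reference.fa preprocessed.interval_list normal1.bam normal2.bam tumor.bam` previews the commands. Prefix it with `DRY_RUN=0` to run them.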

Step 3: Create Panel of Normals

Combine multiple normal samples

gatk CreateReadCountPanelOfNormals \
    -I normal1.counts.hdf5 \
    -I normal2.counts.hdf5 \
    -I normal3.counts.hdf5 \
    --minimum-interval-median-percentile 5.0 \
    -O cnv_pon.hdf5
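With many normals, writing one `-I` per file becomes error-prone. The sketch below expands every `*.counts.hdf5` in a directory into `-I` arguments. It assumes that all count files for the panel sit in one directory and that their paths contain no whitespace. The function name and the `normals/` directory are hypothetical.

```shell
# Emit "-I <file> " for every *.counts.hdf5 in a directory (glob order).
pon_inputs() (
    for f in "$1"/*.counts.hdf5; do
        [ -e "$f" ] || continue   # glob matched nothing
        printf '%s %s ' -I "$f"
    done
)
```

Usage: `gatk CreateReadCountPanelOfNormals $(pon_inputs normals) --minimum-interval-median-percentile 5.0 -O cnv_pon.hdf5`. The substitution is deliberately left unquoted so that it splits into separate arguments, which is why paths with spaces are not supported.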

Step 4: Denoise Read Counts

Using panel of normals

gatk DenoiseReadCounts \
    -I tumor.counts.hdf5 \
    --count-panel-of-normals cnv_pon.hdf5 \
    --standardized-copy-ratios tumor.standardized.tsv \
    --denoised-copy-ratios tumor.denoised.tsv
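For a quick look at the denoised output before segmentation, the sketch below averages the log2 copy ratio per contig. It assumes the layout GATK uses for copy-ratio TSVs: `@` header lines, then a column-name row containing `CONTIG` and `LOG2_COPY_RATIO`, then one row per interval. Columns are located by name. The function name is hypothetical.

```shell
# Mean denoised log2 copy ratio per contig, in first-seen contig order.
mean_log2_by_contig() {
    awk -F'\t' '
        /^@/ { next }                                # SAM-style header lines
        !hdr {                                       # column-name row
            for (i = 1; i <= NF; i++) col[$i] = i
            ci = col["CONTIG"]; li = col["LOG2_COPY_RATIO"]; hdr = 1; next
        }
        {
            c = $ci
            if (!(c in n)) order[++k] = c            # remember contig order
            sum[c] += $li; n[c]++
        }
        END { for (j = 1; j <= k; j++) printf "%s\t%.3f\n", order[j], sum[order[j]] / n[order[j]] }
    ' "$1"
}
```

In a pure, mostly diploid sample, a single-copy loss spanning a whole contig would sit near log2(1/2) = -1, and a single-copy gain near log2(3/2) ≈ 0.58. Lower tumor purity pulls both values toward 0.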

Step 5: Collect Allelic Counts

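From known common SNP sites (for allelic imbalance and LOH detection). Run this for both the tumor and the matched normal BAM; Step 6 uses both outputs.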

gatk CollectAllelicCounts \
    -R reference.fa \
    -I tumor.bam \
    -L common_snps.vcf \
    -O tumor.allelicCounts.tsv
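To eyeball allelic imbalance, the sketch below prints the alternate-allele fraction at sites with enough depth. It assumes GATK's allelic-count layout: `@` headers, then `CONTIG`, `POSITION`, `REF_COUNT`, `ALT_COUNT`, ... columns. The `min_depth` cutoff (second argument, default 10) is an illustrative choice, not a GATK default. In a balanced region, heterozygous sites cluster near 0.5; LOH pushes tumor fractions toward 0 or 1.

```shell
# Print CONTIG, POSITION, depth and alt-allele fraction for sites whose
# depth (REF_COUNT + ALT_COUNT) is at least min_depth (default 10).
alt_fraction() {
    awk -F'\t' -v min="${2:-10}" '
        /^@/ { next }
        !hdr {
            for (i = 1; i <= NF; i++) col[$i] = i
            ci = col["CONTIG"]; pi = col["POSITION"]
            ri = col["REF_COUNT"]; ai = col["ALT_COUNT"]; hdr = 1; next
        }
        {
            depth = $ri + $ai
            if (depth > 0 && depth >= min)
                printf "%s\t%s\t%d\t%.3f\n", $ci, $pi, depth, $ai / depth
        }
    ' "$1"
}
```

Usage: `alt_fraction tumor.allelicCounts.tsv 20`.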

Step 6: Model Segments

Somatic with matched normal allelic counts

gatk ModelSegments \
    --denoised-copy-ratios tumor.denoised.tsv \
    --allelic-counts tumor.allelicCounts.tsv \
    --normal-allelic-counts normal.allelicCounts.tsv \
    --output-prefix tumor \
    -O results/

Output files in results/ include tumor.cr.seg, tumor.modelFinal.seg and tumor.hets.tsv.
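ModelSegments reports a posterior minor-allele fraction (MAF) for each segment. The sketch below lists segments whose posterior median MAF falls below a cutoff, as candidates for allelic imbalance or LOH. It assumes the modelFinal.seg column names `NUM_POINTS_ALLELE_FRACTION` and `MINOR_ALLELE_FRACTION_POSTERIOR_50`, and skips segments with no heterozygous sites. The 0.3 default is purely illustrative; tumor purity changes what a real LOH segment looks like. The function name is hypothetical.

```shell
# Print CONTIG, START, END and median MAF for segments supported by
# heterozygous sites whose posterior median MAF is below max_maf.
low_maf_segments() {
    awk -F'\t' -v max="${2:-0.3}" '
        /^@/ { next }
        !hdr {
            for (i = 1; i <= NF; i++) col[$i] = i
            ci = col["CONTIG"]; si = col["START"]; ei = col["END"]
            ni = col["NUM_POINTS_ALLELE_FRACTION"]
            mi = col["MINOR_ALLELE_FRACTION_POSTERIOR_50"]; hdr = 1; next
        }
        $ni > 0 && $mi != "NaN" && $mi + 0 < max { print $ci "\t" $si "\t" $ei "\t" $mi }
    ' "$1"
}
```

Usage: `low_maf_segments results/tumor.modelFinal.seg 0.25`.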

Step 7: Call Copy Ratio Segments

gatk CallCopyRatioSegments \
    -I results/tumor.cr.seg \
    -O results/tumor.called.seg
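CallCopyRatioSegments adds a CALL column: `+` for amplification, `-` for deletion, `0` for neutral. The sketch below extracts the non-neutral segments as a BED-like table. It assumes the column names `CONTIG`, `START`, `END`, `MEAN_LOG2_COPY_RATIO` and `CALL`. GATK segment coordinates are 1-based and inclusive, so START is shifted down by one to match BED's 0-based, half-open convention. The function name is hypothetical.

```shell
# Print non-neutral segments as BED-like rows:
# contig, 0-based start, end, call, mean log2 copy ratio.
called_to_bed() {
    awk -F'\t' '
        /^@/ { next }
        !hdr {
            for (i = 1; i <= NF; i++) col[$i] = i
            ci = col["CONTIG"]; si = col["START"]; ei = col["END"]
            li = col["MEAN_LOG2_COPY_RATIO"]; ki = col["CALL"]; hdr = 1; next
        }
        $ki == "+" || $ki == "-" { print $ci "\t" ($si - 1) "\t" $ei "\t" $ki "\t" $li }
    ' "$1"
}
```

Usage: `called_to_bed results/tumor.called.seg > results/tumor.cnv.bed`.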

Plotting

Plot standardized and denoised copy ratios

gatk PlotDenoisedCopyRatios \
    --standardized-copy-ratios tumor.standardized.tsv \
    --denoised-copy-ratios tumor.denoised.tsv \
    --sequence-dictionary reference.dict \
    --minimum-contig-length 46709983 \
    --output-prefix tumor \
    -O plots/

Plot segments with allelic information

gatk PlotModeledSegments \
    --denoised-copy-ratios tumor.denoised.tsv \
    --allelic-counts results/tumor.hets.tsv \
    --segments results/tumor.modelFinal.seg \
    --sequence-dictionary reference.dict \
    --minimum-contig-length 46709983 \
    --output-prefix tumor \
    -O plots/
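`--minimum-contig-length` hides contigs shorter than the given value. 46709983 is the length of chr21 in GRCh38, the shortest primary chromosome, so unplaced, alt, decoy and mitochondrial contigs are skipped. The sketch below reads that length from the sequence dictionary instead of hard-coding it. It assumes the usual `@SQ ... SN:<name> ... LN:<length>` lines. The function name is hypothetical.

```shell
# Print the LN: value from the @SQ line whose SN: equals the given contig.
contig_length() {
    awk -F'\t' -v want="$2" '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                else if ($i ~ /^LN:/) len = substr($i, 4)
            }
            if (name == want) { print len; exit }
        }
    ' "$1"
}
```

Usage: `--minimum-contig-length "$(contig_length reference.dict chr21)"`. Use `21` instead of `chr21` on references without the `chr` prefix.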

Germline CNV Workflow

For germline: use cohort mode
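Cohort mode starts with DetermineGermlineContigPloidy, which requires a `--contig-ploidy-priors` table. The sketch below assumes the layout described in the GATK tool documentation: a tab-separated file with a `CONTIG_NAME` column followed by `PLOIDY_PRIOR_0`, `PLOIDY_PRIOR_1`, ... columns, each row giving prior probabilities over ploidies. It writes such a table for human chr1-chr22, chrX and chrY. The contig names and probabilities are illustrative assumptions to adapt to the reference and cohort. The function name is hypothetical.

```shell
# Write a contig-ploidy priors table (ploidies 0-3) for human chr1-22, X, Y.
# Each row sums to 1. Values are illustrative, not recommendations.
write_ploidy_priors() (
    out=$1
    printf 'CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n' > "$out"
    i=1
    while [ "$i" -le 22 ]; do
        printf 'chr%d\t0.01\t0.01\t0.97\t0.01\n' "$i" >> "$out"
        i=$((i + 1))
    done
    # X: one copy (XY) or two copies (XX) about equally likely a priori.
    printf 'chrX\t0.01\t0.49\t0.49\t0.01\n' >> "$out"
    # Y: absent (XX) or one copy (XY).
    printf 'chrY\t0.50\t0.50\t0.00\t0.00\n' >> "$out"
)
```

Usage: `write_ploidy_priors ploidy_priors.tsv`.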

1. Collect counts (same as above)

2. Determine contig ploidy

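gatk DetermineGermlineContigPloidy \
    -L preprocessed.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -I sample1.counts.hdf5 \
    -I sample2.counts.hdf5 \
    --contig-ploidy-priors ploidy_priors.tsv \
    --output-prefix ploidy \
    -O ploidy-calls

In cohort mode no --model is given (that argument loads an existing model in case mode). The tool writes ploidy-calls/ploidy-calls (calls) and ploidy-calls/ploidy-model (model).

3. Call germline CNVs

gatk GermlineCNVCaller \
    --run-mode COHORT \
    -L preprocessed.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -I sample1.counts.hdf5 \
    -I sample2.counts.hdf5 \
    --contig-ploidy-calls ploidy-calls/ploidy-calls \
    --annotated-intervals annotated_intervals.tsv \
    --output-prefix cohort \
    -O germline_cnv_calls

annotated_intervals.tsv is optional and comes from gatk AnnotateIntervals. Results go to germline_cnv_calls/cohort-calls and germline_cnv_calls/cohort-model.

4. Post-process calls per sample

gatk PostprocessGermlineCNVCalls \
    --calls-shard-path germline_cnv_calls/cohort-calls \
    --model-shard-path germline_cnv_calls/cohort-model \
    --sample-index 0 \
    --contig-ploidy-calls ploidy-calls/ploidy-calls \
    --sequence-dictionary reference.dict \
    --output-genotyped-intervals sample1.genotyped-intervals.vcf \
    --output-genotyped-segments sample1.genotyped-segments.vcf \
    --output-denoised-copy-ratios sample1.denoised.tsv

--sample-index is the 0-based position of the sample in the cohort's -I list (0 = sample1).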
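PostprocessGermlineCNVCalls runs once per sample. The sketch below loops over the cohort, assuming the samples are listed in the same order as the GermlineCNVCaller `-I` inputs, so that they map to `--sample-index` 0, 1, .... The directories are passed in as arguments. The `run` helper (prints unless `DRY_RUN=0`), the function name and the per-sample output names are hypothetical.

```shell
# Print (default) or execute a command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Args: calls dir, model dir, contig-ploidy calls dir, sequence dict,
# then sample names in the same order as the GermlineCNVCaller -I list.
postprocess_all() (
    calls=$1 model=$2 ploidy=$3 dict=$4
    shift 4
    idx=0
    for sample in "$@"; do
        run gatk PostprocessGermlineCNVCalls \
            --calls-shard-path "$calls" \
            --model-shard-path "$model" \
            --sample-index "$idx" \
            --contig-ploidy-calls "$ploidy" \
            --sequence-dictionary "$dict" \
            --output-genotyped-intervals "${sample}.genotyped-intervals.vcf" \
            --output-genotyped-segments "${sample}.genotyped-segments.vcf" \
            --output-denoised-copy-ratios "${sample}.denoised.tsv"
        idx=$((idx + 1))
    done
)
```

Usage: `postprocess_all germline_cnv_calls/cohort-calls germline_cnv_calls/cohort-model <ploidy calls dir> reference.dict sample1 sample2`. The third argument is the same contig-ploidy calls directory passed to GermlineCNVCaller.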

Complete Somatic Pipeline Script

#!/bin/bash
# Usage: <script> TUMOR.bam NORMAL.bam OUTDIR
set -euo pipefail

if [ $# -ne 3 ]; then
    echo "Usage: $0 TUMOR.bam NORMAL.bam OUTDIR" >&2
    exit 1
fi

REFERENCE=reference.fa
INTERVALS=preprocessed.interval_list   # must match the intervals used to build the PoN
PON=cnv_pon.hdf5
SNP_SITES=common_snps.vcf
TUMOR=$1
NORMAL=$2
OUTDIR=$3

mkdir -p "$OUTDIR"

# Collect read counts (normal counts are not used further in this script)
gatk CollectReadCounts -R "$REFERENCE" -I "$TUMOR" -L "$INTERVALS" \
    --interval-merging-rule OVERLAPPING_ONLY -O "$OUTDIR/tumor.counts.hdf5"
gatk CollectReadCounts -R "$REFERENCE" -I "$NORMAL" -L "$INTERVALS" \
    --interval-merging-rule OVERLAPPING_ONLY -O "$OUTDIR/normal.counts.hdf5"

# Denoise
gatk DenoiseReadCounts -I "$OUTDIR/tumor.counts.hdf5" \
    --count-panel-of-normals "$PON" \
    --standardized-copy-ratios "$OUTDIR/tumor.standardized.tsv" \
    --denoised-copy-ratios "$OUTDIR/tumor.denoised.tsv"

# Allelic counts
gatk CollectAllelicCounts -R "$REFERENCE" -I "$TUMOR" -L "$SNP_SITES" \
    -O "$OUTDIR/tumor.allelicCounts.tsv"
gatk CollectAllelicCounts -R "$REFERENCE" -I "$NORMAL" -L "$SNP_SITES" \
    -O "$OUTDIR/normal.allelicCounts.tsv"

# Model and call
gatk ModelSegments \
    --denoised-copy-ratios "$OUTDIR/tumor.denoised.tsv" \
    --allelic-counts "$OUTDIR/tumor.allelicCounts.tsv" \
    --normal-allelic-counts "$OUTDIR/normal.allelicCounts.tsv" \
    --output-prefix tumor -O "$OUTDIR"

gatk CallCopyRatioSegments -I "$OUTDIR/tumor.cr.seg" -O "$OUTDIR/tumor.called.seg"
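The script above handles one tumor/normal pair per call. The sketch below is a driver that reads pairs from a tab-separated sample sheet, one line per pair: name, tumor BAM, normal BAM. The sheet layout and the default script name `gatk_cnv_somatic.sh` are hypothetical. By default the driver only prints each invocation; with `DRY_RUN=0` it runs them.

```shell
# Read "name<TAB>tumor.bam<TAB>normal.bam" lines and launch the pipeline
# script once per pair, writing each pair's results to results/<name>.
run_pairs() (
    sheet=$1
    script=${2:-./gatk_cnv_somatic.sh}
    while IFS="$(printf '\t')" read -r name tumor normal; do
        case $name in ''|'#'*) continue ;; esac   # skip blank and comment lines
        if [ "${DRY_RUN:-1}" = 0 ]; then
            bash "$script" "$tumor" "$normal" "results/$name"
        else
            printf '%s %s %s %s\n' "$script" "$tumor" "$normal" "results/$name"
        fi
    done < "$sheet"
)
```

Usage: `run_pairs pairs.tsv` previews the runs; `DRY_RUN=0 run_pairs pairs.tsv` executes them one after another. The sheet should end with a newline so that `read` sees the last line.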

Key Output Files

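File              Description
.counts.hdf5      Raw read counts per interval
.denoised.tsv     Denoised log2 copy ratios
.cr.seg           Copy-ratio segments from ModelSegments (input to CallCopyRatioSegments)
.modelFinal.seg   Final segments with posterior log2 copy ratio and minor-allele fraction (10th, 50th, 90th percentiles)
.called.seg       Copy-ratio segments with a CALL column: + (amplification), - (deletion), 0 (neutral)
.hets.tsv         Allelic counts at heterozygous SNP sites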

Related Skills

  • copy-number/cnvkit-analysis - Alternative CNV caller

  • copy-number/cnv-visualization - Plotting results

  • alignment-files/bam-statistics - Input BAM QC

  • variant-calling/variant-calling - SNP calling for allelic counts
