CLIP-seq Peak Calling

CLIPper (Recommended)

Basic peak calling

clipper
-b deduped.bam
-s hg38
-o peaks.bed
--save-pickle

With FDR threshold

clipper
-b deduped.bam
-s hg38
-o peaks.bed
--FDR 0.05
--superlocal

Specify gene annotations

clipper
-b deduped.bam
-s hg38
--gene genes.bed
-o peaks.bed

CLIPper Options

Option Description

-b Input BAM file

-s Species (hg38, mm10)

-o Output BED file

--FDR FDR threshold (default 0.05)

--superlocal Use superlocal background

--gene Custom gene annotation BED

--save-pickle Save intermediate data

PureCLIP (HMM-Based)

PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.

Installation

conda install -c bioconda pureclip

Basic peak calling

pureclip
-i deduped.bam
-bai deduped.bam.bai
-g genome.fa
-o crosslink_sites.bed
-or binding_regions.bed
-nt 4

-nt 4: Number of threads. Adjust based on CPU cores.

-o: Single-nucleotide crosslink sites

-or: Broader binding regions

PureCLIP Options

Option Description

-i Input BAM file

-bai BAM index file

-g Reference genome FASTA

-o Crosslink sites output

-or Binding regions output

-nt Number of threads

-iv Interval file to restrict analysis

-dm Min distance for merging

PureCLIP with Input Control

With SMInput control BAM

pureclip
-i clip.bam
-bai clip.bam.bai
-g genome.fa
-ibam sminput.bam
-ibai sminput.bam.bai
-o crosslinks.bed
-or regions.bed
-nt 8

-ibam/-ibai: Input control BAM for background modeling

PureCLIP Output

Crosslink sites BED contains:

chr start end name score strand

Score interpretation:

Higher scores = more confident crosslink sites

Filter by score

score>=3: Medium confidence. Use 5+ for high confidence.

awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed

PureCLIP for Different CLIP Types

eCLIP (recommended settings)

pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4 -dm 8

iCLIP (single-nucleotide resolution)

pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4

PAR-CLIP (T-to-C transitions)

pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4

Piranha

Basic usage

Piranha -s deduped.bam -o peaks.bed

With p-value threshold

Piranha -s deduped.bam -o peaks.bed -p 0.01

Stranded analysis

Piranha -s deduped.bam -o peaks.bed -p 0.01 -u

Zero-truncated negative binomial

Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial

PEAKachu (for PAR-CLIP)

PAR-CLIP specific caller

peakachu adaptive
-c control.bam
-t treatment.bam
-r reference.fa
-o peakachu_peaks.gff

MACS3 for CLIP (Alternative)

Use narrow peak calling mode

macs3 callpeak
-t deduped.bam
-f BAM
-g hs
-n clip_peaks
--nomodel
--extsize 50
-q 0.01

Strand-Specific Peak Calling

Split BAM by strand

samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam

Call peaks on each strand

clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed

Combine

cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed

Filter Peaks

By score

awk '$5 >= 10' peaks.bed > peaks_filtered.bed

By size

awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed

By read count (if in name field)

awk '$5 >= 5' peaks.bed > peaks_min5reads.bed

Merge Replicates

Use bedtools to find consensus peaks

bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa |
sort -u > consensus_peaks.bed

Require overlap in N replicates

bedtools multiinter -i rep1.bed rep2.bed rep3.bed |
awk '$4 >= 2' |
bedtools merge > consensus_peaks.bed

Peak Metrics

import pandas as pd

def load_clip_peaks(bed_path): peaks = pd.read_csv(bed_path, sep='\t', header=None, names=['chrom', 'start', 'end', 'name', 'score', 'strand']) return peaks

def peak_stats(peaks): stats = { 'n_peaks': len(peaks), 'mean_width': (peaks['end'] - peaks['start']).mean(), 'median_score': peaks['score'].median(), 'peaks_per_chrom': peaks.groupby('chrom').size().to_dict() } return stats

peaks = load_clip_peaks('peaks.bed') print(peak_stats(peaks))

Quality Metrics

Metric Good Value Description

Peak count 1,000-50,000 Depends on RBP

Peak width 20-100 nt Typical for RBP footprint

FRiP

0.1 Fraction reads in peaks

Calculate FRiP

Reads in peaks

reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)

Total reads

total_reads=$(samtools view -c deduped.bam)

FRiP

frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc) echo "FRiP: $frip"

Related Skills

clip-alignment - Generate aligned BAM
clip-preprocessing - UMI deduplication
binding-site-annotation - Annotate peaks with gene features
clip-motif-analysis - Find enriched motifs in peaks

bio-clip-seq-clip-peak-calling

Safety Notice

Copy this and send it to your AI assistant to learn

Basic peak calling

With FDR threshold

Specify gene annotations

Installation

Basic peak calling

-nt 4: Number of threads. Adjust based on CPU cores.

-o: Single-nucleotide crosslink sites

-or: Broader binding regions

With SMInput control BAM

-ibam/-ibai: Input control BAM for background modeling

Crosslink sites BED contains:

chr start end name score strand

Score interpretation:

Higher scores = more confident crosslink sites

Filter by score

score>=3: Medium confidence. Use 5+ for high confidence.

eCLIP (recommended settings)

iCLIP (single-nucleotide resolution)

PAR-CLIP (T-to-C transitions)

Basic usage

With p-value threshold

Stranded analysis

Zero-truncated negative binomial

PAR-CLIP specific caller

Use narrow peak calling mode

Split BAM by strand

Call peaks on each strand

Combine

By score

By size

By read count (if in name field)

Use bedtools to find consensus peaks

Require overlap in N replicates

Reads in peaks

Total reads

FRiP

Source Transparency

Related Skills

bio-workflows-clip-pipeline

bio-clinical-databases-dbsnp-queries

bio-clinical-databases-variant-prioritization

bioskills