CLIP-seq Peak Calling
CLIPper (Recommended)
Basic peak calling
clipper
-b deduped.bam
-s hg38
-o peaks.bed
--save-pickle
With FDR threshold
clipper
-b deduped.bam
-s hg38
-o peaks.bed
--FDR 0.05
--superlocal
Specify gene annotations
clipper
-b deduped.bam
-s hg38
--gene genes.bed
-o peaks.bed
CLIPper Options
Option Description
-b Input BAM file
-s Species (hg38, mm10)
-o Output BED file
--FDR FDR threshold (default 0.05)
--superlocal Use superlocal background
--gene Custom gene annotation BED
--save-pickle Save intermediate data
PureCLIP (HMM-Based)
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
Installation
conda install -c bioconda pureclip
Basic peak calling
pureclip
-i deduped.bam
-bai deduped.bam.bai
-g genome.fa
-o crosslink_sites.bed
-or binding_regions.bed
-nt 4
-nt 4: Number of threads. Adjust based on CPU cores.
-o: Single-nucleotide crosslink sites
-or: Broader binding regions
PureCLIP Options
Option Description
-i Input BAM file
-bai BAM index file
-g Reference genome FASTA
-o Crosslink sites output
-or Binding regions output
-nt Number of threads
-iv Interval file to restrict analysis
-dm Min distance for merging
PureCLIP with Input Control
With SMInput control BAM
pureclip
-i clip.bam
-bai clip.bam.bai
-g genome.fa
-ibam sminput.bam
-ibai sminput.bam.bai
-o crosslinks.bed
-or regions.bed
-nt 8
-ibam/-ibai: Input control BAM for background modeling
PureCLIP Output
Crosslink sites BED contains:
chr start end name score strand
Score interpretation:
Higher scores = more confident crosslink sites
Filter by score
score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
PureCLIP for Different CLIP Types
eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4 -dm 8
iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4
PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4
Piranha
Basic usage
Piranha -s deduped.bam -o peaks.bed
With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
PEAKachu (for PAR-CLIP)
PAR-CLIP specific caller
peakachu adaptive
-c control.bam
-t treatment.bam
-r reference.fa
-o peakachu_peaks.gff
MACS3 for CLIP (Alternative)
Use narrow peak calling mode
macs3 callpeak
-t deduped.bam
-f BAM
-g hs
-n clip_peaks
--nomodel
--extsize 50
-q 0.01
Strand-Specific Peak Calling
Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
Filter Peaks
By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
Merge Replicates
Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa |
sort -u > consensus_peaks.bed
Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed |
awk '$4 >= 2' |
bedtools merge > consensus_peaks.bed
Peak Metrics
import pandas as pd
def load_clip_peaks(bed_path): peaks = pd.read_csv(bed_path, sep='\t', header=None, names=['chrom', 'start', 'end', 'name', 'score', 'strand']) return peaks
def peak_stats(peaks): stats = { 'n_peaks': len(peaks), 'mean_width': (peaks['end'] - peaks['start']).mean(), 'median_score': peaks['score'].median(), 'peaks_per_chrom': peaks.groupby('chrom').size().to_dict() } return stats
peaks = load_clip_peaks('peaks.bed') print(peak_stats(peaks))
Quality Metrics
Metric Good Value Description
Peak count 1,000-50,000 Depends on RBP
Peak width 20-100 nt Typical for RBP footprint
FRiP
0.1 Fraction reads in peaks
Calculate FRiP
Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
Total reads
total_reads=$(samtools view -c deduped.bam)
FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc) echo "FRiP: $frip"
Related Skills
-
clip-alignment - Generate aligned BAM
-
clip-preprocessing - UMI deduplication
-
binding-site-annotation - Annotate peaks with gene features
-
clip-motif-analysis - Find enriched motifs in peaks