bio-clip-seq-clip-peak-calling

CLIP-seq Peak Calling

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-clip-seq-clip-peak-calling" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-clip-seq-clip-peak-calling

CLIP-seq Peak Calling

CLIPper (Recommended)

Basic peak calling

clipper
-b deduped.bam
-s hg38
-o peaks.bed
--save-pickle

With FDR threshold

clipper
-b deduped.bam
-s hg38
-o peaks.bed
--FDR 0.05
--superlocal

Specify gene annotations

clipper
-b deduped.bam
-s hg38
--gene genes.bed
-o peaks.bed

CLIPper Options

Option Description

-b Input BAM file

-s Species (hg38, mm10)

-o Output BED file

--FDR FDR threshold (default 0.05)

--superlocal Use superlocal background

--gene Custom gene annotation BED

--save-pickle Save intermediate data

PureCLIP (HMM-Based)

PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.

Installation

conda install -c bioconda pureclip

Basic peak calling

pureclip
-i deduped.bam
-bai deduped.bam.bai
-g genome.fa
-o crosslink_sites.bed
-or binding_regions.bed
-nt 4

-nt 4: Number of threads. Adjust based on CPU cores.

-o: Single-nucleotide crosslink sites

-or: Broader binding regions

PureCLIP Options

Option Description

-i Input BAM file

-bai BAM index file

-g Reference genome FASTA

-o Crosslink sites output

-or Binding regions output

-nt Number of threads

-iv Interval file to restrict analysis

-dm Min distance for merging

PureCLIP with Input Control

With SMInput control BAM

pureclip
-i clip.bam
-bai clip.bam.bai
-g genome.fa
-ibam sminput.bam
-ibai sminput.bam.bai
-o crosslinks.bed
-or regions.bed
-nt 8

-ibam/-ibai: Input control BAM for background modeling

PureCLIP Output

Crosslink sites BED contains:

chr start end name score strand

Score interpretation:

Higher scores = more confident crosslink sites

Filter by score

score>=3: Medium confidence. Use 5+ for high confidence.

awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed

PureCLIP for Different CLIP Types

eCLIP (recommended settings)

pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4 -dm 8

iCLIP (single-nucleotide resolution)

pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4

PAR-CLIP (T-to-C transitions)

pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa
-o sites.bed -or regions.bed -nt 4

Piranha

Basic usage

Piranha -s deduped.bam -o peaks.bed

With p-value threshold

Piranha -s deduped.bam -o peaks.bed -p 0.01

Stranded analysis

Piranha -s deduped.bam -o peaks.bed -p 0.01 -u

Zero-truncated negative binomial

Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial

PEAKachu (for PAR-CLIP)

PAR-CLIP specific caller

peakachu adaptive
-c control.bam
-t treatment.bam
-r reference.fa
-o peakachu_peaks.gff

MACS3 for CLIP (Alternative)

Use narrow peak calling mode

macs3 callpeak
-t deduped.bam
-f BAM
-g hs
-n clip_peaks
--nomodel
--extsize 50
-q 0.01

Strand-Specific Peak Calling

Split BAM by strand

samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam

Call peaks on each strand

clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed

Combine

cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed

Filter Peaks

By score

awk '$5 >= 10' peaks.bed > peaks_filtered.bed

By size

awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed

By read count (if in name field)

awk '$5 >= 5' peaks.bed > peaks_min5reads.bed

Merge Replicates

Use bedtools to find consensus peaks

bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa |
sort -u > consensus_peaks.bed

Require overlap in N replicates

bedtools multiinter -i rep1.bed rep2.bed rep3.bed |
awk '$4 >= 2' |
bedtools merge > consensus_peaks.bed

Peak Metrics

import pandas as pd

def load_clip_peaks(bed_path): peaks = pd.read_csv(bed_path, sep='\t', header=None, names=['chrom', 'start', 'end', 'name', 'score', 'strand']) return peaks

def peak_stats(peaks): stats = { 'n_peaks': len(peaks), 'mean_width': (peaks['end'] - peaks['start']).mean(), 'median_score': peaks['score'].median(), 'peaks_per_chrom': peaks.groupby('chrom').size().to_dict() } return stats

peaks = load_clip_peaks('peaks.bed') print(peak_stats(peaks))

Quality Metrics

Metric Good Value Description

Peak count 1,000-50,000 Depends on RBP

Peak width 20-100 nt Typical for RBP footprint

FRiP

0.1 Fraction reads in peaks

Calculate FRiP

Reads in peaks

reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)

Total reads

total_reads=$(samtools view -c deduped.bam)

FRiP

frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc) echo "FRiP: $frip"

Related Skills

  • clip-alignment - Generate aligned BAM

  • clip-preprocessing - UMI deduplication

  • binding-site-annotation - Annotate peaks with gene features

  • clip-motif-analysis - Find enriched motifs in peaks

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

bio-workflows-clip-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

bio-clinical-databases-dbsnp-queries

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

bio-clinical-databases-variant-prioritization

No summary provided by upstream source.

Repository SourceNeeds Review
General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review