Super-Enhancer Calling

Identify super-enhancers (SEs) - large clusters of enhancers that control cell identity genes.

Background

Super-enhancers are:

Large clusters of enhancer regions
Marked by H3K27ac, Med1, BRD4
Control cell identity genes
Often altered in disease/cancer

ROSE (Rank Ordering of Super-Enhancers)

Installation

git clone https://github.com/stjude/ROSE.git cd ROSE

Requires samtools, R, bedtools

Input Requirements

BAM file - H3K27ac ChIP-seq aligned reads
Peak file - Called peaks (BED or GFF)
Genome annotation - TSS annotations

Run ROSE

Basic usage

python ROSE_main.py
-g HG38
-i peaks.gff
-r h3k27ac.bam
-o output_dir
-s 12500
-t 2500

With control/input

python ROSE_main.py
-g HG38
-i peaks.gff
-r h3k27ac.bam
-c input.bam
-o output_dir

Key Parameters

Parameter Description Default

-s

Stitching distance 12500 bp

-t

TSS exclusion 2500 bp

-c

Control BAM None

Output Files

output_dir/ ├── *_AllEnhancers.table.txt # All enhancer regions ├── *_SuperEnhancers.table.txt # Super-enhancers only ├── *_Enhancers_withSuper.bed # BED with SE annotation └── *_Plot_points.png # Hockey stick plot

Prepare Input Files

Convert BED to GFF

ROSE requires GFF format for peaks

awk 'BEGIN{OFS="\t"} {print $1,"peaks","enhancer",$2,$3,".",$6,".","ID="NR}'
peaks.bed > peaks.gff

Filter Peaks for Enhancers

Remove promoter peaks (within 2.5kb of TSS)

bedtools intersect -a peaks.bed -b promoters.bed -v > enhancer_peaks.bed

Alternative: HOMER Super-Enhancers

Call super-enhancers with HOMER

findPeaks tag_dir/ -style super -o auto

Or from existing peaks

findPeaks tag_dir/ -style super -i input_tag_dir/
-typical typical_enhancers.txt
-superSlope -1000
> super_enhancers.txt

Alternative: SEanalysis

R-based analysis

Rscript << 'EOF' library(SEanalysis)

Load H3K27ac signal at enhancers

signal <- read.table('enhancer_signal.txt', header=TRUE)

Rank and identify super-enhancers

se_result <- identifySE(signal$signal, method='ROSE')

Get super-enhancer IDs

super_enhancers <- signal$id[se_result$is_super] write.table(super_enhancers, 'super_enhancers.txt', quote=FALSE, row.names=FALSE) EOF

Custom Hockey Stick Analysis (R)

library(ggplot2)

Load enhancer signal data

enhancers <- read.table('enhancer_signal.txt', header=TRUE)

Rank by signal

enhancers <- enhancers[order(enhancers$signal), ] enhancers$rank <- 1:nrow(enhancers)

Find inflection point (tangent = 1)

Normalize ranks and signal to 0-1

enhancers$rank_norm <- enhancers$rank / max(enhancers$rank) enhancers$signal_norm <- enhancers$signal / max(enhancers$signal)

Calculate slope at each point

n <- nrow(enhancers) slopes <- diff(enhancers$signal_norm) / diff(enhancers$rank_norm) inflection <- which(slopes > 1)[1]

Classify

enhancers$type <- ifelse(enhancers$rank >= inflection, 'Super-Enhancer', 'Typical')

Plot

ggplot(enhancers, aes(rank, signal, color = type)) + geom_point(size = 0.5) + scale_color_manual(values = c('Super-Enhancer' = 'red', 'Typical' = 'grey60')) + geom_vline(xintercept = inflection, linetype = 'dashed') + labs(x = 'Enhancer Rank', y = 'H3K27ac Signal', title = 'Super-Enhancer Identification') + theme_bw()

ggsave('hockey_stick_plot.pdf', width = 8, height = 6)

Output super-enhancers

super_enhancers <- enhancers[enhancers$type == 'Super-Enhancer', ] write.table(super_enhancers, 'super_enhancers.txt', sep = '\t', quote = FALSE, row.names = FALSE)

Calculate Enhancer Signal

Get H3K27ac signal at peak regions

bedtools multicov -bams h3k27ac.bam -bed enhancer_peaks.bed > enhancer_counts.txt

Normalize by peak size

awk 'BEGIN{OFS="\t"} { size = $3 - $2 rpm = ($NF / TOTAL_READS) * 1e6 rpkm = rpm / (size / 1000) print $0, rpkm }' enhancer_counts.txt > enhancer_signal.txt

Downstream Analysis

Gene Assignment

Assign super-enhancers to nearest genes

bedtools closest -a super_enhancers.bed -b genes.bed -d > se_gene_assignment.txt

Compare Conditions

Load SE from two conditions

se1 <- read.table('condition1_SE.txt', header=TRUE) se2 <- read.table('condition2_SE.txt', header=TRUE)

Find differential super-enhancers

library(GenomicRanges) gr1 <- makeGRangesFromDataFrame(se1) gr2 <- makeGRangesFromDataFrame(se2)

Gained in condition 2

gained <- subsetByOverlaps(gr2, gr1, invert=TRUE)

Lost in condition 2

lost <- subsetByOverlaps(gr1, gr2, invert=TRUE)

Enrichment of Disease Variants

Check if GWAS SNPs enriched in super-enhancers

bedtools intersect -a gwas_snps.bed -b super_enhancers.bed -wa -wb > snps_in_SE.txt

Calculate enrichment

total_snps=$(wc -l < gwas_snps.bed) snps_in_se=$(wc -l < snps_in_SE.txt) se_coverage=$(awk '{sum += $3-$2} END {print sum}' super_enhancers.bed) genome_size=3000000000

expected=$(echo "$total_snps * $se_coverage / $genome_size" | bc -l) enrichment=$(echo "$snps_in_se / $expected" | bc -l) echo "Enrichment: $enrichment"

Complete Workflow

#!/bin/bash set -euo pipefail

H3K27AC_BAM=$1 PEAKS_BED=$2 OUTPUT_DIR=$3

mkdir -p $OUTPUT_DIR

echo "=== Convert peaks to GFF ===" awk 'BEGIN{OFS="\t"} {print $1,"peaks","enhancer",$2,$3,".",$6,".","ID="NR}'
$PEAKS_BED > $OUTPUT_DIR/peaks.gff

echo "=== Run ROSE ===" python ROSE_main.py
-g HG38
-i $OUTPUT_DIR/peaks.gff
-r $H3K27AC_BAM
-o $OUTPUT_DIR
-s 12500
-t 2500

echo "=== Summary ===" n_typical=$(grep -c "Typical" $OUTPUT_DIR/_AllEnhancers.table.txt || echo 0) n_super=$(wc -l < $OUTPUT_DIR/_SuperEnhancers.table.txt)

echo "Typical enhancers: $n_typical" echo "Super-enhancers: $n_super"

Related Skills

chip-seq/peak-calling - Call H3K27ac peaks first
chip-seq/peak-annotation - Annotate SE to genes
chip-seq/differential-binding - Compare SE between conditions
data-visualization/genome-tracks - Visualize SE regions

bio-chip-seq-super-enhancers

Safety Notice

Copy this and send it to your AI assistant to learn