bio-bedgraph-handling

bedGraph is a text format for displaying continuous-valued data on genome browsers. Common for coverage, signal intensity, and scores.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bio-bedgraph-handling" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-bedgraph-handling

bedGraph Handling

bedGraph Format

track type=bedGraph name="Sample" description="Coverage"
chr1 0 100 1.5
chr1 100 200 2.3
chr1 200 300 0.8

Four columns: chrom, start, end, value. Coordinates are 0-based and half-open, so chr1 0 100 covers the first 100 bases of chr1.
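As a minimal sketch of the format described above, the following Python parses bedGraph text into typed records. The names `Interval` and `parse_bedgraph` are illustrative and not part of any library; the sketch assumes whitespace-separated columns and skips track, browser, and comment lines.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    value: float


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield intervals, skipping blank, track, browser, and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(value))


example = """track type=bedGraph name="Sample" description="Coverage"
chr1 0 100 1.5
chr1 100 200 2.3
chr1 200 300 0.8
"""

intervals = list(parse_bedgraph(example.splitlines()))
# With half-open coordinates the interval length is simply end - start.
lengths = [iv.end - iv.start for iv in intervals]
print(lengths)
```

Because the coordinates are half-open, adjacent intervals share a boundary value (the end of one equals the start of the next) without overlapping.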

Create bedGraph from BAM

Using bedtools genomecov

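# Coverage in bedGraph format
bedtools genomecov -ibam sample.bam -bg > sample.bedgraph

# Treat split (spliced) alignments as separate intervals, e.g. for RNA-seq
bedtools genomecov -ibam sample.bam -bg -split > sample.bedgraph

# Multiply coverage by a constant factor
bedtools genomecov -ibam sample.bam -bg -scale 1.5 > sample.scaled.bedgraph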

Strand-Specific

bedtools genomecov -ibam sample.bam -bg -strand + > sample.plus.bedgraph
bedtools genomecov -ibam sample.bam -bg -strand - > sample.minus.bedgraph

5' End Coverage (ChIP-seq)

bedtools genomecov -ibam sample.bam -bg -5 > sample.5prime.bedgraph

Normalize by Library Size (CPM)

# -F 260 excludes unmapped (4) and secondary (256) alignments
total_reads=$(samtools view -c -F 260 sample.bam)
scale=$(echo "scale=10; 1000000 / $total_reads" | bc)

bedtools genomecov -ibam sample.bam -bg -scale "$scale" > sample.cpm.bedgraph
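The arithmetic behind the CPM scale factor can be sketched in Python. The function name `cpm_scale_factor` and the read count of 20 million are hypothetical values chosen only for illustration.

```python
def cpm_scale_factor(total_reads: int) -> float:
    """Return the factor that converts raw coverage to counts per million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1_000_000 / total_reads


# Hypothetical library of 20 million counted alignments.
scale = cpm_scale_factor(20_000_000)

# A raw depth of 40 at some position becomes 40 * scale after normalization.
raw_depth = 40
normalized = raw_depth * scale
print(round(scale, 6), round(normalized, 6))
```

Larger libraries give smaller scale factors, so the same raw depth maps to a lower normalized value.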

Sort bedGraph

bedGraph must be sorted for conversion to bigWig.

sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph

# Or with LC_ALL=C for locale-independent (bytewise) ordering
LC_ALL=C sort -k1,1 -k2,2n sample.bedgraph > sample.sorted.bedgraph
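The ordering produced by the LC_ALL=C form (chromosome name bytewise, then start numerically) can be reproduced in Python as a rough sketch. The rows below are made-up example data.

```python
# Made-up rows: (chrom, start, end, value)
rows = [
    ("chr2", 100, 200, 1.0),
    ("chr10", 0, 50, 2.0),
    ("chr1", 300, 400, 0.5),
    ("chr1", 0, 100, 1.5),
]

# Compare chromosome names as ASCII bytes, then starts as integers.
sorted_rows = sorted(rows, key=lambda r: (r[0].encode("ascii"), r[1]))

for row in sorted_rows:
    print(*row, sep="\t")
```

Note that bytewise ordering places chr10 before chr2, which differs from the natural numeric order of chromosomes.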

Convert bedGraph to bigWig

Using UCSC bedGraphToBigWig

bedGraphToBigWig sample.sorted.bedgraph chrom.sizes sample.bw

fetchChromSizes hg38 > hg38.chrom.sizes
bedGraphToBigWig sample.sorted.bedgraph hg38.chrom.sizes sample.bw

Generate chrom.sizes

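# From a reference FASTA index
samtools faidx reference.fa
cut -f1,2 reference.fa.fai > chrom.sizes

# From UCSC with fetchChromSizes
fetchChromSizes hg38 > hg38.chrom.sizes

# From the UCSC public MySQL server (-N suppresses the column-name header row)
mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N -e \
  "select chrom, size from hg38.chromInfo" > hg38.chrom.sizes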

Clip to Chromosome Boundaries

bedClip sample.bedgraph chrom.sizes sample.clipped.bedgraph
bedGraphToBigWig sample.clipped.bedgraph chrom.sizes sample.bw
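To illustrate what clipping to chromosome boundaries means, here is a hedged Python sketch. It is not bedClip's implementation; `clip_intervals` and its `truncate` parameter are illustrative names, with the default modeled on the assumption that off-chromosome items are dropped unless truncation is requested.

```python
def clip_intervals(intervals, chrom_sizes, truncate=False):
    """Keep intervals that lie within their chromosome.

    intervals: iterable of (chrom, start, end, value)
    chrom_sizes: dict mapping chromosome name to length
    truncate: if True, trim intervals that run past a boundary
              instead of dropping them
    """
    kept = []
    for chrom, start, end, value in intervals:
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # unknown chromosome: drop
        if start >= 0 and end <= size:
            kept.append((chrom, start, end, value))
        elif truncate:
            new_start, new_end = max(start, 0), min(end, size)
            if new_start < new_end:
                kept.append((chrom, new_start, new_end, value))
    return kept


sizes = {"chr1": 1000}
data = [
    ("chr1", 900, 1000, 1.0),   # inside
    ("chr1", 950, 1100, 2.0),   # runs past the end
    ("chr1", 1200, 1300, 3.0),  # entirely outside
]
dropped = clip_intervals(data, sizes)
trimmed = clip_intervals(data, sizes, truncate=True)
print(dropped)
print(trimmed)
```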

Convert bigWig to bedGraph

bigWigToBedGraph sample.bw sample.bedgraph
bigWigToBedGraph sample.bw sample.chr1.bedgraph -chrom=chr1
bigWigToBedGraph sample.bw sample.region.bedgraph -chrom=chr1 -start=1000 -end=2000

Merge bedGraph Files

Using bedtools unionbedg

bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph \
  -header -names sample1 sample2 sample3 > merged.bedgraph

Average Across Samples

bedtools unionbedg -i sample1.bedgraph sample2.bedgraph sample3.bedgraph | \
  awk -v OFS='\t' '{sum=0; for(i=4;i<=NF;i++) sum+=$i; print $1,$2,$3,sum/(NF-3)}' \
  > average.bedgraph
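The idea behind averaging across samples (split the genome at every breakpoint, then average the per-sample values on each segment) can be sketched in Python for a single chromosome. This is an illustration of the concept rather than bedtools' algorithm; `union_average` is a hypothetical helper, and it assumes each sample is sorted and non-overlapping, with missing coverage treated as 0.

```python
def union_average(samples):
    """Average values over the union of intervals on one chromosome.

    samples: list of samples, each a list of (start, end, value)
    Returns a list of (start, end, mean) segments covered by at
    least one sample.
    """
    # Every start or end is a point where some sample's value can change.
    points = sorted({p for s in samples for start, end, _ in s for p in (start, end)})
    result = []
    for left, right in zip(points, points[1:]):
        covering = [
            next((v for st, en, v in s if st <= left and right <= en), None)
            for s in samples
        ]
        if all(v is None for v in covering):
            continue  # gap with no data in any sample
        values = [0.0 if v is None else v for v in covering]
        result.append((left, right, sum(values) / len(values)))
    return result


sample1 = [(0, 100, 2.0), (100, 200, 4.0)]
sample2 = [(50, 150, 6.0)]
averaged = union_average([sample1, sample2])
for segment in averaged:
    print(*segment, sep="\t")
```

The linear scan per segment keeps the sketch short; a production version would sweep through the sorted inputs instead.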

Mathematical Operations

bedtools map for Region Statistics

bedtools map -a regions.bed -b sample.bedgraph -c 4 -o mean > region_means.bed
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o sum > region_sums.bed
bedtools map -a regions.bed -b sample.bedgraph -c 4 -o max > region_max.bed

Subtract Background

bedtools unionbedg -i treatment.bedgraph input.bedgraph | \
  awk -v OFS='\t' '{diff=$4-$5; if(diff<0) diff=0; print $1,$2,$3,diff}' \
  > subtracted.bedgraph

Log Transform

awk '{print $1,$2,$3,log($4+1)/log(2)}' OFS='\t' sample.bedgraph > sample.log2.bedgraph
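The per-value arithmetic of the background subtraction and log transform above can be written out in Python as a small sketch. The function names are illustrative only.

```python
import math


def subtract_background(treatment: float, control: float) -> float:
    """Treatment minus control, floored at zero."""
    return max(treatment - control, 0.0)


def log2_pseudocount(value: float) -> float:
    """log2(value + 1); the pseudocount keeps a zero signal at zero."""
    return math.log2(value + 1.0)


print(subtract_background(2.5, 4.0))
print(subtract_background(5.0, 1.5))
print(log2_pseudocount(0.0))
print(log2_pseudocount(3.0))
```

The awk version divides the natural log by log(2), which is equivalent to the `math.log2` call used here.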

Smooth Signal

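bedtools slop -i sample.bedgraph -g chrom.sizes -b 50 | \
  bedtools merge -i - -c 4 -o mean > smoothed.bedgraph

This is a coarse smoothing: after extending each interval by 50 bp, neighboring intervals overlap, so bedtools merge collapses each contiguous run into a single interval carrying the mean of its values. The input must be sorted.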

Python with pyBigWig

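Write bedGraph-Style Intervals (bigWig Output)

pyBigWig writes bigWig files rather than text bedGraph, but entries are supplied in bedGraph form (chrom, start, end, value).

import pyBigWig

# Opening in 'w' mode creates a bigWig file
bw = pyBigWig.open('output.bw', 'w')
# The header lists (chromosome, length) pairs and is added before any entries
bw.addHeader([('chr1', 248956422), ('chr2', 242193529)])

chroms = ['chr1', 'chr1', 'chr1']
starts = [0, 100, 200]
ends = [100, 200, 300]
values = [1.5, 2.3, 0.8]
bw.addEntries(chroms, starts, ends=ends, values=values)
bw.close()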

Read bigWig to bedGraph Format

import pyBigWig

bw = pyBigWig.open('sample.bw')

for chrom, size in bw.chroms().items():
    intervals = bw.intervals(chrom)
    if intervals:
        for start, end, value in intervals:
            print(f'{chrom}\t{start}\t{end}\t{value}')

bw.close()

Convert bigWig Region to bedGraph

import pyBigWig

bw = pyBigWig.open('sample.bw')
# Fall back to an empty list in case no intervals are returned for the region
intervals = bw.intervals('chr1', 1000000, 2000000) or []

with open('region.bedgraph', 'w') as f:
    for start, end, value in intervals:
        f.write(f'chr1\t{start}\t{end}\t{value}\n')

bw.close()

deepTools for Normalization

bamCoverage (BAM to bedGraph/bigWig)

bamCoverage -b sample.bam -o sample.bw --normalizeUsing RPKM
bamCoverage -b sample.bam -o sample.bw --normalizeUsing CPM
bamCoverage -b sample.bam -o sample.bw --normalizeUsing BPM
bamCoverage -b sample.bam -o sample.bedgraph --outFileFormat bedgraph --normalizeUsing CPM

bamCompare (Treatment vs Control)

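# Default operation is log2 ratio
bamCompare -b1 treatment.bam -b2 input.bam -o log2ratio.bw --scaleFactorsMethod readCount

# deepTools 3.x names this option --operation (older 2.x releases used --ratio)
bamCompare -b1 treatment.bam -b2 input.bam -o subtracted.bw --operation subtract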

bigwigCompare

# deepTools 3.x names this option --operation (older 2.x releases used --ratio)
bigwigCompare -b1 treatment.bw -b2 input.bw -o ratio.bw --operation log2
bigwigCompare -b1 sample1.bw -b2 sample2.bw -o diff.bw --operation subtract

Filtering and Subsetting

Filter by Value

awk '$4 >= 1.0' sample.bedgraph > high_signal.bedgraph
awk '$4 > 0' sample.bedgraph > nonzero.bedgraph

Extract Regions

bedtools intersect -a sample.bedgraph -b regions.bed > subset.bedgraph

Remove Specific Chromosomes

grep -v "^chrM" sample.bedgraph | grep -v "_random" > filtered.bedgraph
awk '$1 ~ /^chr[0-9XY]+$/' sample.bedgraph > standard_chroms.bedgraph

Aggregate to Bins

Fixed-Size Bins

bedtools makewindows -g chrom.sizes -w 1000 > bins.bed

# Both inputs must be sorted the same way; -null 0 writes 0 instead of "." for empty bins
bedtools map -a bins.bed -b sample.bedgraph -c 4 -o mean -null 0 > binned.bedgraph
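Note that -o mean averages the values of the overlapping intervals without weighting by how many bases each one contributes to the bin. Where a length-weighted mean is preferred, it could be computed along the lines of the following Python sketch. `binned_weighted_mean` is a hypothetical helper; it assumes sorted, non-overlapping intervals on one chromosome that are already clipped to the chromosome size, and it counts uncovered bases as 0.

```python
def binned_weighted_mean(intervals, chrom_size, bin_size):
    """Length-weighted mean signal per fixed-size bin.

    intervals: list of (start, end, value), 0-based half-open
    Returns a list of (bin_start, bin_end, mean).
    """
    n_bins = -(-chrom_size // bin_size)  # ceiling division
    totals = [0.0] * n_bins
    for start, end, value in intervals:
        # Visit every bin that the interval touches.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bin_start = b * bin_size
            bin_end = min(bin_start + bin_size, chrom_size)
            overlap = min(end, bin_end) - max(start, bin_start)
            totals[b] += value * overlap
    result = []
    for b, total in enumerate(totals):
        bin_start = b * bin_size
        bin_end = min(bin_start + bin_size, chrom_size)
        result.append((bin_start, bin_end, total / (bin_end - bin_start)))
    return result


signal = [(0, 50, 2.0), (50, 150, 4.0), (200, 250, 1.0)]
binned = binned_weighted_mean(signal, chrom_size=250, bin_size=100)
for row in binned:
    print(*row, sep="\t")
```

In the first bin, 50 bases at 2.0 and 50 bases at 4.0 give a weighted mean of 3.0; the last bin is shorter because the chromosome ends at 250.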

Gene Bodies

bedtools map -a genes.bed -b sample.bedgraph -c 4 -o mean > gene_signal.bed

Quality Control

Check for Overlapping Intervals

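# -d -1 requires at least 1 bp of overlap, so book-ended (adjacent) intervals are not reported
bedtools merge -i sample.bedgraph -d -1 -c 4 -o collapse | \
  awk 'index($4,",") > 0' | head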
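The same check can be sketched in Python: in a sorted bedGraph, an interval overlaps an earlier one when it starts before the furthest end seen so far on the same chromosome. `find_overlaps` is an illustrative name, and the input is assumed to be sorted by chromosome and start.

```python
def find_overlaps(intervals):
    """Return (earlier, later) pairs of overlapping intervals.

    intervals: list of (chrom, start, end, value) sorted by chrom, start
    """
    overlaps = []
    furthest = None  # interval with the largest end on the current chromosome
    for current in intervals:
        same_chrom = furthest is not None and current[0] == furthest[0]
        # Half-open coordinates: start == previous end is adjacent, not overlapping.
        if same_chrom and current[1] < furthest[2]:
            overlaps.append((furthest, current))
        if not same_chrom or current[2] > furthest[2]:
            furthest = current
    return overlaps


data = [
    ("chr1", 0, 100, 1.0),
    ("chr1", 100, 200, 2.0),   # adjacent to the previous interval: fine
    ("chr1", 150, 250, 3.0),   # starts inside the previous interval
    ("chr2", 0, 50, 1.0),
]
problems = find_overlaps(data)
print(problems)
```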

Verify Sorted Order

LC_ALL=C sort -c -k1,1 -k2,2n sample.bedgraph && echo "Sorted" || echo "Not sorted"

Check Value Range

awk 'NR==1 {min=$4; max=$4} {if($4<min) min=$4; if($4>max) max=$4} END {print "Min:", min, "Max:", max}' sample.bedgraph

Complete Pipeline

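#!/bin/bash
# Arguments: $1 = BAM file, $2 = chrom.sizes file
BAM=$1
NAME=$(basename "$BAM" .bam)
CHROM_SIZES=$2

# CPM scale factor; -F 260 excludes unmapped and secondary alignments
total_reads=$(samtools view -c -F 260 "$BAM")
scale=$(echo "scale=10; 1000000 / $total_reads" | bc)

# Scaled coverage
bedtools genomecov -ibam "$BAM" -bg -scale "$scale" > "${NAME}.bedgraph"

# Sort by chromosome (bytewise) and start
LC_ALL=C sort -k1,1 -k2,2n "${NAME}.bedgraph" > "${NAME}.sorted.bedgraph"

# Clip to chromosome boundaries
bedClip "${NAME}.sorted.bedgraph" "$CHROM_SIZES" "${NAME}.clipped.bedgraph"

# Convert to bigWig
bedGraphToBigWig "${NAME}.clipped.bedgraph" "$CHROM_SIZES" "${NAME}.bw"

# Remove intermediate files
rm "${NAME}.bedgraph" "${NAME}.sorted.bedgraph" "${NAME}.clipped.bedgraph"

echo "Created ${NAME}.bw (CPM normalized)"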

Track Header for UCSC

echo 'track type=bedGraph name="Sample" description="CPM normalized" visibility=full color=0,0,255 altColor=255,0,0 autoScale=on graphType=bar' > track.bedgraph
cat sample.bedgraph >> track.bedgraph
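When generating many tracks, the header line can be assembled programmatically. The sketch below uses a hypothetical helper, `track_line`, which quotes the name and description and passes any other settings through unchanged as key=value pairs.

```python
def track_line(name: str, description: str, **options: str) -> str:
    """Build a bedGraph track header line."""
    parts = [
        "track type=bedGraph",
        f'name="{name}"',
        f'description="{description}"',
    ]
    # Keyword arguments keep their order, so options appear as given.
    parts += [f"{key}={value}" for key, value in options.items()]
    return " ".join(parts)


header = track_line(
    "Sample",
    "CPM normalized",
    visibility="full",
    color="0,0,255",
    altColor="255,0,0",
    autoScale="on",
    graphType="bar",
)
print(header)
```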

Related Skills

  • coverage-analysis - Generate coverage from alignments

  • bigwig-tracks - Work with bigWig format

  • chipseq-visualization - Visualize signal tracks

  • alignment-files - BAM file processing

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
