STAR RNA-seq Alignment

Generate Genome Index

Basic index generation

STAR --runMode genomeGenerate \
    --runThreadN 8 \
    --genomeDir star_index/ \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile annotation.gtf \
    --sjdbOverhang 100  # Read length - 1

Index with Specific Read Length

For 150bp reads, use sjdbOverhang=149

STAR --runMode genomeGenerate \
    --runThreadN 8 \
    --genomeDir star_index_150/ \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile annotation.gtf \
    --sjdbOverhang 149
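If the read length is not known in advance, the overhang can be derived from the FASTQ itself. The following is a minimal sketch, assuming standard four-line FASTQ records; `sjdb_overhang` is a hypothetical helper name, not part of STAR.

```shell
# Print (maximum read length - 1) for FASTQ text read from stdin.
# sjdb_overhang is a hypothetical helper, not a STAR command.
# Assumes four-line FASTQ records (sequence on every 2nd line of 4).
sjdb_overhang() {
    awk 'BEGIN { max = 0 }
         NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { if (max > 0) print max - 1 }'
}

# Example (sampling the first 100,000 reads of a gzipped file):
#   zcat reads_1.fq.gz | head -n 400000 | sjdb_overhang
```

Sampling only the start of the file keeps this fast, but trimmed libraries can contain longer reads further in, so the result is best treated as an estimate.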

Basic Alignment

Paired-end alignment

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn reads_1.fq.gz reads_2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate

Single-End Alignment

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn reads.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate

Two-Pass Mode

Two-pass mode for better novel junction detection

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --twopassMode Basic
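To see which junctions are novel, the splice junction table can be filtered on its annotation flag. This is a sketch under stated assumptions: in SJ.out.tab, column 6 is 0 for unannotated junctions and column 7 is the number of uniquely mapped reads crossing the junction. In two-pass runs the final SJ.out.tab may report pass-1 junctions as annotated, so the first-pass table (typically written under a `_STARpass1/` directory next to the output prefix) is likely the better input. `novel_junctions` is a hypothetical helper name.

```shell
# Print unannotated junctions supported by at least MIN uniquely mapped reads.
# novel_junctions is a hypothetical helper, not a STAR command.
#   $1 = path to an SJ.out.tab file
#   $2 = minimum uniquely mapped reads (defaults to 3, an arbitrary choice)
novel_junctions() {
    awk -F '\t' -v min="${2:-3}" '$6 == 0 && $7 >= min' "$1"
}

# Example:
#   novel_junctions sample__STARpass1/SJ.out.tab 5 | wc -l
```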

Quantification Mode

Output gene counts (comparable to htseq-count with default settings)

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --quantMode GeneCounts

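Output: sample_ReadsPerGene.out.tab, a tab-separated file with four columns:

Column 1 - Gene ID
Column 2 - Counts for unstranded libraries
Column 3 - Counts for forward-stranded libraries (read 1 on the RNA strand; htseq-count -s yes)
Column 4 - Counts for reverse-stranded libraries (read 2 on the RNA strand; htseq-count -s reverse)

The first four rows are summary counts (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), not genes. Use the column that matches the library preparation.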
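When the library strandedness is unknown, comparing the totals of the three count columns usually makes it obvious which one to use. The sketch below assumes the first four rows of the file are the N_unmapped, N_multimapping, N_noFeature and N_ambiguous summary rows and skips them; `count_column_totals` is a hypothetical helper name.

```shell
# Sum the three count columns of a ReadsPerGene.out.tab file,
# skipping the four summary rows at the top.
# count_column_totals is a hypothetical helper, not a STAR command.
count_column_totals() {
    awk -F '\t' 'NR > 4 { u += $2; f += $3; r += $4 }
                 END { printf "unstranded=%d forward=%d reverse=%d\n", u, f, r }' "$1"
}

# Example:
#   count_column_totals sample_ReadsPerGene.out.tab
```

If the forward and reverse totals are similar, the library is probably unstranded (column 2). If one is much larger than the other, that column is the likely match.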

ENCODE Options

ENCODE recommended settings

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMunmapped Within \
    --outSAMattributes NH HI AS NM MD \
    --outFilterType BySJout \
    --outFilterMultimapNmax 20 \
    --outFilterMismatchNmax 999 \
    --outFilterMismatchNoverReadLmax 0.04 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --alignSJoverhangMin 8 \
    --alignSJDBoverhangMin 1

Fusion Detection

For chimeric/fusion detection

Output Files

File                             Description
*Aligned.sortedByCoord.out.bam   Sorted BAM file
*Log.final.out                   Alignment summary statistics
*Log.out                         Detailed log
*SJ.out.tab                      Splice junctions
*ReadsPerGene.out.tab            Gene counts (with --quantMode GeneCounts)
*Chimeric.out.junction           Fusion candidates (with chimeric detection enabled)
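The summary statistics in Log.final.out are easy to pull out for a quick QC table. The sketch below assumes the usual layout of that file, where each statistic is written as a label, a `|` separator, and a value; `log_field` is a hypothetical helper name.

```shell
# Print the value of one statistic from a STAR Log.final.out file.
# log_field is a hypothetical helper, not a STAR command.
#   $1 = path to Log.final.out
#   $2 = label exactly as it appears before the "|" separator
log_field() {
    awk -F '|' -v key="$2" '
        {
            label = $1; value = $2
            # Trim leading and trailing blanks from both sides of the separator
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (label == key) print value
        }' "$1"
}

# Example:
#   log_field sample_Log.final.out "Uniquely mapped reads %"
```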

Memory Requirements

Reduce memory for limited systems (--limitBAMsortRAM is in bytes; 10000000000 = 10 GB for sorting)

STAR --genomeLoad NoSharedMemory \
    --limitBAMsortRAM 10000000000 \
    ...

For very large genomes, limit RAM during index generation (--limitGenomeGenerateRAM is in bytes; 31000000000 = 31 GB)

STAR --runMode genomeGenerate \
    --limitGenomeGenerateRAM 31000000000 \
    ...

Shared Memory Mode

Load genome into shared memory (for multiple samples)

STAR --genomeLoad LoadAndExit --genomeDir star_index/

Run alignments (faster startup)

STAR --genomeLoad LoadAndKeep --genomeDir star_index/ ...

Remove from memory when done

STAR --genomeLoad Remove --genomeDir star_index/
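The three steps above can be combined into one batch loop. This is a sketch under stated assumptions: FASTQ pairs are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz` in a single directory, and `align_batch` is a hypothetical wrapper name. `--limitBAMsortRAM` is set explicitly because, as far as I know, STAR refuses to sort BAM output with a shared-memory genome unless that limit is given; the 10 GB value is illustrative. Two-pass mode is also understood not to work with a shared genome, so it is left out here.

```shell
# Align every read pair in a directory against one genome held in shared memory.
# align_batch is a hypothetical wrapper; file naming and RAM limit are assumptions.
#   $1 = genome index directory, $2 = directory of *_1.fq.gz / *_2.fq.gz pairs
align_batch() {
    index_dir=$1
    fastq_dir=$2

    # Load the genome once
    STAR --genomeLoad LoadAndExit --genomeDir "$index_dir"

    for r1 in "$fastq_dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue            # no matching files
        r2=${r1%_1.fq.gz}_2.fq.gz           # mate file with the same stem
        sample=$(basename "$r1" _1.fq.gz)
        STAR --runThreadN 8 \
            --genomeLoad LoadAndKeep \
            --genomeDir "$index_dir" \
            --readFilesIn "$r1" "$r2" \
            --readFilesCommand zcat \
            --outFileNamePrefix "${sample}_" \
            --outSAMtype BAM SortedByCoordinate \
            --limitBAMsortRAM 10000000000
    done

    # Free the shared memory when all samples are done
    STAR --genomeLoad Remove --genomeDir "$index_dir"
}

# Example:
#   align_batch star_index/ fastq/
```

If a run is interrupted, the genome stays in shared memory until `STAR --genomeLoad Remove` is run by hand.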

Key Parameters

Parameter                 Default   Description
--runThreadN              1         Number of threads
--sjdbOverhang            100       Read length - 1
--outFilterMultimapNmax   10        Max number of loci a read may map to
--alignIntronMax          0         Max intron size (0 = determined by the alignment window settings)
--outFilterMismatchNmax   10        Max mismatches
--outSAMtype              SAM       Output format
--quantMode               -         GeneCounts for counting
--twopassMode             None      Basic for two-pass

Related Skills

rna-quantification/featurecounts-counting - Alternative counting
rna-quantification/alignment-free-quant - Salmon/kallisto alternative
differential-expression/deseq2-basics - Downstream DE analysis
read-qc/fastp-workflow - Preprocess reads
