bio-read-alignment-hisat2-alignment

HISAT2 RNA-seq Alignment

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-read-alignment-hisat2-alignment" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-read-alignment-hisat2-alignment

HISAT2 RNA-seq Alignment

Build Index

Basic index (no annotation)

hisat2-build -p 8 reference.fa hisat2_index

Index with splice sites and exons (recommended)

hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt hisat2_extract_exons.py annotation.gtf > exons.txt

hisat2-build -p 8
--ss splice_sites.txt
--exon exons.txt
reference.fa hisat2_index

Basic Alignment

Paired-end reads

hisat2 -p 8 -x hisat2_index
-1 reads_1.fq.gz -2 reads_2.fq.gz
-S aligned.sam

Single-end reads

hisat2 -p 8 -x hisat2_index
-U reads.fq.gz
-S aligned.sam

Direct to Sorted BAM

Pipe to samtools

hisat2 -p 8 -x hisat2_index
-1 r1.fq.gz -2 r2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -

samtools index aligned.sorted.bam

Stranded Libraries

Forward stranded (e.g., Ligation)

hisat2 -p 8 -x hisat2_index
--rna-strandness FR
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Reverse stranded (e.g., dUTP, TruSeq - most common)

hisat2 -p 8 -x hisat2_index
--rna-strandness RF
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Single-end stranded

hisat2 -p 8 -x hisat2_index
--rna-strandness F \ # or R for reverse -U reads.fq.gz -S aligned.sam

Novel Splice Junction Discovery

Output novel splice junctions

hisat2 -p 8 -x hisat2_index
--novel-splicesite-outfile novel_splices.txt
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Use known + novel junctions for subsequent alignments

hisat2 -p 8 -x hisat2_index
--novel-splicesite-infile novel_splices.txt
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Two-Pass Alignment (Manual)

Pass 1: Discover junctions from all samples

for r1 in *_R1.fq.gz; do r2=${r1/_R1/_R2} base=$(basename $r1 _R1.fq.gz) hisat2 -p 8 -x hisat2_index
--novel-splicesite-outfile ${base}_splices.txt
-1 $r1 -2 $r2 -S /dev/null done

Combine and filter junctions

cat *_splices.txt | sort -u > combined_splices.txt

Pass 2: Realign with all junctions

for r1 in *_R1.fq.gz; do r2=${r1/_R1/_R2} base=$(basename $r1 _R1.fq.gz) hisat2 -p 8 -x hisat2_index
--novel-splicesite-infile combined_splices.txt
-1 $r1 -2 $r2 |
samtools sort -@ 4 -o ${base}.sorted.bam - done

Read Group Information

hisat2 -p 8 -x hisat2_index
--rg-id sample1
--rg SM:sample1
--rg PL:ILLUMINA
--rg LB:lib1
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam

Downstream Quantification

Output name-sorted BAM for htseq-count

hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz |
samtools sort -n -@ 4 -o aligned.namesorted.bam -

Or coordinate-sorted for featureCounts

hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -

Key Parameters

Parameter Default Description

-p 1 Number of threads

-x

Index basename

--rna-strandness unstranded FR/RF/F/R

--dta off Downstream transcriptome assembly

--dta-cufflinks off For Cufflinks

--min-intronlen 20 Minimum intron length

--max-intronlen 500000 Maximum intron length

-k 5 Max alignments to report

For StringTie/Cufflinks

Use --dta for StringTie

hisat2 -p 8 -x hisat2_index
--dta
-1 r1.fq.gz -2 r2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -

Alignment Summary

HISAT2 prints summary to stderr

hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt

Example:

50000000 reads; of these: 50000000 (100.00%) were paired; of these: 2500000 (5.00%) aligned concordantly 0 times 45000000 (90.00%) aligned concordantly exactly 1 time 2500000 (5.00%) aligned concordantly >1 times 95.00% overall alignment rate

Memory Comparison

Aligner Human Genome Memory

STAR ~30GB

HISAT2 ~8GB

Related Skills

  • read-alignment/star-alignment - Alternative with more features

  • rna-quantification/featurecounts-counting - Count aligned reads

  • rna-quantification/alignment-free-quant - Skip alignment entirely

  • differential-expression/deseq2-basics - Downstream DE analysis

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-specialized-omics-plots

No summary provided by upstream source.

Repository SourceNeeds Review