bio-alignment-pairwise

Pairwise Sequence Alignment

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-alignment-pairwise" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-alignment-pairwise

Pairwise Sequence Alignment

Align two sequences using dynamic programming algorithms (Needleman-Wunsch for global, Smith-Waterman for local).

Required Import

from Bio.Align import PairwiseAligner from Bio.Seq import Seq from Bio import SeqIO

Core Concepts

Mode Algorithm Use Case

global

Needleman-Wunsch Full-length alignment, similar-length sequences

local

Smith-Waterman Find best matching regions, different-length sequences

Creating an Aligner

Basic aligner with defaults

aligner = PairwiseAligner()

Configure mode and scoring

aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

For protein alignment with substitution matrix

from Bio.Align import substitution_matrices aligner = PairwiseAligner(mode='global', substitution_matrix=substitution_matrices.load('BLOSUM62'))

Performing Alignments

seq1 = Seq('ACCGGTAACGTAG') seq2 = Seq('ACCGTTAACGAAG')

Get all optimal alignments

alignments = aligner.align(seq1, seq2) print(f'Found {len(alignments)} optimal alignments') print(alignments[0]) # Print first alignment

Get score only (faster for large sequences)

score = aligner.score(seq1, seq2)

Alignment Output Format

target 0 ACCGGTAACGTAG 13 0 |||||.||||.|| 13 query 0 ACCGTTAACGAAG 13

Accessing Alignment Data

alignment = alignments[0]

Basic properties

print(alignment.score) # Alignment score print(alignment.shape) # (num_seqs, alignment_length) print(len(alignment)) # Alignment length

Get aligned sequences with gaps

target_aligned = alignment[0, :] # First sequence (target) with gaps query_aligned = alignment[1, :] # Second sequence (query) with gaps

Get coordinate mapping

print(alignment.aligned) # Array of aligned segment coordinates print(alignment.coordinates) # Full coordinate array

Alignment Counts (Identities, Mismatches, Gaps)

alignment = alignments[0] counts = alignment.counts()

print(f'Identities: {counts.identities}') print(f'Mismatches: {counts.mismatches}') print(f'Gaps: {counts.gaps}')

Calculate percent identity

total_aligned = counts.identities + counts.mismatches percent_identity = counts.identities / total_aligned * 100 print(f'Percent identity: {percent_identity:.1f}%')

Common Scoring Configurations

DNA/RNA Alignment

aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

Protein Alignment

from Bio.Align import substitution_matrices blosum62 = substitution_matrices.load('BLOSUM62') aligner = PairwiseAligner(mode='global', substitution_matrix=blosum62, open_gap_score=-11, extend_gap_score=-1)

Local Alignment (Find Best Region)

aligner = PairwiseAligner(mode='local', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

Semiglobal (Overlap/Extension)

Allow free end gaps on query (useful for primer alignment)

aligner = PairwiseAligner(mode='global') aligner.query_left_open_gap_score = 0 aligner.query_left_extend_gap_score = 0 aligner.query_right_open_gap_score = 0 aligner.query_right_extend_gap_score = 0

Available Substitution Matrices

from Bio.Align import substitution_matrices print(substitution_matrices.load()) # List all available matrices

Common matrices

blosum62 = substitution_matrices.load('BLOSUM62') # General protein blosum80 = substitution_matrices.load('BLOSUM80') # Closely related proteins pam250 = substitution_matrices.load('PAM250') # Distantly related proteins

Working with SeqRecord Objects

from Bio import SeqIO

records = list(SeqIO.parse('sequences.fasta', 'fasta')) seq1, seq2 = records[0].seq, records[1].seq

aligner = PairwiseAligner(mode='global', match_score=1, mismatch_score=-1) alignments = aligner.align(seq1, seq2)

Iterating Over Multiple Alignments

Limit number of alignments returned (memory efficient)

aligner.max_alignments = 100

for i, alignment in enumerate(alignments): print(f'Alignment {i+1}: score={alignment.score}') if i >= 4: break

Substitution Matrix from Alignment

alignment = alignments[0] substitutions = alignment.substitutions

View as array (rows=target, cols=query)

print(substitutions)

Access specific substitution counts

substitutions['A', 'T'] gives count of A aligned to T

Export Alignment to Different Formats

alignment = alignments[0]

Various output formats

print(format(alignment, 'fasta')) # FASTA format print(format(alignment, 'clustal')) # Clustal format print(format(alignment, 'psl')) # PSL format (BLAT) print(format(alignment, 'sam')) # SAM format

Quick Reference: Scoring Parameters

Parameter Description Typical DNA Typical Protein

match_score

Score for identical bases 1-2 Use matrix

mismatch_score

Penalty for mismatches -1 to -3 Use matrix

open_gap_score

Cost to start a gap -5 to -15 -10 to -12

extend_gap_score

Cost per gap extension -0.5 to -2 -0.5 to -1

substitution_matrix

Scoring matrix N/A BLOSUM62

Common Errors

Error Cause Solution

OverflowError

Too many optimal alignments Set aligner.max_alignments

Low scores Wrong scoring scheme Use substitution matrix for proteins

No alignments in local mode Scores all negative Ensure match_score

0

Decision Tree: Choosing Alignment Mode

Need full-length comparison? ├── Yes → Use mode='global' │ └── Sequences similar length? │ ├── Yes → Standard global │ └── No → Consider semiglobal (free end gaps) └── No → Use mode='local' └── Find best matching regions only

Related Skills

  • alignment-io - Save alignments to files in various formats

  • msa-parsing - Work with multiple sequence alignments

  • msa-statistics - Calculate identity, similarity metrics

  • sequence-manipulation/motif-search - Pattern matching in sequences

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-specialized-omics-plots

No summary provided by upstream source.

Repository SourceNeeds Review