Bioinformatics

Analyze DNA, RNA, and protein sequences with alignment, variant calling, and expression analysis pipelines.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "Bioinformatics" with this command: npx skills add ivangdavila/bioinformatics

Setup

On first use, read setup.md for integration guidelines. Create ~/bioinformatics/ with user consent to store project context and preferences.

When to Use

User needs to analyze biological sequences, run genomic pipelines, or interpret sequencing data. Agent handles sequence alignment, variant calling, expression analysis, and format conversions.

Architecture

Memory lives in ~/bioinformatics/. See memory-template.md for structure.

~/bioinformatics/
├── memory.md         # Projects, preferences, reference genomes
├── pipelines/        # Saved pipeline configurations
└── results/          # Analysis outputs and logs

Quick Reference

| Topic | File |
|---|---|
| Setup process | setup.md |
| Memory template | memory-template.md |
| File formats | formats.md |
| Tool commands | tools.md |
| RNA-seq pipeline | rnaseq.md |
| Variant calling | variants.md |

Core Rules

1. Verify Input Quality First

Before any analysis, check input data quality:

  • FASTQ: Run FastQC, check per-base quality, adapter content
  • BAM: Verify the file is intact (samtools quickcheck), coordinate-sorted (SO:coordinate in the samtools view -H header), and indexed
  • VCF: Check that the header parses (bcftools view -h)

Bad input → garbage output. Always QC first.
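
As a cheap first pass before FastQC, a structural check can catch truncated or malformed FASTQ files. The sketch below is illustrative only: `fastq_precheck` is a hypothetical helper name, and it assumes 4-line records (no wrapped sequence lines) in plain or gzip-compressed files. It does not replace a real quality report.

```sh
# Hypothetical helper: structural sanity check for a FASTQ file.
# Assumes each record is exactly 4 lines: @header, sequence, +, quality.
fastq_precheck() {
  file=$1
  reader="cat"
  [ -r "$file" ] || { echo "unreadable: $file" >&2; return 1; }
  case "$file" in
    *.gz) reader="gzip -dc" ;;
  esac
  # The function's exit status is awk's exit status (last command in the pipe).
  $reader "$file" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header does not start with @\n", NR; bad = 1; exit }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator does not start with +\n", NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit }
    END {
      if (bad) exit 1
      if (NR == 0 || NR % 4 != 0) { printf "incomplete record: %d lines\n", NR; exit 1 }
      printf "ok: %d reads\n", NR / 4
    }'
}
```

A call such as `fastq_precheck sample.fastq.gz && fastqc sample.fastq.gz -o qc_reports/` would run FastQC only when the file is structurally sound.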

2. Use Reference Genome Consistently

Track which reference is used per project:

  • Human: GRCh38/hg38 (prefer) or GRCh37/hg19
  • Mouse: GRCm39/mm39 or GRCm38/mm10
  • Mixing references = invalid results

Store reference info in ~/bioinformatics/memory.md per project.
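
One way to detect a mixed-reference situation is to compare contig names and lengths between two FASTA indices. This is a minimal sketch, assuming both references were indexed with `samtools faidx`, which writes a tab-separated `.fai` file whose first two columns are contig name and length. `compare_fai` is a hypothetical helper name.

```sh
# Hypothetical helper: report contigs whose name or length differs
# between two .fai files. Runs in a subshell so the temp dir is cleaned up.
compare_fai() (
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT
  cut -f1,2 "$1" | sort > "$tmp/a"
  cut -f1,2 "$2" | sort > "$tmp/b"
  only_a=$(comm -23 "$tmp/a" "$tmp/b")   # lines present only in the first index
  only_b=$(comm -13 "$tmp/a" "$tmp/b")   # lines present only in the second index
  if [ -z "$only_a" ] && [ -z "$only_b" ]; then
    echo "match"
    exit 0
  fi
  if [ -n "$only_a" ]; then
    printf '%s\n' "$only_a" | awk -v src="$1" '{ print "only in " src ": " $0 }'
  fi
  if [ -n "$only_b" ]; then
    printf '%s\n' "$only_b" | awk -v src="$2" '{ print "only in " src ": " $0 }'
  fi
  exit 1
)
```

Matching names and lengths do not prove two references are identical, but a mismatch is a reliable sign that files from different builds are being combined.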

3. Preserve Raw Data

NEVER modify original FASTQ/BAM files:

  • Work on copies
  • Keep originals read-only
  • Log every transformation step
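
The three points above could be combined into a single step, sketched here under stated assumptions: `working_copy` is a hypothetical helper, and the log format (UTC timestamp, tab, description) is only one possible convention.

```sh
# Hypothetical helper: make the original read-only, create a writable
# working copy, and append the step to a log file.
# Usage: working_copy raw/sample.fastq.gz work steps.log
working_copy() {
  wc_src=$1
  wc_dest_dir=$2
  wc_log=$3
  chmod a-w "$wc_src" || return 1
  mkdir -p "$wc_dest_dir" || return 1
  wc_copy="$wc_dest_dir/$(basename "$wc_src")"
  cp "$wc_src" "$wc_copy" || return 1
  chmod u+w "$wc_copy"   # the copy inherits read-only mode, so re-enable writes
  printf '%s\tcopied %s -> %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$wc_src" "$wc_copy" >> "$wc_log"
  echo "$wc_copy"        # print the path of the working copy
}
```

The printed path can be captured, for example `copy=$(working_copy raw/sample.fastq.gz work steps.log)`, so later commands operate on the copy only.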

4. Resource Awareness

Bioinformatics commands can consume massive resources:

  • Check file sizes before operations
  • Use streaming when possible (samtools view | ...)
  • Estimate memory needs (BWA: ~6GB for human genome)
  • Warn before operations >10 minutes
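
A file-size gate is one simple way to act on the first point. In this sketch, `size_gate` is a hypothetical helper and the byte limit is whatever threshold the user considers "large". It is not a value taken from any tool.

```sh
# Hypothetical helper: print a file's size and fail if it exceeds a limit.
# Returns 0 if within the limit, 1 if larger, 2 if the path is not a file.
size_gate() {
  sg_file=$1
  sg_limit=$2
  [ -f "$sg_file" ] || { echo "not a file: $sg_file" >&2; return 2; }
  sg_bytes=$(wc -c < "$sg_file" | tr -d ' ')
  if [ "$sg_bytes" -gt "$sg_limit" ]; then
    echo "large: $sg_file is $sg_bytes bytes (limit $sg_limit)"
    return 1
  fi
  echo "ok: $sg_file is $sg_bytes bytes"
}
```

For example, `size_gate aligned.bam 10000000000 || echo "confirm with the user before continuing"` would pause on BAM files above roughly 10 GB.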

5. Reproducibility

Every analysis must be reproducible:

  • Log exact tool versions (samtools --version)
  • Save command parameters
  • Record input file checksums for critical analyses
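
These records could be collected in one log file per run. The sketch below is illustrative: `log_provenance` and `sha256_of` are hypothetical helpers, the tab-separated layout is an assumption, and it only works for tools that accept `--version` (samtools and bcftools do; others may need a different flag).

```sh
# Print the SHA-256 of a file, using sha256sum (GNU coreutils) or
# shasum -a 256 (common on macOS), whichever is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1"
  else
    shasum -a 256 "$1"
  fi | cut -d' ' -f1
}

# Hypothetical helper: append tool versions and input checksums to a log.
# Usage: log_provenance run.log "samtools bcftools" R1.fq.gz R2.fq.gz
log_provenance() {
  lp_log=$1
  lp_tools=$2
  shift 2
  {
    printf 'date\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    for lp_tool in $lp_tools; do
      if command -v "$lp_tool" >/dev/null 2>&1; then
        # Keep only the first line of the version banner.
        printf 'tool\t%s\t%s\n' "$lp_tool" "$("$lp_tool" --version 2>&1 | head -n 1)"
      else
        printf 'tool\t%s\tNOT FOUND\n' "$lp_tool"
      fi
    done
    for lp_file in "$@"; do
      printf 'sha256\t%s\t%s\n' "$(sha256_of "$lp_file")" "$lp_file"
    done
  } >> "$lp_log"
}
```

Appending rather than overwriting keeps earlier runs in the same log, which makes it easier to see when a tool version or input file changed.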

Common Traps

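  • Wrong chromosome naming: chr1 vs 1 mismatches between files cause silent failures such as empty results. Check first, then convert, e.g. sed 's/^chr//' for BED or bcftools annotate --rename-chrs for VCF
  • Unsorted BAM: Most tools expect coordinate-sorted input. Symptoms: errors, or wrong results with no warning
  • Index missing: BAM needs .bai (or .csi), bgzipped VCF needs .tbi (or .csi). Commands fail cryptically without them
  • Memory exhaustion: Large BAM operations can kill the session. Stream where possible and keep thread counts modest, since samtools sort memory (-m) is allocated per thread
  • Stale indices: After modifying BAM/VCF, regenerate the index. An old index points at the wrong offsets, so region queries fail or return wrong reads
  • 0-based vs 1-based coordinates: BED is 0-based and half-open, VCF/GFF is 1-based and inclusive. Off-by-one bugs are common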
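
The naming and coordinate traps can be made concrete with two small conversions. This is a sketch with hypothetical helper names. It assumes tab-separated BED input and produces region strings in the `chrom:start-end` form that samtools and bcftools accept.

```sh
# Strip a leading "chr" from each line of a BED file (chr1 -> 1).
strip_chr_prefix() {
  sed 's/^chr//' "$1"
}

# Convert BED intervals (0-based start, exclusive end) to 1-based
# inclusive region strings: start + 1, end unchanged.
bed_to_regions() {
  awk 'BEGIN { FS = "\t" }
       !/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$1"
}
```

For instance, the BED line `chr1	999	2000` covers the same bases as the region `chr1:1000-2000`: the start shifts by one, while the end stays the same because BED ends are exclusive and region ends are inclusive.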

File Formats Quick Reference

| Format | Purpose | Key Tool |
|---|---|---|
| FASTA | Reference sequences | samtools faidx |
| FASTQ | Raw reads + quality | seqtk, fastp |
| SAM/BAM | Aligned reads | samtools |
| VCF/BCF | Variants | bcftools |
| BED | Genomic intervals | bedtools |
| GFF/GTF | Gene annotations | gffread |
| BigWig | Coverage tracks | deepTools |

Essential Commands

Quality Control

# FASTQ quality report
fastqc sample.fastq.gz -o qc_reports/

# Trim adapters + low quality
fastp -i R1.fq.gz -I R2.fq.gz -o R1.clean.fq.gz -O R2.clean.fq.gz

# BAM statistics
samtools flagstat aligned.bam
samtools stats aligned.bam > stats.txt

Alignment

# Index reference (once)
bwa index reference.fa

# Align paired-end reads
bwa mem -t 8 reference.fa R1.fq.gz R2.fq.gz | \
  samtools sort -o aligned.bam -

# Index BAM
samtools index aligned.bam

Variant Calling

# Call variants
bcftools mpileup -Ou -f reference.fa aligned.bam | \
  bcftools call -mv -Oz -o variants.vcf.gz

# Index VCF (writes .csi by default; add -t for a .tbi index)
bcftools index variants.vcf.gz

# Soft-filter variants: tag QUAL<20 as LowQual, keep all records
bcftools filter -s LowQual -e 'QUAL<20' -Oz -o filtered.vcf.gz variants.vcf.gz

Data Manipulation

# Extract region
samtools view -b aligned.bam chr1:1000000-2000000 > region.bam

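# Convert BAM to FASTQ (collate first so mates are adjacent in a coordinate-sorted BAM)
samtools collate -u -O aligned.bam | \
  samtools fastq -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -n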

# Merge BAMs
samtools merge merged.bam sample1.bam sample2.bam

# Subset VCF by region
bcftools view -r chr1:1000-2000 variants.vcf.gz
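
Region queries like the ones above depend on an index that is present and current. The sketch below checks for the usual index suffixes next to a data file and flags an index older than the file it describes. `index_is_fresh` is a hypothetical helper, and the `-nt` file test is a widely supported extension rather than strict POSIX.

```sh
# Hypothetical helper: succeed only if an index exists for the data file
# and is not older than it. Checks .bai, .csi, and .tbi suffixes.
index_is_fresh() {
  idx_data=$1
  for idx in "$idx_data.bai" "${idx_data%.bam}.bai" "$idx_data.csi" "$idx_data.tbi"; do
    [ -f "$idx" ] || continue
    if [ "$idx_data" -nt "$idx" ]; then
      echo "stale index: $idx is older than $idx_data"
      return 1
    fi
    echo "ok: $idx"
    return 0
  done
  echo "no index found for $idx_data"
  return 1
}
```

A guard such as `index_is_fresh aligned.bam || samtools index aligned.bam` would rebuild the index only when it is missing or out of date.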

Security & Privacy

Data access:

  • Only reads files user explicitly provides as input
  • Writes outputs to directories user specifies
  • Stores preferences in ~/bioinformatics/ (with consent)

Data that stays local:

  • All sequence data processed locally
  • No external API calls for analysis
  • Pipeline configs in ~/bioinformatics/

This skill does NOT:

  • Upload sequence data anywhere
  • Access files without explicit user instruction
  • Infer or collect data beyond explicit inputs
  • Make network requests during analysis

Note: Installing tools (conda, brew) and downloading reference genomes requires internet access. These are user-initiated actions.

Related Skills

Install with clawhub install <slug> if user confirms:

  • data-analysis — statistical interpretation
  • statistics — hypothesis testing
  • science — research methodology

Feedback

  • If useful: clawhub star bioinformatics
  • Stay updated: clawhub sync

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
