BioServices
Overview
BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.
When to Use This Skill
This skill should be used when:
- Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
- Analyzing metabolic pathways and gene functions via KEGG or Reactome
- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
- Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
- Running sequence similarity searches (BLAST, MUSCLE alignment)
- Querying gene ontology terms (QuickGO, GO annotations)
- Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
- Mining genomic data (BioMart, ArrayExpress, ENA)
- Integrating data from multiple bioinformatics resources in a single workflow
Core Capabilities
- Protein Analysis
Retrieve protein information, sequences, and functional annotations:
from bioservices import UniProt
u = UniProt(verbose=False)
# Search for protein by name
results = u.search("ZAP70_HUMAN", frmt="tab", columns="id,genes,organism")
# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")
# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
Key methods:
- search(): Query UniProt with flexible search terms
- retrieve(): Get protein entries in various formats (FASTA, XML, tab)
- mapping(): Convert identifiers between databases
Reference: references/services_reference.md for complete UniProt API details.
- Pathway Discovery and Analysis
Access KEGG pathway information for genes and organisms:
from bioservices import KEGG
k = KEGG()
k.organism = "hsa"  # Set to human
# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species
# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs
# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene
# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)
# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations']  # Protein-protein interactions
# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")
Key methods:
- lookfor_organism(), lookfor_pathway(): Search by name
- get_pathway_by_gene(): Find pathways containing genes
- parse_kgml_pathway(): Extract structured pathway data
- pathway2sif(): Get protein interaction networks
Reference: references/workflow_patterns.md for complete pathway analysis workflows.
- Compound Database Searches
Search and cross-reference compounds across multiple databases:
from bioservices import KEGG, UniChem
k = KEGG()
# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222
# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links
# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315
Common workflow:
- Search compound by name in KEGG
- Extract KEGG compound ID
- Use UniChem for KEGG → ChEMBL mapping
- ChEBI IDs are often provided in KEGG entries
Reference: references/identifier_mapping.md for complete cross-database mapping guide.
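The last step of the workflow above, reading ChEBI IDs out of a KEGG entry, can be sketched as a small text parser. This is a minimal sketch, assuming the entry text lists cross-references on lines of the form `ChEBI: <id> [<id> ...]` in its DBLINKS section. The helper name `extract_chebi_ids` and the sample entry are hypothetical and are not part of BioServices.

```python
import re


def extract_chebi_ids(kegg_entry: str) -> list[str]:
    """Return ChEBI IDs found on 'ChEBI: ...' lines of a KEGG flat-file entry."""
    ids = []
    for line in kegg_entry.splitlines():
        match = re.search(r"ChEBI:\s*(.+)", line)
        if match:
            # One line may carry several space-separated IDs.
            ids.extend(match.group(1).split())
    return ids


# Illustrative fragment only; this is NOT real KEGG output.
sample_entry = """ENTRY       C00000                      Compound
NAME        Example compound
DBLINKS     CAS: 0000-00-0
            PubChem: 12345
            ChEBI: 11111 22222
///"""

chebi_ids = extract_chebi_ids(sample_entry)
print(chebi_ids)
```

In a real session the input would be the string returned by `k.get("cpd:C11222")`. Check the actual entry layout before relying on this pattern.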
- Sequence Analysis
Run BLAST searches and sequence alignments:
from bioservices import NCBIblast
s = NCBIblast(verbose=False)
# Run BLASTP against UniProtKB
# protein_sequence holds the query as an amino-acid string
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com",  # Required by the EBI-hosted BLAST service
)
# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")
Note: BLAST jobs are asynchronous. Check status before retrieving results.
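One way to follow this advice is to poll the job status in a loop before fetching results. The sketch below is generic and does not call BioServices itself. The helper `wait_for_job`, its parameters, and the set of "still pending" status strings are assumptions for illustration, so confirm the status values your service actually returns.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    pending: Iterable[str] = ("RUNNING", "QUEUED", "PENDING"),  # assumed values
) -> str:
    """Poll get_status() until it leaves a pending state or max_polls is hit."""
    pending_states = set(pending)
    status = get_status()
    polls = 1
    while status in pending_states and polls < max_polls:
        time.sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status


# Stand-in for a real service: reports RUNNING twice, then FINISHED.
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

With a live job, the callable would be something like `lambda: s.getStatus(jobid)`, and `s.getResult(jobid, "out")` would be called only once the returned status indicates completion.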
- Identifier Mapping
Convert identifiers between different biological databases:
from bioservices import UniProt, KEGG
# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403",        # Identifier(s) to convert
)
# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")
# For compounds, use UniChem
from bioservices import UniChem
u = UniChem()
chembl_from_kegg = u.get_compound_id_from_kegg("C11222")
Supported mappings (UniProt):
- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see references/identifier_mapping.md)
- Gene Ontology Queries
Access GO terms and annotations:
from bioservices import QuickGO
g = QuickGO(verbose=False)
# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")
# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
- Protein-Protein Interactions
Query interaction databases via PSICQUIC:
from bioservices import PSICQUIC
s = PSICQUIC(verbose=False)
# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")
# List available interaction databases
databases = s.activeDBs
Available databases: MINT, IntAct, BioGRID, DIP, and 30+ others.
Multi-Service Integration Workflows
BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:
Complete Protein Analysis Pipeline
Execute a full protein characterization workflow:
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
This script demonstrates:
- UniProt search for protein entry
- FASTA sequence retrieval
- BLAST similarity search
- KEGG pathway discovery
- PSICQUIC interaction mapping
Pathway Network Analysis
Analyze all pathways for an organism:
python scripts/pathway_analysis.py hsa output_directory/
Extracts and analyzes:
- All pathway IDs for organism
- Protein-protein interactions per pathway
- Interaction type distributions
- Exports to CSV/SIF formats
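The interaction type distribution mentioned above can be computed with a simple tally. This is a minimal sketch, assuming each relation is a dictionary whose `name` key holds the interaction type (for example, activation or inhibition). The helper `count_relation_types` and the sample records are hypothetical; verify the keys actually present in the `relations` list returned by `parse_kgml_pathway()`.

```python
from collections import Counter


def count_relation_types(relations):
    """Tally relations by their 'name' field, using 'unknown' when it is absent."""
    return Counter(rel.get("name", "unknown") for rel in relations)


# Hypothetical records; the key names are an assumption about the parsed output.
sample_relations = [
    {"entry1": "1", "entry2": "2", "link": "PPrel", "name": "activation"},
    {"entry1": "2", "entry2": "3", "link": "PPrel", "name": "inhibition"},
    {"entry1": "3", "entry2": "4", "link": "PPrel", "name": "activation"},
    {"entry1": "4", "entry2": "5", "link": "PPrel"},
]

type_counts = count_relation_types(sample_relations)
for relation_type, count in type_counts.most_common():
    print(relation_type, count)
```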
Cross-Database Compound Search
Map compound identifiers across databases:
python scripts/compound_cross_reference.py Geldanamycin
Retrieves:
- KEGG compound ID
- ChEBI identifier
- ChEMBL identifier
- Basic compound properties
Batch Identifier Conversion
Convert multiple identifiers at once:
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
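The internals of `batch_id_converter.py` are not shown here, so the following is only a sketch of how batch input might be prepared: read one identifier per line, drop blanks and duplicates, and split the list into fixed-size batches. The helper names `read_ids` and `chunk_ids` and the batch size are assumptions, not the script's actual implementation.

```python
def read_ids(text):
    """Parse one identifier per line, skipping blanks and duplicates, keeping order."""
    stripped = (line.strip() for line in text.splitlines())
    return list(dict.fromkeys(item for item in stripped if item))


def chunk_ids(ids, size=100):
    """Yield successive batches of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder identifiers standing in for the contents of input_ids.txt.
sample_text = "ID_A\nID_B\n\nID_A\nID_C\nID_D\nID_E\n"
ids = read_ids(sample_text)
batches = list(chunk_ids(ids, size=2))
print(batches)
```

Each batch could then be passed to `u.mapping(...)` in turn. How multiple identifiers should be supplied in `query` (for example, as a comma-separated string) depends on the installed BioServices version and should be checked against its documentation.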
Best Practices
Output Format Handling
Different services return data in various formats:
- XML: Parse using BeautifulSoup (most SOAP services)
- Tab-separated (TSV): Pandas DataFrames for tabular data
- Dictionary/JSON: Direct Python manipulation
- FASTA: BioPython integration for sequence analysis
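As an illustration of the tab-separated case, the sketch below turns a TSV string into a list of row dictionaries. It uses the standard-library `csv` module instead of Pandas to stay dependency-free. The helper `tsv_to_records`, the column names, and the sample row are made up for the example.

```python
import csv
import io


def tsv_to_records(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


# Made-up response text; real column headers depend on the service and query.
sample_tsv = "Entry\tGene Names\tOrganism\nX00001\tGENE1\tExample organism\n"

records = tsv_to_records(sample_tsv)
print(records[0]["Entry"], records[0]["Gene Names"])
```

With Pandas installed, `pandas.read_csv(io.StringIO(tsv_text), sep="\t")` gives a DataFrame from the same text.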
Rate Limiting and Verbosity
Control API request behavior:
from bioservices import KEGG
k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
Error Handling
Wrap service calls in try-except blocks:
try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
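For transient network failures, a retry wrapper with exponential backoff is a common complement to a plain try-except. The sketch below is generic: `call_with_retries` and its parameters are hypothetical, and it assumes the wrapped call signals failure by raising an exception, which may not hold for every service method.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0):
    """Call func(); on an exception, wait with exponential backoff and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the last error to the caller.
            time.sleep(base_delay * 2 ** attempt)


# Stand-in for a service call that fails twice before succeeding.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky_search, attempts=3, base_delay=0)
print(result, calls["count"])
```

A real call would be wrapped as `call_with_retries(lambda: u.search("ZAP70_HUMAN"))`. Keep the attempt count low to avoid hammering public services.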
Organism Codes
Use standard organism abbreviations:
- hsa: Homo sapiens (human)
- mmu: Mus musculus (mouse)
- dme: Drosophila melanogaster
- sce: Saccharomyces cerevisiae (yeast)
List all organisms: k.list("organism") or k.organismIds
Integration with Other Tools
BioServices works well with:
- BioPython: Sequence analysis on retrieved FASTA data
- Pandas: Tabular data manipulation
- PyMOL: 3D structure visualization (retrieve PDB IDs)
- NetworkX: Network analysis of pathway interactions
- Galaxy: Custom tool wrappers for workflow platforms
Resources
scripts/
Executable Python scripts demonstrating complete workflows:
- protein_analysis_workflow.py: End-to-end protein characterization
- pathway_analysis.py: KEGG pathway discovery and network extraction
- compound_cross_reference.py: Multi-database compound searching
- batch_id_converter.py: Bulk identifier mapping utility
Scripts can be executed directly or adapted for specific use cases.
references/
Detailed documentation loaded as needed:
- services_reference.md: Comprehensive list of all 40+ services with methods
- workflow_patterns.md: Detailed multi-step analysis workflows
- identifier_mapping.md: Complete guide to cross-database ID conversion
Load references when working with specific services or complex integration tasks.
Installation
uv pip install bioservices
Dependencies are installed automatically. The package is tested on Python 3.9-3.12.
Additional Information
For detailed API documentation and advanced features, refer to:
- Official documentation: https://bioservices.readthedocs.io/
- Source code: https://github.com/cokelaer/bioservices
- Service-specific references in references/services_reference.md