pdb-database

Query and retrieve protein/nucleic acid structures from RCSB PDB. Use when you need to search the PDB database for structures or metadata. Supports text, sequence, and structure-based searches, coordinate downloads, and metadata retrieval for structural biology workflows.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pdb-database" with this command: npx skills add aminoanalytica/amina-skills/aminoanalytica-amina-skills-pdb-database

RCSB Protein Data Bank

The Protein Data Bank hosts over 200,000 experimentally determined macromolecular structures plus computed models from AlphaFold and ModelArchive. This skill provides programmatic access to search, download, and analyze structural data.

Applicable Scenarios

TaskExamples
Structure RetrievalDownload coordinates for a known PDB ID
Similarity SearchFind structures similar by sequence or 3D fold
Metadata AccessGet resolution, method, organism, ligands
Dataset BuildingCompile structures for ML training or analysis
Drug DiscoveryIdentify ligand-bound structures for a target
Quality FilteringSelect high-resolution, well-refined structures

Setup

pip install rcsb-api requests

The rcsb-api package (v1.5.0+) provides:

  • rcsbapi.search - Query construction and execution
  • rcsbapi.data - DataQuery for batch retrieval

Quick Reference

Search Queries

from rcsbapi.search import TextQuery, AttributeQuery, SeqSimilarityQuery, StructSimilarityQuery

# Text search
results = list(TextQuery("kinase inhibitor")())

# Filter by organism (use string attribute paths)
human = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens"
)

# Filter by resolution
high_res = AttributeQuery(
    attribute="rcsb_entry_info.resolution_combined",
    operator="less",
    value=2.0
)

# Filter by experimental method
xray = AttributeQuery(
    attribute="exptl.method",
    operator="exact_match",
    value="X-RAY DIFFRACTION"
)

# Combine queries: & (AND), | (OR), ~ (NOT)
results = list((TextQuery("kinase") & human & high_res)())

# Sequence similarity (MMseqs2) - minimum 25 residues required
seq_query = SeqSimilarityQuery(
    value="VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH",
    evalue_cutoff=1e-5,
    identity_cutoff=0.7
)

# Structure similarity (3D fold)
struct_query = StructSimilarityQuery(
    structure_search_type="entry_id",
    entry_id="4HHB"
)

Data Retrieval (REST API)

The rcsb-api package's data module has limited functionality. Use the REST API directly for metadata:

import requests

def fetch_entry(pdb_id: str) -> dict:
    """Fetch entry metadata from RCSB REST API."""
    resp = requests.get(f"https://data.rcsb.org/rest/v1/core/entry/{pdb_id}")
    resp.raise_for_status()
    return resp.json()

# Example usage
data = fetch_entry("4HHB")
print(data["struct"]["title"])
print(data["rcsb_entry_info"]["resolution_combined"])

# Polymer entity (chain info + sequence)
def fetch_polymer_entity(pdb_id: str, entity_id: int = 1) -> dict:
    resp = requests.get(f"https://data.rcsb.org/rest/v1/core/polymer_entity/{pdb_id}/{entity_id}")
    resp.raise_for_status()
    return resp.json()

entity = fetch_polymer_entity("4HHB", 1)
sequence = entity["entity_poly"]["pdbx_seq_one_letter_code"]

File Downloads

FormatURL Pattern
PDBhttps://files.rcsb.org/download/{ID}.pdb
mmCIFhttps://files.rcsb.org/download/{ID}.cif
Assemblyhttps://files.rcsb.org/download/{ID}.pdb1
FASTAhttps://www.rcsb.org/fasta/entry/{ID}
import requests
from pathlib import Path

def download_structure(pdb_id: str, fmt: str = "cif", outdir: str = ".") -> Path:
    url = f"https://files.rcsb.org/download/{pdb_id}.{fmt}"
    resp = requests.get(url)
    resp.raise_for_status()
    outpath = Path(outdir) / f"{pdb_id}.{fmt}"
    outpath.write_text(resp.text)
    return outpath

Common Workflows

Find High-Quality Human Structures

from rcsbapi.search import TextQuery, AttributeQuery

query = (
    TextQuery("receptor") &
    AttributeQuery(
        attribute="rcsb_entity_source_organism.scientific_name",
        operator="exact_match",
        value="Homo sapiens"
    ) &
    AttributeQuery(
        attribute="rcsb_entry_info.resolution_combined",
        operator="less",
        value=2.5
    ) &
    AttributeQuery(
        attribute="exptl.method",
        operator="exact_match",
        value="X-RAY DIFFRACTION"
    )
)
results = list(query())

Batch Metadata Retrieval

import requests
import time

def fetch_batch(pdb_ids: list, delay: float = 0.3) -> dict:
    """Fetch metadata with rate limiting."""
    results = {}
    for pdb_id in pdb_ids:
        try:
            resp = requests.get(f"https://data.rcsb.org/rest/v1/core/entry/{pdb_id}")
            resp.raise_for_status()
            data = resp.json()
            results[pdb_id] = {
                "title": data["struct"]["title"],
                "resolution": data.get("rcsb_entry_info", {}).get("resolution_combined"),
                "method": data.get("exptl", [{}])[0].get("method"),
            }
        except Exception as e:
            results[pdb_id] = {"error": str(e)}
        time.sleep(delay)
    return results

Find Drug-Bound Structures

from rcsbapi.search import AttributeQuery

# Find structures containing imatinib (ligand ID: STI)
query = AttributeQuery(
    attribute="rcsb_nonpolymer_entity_instance_container_identifiers.comp_id",
    operator="exact_match",
    value="STI"
)
drug_complexes = list(query())

GraphQL for Complex Queries

import requests

query = """
{
  entry(entry_id: "4HHB") {
    struct { title }
    rcsb_entry_info {
      resolution_combined
      deposited_atom_count
    }
    polymer_entities {
      rcsb_polymer_entity { pdbx_description }
      entity_poly { pdbx_seq_one_letter_code }
    }
  }
}
"""

response = requests.post(
    "https://data.rcsb.org/graphql",
    json={"query": query}
)
result = response.json()["data"]["entry"]

Key Concepts

TermDefinition
PDB ID4-character alphanumeric code (e.g., "4HHB"). AlphaFold uses "AF_" prefix
EntityDistinct molecular species. A homodimer has one entity appearing twice
ResolutionQuality metric in angstroms. Lower is better; <2.0 Å is high quality
Biological AssemblyFunctional oligomeric state (may differ from asymmetric unit)
mmCIFModern format replacing legacy PDB; required for large structures

Common Attribute Paths

Use these string paths with AttributeQuery:

AttributeDescription
rcsb_entity_source_organism.scientific_nameSource organism (e.g., "Homo sapiens")
rcsb_entry_info.resolution_combinedResolution in angstroms
exptl.methodExperimental method (X-RAY DIFFRACTION, ELECTRON MICROSCOPY, SOLUTION NMR)
rcsb_nonpolymer_entity_instance_container_identifiers.comp_idLigand/small molecule ID
struct.titleStructure title
rcsb_accession_info.deposit_dateDeposition date

Best Practices

PracticeRationale
Use mmCIF formatPDB format has atom count limits
Filter by resolution<2.5 Å for most analyses; <2.0 Å for detailed work
Check experimental methodX-ray vs cryo-EM vs NMR have different quality metrics
Rate limit requests2-3 req/s to avoid 429 errors
Cache downloadsStructures rarely change after release
Prefer GraphQLReduces requests for complex data needs

Troubleshooting

IssueResolution
404 on entry fetchEntry may be obsoleted; check RCSB website for superseding ID
429 Too Many RequestsImplement exponential backoff; reduce request rate
Empty search resultsCheck query syntax; use query.to_dict() to debug
Large structure failsUse mmCIF format instead of PDB
Missing sequence dataQuery polymer entity endpoint, not entry

References

See references/api-reference.md for:

  • Complete REST endpoint documentation
  • All searchable attributes and operators
  • Advanced query patterns
  • Rate limiting strategies

External Links

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

rdkit

No summary provided by upstream source.

Repository SourceNeeds Review
General

uniprot-database

No summary provided by upstream source.

Repository SourceNeeds Review
General

amina-init

No summary provided by upstream source.

Repository SourceNeeds Review
General

chembl-database

No summary provided by upstream source.

Repository SourceNeeds Review