chembl-database

Query the ChEMBL database for bioactive compounds, drug targets, and bioactivity data. Use this skill when searching for small molecules, finding inhibitors for protein targets, or analyzing drug mechanisms of action.

Install skill "chembl-database" with this command: npx skills add aminoanalytica/amina-skills/aminoanalytica-amina-skills-chembl-database

ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

  • Find potent inhibitors for a protein target
  • Search for compounds similar to a known drug
  • Retrieve drug mechanism of action data
  • Filter compounds by molecular properties (Lipinski, etc.)
  • Export bioactivity data for ML or analysis

Installation

uv pip install chembl_webresource_client

Basic Usage

from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)

Available Endpoints

| Resource | Description |
| --- | --- |
| `molecule` | Compound structures and properties |
| `target` | Biological targets |
| `activity` | Bioassay measurements |
| `assay` | Experimental protocols |
| `drug` | Approved drug data |
| `mechanism` | Drug mechanisms of action |
| `drug_indication` | Therapeutic indications |
| `similarity` | Structure similarity search |
| `substructure` | Substructure search |
| `document` | Literature references |
| `cell_line` | Cell line data |
| `protein_class` | Protein classifications |
| `image` | SVG molecular images |

Query Operators

The client uses Django-style filtering:

| Operator | Function | Example |
| --- | --- | --- |
| `__exact` | Exact match | `pref_name__exact='Aspirin'` |
| `__icontains` | Case-insensitive substring | `pref_name__icontains='kinase'` |
| `__lte`, `__gte` | Less than or equal, greater than or equal | `standard_value__lte=10` |
| `__lt`, `__gt` | Less than, greater than | `pchembl_value__gt=7` |
| `__range` | Value within range | `alogp__range=[-1, 5]` |
| `__in` | Value in list | `target_chembl_id__in=['CHEMBL203']` |
| `__isnull` | Null check | `pchembl_value__isnull=False` |
| `__startswith` | Prefix match | `pref_name__startswith='Proto'` |
| `__regex` | Regular expression | `pref_name__regex='^[A-Z]{3}'` |
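
To make the lookup suffixes concrete, the following sketch re-implements a few of them locally on plain dictionaries. It only illustrates the semantics: real filters are evaluated by the ChEMBL web service, and the names `LOOKUPS`, `matches`, and the sample record are hypothetical. Nested lookups such as molecule_properties__alogp__lte are not handled here.

```python
import re

# Local stand-ins for some lookup suffixes (illustration only; the real
# filtering is performed by the ChEMBL web service, not by this code).
LOOKUPS = {
    "exact": lambda value, arg: value == arg,
    "icontains": lambda value, arg: arg.lower() in str(value).lower(),
    "startswith": lambda value, arg: str(value).startswith(arg),
    "regex": lambda value, arg: re.search(arg, str(value)) is not None,
    "lt": lambda value, arg: float(value) < arg,
    "lte": lambda value, arg: float(value) <= arg,
    "gt": lambda value, arg: float(value) > arg,
    "gte": lambda value, arg: float(value) >= arg,
    "range": lambda value, arg: arg[0] <= float(value) <= arg[1],
    "in": lambda value, arg: value in arg,
}


def matches(record, **lookups):
    """Return True if a flat record satisfies every field__operator lookup."""
    for key, arg in lookups.items():
        field, sep, op = key.rpartition("__")
        if not sep or (op not in LOOKUPS and op != "isnull"):
            field, op = key, "exact"  # a bare field name means exact match
        value = record.get(field)
        if op == "isnull":
            if (value is None) != arg:
                return False
        elif value is None or not LOOKUPS[op](value, arg):
            return False
    return True


# Hypothetical record (not real ChEMBL output)
row = {"pref_name": "Cyclin-dependent kinase 2", "standard_value": "12.5",
       "pchembl_value": None}
print(matches(row, pref_name__icontains="KINASE", standard_value__lte=50))
```

Several keyword arguments in one call are combined with AND, which is also how the workflow examples in this document use `filter`.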

Common Workflows

Find Target Inhibitors

from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")

Search by Target Name

from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)

target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

Structure Similarity Search

from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")

Substructure Search

from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)

Filter by Molecular Properties

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (rule of three: MW <= 300, ALogP <= 3, HBD <= 3, HBA <= 3)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
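
The thresholds in the query above (MW, ALogP, HBD, and HBA all capped at 300 or 3) match the fragment "rule of three" rather than Lipinski's rule of five. As a minimal sketch, assuming records shaped like the molecule fields listed in this document, the same checks can be repeated locally on already-downloaded records. The helper names and sample records are hypothetical, and property values are converted defensively because they may arrive as strings.

```python
def _num(props, key):
    """Return props[key] as a float, or None when missing or non-numeric."""
    try:
        return float(props.get(key))
    except (TypeError, ValueError):
        return None


def passes_rule_of_three(molecule):
    """Fragment-likeness: MW <= 300, ALogP <= 3, HBD <= 3, HBA <= 3."""
    props = molecule.get("molecule_properties") or {}
    limits = {"mw_freebase": 300, "alogp": 3, "hbd": 3, "hba": 3}
    for key, limit in limits.items():
        value = _num(props, key)
        if value is None or value > limit:
            return False
    return True


def passes_lipinski(molecule, max_violations=0):
    """Drug-likeness based on the precomputed num_ro5_violations field."""
    props = molecule.get("molecule_properties") or {}
    violations = _num(props, "num_ro5_violations")
    return violations is not None and violations <= max_violations


# Hypothetical records with made-up property values
fragment = {"molecule_properties": {"mw_freebase": "151.16", "alogp": "0.91",
                                    "hbd": 2, "hba": 2, "num_ro5_violations": 0}}
larger = {"molecule_properties": {"mw_freebase": "474.58", "alogp": "2.30",
                                  "hbd": 1, "hba": 8, "num_ro5_violations": 0}}
print(passes_rule_of_three(fragment), passes_rule_of_three(larger))
```

Records with missing properties are rejected rather than guessed at, which keeps the filter conservative.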

Drug Mechanisms of Action

from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get therapeutic indications recorded for the drug
indications = drug_ind.filter(molecule_chembl_id=metformin_id)

Generate Molecule Images

from chembl_webresource_client.new_client import new_client

img = new_client.image

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

with open('caffeine.svg', 'w') as f:
    f.write(caffeine_svg)

Key Response Fields

Molecule Properties

| Field | Description |
| --- | --- |
| `molecule_chembl_id` | ChEMBL identifier |
| `pref_name` | Preferred name |
| `molecule_structures.canonical_smiles` | SMILES string |
| `molecule_structures.standard_inchi_key` | InChI key |
| `molecule_properties.mw_freebase` | Molecular weight |
| `molecule_properties.alogp` | Calculated LogP |
| `molecule_properties.hba` / `hbd` | H-bond acceptors/donors |
| `molecule_properties.psa` | Polar surface area |
| `molecule_properties.rtb` | Rotatable bonds |
| `molecule_properties.num_ro5_violations` | Lipinski violations |
| `molecule_properties.qed_weighted` | QED drug-likeness |
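
The dotted names above refer to nested dictionaries in each molecule record. The sketch below flattens a record into a single-level row, assuming that molecule_structures or molecule_properties can be missing or null for some entries. The function name `flatten_molecule` and the sample record are hypothetical.

```python
def flatten_molecule(record):
    """Pull commonly used nested fields into one flat dictionary."""
    structures = record.get("molecule_structures") or {}
    props = record.get("molecule_properties") or {}
    return {
        "molecule_chembl_id": record.get("molecule_chembl_id"),
        "pref_name": record.get("pref_name"),
        "canonical_smiles": structures.get("canonical_smiles"),
        "standard_inchi_key": structures.get("standard_inchi_key"),
        "mw_freebase": props.get("mw_freebase"),
        "alogp": props.get("alogp"),
        "num_ro5_violations": props.get("num_ro5_violations"),
    }


# Hypothetical record with no structure information
sample = {
    "molecule_chembl_id": "CHEMBL_EXAMPLE",  # placeholder, not a real ID
    "pref_name": "EXAMPLE",
    "molecule_structures": None,
    "molecule_properties": {"mw_freebase": "180.16", "alogp": "1.31"},
}
flat = flatten_molecule(sample)
print(flat["canonical_smiles"], flat["mw_freebase"])
```

Flat rows like this are easier to load into a table than the nested records.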

Activity Fields

| Field | Description |
| --- | --- |
| `molecule_chembl_id` | Compound ID |
| `target_chembl_id` | Target ID |
| `standard_type` | Measurement type (IC50, Ki, EC50) |
| `standard_value` | Numeric value |
| `standard_units` | Units (nM, uM) |
| `pchembl_value` | Normalized -log10 value |
| `data_validity_comment` | Quality flag |
| `potential_duplicate` | Duplicate indicator |
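
The pchembl_value field is a negative log10 of the molar potency. As a rough sketch of that arithmetic, the helper below converts a standard_value and its standard_units to the same scale. The function name `to_p_value` and the unit table are illustrative assumptions; ChEMBL applies its own criteria when deciding which records receive a pchembl_value, so prefer the stored field when it is present.

```python
import math

# Conversion factors from common concentration units to molar
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_value(standard_value, standard_units):
    """Return -log10(molar concentration), or None if it cannot be computed."""
    factor = UNIT_TO_MOLAR.get(standard_units)
    if factor is None:
        return None  # unknown or non-molar units such as ug/mL
    try:
        value = float(standard_value)
    except (TypeError, ValueError):
        return None
    if value <= 0:
        return None  # log of zero or a negative number is undefined
    return -math.log10(value * factor)


print(round(to_p_value("100", "nM"), 2))  # 100 nM is 1e-7 M
print(round(to_p_value(1, "uM"), 2))      # 1 uM is 1e-6 M
```

On this scale, a one-unit increase corresponds to a tenfold increase in potency, so 100 nM maps to 7 and 1 uM maps to 6.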

Target Fields

| Field | Description |
| --- | --- |
| `target_chembl_id` | ChEMBL target ID |
| `pref_name` | Preferred name |
| `target_type` | SINGLE PROTEIN, PROTEIN COMPLEX, etc. |
| `organism` | Species |

Mechanism Fields

| Field | Description |
| --- | --- |
| `molecule_chembl_id` | Drug ID |
| `target_chembl_id` | Target ID |
| `mechanism_of_action` | Description |
| `action_type` | INHIBITOR, AGONIST, ANTAGONIST, etc. |

Export to DataFrame

import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

results = activity.filter(
    target_chembl_id='CHEMBL217',  # Dopamine D2 receptor
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
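
Exported activity tables often contain several measurements per compound. One common way to prepare such data for modelling is to collapse them to a single value per molecule. The sketch below takes the median pchembl_value per molecule_chembl_id using only the standard library; the function name, the choice of the median, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def median_pchembl_by_molecule(records):
    """Group activity records by molecule and return the median pchembl_value."""
    grouped = defaultdict(list)
    for rec in records:
        mol_id = rec.get("molecule_chembl_id")
        try:
            value = float(rec.get("pchembl_value"))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values
        if mol_id is not None:
            grouped[mol_id].append(value)
    return {mol_id: median(values) for mol_id, values in grouped.items()}


# Hypothetical rows; the IDs are placeholders, not real ChEMBL identifiers
rows = [
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "7.0"},
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "8.0"},
    {"molecule_chembl_id": "MOL_B", "pchembl_value": "6.2"},
    {"molecule_chembl_id": "MOL_B", "pchembl_value": None},
]
print(median_pchembl_by_molecule(rows))
```

The resulting dictionary can be turned into a table with one row per compound before writing it to disk.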

Configuration

from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60             # Request timeout in seconds
cfg.TOTAL_RETRIES = 5        # Retry attempts

Data Quality Notes

  • ChEMBL data is manually curated, but still check the data_validity_comment field for flagged records
  • Check potential_duplicate flags when aggregating results
  • Use pchembl_value for normalized comparisons across assay types
  • Activity values without standard_units should be used cautiously
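
The notes above can be applied as a simple local filter on downloaded activity records. This is a minimal sketch: the function name `is_clean` is hypothetical, and it assumes that potential_duplicate is encoded as 1 or true when set and that a non-empty data_validity_comment marks a flagged record.

```python
def is_clean(record):
    """Keep records that have units and a value and carry no quality flags."""
    if record.get("data_validity_comment"):
        return False  # curators flagged this measurement
    if record.get("potential_duplicate") in (1, "1", True):
        return False  # likely repeated from another publication
    if not record.get("standard_units"):
        return False  # value without units is hard to interpret
    return record.get("standard_value") is not None


# Hypothetical records with placeholder IDs
activities = [
    {"molecule_chembl_id": "MOL_A", "standard_value": "12", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "MOL_B", "standard_value": "5", "standard_units": "nM",
     "data_validity_comment": "Outside typical range", "potential_duplicate": 0},
    {"molecule_chembl_id": "MOL_C", "standard_value": "30", "standard_units": None,
     "data_validity_comment": None, "potential_duplicate": 0},
    {"molecule_chembl_id": "MOL_D", "standard_value": "8", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 1},
]
clean = [a for a in activities if is_clean(a)]
print([a["molecule_chembl_id"] for a in clean])
```

Whether to drop or merely mark flagged records depends on the analysis; dropping is the stricter option.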

Best Practices

  1. Use caching - Reduces API load and improves performance
  2. Filter early - Apply filters to reduce data transfer
  3. Limit results - Use [:n] slicing for testing
  4. Check validity - Inspect data_validity_comment fields
  5. Use pchembl_value - Normalized values enable cross-assay comparison
  6. Batch queries - Use __in operator for multiple IDs
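
For the batching advice, a long ID list can be split into smaller groups so that each `__in` query stays a manageable size. The helper below is a sketch; the name `chunked` and the batch size are arbitrary choices, and the commented-out lines show how it might be combined with the activity resource used elsewhere in this document.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` items from `ids`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder IDs for illustration
target_ids = ["T1", "T2", "T3", "T4", "T5"]
batches = list(chunked(target_ids, 2))
print(batches)

# Possible use with the client (not executed here):
# for batch in chunked(target_ids, 50):
#     hits = activity.filter(target_chembl_id__in=batch, standard_type='IC50')
```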

Error Handling

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
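
For transient failures such as the 503 case above, one option is to retry with an increasing delay. The wrapper below is a sketch under the same assumption as the example above, namely that the status code appears in the exception message. The name `with_retries` and its parameters are hypothetical, and the client's own TOTAL_RETRIES setting may already cover many cases.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a 503-like error wait and retry with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if "503" not in str(exc) or last_attempt:
                raise  # not transient, or out of retries
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake fetch function that fails twice, then succeeds
calls = []
delays = []


def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("HTTP 503 Service Unavailable")
    return "ok"


outcome = with_retries(flaky_fetch, attempts=3, base_delay=1.0, sleep=delays.append)
print(outcome, delays)

# Possible use with the client (not executed here):
# result = with_retries(lambda: mol.get('CHEMBL192'))
```

Passing the sleep function as a parameter keeps the demonstration instant and makes the wrapper easy to test.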
