rdkit

Python cheminformatics library for molecular manipulation and analysis. Parse SMILES/SDF/MOL formats, compute descriptors (MW, LogP, TPSA), generate fingerprints (Morgan, MACCS), perform substructure queries with SMARTS, create 2D/3D geometries, calculate similarity, and run chemical reactions.

Install skill "rdkit" with this command: npx skills add aminoanalytica/amina-skills/aminoanalytica-amina-skills-rdkit

RDKit: Python Cheminformatics Library

Summary

RDKit (v2023+) provides comprehensive Python APIs for molecular structure manipulation, property calculation, and chemical informatics. It requires Python 3 and NumPy, offering modular components for molecule parsing, descriptors, fingerprints, substructure search, conformer generation, and reaction processing.

Applicable Scenarios

This skill applies when you need to:

| Task Category | Examples |
| --- | --- |
| Molecule I/O | Parse SMILES, MOL, SDF, InChI; write structures |
| Property Calculation | Molecular weight, LogP, TPSA, H-bond donors/acceptors |
| Fingerprinting | Morgan (ECFP), MACCS keys, atom pairs, topological |
| Similarity Analysis | Tanimoto, Dice, clustering compounds |
| Substructure Search | SMARTS patterns, functional group detection |
| 3D Conformers | Generate, optimize, align molecular geometries |
| Chemical Reactions | Define and execute transformations |
| Drug-Likeness | Lipinski rules, QED, lead-likeness filters |
| Visualization | 2D depictions, highlighting, grid images |

Module Organization

| Module | Purpose | Reference |
| --- | --- | --- |
| rdkit.Chem | Core molecule parsing, serialization, substructure | references/api-reference.md |
| rdkit.Chem.Descriptors | Property calculations | references/descriptors-reference.md |
| rdkit.Chem.rdFingerprintGenerator | Modern fingerprint API | references/api-reference.md |
| rdkit.DataStructs | Similarity metrics, bulk operations | references/api-reference.md |
| rdkit.Chem.AllChem | 3D coordinates, reactions, optimization | references/api-reference.md |
| rdkit.Chem.Draw | Visualization and depiction | references/api-reference.md |
| SMARTS patterns | Substructure query language | references/smarts-patterns.md |

Setup

Install via pip or conda:

# Conda (recommended)
conda install -c conda-forge rdkit

# Pip (the PyPI package is named rdkit; rdkit-pypi is the legacy name)
pip install rdkit

Standard imports:

from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors, Draw
from rdkit import DataStructs

Quick Reference

Parse and Validate Molecules

from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccc(O)cc1')
if mol is None:
    print("Invalid SMILES")

Compute Properties

from rdkit.Chem import Descriptors

mw = Descriptors.MolWt(mol)
logp = Descriptors.MolLogP(mol)
tpsa = Descriptors.TPSA(mol)

Generate Fingerprints

from rdkit.Chem import rdFingerprintGenerator

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp = gen.GetFingerprint(mol)

Similarity Search

from rdkit import DataStructs

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
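
The one-liner above assumes that fp1 and fp2 already exist. A minimal end-to-end sketch follows; the two molecules (aspirin and salicylic acid) are illustrative choices, not taken from the original skill:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Illustrative molecules (an assumption, not from the original text)
aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
salicylic_acid = Chem.MolFromSmiles('O=C(O)c1ccccc1O')

# One generator for both molecules keeps radius and fpSize identical
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp1 = gen.GetFingerprint(aspirin)
fp2 = gen.GetFingerprint(salicylic_acid)

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
self_similarity = DataStructs.TanimotoSimilarity(fp1, fp1)
print(f"aspirin vs salicylic acid: {similarity:.2f}")
```

Reusing a single generator is a simple way to avoid comparing fingerprints built with different settings.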

Substructure Match

pattern = Chem.MolFromSmarts('[OH1][C]')  # Hydroxyl on an aliphatic carbon
has_alcohol = mol.HasSubstructMatch(pattern)  # False for the phenol parsed above: its carbon is aromatic (c)

Generate 3D Conformer

from rdkit.Chem import AllChem

mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
AllChem.MMFFOptimizeMolecule(mol)
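
The snippet above ignores the return codes of both calls. A hedged sketch of the same steps with failure checks added; the helper name embed_and_optimize is hypothetical:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def embed_and_optimize(smiles: str, seed: int = 42):
    """Return a 3D molecule with explicit hydrogens, or None on failure.

    Hypothetical helper for illustration; not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)
    # EmbedMolecule returns the new conformer ID, or -1 if embedding failed
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        return None
    # MMFFOptimizeMolecule returns 0 when converged, 1 when more iterations
    # are needed, and -1 when the force field could not be set up
    if AllChem.MMFFOptimizeMolecule(mol) == -1:
        return None
    return mol

mol3d = embed_and_optimize('CCO')  # ethanol
```

A return value of 1 from the optimizer is treated as usable here; a stricter caller could rerun the optimization with a larger maxIters.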

Implementation Patterns

Drug-Likeness Assessment

from rdkit import Chem
from rdkit.Chem import Descriptors

def assess_druglikeness(smiles: str) -> dict | None:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    hbd = Descriptors.NumHDonors(mol)
    hba = Descriptors.NumHAcceptors(mol)

    return {
        'MW': mw,
        'LogP': logp,
        'HBD': hbd,
        'HBA': hba,
        'TPSA': Descriptors.TPSA(mol),
        'RotBonds': Descriptors.NumRotatableBonds(mol),
        'Lipinski': mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10,
        'QED': Descriptors.qed(mol)
    }
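
The 'Lipinski' flag above passes only when all four criteria hold. The rule of five is often applied with a tolerance of one violation, so counting violations lets the caller choose the cutoff. A minimal sketch, where the function name lipinski_violations is hypothetical:

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem import Descriptors

def lipinski_violations(smiles: str) -> Optional[int]:
    """Count rule-of-five violations; None if the SMILES cannot be parsed.

    Hypothetical helper for illustration; thresholds match the check above.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    checks = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Descriptors.NumHDonors(mol) > 5,
        Descriptors.NumHAcceptors(mol) > 10,
    ]
    return sum(checks)  # True counts as 1

aspirin_violations = lipinski_violations('CC(=O)Oc1ccccc1C(=O)O')
```

A filter such as `lipinski_violations(smi) <= 1` then expresses the tolerant variant, while `== 0` reproduces the strict check.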

Batch Similarity Search

from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

def find_similar(query_smiles: str, database: list[str], threshold: float = 0.7) -> list:
    query = Chem.MolFromSmiles(query_smiles)
    if query is None:
        return []

    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    query_fp = gen.GetFingerprint(query)

    hits = []
    for idx, smi in enumerate(database):
        mol = Chem.MolFromSmiles(smi)
        if mol:
            fp = gen.GetFingerprint(mol)
            sim = DataStructs.TanimotoSimilarity(query_fp, fp)
            if sim >= threshold:
                hits.append((idx, smi, sim))

    return sorted(hits, key=lambda x: x[2], reverse=True)

Functional Group Screening

from rdkit import Chem

FUNCTIONAL_GROUPS = {
    'alcohol': '[OH1][C]',
    'amine': '[NH2,NH1][C]',
    'carboxylic_acid': 'C(=O)[OH1]',
    'amide': 'C(=O)N',
    'ester': 'C(=O)O[C]',
    'nitro': '[N+](=O)[O-]'
}

def detect_functional_groups(smiles: str) -> list[str]:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []

    found = []
    for name, smarts in FUNCTIONAL_GROUPS.items():
        pattern = Chem.MolFromSmarts(smarts)
        if mol.HasSubstructMatch(pattern):
            found.append(name)
    return found
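
When the atoms involved matter, GetSubstructMatches returns the matching atom indices instead of a yes/no answer. A sketch under the assumption that only a subset of the patterns above is needed; COMPILED_GROUPS and locate_functional_groups are hypothetical names:

```python
from rdkit import Chem

# Compile each SMARTS once instead of on every call
# (pattern strings copied from FUNCTIONAL_GROUPS; subset chosen for brevity)
COMPILED_GROUPS = {
    'alcohol': Chem.MolFromSmarts('[OH1][C]'),
    'carboxylic_acid': Chem.MolFromSmarts('C(=O)[OH1]'),
    'nitro': Chem.MolFromSmarts('[N+](=O)[O-]'),
}

def locate_functional_groups(smiles: str) -> dict:
    """Map each matched group name to a tuple of atom-index tuples."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return {}
    located = {}
    for name, pattern in COMPILED_GROUPS.items():
        matches = mol.GetSubstructMatches(pattern)
        if matches:
            located[name] = matches
    return located

groups = locate_functional_groups('CCO')  # ethanol
```

Note that '[OH1][C]' also matches the hydroxyl of a carboxylic acid, so acetic acid is reported under both 'alcohol' and 'carboxylic_acid'. A stricter alcohol pattern would be needed to separate the two.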

Conformer Generation with Clustering

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

def generate_diverse_conformers(smiles: str, n_confs: int = 50, rmsd_thresh: float = 0.5) -> list:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []

    mol = Chem.AddHs(mol)
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=42)

    # Optimize all conformers
    for cid in conf_ids:
        AllChem.MMFFOptimizeMolecule(mol, confId=cid)

    # Cluster by RMSD to get diverse set
    if len(conf_ids) < 2:
        return list(conf_ids)

    dists = []
    for i in range(len(conf_ids)):
        for j in range(i):
            rmsd = AllChem.GetConformerRMS(mol, conf_ids[j], conf_ids[i])
            dists.append(rmsd)

    clusters = Butina.ClusterData(dists, len(conf_ids), rmsd_thresh, isDistData=True)
    return [conf_ids[c[0]] for c in clusters]  # Cluster centroids

Batch Processing SDF Files

from rdkit import Chem
from rdkit.Chem import Descriptors

def process_sdf(input_path: str, output_path: str, min_mw: float = 200, max_mw: float = 500):
    """Filter compounds by molecular weight and add property columns."""
    supplier = Chem.SDMolSupplier(input_path)
    writer = Chem.SDWriter(output_path)

    for mol in supplier:
        if mol is None:
            continue

        mw = Descriptors.MolWt(mol)
        if not (min_mw <= mw <= max_mw):
            continue

        # Add computed properties
        mol.SetProp('MW', f'{mw:.2f}')
        mol.SetProp('LogP', f'{Descriptors.MolLogP(mol):.2f}')
        mol.SetProp('TPSA', f'{Descriptors.TPSA(mol):.2f}')

        writer.write(mol)

    writer.close()

Guidelines

Always validate parsed molecules:

for smiles in smiles_list:  # smiles_list: your input SMILES strings
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print(f"Parse failed: {smiles}")
        continue
    # ... work with mol ...

Use bulk operations for performance:

fps = [gen.GetFingerprint(m) for m in mols]
sims = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])
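
The two lines above assume that gen and mols are already defined. A self-contained sketch that also ranks the results; the input SMILES are illustrative only:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

smiles_list = ['CCO', 'CCCO', 'CCN', 'c1ccccc1']  # illustrative inputs
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [gen.GetFingerprint(m) for m in mols]

# One call compares the first fingerprint against all the others
sims = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])

# Pair each score with its SMILES and rank from most to least similar
ranked = sorted(zip(smiles_list[1:], sims), key=lambda pair: pair[1], reverse=True)
for smi, score in ranked:
    print(f"{smi}: {score:.2f}")
```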

Add hydrogens for 3D work:

mol = Chem.AddHs(mol)  # Add explicit hydrogens first; embedding without them gives poorer geometries
AllChem.EmbedMolecule(mol)

Stream large files:

# Memory-efficient: process one at a time
with open('huge.sdf', 'rb') as file_handle:
    for mol in Chem.ForwardSDMolSupplier(file_handle):
        if mol is not None:
            process(mol)

# Avoid: loading entire file
all_mols = list(Chem.SDMolSupplier('huge.sdf'))
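
To try the streaming pattern without a large file at hand, the sketch below writes a three-record SDF to a temporary directory and reads it back one record at a time. The file name demo.sdf and the helper count_valid_records are hypothetical:

```python
import os
import tempfile

from rdkit import Chem

def count_valid_records(sdf_path: str) -> int:
    """Stream an SDF and count the records RDKit can parse."""
    count = 0
    # ForwardSDMolSupplier reads from a binary file object, one record at a time
    with open(sdf_path, 'rb') as handle:
        for mol in Chem.ForwardSDMolSupplier(handle):
            if mol is not None:
                count += 1
    return count

# Build a tiny SDF so the sketch is runnable end to end
tmp_dir = tempfile.mkdtemp()
sdf_path = os.path.join(tmp_dir, 'demo.sdf')
writer = Chem.SDWriter(sdf_path)
for smi in ['CCO', 'c1ccccc1', 'CC(=O)O']:
    writer.write(Chem.MolFromSmiles(smi))
writer.close()

n_valid = count_valid_records(sdf_path)
```

Only one molecule is held in memory at any point in the loop, which is the property that matters for very large files.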

Thread safety: Most operations are thread-safe except for concurrent access to MolSupplier objects.

Troubleshooting

| Issue | Resolution |
| --- | --- |
| MolFromSmiles returns None | Invalid SMILES syntax; check input |
| Sanitization error | Use Chem.DetectChemistryProblems(mol) to diagnose |
| Wrong 3D geometry | Call AddHs(mol) before embedding |
| Fingerprint size mismatch | Use same fpSize parameter for all comparisons |
| SMARTS not matching | Check aromatic vs aliphatic atoms (c vs C) |
| Slow SDF processing | Use ForwardSDMolSupplier or MultithreadedSDMolSupplier |
| Memory issues with large files | Stream with ForwardSDMolSupplier, don't load all |
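
For the sanitization row, one possible workflow is to parse with sanitize=False so that a molecule object is kept, then ask RDKit what is wrong with it. A minimal sketch; the helper name diagnose and the 'ParseError' label are hypothetical:

```python
from rdkit import Chem

def diagnose(smiles: str) -> list:
    """Return (problem type, message) pairs for a SMILES string.

    Hypothetical helper for illustration; an empty list means no problems found.
    """
    # sanitize=False keeps the molecule so it can be inspected
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        # 'ParseError' is a label made up for this sketch, not an RDKit type
        return [('ParseError', 'SMILES syntax could not be parsed')]
    return [(p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(mol)]

issues = diagnose('CN(C)(C)C')  # neutral nitrogen with four bonds
for kind, message in issues:
    print(kind, message)
```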

Reference Documentation

Each reference file contains detailed API documentation:

| File | Contents |
| --- | --- |
| references/api-reference.md | Complete function/class listings by module |
| references/descriptors-reference.md | All molecular descriptors with examples |
| references/smarts-patterns.md | Common SMARTS patterns for substructure search |
