Chemistry Query Agent v2.0.0
Overview
Full-stack chemistry toolkit combining PubChem data retrieval with RDKit molecule processing, visualization, analysis, retrosynthesis, and synthesis planning. All outputs are structured JSON for easy downstream chaining. Generates PNG/SVG images on demand.
Key capabilities:
- PubChem compound lookup (info, structure, synthesis refs, similarity search)
- RDKit molecular properties (MW, logP, TPSA, HBD/HBA, rotatable bonds, aromatic rings)
- 2D molecule visualization (PNG/SVG)
- BRICS retrosynthesis with recursive depth control
- Multi-step synthesis route planning
- Forward reaction simulation with SMARTS templates
- Morgan fingerprints and similarity/substructure search
- 21 named reaction templates (Suzuki, Heck, Grignard, Wittig, Diels-Alder, etc.)
Quick Start
# PubChem compound info
exec python scripts/query_pubchem.py --compound "aspirin" --type info
# Molecular properties from SMILES
exec python scripts/rdkit_mol.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action props
# Retrosynthesis
exec python scripts/rdkit_mol.py --target "CC(=O)Oc1ccccc1C(=O)O" --action retro --depth 2
# Full chain (name → props + draw + retro)
exec python scripts/chain_entry.py --input-json '{"name": "caffeine", "context": "user"}'
Scripts
scripts/query_pubchem.py
PubChem REST API queries with automatic name→CID resolution and timeout handling.
--compound <name|CID> --type <info|structure|synthesis|similar> [--format smiles|inchi|image|json] [--threshold 80]
- info: Formula, MW, IUPAC name, InChIKey (JSON)
- structure: SMILES, InChI, image URL, or full JSON
- synthesis: Synonyms/references for a compound
- similar: Similar compounds by 2D fingerprint (top 20)
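As a rough illustration of what the info query involves, the sketch below builds a PubChem PUG REST property URL from a name or CID. It is not the code in query_pubchem.py: the helper names (info_url, fetch_info) are hypothetical, and the 15-second default timeout is taken from the changelog entry for v1.4.0.

```python
import json
import urllib.request
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Properties matching the "info" fields listed above.
INFO_PROPS = "MolecularFormula,MolecularWeight,IUPACName,InChIKey"


def info_url(compound: str) -> str:
    """Build a PUG REST property URL for a compound name or numeric CID."""
    compound = compound.strip()
    # Assumption: all-digit input is a CID, anything else is a name.
    namespace = "cid" if compound.isdigit() else "name"
    encoded = quote(compound, safe="")
    return f"{PUG_BASE}/compound/{namespace}/{encoded}/property/{INFO_PROPS}/JSON"


def fetch_info(compound: str, timeout: float = 15.0) -> dict:
    """Fetch the first property record; network errors propagate to the caller."""
    with urllib.request.urlopen(info_url(compound), timeout=timeout) as resp:
        return json.load(resp)["PropertyTable"]["Properties"][0]
```

Calling fetch_info("aspirin") requires network access; info_url on its own can be used to inspect the request without sending it.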
scripts/rdkit_mol.py
RDKit cheminformatics engine. Resolves names via PubChem automatically.
--smiles <SMILES> --action <props|draw|fingerprint|similarity|substruct|xyz|react|retro|plan>
| Action | Description | Key Args |
|---|---|---|
| props | MW, logP, TPSA, HBD, HBA, rotB, aromRings | --smiles |
| draw | 2D PNG/SVG (300×300) | --smiles --output file.png --format png\|svg |
| retro | BRICS recursive retrosynthesis | --target <SMILES\|name> --depth N |
| plan | Multi-step retro route | --target <SMILES\|name> --steps N |
| react | Forward reaction via SMARTS | --reactants "smi1 smi2" --smarts "<SMARTS>" |
| fingerprint | Morgan fingerprint bitvector | --smiles --radius 2 |
| similarity | Tanimoto similarity scoring | --query_smiles --target_smiles "smi1,smi2" |
| substruct | Substructure matching | --query_smiles --target_smiles "smi1,smi2" |
| xyz | 3D coordinates (MMFF optimized) | --smiles |
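The similarity action reports Tanimoto scores. A minimal sketch of that coefficient, assuming each fingerprint is represented as a set of on-bit indices, is shown below. The function names are hypothetical and the script itself may rely on RDKit's DataStructs.TanimotoSimilarity instead of anything like this.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / union


def rank_by_similarity(query: set, targets: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in targets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, fingerprints {1, 2, 3} and {2, 3, 4} share two of four distinct bits, so tanimoto returns 0.5.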
scripts/chain_entry.py
Standard agent chain interface. Accepts {"smiles": "...", "context": "..."} or {"name": "...", "context": "..."}. Returns unified JSON with props, visualization, and retrosynthesis.
python scripts/chain_entry.py --input-json '{"name": "sotorasib", "context": "user"}'
Output schema:
{
"agent": "chemistry-query",
"version": "1.4.0",
"smiles": "<canonical>",
"status": "success|error",
"report": {"props": {...}, "draw": {...}, "retro": {...}},
"risks": [],
"viz": ["path/to/image.png"],
"recommend_next": ["pharmacology", "toxicology"],
"confidence": 0.95,
"warnings": [],
"timestamp": "ISO8601"
}
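A downstream consumer might check this output before using it. The sketch below is one possible approach: parse_chain_output and next_agents are hypothetical helpers, and the assumption that every successful response carries all of the keys in REQUIRED_KEYS comes only from the schema shown above.

```python
import json

# Keys taken from the output schema above; treated as mandatory on success.
REQUIRED_KEYS = {"agent", "version", "smiles", "status", "report", "warnings"}


def parse_chain_output(raw: str) -> dict:
    """Parse chain_entry.py stdout and fail loudly on errors or missing keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("status") != "success":
        raise RuntimeError(f"chain reported status {data.get('status')!r}")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def next_agents(data: dict) -> list:
    """Return the suggested follow-up agents, or an empty list if absent."""
    return list(data.get("recommend_next", []))
```

Checking the status before the key set means an error response is reported as an error, not as a schema violation.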
scripts/templates.json
21 named reaction templates with SMARTS, expected yields, conditions, and references. Includes: Suzuki, Heck, Buchwald-Hartwig, Grignard, Wittig, Diels-Alder, Click, Sonogashira, Negishi, and more.
Chaining
- Name → Full Profile: chain_entry.py with {"name": "ibuprofen"} → props + draw + retro
- Chemistry → Pharmacology: Output feeds directly into pharma-pharmacology-agent
- Retro + Viz: Get precursors, then draw each one
- Suzuki Test: --action react --reactants "c1ccccc1Br c1ccccc1B(O)O" --smarts "[c:1][Br:2].[c:3][B](O)O>>[c:1][c:3]"
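When chaining from another Python process, one option is to build the chain_entry.py command as an argument list so that the JSON payload never passes through a shell. The sketch below assumes the --input-json interface described above. The chain_entry_argv helper is hypothetical.

```python
import json
import sys


def chain_entry_argv(*, name=None, smiles=None, context="user",
                     script="scripts/chain_entry.py"):
    """Build an argv list for chain_entry.py from a name or a SMILES string."""
    if (name is None) == (smiles is None):
        raise ValueError("provide exactly one of name or smiles")
    payload = {"name": name} if name is not None else {"smiles": smiles}
    payload["context"] = context
    # json.dumps handles quoting; the list form avoids shell interpolation.
    return [sys.executable, script, "--input-json", json.dumps(payload)]
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True, timeout=...) and the captured stdout parsed as JSON.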
Tested With
All features verified end-to-end with RDKit 2024.03+:
| Molecule | SMILES | Tests Passed |
|---|---|---|
| Caffeine | CN1C=NC2=C1C(=O)N(C(=O)N2C)C | info, structure, props, draw, retro, plan, chain |
| Aspirin | CC(=O)Oc1ccccc1C(=O)O | info, structure, props, draw, retro, plan, chain |
| Sotorasib | PubChem name lookup | info, structure, props, draw, retro, chain |
| Ibuprofen | PubChem name lookup | info, structure, props, chain |
| Invalid SMILES | XXXINVALID | Graceful JSON error |
| Empty input | {} | Graceful JSON error |
Resources
- references/api_endpoints.md: PubChem API endpoint reference and rate limits
- scripts/rdkit_reaction.py: Legacy reaction module
- scripts/chembl_query.py, scripts/pubmed_search.py, scripts/admet_predict.py: Additional query modules
scripts/advanced_chem.py
Advanced cheminformatics engine with 6 Tier 1 capabilities.
--action <standardize|descriptors|scaffold|mcs|mmpa|chemspace> --smiles <SMILES> [options]
| Action | Description | Key Args |
|---|---|---|
| standardize | Salt stripping, charge normalization, tautomer enumeration | --smiles |
| descriptors | 217+ molecular descriptors (RDKit full set), QED, SA Score, Lipinski/Veber rules | --smiles --descriptor_set all\|druglike\|physical\|topological |
| scaffold | Murcko scaffold extraction, generic scaffolds, diversity analysis, R-group decomposition | --smiles or --target_smiles "smi1,smi2,..." --rgroup_core <SMARTS> |
| mcs | Maximum Common Substructure across 2+ molecules | --target_smiles "smi1,smi2,..." |
| mmpa | Matched Molecular Pair Analysis — find single-point transformations | --target_smiles "smi1,smi2,..." |
| chemspace | Chemical space visualization (PCA/t-SNE/UMAP scatter plot PNG) | --target_smiles "smi1,smi2,..." --method pca\|tsne\|umap --output plot.png |
Examples:
# Standardize a salt form
python scripts/advanced_chem.py --action standardize --smiles "[Na+].CC(=O)[O-]"
# Full descriptors (217+)
python scripts/advanced_chem.py --action descriptors --smiles "CC(=O)Oc1ccccc1C(=O)O" --descriptor_set all
# Scaffold diversity of a set
python scripts/advanced_chem.py --action scaffold --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CC(C)Cc1ccc(cc1)C(C)C(=O)O"
# MCS of aspirin and salicylic acid
python scripts/advanced_chem.py --action mcs --target_smiles "CC(=O)Oc1ccccc1C(=O)O,Oc1ccccc1C(=O)O"
# Matched molecular pairs
python scripts/advanced_chem.py --action mmpa --target_smiles "c1ccc(CC(=O)O)cc1,c1ccc(CCC(=O)O)cc1"
# Chemical space PCA plot
python scripts/advanced_chem.py --action chemspace --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,c1ccccc1" --method pca --output space.png
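The Lipinski and Veber checks reported by the descriptors action follow well-known thresholds, which the sketch below applies to a plain dictionary of descriptor values. The dictionary keys (mw, logp, hbd, hba, rotb, tpsa) and function names are hypothetical and need not match the script's JSON fields.

```python
def lipinski_violations(d: dict) -> list:
    """List which of Lipinski's rule-of-five limits a molecule exceeds."""
    rules = [
        ("MW > 500", d["mw"] > 500),
        ("logP > 5", d["logp"] > 5),
        ("HBD > 5", d["hbd"] > 5),
        ("HBA > 10", d["hba"] > 10),
    ]
    return [label for label, violated in rules if violated]


def passes_veber(d: dict) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140 A^2."""
    return d["rotb"] <= 10 and d["tpsa"] <= 140


# Approximate literature values for aspirin, for illustration only.
aspirin = {"mw": 180.16, "logp": 1.2, "hbd": 1, "hba": 4, "rotb": 3, "tpsa": 63.6}
```

With these values, aspirin has no Lipinski violations and satisfies both Veber criteria.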
Changelog
v2.0.0 (2026-02-28)
- NEW: advanced_chem.py with 6 Tier 1 cheminformatics capabilities
  - Molecular Standardization & Tautomer Enumeration (salt stripping, charge normalization, canonical tautomers)
  - Extended Descriptors (217+ RDKit descriptors, QED, SA Score, Lipinski, Veber)
  - Scaffold Analysis (Murcko, generic scaffolds, diversity ratio, R-group decomposition)
  - Maximum Common Substructure (rdFMCS with coverage per molecule)
  - Matched Molecular Pair Analysis (rdMMPA fragmentation, transformation detection)
  - Chemical Space Visualization (PCA/t-SNE/UMAP with matplotlib scatter plots)
- Dependencies: scikit-learn, matplotlib (added)
v1.4.1 (2026-02-25)
- Security hardening: input sanitization for all subprocess calls (SMILES, compound names, output paths)
- Added _sanitize_input(): length limits, null-byte rejection for all user inputs
- Added _sanitize_output_path(): prevents path traversal, restricts extensions, blocks arbitrary file writes
- Added shell metacharacter rejection in resolve_target()
- Added SMILES validation via RDKit in chem_ui.py before subprocess calls
- Added compound input validation in query_pubchem.py (length/null-byte checks)
- Added timeout to resolve_target() PubChem subprocess call
- Addresses VirusTotal "suspicious" classification for argument injection vectors
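The kind of checks described in this release could look roughly like the sketch below. It is not the project's _sanitize_input() or _sanitize_output_path(): the function names, the length limit, and the extension whitelist are all assumptions made for illustration.

```python
import os

MAX_INPUT_LEN = 1000                   # assumed limit, not the project's value
ALLOWED_EXTENSIONS = {".png", ".svg"}  # assumed whitelist


def check_user_input(value: str, max_len: int = MAX_INPUT_LEN) -> str:
    """Reject empty, over-long, or null-byte-containing input."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError("input must be a non-empty string")
    if "\x00" in value:
        raise ValueError("null byte in input")
    if len(value) > max_len:
        raise ValueError("input too long")
    return value.strip()


def check_output_path(path: str, base_dir: str) -> str:
    """Resolve path under base_dir; reject traversal and unexpected extensions."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, path))
    # An absolute or ../ path resolves outside base and fails this check.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("output path escapes base directory")
    if os.path.splitext(candidate)[1].lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("extension not allowed")
    return candidate
```

Resolving both paths with realpath before comparing them means symlinks and ".." segments are normalized first, so the containment check operates on the final location of the file.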
v1.4.0 (2026-02-14)
- Fixed PubChem SMILES/InChI endpoint (property/CanonicalSMILES/TXT)
- Fixed chain_entry.py HTML entity corruption
- Fixed brics_retro to handle BRICSDecompose string output correctly
- Added request timeouts (15s) to all PubChem calls
- Graceful error handling for invalid SMILES and empty input
- Updated chain output version and schema
- Comprehensive end-to-end testing
v1.3.0
- RDKit props NoneType fixes, invalid SMILES graceful errors
- React fix: ReactionFromSmarts import
- Name resolution via PubChem for all RDKit actions
v1.2.0
- BRICS retrosynthesis + 21 reaction templates library
- Multi-step synthesis planning