solublempnn

Solubility-optimized protein sequence design using SolubleMPNN. Use this skill when: (1) designing for E. coli expression, (2) optimizing the solubility of designed proteins, (3) reducing aggregation propensity, (4) aiming for high-yield expression, (5) avoiding inclusion body formation. For standard design, use proteinmpnn. For ligand-aware design, use ligandmpnn.

Install skill "solublempnn" with this command: npx skills add adaptyvbio/protein-design-skills/adaptyvbio-protein-design-skills-solublempnn

SolubleMPNN Solubility-Optimized Design

Prerequisites

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.10 |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8GB | 16GB (T4) |
| RAM | 8GB | 16GB |

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

SolubleMPNN runs through the ProteinMPNN Modal wrapper with the soluble model weights:

cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020

GPU: T4 (16GB) | Timeout: 600s default

Option 2: Local installation

git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

# Use soluble model weights: --use_soluble_model selects them, --model_name picks the variant
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --use_soluble_model \
  --model_name "v_48_020"
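For scripted campaigns, the local command can be assembled in Python. The following is a minimal sketch, not part of the upstream tooling: `build_mpnn_command` and `run_design` are hypothetical helper names, and the sketch assumes the upstream `protein_mpnn_run.py` accepts `--use_soluble_model` to load the soluble weights. It mainly illustrates that `--sampling_temp` is passed as one string, with several temperatures separated by spaces inside it.

```python
import subprocess


def build_mpnn_command(pdb_path, out_folder, num_seq=16, temps=(0.1,),
                       model_name="v_48_020", soluble=True):
    """Return the argv list for protein_mpnn_run.py without running it."""
    cmd = [
        "python", "protein_mpnn_run.py",
        "--pdb_path", str(pdb_path),
        "--out_folder", str(out_folder),
        "--num_seq_per_target", str(num_seq),
        # One string argument; multiple temperatures are space-separated.
        "--sampling_temp", " ".join(str(t) for t in temps),
        "--model_name", model_name,
    ]
    if soluble:
        # Assumption: upstream switch that loads the soluble-trained weights.
        cmd.append("--use_soluble_model")
    return cmd


def run_design(cmd, repo_dir="ProteinMPNN"):
    """Run the command inside the cloned repository; raises on failure."""
    subprocess.run(cmd, check=True, cwd=repo_dir)


if __name__ == "__main__":
    print(" ".join(build_mpnn_command("backbone.pdb", "output/")))
```

Passing the arguments as a list avoids shell quoting problems around the temperature string.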

Key parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| --pdb_path | required | path | Input structure |
| --num_seq_per_target | 1 | 1-1000 | Sequences per structure |
| --sampling_temp | "0.1" | "0.0001-1.0" | Temperature (string!) |
| --model_name | v_48_020 | string | Soluble model variant |

Model Variants

| Model | Description | Use Case |
|---|---|---|
| v_48_002 | Standard | General design |
| v_48_020 | Soluble-trained | E. coli expression |
| v_48_030 | High solubility | Difficult targets |

Output format

output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...

What good output looks like:

  • Score: 1.0-2.0 (lower = more confident)
  • Reduced hydrophobic patches compared to standard MPNN
  • Improved charge distribution
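To screen designs against the criteria above, the FASTA output can be parsed and ranked. This is a rough sketch that assumes headers follow the sample shown here (`name, key=value, ...`); `parse_designs`, `rank_by_score`, `net_charge`, and `hydrophobic_fraction` are hypothetical helpers, and the residue sets used for charge and hydrophobicity are simple illustrative choices, not a validated solubility predictor.

```python
import re

# Matches numeric key=value pairs such as "score=1.31" in a header line.
HEADER_FIELD = re.compile(r"(\w+)=([-+]?\d*\.?\d+)")
HYDROPHOBIC = set("AVILMFW")  # assumption: one common, simple definition


def parse_designs(fasta_text):
    """Return one dict per record: name, sequence, and numeric header fields."""
    designs = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split(",")[0].strip()
            fields = {k: float(v) for k, v in HEADER_FIELD.findall(line)}
            current = {**fields, "name": name, "sequence": ""}
            designs.append(current)
        elif current is not None:
            current["sequence"] += line
    return designs


def rank_by_score(designs):
    """Sort ascending by score (lower = more confident)."""
    return sorted(designs, key=lambda d: d["score"])


def net_charge(sequence):
    """Crude net charge: (K + R) minus (D + E); ignores His and termini."""
    return sum(sequence.count(r) for r in "KR") - sum(sequence.count(r) for r in "DE")


def hydrophobic_fraction(sequence):
    """Fraction of residues in the HYDROPHOBIC set."""
    return sum(r in HYDROPHOBIC for r in sequence) / len(sequence)
```

Comparing these crude metrics between SolubleMPNN and standard ProteinMPNN designs of the same backbone gives a quick sanity check before structure prediction.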

Decision tree

Should I use SolubleMPNN?
│
├─ What expression system?
│  ├─ E. coli → SolubleMPNN ✓
│  ├─ Mammalian → ProteinMPNN (PTMs matter more)
│  └─ Yeast → Either
│
├─ History of expression problems?
│  ├─ Yes, aggregation → SolubleMPNN ✓
│  ├─ Yes, low yield → SolubleMPNN ✓
│  └─ No → ProteinMPNN is fine
│
├─ What's in the binding site?
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Nothing / protein only → SolubleMPNN ✓
│
└─ Need highest solubility?
   ├─ Yes → Use v_48_030 model
   └─ Standard → Use v_48_020 model
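The decision tree can be encoded as a small function for use in pipelines. This is one possible reading of it: `choose_design_tool` is a hypothetical name, and the order in which the questions are checked (ligand first, then expression system and history) is an assumption, since the tree lists the questions side by side. The "Yeast → Either" branch is resolved here to ProteinMPNN unless expression problems are reported.

```python
def choose_design_tool(expression_system, expression_problems=False,
                       ligand_in_site=False, need_highest_solubility=False):
    """Return (tool, model_name) following one reading of the decision tree."""
    # Normalize "E. coli", "e coli", "ecoli" to the same key.
    system = expression_system.strip().lower().replace(".", "").replace(" ", "")

    # A small molecule or ligand in the binding site takes precedence (assumed order).
    if ligand_in_site:
        return ("ligandmpnn", None)

    # E. coli expression, or a history of aggregation or low yield.
    if system == "ecoli" or expression_problems:
        model = "v_48_030" if need_highest_solubility else "v_48_020"
        return ("solublempnn", model)

    # Mammalian, or yeast without known problems: standard model.
    return ("proteinmpnn", "v_48_002")


if __name__ == "__main__":
    print(choose_design_tool("E. coli"))
    print(choose_design_tool("mammalian"))
```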

Typical performance

| Campaign Size | Time (T4) | Cost (Modal) | Notes |
|---|---|---|---|
| 100 backbones × 8 seq | 15-20 min | ~$2 | Standard |
| 500 backbones × 8 seq | 1-1.5h | ~$8 | Large campaign |

Expected improvement: +15-30% solubility score vs standard ProteinMPNN.


Verify

cat output/seqs/*.fa | grep -c "^>"  # Total should match backbone_count × num_seq_per_target
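The same check can be done per file in Python, which makes it easier to spot which backbone came up short. This is a sketch with hypothetical helper names (`count_fasta_records`, `check_counts`). The `extra_records` parameter is an assumption: it covers runs where each file also contains the input sequence as an extra first record, which upstream ProteinMPNN is understood to write, so confirm against your own output.

```python
from pathlib import Path


def count_fasta_records(seq_dir):
    """Map each .fa file name in seq_dir to its number of '>' header lines."""
    counts = {}
    for fasta in sorted(Path(seq_dir).glob("*.fa")):
        with fasta.open() as handle:
            counts[fasta.name] = sum(1 for line in handle if line.startswith(">"))
    return counts


def check_counts(seq_dir, num_seq_per_target, extra_records=0):
    """Return {file name: count} for files whose record count is unexpected."""
    expected = num_seq_per_target + extra_records
    return {
        name: n
        for name, n in count_fasta_records(seq_dir).items()
        if n != expected
    }


if __name__ == "__main__":
    bad = check_counts("output/seqs", num_seq_per_target=8)
    print("All files complete" if not bad else f"Unexpected counts: {bad}")
```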

Troubleshooting

  • Still insoluble: Try v_48_030 (higher solubility bias)
  • Low diversity: Increase temperature to 0.2
  • Poor folding: Use standard ProteinMPNN and optimize later

Error interpretation

| Error | Cause | Fix |
|---|---|---|
| RuntimeError: CUDA out of memory | Long protein or large batch | Reduce batch_size |
| FileNotFoundError: v_48_020 | Missing model weights | Download soluble weights |

Next: Structure prediction for validation → protein-qc for filtering.
