proteinmpnn

Design protein sequences using ProteinMPNN inverse folding. Use this skill when: (1) Designing sequences for RFdiffusion backbones, (2) Redesigning existing protein sequences, (3) Fixing specific residues while designing others, (4) Optimizing sequences for expression or stability, (5) Multi-state or negative design. For backbone generation, use rfdiffusion or bindcraft. For ligand-aware design, use ligandmpnn. For solubility optimization, use solublempnn.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "proteinmpnn" with this command: npx skills add adaptyvbio/protein-design-skills/adaptyvbio-protein-design-skills-proteinmpnn

ProteinMPNN Sequence Design

Prerequisites

RequirementMinimumRecommended
Python3.8+3.10
CUDA11.0+11.7+
GPU VRAM8GB16GB (T4)
RAM8GB16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Local installation (recommended)

git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1"

GPU: T4 (16GB) sufficient | Time: ~50-100 sequences/minute

Option 2: Modal (via LigandMPNN wrapper)

cd biomodals
modal run modal_ligandmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16

Note: LigandMPNN includes ProteinMPNN functionality.

Config Schema

Core Parameters

ParameterDefaultRangeDescription
--pdb_pathrequiredpathSingle PDB input
--pdb_path_chainsallA,BChains to design (comma-sep)
--out_folderrequiredpathOutput directory
--num_seq_per_target11-1000Sequences per structure
--sampling_temp"0.1""0.0001-1.0"Temperature (string!)
--seed0intRandom seed
--batch_size11-32Batch size

Temperature Guide

0.1  -> Low diversity, high recovery (production)
0.2  -> Moderate diversity (default)
0.3  -> Higher diversity (exploration)
0.5+ -> Very diverse, lower quality

IMPORTANT: Temperature must be passed as a string, not float.

Common mistakes

Temperature Parameter

Correct:

--sampling_temp "0.1"    # String with quotes

Wrong:

--sampling_temp 0.1      # Float without quotes - may cause errors
--sampling_temp 0.1,0.2  # Multiple temps need proper format

Fixed Positions JSONL

Correct:

{"A": [1, 2, 3, 10, 11], "B": [5, 6]}

Wrong:

{"A": "1,2,3,10,11"}     # String instead of list
{A: [1, 2, 3]}           # Missing quotes on key
{"A": [1,2,3,]}          # Trailing comma

Chain Selection

Correct:

--pdb_path_chains A,B    # No spaces

Wrong:

--pdb_path_chains A, B   # Space after comma
--pdb_path_chains "A,B"  # Quotes may cause issues

Amino Acid Biases

# Bias toward certain AAs (positive = favor)
--bias_AA_jsonl '{"A": {"A": 1.5, "W": -2.0}}'

# Omit specific AAs globally
--omit_AAs "CM"  # No cysteine or methionine

# Per-position omission
--omit_AA_jsonl '{"A": {"1": "C", "2": "CM"}}'

Multi-Chain Design

# Design chains A and B together
--pdb_path_chains A,B

# Tie chains (same sequence)
--tied_positions_jsonl tied.jsonl

Variants Comparison

VariantUse CaseKey Difference
ProteinMPNNGeneralOriginal model
SolubleMPNNExpressionTrained on soluble proteins
LigandMPNNSmall moleculesLigand-aware context

Output format

output/
├── seqs/
│   └── backbone.fa          # FASTA sequences
└── backbone_pdb/
    └── backbone_0001.pdb    # PDBs with designed sequence

FASTA Header Format

>backbone_0001, score=1.234, global_score=1.234, seq_recovery=0.85
MKTAYIAKQRQISFVKSHFSRQLE...

Common workflows

Binder Sequence Design

python protein_mpnn_run.py \
  --pdb_path binder_backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --pdb_path_chains B  # Design binder chain only

Interface Redesign

# Fix core, design interface
python protein_mpnn_run.py \
  --pdb_path complex.pdb \
  --fixed_positions_jsonl core_positions.jsonl \
  --num_seq_per_target 32

Multi-State Design

# Design for multiple conformations
python protein_mpnn_run.py \
  --pdb_path_multi state1.pdb,state2.pdb \
  --num_seq_per_target 16

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --out_folder output/ --num_seq_per_target 8
Loading model weights...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.3 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.234, global_score=1.189, seq_recovery=0.82
MKTAYIAKQRQISFVKSHFSRQLEERGLTKE...
>backbone_0002, score=1.198, global_score=1.156, seq_recovery=0.79
MKTAYIAKQRQISFVKSQFSRQLDERGLTKE...

What good output looks like:

  • Score: 1.0-2.0 (lower = more confident)
  • Seq recovery: 0.3-0.6 for de novo, 0.7-0.9 for redesign
  • Diverse sequences (not all identical) when temp > 0.1

Decision tree

Should I use ProteinMPNN?
│
├─ Have a backbone structure?
│  ├─ Yes → Continue below
│  └─ No → Use RFdiffusion first
│
├─ What's in the binding site?
│  ├─ Nothing / protein only → ProteinMPNN ✓
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Metal / cofactor → Use LigandMPNN
│
├─ Priority?
│  ├─ Solubility/expression → Consider SolubleMPNN
│  ├─ Speed → ProteinMPNN ✓
│  └─ AF2 optimization → Consider ColabDesign
│
└─ Need fixed positions?
   ├─ Yes → Use --fixed_positions_jsonl
   └─ No → ProteinMPNN ✓ (design all)

Typical performance

Campaign SizeTime (T4)Cost (Modal)Notes
100 backbones × 8 seq15-20 min~$2Standard
500 backbones × 8 seq1-1.5h~$8Large campaign
1000 backbones × 16 seq3-4h~$18Comprehensive

Throughput: ~50-100 sequences/minute on T4 GPU.


Verify

grep -c "^>" output/seqs/*.fa  # Should match backbone_count × num_seq_per_target

Troubleshooting

Low sequence diversity: Increase sampling_temp to 0.2-0.3 Poor recovery: Decrease sampling_temp to 0.1 OOM errors: Reduce batch_size Unwanted cysteines: Use --omit_AAs "C"

Error interpretation

ErrorCauseFix
RuntimeError: CUDA out of memoryLong protein or large batchReduce batch_size or use larger GPU
KeyError: 'A'Chain not in PDBCheck chain IDs in your PDB file
JSONDecodeErrorInvalid JSONL formatValidate JSON syntax (see Common Mistakes)
IndexError: list indexEmpty chain or residue listCheck PDB has atoms, not just HEADER

Next: Structure prediction for validation → protein-qc for filtering.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

cell-free-expression

No summary provided by upstream source.

Repository SourceNeeds Review
General

protein-qc

No summary provided by upstream source.

Repository SourceNeeds Review
General

binding-characterization

No summary provided by upstream source.

Repository SourceNeeds Review
General

ipsae

No summary provided by upstream source.

Repository SourceNeeds Review
proteinmpnn | V50.AI