Numerai Model Upload

Overview

Create a portable predict(live_features, live_benchmark_models) pickle that runs inside Numerai's numerai_predict container without repo dependencies.

CRITICAL: Python Version Compatibility

Before creating any pkl file, you must ensure your Python environment matches Numerai's compute environment. Mismatched versions cause segfaults and validation failures due to binary incompatibility (especially with numpy).

Step 1: Query the Default Docker Image (MCP Required)

If the numerai MCP server is available, always query the default Python version first:

query { computePickleDockerImages { id name image tag default } }

Look for the entry with default: true . The image name indicates the Python version:

numerai_predict_py_3_12:a78dedd → Python 3.12 (current default as of 2026)
numerai_predict_py_3_11:a78dedd → Python 3.11
numerai_predict_py_3_10:a78dedd → Python 3.10

If the numerai MCP is not installed, it can be installed through our install script via curl -sL https://numer.ai/install-mcp.sh | bash , this script guides the user through installing the MCP for Codex CLI and configuring an API key with the correct scopes that are required by MCP.

You can find more documentation about Numerai MCP here: https://docs.numer.ai/numerai-tournament/mcp

Step 2: Create Matching Virtual Environment with pyenv

Use pyenv to create a virtual environment with the exact Python version:

1. List available pyenv Python versions

ls ~/.pyenv/versions/

2. Find the matching minor version (e.g., for Python 3.12)

PYENV_PY=$(ls -d ~/.pyenv/versions/3.12.* 2>/dev/null | head -1)

3. Create the virtual environment

$PYENV_PY/bin/python -m venv ./venv

4. Activate and install pkl dependencies

source ./venv/bin/activate pip install --upgrade pip pip install numpy pandas cloudpickle scipy

Add lightgbm, torch, etc. only if your model needs them

Step 3: Create pkl in the Correct Environment

Always create pkl files using the matching venv:

./venv/bin/python create_model_pkl.py

Requirements

Implement predict(live_features, live_benchmark_models) and return a DataFrame with a prediction column aligned to the input index.
Preserve training-time preprocessing (feature order, imputation values, scaling params) inside the pickle.
Avoid imports from local repo modules (no agents.* ), because Numerai's container will not have them.
Prefer numpy/pandas/scipy-only inference; do not rely on torch/xgboost unless you verify the container has those packages.
Move any trained model to CPU before exporting and store plain numpy weights.
Validate required columns (era for per-era ranking, benchmark column if used).

Workflow

Query the default Docker image from the MCP to determine the required Python version.
Create/activate a matching venv using pyenv (see above).
Train on the desired full dataset (train + validation) with the same preprocessing and early-stopping scheme as the best model only after your research has plateaued and you have selected the final configuration to deploy.
Export an inference bundle from the trained model:
Feature list and ordering
Imputation values and scaling stats
Model weights/biases (numpy arrays)
Activation name and any constants
Benchmark column name if needed as a feature
Build a predict function that:
Reads only from the bundle and standard libraries
Applies preprocessing and a numpy forward pass
Ranks predictions per era to [0, 1] when required
cloudpickle.dump(predict, "model.pkl") using the matching venv's Python.
Test the pickle with the Numerai container before uploading.

Testing

Run the Numerai debug container locally (use the same image tag as the default):

Get the default image tag from MCP query, then test:

docker run -i --rm -v "$PWD:$PWD" ghcr.io/numerai/numerai_predict_py_3_12:a78dedd --debug --model $PWD/[PICKLE_FILE]

Common Pitfalls

Segmentation fault / numpy binary incompatibility: The pkl was created with a different Python version than Numerai's container. Always query the default docker image first and create pkl files using a matching pyenv-based venv.
ImportError: No module named 'agents' : occurs when the pickle references repo classes. Fix by exporting a pure-numpy inference bundle and rebuilding predict without repo imports.
Missing era column: per-era ranking requires live_features["era"] .
Benchmark misalignment: ensure live_benchmark_models is reindexed to live_features (by id) before use.
Feature drift: ensure feature order in inference matches training order exactly.

Debugging Validation Failures

If your pickle fails validation, query the trigger status and logs:

query { account { models { username computePickleUpload { filename validationStatus triggerStatus triggers { id status statuses { status description insertedAt } } } } } }

Common error descriptions:

"Segmentation fault! Ensure python and library versions match our environment." → Python/numpy version mismatch
"No currently open rounds!" → Model validated successfully but no round is open for submission

Reference

Use numerai/example_model.ipynb for the expected predict signature and output format.

Deploying to Numerai via MCP Server

After creating and testing your pkl file, you can deploy it to Numerai using the Numerai MCP server. The MCP server provides tools for creating models and uploading pkl files programmatically.

Available MCP Tools

The numerai MCP server provides these key tools:

check_api_credentials
Verify your API token and see granted scopes
create_model
Create a new model in a tournament
upload_model
Upload pkl files (multi-step workflow)
graphql_query
List existing models and perform custom queries

Authentication

All authenticated operations require a Numerai API token with upload_submission scope:

Format: PUBLIC_ID$SECRET_KEY
Get your API key from https://numer.ai/account

Option 1: Upload to an Existing Model

If you already have a model slot you want to use:

List your models using graphql_query :

query { account { models { id name } } }

Get upload authorization for your pkl file:

Call upload_model with operation: "get_upload_auth" , modelId: "<model_uuid>" , filename: "model.pkl"
This returns a presigned URL for uploading

Upload the pkl file

Call a PUT file upload on the pre-signed URL with the path to the pkl file.

Call upload_model with operation: "create" , modelId: "<model_uuid>" , filename: "model.pkl"
This triggers validation of your pickle

Check validation status:

Call upload_model with operation: "list" to see all pickles and their status
Wait for validation to complete successfully

Assign the pickle to the model slot:

Call upload_model with operation: "assign" , modelId: "<model_uuid>" , pickleId: "<pickle_uuid>"
This makes the pickle active for automated submissions

Option 2: Create a New Model and Upload

If you want to create a new model slot:

Create the model:

Call create_model with name: "<unique_model_name>" , tournament: 8 (for Classic)
Note: Model names must be unique within the tournament

Get the model ID from the response

Follow steps 2-6 from Option 1 to upload and assign the pkl file

Upload Workflow Summary

┌─────────────────────────────────────────────────────────────────┐ │ PKL DEPLOYMENT WORKFLOW │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ 1. Create pkl file (this skill's main workflow) │ │ 2. Test pkl locally with numerai_predict container │ │ 3. Choose: create new model OR use existing model │ │ │ │ For new model: │ │ └─> create_model(name, tournament=8) │ │ │ │ For existing model: │ │ └─> graphql_query to list models and get model ID │ │ │ │ 4. upload_model(operation="get_upload_auth", modelId, filename)│ │ 5. upload_model(operation="put_file", presignedUrl, localPath) │ │ 6. upload_model(operation="create", modelId, filename) │ │ 7. upload_model(operation="list") - wait for validation │ │ 8. upload_model(operation="assign", modelId, pickleId) │ │ │ │ Optional: │ │ - upload_model(operation="trigger", pickleId) to test │ │ - upload_model(operation="get_logs", pickleId, triggerId) │ │ │ └─────────────────────────────────────────────────────────────────┘

Important Notes

Only the Classic tournament (tournament=8) supports pickle uploads
The model must have its submission webhook disabled before uploading
CRITICAL: Before creating pkl files, query the default docker image to ensure Python version compatibility
Use this GraphQL query to check available runtimes and the default: query { computePickleDockerImages { id name image tag default } }
Use upload_model(operation="list_data_versions") to see available dataset versions
After assignment, Numerai will automatically run your pickle each round

Pre-Upload Checklist

Before uploading a pkl file, verify:

✅ Queried computePickleDockerImages to get the default Python version
✅ Created venv using pyenv with matching Python version
✅ Created pkl file using the matching venv's Python interpreter
✅ Tested pkl locally with the matching docker container (optional but recommended)

Triggering and Debugging

After assigning a pickle, you can manually trigger it for testing:

Trigger the pickle:

Call upload_model with operation: "trigger" , pickleId: "<pickle_uuid>" , triggerValidation: true

View execution logs:

Call upload_model with operation: "get_logs" , pickleId: "<pickle_uuid>" , triggerId: "<trigger_uuid>"

Asking the User

Before deploying, confirm with the user:

Do they want to deploy the pkl to Numerai?
Should we create a new model or upload to an existing one?
If new: what name should the model have?
If existing: which model should receive the upload?
Do they have their API token ready (or is it already configured)?

numerai-model-upload

Safety Notice

Copy this and send it to your AI assistant to learn

1. List available pyenv Python versions

2. Find the matching minor version (e.g., for Python 3.12)

3. Create the virtual environment

4. Activate and install pkl dependencies

Add lightgbm, torch, etc. only if your model needs them

Get the default image tag from MCP query, then test:

Source Transparency

Related Skills

numerai-model-implementation

numerai-experiment-design

report-research