pytorch-onnx

Exporting PyTorch models to ONNX format for cross-platform deployment. Includes handling dynamic axes, graph optimization in ONNX Runtime, and INT8 model quantization. (onnx, onnxruntime, torch.onnx.export, dynamic_axes, constant-folding, edge-deployment)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pytorch-onnx" with this command: npx skills add cuba6112/skillfactory/cuba6112-skillfactory-pytorch-onnx

Overview

ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models. Exporting PyTorch models to ONNX allows them to be executed in environments without Python or PyTorch, using high-performance engines like ONNX Runtime.

When to Use

Use ONNX for cross-language deployment (C++, Java, C#), edge deployment (mobile/IoT), or to leverage specialized hardware accelerators (like TensorRT) that support ONNX as an input format.

Decision Tree

  1. Does your model accept variable batch sizes?
    • SPECIFY: dynamic_axes in the torch.onnx.export call.
  2. Do you need the fastest possible inference on a CPU?
    • APPLY: Quantization using the ONNX Runtime quantization tool.
  3. Are you deploying to a C++ environment without Python?
    • EXPORT: To ONNX and load using the ONNX Runtime C++ API.

Workflows

  1. Exporting a Model for Cross-Platform Deployment

    1. Instantiate the PyTorch model and set it to .eval().
    2. Create a dummy input tensor matching the input shape.
    3. Call torch.onnx.export() specifying input/output names and dynamic axes.
    4. Verify the resulting .onnx file using a tool like Netron.
  2. Optimizing ONNX Models for Inference

    1. Load the .onnx model into an ONNX Runtime InferenceSession.
    2. Choose an appropriate Execution Provider (e.g., 'CUDAExecutionProvider', 'TensorrtExecutionProvider').
    3. Enable graph optimizations like constant folding and node fusion.
    4. Run inference using the session.run() method with input dictionaries.
  3. Reducing Model Footprint via Quantization

    1. Export the model to standard ONNX format.
    2. Use the ONNX Runtime quantization tool to convert FP32 weights to INT8.
    3. Calibrate the model using a representative dataset to minimize accuracy loss.
    4. Deploy the quantized .onnx model to edge devices for lower latency.

Non-Obvious Insights

  • Static vs. Dynamic: By default, torch.onnx.export captures the shape of the dummy input as a static shape. If your application handles varying inputs, you must explicitly define these as dynamic axes.
  • Graph Optimization: ONNX Runtime performs "constant folding," which pre-computes parts of the graph that rely on constant values, effectively stripping unnecessary computation before inference starts.
  • Serialization Choice: While TorchScript is also an option for PyTorch deployment, ONNX is often preferred for cross-vendor compatibility (e.g., running a model on a Web browser using ONNX.js).

Evidence

Scripts

  • scripts/pytorch-onnx_tool.py: Script to export a model with dynamic axes support.
  • scripts/pytorch-onnx_tool.js: Node.js interface to run inference via ONNX Runtime.

Dependencies

  • torch
  • onnx
  • onnxruntime

References

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

ollama-rag

No summary provided by upstream source.

Repository SourceNeeds Review
General

torchaudio

No summary provided by upstream source.

Repository SourceNeeds Review
General

unsloth-sft

No summary provided by upstream source.

Repository SourceNeeds Review
General

torchvision

No summary provided by upstream source.

Repository SourceNeeds Review