MUSA Torch Coding

Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.

Overview

MUSA (Meta-computing Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate code that:

- Runs on Moore Threads GPUs via torch_musa
- Converts CUDA code to MUSA-compatible code
- Sets up proper environments (conda v1.2/v1.3)
- Follows MUSA best practices

Key Differences: CUDA vs MUSA

CUDA	MUSA
`torch.cuda`	`torch.musa`
`torch.device("cuda")`	`torch.device("musa")`
`torch.cuda.is_available()`	`torch.musa.is_available()`
`backend='nccl'`	`backend='mccl'`
`torch.cuda.device_count()`	`torch.musa.device_count()`
`torch.cuda.get_device_name()`	`torch.musa.get_device_name()`

Environment Setup

⚠️ Important: MUSA Uses Pre-configured Conda Environments

DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built and include:

- MUSA-specific PyTorch builds (not compatible with standard PyTorch)
- MUSA-customized vLLM versions
- MUSA drivers and SDK integration

Installing standard packages from PyPI will break the environment.

Conda Environment (v1.2/v1.3)

MUSA provides pre-configured conda environments. Common environment names:

- v1.2 - MUSA SDK v1.2 environment
- v1.3 - MUSA SDK v1.3 environment (newer)

# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"

Environment Detection & Setup

If no MUSA conda environment is detected:

Check if MUSA is installed:

which musaInfo  # Should show musaInfo path
ls /usr/local/musa/  # MUSA SDK location

If MUSA is not set up:
- Use the musa-env-setup skill for complete environment installation
- The skill covers SDK installation, conda setup, and vLLM-MUSA configuration

Common conda environment locations:
- /opt/conda/envs/
- ~/conda/envs/
- /usr/local/conda/envs/
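
The checks above can also be scripted. The following is a minimal sketch using only the Python standard library. The function names `musa_sdk_status` and `find_musa_envs` are hypothetical helpers, not part of torch_musa, and the default paths are simply the locations listed in this guide.

```python
import os
import shutil

# Locations and environment names taken from this guide; adjust for your machine.
DEFAULT_ENV_ROOTS = ("/opt/conda/envs", "~/conda/envs", "/usr/local/conda/envs")
ENV_NAMES = ("v1.2", "v1.3")


def musa_sdk_status(sdk_dir="/usr/local/musa"):
    """Report whether musaInfo is on PATH and whether the SDK directory exists."""
    return {
        "musaInfo": shutil.which("musaInfo"),  # full path, or None if not found
        "sdk_dir": os.path.isdir(sdk_dir),
    }


def find_musa_envs(roots=DEFAULT_ENV_ROOTS, names=ENV_NAMES):
    """Return every existing <root>/<name> directory, in search order."""
    found = []
    for root in roots:
        root = os.path.expanduser(root)  # resolve a leading ~
        for name in names:
            candidate = os.path.join(root, name)
            if os.path.isdir(candidate):
                found.append(candidate)
    return found
```

If `find_musa_envs()` returns an empty list and `musa_sdk_status()` reports no `musaInfo` and no SDK directory, MUSA is probably not set up and the musa-env-setup skill is the next step.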

Key Environment Variables

Variable	Purpose
`MUSA_VISIBLE_DEVICES=0,1,2,3`	Control visible GPU IDs
`MUSA_LAUNCH_BLOCKING=1`	Synchronous kernel launch
`MUDNN_LOG_LEVEL=INFO`	Enable MUDNN logging
`TORCH_SHOW_CPP_STACKTRACES=1`	Show C++ stack traces
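
These variables can also be set from Python. The sketch below uses a hypothetical helper, `configure_musa_debug_env`, and assumes that, as with `CUDA_VISIBLE_DEVICES`, the values must be in place before torch_musa initializes the runtime, so it should be called before `import torch_musa`.

```python
import os


def configure_musa_debug_env(device_ids, env=None):
    """Set the debugging variables from the table above and return the mapping.

    `env` defaults to os.environ; pass a plain dict to preview the values
    without touching the current process.
    """
    env = os.environ if env is None else env
    env["MUSA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    env["MUSA_LAUNCH_BLOCKING"] = "1"        # synchronous kernel launches
    env["MUDNN_LOG_LEVEL"] = "INFO"          # MUDNN logging
    env["TORCH_SHOW_CPP_STACKTRACES"] = "1"  # C++ stack traces in errors
    return env
```

Synchronous launches slow execution down, so this configuration is best kept for debugging sessions rather than production runs.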

Code Generation Rules

When generating PyTorch code for MUSA:

Always import torch_musa

import torch_musa  # Must import before using torch.musa

Use torch.device("musa")

import torch
import torch_musa

device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
tensor = torch.tensor([1.0, 2.0], device=device)
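
Generated code may also run on machines where torch_musa is not installed at all. A minimal sketch of a more defensive selection follows; `select_device_name` is a hypothetical helper, and it assumes that a failed import should fall back to the CPU rather than abort.

```python
def select_device_name():
    """Return "musa" when torch_musa imports and reports a device, else "cpu"."""
    try:
        import torch
        import torch_musa  # noqa: F401  (importing it registers torch.musa)
    except (ImportError, OSError):
        # Missing package or missing shared libraries: fall back to the CPU.
        return "cpu"
    return "musa" if torch.musa.is_available() else "cpu"
```

The returned string can be passed straight to `torch.device(...)` or to `.to(...)`.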

Use 'mccl' for distributed training

dist.init_process_group(backend='mccl', ...)

Mixed precision (AMP) is supported

autocast = torch.musa.amp.autocast      # Same API as torch.cuda.amp.autocast
GradScaler = torch.musa.amp.GradScaler  # Same API as torch.cuda.amp.GradScaler
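
A mixed-precision training step could look like the sketch below. The function `amp_train_step` is hypothetical, and the scaler and autocast context are passed in by the caller so the step itself stays backend-neutral. It assumes the caller supplies objects with the standard PyTorch AMP interface, for example `torch.musa.amp.GradScaler()` and `torch.musa.amp.autocast` if your torch_musa build exposes them under that name.

```python
def amp_train_step(model, inputs, targets, criterion, optimizer, scaler, autocast):
    """Run one optimizer step with loss scaling.

    scaler:   a GradScaler-like object (scale / step / update)
    autocast: a callable returning a context manager for the forward pass
    """
    optimizer.zero_grad()
    with autocast():
        # Forward pass and loss run under reduced precision where supported.
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()  # scale the loss to avoid gradient underflow
    scaler.step(optimizer)         # unscales gradients, skips the step on inf/NaN
    scaler.update()                # adjust the scale factor for the next step
    return loss
```

Passing the AMP objects in keeps the step identical for CUDA and MUSA; only the call site changes.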

TensorCore optimization available
- Set torch.backends.mudnn.allow_tf32 = True to enable TensorFloat32 (TF32)

Model Templates

For common model types, see templates in references/:

reference.md - Complete MUSA API reference

Common Tasks

Check GPU Availability

import torch
import torch_musa

print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")

Training Loop Pattern

import torch
import torch_musa

# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Model and data to device
model = model.to(device)
inputs = inputs.to(device)
targets = targets.to(device)  # Targets must be on the same device as the outputs

# Training (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()

Distributed Training (DDP)

import torch
import torch.distributed as dist
import torch_musa

# rank, world_size, and local_rank are supplied by the launcher (e.g. torchrun)
# Bind this process to its MUSA device before creating the process group
torch.musa.set_device(local_rank)

# Initialize with mccl backend
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)
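
A fuller worker setup might look like the following sketch. It assumes the script is launched with torchrun, which exports `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` for each process, and that `DistributedDataParallel` accepts MUSA device ids the same way it accepts CUDA ones. The helper names `read_launcher_env` and `wrap_model_for_musa_ddp` are hypothetical.

```python
import os


def read_launcher_env(env=None):
    """Return (rank, world_size, local_rank) from the launcher's variables."""
    env = os.environ if env is None else env
    return int(env["RANK"]), int(env["WORLD_SIZE"]), int(env["LOCAL_RANK"])


def wrap_model_for_musa_ddp(model, env=None):
    """Bind this process to its GPU, join the process group, and wrap the model."""
    # Imported lazily so the module still loads where torch_musa is absent.
    import torch
    import torch.distributed as dist
    import torch_musa  # noqa: F401  (registers torch.musa)
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank, world_size, local_rank = read_launcher_env(env)
    torch.musa.set_device(local_rank)  # one process per device
    dist.init_process_group(backend="mccl", rank=rank, world_size=world_size)
    device = torch.device(f"musa:{local_rank}")
    return DDP(model.to(device), device_ids=[local_rank])
```

Call `dist.destroy_process_group()` when training finishes so each worker shuts down cleanly.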

Code Conversion

When converting existing CUDA code to MUSA:

- Add import torch_musa at the top
- Replace cuda with musa in device strings, and torch.cuda with torch.musa in API calls
- Replace nccl with mccl for distributed backend
- Keep all other PyTorch API calls unchanged
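
The conversion steps above are mechanical enough to sketch as a text rewrite. The function `cuda_to_musa` below is a hypothetical helper built on regular expressions. It handles only the patterns listed in this guide and does not parse Python, so the output should be reviewed by hand.

```python
import re


def cuda_to_musa(source: str) -> str:
    """Apply the textual CUDA-to-MUSA substitutions to a source string."""
    # torch.cuda.<anything> -> torch.musa.<anything>
    out = re.sub(r"\btorch\.cuda\b", "torch.musa", source)
    # Quoted device strings: "cuda", 'cuda', "cuda:0", ...
    out = re.sub(r"""(["'])cuda(:\d+)?\1""", r"\1musa\2\1", out)
    # Quoted distributed backend name
    out = re.sub(r"""(["'])nccl\1""", r"\1mccl\1", out)
    # Add the torch_musa import once, directly after a bare `import torch` line
    if not re.search(r"^\s*import torch_musa\b", out, flags=re.M):
        out, count = re.subn(
            r"^(import torch)[ \t]*$", r"\1\nimport torch_musa",
            out, count=1, flags=re.M,
        )
        if count == 0:
            # No bare `import torch` line found: put the import at the top.
            out = "import torch_musa\n" + out
    return out
```

Running the function twice gives the same result as running it once, so it is safe to apply to files that are already partly converted.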

Troubleshooting

- Device not found: Ensure the user is in the render group: sudo usermod -aG render $(whoami) (log out and back in for the change to take effect)
- Library not found: Check that LD_LIBRARY_PATH includes /usr/local/musa/lib/
- Build issues: Clean and rebuild: python setup.py clean && bash build.sh
- Docker issues: Use --env MTHREADS_VISIBLE_DEVICES=all
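
The first two checks can be automated on Linux. The sketch below uses only the standard library; `user_in_group` and `ld_path_has_musa` are hypothetical helper names, and the default library path is the one given in this guide.

```python
import grp
import os


def user_in_group(group_name):
    """True if the current process runs with the named group (e.g. "render")."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this machine
    return gid == os.getgid() or gid in os.getgroups()


def ld_path_has_musa(ld_library_path, musa_lib="/usr/local/musa/lib"):
    """True if musa_lib appears as an entry in an LD_LIBRARY_PATH-style string."""
    want = os.path.normpath(musa_lib)
    entries = [p for p in ld_library_path.split(os.pathsep) if p]
    return any(os.path.normpath(p) == want for p in entries)
```

For example, `ld_path_has_musa(os.environ.get("LD_LIBRARY_PATH", ""))` checks the current shell's setting. `user_in_group` inspects the running process, so it reports False until a new login session picks up a group change.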

Reference

For detailed API reference and examples, see references/reference.md.