alayarenderer-generative-world

AI coding agent skill for AlayaRenderer — a generative world rendering framework with inverse rendering (RGB→G-buffers) and game editing (G-buffers+text→stylized video) using fine-tuned video diffusion models.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install the skill:

Install skill "alayarenderer-generative-world" with this command: npx skills add aradotso/trending-skills/aradotso-trending-skills-alayarenderer-generative-world

AlayaRenderer — Generative World Renderer

Skill by ara.so — Daily 2026 Skills collection.

AlayaRenderer is a two-stage framework for high-quality video rendering:

  1. Inverse Renderer (RGB → G-buffers): Extracts albedo, normal, depth, roughness, and metallic maps from RGB video using a fine-tuned Cosmos-Transfer1-DiffusionRenderer 7B model.
  2. Game Editing (G-buffers + Text → Stylized RGB): Synthesizes photorealistic, stylized RGB video from G-buffer inputs using a fine-tuned Wan2.1 1.3B model via DiffSynth-Studio.

Installation

Clone the Repository

git clone --recurse-submodules https://github.com/ShandaAI/AlayaRenderer.git
cd AlayaRenderer

Important: Use --recurse-submodules — DiffSynth-Studio is a git submodule required for Game Editing.

Two Separate Conda Environments (Recommended)

The two models have conflicting dependencies. Use separate environments:

# Environment 1: Inverse Renderer
conda create -n inverse_renderer python=3.10 -y
conda activate inverse_renderer
cd inverse_renderer
# Follow inverse_renderer/ instructions for Cosmos-Transfer1 setup

# Environment 2: Game Editing
conda create -n game_editing python=3.10 -y
conda activate game_editing
cd game_editing
# Follow DiffSynth-Studio setup instructions

Model Weights

Model              Base Model                              Size           HuggingFace Link
Inverse Renderer   Cosmos-Transfer1-DiffusionRenderer 7B   ~7B params     Brian9999/world_inverse_renderer
Game Editing       Wan2.1 1.3B                             ~1.3B params   Brian9999/stylerenderer

Download and Place Weights

# Inverse Renderer — replace the base checkpoint
huggingface-cli download Brian9999/world_inverse_renderer \
  --local-dir inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B

# Game Editing — place in game_editing models directory
mkdir -p game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer
huggingface-cli download Brian9999/stylerenderer \
  --local-dir game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer

Inverse Renderer Usage

The inverse renderer decomposes an RGB video into 5 G-buffer channels: albedo, normal, depth, roughness, metallic.

Setup

cd inverse_renderer
# Follow Cosmos-Transfer1-DiffusionRenderer environment setup
# Ensure checkpoint is at:
# inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/

Inference

Refer to the inverse_renderer/ subdirectory for the full inference script. The general pattern follows Cosmos-Transfer1-DiffusionRenderer conventions:

# inverse_renderer/run_inverse.py (typical pattern; see the subdirectory for the real script)
from pathlib import Path

# Input: path to RGB video
input_video = "path/to/rgb_video.mp4"
output_dir = Path("outputs/gbuffers/")
output_dir.mkdir(parents=True, exist_ok=True)

# The model writes 5 synchronized channels under output_dir:
# - albedo    (diffuse color)
# - normal    (surface orientation)
# - depth     (scene geometry)
# - roughness (surface roughness)
# - metallic  (metallic property)

Game Editing Usage

Quick Start — CLI Inference

cd game_editing

CUDA_VISIBLE_DEVICES=0 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 0 \
    --style snowy_winter \
    --prompt "the scene is set in a frozen, snow-covered environment under cold, pale winter light with falling snowflakes, creating a silent and ethereal winter wonderland atmosphere." \
    --gbuffer_dir test_dataset \
    --save_dir outputs/ \
    --num_frames 81 \
    --height 480 \
    --width 832

CLI Parameters

Parameter      Description                                        Example
--checkpoint   Path to fine-tuned .safetensors weights            models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors
--gpu          GPU device index                                   0
--style        Named style preset                                 snowy_winter, rainy, night, sunset
--prompt       Text description of target lighting/atmosphere     See examples below
--gbuffer_dir  Directory containing G-buffer input frames/video   test_dataset
--save_dir     Output directory for rendered video                outputs/
--num_frames   Number of frames to generate (must be 8n+1)        81
--height       Output height in pixels                            480
--width        Output width in pixels                             832
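For scripted runs, the flags above can be assembled programmatically. A minimal sketch, assuming the flag names and script path from the Quick Start example (the helper name `build_game_editing_cmd` is ours, not part of the repository):

```python
import shlex

def build_game_editing_cmd(
    checkpoint: str,
    style: str,
    prompt: str,
    gbuffer_dir: str,
    save_dir: str,
    gpu: int = 0,
    num_frames: int = 81,
    height: int = 480,
    width: int = 832,
) -> str:
    # Guard the documented 8n+1 frame-count constraint up front
    assert num_frames >= 9 and (num_frames - 1) % 8 == 0, "num_frames must be 8n+1"
    # Assemble the CLI call; flag names match the parameter table above
    parts = [
        f"CUDA_VISIBLE_DEVICES={gpu}",
        "python", "examples/wanvideo/model_inference/inference_gbuffer_caption.py",
        "--checkpoint", checkpoint,
        "--gpu", str(gpu),
        "--style", style,
        "--prompt", shlex.quote(prompt),  # prompts contain spaces and commas
        "--gbuffer_dir", gbuffer_dir,
        "--save_dir", save_dir,
        "--num_frames", str(num_frames),
        "--height", str(height),
        "--width", str(width),
    ]
    return " ".join(parts)
```

This is convenient for sweeping several styles or prompts from one driver script.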

G-buffer Directory Structure

test_dataset/
├── albedo/
│   ├── frame_0000.png
│   ├── frame_0001.png
│   └── ...
├── normal/
│   ├── frame_0000.png
│   └── ...
├── depth/
│   ├── frame_0000.png
│   └── ...
├── roughness/
│   ├── frame_0000.png
│   └── ...
└── metallic/
    ├── frame_0000.png
    └── ...

Style Prompt Examples

# Cyberpunk night scene
--style night \
--prompt "neon-lit urban environment at night with rain-slicked streets reflecting colorful neon signs, creating a cyberpunk noir atmosphere"

# Golden hour / sunset
--style sunset \
--prompt "warm golden hour lighting with long shadows and a glowing amber sky, soft cinematic atmosphere"

# Rainy urban
--style rainy \
--prompt "overcast rainy day with wet surfaces, soft diffuse lighting, and atmospheric fog creating a moody cinematic look"

# Fantasy / stylized
--style fantasy \
--prompt "magical forest environment with bioluminescent plants, ethereal blue-green lighting, and mystical particle effects"

# Foggy morning
--style foggy \
--prompt "early morning dense fog with soft diffused light creating a mysterious and quiet atmosphere"

Multi-GPU Inference

To parallelize across styles or clips, launch one process per GPU:

# Run on GPU 1 (start a second process with CUDA_VISIBLE_DEVICES=0 for GPU 0)
CUDA_VISIBLE_DEVICES=1 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 1 \
    --style rainy \
    --prompt "heavy rainfall with dark storm clouds and dramatic lightning in the distance" \
    --gbuffer_dir my_gbuffers \
    --save_dir outputs/rainy_scene \
    --num_frames 81 --height 480 --width 832

Full Pipeline: RGB Video → Stylized Output

# Step 1: Extract G-buffers from RGB video (Inverse Renderer env)
conda activate inverse_renderer
cd inverse_renderer
python run_inverse.py \
    --input path/to/gameplay_video.mp4 \
    --output_dir ../game_editing/test_dataset/

# Step 2: Apply game editing style (Game Editing env)
conda activate game_editing
cd ../game_editing
CUDA_VISIBLE_DEVICES=0 python \
    examples/wanvideo/model_inference/inference_gbuffer_caption.py \
    --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
    --gpu 0 \
    --style snowy_winter \
    --prompt "frozen tundra with blizzard conditions, pale blue-white lighting and drifting snow" \
    --gbuffer_dir test_dataset \
    --save_dir outputs/final_render \
    --num_frames 81 --height 480 --width 832

Dataset Overview

The AlayaRenderer dataset (release pending) features:

  • 4M+ frames at 720p / 30 FPS
  • 6 synchronized channels: RGB + albedo, normal, depth, metallic, roughness
  • 40 hours from Cyberpunk 2077 and Black Myth: Wukong
  • Average clip length: 8 minutes, up to 53 minutes continuous
  • Weather variants: sunny, rainy, foggy, night, sunset
  • Motion blur variant via sub-frame interpolation
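The headline figures above are mutually consistent, as a quick arithmetic check shows:

```python
# Cross-check the stated dataset figures: 40 hours of 30 FPS footage
hours, fps = 40, 30
total_frames = hours * 3600 * fps
assert total_frames == 4_320_000  # consistent with "4M+ frames"

# 8-minute average clips over 40 hours -> roughly 300 clips
avg_clip_minutes = 8
approx_clips = (hours * 60) // avg_clip_minutes
assert approx_clips == 300
```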

Architecture Summary

RGB Video Input
      │
      ▼
┌─────────────────────────────────────┐
│  Inverse Renderer                   │
│  (Cosmos-Transfer1 7B fine-tuned)   │
│  RGB → [albedo, normal, depth,      │
│          roughness, metallic]       │
└─────────────────┬───────────────────┘
                  │  G-buffers
                  ▼
┌─────────────────────────────────────┐
│  Game Editing                       │
│  (Wan2.1 1.3B fine-tuned)           │
│  G-buffers + Text Prompt            │
│  → Stylized RGB Video               │
└─────────────────────────────────────┘
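Read as code, the diagram is two function calls chained together. A stubbed sketch of the data flow (function names are illustrative, not the repository's API; the real stages run in their separate conda environments):

```python
from typing import Dict

def extract_gbuffers(rgb_video: str) -> Dict[str, str]:
    # Stage 1 (Inverse Renderer, fine-tuned Cosmos-Transfer1 7B):
    # RGB video -> one directory of frames per G-buffer channel.
    # Stubbed: returns the output paths the real model would write.
    return {ch: f"gbuffers/{ch}" for ch in
            ("albedo", "normal", "depth", "roughness", "metallic")}

def render_stylized(gbuffers: Dict[str, str], style: str, prompt: str) -> str:
    # Stage 2 (Game Editing, fine-tuned Wan2.1 1.3B):
    # G-buffers + text prompt -> stylized RGB video. Stubbed likewise.
    return f"outputs/{style}.mp4"

def pipeline(rgb_video: str, style: str, prompt: str) -> str:
    # The only coupling between the stages is the G-buffer directory set
    return render_stylized(extract_gbuffers(rgb_video), style, prompt)
```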

Troubleshooting

Submodule not found / DiffSynth-Studio missing

# If cloned without --recurse-submodules:
git submodule update --init --recursive

CUDA Out of Memory

  • Reduce --num_frames (try 41 instead of 81)
  • Reduce resolution: --height 320 --width 576
  • Pin the job to a single free GPU (e.g. CUDA_VISIBLE_DEVICES=0) and confirm with nvidia-smi that no other processes occupy it

num_frames must follow 8n+1 pattern

Valid values: 9, 17, 25, 33, 41, 49, 57, 65, 73, 81

# Valid
--num_frames 81   # 8*10 + 1 ✓
--num_frames 41   # 8*5 + 1  ✓

# Invalid
--num_frames 80   # ✗
--num_frames 60   # ✗
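The constraint is easy to enforce programmatically; a small sketch (both helper names are ours):

```python
def is_valid_num_frames(n: int) -> bool:
    # Valid frame counts have the form 8n + 1, starting at 9
    return n >= 9 and (n - 1) % 8 == 0

def nearest_valid_num_frames(n: int) -> int:
    # Snap an arbitrary request to the closest 8n + 1 value
    k = max(1, round((n - 1) / 8))
    return 8 * k + 1
```

Snapping (rather than erroring) is handy when frame counts come from upstream video metadata.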

Checkpoint not found

# Verify checkpoint placement
ls game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors
ls inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/

Version conflicts between models

Always use the two separate conda environments (inverse_renderer and game_editing). Do not install both models' dependencies in one environment.


Citation

@article{huang2026generativeworldrenderer,
    title={Generative World Renderer},
    author={Zheng-Hui Huang and Zhixiang Wang and Jiaming Tan and Ruihan Yu and Yidan Zhang and Bo Zheng and Yu-Lun Liu and Yung-Yu Chuang and Kaipeng Zhang},
    journal={arXiv preprint arXiv:2604.02329},
    year={2026}
}

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
