vm-and-bytecode-reverse

Custom VM and bytecode reverse engineering playbook. Use when CTF challenges or protected software implement custom virtual machines with proprietary bytecode, dispatcher loops, or maze-style challenges.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "vm-and-bytecode-reverse" with this command: npx skills add yaklang/hack-skills/yaklang-hack-skills-vm-and-bytecode-reverse

SKILL: VM & Bytecode Reverse Engineering — Expert Analysis Playbook

AI LOAD INSTRUCTION: Expert techniques for reversing custom virtual machines and bytecode interpreters. Covers dispatcher identification, opcode mapping, custom ISA reconstruction, disassembler/decompiler writing, maze challenges, and real-world VM protector analysis. Base models often fail to recognize the fetch-decode-execute pattern or attempt to analyze VM bytecode as native code.

0. RELATED ROUTING

Quick identification

Binary PatternLikely VM TypeStart With
while(1) { switch(bytecode[pc]) }Switch-based dispatcherMap each case to an operation
Indirect jump via table jmp [table + opcode*8]Table-based dispatcherDump jump table, analyze handlers
Nested if-else chain on byte valueIf-chain dispatcherSame as switch, just different syntax
Stack push/pop dominant operationsStack-based VMIdentify push, pop, arithmetic ops
reg[X] = ... array operationsRegister-based VMMap register indices to operations
2D grid + direction inputMaze challengeExtract grid, apply BFS/DFS

1. CUSTOM VM IDENTIFICATION

1.1 Structural Indicators

VM Architecture Components:
┌─────────────────────────────────┐
│  Bytecode Program (data section)│
├─────────────────────────────────┤
│  Program Counter (pc/ip)        │
│  Register File / Stack          │
│  Memory / Data Area             │
├─────────────────────────────────┤
│  Dispatcher Loop                │
│  ├─ Fetch: opcode = code[pc]    │
│  ├─ Decode: lookup handler      │
│  └─ Execute: run handler        │
└─────────────────────────────────┘

1.2 IDA/Ghidra Signatures

Switch dispatcher (most common in CTF):

while (running) {
    unsigned char op = bytecode[pc++];
    switch (op) {
        case 0x00: /* nop */       break;
        case 0x01: /* push imm */  stack[sp++] = bytecode[pc++]; break;
        case 0x02: /* add */       stack[sp-2] += stack[sp-1]; sp--; break;
        // ...
        case 0xFF: /* halt */      running = 0; break;
    }
}

Table dispatcher (more optimized):

typedef void (*handler_t)(vm_ctx_t*);
handler_t handlers[256] = { handle_nop, handle_push, handle_add, ... };

while (running) {
    handlers[bytecode[pc++]](&ctx);
}

2. ANALYSIS METHODOLOGY

Step 1: Find the Dispatcher

Look for:

  • Large switch statement (many cases) in a loop
  • Array of function pointers indexed by a byte from a data buffer
  • Single function with high cyclomatic complexity
  • Cross-references to a data buffer read byte-by-byte

Step 2: Map Opcodes to Operations

For each case/handler, determine:

PropertyHow to Identify
Opcode valueCase number or table index
Operation typeRegister/stack modifications
Operand countHow many bytes consumed after opcode
Operand typeImmediate value, register index, or memory address
Side effectsOutput, memory write, flag modification

Step 3: Extract Bytecode Program

# Typical extraction from binary
import struct

with open('challenge', 'rb') as f:
    f.seek(bytecode_offset)
    bytecode = f.read(bytecode_length)

# Or from IDA:
# bytecode = idc.get_bytes(bytecode_addr, bytecode_len)

Step 4: Write Custom Disassembler

OPCODES = {
    0x00: ("nop",  0),    # (mnemonic, operand_bytes)
    0x01: ("push", 1),    # push immediate byte
    0x02: ("pop",  0),
    0x03: ("add",  0),
    0x04: ("sub",  0),
    0x05: ("xor",  0),
    0x06: ("cmp",  0),
    0x07: ("jmp",  2),    # jump to 16-bit address
    0x08: ("je",   2),
    0x09: ("jne",  2),
    0x0A: ("mov",  2),    # mov reg, imm
    0x0B: ("load", 1),    # load from memory[operand]
    0x0C: ("store",1),    # store to memory[operand]
    0x0D: ("print",0),
    0x0E: ("read", 0),    # read input
    0xFF: ("halt", 0),
}

def disassemble(bytecode):
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op not in OPCODES:
            print(f"  {pc:04x}: UNKNOWN {op:#04x}")
            pc += 1
            continue

        mnemonic, operand_size = OPCODES[op]
        operands = bytecode[pc+1:pc+1+operand_size]
        operand_str = ' '.join(f'{b:#04x}' for b in operands)
        print(f"  {pc:04x}: {mnemonic:8s} {operand_str}")
        pc += 1 + operand_size

disassemble(bytecode)

Step 5: Analyze Disassembled Program

With the custom disassembly, apply standard reverse engineering:

  • Identify input reading (read opcode)
  • Trace data flow from input to comparison
  • Determine success/failure conditions
  • Extract the check logic (often XOR/ADD transformations of input compared against constants)

3. COMMON VM PATTERNS IN CTF

3.1 Stack-Based VM

Operations work on a stack (like JVM or Python bytecode).

OpcodeOperationStack Effect
PUSH immPush immediate value[...] → [..., imm]
POPDiscard top[..., a] → [...]
ADDAdd top two[..., a, b] → [..., a+b]
SUBSubtract[..., a, b] → [..., a-b]
MULMultiply[..., a, b] → [..., a*b]
XORBitwise XOR[..., a, b] → [..., a^b]
CMPCompare[..., a, b] → [..., (a==b)]
JMP addrUnconditional jumpno change
JZ addrJump if top is zero[..., a] → [...]
PRINTOutput top as char[..., a] → [...]
READRead char to stack[...] → [..., input]
HALTStop execution-

3.2 Register-Based VM

Operations use register indices (like x86, ARM).

OpcodeFormatOperation
MOV r, imm0x01 RR II IIreg[R] = imm16
MOV r1, r20x02 R1 R2reg[R1] = reg[R2]
ADD r1, r20x03 R1 R2reg[R1] += reg[R2]
SUB r1, r20x04 R1 R2reg[R1] -= reg[R2]
XOR r1, r20x05 R1 R2reg[R1] ^= reg[R2]
CMP r1, r20x06 R1 R2flags = compare(r1, r2)
JMP addr0x07 AA AApc = addr
JE addr0x08 AA AAif equal: pc = addr
LOAD r, [addr]0x09 RR AAreg[R] = mem[addr]
STORE [addr], r0x0A AA RRmem[addr] = reg[R]
SYSCALL0x0BI/O operation based on reg[0]
HALT0xFFstop

3.3 Brainfuck-like / Esoteric VMs

BF CommandVM EquivalentDescription
>INC ptrMove data pointer right
<DEC ptrMove data pointer left
+INC [ptr]Increment byte at pointer
-DEC [ptr]Decrement byte at pointer
.OUTPUT [ptr]Output byte at pointer
,INPUT [ptr]Input byte to pointer
[JZ forwardJump past ] if byte is zero
]JNZ backJump back to [ if byte is nonzero

4. MAZE CHALLENGES

4.1 Identification

  • Binary reads directional input (WASD, arrow keys, UDLR)
  • 2D array in data section (walls, paths, start, end)
  • Position tracking with x,y coordinates
  • Win condition at specific coordinates

4.2 Map Extraction

# Extract maze grid from binary data section
MAZE_ADDR = 0x601060
WIDTH = 20
HEIGHT = 15

# From binary dump:
maze = []
for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        cell = bytecode[MAZE_ADDR + row * WIDTH + col - base_addr]
        if cell == 0: line += "."    # path
        elif cell == 1: line += "#"  # wall
        elif cell == 2: line += "S"  # start
        elif cell == 3: line += "E"  # end
        else: line += "?"
    maze.append(line)
    print(line)

4.3 Automated Solving

from collections import deque

def solve_maze(maze, start, end):
    """BFS solver returns direction string."""
    rows, cols = len(maze), len(maze[0])
    directions = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    queue = deque([(start, "")])
    visited = {start}

    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == end:
            return path

        for name, (dr, dc) in directions.items():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and
                maze[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), path + name))

    return None

# Find start and end positions
for r, row in enumerate(maze):
    for c, cell in enumerate(row):
        if cell == 'S': start = (r, c)
        if cell == 'E': end = (r, c)

solution = solve_maze(maze, start, end)
print(f"Path: {solution}")

4.4 Direction Encoding

Different challenges encode directions differently:

EncodingUpDownLeftRight
WASDWSAD
UDLRUDLR
Arrow keys↑ (0x48)↓ (0x50)← (0x4B)→ (0x4D)
Numbers1234
Hex opcodes0x010x020x030x04

5. REAL-WORLD VM PROTECTORS

5.1 VMProtect Analysis Approach

1. Find VM entry: search for pushad/pushfd sequence
2. Identify VM context structure (registers, flags, bytecode pointer)
3. Locate handler table (often obfuscated with opaque predicates)
4. For each handler:
   a. Remove junk code / opaque predicates
   b. Identify the core operation
   c. Document handler semantics
5. Trace bytecode execution (instruction-level trace)
6. Reconstruct original code from trace

5.2 Tigress Obfuscator

Academic VM obfuscator with configurable protection layers.

FeatureApproach
Single-dispatch VMStandard handler extraction
Split handlersHandlers spread across multiple functions
Nested VMsOuter VM handler invokes inner VM
Encrypted bytecodeDynamic decryption before each fetch
Polymorphic handlersDifferent code for same operation on each build

5.3 Common VM Protector Patterns

ProtectorDispatcher StyleDifficulty
VMProtectTable + opaque predicatesHigh
Themida (Code Virtualizer)CISC-like, large handler setHigh
TigressConfigurable, academicMedium-High
Custom CTF VMSimple switchLow-Medium
MovfuscatorAll-mov computationMedium

6. TOOLS

ToolPurposeUsage
IDA ProIdentify dispatcher, reverse handlersF5 decompile, xref analysis
GhidraFree alternative with Sleigh processor modulesWrite custom processor for VM ISA
angrSymbolic execution through VMTreat entire VM as constraint system
Pin / DynamoRIODynamic instrumentation for tracingRecord opcode handler execution sequence
REVENFull-system trace recordingReplay and analyze VM execution
UnicornEmulate VM executionFast handler emulation
MiasmIR-based analysisLift VM handlers to IR for analysis
Custom PythonWrite disassembler/decompilerPer-challenge custom tooling

Ghidra Sleigh Processor Module

For recurring VM architectures, write a Sleigh processor specification:

define space ram      type=ram_space      size=2  default;
define space register type=register_space  size=1;

define register offset=0 size=1 [ R0 R1 R2 R3 FLAGS PC SP ];

define token opcode(8)
    op = (0,7)
;

:NOP    is op=0x00 { }
:PUSH   imm is op=0x01; imm { SP = SP - 1; *[ram]:1 SP = imm; }
:POP    is op=0x02 { SP = SP + 1; }
:ADD    is op=0x03 { local a = *[ram]:1 (SP+1); *[ram]:1 (SP+1) = a + *[ram]:1 SP; SP = SP + 1; }

7. DECISION TREE

Binary contains custom bytecode interpreter?
│
├─ Can you identify the dispatcher?
│  ├─ Yes (switch/table/if-chain)
│  │  ├─ Few opcodes (< 20) → Simple CTF VM
│  │  │  ├─ Stack-based → map push/pop/arithmetic ops
│  │  │  ├─ Register-based → map mov/add/cmp ops
│  │  │  └─ Write disassembler → analyze program → solve
│  │  │
│  │  └─ Many opcodes (50+) → Commercial protector
│  │     ├─ Known protector → use specific deprotection tools
│  │     └─ Custom → trace execution, pattern-match handlers
│  │
│  └─ No clear dispatcher
│     ├─ All-mov instructions → movfuscator
│     ├─ Encrypted bytecode → find decryption, dump after decode
│     └─ Split/distributed handlers → trace execution to find them
│
├─ Is it a maze challenge?
│  ├─ Extract grid from data section
│  ├─ Identify direction encoding
│  ├─ BFS/DFS to find shortest path
│  └─ Convert path to expected input format
│
├─ Is there input validation in VM?
│  ├─ Small input space → brute-force via Unicorn emulation
│  ├─ Known format → constrained angr solve
│  └─ Complex check → write disassembler, analyze check logic
│
└─ Multiple VM layers (VM in VM)?
   ├─ Analyze outer VM first
   ├─ Extract inner bytecode
   ├─ Repeat analysis for inner VM
   └─ Consider: symbolic execution may handle nested VMs directly

8. CTF SOLVING WORKFLOW

1. Run the binary — understand I/O behavior
   └─ What input does it expect? What output on success/failure?

2. Open in IDA/Ghidra — find the main loop
   └─ Look for while/for loop with switch or indirect jump

3. Identify VM components:
   ├─ Bytecode location (where is the program data?)
   ├─ PC/IP variable (how is current position tracked?)
   ├─ Registers/stack (where is VM state stored?)
   └─ I/O handlers (which opcodes read input / write output?)

4. Map all opcodes (create the ISA specification)
   └─ For each case/handler: opcode number, operation, operands

5. Write disassembler in Python
   └─ Output readable assembly for the bytecode

6. Analyze the disassembled program:
   ├─ Find input reading
   ├─ Trace transformations applied to input
   ├─ Find comparison against expected values
   └─ Reverse the transformation to find valid input

7. Solve:
   ├─ If simple transforms (XOR, ADD) → reverse manually
   ├─ If complex → feed to Z3 as constraints
   └─ If maze → extract grid, run pathfinding

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

insecure-source-code-management

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

clickjacking

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

code-obfuscation-deobfuscation

No summary provided by upstream source.

Repository SourceNeeds Review
General

hack

No summary provided by upstream source.

Repository SourceNeeds Review