code-obfuscation-deobfuscation

Code obfuscation analysis and deobfuscation playbook. Use when reversing binaries protected by junk code, opaque predicates, self-modifying code, control flow flattening, VM protection, or string encryption.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "code-obfuscation-deobfuscation" with this command: npx skills add yaklang/hack-skills/yaklang-hack-skills-code-obfuscation-deobfuscation

SKILL: Code Obfuscation & Deobfuscation — Expert Analysis Playbook

AI LOAD INSTRUCTION: Expert techniques for identifying, classifying, and defeating code obfuscation in native binaries. Covers junk code, opaque predicates, SMC, control flow flattening, movfuscator, VM protectors (VMProtect/Themida/Code Virtualizer), string encryption, import hiding, and anti-disassembly tricks. Base models often conflate packing with obfuscation and miss the distinction between static and dynamic deobfuscation strategies.

0. RELATED ROUTING

Quick identification picks

Symptom in IDA/GhidraLikely ObfuscationStart With
Flat CFG, single giant switchControl flow flatteningSymbolic execution to recover CFG
Only mov instructionsmovfuscatordemovfuscation / trace-based lifting
pushad/pushfd → VM entryVM protectorHandler table extraction
XOR loop before code executionSMC / string encryptionDynamic analysis, breakpoint after decode
Impossible conditions (opaque predicates)Junk code insertionPattern-based removal
All strings unreadableString encryptionHook decryption routine, or emulate
No imports in IATImport hidingTrace GetProcAddress / hash resolution

1. JUNK CODE & OPAQUE PREDICATES

1.1 Junk Code Insertion

Dead code that never affects program output, added to increase analysis time.

Identification:

  • Instructions that write to registers/memory never read afterward
  • Function calls whose return values are discarded and have no side effects
  • Loops with invariant bounds that compute unused results

Removal strategy:

  1. Compute def-use chains (IDA/Ghidra data flow analysis)
  2. Mark instructions with no downstream use as dead
  3. Verify removal doesn't change program behavior (trace comparison)

1.2 Opaque Predicates

Conditional branches where the condition is always true or always false, but this is non-obvious.

TypeExampleAlways Evaluates To
Arithmeticx² ≥ 0True
Number theoryx*(x+1) % 2 == 0True (product of consecutive ints)
Pointer-basedptr == ptr after aliasingTrue
Hash-basedCRC32(constant) == known_valueTrue

Deobfuscation:

  • Abstract interpretation: prove the condition is constant
  • Symbolic execution: Z3 proves ∀x: predicate(x) = True
  • Pattern matching: recognize known opaque predicate families
  • Dynamic: trace and observe the branch is never taken / always taken
import z3
x = z3.BitVec('x', 32)
s = z3.Solver()
s.add(x * (x + 1) % 2 != 0)
print(s.check())  # unsat → always true

2. SELF-MODIFYING CODE (SMC)

Runtime code patching: encrypted code is decrypted just before execution.

2.1 XOR Decryption Loop (Most Common)

lea esi, [encrypted_code]
mov ecx, code_length
mov al, xor_key
decrypt_loop:
    xor byte [esi], al
    inc esi
    loop decrypt_loop
    jmp encrypted_code  ; now decrypted

2.2 Analysis Strategy

1. Identify the decryption routine (look for XOR/ADD/SUB in loops writing to .text)
2. Set breakpoint AFTER the loop completes
3. At breakpoint: dump the decrypted memory region
4. Re-analyze the dumped code in IDA/Ghidra
5. For multi-layer: repeat for each decryption stage

2.3 Automated Unpacking via Emulation

from unicorn import *
from unicorn.x86_const import *

mu = Uc(UC_ARCH_X86, UC_MODE_32)
mu.mem_map(0x400000, 0x10000)
mu.mem_write(0x400000, binary_code)
mu.emu_start(decrypt_entry, decrypt_end)
decrypted = mu.mem_read(code_start, code_length)

3. CONTROL FLOW FLATTENING (CFF)

3.1 Structure

Original sequential blocks are transformed into a dispatcher loop:

Original:      A → B → C → D

Flattened:     ┌──────────────────┐
               │   dispatcher     │
               │   switch(state)  │◄─────┐
               ├──────────────────┤      │
               │ case 1: block A  │──────┤
               │ case 2: block B  │──────┤
               │ case 3: block C  │──────┤
               │ case 4: block D  │──────┘
               └──────────────────┘

Each block sets state = next_state before jumping back to the dispatcher.

3.2 Recovery Techniques

TechniqueToolEffectiveness
Symbolic executionangr, Triton, miasmHigh — traces all state transitions
Trace-based recoveryPin/DynamoRIO trace → reconstruct CFGMedium — covers executed paths only
Pattern matchingCustom IDA/Ghidra scriptMedium — works for known flatteners
D-810 (IDA plugin)IDA ProHigh — specifically designed for CFF

3.3 Symbolic Deflattening (angr approach)

import angr, claripy

proj = angr.Project('./obfuscated')
cfg = proj.analyses.CFGFast()

# Find dispatcher block (highest in-degree basic block)
dispatcher = max(cfg.graph.nodes(), key=lambda n: cfg.graph.in_degree(n))

# For each case block, symbolically determine successor
for block in case_blocks:
    state = proj.factory.blank_state(addr=block.addr)
    # ... solve state variable to find real successor

4. MOVFUSCATOR

4.1 Concept

All computation reduced to mov instructions only (Turing-complete via memory-mapped computation tables). Created by Christopher Domas.

4.2 Identification

  • Function contains only mov instructions (no add, sub, xor, jmp, call)
  • Large lookup tables in data section
  • Memory-mapped flag registers

4.3 Demovfuscation

ApproachDescription
demovfuscator (tool)Static analysis, recovers original operations from mov patterns
Trace + taint analysisRun with Pin/DynamoRIO, taint inputs, observe computation
Symbolic executionTreat entire function as constraint system

5. VM PROTECTION (VMProtect / Themida / Code Virtualizer)

5.1 VM Architecture

Protected code → bytecode compiler → custom bytecode
Runtime: VM entry (pushad/pushfd) → fetch → decode → execute → VM exit (popad/popfd)

5.2 VM Entry Point Identification

; Typical VMProtect entry
pushad                    ; save all registers
pushfd                    ; save flags
mov ebp, esp              ; VM stack frame
sub esp, VM_LOCALS_SIZE   ; allocate VM context
mov esi, bytecode_addr    ; bytecode instruction pointer
jmp vm_dispatcher         ; enter VM loop

5.3 Handler Table Extraction

1. Find dispatcher (large switch or indirect jump via table)
2. Each case/entry = one VM handler (implements one VM opcode)
3. Map handler addresses to operations by analyzing each handler:
   - Handler reads operand from bytecode stream (esi)
   - Performs operation on VM registers/stack
   - Advances bytecode pointer
   - Returns to dispatcher

5.4 Devirtualization Approaches

MethodDescriptionTool
Manual handler mappingReverse each handler, build ISA specIDA + scripting
Trace recordingRecord all handler executions, reconstruct programREVEN, Pin
Symbolic liftingSymbolically execute handlers, lift to IRTriton, miasm
Pattern matchingMatch handler patterns to known VM familiesCustom scripts

5.5 VMProtect Specifics

  • Uses opaque predicates in dispatcher
  • Handler mutation: same opcode, different handler code per build
  • Multiple VM layers (VM inside VM)
  • Integrates anti-debug and integrity checks

6. STRING ENCRYPTION

6.1 Common Patterns

PatternExampleRecovery
XOR loopfor (i=0; i<len; i++) s[i] ^= key;Hook or emulate XOR function
Stack stringsmov [esp+0], 'H'; mov [esp+1], 'e'; ...IDA FLIRT / Ghidra script to reassemble
RC4 encryptedEncrypted blob + RC4 key in binaryExtract key, decrypt offline
AES encryptedEncrypted blob + AES key derived at runtimeHook after decryption
Custom encodingBase64 + XOR + reverseTrace the decode function, replicate

6.2 Automated String Decryption

# Ghidra script: find XOR decryption calls, emulate them
from ghidra.program.model.symbol import SourceType

decrypt_func = getFunction("decrypt_string")
refs = getReferencesTo(decrypt_func.getEntryPoint())

for ref in refs:
    call_addr = ref.getFromAddress()
    # extract arguments (encrypted buffer ptr, key, length)
    # emulate decryption, add comment with plaintext

7. IMPORT HIDING

7.1 GetProcAddress + Hash Lookup

FARPROC resolve(DWORD hash) {
    // Walk PEB → LDR → InMemoryOrderModuleList
    // For each DLL, walk export table
    // Hash each export name, compare with target hash
    // Return matching function pointer
}

7.2 Recovery

  1. Identify the hash algorithm (common: CRC32, djb2, ROR13+ADD)
  2. Compute hashes for all known API names
  3. Build hash → API name lookup table
  4. Annotate resolved calls in IDA/Ghidra

7.3 Common Hash Algorithms

NameAlgorithmUsed By
ROR13hash = (hash >> 13 | hash << 19) + charMetasploit shellcode
djb2hash = hash * 33 + charVarious malware
CRC32Standard CRC32 of function nameSophisticated packers
FNV-1ahash = (hash ^ char) * 0x01000193Modern malware

8. ANTI-DISASSEMBLY TRICKS

8.1 Techniques

TrickMechanismFix
Overlapping instructionsjmp $+2; db 0xE8 (fake call prefix)Manual re-analysis from correct offset
Misaligned jumpsJump into middle of multi-byte instructionForce IDA to re-analyze at target
Conditional jump pairjz $+5; jnz $+3 (always jumps, confuses linear disasm)Convert to unconditional jmp
Return address manipulationpush addr; ret instead of jmp addrRecognize push+ret as jump
Exception-based flowTrigger exception, real code in handlerAnalyze exception handler chain
Call + add [esp]call $+5; add [esp], N; ret (computed jump)Calculate actual target

8.2 IDA Fixes

Right-click → Undefine (U)
Right-click → Code (C) at correct offset
Edit → Patch → Assemble (for permanent fix)

9. DECISION TREE

Obfuscated binary — how to approach?
│
├─ Can you run it?
│  ├─ Yes → Dynamic analysis first
│  │  ├─ Set BP on interesting APIs (file, network, crypto)
│  │  ├─ Trace execution to understand real behavior
│  │  └─ Dump decrypted code/strings at runtime
│  │
│  └─ No (embedded/firmware/exotic arch) → Static only
│     └─ Identify obfuscation type from patterns below
│
├─ What does the code look like?
│  │
│  ├─ Giant flat switch/dispatcher loop?
│  │  ├─ State variable drives control flow → CFF
│  │  │  └─ Use D-810 or symbolic deflattening
│  │  └─ Bytecode fetch-decode-execute → VM protection
│  │     └─ Extract handlers, build disassembler
│  │
│  ├─ Only mov instructions?
│  │  └─ movfuscator → demovfuscator tool
│  │
│  ├─ XOR/ADD loop writing to .text section?
│  │  └─ SMC → breakpoint after decode, dump
│  │
│  ├─ Impossible conditions in branches?
│  │  └─ Opaque predicates → Z3 proving or pattern removal
│  │
│  ├─ Disassembly looks wrong / functions overlap?
│  │  └─ Anti-disassembly → manual re-analysis at correct offsets
│  │
│  ├─ No readable strings?
│  │  └─ String encryption → hook decrypt function or emulate
│  │
│  ├─ No imports in IAT?
│  │  └─ Import hiding → identify hash, build lookup table
│  │
│  └─ pushad/pushfd → complex code → popad/popfd?
│     └─ VM protector entry/exit → full VM analysis
│
└─ What tool to use?
   ├─ Known protector (VMProtect/Themida) → specific deprotection guide
   ├─ Custom obfuscation → combine: IDA scripting + Triton + manual
   ├─ CTF challenge → angr symbolic execution often fastest
   └─ Malware analysis → dynamic (debugger + API monitor) first

10. TOOLBOX

ToolPurposeBest For
IDA Pro + Hex-RaysDisassembly, decompilation, scriptingAll-around analysis
GhidraFree alternative with scripting (Java/Python)Budget-friendly RE
D-810 (IDA plugin)Automated CFF deflatteningOLLVM-style obfuscation
miasmIR-based analysis frameworkSymbolic deobfuscation
TritonDynamic symbolic executionOpaque predicate solving, CFF
REVENFull-system trace recording and replayVM protector analysis
demovfuscatormovfuscator reversalmov-only binaries
x64dbg + pluginsDynamic analysis with scriptingWindows RE
Unicorn EngineCPU emulationSMC unpacking, shellcode
CapstoneDisassembly libraryCustom tooling
IDA FLIRTFunction signature matchingIdentify library code in stripped binaries
Binary NinjaAlternative disassembler with MLIL/HLILAutomated analysis

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

insecure-source-code-management

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

clickjacking

No summary provided by upstream source.

Repository SourceNeeds Review
General

hack

No summary provided by upstream source.

Repository SourceNeeds Review
General

api-sec

No summary provided by upstream source.

Repository SourceNeeds Review