Analyzing Supply Chain Malware Artifacts

Overview

Supply chain attacks compromise legitimate software distribution channels to deliver malware through trusted update mechanisms. Notable examples include SolarWinds SUNBURST (2020, affecting 18,000+ customers), 3CX SmoothOperator (2023, a cascading supply chain attack originating from Trading Technologies), and numerous npm/PyPI package poisoning campaigns. Analysis involves comparing trojanized binaries against legitimate versions, identifying injected code in build artifacts, examining code signing anomalies, and tracing the infection chain from initial compromise through payload delivery. As of 2025, supply chain attacks account for 30% of all breaches, a 100% increase from prior years.

Prerequisites

Python 3.9+ with pefile, ssdeep, hashlib
Binary diff tools (BinDiff, Diaphora)
Code signing verification tools (sigcheck, codesign)
Software composition analysis (SCA) tools
Access to legitimate software versions for comparison
Package repository monitoring (npm, PyPI, NuGet)

Practical Steps

Step 1: Binary Comparison Analysis

#!/usr/bin/env python3
"""Compare trojanized binary against legitimate version."""
import hashlib
import pefile
import sys
import json


def compare_pe_files(legitimate_path, suspect_path):
    """Compare PE file structures between legitimate and suspect versions."""
    legit_pe = pefile.PE(legitimate_path)
    suspect_pe = pefile.PE(suspect_path)

    report = {"differences": [], "suspicious_sections": [], "import_changes": []}

    # Compare sections
    legit_sections = {s.Name.rstrip(b'\x00').decode(): {
        "size": s.SizeOfRawData,
        "entropy": s.get_entropy(),
        "characteristics": s.Characteristics,
    } for s in legit_pe.sections}

    suspect_sections = {s.Name.rstrip(b'\x00').decode(): {
        "size": s.SizeOfRawData,
        "entropy": s.get_entropy(),
        "characteristics": s.Characteristics,
    } for s in suspect_pe.sections}

    # Find new or modified sections
    for name, props in suspect_sections.items():
        if name not in legit_sections:
            report["suspicious_sections"].append({
                "name": name, "reason": "New section not in legitimate version",
                "size": props["size"], "entropy": round(props["entropy"], 2),
            })
        elif abs(props["size"] - legit_sections[name]["size"]) > 1024:
            report["suspicious_sections"].append({
                "name": name, "reason": "Section size significantly changed",
                "legit_size": legit_sections[name]["size"],
                "suspect_size": props["size"],
            })

    # Compare imports
    legit_imports = set()
    if hasattr(legit_pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in legit_pe.DIRECTORY_ENTRY_IMPORT:
            for imp in entry.imports:
                if imp.name:
                    legit_imports.add(f"{entry.dll.decode()}!{imp.name.decode()}")

    suspect_imports = set()
    if hasattr(suspect_pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in suspect_pe.DIRECTORY_ENTRY_IMPORT:
            for imp in entry.imports:
                if imp.name:
                    suspect_imports.add(f"{entry.dll.decode()}!{imp.name.decode()}")

    new_imports = suspect_imports - legit_imports
    if new_imports:
        report["import_changes"] = list(new_imports)

    # Check code signing
    report["legit_signed"] = bool(legit_pe.OPTIONAL_HEADER.DATA_DIRECTORY[4].Size)
    report["suspect_signed"] = bool(suspect_pe.OPTIONAL_HEADER.DATA_DIRECTORY[4].Size)

    return report


def hash_file(filepath):
    """Calculate multiple hashes for a file."""
    hashes = {}
    with open(filepath, 'rb') as f:
        data = f.read()
    for algo in ['md5', 'sha1', 'sha256']:
        h = hashlib.new(algo)
        h.update(data)
        hashes[algo] = h.hexdigest()
    return hashes


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <legitimate_binary> <suspect_binary>")
        sys.exit(1)
    report = compare_pe_files(sys.argv[1], sys.argv[2])
    print(json.dumps(report, indent=2))

Validation Criteria

Trojanized components identified through binary diffing
Injected code isolated and analyzed separately
Code signing anomalies documented
Infection timeline reconstructed from build artifacts
Downstream impact scope assessed across affected systems
IOCs extracted for detection and blocking