
Reverse Engineer (Route-Aware)


Step 2 of 6 in the Reverse Engineering to Spec-Driven Development process:

1. Analyze
2. Reverse Engineer (this skill)
3. Create Specs
4. Gap Analysis
5. Implementation Planning
6. Implementation

Estimated time: 30-45 minutes
Prerequisites: Step 1 completed (analysis-report.md and route selection in .stackshift-state.json)
Output: 11 documentation files in docs/reverse-engineering/

Route-Dependent Behavior:

  • Greenfield: Extract business logic only (framework-agnostic)

  • Brownfield: Extract business logic + technical implementation details

Output is the same regardless of implementation framework (Spec Kit, BMAD, or BMAD Auto-Pilot). The framework choice only affects what happens after Step 2.

Configuration Check

Guard: Verify state file exists before proceeding.

```bash
if [ ! -f .stackshift-state.json ]; then
  echo "ERROR: .stackshift-state.json not found."
  echo "Step 1 (Initial Analysis) must be completed first. Run /stackshift.analyze to begin."
  exit 1
fi

DETECTION_TYPE=$(jq -r '.detection_type' .stackshift-state.json)
ROUTE=$(jq -r '.route' .stackshift-state.json)

if [ "$DETECTION_TYPE" = "null" ] || [ -z "$DETECTION_TYPE" ]; then
  echo "ERROR: detection_type missing from state file. Re-run /stackshift.analyze."
  exit 1
fi
if [ "$ROUTE" = "null" ] || [ -z "$ROUTE" ]; then
  echo "ERROR: route missing from state file. Re-run /stackshift.analyze."
  exit 1
fi

echo "Detection: $DETECTION_TYPE"
echo "Route: $ROUTE"

SPEC_OUTPUT=$(jq -r '.config.spec_output_location // "."' .stackshift-state.json)
echo "Writing specs to: $SPEC_OUTPUT"

if [ "$SPEC_OUTPUT" != "." ]; then
  mkdir -p "$SPEC_OUTPUT/docs/reverse-engineering"
  mkdir -p "$SPEC_OUTPUT/.specify/memory/specifications"
fi
```

State file structure:

```json
{
  "detection_type": "monorepo-service",
  "route": "greenfield",
  "implementation_framework": "speckit",
  "config": {
    "spec_output_location": "/git/my-new-app",
    "build_location": "/git/my-new-app",
    "target_stack": "Next.js 15..."
  }
}
```

Capture commit hash for incremental updates:

```bash
COMMIT_HASH=$(git rev-parse HEAD 2>/dev/null || echo "unknown")
COMMIT_DATE=$(git log -1 --format=%ci 2>/dev/null || date -u +"%Y-%m-%d %H:%M:%S")
echo "Pinning docs to commit: $COMMIT_HASH"
```
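On later runs, comparing the pinned hash against HEAD tells you whether the docs are stale; a minimal sketch, assuming the meta file from Step 2.1 may or may not exist yet:

```shell
# Compare the pinned commit in the docs metadata against the current HEAD.
META=docs/reverse-engineering/.stackshift-docs-meta.json
CURRENT=$(git rev-parse HEAD 2>/dev/null || echo "unknown")
PINNED=$(jq -r '.commit_hash' "$META" 2>/dev/null || echo "none")

if [ "$PINNED" = "$CURRENT" ]; then
  echo "Docs are current at $CURRENT"
else
  echo "Docs pinned to $PINNED, HEAD is $CURRENT; refresh with /stackshift.refresh-docs"
fi
```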

Extraction approach based on detection + route:

| Detection Type | Greenfield | Brownfield |
| --- | --- | --- |
| Monorepo Service | Business logic only (tech-agnostic) | Full implementation + shared packages (tech-prescriptive) |
| Nx App | Business logic only (framework-agnostic) | Full Nx/Angular implementation details |
| Generic App | Business logic only | Full implementation |

  • detection_type determines WHAT patterns to look for (shared packages, Nx project config, monorepo structure, etc.)

  • route determines HOW to document them (tech-agnostic vs tech-prescriptive)
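In a driving script this split could be expressed as a simple branch; an illustrative sketch (EXTRACTION_MODE is a hypothetical variable, not part of the skill's contract, and the greenfield fallback is for demonstration only):

```shell
# Read the route; default to greenfield if the state file is absent (demo only).
ROUTE=$(jq -r '.route' .stackshift-state.json 2>/dev/null || echo "greenfield")

case "$ROUTE" in
  greenfield)
    EXTRACTION_MODE="business-logic-only"    # tech-agnostic documentation
    ;;
  brownfield)
    EXTRACTION_MODE="full-implementation"    # tech-prescriptive documentation
    ;;
  *)
    echo "Unknown route: $ROUTE" >&2
    exit 1
    ;;
esac

echo "Extraction mode: $EXTRACTION_MODE"
```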

Phase 1: Deep Codebase Analysis

Use the Task tool with subagent_type=stackshift:stackshift-code-analyzer:AGENT to perform analysis. If the agent is unavailable, fall back to the Explore agent.

Error recovery: If a subagent fails or returns empty results for a sub-phase, retry once with the Explore agent. If the retry also fails, record the gap with an [ANALYSIS INCOMPLETE] marker and continue with remaining sub-phases.

Missing components: If a sub-phase finds no relevant code (e.g., no frontend in a backend-only service), document the absence in the corresponding output file rather than skipping the sub-phase.

Launch sub-phases 1.1 through 1.6 in parallel using separate subagent invocations. Collect all results before proceeding to Phase 2.
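The retry-once policy can be pictured as a small wrapper; this is an illustrative shell sketch in which the invoked commands (`true`, `false`) stand in for subagent calls:

```shell
# Run a sub-phase; retry once on failure; on a second failure, log the gap and continue.
run_with_retry() {
  name="$1"; shift
  if "$@" || "$@"; then
    echo "Sub-phase $name: ok"
  else
    echo "[ANALYSIS INCOMPLETE] $name" >> analysis-gaps.log
    echo "Sub-phase $name: gap recorded, continuing"
  fi
}

run_with_retry "1.1-backend" true    # stand-in for a successful invocation
run_with_retry "1.2-frontend" false  # stand-in for one that fails twice
```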

1.1 Backend Analysis

  • Find all API endpoints and record their method, route, auth requirements, parameters, and purpose.

  • Catalog every data model including schemas, types, interfaces, and field definitions.

  • Inventory all configuration sources: env vars, config files, and settings.

  • Map every external integration: APIs, services, and databases.

  • Extract business logic from services, utilities, and algorithms.

1.2 Frontend Analysis

  • List all pages and routes with their purpose and auth requirements.

  • Catalog all components by category: layout, form, and UI components.

  • Document state management: store structure and global state patterns.

  • Map the API client layer: how the frontend calls the backend.

  • Extract styling patterns: design system, themes, and component styles.

1.3 Infrastructure Analysis

  • Document deployment configuration: IaC tools, cloud provider, and services.

  • Map CI/CD pipelines and workflows.

  • Catalog database setup: type, schema, and migrations.

  • Identify storage systems: object storage, file systems, and caching.

1.4 Testing Analysis

  • Locate all test files and identify the testing frameworks in use.

  • Classify tests by type: unit, integration, and E2E.

  • Estimate coverage percentages by module.

  • Catalog test data: mocks, fixtures, and seed data.

1.5 Business Context Analysis

  • Read README, CONTRIBUTING, and any marketing or landing pages.

  • Extract package descriptions and repository metadata.

  • Identify comment patterns indicating user-facing features.

  • Collect error messages and user-facing strings for persona inference.

  • Analyze naming conventions to reveal domain concepts.

  • Examine git history for decision archaeology.

1.6 Decision Archaeology

  • Inspect dependency manifests (package.json, go.mod, requirements.txt) for technology choices.

  • Analyze config files (tsconfig, eslint, prettier) for design philosophy.

  • Review CI/CD configuration for deployment decisions.

  • Run git blame on key architectural files to identify decision points.

  • Collect comments with "why" explanations (TODO, HACK, FIXME, NOTE).

  • Look for rejected alternatives visible in git history or comments.
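A repo-wide grep surfaces these "why" comments quickly; a minimal sketch with a demo file (the path and extension filters are placeholders, not part of the skill):

```shell
# Demo input: a source file carrying a "why" comment.
mkdir -p /tmp/stackshift-demo/src
cat > /tmp/stackshift-demo/src/cache.ts <<'EOF'
// HACK: in-memory cache; Redis was rejected for cost reasons
export const cache = new Map<string, unknown>();
EOF

# Collect TODO/HACK/FIXME/NOTE comments with file and line numbers.
grep -rn -E '(TODO|HACK|FIXME|NOTE)' --include='*.ts' /tmp/stackshift-demo
```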

Progress signal: After all sub-phases complete, log: "Phase 1 complete: Analysis gathered for [list which sub-phases produced results]."

Phase 2: Generate Documentation

Create the docs/reverse-engineering/ directory and generate all 11 documentation files. For each file, apply the greenfield or brownfield variant as described in operations/output-file-specs.md. Read that file now for the detailed per-file specifications.

If .stackshift-docs-meta.json already exists, overwrite it completely with fresh metadata.

Step 2.1: Write metadata file FIRST

```bash
COMMIT_HASH=$(git rev-parse HEAD 2>/dev/null || echo "unknown")
COMMIT_DATE=$(git log -1 --format=%ci 2>/dev/null || date -u +"%Y-%m-%d %H:%M:%S")
GENERATED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
```

Write docs/reverse-engineering/.stackshift-docs-meta.json:

```json
{
  "commit_hash": "<COMMIT_HASH>",
  "commit_date": "<COMMIT_DATE>",
  "generated_at": "<GENERATED_AT>",
  "doc_count": 11,
  "route": "<greenfield|brownfield>",
  "docs": {
    "functional-specification.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "integration-points.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "configuration-reference.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "data-architecture.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "operations-guide.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "technical-debt-analysis.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "observability-requirements.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "visual-design-system.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "test-documentation.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "business-context.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" },
    "decision-rationale.md": { "generated_at": "<GENERATED_AT>", "commit_hash": "<COMMIT_HASH>" }
  }
}
```
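Rather than hand-writing the map, the file could be assembled with jq from the doc list; a sketch, assuming COMMIT_HASH, COMMIT_DATE, and ROUTE were captured earlier (the defaults here are for demonstration):

```shell
COMMIT_HASH="${COMMIT_HASH:-unknown}"
COMMIT_DATE="${COMMIT_DATE:-unknown}"
ROUTE="${ROUTE:-greenfield}"
GENERATED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

DOCS="functional-specification.md integration-points.md configuration-reference.md \
data-architecture.md operations-guide.md technical-debt-analysis.md \
observability-requirements.md visual-design-system.md test-documentation.md \
business-context.md decision-rationale.md"

mkdir -p docs/reverse-engineering

# Build the per-doc map from the list, then assemble the top-level object.
jq -n \
  --arg ch "$COMMIT_HASH" --arg cd "$COMMIT_DATE" \
  --arg ga "$GENERATED_AT" --arg route "$ROUTE" \
  --argjson names "$(printf '%s\n' $DOCS | jq -R . | jq -s .)" \
  '{
    commit_hash: $ch,
    commit_date: $cd,
    generated_at: $ga,
    doc_count: ($names | length),
    route: $route,
    docs: ($names | map({key: ., value: {generated_at: $ga, commit_hash: $ch}}) | from_entries)
  }' > docs/reverse-engineering/.stackshift-docs-meta.json
```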

Step 2.2: Add metadata header to each doc

Every generated doc starts with this header after the title:

```markdown
# [Document Title]

Generated by StackShift | Commit: <short-hash> | Date: <GENERATED_AT>
Run /stackshift.refresh-docs to update with latest changes.
```
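Stamping the header could itself be scripted once the docs exist; an illustrative sketch that inserts the two header lines after each title (the sample file is a demo stand-in):

```shell
SHORT_HASH=$(printf '%s' "${COMMIT_HASH:-unknown}" | cut -c1-7)
GENERATED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

mkdir -p docs/reverse-engineering
printf '# Sample Doc\n\nBody text.\n' > docs/reverse-engineering/sample.md  # demo file

# Rewrite each doc as: title line, blank line, header, then the rest of the body.
for doc in docs/reverse-engineering/*.md; do
  tmp=$(mktemp)
  {
    head -1 "$doc"
    printf '\nGenerated by StackShift | Commit: %s | Date: %s\n' "$SHORT_HASH" "$GENERATED_AT"
    printf 'Run /stackshift.refresh-docs to update with latest changes.\n'
    tail -n +2 "$doc"
  } > "$tmp" && mv "$tmp" "$doc"
done
```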

Step 2.3: Generate files with checkpoints

Generate files in this order, logging progress after each:

Batch 1 (core architecture):

1. functional-specification.md
2. data-architecture.md
3. integration-points.md
4. configuration-reference.md

After writing files 1-4, log: "Generated 4/11 files (core architecture complete)." Verify the output directory contains 4 files before continuing.

Batch 2 (operations and quality):

5. operations-guide.md
6. technical-debt-analysis.md
7. observability-requirements.md
8. visual-design-system.md
9. test-documentation.md

After writing files 5-9, log: "Generated 9/11 files (operations and quality complete)." Verify the output directory contains 9 files before continuing.

Batch 3 (context and decisions):

10. business-context.md
11. decision-rationale.md

After writing files 10-11, log: "Generated 11/11 files. Phase 2 complete."
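The per-batch verification can be made concrete with a small helper; a minimal sketch (the two touched files are demo stand-ins for Batch 1 output):

```shell
# Fail the checkpoint if fewer docs exist than the batch requires.
check_count() {
  expected="$1"
  actual=$(find docs/reverse-engineering -maxdepth 1 -name '*.md' 2>/dev/null | wc -l | tr -d ' ')
  if [ "$actual" -lt "$expected" ]; then
    echo "Checkpoint failed: expected at least $expected docs, found $actual" >&2
    return 1
  fi
  echo "Checkpoint ok: $actual docs present (needed $expected)"
}

mkdir -p docs/reverse-engineering
touch docs/reverse-engineering/functional-specification.md \
      docs/reverse-engineering/data-architecture.md   # demo stand-ins
check_count 2
```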

Output structure:

```
docs/reverse-engineering/
├── .stackshift-docs-meta.json
├── functional-specification.md
├── integration-points.md
├── configuration-reference.md
├── data-architecture.md
├── operations-guide.md
├── technical-debt-analysis.md
├── observability-requirements.md
├── visual-design-system.md
├── test-documentation.md
├── business-context.md
└── decision-rationale.md
```

Success Criteria

  • All 11 documentation files generated in docs/reverse-engineering/

  • Comprehensive coverage of all application aspects

  • Framework-agnostic functional specification (for greenfield)

  • Complete data model documentation

  • Business context captured with clear [INFERRED] / [NEEDS USER INPUT] markers

  • Decision rationale documented with ADR format

  • Integration points fully mapped with data flow diagrams

  • .stackshift-docs-meta.json created with commit hash for incremental updates

  • Each doc has metadata header with commit hash and generation date

Next Step

Once all documentation is generated:

For GitHub Spec Kit (implementation_framework: speckit): Proceed to Step 3 and use /stackshift.create-specs to transform the docs into .specify/ specs.

For BMAD Method (implementation_framework: bmad): Proceed to Step 6 and hand off to BMAD's *workflow-init. BMAD's PM and Architect agents use the reverse-engineering docs as context.

For BMAD Auto-Pilot (implementation_framework: bmad-autopilot): Proceed to /stackshift.bmad-synthesize to auto-generate BMAD artifacts. The 11 reverse-engineering docs provide ~90% of what BMAD needs.

DO / DON'T

DO:

  • Describe WHAT the system does, not HOW (especially for greenfield)

  • Use all available signals for inference: README, comments, naming, config, git history

  • Mark confidence levels: no marker = confident, [INFERRED] = reasonable inference, [NEEDS USER INPUT] = genuinely unknown

  • Cross-reference between docs (e.g., tech debt informs trade-offs)

  • Cite specific evidence for each inference
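The marker discipline above can be audited mechanically before sign-off; a minimal sketch with a demo doc standing in for generated output:

```shell
# Demo input: a generated doc containing both confidence markers.
mkdir -p /tmp/stackshift-audit
cat > /tmp/stackshift-audit/business-context.md <<'EOF'
Primary persona: operations engineers [INFERRED]
Revenue model: [NEEDS USER INPUT]
EOF

# List every claim still marked as inferred or awaiting user confirmation.
grep -rn -e '\[INFERRED\]' -e '\[NEEDS USER INPUT\]' /tmp/stackshift-audit
```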

DON'T:

  • Hard-code framework names in functional specs (greenfield)

  • Mix business logic with technical implementation (greenfield)

  • Fabricate business goals with no supporting evidence

  • State inferences as facts without marking them

  • Skip a section because it requires inference -- attempt it and mark confidence

Completeness Checklist

Verify analysis captured:

  • ALL API endpoints (not just the obvious ones)

  • ALL data models (including DTOs, types, interfaces)

  • ALL configuration options (check multiple files)

  • ALL external integrations

  • ALL user-facing strings and error messages (for persona/context inference)

  • ALL config files (for decision rationale inference)

Each document must be comprehensive, accurate, organized, actionable, and honest about inferred vs verified information.
