duplication-detect

Find and eliminate code duplication with DRY refactoring strategies


Install skill "duplication-detect" with: `npx skills add manastalukdar/claude-devstudio/manastalukdar-claude-devstudio-duplication-detect`

# Code Duplication Detection & DRY Refactoring

I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.

## Detection Capabilities

  • Exact code duplication (copy-paste detection)
  • Similar code patterns (semantic duplication)
  • Repeated logic across files
  • Duplicated constants and magic numbers
  • Repeated test patterns

## Supported Languages

  • JavaScript/TypeScript
  • Python
  • Go
  • Java
  • Generic text-based detection

## Token Optimization

This skill uses duplication detection-specific patterns to minimize token usage:

1. Source Directory Detection Caching (500 token savings)

Pattern: Cache project structure and source directories

  • Store structure in .duplication-structure-cache (1 hour TTL)
  • Cache: source directories, file extensions, excluded paths
  • Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
  • Invalidate on directory structure changes
  • Savings: 91% on repeat duplication checks
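The cache check above can be sketched in a few lines of shell; the `.duplication-structure-cache` path comes from the list, while the `stat` fallback chain (GNU `-c %Y`, BSD `-f %m`) is an assumption about the platforms involved:

```shell
# Reuse the cached source-directory list if it is younger than the TTL.
CACHE=".duplication-structure-cache"
TTL_SECONDS=3600

now=$(date +%s)
if [ -f "$CACHE" ]; then
    # GNU stat uses -c %Y; BSD/macOS stat uses -f %m
    mtime=$(stat -c %Y "$CACHE" 2>/dev/null || stat -f %m "$CACHE" 2>/dev/null || echo "$now")
    age=$(( now - mtime ))
else
    age=$(( TTL_SECONDS + 1 ))   # no cache yet: force a refresh
fi

if [ "$age" -le "$TTL_SECONDS" ]; then
    SOURCE_DIRS=$(cat "$CACHE")
else
    SOURCE_DIRS=""
    for dir in src lib app; do
        [ -d "$dir" ] && SOURCE_DIRS="$SOURCE_DIRS $dir"
    done
    printf '%s' "$SOURCE_DIRS" > "$CACHE"
fi
```

Invalidation on structure changes would additionally compare a directory-listing hash, which is omitted here for brevity.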

2. Bash-Based Duplication Tool Execution (1,800 token savings)

Pattern: Use jscpd/PMD directly via bash

  • JavaScript: jscpd --format json (300 tokens)
  • Python: pylint --enable=duplicate-code (300 tokens)
  • Generic: simian or custom grep-based detection (400 tokens)
  • Parse JSON output with jq
  • No Task agents for duplication detection
  • Savings: 90% vs Task-based duplication analysis

3. Sample-Based Duplication Reporting (900 token savings)

Pattern: Report first 10 duplication instances only

  • Show top 10 duplications by severity (600 tokens)
  • Count remaining duplications without details
  • Full report via --all flag
  • Savings: 65% vs reporting every duplication
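Sample-based reporting is a `head` plus a remainder count. This sketch assumes a hypothetical `findings.txt` with one finding per line; a real run would slice the jscpd JSON instead:

```shell
TOP_N=10

# Show the first N findings, then summarize how many were omitted.
report_sample() {
    total=$(( $(wc -l < findings.txt) ))
    head -n "$TOP_N" findings.txt
    if [ "$total" -gt "$TOP_N" ]; then
        echo "... and $(( total - TOP_N )) more (use --all for the full list)"
    fi
}

printf 'dup-%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > findings.txt   # hypothetical findings
report_sample
```

With twelve findings this prints the first ten plus a one-line summary of the two omitted ones.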

4. Template-Based DRY Refactoring Recommendations (800 token savings)

Pattern: Use predefined DRY patterns

  • Standard strategies: extract function, extract constant, inheritance, composition
  • Pattern-based recommendations for duplication types
  • No creative refactoring generation
  • Savings: 80% vs LLM-generated DRY strategies
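The template lookup can be as simple as a case statement. The duplication-type labels here (`exact-block`, `magic-number`, …) are hypothetical names for illustration, not jscpd output:

```shell
# Map a detected duplication type to a canned DRY recommendation.
recommend_dry_pattern() {
    case "$1" in
        exact-block)    echo "Extract function: move the shared block into a utility" ;;
        magic-number)   echo "Extract constant: centralize the literal in configuration" ;;
        similar-algo)   echo "Template method: share the structure, override the varying steps" ;;
        varying-filter) echo "Higher-order function: parameterize the differing predicate" ;;
        *)              echo "Review manually: no template matches" ;;
    esac
}

recommend_dry_pattern magic-number
```

Because the strings are fixed, no tokens are spent generating bespoke refactoring advice.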

5. Incremental Duplication Checks (1,000 token savings)

Pattern: Check only changed files via git diff

  • Analyze files modified since last commit (500 tokens)
  • Check if changes introduce new duplication
  • Full codebase analysis via --full flag
  • Savings: 75% vs full codebase duplication detection
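A minimal sketch of the incremental scope, assuming a git repository and jscpd on PATH (`--min-lines` is jscpd's minimum-clone-size option); outside a repository it degrades to a no-op:

```shell
# Collect source files changed since the last commit; empty outside a repo.
changed_source_files() {
    git rev-parse --git-dir >/dev/null 2>&1 || return 0
    git diff --name-only HEAD -- '*.js' '*.ts' '*.py' 2>/dev/null
}

files=$(changed_source_files)
if [ -n "$files" ]; then
    # Only the changed files are scanned, not the whole tree.
    echo "$files" | xargs jscpd --min-lines 6 2>/dev/null || echo "jscpd unavailable"
else
    echo "No changed source files; skipping duplication check"
fi
```

The `--full` flag would bypass `changed_source_files` and pass the source directories instead.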

6. Grep-Based Similar Pattern Discovery (700 token savings)

Pattern: Find potential duplications with grep

  • Grep for repeated patterns: function signatures, constant values (300 tokens)
  • Flag files with high similarity
  • Run full tool only on flagged file pairs
  • Savings: 70% vs running tool on all file combinations

7. Cached Duplication Baseline (600 token savings)

Pattern: Store baseline duplication report

  • Cache initial duplication report
  • Compare new runs against baseline to detect new duplications
  • Focus on newly introduced duplications
  • Savings: 80% by focusing on deltas
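Baseline comparison can be a crude count diff. `baseline.json` and `current.json` are hypothetical report paths, and the `grep` extraction stands in for a proper `jq '.statistics.total.clones'` query:

```shell
# Pull the total clone count out of a jscpd-style JSON report.
clone_count() {
    grep -o '"clones":[ ]*[0-9]*' "$1" | head -1 | grep -o '[0-9]*$'
}

# Positive result = clones introduced since the baseline was recorded.
new_clones() {
    baseline=$(clone_count "$1"); current=$(clone_count "$2")
    echo $(( ${current:-0} - ${baseline:-0} ))
}

printf '{"statistics":{"total":{"clones":4}}}' > baseline.json
printf '{"statistics":{"total":{"clones":7}}}' > current.json
new_clones baseline.json current.json   # → 3
```

Only the delta (here, 3 new clones) needs detailed reporting; the baseline findings are already known.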

8. Threshold-Based Filtering (400 token savings)

Pattern: Filter out small duplications

  • Default: 6+ lines of duplication
  • Skip trivial duplications (imports, boilerplate)
  • Adjustable threshold
  • Savings: 70% by filtering noise
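The same idea as a stream filter, assuming rows of `<lines> <fileA> <fileB>` — a hypothetical intermediate format, not jscpd's own output:

```shell
MIN_LINES=6

# Keep only rows whose first column (duplicated line count) meets the threshold.
filter_significant() {
    awk -v min="$MIN_LINES" '$1 >= min'
}

printf '3 a.js b.js\n12 c.py d.py\n' | filter_significant   # keeps only the 12-line clone
```

Raising `MIN_LINES` trades recall for signal; 6 lines is a common default that drops import blocks and boilerplate.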

## Real-World Token Usage Distribution

Typical operation patterns:

  • Check recent changes (git diff scope): 1,200 tokens
  • Show top 10 duplications: 1,400 tokens
  • Full codebase analysis (first time): 3,000 tokens
  • Cached baseline comparison: 800 tokens
  • Refactoring recommendations (top 5): 1,800 tokens
  • Most common: Incremental checks on recent changes

Expected per-analysis: 1,500-2,500 tokens (60% reduction from the 3,500-6,000 token baseline). Real-world average: 1,100 tokens, thanks to incremental checks, sample-based reporting, and the cached baseline.

Arguments: $ARGUMENTS - optional: minimum duplication threshold (default: 6 lines) or specific directories

<think> Code duplication indicates:

  • Copy-paste programming (high maintenance cost)
  • Missing abstraction opportunities
  • Violation of the DRY principle
  • Increased bug surface (fix in one place, miss others)
  • Higher cognitive load

DRY refactoring strategies:

  • Extract function/method (most common)
  • Extract constant/configuration
  • Inheritance or composition
  • Higher-order functions
  • Template method pattern
  • Strategy pattern for variations
</think>

## Phase 1: Tool Setup & Configuration

First, I'll set up duplication detection tools:

```bash
#!/bin/bash
# Duplication Detection - Setup & Configuration

echo "=== Code Duplication Detection ==="
echo ""

# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}"  # Minimum lines for duplication detection

echo "Configuration:"
echo "  Minimum duplication threshold: $MIN_LINES lines"
echo "  Analysis directory: $ANALYSIS_DIR"
echo ""

# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""

# Common source directories
for dir in src lib app components pages services utils helpers; do
    if [ -d "$dir" ]; then
        SOURCE_DIRS="$SOURCE_DIRS $dir"
        echo "  ✓ Found: $dir/"
    fi
done

# Test directories
for dir in tests test __tests__; do
    if [ -d "$dir" ]; then
        echo "  ✓ Found test directory: $dir/"
    fi
done

if [ -z "$SOURCE_DIRS" ]; then
    echo "  Using current directory"
    SOURCE_DIRS="."
fi

echo ""
```

## Phase 2: Install Detection Tools

I'll install and configure duplication detection tools:

```bash
echo "=== Installing Duplication Detection Tools ==="
echo ""

install_jscpd() {
    # JSCPD - Multi-language copy-paste detector
    if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
        echo "Installing jscpd (copy-paste detector)..."
        npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd

        if [ $? -eq 0 ]; then
            echo "✓ jscpd installed"
        else
            echo "⚠️  Failed to install jscpd - using basic detection"
            return 1
        fi
    else
        echo "✓ jscpd already installed"
    fi

    # Create jscpd configuration (minLines = smallest clone size to report)
    cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
    "minLines": $MIN_LINES,
    "reporters": ["json", "html"],
    "ignore": [
        "**/*.min.js",
        "**/node_modules/**",
        "**/dist/**",
        "**/build/**",
        "**/.git/**",
        "**/coverage/**",
        "**/__pycache__/**",
        "**/venv/**",
        "**/.venv/**"
    ],
    "format": ["javascript", "typescript", "python", "go", "java"],
    "absolute": false,
    "output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG

    echo "✓ jscpd configuration created"
    return 0
}

# Install tool
if install_jscpd; then
    USE_JSCPD=true
else
    USE_JSCPD=false
    echo "ℹ️  Will use basic grep-based detection"
fi

echo ""
```

## Phase 3: Detect Code Duplication

I'll analyze the codebase for duplicated code:

```bash
echo "=== Analyzing Code Duplication ==="
echo ""

run_jscpd_analysis() {
    echo "Running jscpd analysis..."
    echo ""

    jscpd $SOURCE_DIRS \
        --config "$ANALYSIS_DIR/.jscpd.json" \
        2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"

    # $? after a pipe reports tee's status; PIPESTATUS[0] holds jscpd's exit code
    if [ "${PIPESTATUS[0]}" -eq 0 ]; then
        echo ""
        echo "✓ jscpd analysis complete"

        # Check if duplications found
        if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
            DUPLICATIONS=$(jq '.statistics.total.clones // 0' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
            PERCENTAGE=$(jq '.statistics.total.percentage // 0' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)

            echo ""
            echo "Duplication Statistics:"
            echo "  Total duplications: $DUPLICATIONS"
            echo "  Duplication percentage: $PERCENTAGE%"
            echo ""

            # Extract top duplications (fragment text omitted to keep output small)
            echo "Top Duplicated Blocks:"
            jq -r '.duplicates[] |
                "\(.format) - \(.lines) lines duplicated (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
                "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
        fi
    else
        echo "⚠️  jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
        return 1
    fi
}

run_basic_duplication_detection() {
    echo "Running basic duplication detection..."
    echo ""

    # Find similar function signatures
    echo "Detecting similar function signatures..."

    # JavaScript/TypeScript functions
    if find $SOURCE_DIRS \( -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" \) 2>/dev/null | grep -q .; then
        grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
            --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
            echo "  Found duplicate function signatures (JavaScript/TypeScript):"
            cat "$ANALYSIS_DIR/duplicate-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Python functions
    if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
        grep -rh "^def \|^async def " \
            --include="*.py" \
            $SOURCE_DIRS 2>/dev/null | \
            sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"

        if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
            echo "  Found duplicate function signatures (Python):"
            cat "$ANALYSIS_DIR/duplicate-python-functions.txt" | sed 's/^/    /'
            echo ""
        fi
    fi

    # Find magic numbers (potential constants)
    echo "Detecting magic numbers (repeated literals)..."

    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"

    if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        echo "  Most common numeric literals:"
        cat "$ANALYSIS_DIR/magic-numbers.txt" | sed 's/^/    /'
        echo ""
    fi

    # Find repeated strings
    echo "Detecting repeated string literals..."

    find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
        xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
        sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"

    if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        echo "  Most common string literals:"
        cat "$ANALYSIS_DIR/repeated-strings.txt" | sed 's/^/    /'
        echo ""
    fi
}

# Run appropriate analysis
if [ "$USE_JSCPD" = true ]; then
    if ! run_jscpd_analysis; then
        echo "Falling back to basic detection..."
        run_basic_duplication_detection
    fi
else
    run_basic_duplication_detection
fi
```

## Phase 4: Generate DRY Refactoring Strategies

I'll provide specific refactoring patterns for eliminating duplication:

echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""

cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns

Based on obra YAGNI/DRY principles: Don't Repeat Yourself

---

## Pattern 1: Extract Function

**Problem:** Same code block repeated in multiple places

**Solution:** Extract to a reusable function

### Before (JavaScript)
```javascript
// File 1
function processOrderA(order) {
    if (!order.customer || !order.customer.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items || order.items.length === 0) {
        throw new Error('Empty order');
    }
    // ... process order
}

// File 2
function processOrderB(order) {
    if (!order.customer || !order.customer.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items || order.items.length === 0) {
        throw new Error('Empty order');
    }
    // ... different processing
}

```

### After

```javascript
// utils/validation.js
export function validateOrder(order) {
    if (!order.customer?.email) {
        throw new Error('Invalid customer');
    }
    if (!order.items?.length) {
        throw new Error('Empty order');
    }
}

// File 1
import { validateOrder } from './utils/validation';

function processOrderA(order) {
    validateOrder(order);
    // ... process order
}

// File 2
import { validateOrder } from './utils/validation';

function processOrderB(order) {
    validateOrder(order);
    // ... different processing
}
```

**DRY Improvement:** 8 duplicated lines → 1 function call


---

## Pattern 2: Extract Configuration/Constants

**Problem:** Same literals repeated across the codebase

**Solution:** Centralize them in configuration

### Before (Python)

```python
# Multiple files with repeated values
def calculate_shipping(weight):
    if weight < 10:
        return 5.99
    return 9.99

def check_free_shipping(total):
    return total >= 50.00

def apply_discount(quantity):
    if quantity >= 5:
        return 0.10
    return 0
```

### After

```python
# config/constants.py
class ShippingConfig:
    FREE_SHIPPING_THRESHOLD = 50.00
    LIGHT_PACKAGE_WEIGHT = 10
    LIGHT_PACKAGE_COST = 5.99
    HEAVY_PACKAGE_COST = 9.99

class DiscountConfig:
    BULK_QUANTITY = 5
    BULK_DISCOUNT = 0.10

# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig

def calculate_shipping(weight):
    if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
        return ShippingConfig.LIGHT_PACKAGE_COST
    return ShippingConfig.HEAVY_PACKAGE_COST

def check_free_shipping(total):
    return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD

def apply_discount(quantity):
    if quantity >= DiscountConfig.BULK_QUANTITY:
        return DiscountConfig.BULK_DISCOUNT
    return 0
```

**DRY Improvement:** Magic numbers eliminated, single source of truth


---

## Pattern 3: Template Method Pattern

**Problem:** Similar algorithms with slight variations

**Solution:** Use the template method or strategy pattern

### Before (TypeScript)

```typescript
class PDFReport {
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatPDFContent();
        this.addPDFFooter();
        this.savePDF();
    }

    private loadData() { /* same logic */ }
    private formatHeader() { /* same logic */ }
    private formatPDFContent() { /* PDF-specific */ }
    private addPDFFooter() { /* PDF-specific */ }
    private savePDF() { /* PDF-specific */ }
}

class ExcelReport {
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatExcelContent();
        this.addExcelFooter();
        this.saveExcel();
    }

    private loadData() { /* DUPLICATE logic */ }
    private formatHeader() { /* DUPLICATE logic */ }
    private formatExcelContent() { /* Excel-specific */ }
    private addExcelFooter() { /* Excel-specific */ }
    private saveExcel() { /* Excel-specific */ }
}
```

### After

```typescript
abstract class Report {
    // Template method
    generate() {
        this.loadData();
        this.formatHeader();
        this.formatContent();
        this.addFooter();
        this.save();
    }

    // Common implementations
    protected loadData() {
        // Shared logic
    }

    protected formatHeader() {
        // Shared logic
    }

    // Abstract methods for subclasses
    protected abstract formatContent(): void;
    protected abstract addFooter(): void;
    protected abstract save(): void;
}

class PDFReport extends Report {
    protected formatContent() { /* PDF-specific */ }
    protected addFooter() { /* PDF-specific */ }
    protected save() { /* PDF-specific */ }
}

class ExcelReport extends Report {
    protected formatContent() { /* Excel-specific */ }
    protected addFooter() { /* Excel-specific */ }
    protected save() { /* Excel-specific */ }
}
```

**DRY Improvement:** Shared logic in base class, variations in subclasses


---

## Pattern 4: Higher-Order Functions

**Problem:** Similar operations with different behaviors

**Solution:** Use higher-order functions or callbacks

### Before (JavaScript)

```javascript
function processUsersForEmail(users) {
    const results = [];
    for (const user of users) {
        if (user.email && user.active) {
            results.push(user);
        }
    }
    return results;
}

function processUsersForSMS(users) {
    const results = [];
    for (const user of users) {
        if (user.phone && user.active) {
            results.push(user);
        }
    }
    return results;
}

function processUsersForPush(users) {
    const results = [];
    for (const user of users) {
        if (user.deviceToken && user.active) {
            results.push(user);
        }
    }
    return results;
}
```

### After

```javascript
function processUsers(users, channel) {
    const validators = {
        email: user => user.email && user.active,
        sms: user => user.phone && user.active,
        push: user => user.deviceToken && user.active
    };

    return users.filter(validators[channel]);
}

// Or more flexible with a custom validator
function processUsers(users, isValid) {
    return users.filter(isValid);
}

// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);
```

**DRY Improvement:** 3 functions → 1 configurable function


---

## Pattern 5: Composition Over Duplication

**Problem:** Repeated utility combinations

**Solution:** Compose smaller utilities

### Before (Go)

```go
// Repeated validation patterns
func ValidateUser(user User) error {
    if user.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(user.Email, "@") {
        return errors.New("invalid email")
    }
    if user.Age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func ValidateAdmin(admin Admin) error {
    if admin.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(admin.Email, "@") {
        return errors.New("invalid email")
    }
    if admin.Age < 18 {
        return errors.New("must be 18+")
    }
    if admin.Role == "" {
        return errors.New("role required")
    }
    return nil
}
```

### After

```go
// Composable validators
func requireEmail(email string) error {
    if email == "" {
        return errors.New("email required")
    }
    return nil
}

func validateEmailFormat(email string) error {
    if !strings.Contains(email, "@") {
        return errors.New("invalid email")
    }
    return nil
}

func requireAdult(age int) error {
    if age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func requireRole(role string) error {
    if role == "" {
        return errors.New("role required")
    }
    return nil
}

// Compose validators
func ValidateUser(user User) error {
    validators := []func() error{
        func() error { return requireEmail(user.Email) },
        func() error { return validateEmailFormat(user.Email) },
        func() error { return requireAdult(user.Age) },
    }
    return runValidators(validators)
}

func ValidateAdmin(admin Admin) error {
    validators := []func() error{
        func() error { return requireEmail(admin.Email) },
        func() error { return validateEmailFormat(admin.Email) },
        func() error { return requireAdult(admin.Age) },
        func() error { return requireRole(admin.Role) },
    }
    return runValidators(validators)
}

func runValidators(validators []func() error) error {
    for _, validate := range validators {
        if err := validate(); err != nil {
            return err
        }
    }
    return nil
}
```

**DRY Improvement:** Reusable validators, composable validation


---

## Pattern 6: Extract Test Helpers

**Problem:** Repeated test setup/teardown

**Solution:** Create test utilities

### Before

```javascript
// test/user.test.js
describe('User tests', () => {
    it('should create user', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });

    it('should update user', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });
});

// test/order.test.js
describe('Order tests', () => {
    it('should create order', () => {
        const db = createDatabase();
        const user = { name: 'John', email: 'john@example.com' };
        // ... test logic
        db.close();
    });
});
```

### After

```javascript
// test/helpers/testUtils.js
export function setupTestDatabase() {
    const db = createDatabase();
    return {
        db,
        cleanup: () => db.close()
    };
}

export function createTestUser(overrides = {}) {
    return {
        name: 'John',
        email: 'john@example.com',
        ...overrides
    };
}

// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';

describe('User tests', () => {
    let db;
    let cleanup;

    beforeEach(() => {
        ({ db, cleanup } = setupTestDatabase());
    });

    afterEach(() => {
        cleanup();
    });

    it('should create user', () => {
        const user = createTestUser();
        // ... test logic (cleaner!)
    });

    it('should update user', () => {
        const user = createTestUser({ name: 'Jane' });
        // ... test logic
    });
});
```

**DRY Improvement:** Shared test utilities, less boilerplate


---

## obra YAGNI/DRY Principles

### YAGNI: You Aren't Gonna Need It

- Don't add functionality until necessary
- Avoid premature abstraction
- Wait for duplication before extracting

### DRY: Don't Repeat Yourself

- Every piece of knowledge should have a single representation
- Avoid copy-paste programming
- Extract when you see duplication 2-3 times

### When to Extract

1. **Three strikes rule:** Duplicate 3 times → extract
2. **Clear pattern:** Similar code structure appears
3. **High maintenance cost:** Changes require updates in multiple places
4. **Business logic:** Domain rules should be centralized

### When NOT to Extract

1. **Premature abstraction:** Only seen once or twice
2. **Coincidental duplication:** Similar code, different concepts
3. **Temporary code:** Prototypes, experiments
4. **Over-engineering:** Abstraction more complex than duplication

DRYPATTERNS

echo "✓ DRY refactoring patterns guide created"


## Phase 5: Generate Duplication Report

I'll create a comprehensive report with prioritized refactoring opportunities:

echo ""
echo "=== Generating Duplication Report ==="
echo ""

# Count duplications
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0

if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    TOTAL_DUPLICATIONS=$(jq '.statistics.total.clones // 0' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
    DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage // 0' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
fi

cat > "$REPORT" << EOF
# Code Duplication Analysis Report

**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%

---

## Summary

Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)

**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt

---

## Duplication Statistics

EOF

if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
    cat >> "$REPORT" << EOF
### Overall Statistics
- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.duplicatedLines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%

### Per-Format Statistics

EOF
    jq -r '.statistics.formats | to_entries[] |
        "\(.key):\n  Clones: \(.value.total.clones)\n  Percentage: \(.value.total.percentage)%\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

### Largest Duplication Blocks

EOF
    jq -r '.duplicates[:10] | .[] |
        "**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"

    cat >> "$REPORT" << EOF

**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser

EOF
else
    cat >> "$REPORT" << EOF
### Basic Analysis Results

EOF
    if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
        cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`

EOF
    fi

    if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`
Consider extracting to named constants.

EOF
    fi

    if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
        cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`
Consider extracting to configuration.

EOF
    fi
fi

cat >> "$REPORT" << 'EOF'
---

## Recommended DRY Refactoring

### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules

### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values

### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates

### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites

**See detailed patterns:** `cat $ANALYSIS_DIR/dry-refactoring-patterns.md`

---

## Implementation Steps

1. **Create Git Checkpoint**
   ```bash
   git add -A
   git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"
   ```

2. **Prioritize Duplications**
   - Start with largest duplication blocks
   - Focus on frequently changed code
   - Consider domain importance

3. **Apply DRY Patterns**
   - Extract function (most common)
   - Extract configuration
   - Template method pattern
   - Higher-order functions
   - Composition

4. **Three Strikes Rule**
   - Wait for 3 occurrences before extracting
   - Avoid premature abstraction
   - Balance DRY with YAGNI

5. **Verify Improvements**
   - Re-run duplication analysis
   - Ensure tests pass
   - Check for reduced code size

6. **Document Changes**
   - Update documentation
   - Note refactoring decisions
   - Share patterns with team

## Continuous Monitoring

Add duplication checks to CI/CD:

### Pre-commit Hook

```bash
#!/bin/bash
# .git/hooks/pre-commit
# Fail the commit when duplication exceeds the minimum-lines threshold
jscpd --min-lines 6 . || exit 1
```

### GitHub Actions

```yaml
name: Code Quality
on: [push, pull_request]
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check duplication
        run: |
          npm install -g jscpd
          jscpd --min-lines 6 .
```

## Integration with Other Skills

- `/refactor` - Systematic code restructuring
- `/complexity-reduce` - Reduce cyclomatic complexity
- `/make-it-pretty` - Improve code readability
- `/review` - Include duplication in code reviews
- `/test` - Ensure tests after refactoring


Report generated at: $(date)

**Next Steps:**

1. Review high-duplication areas
2. Select appropriate DRY pattern
3. Implement refactoring incrementally
4. Re-run analysis to verify improvement

EOF

echo "✓ Duplication report generated: $REPORT"


## Summary

```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"
if [ "$USE_JSCPD" = true ]; then
    echo "  Total duplications: $TOTAL_DUPLICATIONS"
    echo "  Duplication percentage: $DUPLICATION_PERCENTAGE%"
    echo "  Threshold: $MIN_LINES lines"
else
    echo "  Basic analysis complete"
    echo "  Install jscpd for detailed analysis: npm install -g jscpd"
fi
echo ""
echo "📁 Generated Files:"
echo "  - Duplication Report: $REPORT"
echo "  - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "  - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"
echo ""
echo "🎯 Recommended Actions:"
if [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ]; then
    echo "  ⚠️  High duplication detected ($DUPLICATION_PERCENTAGE%)"
    echo "  1. Review largest duplication blocks"
    echo "  2. Extract duplicated functions"
    echo "  3. Centralize configuration"
    echo "  4. Run tests after refactoring"
elif [ "$DUPLICATION_PERCENTAGE" != "0" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ]; then
    echo "  Moderate duplication ($DUPLICATION_PERCENTAGE%)"
    echo "  Consider refactoring during feature work"
else
    echo "  ✓ Low duplication - excellent!"
    echo "  Continue monitoring in code reviews"
fi
echo ""
echo "💡 DRY Refactoring Patterns:"
echo "  1. Extract Function (consolidate duplicates)"
echo "  2. Extract Configuration (centralize constants)"
echo "  3. Template Method (share algorithm structure)"
echo "  4. Higher-Order Functions (parameterize behavior)"
echo "  5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo "  - Three strikes rule: Extract after 3 duplications"
echo "  - Avoid premature abstraction"
echo "  - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo "  - /refactor - Systematic restructuring"
echo "  - /complexity-reduce - Reduce complexity"
echo "  - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"
```

## Safety Guarantees

What I'll NEVER do:

  • Automatically refactor without analysis
  • Extract after single occurrence (violates YAGNI)
  • Remove code without understanding context
  • Skip testing after refactoring
  • Add AI attribution to commits

What I WILL do:

  • Identify genuine code duplication
  • Suggest proven DRY patterns
  • Follow obra YAGNI principles
  • Recommend incremental refactoring
  • Preserve functionality
  • Provide clear examples

## Credits

This skill is based on:

  • obra/superpowers - YAGNI and DRY principles
  • The Pragmatic Programmer - DRY principle origin
  • Martin Fowler - Refactoring patterns
  • JSCPD - Multi-language duplication detection
  • Rule of Three - When to extract abstraction

## Token Budget

Target: 2,000-3,500 tokens per execution

  • Phase 1-2: ~600 tokens (setup + tool installation)
  • Phase 3: ~1,000 tokens (duplication detection)
  • Phase 4-5: ~1,500 tokens (patterns + reporting)

Optimization Strategy:

  • Use jscpd for efficient detection
  • Fallback to grep-based analysis
  • Template-based pattern generation
  • Focus on actionable duplications
  • Clear DRY refactoring guidance

This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while respecting token limits.
