test-data-management


Install skill "test-data-management" with this command: npx skills add proffesor-for-testing/agentic-qe/proffesor-for-testing-agentic-qe-test-data-management

Test Data Management

<default_to_action> When creating or managing test data:

  • NEVER use production PII directly

  • GENERATE synthetic data with faker libraries

  • ANONYMIZE production data if used (mask, hash)

  • ISOLATE test data (transactions, per-test cleanup)

  • SCALE with batch generation (10k+ records/sec)

Quick Data Strategy:

  • Unit tests: Minimal data (just enough)

  • Integration: Realistic data (full complexity)

  • Performance: Volume data (10k+ records)

Critical Success Factors:

  • 40% of test failures from inadequate data

  • GDPR fines reach up to €20M or 4% of annual global turnover, whichever is higher

  • Never store production PII in test environments </default_to_action>

Quick Reference Card

When to Use

  • Creating test datasets

  • Handling sensitive data

  • Performance testing with volume

  • GDPR/CCPA compliance

Data Strategies

| Type | When | Size |
|------|------|------|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
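The "Edge cases" row can be made concrete with a small boundary-value helper. This is a minimal sketch, not part of the skill itself: `boundaryValues` and `isValidAge` are hypothetical names, and the 18 to 90 range is borrowed from the age constraint that appears later in this document.

```typescript
// Hypothetical helper: returns the classic boundary-value set for an
// inclusive integer range [min, max], sorted and de-duplicated.
function boundaryValues(min: number, max: number): number[] {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError("expected integers with min <= max");
  }
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  return candidates
    .filter((value, index) => candidates.indexOf(value) === index)
    .sort((a, b) => a - b);
}

// Hypothetical validator standing in for the system under test.
function isValidAge(age: number, min = 18, max = 90): boolean {
  return Number.isInteger(age) && age >= min && age <= max;
}

// One targeted test case per boundary, with the expected verdict attached.
const ageCases = boundaryValues(18, 90).map((age) => ({
  age,
  expectedValid: isValidAge(age),
}));
```

Six targeted values cover both edges of the range, which is usually cheaper and more revealing than hundreds of random mid-range records.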

Privacy Techniques

| Technique | Use Case |
|-----------|----------|
| Synthetic | Generate fake data (preferred) |
| Masking | j***@example.com |
| Hashing | Irreversible pseudonymization |
| Tokenization | Reversible with key |
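The hashing row could look like the sketch below. It assumes Node.js and uses a keyed hash (HMAC-SHA256) from the built-in `node:crypto` module, because plain unkeyed hashes of low-entropy values such as emails or phone numbers can be brute-forced. `pseudonymize` and the key value are hypothetical, chosen only for illustration.

```typescript
import { createHmac } from "node:crypto";

// Keyed hash: the same input and key always give the same pseudonym, so joins
// across tables still work, but the original value cannot be read back.
function pseudonymize(value: string, secretKey: string): string {
  return createHmac("sha256", secretKey)
    .update(value.trim().toLowerCase()) // normalize case and surrounding spaces
    .digest("hex");
}

// Hypothetical key for illustration; a real key belongs in a secret store.
const demoKey = "test-only-key";
const pseudonymA = pseudonymize("John@Example.com", demoKey);
const pseudonymB = pseudonymize("john@example.com ", demoKey);
// pseudonymA and pseudonymB are equal because the input is normalized first.
```

Keeping the key out of the test environment is what makes the mapping hard to reverse; anyone holding the key can re-derive pseudonyms for guessed inputs.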

Synthetic Data Generation

import { faker } from '@faker-js/faker';

// Seed for reproducibility
faker.seed(123);

function generateUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
    createdAt: faker.date.past()
  };
}

// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);
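Seeding matters because a failing test can then be replayed with identical data. The dependency-free sketch below illustrates the idea with a small seeded generator (mulberry32, a widely used 32-bit PRNG) instead of faker. `makeUserFactory` and the `SeededUser` shape are hypothetical.

```typescript
// mulberry32: small deterministic PRNG; the same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

interface SeededUser {
  id: number;
  email: string;
  age: number;
}

// Hypothetical factory: each call to the returned function builds the next user.
function makeUserFactory(seed: number): () => SeededUser {
  const rand = mulberry32(seed);
  let nextId = 1;
  return () => {
    const id = nextId++;
    return {
      id,
      email: `user${id}@example.com`,
      age: 18 + Math.floor(rand() * 73), // 18..90 inclusive
    };
  };
}

// Two runs with the same seed produce identical datasets.
const runA = Array.from({ length: 5 }, makeUserFactory(123));
const runB = Array.from({ length: 5 }, makeUserFactory(123));
```

Logging the seed alongside a test failure is usually enough to reproduce the exact dataset later.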

Test Data Builder Pattern

class UserBuilder {
  private user: Partial<User> = {};

  asAdmin() {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }

  asCustomer() {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }

  withEmail(email: string) {
    this.user.email = email;
    return this;
  }

  build(): User {
    return {
      id: this.user.id ?? faker.string.uuid(),
      email: this.user.email ?? faker.internet.email(),
      role: this.user.role ?? 'customer',
      ...this.user
    } as User;
  }
}

// Usage
const admin = new UserBuilder().asAdmin().withEmail('admin@test.com').build();
const customer = new UserBuilder().asCustomer().build();

Data Anonymization

// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242

// Anonymize production data
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id,                          // Keep ID for relationships
  email: `user-${user.id}@example.com`, // Fake email
  firstName: faker.person.firstName(),  // Generated
  phone: null,                          // Remove PII
  createdAt: user.createdAt             // Keep non-PII
}));
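Tokenization, listed under Privacy Techniques, is not shown above. The sketch below is one possible shape, assuming Node.js: a hypothetical in-memory `TokenVault` that swaps values for random tokens and can map them back. Here reversibility comes from access to the vault's lookup table rather than from a cryptographic key, which is a simplification.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory vault. A real vault would be a secured,
// access-controlled store kept outside the test environment.
class TokenVault {
  private tokenToValue = new Map<string, string>();
  private valueToToken = new Map<string, string>();

  // Returns a stable random token for the value (same value, same token).
  tokenize(value: string): string {
    const existing = this.valueToToken.get(value);
    if (existing !== undefined) return existing;
    const token = `tok_${randomUUID()}`;
    this.tokenToValue.set(token, value);
    this.valueToToken.set(value, token);
    return token;
  }

  // Reverses a token; returns undefined for tokens this vault never issued.
  detokenize(token: string): string | undefined {
    return this.tokenToValue.get(token);
  }
}

const vault = new TokenVault();
const ssnToken = vault.tokenize("123-45-6789");
const ssnTokenAgain = vault.tokenize("123-45-6789");
const restored = vault.detokenize(ssnToken);
```

Because the token is random, it carries no information about the original value; test environments can hold tokens safely as long as the vault stays elsewhere.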

Database Transaction Isolation

// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({ email: 'test@example.com' });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
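To see why rollback gives each test a clean slate, the sketch below models the begin/rollback cycle with a hypothetical in-memory store. `InMemoryDb` and `runIsolated` are illustrative names, not a real driver API; an actual database would do the snapshotting itself.

```typescript
// Hypothetical stand-in for a database with begin/rollback semantics.
class InMemoryDb<T> {
  private rows: T[] = [];
  private snapshot: T[] | null = null;

  beginTransaction(): void {
    if (this.snapshot !== null) throw new Error("transaction already open");
    this.snapshot = this.rows.slice(); // shallow copy is enough for this sketch
  }

  rollbackTransaction(): void {
    if (this.snapshot === null) throw new Error("no open transaction");
    this.rows = this.snapshot; // discard everything written since begin
    this.snapshot = null;
  }

  insert(row: T): void {
    this.rows.push(row);
  }

  count(): number {
    return this.rows.length;
  }
}

// Plays the role of beforeEach/afterEach around a single test body.
function runIsolated<T>(db: InMemoryDb<T>, testBody: () => void): void {
  db.beginTransaction();
  try {
    testBody();
  } finally {
    db.rollbackTransaction(); // runs even if the test throws
  }
}

const demoDb = new InMemoryDb<{ email: string }>();
demoDb.insert({ email: "seed@example.com" }); // shared baseline row

let seenInsideTest = 0;
runIsolated(demoDb, () => {
  demoDb.insert({ email: "test@example.com" });
  seenInsideTest = demoDb.count();
});
const seenAfterTest = demoDb.count();
```

The test sees its own write while it runs, and the baseline is restored afterwards, so test order stops mattering.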

Volume Data Generation

// Generate 10,000 users efficiently
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);

  for (let i = 0; i < batches; i++) {
    const offset = i * batchSize;
    // The last batch may be smaller when count is not a multiple of batchSize
    const size = Math.min(batchSize, count - offset);
    const users = Array.from({ length: size }, (_, index) => ({
      id: offset + index,
      email: `user${offset + index}@example.com`,
      firstName: faker.person.firstName()
    }));

    await db.users.insertMany(users); // Batch insert
    console.log(`Batch ${i + 1}/${batches}`);
  }
}
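One detail worth handling in batch generation is the final partial batch when the requested count is not a multiple of the batch size. The sketch below separates that arithmetic into a hypothetical `planBatches` helper and drives a generic `generateInBatches` loop; the `insertBatch` callback stands in for whatever bulk-insert call the database client provides.

```typescript
// Splits `count` into batch sizes of at most `batchSize`, remainder included.
function planBatches(count: number, batchSize: number): number[] {
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError("count must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const sizes: number[] = [];
  for (let remaining = count; remaining > 0; remaining -= batchSize) {
    sizes.push(Math.min(batchSize, remaining));
  }
  return sizes;
}

// Hypothetical driver: builds rows with `makeRow` and hands each batch to
// `insertBatch`. Resolves to the total number of rows generated.
async function generateInBatches<T>(
  count: number,
  batchSize: number,
  makeRow: (id: number) => T,
  insertBatch: (rows: T[]) => Promise<void>
): Promise<number> {
  let nextId = 0;
  for (const size of planBatches(count, batchSize)) {
    const rows = Array.from({ length: size }, () => makeRow(nextId++));
    await insertBatch(rows); // one round trip per batch
  }
  return nextId;
}

const plan = planBatches(10500, 1000); // ten full batches plus one of 500
```

Awaiting each batch before building the next keeps memory bounded to one batch at a time, at the cost of not overlapping generation with I/O.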

Agent-Driven Data Generation

// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
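What `preserveReferentialIntegrity` and `constraints` are asking for can be illustrated without any agent. The sketch below is an assumption about the intent, not the agent's actual implementation: users are generated first within the age and role constraints, and every order picks its `userId` from those users. All type and function names are hypothetical.

```typescript
interface ShopUser {
  id: number;
  age: number;
  role: "customer" | "admin";
}

interface ShopOrder {
  id: number;
  userId: number;
}

// Generates parents (users) before children (orders) so that every
// foreign key points at a row that exists.
function generateShopData(
  userCount: number,
  orderCount: number,
  rand: () => number = Math.random
): { users: ShopUser[]; orders: ShopOrder[] } {
  if (userCount <= 0 && orderCount > 0) {
    throw new RangeError("orders need at least one user");
  }
  const roles: ShopUser["role"][] = ["customer", "admin"];
  const users: ShopUser[] = Array.from({ length: userCount }, (_, i) => ({
    id: i + 1,
    age: 18 + Math.floor(rand() * 73), // 18..90 inclusive
    role: roles[Math.floor(rand() * roles.length)],
  }));
  const orders: ShopOrder[] = Array.from({ length: orderCount }, (_, i) => ({
    id: i + 1,
    userId: users[Math.floor(rand() * users.length)].id,
  }));
  return { users, orders };
}

// Returns orders that reference a missing user (empty when integrity holds).
function findOrphanOrders(users: ShopUser[], orders: ShopOrder[]): ShopOrder[] {
  const ids = new Set(users.map((u) => u.id));
  return orders.filter((o) => !ids.has(o.userId));
}

const shop = generateShopData(100, 500);
const orphans = findOrphanOrders(shop.users, shop.orders);
```

A check like `findOrphanOrders` is also useful as a post-generation assertion on datasets produced by other tools.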

Agent Coordination Hints

Memory Namespace

aqe/test-data-management/
├── schemas/*        - Data schemas
├── generators/*     - Generator configs
├── anonymization/*  - PII handling rules
└── fixtures/*       - Reusable fixtures

Fleet Coordination

const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect', // Generate data
    'qe-test-executor',       // Execute with data
    'qe-security-scanner'     // Validate no PII exposure
  ],
  topology: 'sequential'
});
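The "validate no PII exposure" step can be approximated locally with a simple fixture scan. The sketch below is a narrow, hypothetical check and not what `qe-security-scanner` does: it flags email addresses whose domain is not one of the names reserved for documentation and testing (`example.com`, `example.org`, `example.net`, and the `.test` TLD). It does not detect phone numbers, SSNs, or other PII.

```typescript
// Domains reserved for documentation and testing.
const SAFE_EMAIL_DOMAINS = ["example.com", "example.org", "example.net"];
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;

// Returns every email address in the serialized fixture whose domain is
// not reserved, i.e. addresses that might belong to a real person.
function findSuspectEmails(fixture: unknown): string[] {
  const text = JSON.stringify(fixture) || "";
  const suspects: string[] = [];
  EMAIL_PATTERN.lastIndex = 0; // reset state of the global regex
  let match: RegExpExecArray | null;
  while ((match = EMAIL_PATTERN.exec(text)) !== null) {
    const domain = match[1].toLowerCase();
    const isSafe =
      domain.endsWith(".test") ||
      SAFE_EMAIL_DOMAINS.some((d) => domain === d || domain.endsWith("." + d));
    if (!isSafe) suspects.push(match[0]);
  }
  return suspects;
}

const suspectEmails = findSuspectEmails([
  { email: "user1@example.com" },
  { email: "jane.doe@gmail.com" },
  { contact: { email: "qa@staging.test" } },
]);
```

Note that an address such as `admin@test.com` would be flagged by this check, since `test.com` is a registrable domain rather than a reserved one.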

Related Skills

  • database-testing - Schema and integrity testing

  • compliance-testing - GDPR/CCPA compliance

  • performance-testing - Volume data for perf tests

Remember

Test data is infrastructure, not an afterthought. 40% of test failures are caused by inadequate test data. Poor data = poor tests.

Never use production PII directly. GDPR fines can reach €20M or 4% of annual global turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.

With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.
