identifying

Identifiability Analysis (LINDDUN I)

Analyze source code for identifiability threats where individuals can be identified from supposedly anonymous data. Combinations of quasi-identifiers (zip code, birth date, gender) can uniquely identify individuals. Re-identification attacks on "anonymized" data are the primary concern.
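
To make the quasi-identifier risk concrete, here is a minimal sketch of a linkage attack. All records are fabricated, and the field names (zip, birth_date, sex) and the function link are illustrative assumptions. A name-stripped release is joined to a hypothetical public dataset on the quasi-identifier tuple, and any released record with exactly one match is treated as re-identified.

```python
# Sketch of a linkage (re-identification) attack; all data is fabricated.

# "Anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public auxiliary data (e.g. a voter roll) that still carries names.
auxiliary = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1962-03-14", "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_date": "1970-01-01", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Join both datasets on the quasi-identifier tuple.

    Returns (name, released_record) pairs for released records that match
    exactly one auxiliary person, i.e. records that are re-identified.
    """
    index = {}
    for person in auxiliary:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in released:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record))
    return matches


for name, record in link(released, auxiliary):
    print(f"{name} -> {record['diagnosis']}")
```

Both released records match exactly one auxiliary person, so both diagnoses are attributed to named individuals even though the release contained no names.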

Supported Flags

Read ../../shared/schemas/flags.md for full flag documentation. This skill supports all cross-cutting flags.

Identifiability-Specific Flag Behavior

--scope

Default: changed. Focuses on files that handle user data, anonymization logic, data exports, analytics pipelines, and API responses.

--depth quick

Grep patterns only: scan for PII in logs, quasi-identifiers in exports, and missing anonymization.

--depth standard

Full code read: analyze the data fields returned by APIs and stored in databases for re-identification risk.
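
A rough sketch of the standard-depth API check, assuming responses are available as Python dicts and that the fields the consumer actually needs are known. The identifier catalogues and the function name audit_response_fields are hypothetical, not part of the skill.

```python
# Hypothetical field catalogues; extend them for the domain under review.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn", "national_id"}
QUASI_IDENTIFIERS = {"zip_code", "birth_date", "gender", "job_title", "age"}


def audit_response_fields(response: dict, needed: set) -> dict:
    """Report personal fields a response returns beyond what the consumer needs.

    Only top-level keys are inspected; nested objects would need recursion.
    """
    extra = set(response) - set(needed)
    quasi = sorted(extra & QUASI_IDENTIFIERS)
    return {
        "direct": sorted(extra & DIRECT_IDENTIFIERS),
        "quasi": quasi,
        # Several quasi-identifiers returned together raise uniqueness.
        "quasi_combination": len(quasi) >= 2,
    }


report = audit_response_fields(
    {"id": 7, "plan": "pro", "email": "a@example.com",
     "zip_code": "02138", "birth_date": "1990-01-01"},
    needed={"id", "plan"},
)
print(report)
```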

--depth deep

Trace data flows from collection to storage to export. Assess quasi-identifier combinations across the system.

--depth expert

Deep + re-identification risk modeling: estimate k-anonymity violations and uniqueness of attribute combinations.
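
The k-anonymity part of expert-depth modeling can be sketched as below. A dataset is k-anonymous over a set of quasi-identifiers when every combination of their values is shared by at least k records. The function names and sample records are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count records per distinct quasi-identifier tuple."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class (0 for no records)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) if classes else 0


def violations(records, quasi_identifiers, k):
    """Equivalence classes with fewer than k records (k-anonymity violations)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return {key: n for key, n in classes.items() if n < k}


QI = ["zip", "age", "sex"]
records = [
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "sex": "M", "diagnosis": "flu"},
]
print(k_anonymity(records, QI))      # 1
print(violations(records, QI, k=2))  # {('021**', '40-49', 'M'): 1}
```

The single male record in the 40-49 band forms a class of size one, so the whole dataset is only 1-anonymous over these three fields.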

--severity

Filter output. Identifiability findings range from low (theoretical) to critical (direct PII exposure).

--fix

Generate anonymization, generalization, and suppression replacements.
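
One possible shape for --fix output, sketched under stated assumptions: the field names are hypothetical, the ZIP, date, and age handling loosely follows the HIPAA Safe Harbor rules (first three ZIP digits, year only, ages 90 and over collapsed), and truncating IPs to a /24 or /48 is a common convention rather than a requirement of this skill.

```python
import ipaddress

# Hypothetical field names; adapt to the schema under review.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits. Safe Harbor also requires '000' when the
    3-digit area has 20,000 or fewer people; that check is not done here."""
    return zip_code[:3] + "**"


def generalize_birth_date(iso_date: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to its year."""
    return iso_date[:4]


def age_band(age: int, width: int = 10) -> str:
    """Bucket an age into a band; ages 90 and over collapse into '90+'."""
    if age >= 90:
        return "90+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def truncate_ip(ip: str) -> str:
    """Zero the host part: keep a /24 for IPv4 and a /48 for IPv6."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


def anonymize(record: dict) -> dict:
    """Suppress direct identifiers, then generalize the quasi-identifiers present."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip_code" in out:
        out["zip_code"] = generalize_zip(out["zip_code"])
    if "birth_date" in out:
        out["birth_date"] = generalize_birth_date(out["birth_date"])
    if "age" in out:
        out["age"] = age_band(out["age"])
    if "ip" in out:
        out["ip"] = truncate_ip(out["ip"])
    return out


example = anonymize({
    "name": "Alice Example", "email": "alice@example.com", "zip_code": "02138",
    "birth_date": "1985-06-02", "age": 38, "ip": "203.0.113.57", "plan": "pro",
})
print(example)
```

Generalization lowers uniqueness but does not by itself guarantee k-anonymity, so the transformed records still need to be checked for small equivalence classes.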

Framework Context

LINDDUN I -- Identifiability

Identifiability occurs when a person can be identified from data that is supposed to be anonymous or pseudonymous. Read ../../shared/frameworks/linddun.md for the full LINDDUN framework reference including re-identification attack patterns and regulatory definitions.

Privacy Property Violated: Anonymity / Pseudonymity

STRIDE Mapping: Information Disclosure (identifiability focuses specifically on re-identification of anonymized data rather than general data access)

Workflow

Step 1 -- Determine Scope

Parse the --scope flag (default: changed).
Resolve to a concrete file list.
Filter to relevant files: data models, API handlers, data export logic, analytics pipelines, logging configuration, database schemas, and anonymization utilities.
Prioritize files containing: user data structures, data export endpoints, log statements with user context, report generation, and data sharing logic.
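
Resolving the concrete file list is tool-specific (for changed, running git diff --name-only is one option). The filtering and prioritization steps can then be approximated with a keyword heuristic like this sketch. The keyword lists and the function name scope_files are assumptions, and substring matching will produce false positives such as "catalog" matching "log".

```python
# Keyword heuristics are assumptions; tune them for the repository layout.
RELEVANT = ("model", "schema", "migration", "export", "analytics", "pipeline",
            "log", "anonym", "report", "api", "handler", "user")
HIGH_PRIORITY = ("export", "user", "report", "share", "log")


def scope_files(paths):
    """Keep paths that look like personal-data handling; high-priority ones first."""
    relevant = [p for p in paths if any(k in p.lower() for k in RELEVANT)]
    # False sorts before True, and sorted() is stable within each group.
    return sorted(relevant, key=lambda p: not any(k in p.lower() for k in HIGH_PRIORITY))


changed = ["src/models/user.py", "README.md", "src/utils/math.py",
           "src/analytics/pipeline.py", "src/api/export_handler.py"]
print(scope_files(changed))
```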

Step 2 -- Analyze for Identifiability Patterns

Read each scoped file and assess re-identification risk:

Identify direct identifiers: Find fields like name, email, phone, SSN, or national ID that should not appear in anonymous contexts.
Identify quasi-identifiers: Find combinations of fields (zip code, age, gender, job title) that together may uniquely identify individuals.
Check anonymization logic: Verify that anonymization techniques are actually applied and are sufficient (not just removing the name field).
Assess API responses: Check whether endpoints return more personal attributes than the consumer needs.
Examine logs and error messages: Look for PII appearing in log output, stack traces, or debug messages.
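
When this step finds PII in log output, one common remediation is redaction at the logging layer. A minimal sketch using Python's standard logging.Filter; the class name RedactPIIFilter and the logger name are hypothetical, and the regular expressions are simplified assumptions that will miss some formats (IPv6 addresses and phone numbers, for example).

```python
import io
import logging
import re

# Deliberately simple patterns; they miss some formats and over-match others.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class RedactPIIFilter(logging.Filter):
    """Rewrite each record's message with emails and IPv4 addresses masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args before redacting
        message = EMAIL_RE.sub("[EMAIL]", message)
        message = IPV4_RE.sub("[IP]", message)
        record.msg, record.args = message, ()
        return True  # never drop the record, only rewrite it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactPIIFilter())
logger = logging.getLogger("identifying.demo")
logger.addHandler(handler)
logger.propagate = False

logger.warning("login failed for %s from %s", "alice@example.com", "203.0.113.7")
print(stream.getvalue(), end="")  # login failed for [EMAIL] from [IP]
```

Because the filter mutates the record, every handler that sees the record after this one also receives the redacted message.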

At --depth deep or --depth expert, model quasi-identifier combinations and estimate their uniqueness across the population.
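
A sketch of the combination search, assuming records are dicts: for every subset of candidate fields up to a size limit, it measures the fraction of records that are unique in the sample. The function names are hypothetical. Sample uniqueness is only a proxy; a record unique in the sample is not necessarily unique in the population, which is what a full risk model would estimate.

```python
from collections import Counter
from itertools import combinations


def sample_uniqueness(records, fields):
    """Fraction of records whose values over `fields` occur exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


def rank_combinations(records, candidate_fields, max_size=3):
    """Score every field subset up to max_size, most identifying first."""
    scored = [
        (combo, sample_uniqueness(records, combo))
        for size in range(1, max_size + 1)
        for combo in combinations(candidate_fields, size)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


records = [
    {"zip": "02138", "birth_year": 1980, "gender": "F"},
    {"zip": "02138", "birth_year": 1980, "gender": "M"},
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02139", "birth_year": 1991, "gender": "F"},
]
for combo, score in rank_combinations(records, ["zip", "birth_year", "gender"]):
    print(f"{score:.2f}  {combo}")
```

No single field here identifies more than a quarter of the records, yet the three fields together make every record unique.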

Step 3 -- Report Findings

Output findings per ../../shared/schemas/findings.md. Each finding needs: an IDENT-NNN id, title, severity (based on directness of identification and data sensitivity), location with snippet, description of what enables identification, impact (re-identification harm), fix (anonymization, generalization, or suppression), and CWE/LINDDUN references.

Analysis Checklist

Are direct identifiers (name, email, phone, SSN) present in data exports or analytics?
Do API responses return more user attributes than the consumer actually needs?
Are quasi-identifiers (zip code, birth date, gender) combined in any output?
Is anonymization actually implemented, or just assumed in comments?
Do logs contain IP addresses, user agents, or device identifiers alongside actions?
Can database queries return single-user results from "anonymous" tables?
Are email addresses or phone numbers used as primary keys or foreign keys?
Do error messages or stack traces expose personal data fields?
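
For the single-user-result question, a typical control is small-cell suppression on aggregates. This sketch withholds any group smaller than an assumed threshold of 5 (a policy choice, not a standard); in SQL the analogous guard is HAVING COUNT(*) >= 5. Suppression alone can be undone by differencing (subtracting the published cells from a published total), so complementary suppression or added noise may also be needed.

```python
from collections import Counter

# Assumed threshold; set it from your disclosure-control policy.
MIN_CELL_SIZE = 5


def suppressed_counts(rows, group_by, min_cell=MIN_CELL_SIZE):
    """Group-by count that withholds (None) any cell smaller than min_cell."""
    counts = Counter(row[group_by] for row in rows)
    return {group: (n if n >= min_cell else None) for group, n in counts.items()}


rows = [{"department": "engineering"}] * 6 + [{"department": "legal"}]
print(suppressed_counts(rows, "department"))  # {'engineering': 6, 'legal': None}
```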

What to Look For

PII in log statements: Personal data written to application logs.
Grep: log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn
Email or phone as primary key: Using direct identifiers as database keys.
Grep: PRIMARY KEY.*email|primary_key.*email|@Column.*email.*unique|findByEmail|findByPhone
IP address logging: Recording IP addresses without anonymization.
Grep: req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr
Over-fetched API responses: SELECT * or returning full user objects.
Grep: SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user
Insufficient anonymization: Removing names but keeping detailed attributes.
Grep: anonymize|anonymise|deidentify|de_identify|pseudonymize|mask.*data
Quasi-identifier combinations: Multiple demographic fields in the same record.
Grep: zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address
User agent collection: Storing full browser fingerprint strings.
Grep: user-agent|userAgent|navigator\.userAgent|req\.headers\[.user-agent.\]
Data exports without scrubbing: Export endpoints that dump raw user data.
Grep: export.*user|download.*report|csv.*user|toCSV|toJSON.*user
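
A minimal Python scanner for quick depth, applying a subset of the patterns above with . and ( escaped so they compile as regular expressions. The category labels, function names, and script layout are assumptions; in practice grep -E or ripgrep over the scoped files does the same job.

```python
import re
import sys
from pathlib import Path

# A subset of the "What to Look For" patterns; labels are hypothetical.
PATTERNS = {
    "pii-in-logs": re.compile(
        r"log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn"),
    "ip-logging": re.compile(
        r"req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr"),
    "quasi-identifier-combo": re.compile(
        r"zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address"),
}


def scan_text(text, filename="<memory>"):
    """Return (filename, line_number, label) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((filename, lineno, label))
    return hits


def scan_paths(paths):
    hits = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        hits.extend(scan_text(text, str(path)))
    return hits


if __name__ == "__main__":
    for filename, lineno, label in scan_paths(sys.argv[1:]):
        print(f"{filename}:{lineno}: {label}")
```

Usage, with a hypothetical file list: python scan_identifiability.py src/api/users.py src/logging_config.py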

Regulatory Mapping

Regulation	Provision	Relevance
GDPR	Recital 26 (identifiability test)	Data is personal if the subject can be identified by any means reasonably likely to be used
GDPR	Art. 4(5) (pseudonymization definition)	Pseudonymized data is still personal data
GDPR	Art. 25 (data protection by design)	Anonymization must be effective by design
HIPAA	Safe Harbor (18 identifier categories)	All 18 must be removed for de-identification
CCPA	1798.140(h) (deidentified information)	Information that reasonably cannot be linked to a consumer
CCPA	1798.140(o) (personal information)	Includes information that identifies, or could reasonably be linked to, a consumer

Output Format

Use the finding ID prefix IDENT (e.g., IDENT-001, IDENT-002).

All findings follow the schema in ../../shared/schemas/findings.md with:

references.cwe: CWE-359 or CWE-200 as appropriate
references.owasp: A02:2021 (Cryptographic Failures -- weak anonymization)
metadata.tool: "identifying"
metadata.framework: "linddun"
metadata.category: "I"
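
A sketch of a finding record in Python. Only the IDENT prefix and the references and metadata values come from this document; the remaining field names, and the example file path, line, and snippet, are hypothetical stand-ins for whatever ../../shared/schemas/findings.md actually defines.

```python
import json
from dataclasses import asdict, dataclass, field


def finding_id(n: int) -> str:
    """Zero-padded IDENT-NNN identifier."""
    return f"IDENT-{n:03d}"


@dataclass
class Finding:
    # Field names other than references/metadata are assumptions about findings.md.
    id: str
    title: str
    severity: str  # "critical" | "high" | "medium" | "low"
    location: dict
    description: str
    impact: str
    fix: str
    references: dict = field(default_factory=lambda: {"cwe": "CWE-359", "owasp": "A02:2021"})
    metadata: dict = field(default_factory=lambda: {
        "tool": "identifying", "framework": "linddun", "category": "I"})


example = Finding(
    id=finding_id(1),
    title="Export endpoint returns zip code, birth date and gender together",
    severity="high",
    location={"file": "src/api/export.py", "line": 42,  # hypothetical location
              "snippet": "writer.writerow([u.zip_code, u.birth_date, u.gender])"},
    description="Quasi-identifier combination in a data export.",
    impact="Exported rows can be re-identified by linking with public datasets.",
    fix="Generalize zip code to three digits and birth date to year, or drop gender.",
)
print(json.dumps(asdict(example), indent=2))
```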

Summary table after all findings:

Identifiability Pattern	Critical	High	Medium	Low
Direct PII exposure
PII in logs
Quasi-identifier combos
Insufficient anonymization
Over-fetched API responses
IP / device tracking

Followed by: top 3 priorities, re-identification risk assessment, and overall assessment.
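
The summary table can be tallied from the findings list. This sketch assumes each finding carries a hypothetical pattern field naming its summary row and a lowercase severity; neither field is confirmed by the schema.

```python
from collections import Counter

ROWS = ["Direct PII exposure", "PII in logs", "Quasi-identifier combos",
        "Insufficient anonymization", "Over-fetched API responses", "IP / device tracking"]
SEVERITIES = ["critical", "high", "medium", "low"]


def summary_table(findings):
    """Tab-separated pattern x severity counts, one row per summary pattern."""
    counts = Counter((f["pattern"], f["severity"]) for f in findings)
    lines = ["\t".join(["Identifiability Pattern"] + [s.capitalize() for s in SEVERITIES])]
    for row in ROWS:
        lines.append("\t".join([row] + [str(counts[(row, s)]) for s in SEVERITIES]))
    return "\n".join(lines)


findings = [
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "Direct PII exposure", "severity": "critical"},
]
print(summary_table(findings))
```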


Related Skills

spec-writer

config

sans25

attack-surface