Neo4j Data Models
When to Use
Use this skill when designing or extending a Neo4j graph data model. Covers naming conventions, node/relationship design, property management, fraud detection domain models, and modeling best practices.
Design Process
Start with specific business questions before designing the model. Follow a three-phase cycle:
-
Conceptualize the structure (nodes, relationships, properties)
-
Design queries that answer the business questions
-
Validate against real data and optimize
Every node requires a unique identifier or property combination. Prioritize the model around the application's most frequent or critical queries.
Naming Conventions
Node Labels — CapitalCase
CREATE (:Person {name: "Alice"}) CREATE (:Company {name: "Neo4j"}) CREATE (:Transaction {transactionId: "TX-001"})
Relationship Types — UPPER_SNAKE_CASE
(:Person)-[:WORKS_AT]->(:Company) (:Account)-[:PERFORM]->(:Transaction) (:Customer)-[:HAS_EMAIL]->(:Email)
Properties — camelCase
CREATE (:Person {firstName: "Alice", lastName: "Smith", deptId: 101}) CREATE (:Transaction {transactionId: "TX-001", createdAt: datetime()})
Node Design
Keep Labels Minimal (max 4)
Additional attributes belong in properties, not labels:
// BAD: too many labels CREATE (:Person:Employee:Developer:Manager {name: "Alice"})
// GOOD: use properties for attributes CREATE (:Person {name: "Alice", role: "Developer", department: "Engineering"})
Eliminate Redundancy with Shared Nodes
Instead of duplicating data across nodes, create shared nodes:
// BAD: email duplicated as string property on multiple customers CREATE (:Customer {email: "shared@example.com"}) CREATE (:Customer {email: "shared@example.com"})
// GOOD: shared Email node CREATE (e:Email {address: "shared@example.com"}) CREATE (c1:Customer)-[:HAS_EMAIL]->(e) CREATE (c2:Customer)-[:HAS_EMAIL]->(e)
Extract Collections into Nodes
When attributes form collections, connect them as separate nodes rather than storing arrays:
// BAD: array property CREATE (:Customer {phones: ["+1-555-0100", "+1-555-0200"]})
// GOOD: separate nodes with relationships CREATE (c:Customer)-[:HAS_PHONE]->(:Phone {number: "+1-555-0100"}) CREATE (c)-[:HAS_PHONE]->(:Phone {number: "+1-555-0200"})
Relationship Design
Use Specific, Descriptive Types
// BAD: generic relationship (:Person)-[:RELATED_TO]->(:Company)
// GOOD: descriptive type (:Person)-[:WORKS_AT]->(:Company) (:Person)-[:FOUNDED]->(:Company)
Single Direction, Not Symmetric Pairs
// BAD: redundant symmetric relationships (:Person)-[:KNOWS]->(:Person) (:Person)<-[:KNOWS]-(:Person)
// GOOD: single direction, query in either direction (:Person)-[:KNOWS]->(:Person) // Query: MATCH (a)-[:KNOWS]-(b) -- undirected traversal
Intermediate Nodes for Hyperedges
When a relationship involves three or more entities, introduce an intermediate node:
// Model: Alice and Bob worked on a project with specific roles CREATE (a:Person {name: "Alice"})-[:WORKED_ON]->(w:Work {role: "Contributor"}) CREATE (w)-[:FOR_PROJECT]->(p:Project {name: "GraphDB Project"}) CREATE (b:Person {name: "Bob"})-[:WORKED_ON]->(w)
Property Management
Properties for Identification and Querying
-
Identification properties: unique keys for anchoring queries (indexed)
-
Query-support properties: simple, indexed properties for filtering and traversal
-
Decoration properties: complex data returned in results only (not indexed)
Always Create Constraints on Business Keys
CREATE CONSTRAINT customer_unique FOR (c:Customer) REQUIRE c.customerId IS UNIQUE CREATE CONSTRAINT email_unique FOR (e:Email) REQUIRE e.address IS UNIQUE CREATE CONSTRAINT account_unique FOR (a:Account) REQUIRE a.accountNumber IS UNIQUE
Index Frequently Queried Properties
CREATE INDEX customer_nationality FOR (c:Customer) ON (c.nationality) CREATE INDEX transaction_date FOR (t:Transaction) ON (t.timestamp)
Data Loading Best Practices
-
Establish constraints first — unique constraints on business keys before loading
-
Use MERGE for nodes with unique identifiers (avoids duplicates)
-
Batch large datasets — process in chunks of 1,000–10,000
-
Pre-clean source data — deduplicate before loading
-
Transform foreign keys to relationships — don't store FKs as properties
// Batch loading pattern UNWIND $batch AS row MERGE (c:Customer {customerId: row.customerId}) ON CREATE SET c.firstName = row.firstName, c.lastName = row.lastName MERGE (e:Email {address: row.email}) MERGE (c)-[:HAS_EMAIL]->(e)
Standard Patterns
Linked Lists (Ordered Sequences)
// Chain events in order CREATE (e1:Event)-[:NEXT]->(e2:Event)-[:NEXT]->(e3:Event)
// Traverse in order MATCH (start:Event {id: $startId})-[:NEXT*]->(subsequent:Event) RETURN subsequent
Timeline Trees
// Year -> Month -> Day hierarchy CREATE (:Year {value: 2024})-[:HAS_MONTH]->(:Month {value: 3})-[:HAS_DAY]->(:Day {value: 15})
// Find all events on a specific day MATCH (:Year {value: 2024})-[:HAS_MONTH]->(:Month {value: 3})-[:HAS_DAY]->(d:Day {value: 15}) MATCH (d)<-[:ON_DAY]-(event) RETURN event
Transaction Base Model (Reference)
This is the Neo4j reference data model for banking transactions, fraud detection, and financial investigation. Use it as the canonical schema when building fraud or banking applications.
Graph Overview
Customer -[:HAS_EMAIL]-> Email Customer -[:HAS_PHONE]-> Phone Customer -[:HAS_ADDRESS]-> Address Customer -[:HAS_PASSPORT]-> Passport Customer -[:HAS_DRIVING_LICENSE]-> DrivingLicense Customer -[:HAS_FACE]-> Face Customer -[:HAS_NATIONALITY]-> Country Customer -[:HAS_ACCOUNT {role, since}]-> Account Account -[:PERFORMS]-> Transaction -[:BENEFITS_TO]-> Account Account -[:IS_HOSTED]-> Country Transaction -[:IMPLIED {totalMovements}]-> Movement Counterparty -[:HAS_ACCOUNT {since}]-> Account Counterparty -[:HAS_ADDRESS {since, isCurrent}]-> Address Address -[:LOCATED_IN]-> Country Device -[:USED_BY {lastUsed}]-> Customer Session -[:SESSION_USES_DEVICE]-> Device Session -[:USES_IP]-> IP IP -[:IS_ALLOCATED_TO {createdAt}]-> ISP IP -[:LOCATED_IN {createdAt}]-> Location Location -[:LOCATED_IN]-> Country Alert -[:TRIGGERED]-> Case Account -[:SUBJECT_OF]-> Case Customer -[:SUBJECT_OF]-> Case
Account labels: Account (required), plus Internal , External , HighRiskJurisdiction , Flagged , UnderInvestigation , Confirmed .
Node Labels and Key Properties
Label Key Properties Other Properties
Account accountNumber (String) accountType , openedDate , closedDate , suspendedDate
Customer customerId (String) firstName , middleName , lastName , dateOfBirth (Date), placeOfBirth , countryOfBirth
Transaction transactionId (String) amount (Float, always positive), currency (ISO 4217), date (DateTime), message , type
Movement movementId (String) amount (Float), currency , date (DateTime), description , status , sequenceNumber (Integer), authorisedBy , validatedBy
Counterparty counterpartyId (String) name , type (INDIVIDUAL/BUSINESS/GOVERNMENT/CHARITY), registrationNumber
Email address (String) domain
Phone phoneNumber (String) countryCode
Address addressLine1
- postTown
- postCode (composite) addressLine2 , region , latitude , longitude
Passport passportNumber (String) issueDate , expiryDate , issuingCountry , nationality
DrivingLicense licenseNumber
- issuingCountry (composite) issueDate , expiryDate
Face faceId (String) embedding (List<Float>, 512–1536 dims)
Device deviceId (String) deviceType , userAgent
Session sessionId (String) status
IP ipAddress (String) —
ISP name (String) —
Location city
- postCode
- country
latitude , longitude
Country code (ISO 3166-1 alpha-2) name
Alert alertId (String) ruleName , ruleId , severity (LOW/MEDIUM/HIGH/CRITICAL), triggeredAt
Case caseId (String) status , outcome , financialStakes (Float), investigatedBy , closedAt
All nodes with timestamps use createdAt (DateTime) for record creation.
Relationship Types
Relationship Direction Properties
:HAS_ACCOUNT
Customer→Account role , since
:HAS_ACCOUNT
Counterparty→Account since
:HAS_EMAIL
Customer→Email since
:HAS_PHONE
Customer→Phone since
:HAS_ADDRESS
Customer→Address addedAt , lastChangedAt , isCurrent
:HAS_ADDRESS
Counterparty→Address since , isCurrent
:HAS_PASSPORT
Customer→Passport verificationDate , verificationMethod , verificationStatus
:HAS_DRIVING_LICENSE
Customer→DrivingLicense verificationDate , verificationMethod , verificationStatus
:HAS_FACE
Customer→Face verificationDate , verificationMethod , verificationStatus
:HAS_NATIONALITY
Customer→Country —
:PERFORMS
Account→Transaction —
:BENEFITS_TO
Transaction→Account —
:IMPLIED
Transaction→Movement totalMovements
:IS_HOSTED
Account→Country —
:SESSION_USES_DEVICE
Session→Device —
:USES_IP
Session→IP —
:USED_BY
Device→Customer lastUsed
:IS_ALLOCATED_TO
IP→ISP createdAt
:LOCATED_IN
Address/IP/Location→Country/Location createdAt (on IP→Location)
:SUBJECT_OF
Account/Customer→Case —
:TRIGGERED
Alert→Case —
Constraints and Indexes
// Node key constraints (unique business identifiers) CREATE CONSTRAINT customer_id IF NOT EXISTS FOR (c:Customer) REQUIRE c.customerId IS NODE KEY; CREATE CONSTRAINT account_number IF NOT EXISTS FOR (a:Account) REQUIRE a.accountNumber IS NODE KEY; CREATE CONSTRAINT transaction_id IF NOT EXISTS FOR (t:Transaction) REQUIRE t.transactionId IS NODE KEY; CREATE CONSTRAINT movement_id IF NOT EXISTS FOR (m:Movement) REQUIRE m.movementId IS NODE KEY; CREATE CONSTRAINT email_address IF NOT EXISTS FOR (e:Email) REQUIRE e.address IS NODE KEY; CREATE CONSTRAINT phone_number IF NOT EXISTS FOR (p:Phone) REQUIRE p.number IS NODE KEY; CREATE CONSTRAINT passport_number IF NOT EXISTS FOR (p:Passport) REQUIRE (p.passportNumber, p.issuingCountry) IS NODE KEY; CREATE CONSTRAINT driving_licence_number IF NOT EXISTS FOR (d:DrivingLicense) REQUIRE (d.licenseNumber, d.issuingCountry) IS NODE KEY; CREATE CONSTRAINT device_id IF NOT EXISTS FOR (d:Device) REQUIRE d.deviceId IS NODE KEY; CREATE CONSTRAINT ip_address IF NOT EXISTS FOR (i:IP) REQUIRE i.ipAddress IS NODE KEY; CREATE CONSTRAINT session_id IF NOT EXISTS FOR (s:Session) REQUIRE s.sessionId IS NODE KEY; CREATE CONSTRAINT face_id IF NOT EXISTS FOR (f:Face) REQUIRE f.faceId IS NODE KEY; CREATE CONSTRAINT counterparty_id IF NOT EXISTS FOR (cp:Counterparty) REQUIRE cp.counterpartyId IS NODE KEY; CREATE CONSTRAINT isp_name IF NOT EXISTS FOR (i:ISP) REQUIRE i.name IS NODE KEY; CREATE CONSTRAINT country_code IF NOT EXISTS FOR (c:Country) REQUIRE c.code IS NODE KEY; CREATE CONSTRAINT address_composite IF NOT EXISTS FOR (a:Address) REQUIRE (a.addressLine1, a.postTown, a.postCode) IS NODE KEY; CREATE CONSTRAINT alert_id IF NOT EXISTS FOR (a:Alert) REQUIRE a.alertId IS NODE KEY; CREATE CONSTRAINT case_id IF NOT EXISTS FOR (c:Case) REQUIRE c.caseId IS NODE KEY;
// Performance indexes CREATE INDEX transaction_date_idx IF NOT EXISTS FOR (t:Transaction) ON (t.date); CREATE INDEX transaction_amount_idx IF NOT EXISTS FOR (t:Transaction) ON (t.amount);
// Vector index for facial recognition CALL db.index.vector.createNodeIndex( 'face_embedding_idx', 'Face', 'embedding', 1536, 'cosine' );
// Full-text index for customer name search CREATE FULLTEXT INDEX customer_name_idx IF NOT EXISTS FOR (c:Customer) ON EACH [c.firstName, c.lastName, c.middleName];
Key Design Decisions
-
PII as separate nodes (Email, Phone, Address, Passport, DrivingLicense, Face) — enables shared-identity detection via graph traversal
-
Transaction as a node (not a relationship) — allows attaching amount, currency, timestamp, and linking to Movements
-
Movement sub-transactions — Transaction :IMPLIED Movement captures multi-part payments (installments, fees)
-
Account multi-labels — Internal , External , HighRiskJurisdiction enable label-based filtering without property checks
-
Verification on relationships — :HAS_PASSPORT , :HAS_DRIVING_LICENSE , :HAS_FACE carry verificationDate/Method/Status so the same document can have different verification states per customer
-
Session → Device → Customer chain — connects digital activity to identity for device fingerprinting and session analysis
-
IP → ISP + Location — enriches network data for geographic anomaly detection
-
Alert → Case pipeline — separates automated detection (Alert) from human investigation (Case) with :TRIGGERED and :SUBJECT_OF
Fraud Investigation Pattern
// Flag an account and open a case MATCH (a:Account {accountNumber: $accNum}) SET a:Flagged
CREATE (alert:Alert { alertId: $alertId, ruleName: $ruleName, severity: 'HIGH', triggeredAt: datetime() }) CREATE (case:Case { caseId: $caseId, status: 'OPEN', createdAt: datetime() }) CREATE (alert)-[:TRIGGERED]->(case) CREATE (a)-[:SUBJECT_OF]->(case)
// Link customer to the same case MATCH (c:Customer)-[:HAS_ACCOUNT]->(a:Account)-[:SUBJECT_OF]->(case:Case {caseId: $caseId}) CREATE (c)-[:SUBJECT_OF]->(case)
Query Performance
-
Anchor on indexed properties — start MATCH from a constrained, indexed node
-
Use specific relationship types — [:PERFORMS] not [*]
-
PROFILE queries to verify index usage and eliminate CartesianProduct operators
-
Pre-aggregate statistics for frequently accessed counts/sums
-
Use label filtering — MATCH (a:Account:HighRiskJurisdiction) is faster than WHERE a.jurisdiction = 'high-risk'
Anti-Patterns
- Modeling Everything as Properties
// BAD: can't traverse to find shared attributes CREATE (:Customer {email: "a@b.com", phone: "555-0100"})
// GOOD: nodes enable graph queries CREATE (:Customer)-[:HAS_EMAIL]->(:Email {address: "a@b.com"})
- Generic Relationship Types
// BAD: loses semantic meaning (:Customer)-[:CONNECTED_TO]->(:Account)
// GOOD: specific and queryable (:Customer)-[:HAS_ACCOUNT]->(:Account)
- Symmetric Relationships
Don't create both directions — Cypher can traverse relationships regardless of direction.
- Missing Unique Constraints
Always create constraints on business keys before loading data. Without them, MERGE creates duplicates.
- Storing Foreign Keys as Properties
// BAD: relational thinking CREATE (:Order {customerId: "C001", productId: "P001"})
// GOOD: graph thinking CREATE (:Customer {customerId: "C001"})-[:PLACED]->(:Order)-[:CONTAINS]->(:Product {productId: "P001"})
- Unbounded Fanout Without Grouping
If a node has 100,000+ relationships of the same type, consider intermediate grouping nodes (e.g., group by time period or category).
Validation Checklist
-
Model addresses all business questions
-
Every node has a unique identifier
-
Relationship types are specific and meaningful
-
No symmetric relationship pairs
-
Unique constraints exist on business keys
-
Critical query paths are indexed
-
Model validated with representative data volume
-
Naming conventions are consistent (CapitalCase labels, UPPER_SNAKE_CASE rels, camelCase props)