Unity Catalog Governance Skill
Overview
Unity Catalog provides centralized governance for data and AI assets across Databricks workspaces. This skill covers governance patterns, security models, access control, and compliance frameworks.
When to Use This Skill
Use this skill when you need to:
-
Design Unity Catalog security architecture
-
Implement RBAC/ABAC access controls
-
Configure row-level security and column masking
-
Set up data classification and tagging
-
Implement compliance requirements (GDPR, HIPAA, SOC2)
-
Audit permissions and access patterns
Core Concepts
Three-Level Namespace
catalog.schema.table production.customers.profiles
Securable Objects
-
Metastore: Top-level container
-
Catalog: Database container
-
Schema: Table container
-
Table/View: Data objects
-
Volume: File storage
-
Function: Callable routines
-
External Location: External storage
-
Storage Credential: Cloud credentials
Privilege Model
-- Catalog privileges USE CATALOG, CREATE SCHEMA, USE SCHEMA, CREATE TABLE, CREATE FUNCTION, CREATE VOLUME, ALL PRIVILEGES, OWNERSHIP
-- Schema privileges USE SCHEMA, CREATE TABLE, CREATE FUNCTION, CREATE VOLUME, SELECT, MODIFY, READ FILES, WRITE FILES
-- Table privileges SELECT, MODIFY, READ METADATA
-- Function privileges EXECUTE
-- Volume privileges READ FILES, WRITE FILES
Governance Patterns
Pattern 1: Environment Isolation
-- Separate catalogs per environment CREATE CATALOG IF NOT EXISTS dev; CREATE CATALOG IF NOT EXISTS staging; CREATE CATALOG IF NOT EXISTS production;
-- Grant appropriate access
GRANT USE CATALOG, CREATE SCHEMA ON CATALOG dev TO developers;
GRANT USE CATALOG, USE SCHEMA ON CATALOG staging TO testers;
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG production TO analysts;
Pattern 2: Data Domain Organization
-- Organize by business domain CREATE CATALOG business_domains; CREATE SCHEMA business_domains.customer_domain; CREATE SCHEMA business_domains.financial_domain; CREATE SCHEMA business_domains.product_domain;
-- Assign domain ownership
ALTER SCHEMA business_domains.customer_domain
SET OWNER TO customer_data_team;
-- Domain-specific access
GRANT SELECT ON SCHEMA business_domains.customer_domain TO customer_analytics;
GRANT MODIFY ON SCHEMA business_domains.customer_domain TO customer_engineering;
Pattern 3: Medallion with Security
-- Bronze: Raw data - limited access
CREATE SCHEMA production.bronze;
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA production.bronze TO data_engineers;
-- Silver: Refined data - broader access
CREATE SCHEMA production.silver;
GRANT USE SCHEMA, SELECT ON SCHEMA production.silver TO data_analysts;
GRANT MODIFY ON SCHEMA production.silver TO data_engineers;
-- Gold: Business data - wide access
CREATE SCHEMA production.gold;
GRANT USE SCHEMA, SELECT ON SCHEMA production.gold TO business_users;
GRANT MODIFY ON SCHEMA production.gold TO analytics_engineers;
Pattern 4: Row-Level Security
-- Regional data access control CREATE FUNCTION governance.regional_filter(user_region STRING, data_region STRING) RETURNS BOOLEAN RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('global_access') THEN TRUE WHEN user_region = data_region THEN TRUE ELSE FALSE END;
-- Apply row filter ALTER TABLE production.customers.orders SET ROW FILTER governance.regional_filter(current_user_region(), region);
Pattern 5: Column Masking
-- Create masking function CREATE FUNCTION governance.mask_email(email STRING) RETURNS STRING RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('pii_admin') THEN email WHEN IS_ACCOUNT_GROUP_MEMBER('pii_viewer') THEN CONCAT(SUBSTRING(email, 1, 3), '***@', SPLIT(email, '@')[1]) ELSE 'REDACTED' END;
-- Apply to column ALTER TABLE production.customers.profiles ALTER COLUMN email SET MASK governance.mask_email;
Pattern 6: Data Classification
-- Create classification tags CREATE TAG IF NOT EXISTS governance.sensitivity VALUES ('PUBLIC', 'INTERNAL', 'CONFIDENTIAL', 'RESTRICTED');
CREATE TAG IF NOT EXISTS governance.data_domain VALUES ('CUSTOMER', 'FINANCIAL', 'EMPLOYEE', 'PRODUCT');
-- Apply classification ALTER TABLE production.customers.profiles SET TAGS ( 'governance.sensitivity' = 'RESTRICTED', 'governance.data_domain' = 'CUSTOMER' );
ALTER TABLE production.customers.profiles ALTER COLUMN ssn SET TAGS ('governance.sensitivity' = 'RESTRICTED');
Security Best Practices
- Least Privilege Access
-- ❌ BAD: Overly permissive
GRANT ALL PRIVILEGES ON CATALOG production TO all_users;
-- ✅ GOOD: Minimal required permissions
GRANT USE CATALOG ON CATALOG production TO analysts;
GRANT USE SCHEMA ON SCHEMA production.gold TO analysts;
GRANT SELECT ON SCHEMA production.gold TO analysts;
- Separation of Duties
Define distinct roles
ROLES = { "data_consumer": ["USE CATALOG", "USE SCHEMA", "SELECT"], "data_producer": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"], "data_steward": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY", "ALTER"], "platform_admin": ["ALL PRIVILEGES"] # Minimal assignments }
- Service Principal Ownership
-- Use service principals for ownership, not individuals
ALTER SCHEMA production.customers
SET OWNER TO data-engineering-sp;
-- Transfer from individual to service principal
ALTER TABLE production.customers.profiles
SET OWNER TO customer-domain-sp;
- Audit and Monitor
-- Query audit logs SELECT user_identity.email, request_params.full_name_arg as object_accessed, action_name, event_time FROM system.access.audit WHERE event_date >= CURRENT_DATE - INTERVAL 7 DAYS AND action_name IN ('GET_TABLE', 'READ_TABLE') ORDER BY event_time DESC;
- Regular Access Reviews
def quarterly_access_review(catalog: str): """Review and recertify access quarterly.""" # Get all grants grants = spark.sql(f"SHOW GRANTS ON CATALOG {catalog}")
# Identify excessive access
excessive = grants.filter("privilege = 'ALL PRIVILEGES'")
# Find unused permissions
unused = find_unused_grants(grants)
# Generate review report
report = generate_review_report(excessive, unused)
send_to_data_stewards(report)
Compliance Frameworks
GDPR Compliance Pattern
class GDPRCompliance: """GDPR compliance implementation."""
def setup_data_inventory(self):
"""Article 30: Records of processing."""
# Tag all PII
self.classify_pii_data()
# Document processing purposes
self.document_purposes()
def implement_right_to_erasure(self):
"""Article 17: Right to erasure."""
# Create deletion workflow
self.create_deletion_api()
# Setup cascade delete
self.configure_lineage()
def enable_data_portability(self):
"""Article 20: Right to data portability."""
# Export user data
self.create_export_api()
HIPAA Compliance Pattern
class HIPAACompliance: """HIPAA compliance implementation."""
def implement_access_controls(self):
"""Technical Safeguards: Access Control."""
# Unique user identification
self.enforce_sso()
# Automatic logoff
self.configure_session_timeout()
# Audit controls
self.enable_comprehensive_logging()
def enable_encryption(self):
"""Technical Safeguards: Encryption."""
# Encryption at rest
self.enable_catalog_encryption()
# Encryption in transit
self.enforce_tls()
Templates
See /templates/ directory for:
-
governance-framework: Complete governance setup
-
rbac-configuration: Role-based access control
-
compliance-checklist: Regulatory compliance validation
-
audit-procedures: Regular audit workflows
Examples
See /examples/ directory for:
-
regional-access-control: Multi-region data isolation
-
pii-protection: PII masking and encryption
-
compliance-reporting: Automated compliance reports
Related Skills
-
data-classification : Data classification and tagging
-
compliance-automation : Automated compliance checks
-
access-management : RBAC/ABAC implementation
References
-
Unity Catalog Documentation
-
Unity Catalog Best Practices
-
Data Governance on Databricks