Data Architecture¶
Validated against PRD v1.0
1. Database Strategy¶
Single PostgreSQL Instance — Schema per Bounded Context¶
flowchart TB
subgraph PG[PostgreSQL]
ONB[(onboarding<br/>customer, individual<br/>legal_entity, ownership<br/>document)]
SCR[(screening<br/>request, result<br/>adjudication)]
RR[(riskrating<br/>assessment<br/>factor)]
CM[(casemanagement<br/>case, note<br/>evidence, decision)]
CFG[(configuration<br/>config_version<br/>workflow_template<br/>threshold_config)]
NA[(networkanalysis<br/>entity_graph<br/>linked_entity)]
NT[(notification)]
AUD[(audit<br/>audit_event<br/>immutable, partitioned)]
AUTH[(auth<br/>user, role, session)]
end
Why schemas, not separate databases:
- Single connection pool — no distributed transactions.
- PostgreSQL schemas enforce logical separation without operational overhead.
- Cross-schema queries allowed for audit/reporting but NOT for domain logic.
- Easy extraction: when a module becomes a service, its schema becomes its database.
2. Schema Design Principles¶
2.1 Entity Identification¶
All entities use UUIDs as primary keys. Never auto-increment integers.
Why: UUIDs can be generated by the application (not the DB), survive data migrations, and enable future service extraction without key collisions.
CREATE TABLE onboarding.customer (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_type VARCHAR(20) NOT NULL, -- INDIVIDUAL, LEGAL_ENTITY
...
);
2.2 Timestamps¶
Every table includes:
updated_at is set by application code (not DB trigger) to ensure consistency with Temporal workflow timestamps.
2.3 Immutability for Critical Data¶
The following tables are append-only (no UPDATE, no DELETE):
audit.audit_eventcasemanagement.decisionscreening.adjudicationriskrating.risk_assessmentconfiguration.config_version
If a risk rating is recalculated, a NEW row is inserted. The old row is preserved. This is NOT event sourcing for everything — it's append-only for audit-critical data only.
2.4 JSONB for Flexible Data¶
Configuration data and workflow context use JSONB:
configuration.workflow_template.definition— the workflow definition (states, transitions, conditions)configuration.threshold_config.rules— risk scoring rulesonboarding.ownership_relationship.attributes— jurisdiction-specific ownership fields
Why JSONB, not normalized tables: Configurations are versioned blobs that change shape as rules evolve. JSONB allows the config engine to be schema-flexible without migrations. Index with GIN for query performance.
3. Key Table Designs¶
3.1 Audit Event (Immutable, Append-Only)¶
CREATE TABLE audit.audit_event (
event_id BIGSERIAL PRIMARY KEY, -- sequential, NOT UUID (for ordering)
correlation_id UUID NOT NULL, -- links events across a single request
event_type VARCHAR(50) NOT NULL, -- STATE_TRANSITION, MODULE_INVOCATION, DECISION, CONFIG_CHANGE, DATA_ACCESS, AUTH
actor_type VARCHAR(20) NOT NULL, -- USER, SYSTEM
actor_id UUID,
context_type VARCHAR(50) NOT NULL, -- CUSTOMER, CASE, WORKFLOW
context_id UUID NOT NULL,
action VARCHAR(200) NOT NULL, -- e.g., "State transition: ANALYST_REVIEW → EDD_REVIEW"
payload JSONB NOT NULL, -- full event data
config_version VARCHAR(50), -- config version active at time of event
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Indexes for audit queries
CREATE INDEX idx_audit_context ON audit.audit_event (context_type, context_id, created_at);
CREATE INDEX idx_audit_correlation ON audit.audit_event (correlation_id);
CREATE INDEX idx_audit_type ON audit.audit_event (event_type, created_at);
CREATE INDEX idx_audit_payload ON audit.audit_event USING GIN (payload);
Retention: Partition by created_at (monthly). After 7 years, detach partition and archive to cold storage (NFR-C01).
No UPDATE, no DELETE ever. The event_id is a BIGSERIAL for sequential ordering within a partition. UUIDs are used for cross-system correlation but sequential IDs are needed for "show me what happened next" queries.
3.2 Customer (Core Entity)¶
CREATE TABLE onboarding.customer (
customer_id UUID PRIMARY KEY,
customer_type VARCHAR(20) NOT NULL, -- INDIVIDUAL, LEGAL_ENTITY
status VARCHAR(30) NOT NULL, -- ACTIVE, SUSPENDED, CLOSED, PROHIBITED
risk_band VARCHAR(10), -- LOW, MEDIUM, HIGH (denormalized from riskrating)
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE onboarding.individual (
individual_id UUID PRIMARY KEY REFERENCES onboarding.customer(customer_id),
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
aliases JSONB DEFAULT '[]',
date_of_birth DATE,
nationality VARCHAR(3), -- ISO 3166 alpha-3
residence_country VARCHAR(3),
tax_residency VARCHAR(3),
pep_flag BOOLEAN DEFAULT FALSE,
pep_level VARCHAR(20), -- NATIONAL, INTERNATIONAL, CLOSE_ASSOCIATE
encrypted_fields BYTEA -- encrypted PII: DOB, ID numbers (AES-256 at app layer)
);
CREATE TABLE onboarding.legal_entity (
entity_id UUID PRIMARY KEY REFERENCES onboarding.customer(customer_id),
legal_name VARCHAR(200) NOT NULL,
trade_name VARCHAR(200),
registration_number VARCHAR(100) NOT NULL,
incorporation_country VARCHAR(3),
legal_form VARCHAR(50),
industry_code VARCHAR(20),
incorporation_date DATE
);
3.3 Ownership (Recursive / Graph)¶
CREATE TABLE onboarding.ownership_relationship (
relationship_id UUID PRIMARY KEY,
parent_entity_id UUID NOT NULL REFERENCES onboarding.customer(customer_id),
child_entity_id UUID NOT NULL REFERENCES onboarding.customer(customer_id),
ownership_pct NUMERIC(5,2) NOT NULL CHECK (ownership_pct > 0 AND ownership_pct <= 100),
control_type VARCHAR(20) NOT NULL, -- DIRECT, INDIRECT, BENEFICIAL
effective_from DATE,
effective_to DATE,
confidence_score DOUBLE PRECISION,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_ownership_parent ON onboarding.ownership_relationship (parent_entity_id);
CREATE INDEX idx_ownership_child ON onboarding.ownership_relationship (child_entity_id);
UBO Traversal (Recursive CTE):
WITH RECURSIVE ownership_chain AS (
-- Base: direct owners of the target customer
SELECT parent_entity_id, child_entity_id, ownership_pct, 1 AS depth
FROM onboarding.ownership_relationship
WHERE child_entity_id = :target_customer_id
AND effective_to IS NULL
UNION ALL
-- Recursive: owners of owners
SELECT o.parent_entity_id, o.child_entity_id, o.ownership_pct, oc.depth + 1
FROM onboarding.ownership_relationship o
JOIN ownership_chain oc ON o.child_entity_id = oc.parent_entity_id
WHERE o.effective_to IS NULL
AND oc.depth < 10 -- max 10 levels
)
SELECT parent_entity_id, SUM(ownership_pct) as total_ownership
FROM ownership_chain
GROUP BY parent_entity_id
HAVING SUM(ownership_pct) >= 25 -- UBO threshold
ORDER BY total_ownership DESC;
This handles basic relationship traversal. When advanced graph analytics are needed (v2+), add Neo4j with a sync from PostgreSQL ownership relationships to Neo4j nodes/edges.
3.4 Configuration (Versioned, Immutable)¶
CREATE TABLE configuration.config_version (
version_id UUID PRIMARY KEY,
version_number VARCHAR(20) NOT NULL,
status VARCHAR(20) NOT NULL, -- DRAFT, TEST, ACTIVE, SUPERSEDED
created_by UUID NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
activated_at TIMESTAMPTZ
);
CREATE TABLE configuration.workflow_template (
template_id UUID PRIMARY KEY,
version_id UUID NOT NULL REFERENCES configuration.config_version(version_id),
template_name VARCHAR(100) NOT NULL,
definition JSONB NOT NULL, -- full workflow definition
UNIQUE (version_id, template_name)
);
CREATE TABLE configuration.threshold_config (
config_id UUID PRIMARY KEY,
version_id UUID NOT NULL REFERENCES configuration.config_version(version_id),
domain VARCHAR(50) NOT NULL, -- RISK_RATING, SCREENING
rules JSONB NOT NULL, -- threshold rules
UNIQUE (version_id, domain)
);
Querying active config: SELECT * FROM configuration.config_version WHERE status = 'ACTIVE' — at most one row.
3.5 Case Management¶
CREATE TABLE casemanagement.case (
case_id UUID PRIMARY KEY,
case_type VARCHAR(50) NOT NULL,
state VARCHAR(30) NOT NULL,
priority VARCHAR(10) NOT NULL DEFAULT 'MEDIUM',
customer_id UUID NOT NULL,
workflow_instance_id UUID,
assigned_to UUID,
escalation_level INT NOT NULL DEFAULT 0,
sla_deadline TIMESTAMPTZ,
resolved_at TIMESTAMPTZ,
resolution VARCHAR(30),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Append-only notes
CREATE TABLE casemanagement.case_note (
note_id UUID PRIMARY KEY,
case_id UUID NOT NULL REFERENCES casemanagement.case(case_id),
author_id UUID NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
-- No updated_at — notes are immutable
);
-- Decisions are immutable
CREATE TABLE casemanagement.decision (
decision_id UUID PRIMARY KEY,
case_id UUID NOT NULL REFERENCES casemanagement.case(case_id),
decision_type VARCHAR(30) NOT NULL,
actor_type VARCHAR(10) NOT NULL,
actor_id UUID,
rationale TEXT NOT NULL,
evidence_refs UUID[] DEFAULT '{}',
config_version VARCHAR(50),
override_justification TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
-- Immutable — no updated_at
);
4. Data Ownership¶
| Schema | Owned By | Accessed By (Read) | Accessed By (Write) |
|---|---|---|---|
onboarding |
Onboarding Context | Screening, Risk Rating, Case Mgmt (via API, not SQL) | Onboarding Context only |
screening |
Screening Context | Audit | Screening Context only |
riskrating |
Risk Rating Context | Audit | Risk Rating Context only |
casemanagement |
Case Management Context | Audit, Notifications | Case Management Context only |
configuration |
Configuration Engine | ALL modules (read config version) | Configuration Engine only |
networkanalysis |
Network Analysis Context | Audit | Network Analysis Context only |
notification |
Notification Context | None | Notification Context only |
audit |
Audit Context | ALL (read for replay) | ALL contexts (append events) |
auth |
Identity Context | ALL (check RBAC) | Identity Context only |
Rule: No bounded context reads another context's data directly via SQL. Cross-context data access goes through API calls (in monolith: through interface contracts).
Exception: Audit schema is read by all for audit replay. Configuration schema is read by all for config resolution. Both are through shared read-only interfaces, not direct SQL.
5. State Persistence vs. Event Sourcing¶
Current State (for most entities)¶
- Customer, Case, Document, Ownership — standard CRUD. The current state is the source of truth.
updated_attracks when the state last changed.
Append-Only (for audit-critical data)¶
- AuditEvent, Decision, Adjudication, RiskAssessment, ConfigVersion — immutable. New versions create new rows.
- This is NOT full event sourcing. It's append-only state. The "event" is "a new state was recorded."
Event Sourcing (Post-MVP Consideration)¶
Full event sourcing (where the event log IS the source of truth and state is a projection) is viable for the Audit context but unnecessary complexity for MVP. The audit table already captures every event. If event sourcing is needed later, replay from audit.audit_event.
6. Document / BLOB Storage¶
MVP: PostgreSQL BYTEA + Metadata Table¶
CREATE TABLE onboarding.document (
document_id UUID PRIMARY KEY,
customer_id UUID NOT NULL REFERENCES onboarding.customer(customer_id),
document_type VARCHAR(50) NOT NULL,
file_name VARCHAR(255) NOT NULL,
content_type VARCHAR(100) NOT NULL,
content BYTEA NOT NULL, -- document bytes
content_hash VARCHAR(64) NOT NULL, -- SHA-256 for integrity
issue_date DATE,
expiry_date DATE,
validation_status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_document_customer ON onboarding.document (customer_id);
CREATE INDEX idx_document_status ON onboarding.document (validation_status);
Size limit: 10MB per document (enforced in application). For larger documents or scale, migrate to object storage (S3/MinIO) — change storage layer without changing domain logic (the storage_reference field replaces content BYTEA).
7. Migration Strategy¶
Tool: Flyway¶
-- V1__create_onboarding_schema.sql
CREATE SCHEMA IF NOT EXISTS onboarding;
-- V2__create_customer_table.sql
CREATE TABLE onboarding.customer (...);
-- V3__create_audit_schema.sql
CREATE SCHEMA IF NOT EXISTS audit;
-- ...
Rules:
- Migrations are sequential and immutable (never modify a committed migration).
- Each bounded context has its own migration directory (under
src/main/resources/db/migration/<context>/). - CI pipeline runs migrations against a test database before deployment.
8. Indexing Strategy¶
| Table | Index | Purpose |
|---|---|---|
audit.audit_event |
(context_type, context_id, created_at) |
Audit replay by case/customer |
audit.audit_event |
(correlation_id) |
Trace a single request |
audit.audit_event |
(event_type, created_at) |
Filter by event type |
casemanagement.case |
(assigned_to, state) |
Analyst queue |
casemanagement.case |
(state, sla_deadline) |
SLA monitoring |
casemanagement.case |
(customer_id) |
Customer case history |
onboarding.customer |
(status) |
Active customer list |
onboarding.ownership_relationship |
(child_entity_id) |
UBO traversal |
onboarding.document |
(customer_id, validation_status) |
Document status |
screening.screening_result |
(match_status, created_at) |
Pending adjudication queue |
configuration.config_version |
(status) |
Active config lookup |
configuration.workflow_template |
(version_id, template_name) |
Template resolution |
9. Encryption¶
At Rest¶
PostgreSQL TDE (Transparent Data Encryption) or filesystem-level encryption. Managed by deployment environment.
At Application Level (for PII)¶
Sensitive fields encrypted before storage:
@Entity
class Individual {
@Column(columnDefinition = "BYTEA")
var encryptedFields: ByteArray? = null // AES-256-GCM encrypted JSON of {dob, idNumbers, ...}
}
What gets encrypted: date of birth, identification numbers, residence address details.
What does not: names (needed for screening), nationality (needed for risk rating), PEP flags.
Key management: Encryption keys in environment variables or vault (HashiCorp Vault post-MVP).
Data architecture designed for PRD v1.0 MVP (PostgreSQL, schema-per-context). Re-evaluate if graph or search requirements exceed PostgreSQL capabilities.