Data Architecture¶

Validated against PRD v1.0

1. Database Strategy¶

Single PostgreSQL Instance — Schema per Bounded Context¶

flowchart TB
    subgraph PG[PostgreSQL]
        ONB[(onboarding<br/>customer, individual<br/>legal_entity, ownership<br/>document)]
        SCR[(screening<br/>request, result<br/>adjudication)]
        RR[(riskrating<br/>assessment<br/>factor)]
        CM[(casemanagement<br/>case, note<br/>evidence, decision)]
        CFG[(configuration<br/>config_version<br/>workflow_template<br/>threshold_config)]
        NA[(networkanalysis<br/>entity_graph<br/>linked_entity)]
        NT[(notification)]
        AUD[(audit<br/>audit_event<br/>immutable, partitioned)]
        AUTH[(auth<br/>user, role, session)]
    end

Why schemas, not separate databases:

Single connection pool — no distributed transactions.
PostgreSQL schemas enforce logical separation without operational overhead.
Cross-schema queries allowed for audit/reporting but NOT for domain logic.
Easy extraction: when a module becomes a service, its schema becomes its database.

2. Schema Design Principles¶

2.1 Entity Identification¶

All entities use UUIDs as primary keys. Never auto-increment integers.

Why: UUIDs can be generated by the application (not the DB), survive data migrations, and enable future service extraction without key collisions.

CREATE TABLE onboarding.customer (
    customer_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_type VARCHAR(20) NOT NULL,  -- INDIVIDUAL, LEGAL_ENTITY
    ...
);

2.2 Timestamps¶

Every table includes:

created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()

updated_at is set by application code (not DB trigger) to ensure consistency with Temporal workflow timestamps.

2.3 Immutability for Critical Data¶

The following tables are append-only (no UPDATE, no DELETE):

audit.audit_event
casemanagement.decision
screening.adjudication
riskrating.risk_assessment
configuration.config_version

If a risk rating is recalculated, a NEW row is inserted. The old row is preserved. This is NOT event sourcing for everything — it's append-only for audit-critical data only.

2.4 JSONB for Flexible Data¶

Configuration data and workflow context use JSONB:

configuration.workflow_template.definition — the workflow definition (states, transitions, conditions)
configuration.threshold_config.rules — risk scoring rules
onboarding.ownership_relationship.attributes — jurisdiction-specific ownership fields

Why JSONB, not normalized tables: Configurations are versioned blobs that change shape as rules evolve. JSONB allows the config engine to be schema-flexible without migrations. Index with GIN for query performance.

3. Key Table Designs¶

3.1 Audit Event (Immutable, Append-Only)¶

CREATE TABLE audit.audit_event (
    event_id        BIGSERIAL PRIMARY KEY,       -- sequential, NOT UUID (for ordering)
    correlation_id  UUID NOT NULL,               -- links events across a single request
    event_type      VARCHAR(50) NOT NULL,         -- STATE_TRANSITION, MODULE_INVOCATION, DECISION, CONFIG_CHANGE, DATA_ACCESS, AUTH
    actor_type      VARCHAR(20) NOT NULL,         -- USER, SYSTEM
    actor_id        UUID,
    context_type    VARCHAR(50) NOT NULL,         -- CUSTOMER, CASE, WORKFLOW
    context_id      UUID NOT NULL,
    action          VARCHAR(200) NOT NULL,        -- e.g., "State transition: ANALYST_REVIEW → EDD_REVIEW"
    payload         JSONB NOT NULL,               -- full event data
    config_version  VARCHAR(50),                  -- config version active at time of event
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Indexes for audit queries
CREATE INDEX idx_audit_context ON audit.audit_event (context_type, context_id, created_at);
CREATE INDEX idx_audit_correlation ON audit.audit_event (correlation_id);
CREATE INDEX idx_audit_type ON audit.audit_event (event_type, created_at);
CREATE INDEX idx_audit_payload ON audit.audit_event USING GIN (payload);

Retention: Partition by created_at (monthly). After 7 years, detach partition and archive to cold storage (NFR-C01).

No UPDATE, no DELETE ever. The event_id is a BIGSERIAL for sequential ordering within a partition. UUIDs are used for cross-system correlation but sequential IDs are needed for "show me what happened next" queries.

3.2 Customer (Core Entity)¶

CREATE TABLE onboarding.customer (
    customer_id     UUID PRIMARY KEY,
    customer_type   VARCHAR(20) NOT NULL,       -- INDIVIDUAL, LEGAL_ENTITY
    status          VARCHAR(30) NOT NULL,        -- ACTIVE, SUSPENDED, CLOSED, PROHIBITED
    risk_band       VARCHAR(10),                 -- LOW, MEDIUM, HIGH (denormalized from riskrating)
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE onboarding.individual (
    individual_id       UUID PRIMARY KEY REFERENCES onboarding.customer(customer_id),
    first_name          VARCHAR(100) NOT NULL,
    last_name           VARCHAR(100) NOT NULL,
    aliases             JSONB DEFAULT '[]',
    date_of_birth       DATE,
    nationality         VARCHAR(3),              -- ISO 3166 alpha-3
    residence_country   VARCHAR(3),
    tax_residency       VARCHAR(3),
    pep_flag            BOOLEAN DEFAULT FALSE,
    pep_level           VARCHAR(20),            -- NATIONAL, INTERNATIONAL, CLOSE_ASSOCIATE
    encrypted_fields    BYTEA                    -- encrypted PII: DOB, ID numbers (AES-256 at app layer)
);

CREATE TABLE onboarding.legal_entity (
    entity_id               UUID PRIMARY KEY REFERENCES onboarding.customer(customer_id),
    legal_name              VARCHAR(200) NOT NULL,
    trade_name              VARCHAR(200),
    registration_number     VARCHAR(100) NOT NULL,
    incorporation_country   VARCHAR(3),
    legal_form              VARCHAR(50),
    industry_code           VARCHAR(20),
    incorporation_date      DATE
);

3.3 Ownership (Recursive / Graph)¶

CREATE TABLE onboarding.ownership_relationship (
    relationship_id     UUID PRIMARY KEY,
    parent_entity_id    UUID NOT NULL REFERENCES onboarding.customer(customer_id),
    child_entity_id     UUID NOT NULL REFERENCES onboarding.customer(customer_id),
    ownership_pct       NUMERIC(5,2) NOT NULL CHECK (ownership_pct > 0 AND ownership_pct <= 100),
    control_type        VARCHAR(20) NOT NULL,   -- DIRECT, INDIRECT, BENEFICIAL
    effective_from      DATE,
    effective_to        DATE,
    confidence_score    DOUBLE PRECISION,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_ownership_parent ON onboarding.ownership_relationship (parent_entity_id);
CREATE INDEX idx_ownership_child ON onboarding.ownership_relationship (child_entity_id);

UBO Traversal (Recursive CTE):

WITH RECURSIVE ownership_chain AS (
    -- Base: direct owners of the target customer
    SELECT parent_entity_id, child_entity_id, ownership_pct, 1 AS depth
    FROM onboarding.ownership_relationship
    WHERE child_entity_id = :target_customer_id
      AND effective_to IS NULL

    UNION ALL

    -- Recursive: owners of owners
    SELECT o.parent_entity_id, o.child_entity_id, o.ownership_pct, oc.depth + 1
    FROM onboarding.ownership_relationship o
    JOIN ownership_chain oc ON o.child_entity_id = oc.parent_entity_id
    WHERE o.effective_to IS NULL
      AND oc.depth < 10  -- max 10 levels
)
SELECT parent_entity_id, SUM(ownership_pct) as total_ownership
FROM ownership_chain
GROUP BY parent_entity_id
HAVING SUM(ownership_pct) >= 25  -- UBO threshold
ORDER BY total_ownership DESC;

This handles basic relationship traversal. When advanced graph analytics are needed (v2+), add Neo4j with a sync from PostgreSQL ownership relationships to Neo4j nodes/edges.

3.4 Configuration (Versioned, Immutable)¶

CREATE TABLE configuration.config_version (
    version_id      UUID PRIMARY KEY,
    version_number  VARCHAR(20) NOT NULL,
    status          VARCHAR(20) NOT NULL,        -- DRAFT, TEST, ACTIVE, SUPERSEDED
    created_by      UUID NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    activated_at    TIMESTAMPTZ
);

CREATE TABLE configuration.workflow_template (
    template_id     UUID PRIMARY KEY,
    version_id      UUID NOT NULL REFERENCES configuration.config_version(version_id),
    template_name   VARCHAR(100) NOT NULL,
    definition      JSONB NOT NULL,              -- full workflow definition
    UNIQUE (version_id, template_name)
);

CREATE TABLE configuration.threshold_config (
    config_id       UUID PRIMARY KEY,
    version_id      UUID NOT NULL REFERENCES configuration.config_version(version_id),
    domain          VARCHAR(50) NOT NULL,        -- RISK_RATING, SCREENING
    rules           JSONB NOT NULL,              -- threshold rules
    UNIQUE (version_id, domain)
);

Querying active config: SELECT * FROM configuration.config_version WHERE status = 'ACTIVE' — at most one row.

3.5 Case Management¶

CREATE TABLE casemanagement.case (
    case_id             UUID PRIMARY KEY,
    case_type           VARCHAR(50) NOT NULL,
    state               VARCHAR(30) NOT NULL,
    priority            VARCHAR(10) NOT NULL DEFAULT 'MEDIUM',
    customer_id         UUID NOT NULL,
    workflow_instance_id UUID,
    assigned_to         UUID,
    escalation_level    INT NOT NULL DEFAULT 0,
    sla_deadline        TIMESTAMPTZ,
    resolved_at         TIMESTAMPTZ,
    resolution          VARCHAR(30),
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Append-only notes
CREATE TABLE casemanagement.case_note (
    note_id     UUID PRIMARY KEY,
    case_id     UUID NOT NULL REFERENCES casemanagement.case(case_id),
    author_id   UUID NOT NULL,
    content     TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
    -- No updated_at — notes are immutable
);

-- Decisions are immutable
CREATE TABLE casemanagement.decision (
    decision_id             UUID PRIMARY KEY,
    case_id                 UUID NOT NULL REFERENCES casemanagement.case(case_id),
    decision_type           VARCHAR(30) NOT NULL,
    actor_type              VARCHAR(10) NOT NULL,
    actor_id                UUID,
    rationale               TEXT NOT NULL,
    evidence_refs           UUID[] DEFAULT '{}',
    config_version          VARCHAR(50),
    override_justification  TEXT,
    created_at              TIMESTAMPTZ NOT NULL DEFAULT now()
    -- Immutable — no updated_at
);

4. Data Ownership¶

Schema	Owned By	Accessed By (Read)	Accessed By (Write)
`onboarding`	Onboarding Context	Screening, Risk Rating, Case Mgmt (via API, not SQL)	Onboarding Context only
`screening`	Screening Context	Audit	Screening Context only
`riskrating`	Risk Rating Context	Audit	Risk Rating Context only
`casemanagement`	Case Management Context	Audit, Notifications	Case Management Context only
`configuration`	Configuration Engine	ALL modules (read config version)	Configuration Engine only
`networkanalysis`	Network Analysis Context	Audit	Network Analysis Context only
`notification`	Notification Context	None	Notification Context only
`audit`	Audit Context	ALL (read for replay)	ALL contexts (append events)
`auth`	Identity Context	ALL (check RBAC)	Identity Context only

Rule: No bounded context reads another context's data directly via SQL. Cross-context data access goes through API calls (in monolith: through interface contracts).

Exception: Audit schema is read by all for audit replay. Configuration schema is read by all for config resolution. Both are through shared read-only interfaces, not direct SQL.

5. State Persistence vs. Event Sourcing¶

Current State (for most entities)¶

Customer, Case, Document, Ownership — standard CRUD. The current state is the source of truth.
updated_at tracks when the state last changed.

Append-Only (for audit-critical data)¶

AuditEvent, Decision, Adjudication, RiskAssessment, ConfigVersion — immutable. New versions create new rows.
This is NOT full event sourcing. It's append-only state. The "event" is "a new state was recorded."

Event Sourcing (Post-MVP Consideration)¶

Full event sourcing (where the event log IS the source of truth and state is a projection) is viable for the Audit context but unnecessary complexity for MVP. The audit table already captures every event. If event sourcing is needed later, replay from audit.audit_event.

6. Document / BLOB Storage¶

MVP: PostgreSQL BYTEA + Metadata Table¶

CREATE TABLE onboarding.document (
    document_id         UUID PRIMARY KEY,
    customer_id         UUID NOT NULL REFERENCES onboarding.customer(customer_id),
    document_type       VARCHAR(50) NOT NULL,
    file_name           VARCHAR(255) NOT NULL,
    content_type        VARCHAR(100) NOT NULL,
    content             BYTEA NOT NULL,           -- document bytes
    content_hash        VARCHAR(64) NOT NULL,     -- SHA-256 for integrity
    issue_date          DATE,
    expiry_date         DATE,
    validation_status   VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_document_customer ON onboarding.document (customer_id);
CREATE INDEX idx_document_status ON onboarding.document (validation_status);

Size limit: 10MB per document (enforced in application). For larger documents or scale, migrate to object storage (S3/MinIO) — change storage layer without changing domain logic (the storage_reference field replaces content BYTEA).

7. Migration Strategy¶

Tool: Flyway¶

-- V1__create_onboarding_schema.sql
CREATE SCHEMA IF NOT EXISTS onboarding;

-- V2__create_customer_table.sql
CREATE TABLE onboarding.customer (...);

-- V3__create_audit_schema.sql
CREATE SCHEMA IF NOT EXISTS audit;

-- ...

Rules:

Migrations are sequential and immutable (never modify a committed migration).
Each bounded context has its own migration directory (under src/main/resources/db/migration/<context>/).
CI pipeline runs migrations against a test database before deployment.

8. Indexing Strategy¶

Table	Index	Purpose
`audit.audit_event`	`(context_type, context_id, created_at)`	Audit replay by case/customer
`audit.audit_event`	`(correlation_id)`	Trace a single request
`audit.audit_event`	`(event_type, created_at)`	Filter by event type
`casemanagement.case`	`(assigned_to, state)`	Analyst queue
`casemanagement.case`	`(state, sla_deadline)`	SLA monitoring
`casemanagement.case`	`(customer_id)`	Customer case history
`onboarding.customer`	`(status)`	Active customer list
`onboarding.ownership_relationship`	`(child_entity_id)`	UBO traversal
`onboarding.document`	`(customer_id, validation_status)`	Document status
`screening.screening_result`	`(match_status, created_at)`	Pending adjudication queue
`configuration.config_version`	`(status)`	Active config lookup
`configuration.workflow_template`	`(version_id, template_name)`	Template resolution

9. Encryption¶

At Rest¶

PostgreSQL TDE (Transparent Data Encryption) or filesystem-level encryption. Managed by deployment environment.

At Application Level (for PII)¶

Sensitive fields encrypted before storage:

@Entity
class Individual {
    @Column(columnDefinition = "BYTEA")
    var encryptedFields: ByteArray? = null  // AES-256-GCM encrypted JSON of {dob, idNumbers, ...}
}

What gets encrypted: date of birth, identification numbers, residence address details.
What does not: names (needed for screening), nationality (needed for risk rating), PEP flags.
Key management: Encryption keys in environment variables or vault (HashiCorp Vault post-MVP).

Data architecture designed for PRD v1.0 MVP (PostgreSQL, schema-per-context). Re-evaluate if graph or search requirements exceed PostgreSQL capabilities.