Risk Register & Mitigation Strategies

Validated against PRD v1.0

Risk Matrix

Risk	Likelihood	Impact	Score	Owner
R1: Temporal learning curve slows Phase 2	Medium (4)	High (4)	16	Engineering Lead
R2: Modular monolith becomes tightly coupled	Medium (4)	Critical (5)	20	Tech Lead / CI
R3: External integrations unstable	Medium (3)	Medium (3)	9	Integration Lead
R4: Configuration engine complexity	Medium (4)	High (4)	16	Config Module Owner
R5: Corporate onboarding scope creep	High (5)	High (4)	20	Product Owner
R6: Team lacks FEC domain expertise	Medium (3)	Medium (3)	9	Domain Expert / PO
R7: PostgreSQL recursive queries slow at scale	Low (2)	Medium (3)	6	Data Architect
R8: Performance targets not met	Low (2)	High (4)	8	Performance Lead
R9: Regulatory requirements change during build	Low (2)	Critical (5)	10	Compliance Advisor
R10: Key person dependency (Temporal/Kotlin expertise)	Medium (3)	High (4)	12	Engineering Manager
R11: Audit log volume exceeds capacity	Low (2)	Medium (3)	6	Data Architect
R12: Security vulnerability discovered	Low (2)	Critical (5)	10	Security Lead

Scale: Likelihood 1-5 (Rare→Almost Certain), Impact 1-5 (Negligible→Critical). Score = Likelihood × Impact.
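As a quick check of the scoring rule, the sketch below recomputes each score from the numeric ratings in the matrix and ranks the risks. The `Risk` tuple and the `ranked` helper are illustrative names, not part of the PRD.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    rid: str
    likelihood: int  # 1 (Rare) .. 5 (Almost Certain)
    impact: int      # 1 (Negligible) .. 5 (Critical)

    @property
    def score(self) -> int:
        # Score = Likelihood x Impact, as defined by the scale.
        return self.likelihood * self.impact


# Numeric ratings copied from the Risk Matrix.
RISKS = [
    Risk("R1", 4, 4), Risk("R2", 4, 5), Risk("R3", 3, 3), Risk("R4", 4, 4),
    Risk("R5", 5, 4), Risk("R6", 3, 3), Risk("R7", 2, 3), Risk("R8", 2, 4),
    Risk("R9", 2, 5), Risk("R10", 3, 4), Risk("R11", 2, 3), Risk("R12", 2, 5),
]


def ranked(risks):
    """Return risks sorted by descending score; ties keep matrix order."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    for r in ranked(RISKS):
        print(f"{r.rid}: {r.likelihood} x {r.impact} = {r.score}")
```

Ranked this way, R2 and R5 (both 20) lead the register, followed by R1 and R4 (both 16).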

Detailed Risks

R1: Temporal Learning Curve

Description: Temporal introduces concepts (workflows, activities, signals, versioning) that differ from traditional request-response programming. The team may need 2-3 weeks to become productive.

Impact: Phase 2 (Workflow Engine) could slip from 4 weeks to 6-7 weeks.

Mitigation:
- Phase 0 includes a "Hello World" Temporal PoC, so the team gets hands-on experience before Phase 2.
- Temporal's documentation and examples are excellent, and a recommended training path is documented.
- Start with a simple linear workflow and add complexity incrementally (branching → parallel → signals) rather than building the full Corporate NL template at once.

Fallback: If Temporal proves too complex in Phase 0, evaluate Camunda 8 as an alternative. Camunda's BPMN model may be more intuitive for business-process-oriented developers.

R2: Modular Monolith Coupling

Description: The modular monolith can degrade into a "ball of mud" if module boundaries are not enforced. One module importing from another's internal package is a one-line code change but breaks architectural integrity.

Impact: Extracting modules into services becomes a rewrite rather than a mechanical refactoring. Development velocity slows as changes ripple across modules.

Mitigation:
- ArchUnit tests run in CI from Phase 0, Day 1. Any cross-module internal import fails the build.
- The code review checklist includes: "Does this PR cross bounded context boundaries?"
- The shared kernel (shared/) is kept minimal and reviewed quarterly for bloat.
- Module interface contracts are enforced at the type level (interface in shared.contract, implementation in the domain module).

Leading indicator: ArchUnit violation count. If > 0, stop and fix immediately.
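The boundary rule itself is small enough to state as code. The following is a language-neutral sketch in Python of the check "no module imports another module's internal package"; it is not ArchUnit's API. The package layout (`<module>.internal...`) and the module names in the usage note are assumptions made for illustration.

```python
def module_of(package: str) -> str:
    """First segment of a dotted package name: 'screening.internal.x' -> 'screening'."""
    return package.split(".", 1)[0]


def is_violation(source_pkg: str, imported_pkg: str) -> bool:
    """True when source_pkg reaches into a different module's internal package."""
    crosses_module = module_of(source_pkg) != module_of(imported_pkg)
    targets_internal = "internal" in imported_pkg.split(".")
    return crosses_module and targets_internal


def violations(imports):
    """Filter (source_pkg, imported_pkg) pairs down to the ones that break the rule."""
    return [(src, dst) for src, dst in imports if is_violation(src, dst)]
```

Under these assumptions, `onboarding.api` importing `screening.internal.matcher` is a violation, while importing `shared.contract` or a module's own internal package is not. The violation count is the leading indicator named above.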

R3: External Integrations Unstable

Description: Sanctions list providers, PEP databases, and corporate registries may be unreliable, slow, or change their API without notice.

Impact: Screening and identity validation become unreliable. Workflows stall.

Mitigation:
- Adapter + fallback pattern: every external integration has a mock provider for development and a fallback wrapper for production.
- Graceful degradation (NFR-R02): if an external service is unavailable, mark the task BLOCKED, notify the operator, and retry when the service is available again.
- Provider selection: evaluate multiple commercial sanctions/PEP providers before committing, and negotiate an SLA with the chosen provider.
- List data is cached with a TTL (24h default, configurable per provider). Screening runs against cached data if the provider is temporarily unavailable.
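One possible shape for the TTL cache with stale fallback is sketched below in Python. `CachedListProvider`, `ProviderUnavailable`, and the `fetch` callable are hypothetical names; the real adapter interface is not specified here. The clock is injected so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when the external service cannot be reached."""


class CachedListProvider:
    def __init__(self, fetch: Callable[[], list], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds          # 24h default, per the mitigation above
        self._clock = clock
        self._data: Optional[list] = None
        self._fetched_at = 0.0

    def get(self) -> list:
        fresh = self._data is not None and self._clock() - self._fetched_at < self._ttl
        if fresh:
            return self._data
        try:
            self._data = self._fetch()
            self._fetched_at = self._clock()
        except ProviderUnavailable:
            if self._data is None:
                raise  # nothing cached yet: the caller marks the task BLOCKED
            # Otherwise fall through and serve the stale copy.
        return self._data
```

A stale copy is served only when a refresh fails; with an empty cache the error propagates, which is the point where the task would be marked BLOCKED and retried later.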

R4: Configuration Engine Complexity

Description: The config engine is the critical path dependency (PRD §6.12). It must be flexible enough for all domains but simple enough to build in Phase 1 (weeks 3-4, same as initial project standup).

Impact: Delays to the config engine delay everything downstream.

Mitigation:
- MVP config scope: workflow templates, thresholds, and document rules. Approval matrices, routing rules, and custom fields are out of scope until Phase 1.5.
- JSONB for config storage, so rule changes need no schema migrations.
- Start with a single config version (ACTIVE only). Add the DRAFT/TEST/SUPERSEDED promotion pipeline in Phase 1.5, after the core engine works.
- Config Admin UI: start with a JSON editor (quick to build). Add a form-based UI in Phase 1.5.
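The MVP scope limit can be enforced mechanically when a config document is saved. The Python sketch below assumes a JSON document with a `status` field and one top-level key per config area; these key names are hypothetical and stand in for whatever the config schema ends up using.

```python
import json

# Hypothetical top-level keys for the three MVP config areas.
MVP_SECTIONS = {"workflow_templates", "thresholds", "document_rules"}
# Areas explicitly deferred to Phase 1.5.
DEFERRED_SECTIONS = {"approval_matrices", "routing_rules", "custom_fields"}


def validate_mvp_config(raw: str) -> dict:
    """Parse a config document and reject anything outside the MVP scope."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    if config.get("status") != "ACTIVE":
        raise ValueError("MVP supports a single ACTIVE config version only")
    sections = set(config) - {"status"}
    deferred = sections & DEFERRED_SECTIONS
    if deferred:
        raise ValueError(f"deferred to Phase 1.5: {sorted(deferred)}")
    unknown = sections - MVP_SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    return config
```

A guard like this turns the scope decision into a failing save rather than a review comment, which fits the JSON-editor-first admin UI.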

R5: Corporate Onboarding Scope Creep

Description: The corporate onboarding use case is the proving case but also the most complex (multi-level ownership, PEP exposure, sanctions near-hits, EDD branching). Attempting to build the full complexity in Phase 2 risks never shipping.

Impact: Phase 2 balloons from 4 weeks to 8+ weeks.

Mitigation:
- Start with the Retail Individual flow in Phase 2 (simplest path: no ownership, no EDD). This proves the workflow engine works.
- Add the Corporate flow incrementally: basic ownership → UBO identification → EDD branching → multi-jurisdiction.
- Time-box EDD features: deep due diligence is v2. MVP EDD = analyst writes a report in case notes, reviewer approves.
- The Product Owner gates scope. Any new requirement must displace something of equal size.

R6: FEC Domain Knowledge Gap

Description: Engineers may not understand KYC/CDD concepts (UBO, PEP, EDD, sanctions adjudication). Misunderstanding leads to incorrect implementations.

Impact: Compliance features implemented incorrectly. Regulatory findings.

Mitigation:
- Client documents provide domain context. All engineers read the Business Concept document and Onboarding Specification before Phase 2.
- A domain glossary is maintained in the docs, defining all entities and processes with their real-world meaning.
- A compliance reviewer (client or external) validates domain correctness at each phase exit.
- Domain events and state machines are named in business language (not technical jargon).

R7: PostgreSQL Slow at Scale

Description: Recursive CTEs for ownership traversal may become slow with deep chains (> 10 levels) or large entity graphs.

Impact: UBO identification takes more than 2 seconds, and analysts are left waiting for the graph visualization.

Mitigation:
- Performance threshold: UBO traversal must complete in < 1 second. Test with 10-level ownership chains in Phase 3c.
- Index on ownership_relationship(child_entity_id), already in the data architecture.
- Plan B, materialized path: store ancestor_chain as TEXT[] so an entity's ancestors can be read from its own row without a recursive query.
- Plan C, Neo4j: if graph queries exceed PostgreSQL's capabilities, migrate them to Neo4j, synced from PostgreSQL.

Leading indicator: P95 latency of GET /api/v1/network-analysis/graph. Alert if > 1s.
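For reference, the kind of recursive CTE at issue looks like the sketch below. It runs against an in-memory SQLite database as a stand-in for PostgreSQL (the `WITH RECURSIVE` syntax is shared; the `?` placeholder is SQLite's). The `parent_entity_id` column, the index name, and the depth cap of 10 are assumptions; only `ownership_relationship(child_entity_id)` comes from the data architecture.

```python
import sqlite3

# Walk upward from one entity to all of its owners, tracking depth.
# The depth cap bounds the recursion if the data contains a cycle.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(entity_id, depth) AS (
    SELECT parent_entity_id, 1
    FROM ownership_relationship
    WHERE child_entity_id = ?
    UNION ALL
    SELECT o.parent_entity_id, a.depth + 1
    FROM ownership_relationship AS o
    JOIN ancestors AS a ON o.child_entity_id = a.entity_id
    WHERE a.depth < 10
)
SELECT entity_id, depth FROM ancestors ORDER BY depth, entity_id;
"""


def build_demo_db(levels: int) -> sqlite3.Connection:
    """Create a linear ownership chain: E0 is owned by E1, E1 by E2, and so on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ownership_relationship ("
        "child_entity_id TEXT NOT NULL, parent_entity_id TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX idx_ownership_child ON ownership_relationship(child_entity_id)"
    )
    conn.executemany(
        "INSERT INTO ownership_relationship VALUES (?, ?)",
        [(f"E{i}", f"E{i + 1}") for i in range(levels)],
    )
    return conn


def ancestors(conn: sqlite3.Connection, entity_id: str):
    """Return (entity_id, depth) rows for every owner above entity_id."""
    return conn.execute(ANCESTORS_SQL, (entity_id,)).fetchall()
```

Each recursion step is one indexed lookup on `child_entity_id`, so cost grows with chain depth and fan-out. The ordered list of `entity_id` values returned for one entity is also what a Plan B `ancestor_chain` column would hold.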

R8: Performance Targets Not Met

Description: NFRs define specific targets (2s UI, 5s screening, 100 concurrent workflows). These may not be achievable in Phase 2 with all modules running.

Impact: Poor analyst UX. Slow screening blocks onboarding.

Mitigation:
- Performance testing in Phase 6 (not earlier, to avoid premature optimization).
- Screening: batch subjects (customer + UBOs) into a single provider call. Cache watchlist data locally.
- Risk Rating: pre-compute on data change, not on every access.
- UI: paginate the case list. Load workspace panels on demand (lazy-load the network graph; load screening results when the tab is clicked).
- If performance targets are missed: profile, identify the bottleneck, and address it. Do not lower the targets.

R9: Regulatory Requirements Change

Description: AML/CTF regulations, sanctions lists, or PEP definitions may change during the 20-week build.

Impact: Already-built features need modification. Audit model may need new fields.

Mitigation:
- Configuration-driven rules (not hardcoded). Regulatory changes that affect thresholds or lists are config changes, not code changes.
- The audit event uses a JSONB payload, so new fields can be added without a schema migration.
- Jurisdiction-specific logic is abstracted behind a JurisdictionRules interface. Swap implementations per country.
- Design Phases 0-2 with EU/Netherlands rules as the starting point. Generalize in Phase 3+.
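A rough Python sketch of how config-driven rules and a per-country JurisdictionRules lookup could fit together follows. Only the interface name comes from this document; the `edd_required` method, the `edd_risk_threshold` key, and any threshold values passed in are placeholders, not regulatory figures.

```python
from typing import Dict, Protocol


class JurisdictionRules(Protocol):
    # Hypothetical method; the real interface is not defined in this register.
    def edd_required(self, risk_score: int) -> bool: ...


class ConfigDrivenRules:
    """Reads its threshold from configuration instead of hardcoding it."""

    def __init__(self, section: Dict[str, int]):
        self._edd_threshold = section["edd_risk_threshold"]

    def edd_required(self, risk_score: int) -> bool:
        return risk_score >= self._edd_threshold


def rules_for(country: str, config: Dict[str, Dict[str, int]]) -> JurisdictionRules:
    """Pick the rules for one jurisdiction from a per-country config mapping."""
    if country not in config:
        raise LookupError(f"no rules configured for jurisdiction {country!r}")
    return ConfigDrivenRules(config[country])
```

With this shape, a threshold change is an edit to the per-country config section, and a jurisdiction whose logic differs structurally can supply its own class with the same method.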

R10: Key Person Dependency

Description: Kotlin/Temporal expertise may be concentrated in one or two engineers. If they leave or are unavailable, velocity drops.

Impact: Phases dependent on that expertise stall.

Mitigation:
- Pair programming for all Temporal workflows (no solo Temporal work in Phase 2).
- Kotlin: hire for Java experience. Kotlin is a gentle step from Java, with a 2-week ramp-up.
- All design decisions are documented (architecture docs serve as onboarding material).
- CI enforces code standards, so there are no "expert-only" code paths.

R11: Audit Event Volume

Description: The target is 1,000 audit events/second (NFR-P04). A single onboarding workflow with parallel tasks can generate 50+ events in seconds. If 100 workflows run concurrently, that is 5,000+ events in a burst.

Impact: PostgreSQL write bottleneck. Audit writes can become the slowest component.

Mitigation:
- Audit events are append-only (no UPDATE, no DELETE), so writes are fast.
- Batch writes: accumulate events in memory (100ms buffer) and flush in batches of 50-100.
- Partition by month from day one. Queries against recent partitions are fast; old partitions are rarely accessed.
- If PostgreSQL becomes the bottleneck: offload audit writes to Kafka, with an async consumer writing to PostgreSQL.

Leading indicator: P99 write latency of POST /api/v1/audit/events. Alert if > 50ms.
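The batching mitigation can be sketched as a small buffer that flushes on size or age. This is a single-threaded Python illustration under stated assumptions: `AuditBatcher`, `sink`, and `tick` are hypothetical names, the sink stands in for a multi-row INSERT, and a production version would need locking and a periodic timer.

```python
import time
from typing import Callable, List


class AuditBatcher:
    def __init__(self, sink: Callable[[List[dict]], None], max_batch: int = 100,
                 max_delay_s: float = 0.1,
                 clock: Callable[[], float] = time.monotonic):
        self._sink = sink                # receives one list of events per flush
        self._max_batch = max_batch      # upper end of the 50-100 batch size
        self._max_delay_s = max_delay_s  # the 100ms buffer window
        self._clock = clock
        self._buffer: List[dict] = []
        self._oldest = 0.0

    def append(self, event: dict) -> None:
        if not self._buffer:
            self._oldest = self._clock()  # start the age window at the first event
        self._buffer.append(event)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes once the oldest buffered event is too old."""
        if self._buffer and self._clock() - self._oldest >= self._max_delay_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._sink(batch)
```

One trade-off to weigh: events held in memory are lost if the process dies before a flush, so the buffer window bounds how much audit data is at risk.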

R12: Security Vulnerability

Description: A vulnerability in Spring Boot, React, PostgreSQL, or Temporal could be discovered during the build. Zero-day vulnerabilities are particularly dangerous.

Impact: The platform may need emergency patching. There is compliance risk if a vulnerability is exploited before it is patched.

Mitigation:
- Dependency scanning in CI (OWASP Dependency-Check, Snyk). Alert on critical/high CVEs.
- Regular upgrades: Spring Boot and Temporal minor versions are updated within 1 week of release.
- Penetration test in Phase 6, before go-live.
- Rate limiting, correlation IDs, and audit logging from day one, to detect suspicious activity early.

Risk Monitoring

Metric	Frequency	Owner
ArchUnit violations	Every build (CI)	Tech Lead
P95 API latency	Weekly (Phase 3+)	Performance Lead
Audit event write latency	Weekly (Phase 5+)	Data Architect
Dependency CVEs	Weekly (CI)	Security Lead
Phase milestone slippage	Daily standup	Engineering Manager
Scope change count	Sprint review	Product Owner

Risk register validated against PRD v1.0 scope and all domain specs. Re-evaluate at each phase exit.