Technology Evaluation & Stack Decisions
Validated against PRD v1.0
Principle: Every technology choice is our own evaluation — not inherited from client documents. Trade-offs are explicit. No default picks.
Evaluation Criteria
Every option is scored against these criteria, weighted by importance to the FEC platform:
| Criterion | Weight | What It Means |
|---|---|---|
| Compliance Suitability | 10 | Works in regulated environments. Good audit story. Enterprise support available. |
| Maturity & Stability | 8 | Production-proven. Large community. Not bleeding-edge. |
| Modular Monolith Support | 8 | Supports bounded contexts without forcing microservices. |
| Type Safety | 7 | Catches errors at compile time. Important for a platform where bugs = compliance failures. |
| Hiring & Maintainability | 6 | Reasonable talent pool. Good documentation. |
| Performance | 5 | Fast enough for 100 concurrent workflows, 50 analysts, sub-second API responses. |
| Configuration-First Design | 5 | Supports building a config engine without framework lock-in. |
1. Backend Language & Framework
Options Evaluated
| Option | Compliance | Maturity | Modular Monolith | Type Safety | Hiring | Performance | Verdict |
|---|---|---|---|---|---|---|---|
| Kotlin + Spring Boot 3 | ✅ Strong | ✅ 20+ years | ✅ Good module system | ✅ Full | ✅ Large pool | ✅ Excellent | Recommended |
| Java 21 + Spring Boot 3 | ✅ Strong | ✅ 20+ years | ✅ Same | ✅ Full | ✅ Largest pool | ✅ Excellent | Viable alternative |
| TypeScript + NestJS | ⚠️ Newer | ⚠️ 6 years | ✅ DI modules | ⚠️ Partial | ✅ Large | ⚠️ Slower | Not for primary backend |
| Go | ⚠️ Limited enterprise | ⚠️ Smaller ecosystem | ✅ Packages | ⚠️ Generics only since 1.18 | ✅ Growing | ✅ Fast | Not regulatory-grade |
| Python + Django/FastAPI | ⚠️ Dynamic typing risk | ✅ Mature | ⚠️ DI weaker | ❌ Optional hints only | ✅ Large | ❌ Slower | For analytics modules only |
| Rust | ❌ Overkill | ⚠️ Small pool | ✅ Crates | ✅ Strong | ❌ Tiny pool | ✅ Fastest | Wrong tool for business logic |
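The symbols in the table above can be turned into a comparable number by combining them with the criteria weights. The following is a minimal sketch, assuming a 2/1/0 point scale for ✅/⚠️/❌; that scale and all identifier names are illustrative assumptions, not part of the evaluation itself.

```kotlin
// ASSUMPTION: the 2/1/0 point scale is illustrative; the document defines only the symbols.
enum class Rating(val points: Int) { STRONG(2), PARTIAL(1), WEAK(0) }

// Weights copied from the Evaluation Criteria table
// (only the criteria that appear as columns in the backend options table).
val weights: Map<String, Int> = linkedMapOf(
    "Compliance" to 10,
    "Maturity" to 8,
    "Modular Monolith" to 8,
    "Type Safety" to 7,
    "Hiring" to 6,
    "Performance" to 5,
)

// Weighted sum of rating points; fails fast if a criterion has no rating.
fun weightedScore(ratings: Map<String, Rating>): Int =
    weights.entries.sumOf { (criterion, weight) ->
        val rating = ratings[criterion] ?: error("Missing rating for $criterion")
        weight * rating.points
    }

// Highest achievable score: every criterion rated STRONG.
val maxScore: Int = weights.values.sum() * Rating.STRONG.points

// Row "Kotlin + Spring Boot 3": every column is a check mark.
val kotlinSpring: Map<String, Rating> = weights.keys.associateWith { Rating.STRONG }

// Row "Python + Django/FastAPI", read left to right from the table.
val pythonBackend: Map<String, Rating> = mapOf(
    "Compliance" to Rating.PARTIAL,
    "Maturity" to Rating.STRONG,
    "Modular Monolith" to Rating.PARTIAL,
    "Type Safety" to Rating.WEAK,
    "Hiring" to Rating.STRONG,
    "Performance" to Rating.WEAK,
)
```

Under these assumptions, `weightedScore(kotlinSpring)` is 88 out of a maximum of 88 and `weightedScore(pythonBackend)` is 46. The numbers only make the ranking explicit; the qualitative verdict column remains the actual decision record.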
Recommendation: Kotlin + Spring Boot 3
Why Kotlin over Java:
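- Null safety at the type level: nullable and non-nullable types are distinct, so most null dereferences become compile errors rather than runtime failures (values arriving from Java libraries still need explicit checks). In a compliance platform, a NullPointerException in a workflow is not acceptable.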
- Data classes eliminate boilerplate for domain entities (Customer, Case, Decision, etc.) — we have 10+ entity types.
- Coroutines provide structured concurrency for parallel workflow execution (screening + risk rating + document validation running simultaneously).
- Interoperable with all Java libraries and tooling — no ecosystem penalty.
- Spring Boot 3 modules map directly to bounded contexts.
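To make the null-safety and data-class points concrete, here is a small sketch. The `Case` fields, the `CaseStatus` values, and the helper functions are hypothetical and chosen only for illustration; the real entity model comes from the PRD.

```kotlin
// Hypothetical status values, for illustration only.
enum class CaseStatus { OPEN, IN_REVIEW, APPROVED, REJECTED }

// A data class provides equals/hashCode/toString/copy without boilerplate.
data class Case(
    val id: String,
    val customerId: String,
    val status: CaseStatus,
    val assignedAnalyst: String? = null, // nullable: a case may be unassigned
)

// `case.assignedAnalyst.length` would not compile here because the type is String?;
// the null case has to be handled explicitly.
fun analystLabel(case: Case): String = case.assignedAnalyst ?: "unassigned"

// copy() returns a new immutable value instead of mutating the existing one.
fun assign(case: Case, analyst: String): Case =
    case.copy(assignedAnalyst = analyst, status = CaseStatus.IN_REVIEW)
```

Because `assign` returns a new value, the previous state of the case stays available, which can be convenient when writing before/after entries to an audit trail.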
Why NOT Python for primary backend: The client docs suggest Python for analytics. That's correct — Python for analytics modules (risk scoring formulas, screening algorithms). But for transactional workflow orchestration, state management, and audit trail integrity, dynamic typing is a liability. A type error in an audit log write path becomes a compliance finding.
Why NOT a microservices-oriented language (Go, Node.js): We're building a modular monolith, not microservices. No need for a language optimized for small, independently deployed services.
2. Frontend
Options Evaluated
| Option | Type Safety | Ecosystem | Workflow UI | Analyst UX | Verdict |
|---|---|---|---|---|---|
| TypeScript + React | ✅ Strong | ✅ Largest | ✅ Complex forms | ✅ Rich components | Recommended |
| TypeScript + Angular | ✅ Strong | ✅ Large | ✅ Forms | ⚠️ Heavier | Viable |
| TypeScript + Vue | ✅ Strong | ⚠️ Smaller | ⚠️ Fewer libraries | ✅ Good | Smaller ecosystem |
| TypeScript + Svelte | ✅ Strong | ❌ Small | ❌ No mature workflow libs | ✅ Fast | Too new |
Recommendation: TypeScript + React
Rationale:
- MUI (Material UI) or equivalent component library for analyst workspace with complex forms, tables, and dashboards.
- TypeScript catches data shape mismatches between API and UI at build time.
- React's component model maps well to the analyst workspace panels (customer profile, alert summary, screening results, ownership graph, decision panel).
- Large ecosystem for graph visualization (vis.js, Cytoscape.js), timeline components, and form libraries.
Client doc alignment: Agree with client suggestion.
3. Database: Primary OLTP
Options Evaluated
| Option | ACID | JSON | Modular Schema | Maturity | Verdict |
|---|---|---|---|---|---|
| PostgreSQL | ✅ Full | ✅ JSONB | ✅ Schemas | ✅ 30+ years | Recommended |
| MySQL 8 | ✅ | ✅ | ⚠️ Schemas weaker | ✅ | Viable |
| CockroachDB | ✅ | ✅ | ✅ | ⚠️ Newer | Overkill for MVP |
| SQLite | ✅ | ⚠️ Limited | ❌ No schemas | ✅ | Not for server apps |
Recommendation: PostgreSQL
Rationale:
- ACID compliance is non-negotiable for audit trail integrity.
- JSONB support allows flexible config storage alongside relational customer/case data — no separate document store needed.
- Schema-per-bounded-context maps to logical data separation without separate databases.
- Mature replication, backup, and point-in-time recovery — essential for 7-year audit retention.
- Widest enterprise support. Every cloud provider offers managed PostgreSQL.
What about a graph database?
PostgreSQL can model ownership relationships recursively (WITH RECURSIVE for UBO traversal). Start with PostgreSQL for graph queries. Add Neo4j only if recursive queries become a measured bottleneck, which is unlikely for MVP (basic relationship traversal, not advanced graph intelligence).
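The traversal that a WITH RECURSIVE query would perform can be illustrated in memory. The sketch below is not the SQL itself; it shows the same logic (walk upward from an entity, multiply shares along each path, sum across paths) with a depth limit and a cycle guard, both of which a recursive query also needs. The `Ownership` type and function name are hypothetical.

```kotlin
// Hypothetical edge type: `ownerId` holds `share` (0.0..1.0) of `ownedId`.
data class Ownership(val ownerId: String, val ownedId: String, val share: Double)

// Returns the effective share that every direct or indirect owner holds in `target`.
fun effectiveOwnership(
    edges: List<Ownership>,
    target: String,
    maxDepth: Int = 10,
): Map<String, Double> {
    val byOwned = edges.groupBy { it.ownedId }
    val totals = mutableMapOf<String, Double>()

    // `path` holds the entities on the current chain, starting with the target.
    fun walk(entity: String, shareSoFar: Double, path: Set<String>) {
        if (path.size > maxDepth) return // depth limit: at most maxDepth ownership levels
        for (edge in byOwned[entity].orEmpty()) {
            if (edge.ownerId in path) continue // cycle guard for circular ownership
            val effective = shareSoFar * edge.share
            totals[edge.ownerId] = (totals[edge.ownerId] ?: 0.0) + effective
            walk(edge.ownerId, effective, path + edge.ownerId)
        }
    }

    walk(target, 1.0, setOf(target))
    return totals
}
```

For example, if A owns 50% of B and B owns 60% of C, the function reports B at 0.6 and A at roughly 0.3 for target C. The same sample data is a reasonable starting point for the performance thresholds on ownership-chain depth.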
4. Workflow Engine
This is the backbone of the entire platform. The wrong choice here is the hardest to fix later.
Options Evaluated
| Option | Configurable | Parallel | Human Tasks | SLA Timers | Retry | Versioning | Maturity | Verdict |
|---|---|---|---|---|---|---|---|---|
| Temporal | ✅ SDK DSL | ✅ Native | ✅ Signals | ✅ Timers | ✅ Built-in | ✅ Versioning | ✅ Open source (MIT) | Recommended |
| Camunda 8 | ✅ BPMN | ✅ | ✅ User tasks | ✅ Timer events | ✅ | ✅ | ✅ Mature | Viable for BPMN shops |
| Custom State Machine | ✅ Full control | ⚠️ Build it | ⚠️ Build it | ⚠️ Build it | ⚠️ Build it | ⚠️ Build it | ❌ Untested | Too much to build |
| AWS Step Functions | ✅ JSON DSL | ✅ | ⚠️ Via callbacks | ✅ | ✅ | ⚠️ Limited | ✅ | Vendor lock-in |
Recommendation: Temporal
Why Temporal over Camunda:
- Temporal models workflows as code (not BPMN XML). Our workflows are complex (parallel tracks, conditional EDD branching, SLA timers, retry policies). Code is more expressive and testable for this complexity.
- Temporal's workflow versioning handles running instances on old versions while new instances use the new version — exactly what the configuration engine requires (FR-CF-01).
- Signals enable human-in-the-loop without polling — an analyst action (approve/reject) sends a signal that resumes the waiting workflow.
- Built-in retry with configurable backoff (exponential, custom). Failed module invocation retries without custom code.
- Visibility: Temporal UI shows running workflows, their state, and execution history. Equivalent to an operational dashboard out of the box.
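- Open source (MIT-licensed) and self-hostable, which limits dependency on the vendor's cloud offering. Temporal is not a CNCF project; it is stewarded mainly by Temporal Technologies, so single-vendor stewardship should be tracked as a risk.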
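The retry bullet above relies on exponential backoff. The following sketch shows only the arithmetic behind such a policy, using the Kotlin and Java standard libraries; it does not call the Temporal SDK. The `RetryPolicy` type and its field names are hypothetical, chosen to resemble common retry settings.

```kotlin
import java.time.Duration
import kotlin.math.pow

// Hypothetical policy type for illustration; not the Temporal SDK class.
data class RetryPolicy(
    val initialInterval: Duration,
    val backoffCoefficient: Double,
    val maximumInterval: Duration,
    val maximumAttempts: Int, // total attempts, including the first one
)

// Delay before retry number `retry` (1-based):
// initialInterval * backoffCoefficient^(retry - 1), capped at maximumInterval.
fun delayBeforeRetry(policy: RetryPolicy, retry: Int): Duration {
    require(retry >= 1) { "retry is 1-based" }
    val uncapped = policy.initialInterval.toMillis() * policy.backoffCoefficient.pow(retry - 1)
    val capped = minOf(uncapped, policy.maximumInterval.toMillis().toDouble())
    return Duration.ofMillis(capped.toLong())
}

// All delays for one policy: maximumAttempts attempts means maximumAttempts - 1 retries.
fun retrySchedule(policy: RetryPolicy): List<Duration> =
    (1 until policy.maximumAttempts).map { delayBeforeRetry(policy, it) }
```

With an initial interval of 1 second, a coefficient of 2.0, a cap of 10 seconds, and 6 attempts, the schedule is 1s, 2s, 4s, 8s, 10s. Writing the expected schedule down per module is a cheap way to check that a retry policy stays within the SLA timers of the workflow.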
What about the client's suggestion of Camunda?
Camunda is valid if business users need BPMN modeling. But for this platform, workflows are defined in the Configuration Engine admin UI (not BPMN) and executed as code. Temporal maps better to this model.
Anti-pattern avoided: Building a custom state machine. Every custom workflow engine eventually implements retry, timers, versioning, and visibility — poorly. Use Temporal.
5. Messaging / Event Bus
Options Evaluated
| Option | Decoupling | Audit Streaming | Latency | Operations | Verdict |
|---|---|---|---|---|---|
| Kafka | ✅ Pub/sub | ✅ Event sourcing | <10ms | ⚠️ Complex | Recommended for audit |
| RabbitMQ | ✅ Pub/sub | ⚠️ Not for long retention | <1ms | ✅ Simpler | For service-to-service |
| Embedded (Spring Events) | ✅ Simple | ❌ No persistence | N/A | ✅ Trivial | For internal bounded context events |
| Redis Pub/Sub | ✅ Simple | ❌ No persistence | <1ms | ✅ Simple | Not for audit |
Recommendation: Hybrid (Embedded for internal, Kafka for audit)
Rationale:
- Embedded events (Spring Application Events): For intra-monolith communication between bounded contexts. An onboarding workflow starts → publishes event → screening module picks it up. Simple, no infrastructure, no network hop (listeners run in-process and, by default, synchronously on the publishing thread). This is the right pattern for a modular monolith.
- Kafka for audit streaming: Every audit event streams to Kafka. Immutable, append-only, long retention. Allows the audit log to be consumed by compliance dashboards, external monitoring, and regulatory export — without touching the transactional database.
- Postpone Kafka to late MVP or post-MVP: Start with audit events written to PostgreSQL (immutable table) and published as Spring events. Add Kafka when audit consumers are needed. The audit data model doesn't change — only the transport.
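The claim that only the transport changes can be sketched with a small interface. This is a standard-library illustration, not Spring or Kafka code; `AuditEvent`, `AuditTransport`, and the other names are hypothetical, and the in-memory log merely stands in for the immutable PostgreSQL table.

```kotlin
import java.time.Instant

// Hypothetical audit record; the field set is illustrative.
data class AuditEvent(
    val caseId: String,
    val actor: String,
    val action: String,
    val occurredAt: Instant,
)

// The transport is the only part expected to change when Kafka is introduced.
fun interface AuditTransport {
    fun publish(event: AuditEvent)
}

// Stand-in for the MVP path: append-only, with no update or delete operation exposed.
class AppendOnlyAuditLog : AuditTransport {
    private val entries = mutableListOf<AuditEvent>()
    override fun publish(event: AuditEvent) { entries += event }
    fun snapshot(): List<AuditEvent> = entries.toList()
}

// Fans one event out to several transports, e.g. the table today plus a Kafka producer later.
class CompositeTransport(private val delegates: List<AuditTransport>) : AuditTransport {
    override fun publish(event: AuditEvent) = delegates.forEach { it.publish(event) }
}

// Business code depends on the recorder only and never on a concrete transport.
class AuditRecorder(private val transport: AuditTransport) {
    fun record(caseId: String, actor: String, action: String, at: Instant = Instant.now()) =
        transport.publish(AuditEvent(caseId, actor, action, at))
}
```

Adding Kafka later would then mean writing one more `AuditTransport` implementation and registering it in the composite, while `AuditEvent` and every call to `record` stay unchanged.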
Client doc alignment: Kafka for audit is correct but postpone to when needed.
6. Identity & Authentication
Options Evaluated
| Option | RBAC | SSO | Open Source | Self-Hosted | Verdict |
|---|---|---|---|---|---|
| Keycloak | ✅ | ✅ | ✅ | ✅ | Recommended |
| Auth0 / Okta | ✅ | ✅ | ❌ SaaS only | ❌ | Viable, but vendor risk |
| Spring Security (built-in) | ✅ Custom | ⚠️ Build it | ✅ | ✅ | Viable for MVP |
Recommendation: Keycloak (deferred to late MVP)
Rationale:
- RBAC with predefined roles (RM, KYC Analyst, Sanctions Analyst, etc.) maps directly to Keycloak realms, clients, and roles.
- SSO support when enterprise integration is needed (P1).
- Open source, self-hosted — no vendor lock-in for a compliance-critical system.
- Start with Spring Security + database-backed users for MVP. Migrate to Keycloak when MFA or SSO is needed (NFR-S04, NFR-SC-02; MFA is P1).
7. Architecture Style
Recommendation: Modular Monolith → Extract Services Only When Proven
| Phase | Architecture | Why |
|---|---|---|
| MVP | Modular monolith (Spring Boot modules) | Prove the model. Bounded contexts as Maven/Gradle modules with clear interfaces. No network boundaries. |
| v2 | Extract audit + screening as services | If performance or deployment independence demands it. |
| v3+ | Extract remaining bounded contexts | Based on actual usage patterns, not assumptions. |
Rationale:
- Microservices solve organizational scaling problems, not technical ones. In MVP, we have one team.
- Distributed transactions are hard. Workflow orchestration across services is harder. Both are unnecessary for MVP.
- The modular monolith enforces bounded context boundaries at the module level (different packages, no internal imports allowed). If boundaries are clean, extraction is mechanical — change a method call to an HTTP call.
- This is the single most important architectural decision. Getting it wrong means distributed debugging for the MVP team.
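The module-boundary rule above can be sketched as a public interface with a hidden implementation. Kotlin's `internal` visibility is scoped to a compilation module (for example a Gradle module), which fits this layout. All names below are hypothetical; in a real build the two halves would live in separate modules rather than one file.

```kotlin
// --- screening module: public API, the only surface other modules may use ---
data class ScreeningResult(val customerId: String, val matched: Boolean)

interface ScreeningApi {
    fun screen(customerId: String, fullName: String): ScreeningResult
}

// --- screening module: implementation, invisible outside the module ---
internal class WatchlistScreening(private val watchlist: Set<String>) : ScreeningApi {
    override fun screen(customerId: String, fullName: String): ScreeningResult =
        ScreeningResult(customerId, fullName.trim().lowercase() in watchlist)
}

// Factory exposed by the module; callers never name the internal class.
fun screeningApi(watchlist: Set<String>): ScreeningApi =
    WatchlistScreening(watchlist.map { it.trim().lowercase() }.toSet())

// --- onboarding module: depends on the interface only ---
class OnboardingService(private val screening: ScreeningApi) {
    fun canAutoApprove(customerId: String, fullName: String): Boolean =
        !screening.screen(customerId, fullName).matched
}
```

If screening is extracted as a service in v2, an HTTP-backed class implementing `ScreeningApi` could replace `WatchlistScreening` without touching `OnboardingService`. Latency, partial failure, and retries would still need handling at that point, so the swap is mechanical only at the call site.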
Client doc alignment: Agree. The engineering spec explicitly says "Modular monolith for MVP. NOT microservices initially."
8. Containerization & Deployment
Recommendation: Docker + Docker Compose (MVP) → Kubernetes (v2)
| Phase | Tooling | Why |
|---|---|---|
| MVP | Docker Compose | Single deployment. One compose file with PostgreSQL + app. Simple to operate. |
| v2 | Kubernetes (K8s) | When HA, scaling, and multi-service extraction demands it. |
Cloud target: Cloud-agnostic architecture. Deploy to AWS ECS or Azure Container Apps initially. Design for portability (no cloud-proprietary APIs).
Client doc alignment: Agree with Docker/K8s but defer K8s complexity.
9. Additional Technology Decisions
Search
Recommendation: PostgreSQL full-text search for MVP.
OpenSearch/Elasticsearch adds operational complexity. PostgreSQL's tsvector and tsquery handle case search, customer lookup, and screening result search for MVP volumes.
Document Storage
Recommendation: PostgreSQL BYTEA or filesystem (configurable).
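Avoid S3/MinIO for MVP. Documents are uploaded during onboarding and linked to cases. A BYTEA value can hold up to 1 GB, but large values are read and written as a whole, so enforce an upload size limit well below that. This is still simpler than managing a separate object store.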
Testing
Recommendation: JUnit 5 + Testcontainers + Playwright.
- JUnit 5 for Kotlin/Java unit and integration tests.
- Testcontainers for PostgreSQL integration tests (real DB, not H2 in-memory).
- Playwright for end-to-end workflow tests (simulate analyst processing a case from intake to approval).
CI/CD
Recommendation: GitHub Actions (or GitLab CI if self-hosted). Simple pipeline: build → test (unit + integration) → verify coverage → containerize.
10. Recommended Stack Summary
| Layer | Choice | Rationale (One Line) |
|---|---|---|
| Backend | Kotlin + Spring Boot 3 | Null safety, data classes, coroutines, enterprise ecosystem |
| Frontend | TypeScript + React + MUI | Type safety, rich component library, analyst UX |
| Database | PostgreSQL | ACID, JSONB, mature, recursive queries for graph |
| Workflow | Temporal | Workflow-as-code, versioning, signals, retry, visibility |
| Messaging (internal) | Spring Events | Zero-infrastructure decoupling for modular monolith |
| Messaging (audit) | Kafka (deferred) | Immutable event streaming for compliance consumers |
| Auth | Spring Security → Keycloak | Built-in for MVP, Keycloak when SSO needed |
| Architecture | Modular Monolith | Prove the model first, extract services when needed |
| Containers | Docker + Compose | Simple single-deployment for MVP |
| Search | PostgreSQL FTS | Sufficient for MVP case/customer lookup |
| Docs | PostgreSQL BYTEA | No separate object store for MVP |
| Testing | JUnit 5 + Testcontainers + Playwright | Unit, integration, and workflow E2E |
| CI/CD | GitHub Actions | Simple, sufficient for MVP |
11. Alignment with and Divergence from Client Suggestions
| Client Suggestion | Our Decision | Why |
|---|---|---|
| Kotlin (agreed) | Kotlin + Spring Boot 3 | Aligned. Better choice than Java for this domain. |
| Python for analytics only (agreed) | Python for analytics modules | Aligned. Not for transactional orchestration. |
| Temporal (agreed) | Temporal | Aligned. Better fit than Camunda for code-defined workflows. |
| Camunda as alternative | Not selected | Temporal's versioning and signal model are better for our config-driven workflows. |
| Kafka from day one | Deferred to late MVP or post-MVP | Start with Spring Events + PostgreSQL audit table. Add Kafka when audit consumers are needed. |
| Keycloak from day one | Deferred to late MVP | Start with Spring Security. Add Keycloak when MFA/SSO needed (P1). |
| Neo4j for graph | PostgreSQL recursive queries | Sufficient for basic relationship traversal. Add Neo4j if needed post-MVP. |
| OpenSearch/Elasticsearch | PostgreSQL FTS | Sufficient for MVP search. Add OpenSearch only if needed. |
| Kubernetes from day one | Docker Compose | K8s complexity unjustified for single deployment MVP. |
12. Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Temporal learning curve | Temporal has excellent docs and examples. Build a simple "Hello Workflow" PoC before building the onboarding workflow. |
| Modular monolith becomes a ball of mud | Architecture fitness tests: check that module A never imports from module B's internal package. CI gate. |
| PostgreSQL recursive queries become slow at scale | Define performance thresholds. If ownership chains > 10 levels become slow, evaluate Neo4j. |
| Spring Events not sufficient for decoupling | If cross-module coupling emerges, introduce a lightweight message bus (RabbitMQ) before microservices. |
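| GitHub Actions limitations | If hosted runners are insufficient, switch to self-hosted runners or move to GitLab CI. The two use different pipeline syntax, so a move means porting the pipeline definitions. |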
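The architecture fitness test named in the table can be approximated with a simple import scan. The sketch below uses only the standard library and assumes a package layout of `<basePackage>.<module>.internal` for private code; that layout, the base package, and all names are hypothetical. A dedicated library such as ArchUnit would be the more robust choice in a real build, since it analyzes compiled classes rather than source text.

```kotlin
data class Violation(val file: String, val importedName: String)

// sources: file path -> Kotlin source text.
// basePackage: root package of the monolith, e.g. "com.example.fec" (hypothetical).
fun findBoundaryViolations(sources: Map<String, String>, basePackage: String): List<Violation> {
    val base = Regex.escape(basePackage)
    // Captures the module name from the file's own package declaration.
    val packageLine = Regex("""^\s*package\s+$base\.(\w+)""", RegexOption.MULTILINE)
    // Captures imports that reach into some module's `internal` package.
    val importLine = Regex("""^\s*import\s+($base\.(\w+)\.internal\.[\w.*]+)""", RegexOption.MULTILINE)

    return sources.flatMap { (file, text) ->
        val ownModule = packageLine.find(text)?.groupValues?.get(1)
        importLine.findAll(text)
            .filter { it.groupValues[2] != ownModule } // a module may use its own internals
            .map { Violation(file, it.groupValues[1]) }
            .toList()
    }
}
```

Run as a unit test that asserts the returned list is empty, this gives the CI gate described above: a build fails as soon as one module imports another module's internal package.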
Evaluated against PRD v1.0. Re-evaluate if PRD NFRs change (especially performance, deployment target, or multi-tenancy decisions).