CODITECT
CODITECT VTR
Visual Test Report
PASSED

ADR-006/007: D1 + Durable Objects

Source: ADR-006-007-cloudflare-d1-durable-objects.md

ADR-006: Database — Cloudflare D1 (SQLite) over PostgreSQL

Date: 2026-04-09

Status: Accepted

Deciders: Platform Engineering Lead, Infrastructure Lead

Supersedes: Portions of ADR-004 (Drizzle ORM — updated to reflect D1 adapter, not PostgreSQL)

CODITECT Classification: Architecture Decision Record · A6

-----|-------|------------|-------------------|-------|

Cloudflare D1CloudflareYes (SQLite)YesThe only CF-native SQL option
Cloudflare KV onlyCloudflareNoYesDocument store — no joins, no transactions
Cloudflare Durable Objects (SQLite)CloudflareYes (SQLite per DO)YesOne SQLite DB per DO instance — no cross-DO queries
External PostgreSQL (Neon/Supabase)External SaaSYesNoViolates platform mandate
External PlanetScaleExternal SaaSYes (MySQL)NoViolates platform mandate

Decision

Adopt Cloudflare D1 as the primary relational database.

The scheduling tool's data model is a clean relational structure (polls → slots → responses → slot_responses) with modest scale requirements (thousands of polls, hundreds of participants per poll). D1's SQLite model is more than sufficient for this workload.

The Drizzle ORM D1 adapter is used for type-safe queries and migration management (ADR-004 updated accordingly — the ORM decision stands; only the adapter changes from postgres to d1).

D1-specific adaptations required:

ConcernPostgreSQL approachD1 adaptation
Primary keys`UUID PRIMARY KEY DEFAULT gen_random_uuid()``TEXT PRIMARY KEY` — UUID generated in application: `crypto.randomUUID()`
Timestamps`TIMESTAMPTZ``INTEGER` (Unix seconds) — timezone applied in application layer
Enums`CREATE TYPE status AS ENUM (...)``TEXT NOT NULL CHECK(status IN (...))`
RETURNING`INSERT INTO ... RETURNING id`Not used — query after insert
JSON aggregation`json_agg()``json_group_array()` (SQLite JSON functions)
Concurrent writesMVCCD1 serialises writes per database — acceptable for this workload

Consequences

Positive:

  • Zero external database infrastructure — no managed service, no VPC, no connection pooling setup
  • D1 is globally replicated by Cloudflare — low-latency reads from Workers at the edge
  • Drizzle ORM supports D1 with minimal adapter change — investment in ADR-004 preserved
  • D1 pricing model aligns with workload: free tier covers early usage; pay-per-row-written at scale
  • SQLite's simplicity is appropriate for this schema — no stored procedures, no triggers, no PL/pgSQL needed

Negative:

  • D1 is SQLite — some PostgreSQL features unavailable (no gen_random_uuid(), no TIMESTAMPTZ, no RETURNING, no jsonb, no row-level security, no full-text search)
  • D1 write throughput has limits — unsuitable for write-heavy analytics workloads. This tool is read-heavy (results display) vs. write-light (submit response) — acceptable.
  • D1 is still maturing as of 2026 — monitor Cloudflare changelog for breaking changes
  • Migrations must be managed carefully — D1 does not support transactional DDL in SQLite (cannot roll back a migration atomically)
  • No connection string — D1 is a binding, not a network service. Cannot be accessed from outside Cloudflare Workers environment (e.g., from a local script without wrangler d1 execute)

Operational:

  • All D1 queries in production run via wrangler d1 execute --remote for ad-hoc operations
  • Schema inspection: wrangler d1 execute scheduling-tool --remote --command ".schema"
  • Migration rollback: new migration file only (no atomic rollback) — plan migrations carefully

Alternatives Rejected

Cloudflare KV only: No join capability. Aggregating slot_responses across all responses for a poll would require fetching all response keys, loading each document, and computing totals in application code. This is O(n) KV reads per results page load — prohibitively expensive and slow. Rejected.

Durable Object per-poll SQLite: Each DO has its own SQLite DB. Cross-poll queries (e.g., expiry cron) would require iterating all DO instances — no practical mechanism. Suitable for chat apps, not for admin/cron operations. Rejected for primary store; DO used for WebSocket state only.

External PostgreSQL: Violates the platform mandate of no external services. Rejected.


Review Trigger

Revisit if D1 write throughput becomes a bottleneck (unlikely for scheduling tool scale), or if Cloudflare announces a D1 Postgres-compatible mode that removes SQLite constraints.


ADR-007: Real-Time Strategy — Durable Objects over Redis Pub/Sub

Date: 2026-04-09

Status: Accepted

Supersedes: ADR-003 v1.0 (native ws + Redis pub/sub)

Deciders: Platform Engineering Lead, Infrastructure Lead

CODITECT Classification: Architecture Decision Record · A6


Context

ADR-003 (v1.0) selected native ws npm library + Redis pub/sub for real-time WebSocket fan-out. The v2.0 platform mandate eliminates Redis (external managed service) and the ws npm library (requires Node.js runtime, not available in Cloudflare Workers).

The Cloudflare Workers runtime provides two WebSocket mechanisms:

1. Standard WebSocket in Workers — for direct client↔worker WS connections. Workers are stateless and do not persist connections across requests. Unsuitable for fan-out.

2. Durable Objects with WebSocket Hibernation API — a DO instance is a single-threaded, consistent compute unit that can hold WebSocket connections across its lifetime. The hibernation API allows the DO to sleep between messages, reducing cost.


Decision

Replace Redis pub/sub + ws npm with Cloudflare Durable Objects using the WebSocket Hibernation API.

Architecture:

  • One PollHub DO instance per poll, keyed by poll slug: env.POLL_HUB.idFromName(slug)
  • DO instance holds all WebSocket connections for that poll
  • WebSocket upgrade: Worker routes /api/polls/:slug/ws → forwards upgrade request to DO → DO creates WebSocketPair, calls ctx.acceptWebSocket(server) (hibernation), returns client to browser
  • Fan-out: when a response is submitted, Worker calls ctx.waitUntil(doStub.fetch(broadcastRequest)) → DO's broadcast() method iterates ctx.getWebSockets() and sends to all open connections
  • Hibernation: DO sleeps between messages. No memory cost while idle. Wakes in ~1ms on incoming message.

Consequences

Positive:

  • No Redis dependency — fully Cloudflare-native
  • No ws npm library — WebSockets handled by the platform natively
  • Hibernation API: DO sleeps when no messages flowing — near-zero idle cost
  • Strong consistency: DO is single-threaded — no race condition in fan-out broadcast
  • Automatic horizontal isolation: each poll's connections are in their own DO instance — no cross-poll interference
  • Cloudflare manages DO placement globally — connections are served from the nearest data center

Negative:

  • DO instance limit: ~128 MB memory per instance. A poll with 500 simultaneous WebSocket connections approaches this limit. Monitored via getConnectionCount() in the DO.
  • DO cold start: first connection to a new DO instance has a ~10ms cold start. Acceptable for scheduling use case.
  • DO egress: each broadcast() call sends a message to each connected WebSocket client — for 500 clients, this is 500 sends per broadcast. For this tool's update rate (1 update per response submission), this is well within limits.
  • Cannot share a DO instance between polls — correct by design, but means no cross-poll analytics from within DO state

Hibernation API vs legacy:

The legacy ws.accept() pattern keeps the DO alive for the duration of all WebSocket connections. For a poll open for 7 days with participants occasionally connecting, this would keep a DO alive for the entire duration — expensive. Hibernation solves this: DO wakes only when a message arrives or broadcast() is called.


Alternatives Rejected

Redis pub/sub (external): Violates platform mandate. Eliminated.

Standard WebSocket in Workers (stateless): Workers are stateless. A WebSocket connection to a Worker is terminated when the Worker finishes executing (128ms CPU time default). Cannot hold persistent connections. Unsuitable for fan-out. Rejected.

Server-Sent Events (SSE) via Workers: SSE with streaming responses can work in Workers (using ReadableStream). However, each SSE connection requires its own Worker execution context — fan-out still requires a shared pub/sub mechanism. Would still need a DO or KV polling. Adds complexity with no benefit over the DO WebSocket approach. Rejected.

HTTP polling (5s interval): Would work but creates unnecessary D1 read load and gives a laggy user experience (up to 5s delay in seeing a peer's response). The DO WebSocket approach delivers updates in ~100ms. Rejected for the primary real-time path; acceptable as a fallback if WebSocket connection fails (the client falls back to polling automatically via useRealtime hook reconnect logic).


Review Trigger

Revisit if a single poll consistently attracts 400+ simultaneous WebSocket connections (approaching DO memory limits), or if Cloudflare introduces a native pub/sub primitive that is more cost-effective than DO instances for fan-out use cases.