ADR-006/007: D1 + Durable Objects
Source: ADR-006-007-cloudflare-d1-durable-objects.md
ADR-006: Database — Cloudflare D1 (SQLite) over PostgreSQL
Date: 2026-04-09
Status: Accepted
Deciders: Platform Engineering Lead, Infrastructure Lead
Supersedes: Portions of ADR-004 (Drizzle ORM — updated to reflect D1 adapter, not PostgreSQL)
CODITECT Classification: Architecture Decision Record · A6
-----|-------|------------|-------------------|-------|
| Cloudflare D1 | Cloudflare | Yes (SQLite) | Yes | The only CF-native SQL option |
| Cloudflare KV only | Cloudflare | No | Yes | Document store — no joins, no transactions |
| Cloudflare Durable Objects (SQLite) | Cloudflare | Yes (SQLite per DO) | Yes | One SQLite DB per DO instance — no cross-DO queries |
| External PostgreSQL (Neon/Supabase) | External SaaS | Yes | No | Violates platform mandate |
| External PlanetScale | External SaaS | Yes (MySQL) | No | Violates platform mandate |
Decision
Adopt Cloudflare D1 as the primary relational database.
The scheduling tool's data model is a clean relational structure (polls → slots → responses → slot_responses) with modest scale requirements (thousands of polls, hundreds of participants per poll). D1's SQLite model is more than sufficient for this workload.
The Drizzle ORM D1 adapter is used for type-safe queries and migration management (ADR-004 updated accordingly — the ORM decision stands; only the adapter changes from postgres to d1).
D1-specific adaptations required:
| Concern | PostgreSQL approach | D1 adaptation |
| Primary keys | `UUID PRIMARY KEY DEFAULT gen_random_uuid()` | `TEXT PRIMARY KEY` — UUID generated in application: `crypto.randomUUID()` |
| Timestamps | `TIMESTAMPTZ` | `INTEGER` (Unix seconds) — timezone applied in application layer |
| Enums | `CREATE TYPE status AS ENUM (...)` | `TEXT NOT NULL CHECK(status IN (...))` |
| RETURNING | `INSERT INTO ... RETURNING id` | Not used — query after insert |
| JSON aggregation | `json_agg()` | `json_group_array()` (SQLite JSON functions) |
| Concurrent writes | MVCC | D1 serialises writes per database — acceptable for this workload |
Consequences
Positive:
- Zero external database infrastructure — no managed service, no VPC, no connection pooling setup
- D1 is globally replicated by Cloudflare — low-latency reads from Workers at the edge
- Drizzle ORM supports D1 with minimal adapter change — investment in ADR-004 preserved
- D1 pricing model aligns with workload: free tier covers early usage; pay-per-row-written at scale
- SQLite's simplicity is appropriate for this schema — no stored procedures, no triggers, no PL/pgSQL needed
Negative:
- D1 is SQLite — some PostgreSQL features unavailable (no
gen_random_uuid(), noTIMESTAMPTZ, noRETURNING, nojsonb, no row-level security, no full-text search) - D1 write throughput has limits — unsuitable for write-heavy analytics workloads. This tool is read-heavy (results display) vs. write-light (submit response) — acceptable.
- D1 is still maturing as of 2026 — monitor Cloudflare changelog for breaking changes
- Migrations must be managed carefully — D1 does not support transactional DDL in SQLite (cannot roll back a migration atomically)
- No connection string — D1 is a binding, not a network service. Cannot be accessed from outside Cloudflare Workers environment (e.g., from a local script without
wrangler d1 execute)
Operational:
- All D1 queries in production run via
wrangler d1 execute --remotefor ad-hoc operations - Schema inspection:
wrangler d1 execute scheduling-tool --remote --command ".schema" - Migration rollback: new migration file only (no atomic rollback) — plan migrations carefully
Alternatives Rejected
Cloudflare KV only: No join capability. Aggregating slot_responses across all responses for a poll would require fetching all response keys, loading each document, and computing totals in application code. This is O(n) KV reads per results page load — prohibitively expensive and slow. Rejected.
Durable Object per-poll SQLite: Each DO has its own SQLite DB. Cross-poll queries (e.g., expiry cron) would require iterating all DO instances — no practical mechanism. Suitable for chat apps, not for admin/cron operations. Rejected for primary store; DO used for WebSocket state only.
External PostgreSQL: Violates the platform mandate of no external services. Rejected.
Review Trigger
Revisit if D1 write throughput becomes a bottleneck (unlikely for scheduling tool scale), or if Cloudflare announces a D1 Postgres-compatible mode that removes SQLite constraints.
ADR-007: Real-Time Strategy — Durable Objects over Redis Pub/Sub
Date: 2026-04-09
Status: Accepted
Supersedes: ADR-003 v1.0 (native ws + Redis pub/sub)
Deciders: Platform Engineering Lead, Infrastructure Lead
CODITECT Classification: Architecture Decision Record · A6
Context
ADR-003 (v1.0) selected native ws npm library + Redis pub/sub for real-time WebSocket fan-out. The v2.0 platform mandate eliminates Redis (external managed service) and the ws npm library (requires Node.js runtime, not available in Cloudflare Workers).
The Cloudflare Workers runtime provides two WebSocket mechanisms:
1. Standard WebSocket in Workers — for direct client↔worker WS connections. Workers are stateless and do not persist connections across requests. Unsuitable for fan-out.
2. Durable Objects with WebSocket Hibernation API — a DO instance is a single-threaded, consistent compute unit that can hold WebSocket connections across its lifetime. The hibernation API allows the DO to sleep between messages, reducing cost.
Decision
Replace Redis pub/sub + ws npm with Cloudflare Durable Objects using the WebSocket Hibernation API.
Architecture:
- One
PollHubDO instance per poll, keyed by poll slug:env.POLL_HUB.idFromName(slug) - DO instance holds all WebSocket connections for that poll
- WebSocket upgrade: Worker routes
/api/polls/:slug/ws→ forwards upgrade request to DO → DO createsWebSocketPair, callsctx.acceptWebSocket(server)(hibernation), returnsclientto browser - Fan-out: when a response is submitted, Worker calls
ctx.waitUntil(doStub.fetch(broadcastRequest))→ DO'sbroadcast()method iteratesctx.getWebSockets()and sends to all open connections - Hibernation: DO sleeps between messages. No memory cost while idle. Wakes in ~1ms on incoming message.
Consequences
Positive:
- No Redis dependency — fully Cloudflare-native
- No
wsnpm library — WebSockets handled by the platform natively - Hibernation API: DO sleeps when no messages flowing — near-zero idle cost
- Strong consistency: DO is single-threaded — no race condition in fan-out broadcast
- Automatic horizontal isolation: each poll's connections are in their own DO instance — no cross-poll interference
- Cloudflare manages DO placement globally — connections are served from the nearest data center
Negative:
- DO instance limit: ~128 MB memory per instance. A poll with 500 simultaneous WebSocket connections approaches this limit. Monitored via
getConnectionCount()in the DO. - DO cold start: first connection to a new DO instance has a ~10ms cold start. Acceptable for scheduling use case.
- DO egress: each
broadcast()call sends a message to each connected WebSocket client — for 500 clients, this is 500 sends per broadcast. For this tool's update rate (1 update per response submission), this is well within limits. - Cannot share a DO instance between polls — correct by design, but means no cross-poll analytics from within DO state
Hibernation API vs legacy:
The legacy ws.accept() pattern keeps the DO alive for the duration of all WebSocket connections. For a poll open for 7 days with participants occasionally connecting, this would keep a DO alive for the entire duration — expensive. Hibernation solves this: DO wakes only when a message arrives or broadcast() is called.
Alternatives Rejected
Redis pub/sub (external): Violates platform mandate. Eliminated.
Standard WebSocket in Workers (stateless): Workers are stateless. A WebSocket connection to a Worker is terminated when the Worker finishes executing (128ms CPU time default). Cannot hold persistent connections. Unsuitable for fan-out. Rejected.
Server-Sent Events (SSE) via Workers: SSE with streaming responses can work in Workers (using ReadableStream). However, each SSE connection requires its own Worker execution context — fan-out still requires a shared pub/sub mechanism. Would still need a DO or KV polling. Adds complexity with no benefit over the DO WebSocket approach. Rejected.
HTTP polling (5s interval): Would work but creates unnecessary D1 read load and gives a laggy user experience (up to 5s delay in seeing a peer's response). The DO WebSocket approach delivers updates in ~100ms. Rejected for the primary real-time path; acceptable as a fallback if WebSocket connection fails (the client falls back to polling automatically via useRealtime hook reconnect logic).
Review Trigger
Revisit if a single poll consistently attracts 400+ simultaneous WebSocket connections (approaching DO memory limits), or if Cloudflare introduces a native pub/sub primitive that is more cost-effective than DO instances for fan-out use cases.