merge: resolve cliable db-sync worker conflicts

2026-05-30 07:29:48 +00:00 · 2026-04-01 05:02:46 +08:00
parent 0809a53e79 3a04a0e582
commit 609c475144
17 changed files with 844 additions and 910 deletions
--- a/docs/adr/0014-kv-row-r2-snapshot-download.md
+++ b/docs/adr/0014-kv-row-r2-snapshot-download.md
@@ -0,0 +1,58 @@
+# ADR 0014: KV-Row R2 Snapshot Download With Worker-Owned Low-Memory Import
+
+Date: 2026-04-01
+Status: Proposed
+
+## Context
+Snapshot download previously exported DataScript datoms as gzip NDJSON from the
+server, then parsed and transacted them on the client main-thread handler path.
+
+That design had two issues:
+
+1. Server snapshot export walked full datoms and spent avoidable CPU/memory.
+2. Client download logic lived in handler code and was not aligned with worker
+   ownership for large-graph import.
+
+We already use framed Transit `kvs` rows for snapshot upload. Download should
+converge on the same wire format.
+
+## Decision
+1. `GET /sync/:graph-id/snapshot/download` and `/snapshot/stream` export framed
+   Transit `kvs` rows (`[addr content addresses]`) instead of datom NDJSON.
+2. Snapshot download payload content-type is `application/transit+json`
+   (gzip-compressed when available).
+3. Server snapshot export reads directly from sqlite `kvs` rows in ascending
+   `addr` batches and streams framed payloads to response/R2.
+4. Graph snapshot download orchestration moves to
+   `frontend.worker.sync.download` and is invoked via the db-worker thread API.
+5. Handler code delegates graph download to worker API instead of parsing
+   snapshot payloads directly.
+6. Client import adds a row-chunk API (`:thread-api/db-sync-import-rows-chunk`).
+   Row batches are staged in a temp sqlite database, then replayed into the
+   target conn in schema-first order.
+7. Replay order must transact schema-critical datoms before regular data:
+   - `:logseq.kv/schema-version` entity datoms
+   - attribute-definition datoms (`:db/ident` and `:db/*` metadata such as
+     `:db/valueType`, `:db/cardinality`, `:db/unique`, `:db/isComponent`)
+   - all remaining datoms
+
+## Consequences
+
+### Positive
+- Lower server CPU/memory for snapshot export (no datom NDJSON generation).
+- Download/upload snapshot format is unified around framed `kvs` rows.
+- Download pipeline ownership moves to worker sync module.
+- Schema-first replay protects index/schema correctness for large imports.
+
+### Tradeoffs
+- Client still performs datom replay during finalize to rebuild a consistent
+  target store, so import cost shifts to the worker finalize phase.
+- Adds temp sqlite staging and one additional import path (`rows` alongside
+  legacy datom chunk path).
+
+## Verification
+- Server tests assert snapshot download/stream return framed kv rows with
+  transit content-type and sorted addresses.
+- Handler tests assert graph download delegates to worker API and maintains
+  download-state lifecycle.
+- Worker tests assert rows-chunk API wiring and schema-first import ordering.
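
Review note on Decision item 7 of the ADR above: the schema-first replay order can be read as a stable three-tier sort. The following is a minimal Python sketch of that idea, not the worker's actual code. The `Datom` tuple and the `replay_tier` and `schema_first_order` helpers are hypothetical names. Two assumptions are made: the schema-version entity is the one whose `:db/ident` is `:logseq.kv/schema-version`, and any attribute in the `:db/` namespace counts as attribute-definition metadata.

```python
from typing import Iterable, List, NamedTuple, Set


class Datom(NamedTuple):
    """Hypothetical minimal datom shape: entity id, attribute, value."""
    e: int
    a: str
    v: object


SCHEMA_VERSION_IDENT = ":logseq.kv/schema-version"


def replay_tier(datom: Datom, schema_version_eids: Set[int]) -> int:
    """Return 0 for schema-version entity datoms, 1 for attribute
    definitions (assumed here: any `:db/*` attribute), 2 for the rest."""
    if datom.e in schema_version_eids:
        return 0
    if datom.a.startswith(":db/"):
        return 1
    return 2


def schema_first_order(datoms: Iterable[Datom]) -> List[Datom]:
    """Order datoms so schema-critical ones are replayed first.

    Python's sort is stable, so datoms keep their input order within
    each tier.
    """
    datoms = list(datoms)
    # Assumption: the schema-version entity is found through its :db/ident.
    schema_version_eids = {
        d.e for d in datoms
        if d.a == ":db/ident" and d.v == SCHEMA_VERSION_IDENT
    }
    return sorted(datoms, key=lambda d: replay_tier(d, schema_version_eids))
```

A caller would pass the staged datoms through `schema_first_order` before transacting them in batches. Because the sort is stable, any ordering guarantee the staging step already provides within a tier is preserved.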
--- a/docs/agent-guide/db-sync/protocol.md
+++ b/docs/agent-guide/db-sync/protocol.md
@@ -99,7 +99,7 @@
  - Build a snapshot file in R2 and return a download URL.
  - Response: `{"ok":true,"key":"<graph-id>/<uuid>.snapshot","url":"<origin>/assets/:graph-id/<uuid>.snapshot","content-encoding":"gzip"}`.
  - Error response (409): `{"error":"graph not ready"}` when bootstrap upload/import has not finished.
-  - The snapshot file stored in R2 is a gzip-compressed NDJSON stream of full Datascript datoms. Each line is a Transit JSON datom map: `{e,a,v,tx,added}`.
+  - The snapshot file stored in R2 is a framed Transit stream of sqlite `kvs` rows (`[addr, content, addresses]`), optionally gzip-compressed.
 - `POST /sync/:graph-id/snapshot/upload?reset=true|false`
  - Upload a snapshot stream for bootstrap import. Current upload format remains framed Transit JSON kvs rows, optionally gzip-compressed.
  - Request body: binary stream; headers should include `content-type: application/transit+json` and `content-encoding: gzip` when compressed.
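
Review note on the framed `kvs` row format: the hunk above names the format but does not spell out the framing. The sketch below shows one plausible shape, under loudly stated assumptions. Each frame is assumed to be a 4-byte big-endian length prefix followed by a payload holding a batch of `[addr, content, addresses]` rows, and plain JSON from the standard library stands in for Transit JSON. `encode_frames` and `decode_frames` are hypothetical helper names, not functions from the repository.

```python
import gzip
import json
import struct
from typing import Iterable, Iterator, List

Row = List[object]  # assumed row shape: [addr, content, addresses]


def encode_frames(batches: Iterable[List[Row]]) -> bytes:
    """Serialize row batches as length-prefixed frames.

    Assumption: 4-byte big-endian length, then the encoded batch.
    """
    out = bytearray()
    for batch in batches:
        payload = json.dumps(batch).encode("utf-8")
        out += struct.pack(">I", len(payload))
        out += payload
    return bytes(out)


def decode_frames(data: bytes) -> Iterator[List[Row]]:
    """Yield one row batch per frame, rejecting truncated input."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated frame header")
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated frame payload")
        yield json.loads(data[offset:offset + length].decode("utf-8"))
        offset += length
```

For a body sent with `content-encoding: gzip`, a client would call `gzip.decompress` on it before handing the bytes to `decode_frames`. Length-prefixed frames let an importer process one batch at a time, which fits the low-memory goal stated in ADR 0014.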