clone/logseq

Fork 0

mirror of https://github.com/logseq/logseq.git synced 2026-05-15 16:32:21 +00:00

Files

Gabriel Horner 9502f57dbe 054-db-sync-test-isolation.md

2026-03-12 15:12:34 +08:00

5.8 KiB

Raw Permalink Blame History

054-db-sync-test-isolation

Summary

bb dev:lint-and-test would currently fail in CLJS runtime tests, while lint and test compilation would still pass. The observed failures would point to test isolation problems inside frontend.worker.db-sync-test, not to a stable regression in db-sync production logic.

Background

The failing command would be:

bb dev:lint-and-test

Its execution flow would be:

lint tasks
CLJS test compilation
CLJS runtime test execution

Observed behavior from the captured run:

lint would pass
test compilation would pass
runtime tests would fail with 3 failures

The failing tests would include:

frontend.worker.db-sync-test/ensure-upload-graph-identity-defaults-e2ee-enabled-test
frontend.worker.db-sync-test/download-graph-by-id-fails-on-incomplete-snapshot-frame-test

A follow-up run of the full namespace would fail on a different test:

frontend.worker.db-sync-test/pull-ok-out-of-order-stale-response-is-ignored-test

This changing failure surface would strongly suggest flaky shared state rather than a deterministic implementation bug.

Evidence

1. Failing tests would pass in isolation

Running these tests individually would pass:

frontend.worker.db-sync-test/ensure-upload-graph-identity-defaults-e2ee-enabled-test
frontend.worker.db-sync-test/download-graph-by-id-fails-on-incomplete-snapshot-frame-test

This would indicate that the underlying code paths can behave correctly when the test environment is clean.

2. Full-namespace execution would produce a different failure

Running the whole namespace would fail in pull-ok-out-of-order-stale-response-is-ignored-test instead of the two failures seen in the full suite.

This would indicate order dependence or leaked global state between tests in the same namespace.

3. Failure payloads would match leaked state

ensure-upload-graph-identity-defaults-e2ee-enabled-test would unexpectedly receive an existing graph identity with:

:graph-id "c52bf9d4-3a95-4f17-9c8c-7f86eef5f75d"
:graph-e2ee? false

That graph id would match a value used by another test in the same file. This would suggest that the later test is reusing state left behind by an earlier test instead of executing its own intended bootstrap path.

download-graph-by-id-fails-on-incomplete-snapshot-frame-test would expect incomplete-snapshot-frame but would sometimes receive an unrelated assertion failure:

Assert failed: test-db-sync-repo (some? conn)

This would suggest that another asynchronous code path is still interacting with shared worker state while the test is running.

Root Cause

The most likely root cause would be incomplete per-test cleanup of shared global state in:

frontend.worker.state
frontend.worker.sync

The highest-risk shared atoms would include:

worker-state/*datascript-conns
worker-state/*client-ops-conns
worker-state/*db-sync-client
worker-state/*db-sync-config
db-sync/*repo->latest-remote-tx
db-sync/*start-inflight-target

The test file would also rely heavily on:

async done
nested promesa chains
cleanup inside p/finally

That structure would make it possible for a test to signal completion before all outer cleanup has fully restored shared state, especially when asynchronous callbacks or listeners are still active.

Proposed Changes

1. Add a per-test fixture for global state reset

A :each fixture in /Users/rcmerci/gh-repos/logseq/src/test/frontend/worker/db_sync_test.cljs would reset the shared worker and db-sync atoms before and after every test.

The fixture would reset at least:

worker-state/*datascript-conns
worker-state/*client-ops-conns
worker-state/*db-sync-client
worker-state/*db-sync-config
db-sync/*repo->latest-remote-tx
db-sync/*start-inflight-target

2. Review high-risk async tests

If fixture-based isolation is not sufficient, the next step would be to tighten async structure in the tests most likely to leak state.

Priority candidates would include:

upload identity tests
snapshot download tests
pull/ok ordering tests

The goal would be to ensure that:

cleanup happens after the actual promise chain settles
done is called only after cleanup boundaries are satisfied
any listener or temporary override is fully restored before the next test starts

Why Production Code Is Less Likely To Be At Fault

The current evidence would not strongly support a production db-sync bug because:

the affected tests would pass when run alone
the failing test would change depending on suite shape
the observed incorrect values would match data seeded by neighboring tests
the expected production code path for incomplete snapshot handling already appears to raise the correct error in isolated execution

Because of that, modifying production sync logic first would risk masking a test harness problem instead of fixing the actual instability.

Verification Plan

After applying the test isolation changes, verification would include:

bb dev:test -v frontend.worker.db-sync-test
bb dev:lint-and-test

Success criteria would be:

individual failing tests still pass
the full frontend.worker.db-sync-test namespace passes reliably
the full bb dev:lint-and-test command no longer fails on these db-sync tests

Risks And Follow-ups

If failures remain after the fixture reset, the next investigation would focus on:

leaked d/listen! listeners
background tasks still running across tests
js/fetch restoration timing
promise chains whose cleanup is attached at the wrong level
other shared atoms not yet covered by the fixture

Files

Would modify:

/Users/rcmerci/gh-repos/logseq/src/test/frontend/worker/db_sync_test.cljs

Would not modify initially:

/Users/rcmerci/gh-repos/logseq/src/main/frontend/worker/sync.cljs
/Users/rcmerci/gh-repos/logseq/src/main/frontend/worker/sync/client_op.cljs

5.8 KiB Raw Permalink Blame History