logseq/docs/agent-guide/056-graph-name-dir-encoding-alignment.md
2026-03-12 15:12:34 +08:00

Align graph dir encoding between logseq-cli and desktop app

Summary

Align logseq-cli, db-worker-node, and desktop app handling of graph dir / graph-name so special characters are encoded and decoded with one shared, reversible contract.

The authoritative contract would be the existing encode-graph-dir-name / decode-graph-dir-name pair in src/main/frontend/worker_common/util.cljc, which is already used by db-worker-node and logseq-cli server-side graph directory resolution.

This plan keeps user-facing graph names unchanged and only aligns their on-disk directory representation.

Background

Current code paths do not agree on how a graph name maps to a graph directory on disk:

  • db-worker-node and logseq-cli server/runtime paths use a reversible graph-dir encoding.
  • desktop app contains paths that join the raw graph name directly into a filesystem path.
  • some Electron and CLI-adjacent helpers still use lossy sanitize-db-name behavior.
  • shared graph discovery still contains legacy decoding logic for older naming conventions, but not the current reversible encoding.

This mismatch becomes visible when graph names contain special characters such as /, :, %, ~, or spaces.

Goals

  • Use one shared graph-dir encoding/decoding contract across CLI and desktop app.
  • Preserve current user-facing graph-name semantics.
  • Keep logseq_db_ prefix canonicalization separate from graph-dir encoding.
  • Define compatibility behavior for legacy graph directory names.
  • Add tests that cover special-character graph names across all affected entry points.

Non-goals

  • Redesign the user-visible graph naming model.
  • Change the existing logseq_db_ display normalization rules.
  • Remove all legacy compatibility in one step without an explicit migration strategy.

Current behavior

Shared reversible encoding already exists

Authoritative implementation today:

  • src/main/frontend/worker_common/util.cljc
    • encode-graph-dir-name
    • decode-graph-dir-name

Current behavior:

  1. encodeURIComponent is applied.
  2. literal ~ is rewritten to %7E.
  3. % is rewritten to ~.
  4. decoding reverses ~ -> % and then applies decodeURIComponent.

This gives a reversible, filesystem-safe directory key that contains no / or \ path separators.
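
The four steps above can be sketched in TypeScript as follows. This is an illustrative port, not the actual code: the authoritative implementation is the ClojureScript pair in src/main/frontend/worker_common/util.cljc, and the function names below are assumptions chosen to mirror it.

```typescript
// Illustrative port of the encoding steps; names are hypothetical.
function encodeGraphDirName(graphDirKey: string): string {
  return encodeURIComponent(graphDirKey) // step 1: percent-encode
    .replace(/~/g, "%7E")                // step 2: protect any literal "~"
    .replace(/%/g, "~");                 // step 3: use "~" as the escape character
}

function decodeGraphDirName(dirName: string): string {
  // step 4: restore "%" escapes, then percent-decode
  return decodeURIComponent(dirName.replace(/~/g, "%"));
}

const encodedSlash = encodeGraphDirName("foo/bar"); // "foo~2Fbar"
const encodedTilde = encodeGraphDirName("til~de");  // "til~7Ede"
const decodedBack = decodeGraphDirName(encodeGraphDirName("100% legit")); // "100% legit"
```

Step 2 is what keeps the mapping reversible: encodeURIComponent leaves ~ untouched, so a literal ~ has to be escaped before ~ is reused as the escape character.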

db-worker-node follows the shared contract

Relevant files:

  • src/main/frontend/worker/db_worker_node_lock.cljs
  • src/main/frontend/worker/platform/node.cljs
  • src/main/frontend/worker/db_worker_node.cljs
  • src/main/frontend/worker/graph_dir.cljs

Current behavior:

  • repo identity strips one leading logseq_db_ to produce a graph-dir key.
  • graph-dir key is encoded with encode-graph-dir-name.
  • list-graphs decodes on-disk directory names back to graph-dir keys.
  • worker log paths and lock paths are stored under the encoded graph directory.

CLI is partially aligned

Relevant files:

  • src/main/logseq/cli/server.cljs
  • src/main/logseq/cli/command/core.cljs
  • src/main/logseq/cli/command/graph.cljs
  • src/main/logseq/cli/common.cljs
  • deps/cli/src/logseq/cli/common/graph.cljs
  • deps/cli/src/logseq/cli/util.cljs

Current behavior:

  • cli.server already uses the same canonical graph-dir path contract as db-worker-node.
  • graph display/input normalization strips or restores one logseq_db_ prefix as needed.
  • unlink-graph! still derives directory names with sanitize-db-name, which is lossy.
  • shared discovery in deps/cli still decodes only older directory naming patterns such as ++ and +3A+.

Desktop app is not aligned

Relevant files:

  • src/electron/electron/utils.cljs
  • src/electron/electron/db.cljs
  • src/electron/electron/handler.cljs
  • src/electron/electron/url.cljs
  • src/main/frontend/config.cljs

Current behavior:

  • electron.utils/get-graph-dir joins the raw graph name into the graph path after db-prefix stripping.
  • if the graph name contains /, the resulting path becomes nested directories.
  • electron.db still uses sanitize-db-name in some db path creation logic.
  • frontend local-dir helpers also treat graph name as a raw path segment.
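
To make the nesting problem concrete, the sketch below compares a raw join with an encoded join. Both helpers and the /graphs root are hypothetical stand-ins for illustration; they are not the Electron implementation.

```typescript
// Hypothetical helpers; "/graphs" is an assumed root directory.
function rawJoin(graphsDir: string, graphName: string): string {
  return graphsDir + "/" + graphName; // graph name used as a raw path segment
}

function encodeSegment(graphName: string): string {
  return encodeURIComponent(graphName).replace(/~/g, "%7E").replace(/%/g, "~");
}

function encodedJoin(graphsDir: string, graphName: string): string {
  return graphsDir + "/" + encodeSegment(graphName);
}

// Counts non-empty path segments to show how deep the resulting path is.
function segmentCount(p: string): number {
  return p.split("/").filter((s) => s.length > 0).length;
}

const rawGraphPath = rawJoin("/graphs", "foo/bar");         // "/graphs/foo/bar"
const encodedGraphPath = encodedJoin("/graphs", "foo/bar"); // "/graphs/foo~2Fbar"
const rawDepth = segmentCount(rawGraphPath);                // 3: "bar" is nested under "foo"
const encodedDepth = segmentCount(encodedGraphPath);        // 2: one directory per graph
```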

Problem statement

The same logical graph name can map to different on-disk paths depending on which subsystem touches it:

  • reversible encoded path in db-worker-node
  • raw path join in Electron/frontend
  • lossy underscore replacement in sanitize-based helpers
  • legacy decode-only behavior in shared graph discovery

As a result:

  • a graph may be listable but not removable
  • a graph may be resolvable in CLI but not in desktop app
  • a graph name containing / may accidentally create path nesting in one flow but not another
  • existing tests do not enforce cross-subsystem parity
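
The difference between a lossy and a reversible mapping can be illustrated with a small sketch. The lossyDirName function below is only an assumed stand-in for underscore replacement; the real sanitize-db-name rules may differ, and both function names are hypothetical.

```typescript
// ASSUMPTION: a stand-in for lossy underscore replacement, not the real sanitize-db-name.
function lossyDirName(graphName: string): string {
  return graphName.replace(/[^A-Za-z0-9_-]/g, "_");
}

// Reversible mapping following the encoding steps described in this document.
function reversibleDirName(graphName: string): string {
  return encodeURIComponent(graphName).replace(/~/g, "%7E").replace(/%/g, "~");
}

// Two distinct logical graph names...
const lossySlash = lossyDirName("a/b");           // "a_b"
const lossyColon = lossyDirName("a:b");           // "a_b" (collides with the line above)
const reversibleSlash = reversibleDirName("a/b"); // "a~2Fb"
const reversibleColon = reversibleDirName("a:b"); // "a~3Ab"
```

Under the lossy stand-in, two different graphs share one directory name and the original name cannot be recovered from disk, which is one way a graph can end up listable through one subsystem but not removable through another.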

Proposed contract

1. Separate graph identity from graph directory representation

The plan would explicitly distinguish:

  • graph-name / repo: user-facing identifier, subject to existing logseq_db_ canonicalization rules
  • graph-dir key: graph-name with exactly one leading db prefix stripped
  • encoded graph-dir: on-disk directory name produced only by encode-graph-dir-name

This separation would make it clear that special-character handling belongs to the graph-dir layer, not the user-facing name layer.

2. Make the db-worker-node contract authoritative

The repository would standardize on:

  • repo -> graph-dir key: strip one leading logseq_db_
  • graph-dir key -> encoded graph-dir: encode-graph-dir-name
  • encoded graph-dir -> graph-dir key: decode-graph-dir-name

Any code path that needs an on-disk db graph directory would route through this contract rather than reimplementing path logic.
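
A minimal sketch of the three conversions as one pipeline is shown below. The function names, the /graphs root, and the plain string join are assumptions for illustration; the real helpers would live in the shared ClojureScript layer.

```typescript
const DB_PREFIX = "logseq_db_";

// repo -> graph-dir key: strip exactly one leading prefix, never more.
function repoToGraphDirKey(repo: string): string {
  return repo.indexOf(DB_PREFIX) === 0 ? repo.slice(DB_PREFIX.length) : repo;
}

// graph-dir key -> encoded graph-dir
function graphDirKeyToEncodedDir(graphDirKey: string): string {
  return encodeURIComponent(graphDirKey).replace(/~/g, "%7E").replace(/%/g, "~");
}

// encoded graph-dir -> graph-dir key
function encodedDirToGraphDirKey(encodedDir: string): string {
  return decodeURIComponent(encodedDir.replace(/~/g, "%"));
}

// Hypothetical single entry point: repo -> on-disk graph directory path.
function repoToGraphDirPath(graphsDir: string, repo: string): string {
  return graphsDir + "/" + graphDirKeyToEncodedDir(repoToGraphDirKey(repo));
}

const strippedOnce = repoToGraphDirKey("logseq_db_logseq_db_demo");  // "logseq_db_demo"
const contractPath = repoToGraphDirPath("/graphs", "logseq_db_a:b"); // "/graphs/a~3Ab"
const recoveredKey = encodedDirToGraphDirKey("a~3Ab");               // "a:b"
```

Keeping prefix stripping and encoding in separate functions mirrors the goal of keeping logseq_db_ canonicalization separate from graph-dir encoding.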

3. Keep user-visible graph names unchanged

The plan would preserve current user-visible behavior:

  • CLI graph names remain prefix-free for display and config storage where already intended.
  • desktop app continues to display logical graph names, not encoded directory names.
  • URL-level graph identification continues to resolve to logical graph names, not on-disk encoded names.

Proposed code changes

A. Consolidate path-authoritative helpers

Add or reuse one shared helper layer for:

  • converting repo to graph-dir key
  • converting graph-dir key to encoded graph directory
  • converting repo directly to on-disk graph directory path

Target files likely involved:

  • src/main/frontend/worker/graph_dir.cljs
  • src/main/frontend/worker/db_worker_node_lock.cljs
  • src/electron/electron/utils.cljs
  • src/main/frontend/config.cljs
  • deps/cli/src/logseq/cli/util.cljs

Expected outcome:

  • no raw path join for logical graph names in path-authoritative code
  • no duplicate graph-dir encoding implementations

B. Align Electron graph-dir resolution

Replace raw graph path derivation in Electron with the shared encoded graph-dir contract.

Target files:

  • src/electron/electron/utils.cljs
  • src/electron/electron/handler.cljs
  • src/electron/electron/db.cljs

Expected outcome:

  • desktop app resolves the same on-disk graph dir as db-worker-node
  • graph names containing /, :, %, ~, or spaces behave predictably
  • sanitize-db-name is no longer used for authoritative db graph-dir mapping

C. Align CLI unlink/remove flows

Update CLI removal/unlink flows to resolve graph directories via the same encoded contract used by list/start/lock behavior.

Target file:

  • src/main/logseq/cli/common.cljs

Expected outcome:

  • a graph that can be listed or switched to can also be removed through the same path mapping

D. Align shared graph discovery

Update shared discovery helpers so current encoded graph dirs are decoded correctly, while preserving deliberate support for legacy names where needed.

Target file:

  • deps/cli/src/logseq/cli/common/graph.cljs

Expected outcome:

  • desktop/CLI discovery would recognize encoded graph dirs produced by current db-worker-node logic
  • legacy decode branches would be explicitly documented as compatibility behavior

E. Audit frontend local-dir helpers

Review helpers that expose graph-related directories to ensure they are either:

  • display-only helpers, or
  • path-authoritative helpers using the shared encoded contract

Target file:

  • src/main/frontend/config.cljs

Expected outcome:

  • no ambiguous helper remains that appears safe for filesystem use while still using raw graph names

Compatibility and migration

This plan should explicitly decide how to handle already-existing graph directories created by older logic.

Option 1: Read legacy names, write canonical encoded names

Behavior:

  • discovery accepts legacy directory names and current encoded names
  • all newly created or rewritten paths use the canonical encoded form
  • optional one-time migration may rename legacy directories

Pros:

  • safer rollout
  • less risk of immediately losing access to existing graphs

Cons:

  • mixed formats may coexist temporarily

Option 2: Auto-migrate on access

Behavior:

  • when a legacy graph directory is detected, code renames it to the canonical encoded path before continuing

Pros:

  • converges quickly to one format

Cons:

  • higher operational risk
  • rename behavior must be designed carefully for active workers and lock files

Option 3: Strict cutover

Behavior:

  • only encoded graph dirs are supported after the change

Pros:

  • simplest long-term contract

Cons:

  • too risky without explicit migration tooling

Prefer Option 1 for the first rollout:

  • read compatibility for legacy directory names
  • canonical writes to encoded graph dirs
  • add explicit migration follow-up only after parity tests pass
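
One possible shape for the Option 1 lookup is sketched below, assuming the caller supplies the list of existing directory names and any legacy candidate names (for example, the raw graph name). All names here are hypothetical, and the sketch deliberately leaves open whether legacy directories stay writable.

```typescript
interface GraphDirResolution {
  dirName: string; // existing directory to open, or the canonical name for a new graph
  exists: boolean;
  legacy: boolean; // true when the match is a legacy directory name
}

function canonicalDirName(graphDirKey: string): string {
  return encodeURIComponent(graphDirKey).replace(/~/g, "%7E").replace(/%/g, "~");
}

function resolveGraphDir(
  graphDirKey: string,
  existingDirs: string[],
  legacyCandidates: string[]
): GraphDirResolution {
  const canonical = canonicalDirName(graphDirKey);
  // Prefer the canonical encoded directory when it is already on disk.
  if (existingDirs.indexOf(canonical) !== -1) {
    return { dirName: canonical, exists: true, legacy: false };
  }
  // Otherwise accept a legacy directory for reading.
  for (let i = 0; i < legacyCandidates.length; i++) {
    const candidate = legacyCandidates[i];
    if (existingDirs.indexOf(candidate) !== -1) {
      return { dirName: candidate, exists: true, legacy: true };
    }
  }
  // New graphs are always created under the canonical encoded name.
  return { dirName: canonical, exists: false, legacy: false };
}

const legacyHit = resolveGraphDir("space name", ["space name"], ["space name"]);
const freshGraph = resolveGraphDir("a:b", [], ["a:b"]);
```

Here legacyHit points at the raw "space name" directory with legacy set to true, while freshGraph resolves to the canonical "a~3Ab" name for creation.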

Test plan

Unit tests

Extend or add tests for:

  • src/test/frontend/worker/worker_common_util_test.cljs
  • src/test/frontend/worker/db_worker_node_lock_test.cljs
  • src/test/logseq/cli/server_test.cljs
  • src/test/logseq/cli/common/graph_test.cljs
  • Electron-specific tests if available for graph-dir resolution

Special-character test matrix

All subsystems should use the same examples:

  • foo/bar
  • a:b
  • space name
  • 100% legit
  • til~de
  • mix/of:many %chars~here

Behavior to verify

  1. encode/decode roundtrip is lossless
  2. CLI list-graphs returns the same logical graph name that was encoded on disk
  3. CLI switch/remove resolve the same graph directory
  4. desktop app resolves the same graph directory as CLI/db-worker-node
  5. graph names remain user-visible without encoded substitutions
  6. legacy discovery behavior remains intentional and documented
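
As a starting point for item 1, the round-trip check over the shared matrix could look like the sketch below. It is written in TypeScript for illustration only; the real tests would target the ClojureScript functions, and the helper names are assumptions.

```typescript
const specialCharMatrix: string[] = [
  "foo/bar",
  "a:b",
  "space name",
  "100% legit",
  "til~de",
  "mix/of:many %chars~here",
];

function encodeDirName(graphDirKey: string): string {
  return encodeURIComponent(graphDirKey).replace(/~/g, "%7E").replace(/%/g, "~");
}

function decodeDirName(dirName: string): string {
  return decodeURIComponent(dirName.replace(/~/g, "%"));
}

// Returns the graph names that fail the round trip or produce an unsafe directory name.
function findRoundTripFailures(graphNames: string[]): string[] {
  const failures: string[] = [];
  graphNames.forEach((graphName) => {
    const encoded = encodeDirName(graphName);
    const lossless = decodeDirName(encoded) === graphName;
    const noSeparators = encoded.indexOf("/") === -1 && encoded.indexOf("\\") === -1;
    if (!lossless || !noSeparators) {
      failures.push(graphName);
    }
  });
  return failures;
}

const matrixFailures = findRoundTripFailures(specialCharMatrix); // expected to be empty
```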

Missing coverage today

The repository currently appears to lack end-to-end parity tests for:

  • CLI create/switch/remove with special-character graph names
  • Electron graph-name -> graph-dir resolution with special characters
  • desktop and CLI agreement on one on-disk graph directory for the same logical graph

Rollout sequence

  1. Make the shared graph-dir contract explicit in code and docs.
  2. Update Electron path-authoritative helpers to use encoded graph dirs.
  3. Update CLI unlink/remove behavior to use the same mapping.
  4. Update shared graph discovery for encoded graph dirs and legacy compatibility.
  5. Add parity tests across worker, CLI, and desktop-related helpers.
  6. Evaluate whether legacy directory migration should be a separate follow-up.

Risks

  • Existing graphs may already exist under lossy or raw directory naming rules.
  • Desktop-specific compatibility code may rely on current path layout assumptions.
  • URL/deeplink flows may resolve graph identifiers separately from filesystem mapping and should not accidentally expose encoded names to users.
  • Removing sanitize-db-name from authoritative paths may surface hidden assumptions in older db bootstrap code.

Open questions

  1. Should legacy raw/sanitized graph directories remain writable, or only readable?
  2. Should migration happen automatically, manually, or in a later dedicated change?
  3. Which helper should become the single exported entry point for graph-name -> on-disk graph-dir path resolution?
  4. Should docs/cli/logseq-cli.md be updated in the same change to clarify that on-disk graph directories are encoded, not always literal graph names?

Expected files to change in implementation

Likely implementation targets:

  • src/main/frontend/worker_common/util.cljc
  • src/main/frontend/worker/graph_dir.cljs
  • src/main/frontend/worker/db_worker_node_lock.cljs
  • src/main/logseq/cli/server.cljs
  • src/main/logseq/cli/common.cljs
  • deps/cli/src/logseq/cli/common/graph.cljs
  • deps/cli/src/logseq/cli/util.cljs
  • src/electron/electron/utils.cljs
  • src/electron/electron/db.cljs
  • src/electron/electron/handler.cljs
  • src/main/frontend/config.cljs
  • related tests under src/test/

Acceptance criteria

This plan would be complete when:

  • one shared graph-dir encoding contract is identified as authoritative
  • all affected subsystems and files are enumerated
  • compatibility strategy for legacy graph directories is documented
  • a concrete test matrix for special-character graph names is defined
  • the plan preserves current user-facing graph-name semantics while aligning on-disk graph-dir behavior