Files
logseq/docs/agent-guide/056-graph-name-dir-encoding-alignment.md
2026-03-12 15:12:34 +08:00

378 lines
13 KiB
Markdown

# Align graph dir encoding between logseq-cli and desktop app
## Summary
Align `logseq-cli`, `db-worker-node`, and desktop app handling of `graph dir` / `graph-name` so special characters are encoded and decoded with one shared, reversible contract.
The authoritative contract would be the existing `encode-graph-dir-name` / `decode-graph-dir-name` pair in `src/main/frontend/worker_common/util.cljc`, which is already used by `db-worker-node` and `logseq-cli` server-side graph directory resolution.
This plan keeps user-facing graph names unchanged and only aligns their on-disk directory representation.
## Background
Current code paths do not agree on how a graph name maps to a graph directory on disk:
- `db-worker-node` and `logseq-cli` server/runtime paths use a reversible graph-dir encoding.
- desktop app contains paths that join the raw graph name directly into a filesystem path.
- some Electron and CLI-adjacent helpers still use lossy `sanitize-db-name` behavior.
- shared graph discovery still contains legacy decoding logic for older naming conventions, but not the current reversible encoding.
This mismatch becomes visible when graph names contain special characters such as `/`, `:`, `%`, `~`, or spaces.
## Goals
- Use one shared graph-dir encoding/decoding contract across CLI and desktop app.
- Preserve current user-facing graph-name semantics.
- Keep `logseq_db_` prefix canonicalization separate from graph-dir encoding.
- Define compatibility behavior for legacy graph directory names.
- Add tests that cover special-character graph names across all affected entry points.
## Non-goals
- Redesign the user-visible graph naming model.
- Change the existing `logseq_db_` display normalization rules.
- Remove all legacy compatibility in one step without an explicit migration strategy.
## Current behavior
### Shared reversible encoding already exists
Authoritative implementation today:
- `src/main/frontend/worker_common/util.cljc`
- `encode-graph-dir-name`
- `decode-graph-dir-name`
Current behavior:
1. `encodeURIComponent` is applied.
2. literal `~` is rewritten to `%7E`.
3. `%` is rewritten to `~`.
4. decoding reverses `~ -> %` and then applies `decodeURIComponent`.
This gives a reversible filesystem-safe directory key without `/` or `\\` path separators.
### db-worker-node follows the shared contract
Relevant files:
- `src/main/frontend/worker/db_worker_node_lock.cljs`
- `src/main/frontend/worker/platform/node.cljs`
- `src/main/frontend/worker/db_worker_node.cljs`
- `src/main/frontend/worker/graph_dir.cljs`
Current behavior:
- repo identity strips one leading `logseq_db_` to produce a graph-dir key.
- graph-dir key is encoded with `encode-graph-dir-name`.
- list-graphs decodes on-disk directory names back to graph-dir keys.
- worker log paths and lock paths are stored under the encoded graph directory.
### CLI is partially aligned
Relevant files:
- `src/main/logseq/cli/server.cljs`
- `src/main/logseq/cli/command/core.cljs`
- `src/main/logseq/cli/command/graph.cljs`
- `src/main/logseq/cli/common.cljs`
- `deps/cli/src/logseq/cli/common/graph.cljs`
- `deps/cli/src/logseq/cli/util.cljs`
Current behavior:
- `cli.server` already uses the same canonical graph-dir path contract as `db-worker-node`.
- graph display/input normalization strips or restores one `logseq_db_` prefix as needed.
- `unlink-graph!` still derives directory names with `sanitize-db-name`, which is lossy.
- shared discovery in `deps/cli` still decodes only older directory naming patterns such as `++` and `+3A+`.
### Desktop app is not aligned
Relevant files:
- `src/electron/electron/utils.cljs`
- `src/electron/electron/db.cljs`
- `src/electron/electron/handler.cljs`
- `src/electron/electron/url.cljs`
- `src/main/frontend/config.cljs`
Current behavior:
- `electron.utils/get-graph-dir` joins the raw graph name into the graph path after db-prefix stripping.
- if the graph name contains `/`, the resulting path becomes nested directories.
- `electron.db` still uses `sanitize-db-name` in some db path creation logic.
- frontend local-dir helpers also treat graph name as a raw path segment.
## Problem statement
The same logical graph name can map to different on-disk paths depending on which subsystem touches it:
- reversible encoded path in `db-worker-node`
- raw path join in Electron/frontend
- lossy underscore replacement in sanitize-based helpers
- legacy decode-only behavior in shared graph discovery
As a result:
- a graph may be listable but not removable
- a graph may be resolvable in CLI but not in desktop app
- a graph name containing `/` may accidentally create path nesting in one flow but not another
- existing tests do not enforce cross-subsystem parity
## Proposed contract
### 1. Separate graph identity from graph directory representation
The plan would explicitly distinguish:
- **graph-name / repo**: user-facing identifier, subject to existing `logseq_db_` canonicalization rules
- **graph-dir key**: graph-name with exactly one leading db prefix stripped
- **encoded graph-dir**: on-disk directory name produced only by `encode-graph-dir-name`
This separation would make it clear that special-character handling belongs to the graph-dir layer, not the user-facing name layer.
### 2. Make the db-worker-node contract authoritative
The repository would standardize on:
- `repo -> graph-dir key`: strip one leading `logseq_db_`
- `graph-dir key -> encoded graph-dir`: `encode-graph-dir-name`
- `encoded graph-dir -> graph-dir key`: `decode-graph-dir-name`
Any code path that needs an on-disk db graph directory would route through this contract rather than reimplementing path logic.
### 3. Keep user-visible graph names unchanged
The plan would preserve current user-visible behavior:
- CLI graph names remain prefix-free for display and config storage where already intended.
- desktop app continues to display logical graph names, not encoded directory names.
- URL-level graph identification continues to resolve to logical graph names, not on-disk encoded names.
## Proposed code changes
### A. Consolidate path-authoritative helpers
Add or reuse one shared helper layer for:
- converting repo to graph-dir key
- converting graph-dir key to encoded graph directory
- converting repo directly to on-disk graph directory path
Target files likely involved:
- `src/main/frontend/worker/graph_dir.cljs`
- `src/main/frontend/worker/db_worker_node_lock.cljs`
- `src/electron/electron/utils.cljs`
- `src/main/frontend/config.cljs`
- `deps/cli/src/logseq/cli/util.cljs`
Expected outcome:
- no raw path join for logical graph names in path-authoritative code
- no duplicate graph-dir encoding implementations
### B. Align Electron graph-dir resolution
Replace raw graph path derivation in Electron with the shared encoded graph-dir contract.
Target files:
- `src/electron/electron/utils.cljs`
- `src/electron/electron/handler.cljs`
- `src/electron/electron/db.cljs`
Expected outcome:
- desktop app resolves the same on-disk graph dir as `db-worker-node`
- graph names containing `/`, `:`, `%`, `~`, or spaces behave predictably
- `sanitize-db-name` is no longer used for authoritative db graph-dir mapping
### C. Align CLI remove/unlink behavior
Update CLI removal/unlink flows to resolve graph directories via the same encoded contract used by list/start/lock behavior.
Target file:
- `src/main/logseq/cli/common.cljs`
Expected outcome:
- a graph that can be listed or switched to can also be removed through the same path mapping
### D. Align shared graph discovery
Update shared discovery helpers so current encoded graph dirs are decoded correctly, while preserving deliberate support for legacy names where needed.
Target file:
- `deps/cli/src/logseq/cli/common/graph.cljs`
Expected outcome:
- desktop/CLI discovery would recognize encoded graph dirs produced by current db-worker-node logic
- legacy decode branches would be explicitly documented as compatibility behavior
### E. Audit frontend local-dir helpers
Review helpers that expose graph-related directories to ensure they are either:
- display-only helpers, or
- path-authoritative helpers using the shared encoded contract
Target file:
- `src/main/frontend/config.cljs`
Expected outcome:
- no ambiguous helper remains that appears safe for filesystem use while still using raw graph names
## Compatibility and migration
This plan should explicitly decide how to handle already-existing graph directories created by older logic.
### Option 1: Read legacy names, write canonical encoded names
Behavior:
- discovery accepts legacy directory names and current encoded names
- all newly created or rewritten paths use the canonical encoded form
- optional one-time migration may rename legacy directories
Pros:
- safer rollout
- less risk of immediately losing access to existing graphs
Cons:
- mixed formats may coexist temporarily
### Option 2: Auto-migrate on access
Behavior:
- when a legacy graph directory is detected, code renames it to the canonical encoded path before continuing
Pros:
- converges quickly to one format
Cons:
- higher operational risk
- rename behavior must be designed carefully for active workers and lock files
### Option 3: Strict cutover
Behavior:
- only encoded graph dirs are supported after the change
Pros:
- simplest long-term contract
Cons:
- too risky without explicit migration tooling
### Recommended direction
Prefer **Option 1** for the first rollout:
- read compatibility for legacy directory names
- canonical writes to encoded graph dirs
- add explicit migration follow-up only after parity tests pass
## Test plan
### Unit tests
Extend or add tests for:
- `src/test/frontend/worker/worker_common_util_test.cljs`
- `src/test/frontend/worker/db_worker_node_lock_test.cljs`
- `src/test/logseq/cli/server_test.cljs`
- `src/test/logseq/cli/common/graph_test.cljs`
- Electron-specific tests if available for graph-dir resolution
### Special-character test matrix
All subsystems should use the same examples:
- `foo/bar`
- `a:b`
- `space name`
- `100% legit`
- `til~de`
- `mix/of:many %chars~here`
### Behavior to verify
1. encode/decode roundtrip is lossless
2. CLI list-graphs returns the same logical graph name that was encoded on disk
3. CLI switch/remove resolve the same graph directory
4. desktop app resolves the same graph directory as CLI/db-worker-node
5. graph names remain user-visible without encoded substitutions
6. legacy discovery behavior remains intentional and documented
### Missing coverage today
The repository currently appears to lack end-to-end parity tests for:
- CLI create/switch/remove with special-character graph names
- Electron graph-name -> graph-dir resolution with special characters
- desktop and CLI agreement on one on-disk graph directory for the same logical graph
## Rollout sequence
1. Make the shared graph-dir contract explicit in code and docs.
2. Update Electron path-authoritative helpers to use encoded graph dirs.
3. Update CLI unlink/remove behavior to use the same mapping.
4. Update shared graph discovery for encoded graph dirs and legacy compatibility.
5. Add parity tests across worker, CLI, and desktop-related helpers.
6. Evaluate whether legacy directory migration should be a separate follow-up.
## Risks
- Existing graphs may already exist under lossy or raw directory naming rules.
- Desktop-specific compatibility code may rely on current path layout assumptions.
- URL/deeplink flows may resolve graph identifiers separately from filesystem mapping and should not accidentally expose encoded names to users.
- Removing `sanitize-db-name` from authoritative paths may surface hidden assumptions in older db bootstrap code.
## Open questions
1. Should legacy raw/sanitized graph directories remain writable, or only readable?
2. Should migration happen automatically, manually, or in a later dedicated change?
3. Which helper should become the single exported entry point for graph-name -> on-disk graph-dir path resolution?
4. Should `docs/cli/logseq-cli.md` be updated in the same change to clarify that on-disk graph directories are encoded, not always literal graph names?
## Expected files to change in implementation
Likely implementation targets:
- `src/main/frontend/worker_common/util.cljc`
- `src/main/frontend/worker/graph_dir.cljs`
- `src/main/frontend/worker/db_worker_node_lock.cljs`
- `src/main/logseq/cli/server.cljs`
- `src/main/logseq/cli/common.cljs`
- `deps/cli/src/logseq/cli/common/graph.cljs`
- `deps/cli/src/logseq/cli/util.cljs`
- `src/electron/electron/utils.cljs`
- `src/electron/electron/db.cljs`
- `src/electron/electron/handler.cljs`
- `src/main/frontend/config.cljs`
- related tests under `src/test/`
## Acceptance criteria
This plan would be complete when:
- one shared graph-dir encoding contract is identified as authoritative
- all affected subsystems and files are enumerated
- compatibility strategy for legacy graph directories is documented
- a concrete test matrix for special-character graph names is defined
- the plan preserves current user-facing graph-name semantics while aligning on-disk graph-dir behavior