Major updates to text-slicer plugin

* In the interests of performance and expressiveness, switched to using a SAX parser instead of a DOM implementation.
* Use extensible declarative rules to control the slicing process
* Added new optional set of rules for slicing by heading, where the paragraphs underneath a heading are packed into the same tiddler as the heading
* Added a modal dialogue for specifying parameters when slicing in the browser
2026-05-03 10:28:07 +00:00 · 2017-12-14 14:16:54 +00:00
parent f128650c6e
commit e344c38349
39 changed files with 2943 additions and 713 deletions
--- a/plugins/tiddlywiki/text-slicer/docs/docs-exporters.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs-exporters.tid
@@ -0,0 +1,7 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs/exporters
+tags: $:/plugins/tiddlywiki/text-slicer/docs
+caption: Exporters
+
+Documents can be saved under Node.js, or previewed in the browser.
+
+[TBD]
--- a/plugins/tiddlywiki/text-slicer/docs/docs-internals.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs-internals.tid
@@ -0,0 +1,97 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs/internals
+tags: $:/plugins/tiddlywiki/text-slicer/docs
+caption: Internals
+
+! Introduction
+
+The slicing process is performed by a simple automaton that scans the document and applies simple declarative rules to yield a collection of tiddlers.
+
+The automaton processes the incoming XML document starting with the root element, then recursively visits each child node and its children. Actions are triggered as each component of the document is encountered:
+
+* Opening tags of elements
+* Closing tags of elements
+* Text nodes
+
+Components are matched against the current set of rules to determine what actions should be performed. They can include a combination of:
+
+* Starting a new tiddler with specified fields
+* Rendering the markup for the current tag into the current tiddler
+* Appending the content of the current text node to the current tiddler
+* Threading tiddlers to their parents using a combination of the `list` and `tags` fields
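The order in which those components are encountered can be sketched as follows. This is only an illustration: the pre-parsed tree and the names `walk` and `collectEvents` are assumptions standing in for whatever the parser actually delivers.

```javascript
// Hypothetical sketch: visit a small pre-parsed tree and report the three
// kinds of component (opening tag, text node, closing tag) in document order.
// A node is either a string (text) or {tag, children}.
function walk(node, handlers) {
  if (typeof node === "string") {
    handlers.text(node);
    return;
  }
  handlers.open(node.tag);
  for (const child of node.children) {
    walk(child, handlers);
  }
  handlers.close(node.tag);
}

// Collect the events as readable strings so the ordering is easy to inspect.
function collectEvents(root) {
  const events = [];
  walk(root, {
    open: (tag) => events.push("open " + tag),
    text: (text) => events.push("text " + text),
    close: (tag) => events.push("close " + tag)
  });
  return events;
}

const sampleTree = { tag: "a", children: ["x", { tag: "b", children: [] }] };
console.log(collectEvents(sampleTree));
```

A rule-driven slicer would replace the three handlers with code that looks up the matching rule and performs its actions.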
+
+! Slicing State Data
+
+As the automaton performs its scan, it maintains the following state information:
+
+* ''chunks'' - an array of tiddlers without titles, addressed by their numeric index. The title field is reused to hold the plain text of the chunk that is later used to generate the final title for the tiddler
+* ''currentChunk'' - the numeric index of the chunk currently being filled, or `null` if there is no current chunk
+* ''parentStack'' - a stack of parent chunks stored as `{chunk: <chunk-index>, actions: <actions>}`
+
+At the start, the special document chunk is created and pushed onto the stack of parent chunks.
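A minimal sketch of this state, assuming plain JavaScript objects. The function names, the use of chunk indices inside `list`, and the decision to leave `currentChunk` as `null` after creating the document chunk are all assumptions made for illustration, not details taken from the plugin.

```javascript
// Hypothetical sketch of the slicing state described above.
function createSlicerState(documentFields) {
  const state = { chunks: [], currentChunk: null, parentStack: [] };
  // The special document chunk is created first and becomes the initial parent.
  state.chunks.push(Object.assign({ title: "", text: "", list: [] }, documentFields));
  state.parentStack.push({ chunk: 0, actions: {} });
  return state;
}

// Start a new chunk with the given fields and thread it to the current parent.
// Chunks have no final titles yet, so the parent's list holds numeric indices.
function startNewChunk(state, fields) {
  const parent = state.parentStack[state.parentStack.length - 1];
  const index = state.chunks.push(Object.assign({ title: "", text: "", list: [] }, fields)) - 1;
  state.chunks[parent.chunk].list.push(index);
  state.currentChunk = index;
  return index;
}

// Append text to the current chunk. The title field accumulates the plain
// text that is later used to generate the final title.
function appendText(state, text) {
  if (state.currentChunk === null) {
    return;
  }
  const chunk = state.chunks[state.currentChunk];
  chunk.text += text;
  chunk.title += text;
}
```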
+
+! Slicing Rules
+
+Slicing rules are maintained in tiddlers tagged `$:/tags/text-slicer/slicer-rules` with the following fields:
+
+* ''title'' - title of the tiddler containing the list of rules
+* ''name'' - short, human readable name for the set of rules
+* ''inherits-from'' - (optional) the ''name'' field of another set of rules that should be inherited as a base
+* ''text'' - JSON data as described below
+
+The JSON data is an array of rules, each of which is an object with the following fields:
+
+* ''selector'' - a selector string identifying the components to be matched by this rule
+* ''actions'' - an object describing the actions to be performed when this selector matches a tag
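As an illustration of this layout, the sketch below builds a rule-set tiddler as a plain object and parses its `text` field. The title, the ''name'' values and the individual rules are made up for the example, and `parseRules` is a hypothetical helper rather than part of the plugin.

```javascript
// Hypothetical rule-set tiddler; title, name and rules are illustrative only.
const exampleRuleTiddler = {
  title: "$:/example/slicer-rules/by-heading",
  tags: "$:/tags/text-slicer/slicer-rules",
  name: "example-by-heading",
  "inherits-from": "example-base",
  text: JSON.stringify([
    { selector: "h1,h2", actions: { startNewChunk: { "toc-type": "heading" }, isParent: true } },
    { selector: "p", actions: { startNewChunk: { "toc-type": "paragraph" } } }
  ])
};

// Parse the JSON in the text field and check the shape described above:
// an array of rules, each with a selector string and an actions object.
function parseRules(tiddler) {
  const rules = JSON.parse(tiddler.text);
  if (!Array.isArray(rules)) {
    throw new Error("Rules must be a JSON array");
  }
  for (const rule of rules) {
    const hasSelector = typeof rule.selector === "string";
    const hasActions = typeof rule.actions === "object" && rule.actions !== null;
    if (!hasSelector || !hasActions) {
      throw new Error("Each rule needs a selector string and an actions object");
    }
  }
  return rules;
}

console.log(parseRules(exampleRuleTiddler).map((rule) => rule.selector));
```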
+
+!! Selectors
+
+The selector format is a simplified form of CSS selectors. They are specified as follows:
+
+* A ''selector'' is a list of one or more ''match expressions'' separated by commas. The rule is triggered if any of the match expressions produce a positive match
+* A ''match expression'' is a list of one or more element ''tag names'' separated by spaces. The rule is triggered if the final tag name in the list matches the tag of the current element, and all of the preceding tags in the expression exist as ancestors of the current element in the specified order (but not necessarily as immediate children of one another)
+* A ''tag name'' is the textual name of an element
+* Tag names in match expressions may optionally be separated by a `>` sign surrounded by spaces to impose the requirement that the left hand element be the immediate parent of the right hand element
+
+!!! Example Selectors
+
+This XML document will be used to illustrate some examples:
+
+```
+<a>
+  <b>
+    <d>one</d>
+  </b>
+  <c>
+    <d>two</d>
+    <e>
+      three
+      <e>
+        four
+      </e>
+    </e>
+  </c>
+</a>
+
+```
+
+|!Selector |!Matches |
+|b |Matches the single `<b>` element |
+|d |Matches both of the two `<d>` elements |
+|c,d |Matches the `<c>` element and both of the two `<d>` elements |
+|c d |Matches the second of the two `<d>` elements |
+|a d |Matches both of the two `<d>` elements |
+|a > d |Doesn't match anything |
+|e |Matches both of the two `<e>` elements |
+|c > e |Matches the outermost of the two `<e>` elements |
+|e > e |Matches the innermost of the two `<e>` elements |
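The matching behaviour in the table can be reproduced with a short sketch. It assumes each element is described by its path of tag names from the root (for example `["a", "c", "e", "e"]` for the innermost `<e>`); the function names are hypothetical and this is not the plugin's own matcher.

```javascript
// Split a match expression into steps. "direct" records that the step was
// preceded by ">" and so must be an immediate child of the previous step.
function parseExpression(expression) {
  const steps = [];
  let direct = false;
  for (const token of expression.trim().split(/\s+/)) {
    if (token === ">") {
      direct = true;
      continue;
    }
    steps.push({ tag: token, direct });
    direct = false;
  }
  return steps;
}

// Match the steps against a path of tag names, working from the current
// element (end of the path) back towards the root.
function matchSteps(steps, path) {
  function matchAt(stepIndex, pathIndex) {
    if (path[pathIndex] !== steps[stepIndex].tag) {
      return false;
    }
    if (stepIndex === 0) {
      return true;
    }
    if (steps[stepIndex].direct) {
      // The previous step must match the immediate parent
      return pathIndex > 0 && matchAt(stepIndex - 1, pathIndex - 1);
    }
    // Otherwise the previous step may match any ancestor
    for (let i = pathIndex - 1; i >= 0; i--) {
      if (matchAt(stepIndex - 1, i)) {
        return true;
      }
    }
    return false;
  }
  return steps.length > 0 && path.length > 0 && matchAt(steps.length - 1, path.length - 1);
}

// A selector matches if any of its comma-separated expressions matches.
function matchesSelector(selector, path) {
  return selector.split(",").some((expression) => matchSteps(parseExpression(expression), path));
}

console.log(matchesSelector("c > e", ["a", "c", "e"]));      // outermost <e>
console.log(matchesSelector("c > e", ["a", "c", "e", "e"])); // innermost <e>
```

Working from the current element back towards the root keeps the check cheap for the common case of a single tag name, and the backtracking loop only runs for descendant (space-separated) steps.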
+
+!! Actions
+
+The ''actions'' property of a slicer rule is an object that can have any of the following optional fields:
+
+* ''startNewChunk'' - causes a new chunk to be started on encountering an opening tag. The value is an object containing the fields to be assigned to the new chunk
+* ''isParent'' - causes the new chunk to be marked as a child of the current chunk (boolean flag; only applies if ''startNewChunk'' is set)
+* ''headingLevel'' - arrange heading parents according to level (numerical index; only applies if ''startNewChunk'' and ''isParent'' are set)
+* ''dontRenderTag'' - disables the default rendering of opening and closing tags to the current chunk. By default the tags are rendered as XML tags, but this can be overridden via ''markup'' (boolean; defaults to ''false'')
+* ''isImage'' - identifies an element as representing an HTML image element, with special processing for the ''src'' attribute
+* ''markup'' - optional object with either or both of `{wiki: {prefix: <str>,suffix: <str>}}` and `{html: {prefix: <str>,suffix: <str>}}` allowing the rendered tags to be customised
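One way to read the interaction between ''dontRenderTag'' and ''markup'' is sketched below. The precedence shown (custom markup first, then ''dontRenderTag'', then the default XML tag) is an assumption, as is the hypothetical `renderTag` helper; attributes are ignored for brevity.

```javascript
// Hypothetical sketch: decide what to emit for an opening or closing tag.
// format is "wiki" or "html", matching the keys of the markup object.
function renderTag(tagName, actions, format, isClosing) {
  const markup = actions.markup && actions.markup[format];
  if (markup) {
    // Custom markup: the prefix replaces the opening tag, the suffix the closing tag
    return (isClosing ? markup.suffix : markup.prefix) || "";
  }
  if (actions.dontRenderTag) {
    return "";
  }
  // Default: render as an XML tag
  return isClosing ? "</" + tagName + ">" : "<" + tagName + ">";
}

const boldActions = { markup: { wiki: { prefix: "''", suffix: "''" } } };
console.log(renderTag("strong", boldActions, "wiki", false) + "text" + renderTag("strong", boldActions, "wiki", true));
```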
--- a/plugins/tiddlywiki/text-slicer/docs/docs-model.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs-model.tid
@@ -0,0 +1,133 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs/model
+tags: $:/plugins/tiddlywiki/text-slicer/docs
+caption: Document Model
+
+Individual tiddlers are created for each heading, paragraph and list item. They are linked together into a hierarchical outline using lists.
+
+For example, consider a tiddler titled ''Example'' containing this simple text:
+
+<<<
+! This is a heading
+
+This is a paragraph.
+
+* And the first list item
+* Second list item
+<<<
+
+It will be sliced up into:
+
+* a tiddler for the overall document
+** a tiddler for the heading
+*** a tiddler for the paragraph
+*** a tiddler for the list
+**** and a tiddler for each list item
+
+These tiddlers are bound together using lists: the parent tiddler has a ''list'' field that lists each child in the correct order.
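The structure for the ''Example'' tiddler above might look like the sketch below. The titles other than "Sliced up Example" are shortened placeholders, `list` is shown as an array rather than a TiddlyWiki list string, and `documentOrder` is a hypothetical helper that follows the ''list'' fields depth-first.

```javascript
// Illustrative tiddlers for the sliced example; titles are placeholders.
const slicedTiddlers = new Map([
  ["Sliced up Example", { "toc-type": "document", list: ["heading-1"] }],
  ["heading-1", { "toc-type": "heading", text: "This is a heading", list: ["para-1", "list-1"] }],
  ["para-1", { "toc-type": "paragraph", text: "This is a paragraph." }],
  ["list-1", { "toc-type": "list", "toc-list-type": "ul", list: ["item-1", "item-2"] }],
  ["item-1", { "toc-type": "item", text: "And the first list item" }],
  ["item-2", { "toc-type": "item", text: "Second list item" }]
]);

// Return the titles of a tiddler and all its descendants in document order
// by following each list field from parent to child.
function documentOrder(tiddlers, title) {
  const fields = tiddlers.get(title);
  const result = [title];
  for (const child of (fields && fields.list) || []) {
    result.push(...documentOrder(tiddlers, child));
  }
  return result;
}

console.log(documentOrder(slicedTiddlers, "Sliced up Example"));
```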
+
+!! Slicing Process
+
+Slicing generates the following component tiddlers.
+
+Tiddler titles are generated automatically in most cases (but can subsequently be changed manually). The automatically generated title is made up of concatenating the following elements:
+
+* root text (e.g. ''para'')
+* a dash ''-''
+* the first few words of the text of the item (up to 40 characters), separated with dashes ''-''
+* if necessary, a dash ''-'' and a numerical index to make the title unique
+
+For example, ''para-how-to-use-pentagonal-tiles 23''.
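A rough sketch of this title scheme follows. Several details are assumptions: the text is lowercased, only ASCII letters and digits are kept, whole words are added while the dashed text stays within 40 characters, the index starts at 1, and the index is joined with a dash as in the list above. `makeTitle` is a hypothetical name.

```javascript
// Hypothetical sketch of automatic title generation.
// existingTitles is a Set of titles already in use.
function makeTitle(rootText, itemText, existingTitles) {
  // Assumption: keep only ASCII letters and digits, lowercased
  const words = itemText.toLowerCase().match(/[a-z0-9]+/g) || [];
  let slug = "";
  for (const word of words) {
    const candidate = slug ? slug + "-" + word : word;
    if (candidate.length > 40) {
      break;
    }
    slug = candidate;
  }
  const base = slug ? rootText + "-" + slug : rootText;
  // Add a numerical index only if the title is already taken
  let title = base;
  for (let index = 1; existingTitles.has(title); index++) {
    title = base + "-" + index;
  }
  return title;
}

console.log(makeTitle("para", "How to use pentagonal tiles", new Set()));
```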
+
+Any CSS classes used in the original document are assigned as tags to the associated tiddlers.
+
+!!! Document
+
+The document itself is represented by a tiddler with the following fields:
+
+* ''toc-type'': the text "document"
+* ''title'': the text ''"Sliced up "'' plus the title of the tiddler that was sliced
+* ''text'': Available for comments about the document
+* ''list'': ordered list of tiddlers making up the root level of this document
+
+!!! Headings
+
+Tiddlers representing headings have the following fields:
+
+* ''toc-type'': the text "heading"
+* ''toc-heading-level'': the heading level "h1", "h2", "h3" etc.
+* ''title'': an automatically generated unique title
+* ''text'': the text of the heading
+* ''list'': ordered list of tiddlers tagged with this heading (i.e. the child headings, paragraphs and lists displayed under this heading)
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+!!! Paragraphs
+
+Tiddlers representing paragraphs have the following fields:
+
+* ''toc-type'': the text "paragraph"
+* ''title'': an automatically generated unique title
+* ''text'': the text of the paragraph
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+!!! Lists
+
+Lists are represented by several tiddlers: one for the list itself, and one for each item in the list.
+
+The tiddler representing the list itself has the following fields:
+
+* ''toc-type'': the text "list"
+* ''toc-list-type'': the text "ul" or "ol"
+* ''toc-list-filter'': the default filter used to generate the titles of the list items
+* ''title'': an automatically generated unique title
+* ''list'': ordered list of titles of tiddlers representing the items in this list
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+The tiddlers representing items within the list have the following fields:
+
+* ''toc-type'': the text "item"
+* ''title'': an automatically generated unique title
+* ''text'': the text of the list item
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+!!! Definition lists
+
+Definition lists are represented by several tiddlers: one for the definition list itself, and one for each term and definition in the list.
+
+The tiddler representing the definition list itself has the following fields:
+
+* ''toc-type'': the text "def-list"
+* ''toc-list-filter'': the default filter used to generate the titles of the definition list items
+* ''title'': an automatically generated unique title
+* ''list'': ordered list of titles of tiddlers representing the items (terms and/or definition) in the definition list
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+The tiddlers representing terms within the definition list have the following fields:
+
+* ''toc-type'': the text "term"
+* ''title'': an automatically generated unique title
+* ''text'': the text of the definition list term
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+The tiddlers representing definitions within the definition list have the following fields:
+
+* ''toc-type'': the text "definition"
+* ''title'': an automatically generated unique title
+* ''text'': the text of the definition list definition
+* ''tags'': any CSS classes found in the HTML are converted into tags
+
+!!! Images
+
+Tiddlers representing images have the following fields:
+
+* ''toc-type'': the text "image"
+* ''title'': an automatically generated unique title
+* ''type'': appropriate content type for the image (e.g. "image/jpeg")
+
+!!! Notes
+
+Notes are available during editing but hidden for static renderings. The slicing mechanism does not generate notes; they can only be subsequently added manually. Tiddlers representing notes have the following fields:
+
+* ''toc-type'': the text "note"
+* ''title'': an automatically generated unique title
+* ''text'': the text of the note
+* ''tags'': any CSS classes found in the HTML are converted into tags
--- a/plugins/tiddlywiki/text-slicer/docs/docs-preview.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs-preview.tid
@@ -0,0 +1,14 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs/preview
+tags: $:/plugins/tiddlywiki/text-slicer/docs
+caption: Preview
+
+The document preview column appears at the left side of the screen. The content of headings can be collapsed and expanded to help navigation. Clicking on a tiddler opens the corresponding tiddler in the main story river.
+
+Clicking ''Show toolbar'' causes each tiddler to be preceded by a toolbar showing the underlying title. It can be edited directly to rename the tiddler. References to the tiddler in the ''tags'' and ''list'' fields are automatically updated to reflect the change, but note that links to the tiddler will not be automatically changed.
+
+The following theme tweaks should be applied to enable the preview column:
+
+* Set [[story left position|$:/themes/tiddlywiki/vanilla/metrics/storyleft]] to ''400px'' (or more)
+* It is recommended to also set the [[sidebar layout|$:/themes/tiddlywiki/vanilla/options/sidebarlayout]] to ''fluid-fixed''.
+
+To preview the entire document in a separate window, locate it in the preview column and click the button labelled "View document". The document will open in plain text in a new window. The window will be automatically updated as you work on the document.
--- a/plugins/tiddlywiki/text-slicer/docs/docs-usage.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs-usage.tid
@@ -0,0 +1,19 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs/usage
+tags: $:/plugins/tiddlywiki/text-slicer/docs
+caption: Usage
+
+The tool can slice any tiddler that can be rendered as HTML, including both WikiText and HTML itself.
+
+Documents created with Microsoft Word will need to be first converted to HTML. The library [[mammoth.js|https://github.com/mwilliamson/mammoth.js]] is recommended for this purpose.
+
+!! Browser
+
+In the browser, you can slice a monolithic document tiddler using the slicer toolbar button.
+
+!! Node.js
+
+The `--slice` command allows a tiddler to be sliced under Node.js:
+
+```
+tiddlywiki mywiki --slice SourceDocument --build index
+```
--- a/plugins/tiddlywiki/text-slicer/docs/docs.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/docs.tid
@@ -0,0 +1,19 @@
+title: $:/plugins/tiddlywiki/text-slicer/docs
+list: $:/plugins/tiddlywiki/text-slicer/docs/usage $:/plugins/tiddlywiki/text-slicer/docs/preview $:/plugins/tiddlywiki/text-slicer/docs/model $:/plugins/tiddlywiki/text-slicer/docs/exporters $:/plugins/tiddlywiki/text-slicer/docs/internals
+
+! Introduction
+
+This plugin contains tools to help work with documents that are structured as a hierarchical outline of tiddlers.  The structural relationships within the document are expressed through the `list` and `tags` fields: for example, headings have a list specifying the chunks of content to be shown under the heading.
+
+The major components within the text slicer plugin include:
+
+* ''the slicer'', a tool that slices up an existing monolithic document according to the headings, lists and paragraphs. It is available as a toolbar button for the browser, or as a command for use under Node.js
+* ''document preview column'', a new sidebar on the left that shows the full text of any documents in the wiki and allows individual tiddlers to be opened with a click
+* ''templates'' for previewing and exporting the individual documents as HTML files
+
+Minor components include:
+
+* a new `list-children` filter that returns all the descendants listed in the `list` field of the selected tiddlers
+* a new canned filter for [[advanced search|$:/AdvancedSearch]] that lists orphan tiddlers that are not part of any document
+
+<<tabs "[all[tiddlers+shadows]tag[$:/plugins/tiddlywiki/text-slicer/docs]!has[draft.of]]" "$:/plugins/tiddlywiki/text-slicer/docs/usage">>
--- a/plugins/tiddlywiki/text-slicer/docs/readme.tid
+++ b/plugins/tiddlywiki/text-slicer/docs/readme.tid
@@ -0,0 +1,6 @@
+title: $:/plugins/tiddlywiki/text-slicer/readme
+
+This plugin contains tools to help slice up long texts into individual tiddlers. It currently works directly with XHTML documents and with Microsoft Word compatible DOCX documents via conversion to HTML.
+
+It is an expression of the philosophy of TiddlyWiki: that text is easier to re-use and work with if it is sliced up into separate chunks that can be independently manipulated, and then woven back together to make up stories and narratives for publication.
+