≅ SiSU for documents - structuring, publishing in multiple formats & search

A short description

SiSU is a document structuring, publishing and search tool for document collections, based on lightweight markup. It is command-line oriented and generates static content that is also made searchable at an object level through an SQL database.

SiSU markup helps define text objects, which the program numbers sequentially for object citation. Breaking the document into objects has several uses. Object numbers make it possible to cite and locate text precisely across different document formats and different languages (assuming the document has been translated). For search, they make it possible to identify, in the form of an index, precisely where within each document the search criteria are met. They also leave each output free to represent the document in the manner considered most suitable to its format, whilst retaining its structural and citation integrity.
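The object numbering idea can be illustrated with a minimal sketch. It is written in Python and is not SiSU's own code. It assumes, for illustration only, that an object is a block of text separated from its neighbours by blank lines; `number_objects` is a hypothetical helper name.

```python
import re


def number_objects(text):
    """Split text into blank-line separated blocks, numbered from 1.

    Assumption for this sketch: one block == one object.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text)]
    return [(n, b) for n, b in enumerate((b for b in blocks if b), start=1)]


english = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
french = "Premier paragraphe.\n\nDeuxième paragraphe.\n\nTroisième paragraphe."

en_objects = dict(number_objects(english))
fr_objects = dict(number_objects(french))

# The same object number locates the corresponding text in both versions.
print(en_objects[2])
print(fr_objects[2])
```

Because the numbers follow the sequence of objects rather than any page layout, a citation such as "object 2" stays valid in a translation, provided the translation keeps the same sequence of objects.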

SiSU project source

SiSU projects repo (git)
- https://git.sisudoc.org

SiSU (scribe): document publishing (multiple formats + search)
- https://git.sisudoc.org/sisu

SiSU markup samples in document pods for sisu (scribe)
- https://git.sisudoc.org/sisu-markup

SiSU Spine markup sample output

To give an idea of how this works, here is a small collection of documents marked up for and generated by the software. The curation of topics for a collection of specialized related documents would benefit from a consistently applied bespoke ontology or thesaurus.
The documents presented have been released under various Creative Commons licences, are in the public domain, or are the author's own work, with the exception of one that is under the GPL and the old abandoned Debian live-manual.

≅ Authors (software curated from provided document header metadata)
- https://sisudoc.org/spine/authors.html

≅ Topics (software curated from provided document header metadata)
- https://sisudoc.org/spine/topics.html

SiSU Spine search

※ Search (granular search of text objects)
- https://sisudoc.org/spine_search

SiSU description

Here is a description that has been used for the original sisu (scribe):

With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax in your text editor of choice, SiSU can generate various document formats, most of which share a common object numbering system for locating content, including plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX and PDF files. It can also populate an SQL database with objects (roughly paragraph-sized chunks) so that searches may be performed and matches returned with that degree of granularity. Think of being able to finely match text in documents, using common object numbers, across different output formats and across languages if you have translations of the same document. For search, the result takes the form: your criteria are met by these documents at these locations within each document (equally relevant across different output formats and languages). To be clear (if obvious), page numbers provide none of this functionality. Object numbering is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated), for which it provides a fixed means of referencing content. Document outputs can also share provided semantic meta-data.
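Object-level search can be sketched as follows, using Python's standard `sqlite3` module. This is an illustration under stated assumptions, not SiSU's actual database schema: the table name `objects`, the columns `doc`, `ocn` and `body`, and the helpers `build_index` and `search` are all hypothetical.

```python
import sqlite3


def build_index(documents):
    """Load {doc_name: [object_text, ...]} into an in-memory SQLite table.

    The schema is an assumption made for this sketch.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (doc TEXT, ocn INTEGER, body TEXT)")
    for doc, objects in documents.items():
        # Number each document's objects sequentially from 1.
        rows = [(doc, n, body) for n, body in enumerate(objects, start=1)]
        db.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    return db


def search(db, term):
    """Return (doc, object number) pairs whose text contains term."""
    cur = db.execute(
        "SELECT doc, ocn FROM objects WHERE body LIKE ? ORDER BY doc, ocn",
        (f"%{term}%",),
    )
    return cur.fetchall()


docs = {
    "doc_a": ["On copyright.", "On free software.", "On licences."],
    "doc_b": ["Free software and society.", "Closing remarks."],
}
db = build_index(docs)
hits = search(db, "free software")
print(hits)
```

Each hit is a document plus an object number, so the match can be located in any output format that carries the same numbering. A simple `LIKE` match stands in here for whatever search the real system performs.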


SiSU is less about document layout than about finding a way, using little markup, to construct an abstract representation of a document from which multiple representations can be produced. These may be rather different from each other and used for different purposes, whether layout and publishing or search of content. The aim is that, from this minimal preparation starting point, one can take advantage of some of the strengths of rather different established ways of representing documents: for search (a relational database, or indexed flat files generated for that purpose, whether of complete documents or of files made up of objects), online viewing (e.g. html, xml, pdf), or paper publication (e.g. pdf via latex).
The solution arrived at is to extract structural information about the document (about headings within the document) and to track objects, which can be reconstituted as the same document with relevant object identification numbers, so that text can be referenced across different output formats and presentations.
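A minimal sketch of this approach, in Python, is given here for illustration. It is not SiSU's implementation. The `# ` heading marker is a hypothetical stand-in and is not sisu markup, and the sketch assumes that headings are numbered as objects alongside paragraphs.

```python
from html import escape


def parse(lines):
    """Return a list of (ocn, kind, text) objects.

    '# ' marks a heading (hypothetical marker, not sisu markup).
    """
    objects = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            kind, text = "heading", line[2:]
        else:
            kind, text = "para", line
        objects.append((len(objects) + 1, kind, text))
    return objects


def to_text(objects):
    """Plain-text rendering: object number in brackets after each object."""
    return "\n\n".join(
        f"{text.upper() if kind == 'heading' else text} [{ocn}]"
        for ocn, kind, text in objects
    )


def to_html(objects):
    """HTML rendering: the same object number becomes an element id."""
    out = []
    for ocn, kind, text in objects:
        tag = "h1" if kind == "heading" else "p"
        out.append(f'<{tag} id="o{ocn}">{escape(text)}</{tag}>')
    return "\n".join(out)


source = ["# Introduction", "", "Objects are numbered.", "Numbers survive rendering."]
objs = parse(source)
print(to_text(objs))
print(to_html(objs))
```

Both renderings are derived from the same abstract list of objects, so each format is free to present the text differently while object 2 remains object 2 in both.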

SiSU Spine

SiSU Spine is the new generator for documents prepared in sisu markup. It is written in D, as opposed to the original sisu, which was first shared in Ruby.

Spine code has not as yet been made publicly available.

SiSU Spine compared with the original sisu generator:

- Spine uses the same document markup for the document body, but uses yaml for document headers (which contain document metadata and configuration details); the original sisu has a bespoke markup for headers.

- Spine (written in D) is considerably faster at generating native output than sisu (written in Ruby): on last test at least 60 times faster (what took 1 minute takes 1 second; 1 hour, a minute :-). Admittedly that test was some time ago and ruby has been getting faster, so hopefully this is not over-promising.

- Spine produces fewer document output types than sisu: html, epub, (odt, latex), and it populates an sql db for search.

- Where both produce the same output type, Spine generally produces more up-to-date output format representations.

ralph.amissah www since 1993 ;-)

Some external links of interest



[ D - (dlang) general purpose, multi-paradigm, fast C like programming language ] [ dub - package registry ] [ community discussion (mail list frontend) ]

[ Ruby ] [ Gems ]
[ Crystal ]


[ Sqlite - an sql database engine ]
[ PostgreSQL ]


[ HTML ] [ multipage current spec ] [ dom current spec ]
[ Epub ]
[ css - cascading style sheets ]

[ OpenDocument Format ]

[ LaTeX ]

[ po4a - maintain translations ]

Operating System Distributions

[ NixOS - linux based operating system built on the Nix declarative, reproducible and reliable, build system ] [ nixpkgs (packages @ github) ] [ package search ] [ community discussion (discourse) ]
Gnu [ Guix ] [ packages ]

[ Debian - the universal operating system distribution ]
[ Devuan ]

[ Arch Linux ] [ Arch Wiki ]

Extraneous (external) links of personal interest



[ zsh ]
[ starship - customizable cross-shell prompt ]


[ tilix ] [ alacritty ]

Terminal Multiplexer

[ tmux (github) ] [ screen ]

Window Manager

[ i3wm ] [ sway ]

Text Editors

Gnu Emacs [ Doom Emacs (github) ] [ Org-Mode - your life in plain text & literate programming ] [ Evil-Mode ]

[ Vim ] [ NeoVim ]

Source Control Manager

[ Git ]


[ vieb ] [ vimb ]
[ brave ]


[ DuckDuckGo ] [ YubNub ]

Software Archives

[ Software Heritage - the universal software archive ]
