From 50d45c6deb0afd2e4222d2e33a45487a9d1fa676 Mon Sep 17 00:00:00 2001
From: Ralph Amissah
Date: Sun, 23 Sep 2007 05:16:21 +0100
Subject: primarily to do with sisu documentation, changelog reproduced below:

  * start documenting sisu using sisu
    * sisu markup source files in
      data/doc/sisu/sisu_markup_samples/sisu_manual/
      /usr/share/doc/sisu/sisu_markup_samples/sisu_manual/
    * default output [sisu -3] in
      data/doc/manuals_generated/sisu_manual/
      /usr/share/doc/manuals_generated/sisu_manual/
      (adds substantially to the size of sisu package!)
  * help related edits
  * manpage, work on ability to generate manpages, improved
  * param, exclude footnote mark count when occurs within code block
  * plaintext changes made
  * shared_txt, line wrap visited
  * file:// link option introduced (in addition to existing https?:// and
    ftp://) a bit arbitrarily, diff here, [double check changes in sysenv
    and hub]
  * minor adjustments
    * html url match refinement
    * css added tiny_center
  * plaintext
    * endnotes fix
    * footnote adjustment to make more easily distinguishable from
      substantive text
    * flag -a only [flags -A -e -E dropped] controlled by modifiers
      --unix/msdos --footnote/endnote
  * defaults, homepage
    * renamed homepage (instead of index) implications for modifying skins,
      which need likewise to have any homepage entry renamed
    * added link to sisu_manual in homepage
  * css
    the css for the default homepage is renamed homepage.css (instead of
    index.css) [consider removing this and relying on html.css]
  * ruby version < ruby1.9
    * place stop on installation and working with for now [ruby
      String.strip broken in ruby 1.9.0 (2007-09-10 patchlevel 0)
      [i486-linux], 2007-09-18:38/2]
    * debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
  * debian
    * debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
    * sisu-doc new sub-package for sisu documentation
      debian/control and sisu-doc.install
---
 .../sisu_manual/man/sisu_introduction.1 | 504 +++++++++++++++++++++
 1 file changed, 504 insertions(+)
 create mode 100644 data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1

(limited to 'data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1')

diff --git a/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1 b/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
new file mode 100644
index 00000000..22e04ea0
--- /dev/null
+++ b/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
@@ -0,0 +1,504 @@
+.TH "sisu_introduction" "1" "2007-09-16" "0.59.0" "SiSU"
+.SH
+SISU \- COMMANDS \ [0.58],
+RALPH AMISSAH
+.BR
+
+.SH
+WHAT IS SISU?
+.BR
+
+.SH
+DESCRIPTION
+.BR
+
+.SH
+1. INTRODUCTION \- WHAT IS SISU?
+.BR
+
+.BR
+.B SiSU
+is a system for document markup, publishing (in multiple open standard
+formats) and search.
+
+.BR
+.B SiSU
+[^1] is a[^2] framework for document structuring, publishing and search,
+comprising (a) a lightweight document structure and presentation markup
+syntax and (b) an accompanying engine for generating standard document format
+outputs from documents prepared in sisu markup syntax, which is able to produce
+multiple standard outputs that (can) share a common numbering system for the
+citation of text within a document.
+
+.BR
+.B SiSU
+is developed under an open source, software libre license (GPL3). It has been
+developed in the context of coping with large document sets with evolving
+markup related technologies, for which you want multiple output formats, a
+common mechanism for cross\-output\-format citation, and search.
+
+.BR
+.B SiSU
+both defines a markup syntax and provides an engine that produces open
+standards format outputs from documents prepared with
+.B SiSU
+markup. From a single lightly prepared document, sisu custom builds several
+standard output formats which share a common (text object) numbering system for
+citation of content within a document (that also has implications for search).
+The sisu engine works with an abstraction of the document\'s structure and
+content from which it is possible to generate different forms of representation
+of the document. Significantly,
+.B SiSU
+markup is more sparse than html, and its outputs, which include html, LaTeX,
+landscape and portrait pdfs and Open Document Format (ODF), can all be
+added to and updated.
+.B SiSU
+is also able to populate SQL\-type databases at an object level, which means
+that searches can be made with that degree of granularity. Matching objects
+(primarily paragraphs and headings) can be viewed directly in the database, or
+just their object numbers shown \- your search criteria are met in these
+documents and at these locations within each document.
+
+.BR
+Source document preparation and output generation is a two\-step process: (i)
+document source is prepared, that is, marked up in sisu markup syntax, and (ii)
+the desired output is subsequently generated by running the sisu engine against
+the document source. If output representations are updated (in the sisu engine),
+they can be regenerated by re\-running the engine against the prepared source. Using
+.B SiSU
+markup applied to a document,
+.B SiSU
+custom builds various standard open output formats including plain text,
+HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populates an SQL
+database with objects[^3] (equating generally to paragraph\-sized chunks) so
+searches may be performed and matches returned with that degree of granularity
+(e.g. your search criteria are met by these documents and at these locations
+within each document). Document output formats share a common object numbering
+system for locating content. This is particularly suitable for \(dqpublished\(dq
+works (finalized texts as opposed to works that are frequently changed or
+updated), for which it provides a fixed means of referencing content.
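The object-level model described above can be illustrated with a small Ruby sketch. In that model, objects are numbered in sequence and search results are reported as documents plus object numbers. The sketch is hypothetical and is not SiSU's implementation. It assumes an object is simply a blank-line-separated paragraph, and it replaces the SQL database with an in-memory substring match. The names `number_objects` and `search_objects` and the `:number` key are invented for the example.

```ruby
# Hypothetical sketch only: not SiSU's parser or database layer.
# Assumption: an "object" is a blank-line-separated paragraph of plain text.

# Split a document into objects and give each a sequential object number
# (starting at 1), so that one number can cite the object in any output.
def number_objects(text)
  text.split(/\n{2,}/)
      .map(&:strip)
      .reject(&:empty?)
      .each_with_index
      .map { |body, i| { number: i + 1, body: body } }
end

# Stand-in for granular SQL search: instead of whole documents, report
# which documents match and at which object numbers.
def search_objects(docs, term)
  docs.each_with_object({}) do |(name, objects), hits|
    numbers = objects.select { |o| o[:body].downcase.include?(term.downcase) }
                     .map { |o| o[:number] }
    hits[name] = numbers unless numbers.empty?
  end
end

docs = {
  "intro" => number_objects("SiSU is a framework.\n\nIt numbers objects.\n\nObjects aid search."),
  "notes" => number_objects("Nothing relevant here.")
}

search_objects(docs, "objects").each do |name, numbers|
  puts "#{name}: objects #{numbers.join(', ')}"
end
# prints: intro: objects 2, 3
```

Because the numbers are assigned once, from the source, the same object numbers stay valid whichever output format a reader cites from.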
+
+.BR
+In preparing a
+.B SiSU
+document you optionally provide semantic information related to the document
+in a document header, and in marking up the substantive text provide
+information on the structure of the document, primarily indicating heading
+levels and footnotes. You also provide information on basic text attributes
+where used. The rest is automatic: from this information sisu custom builds[^4]
+the different forms of output requested.
+
+.BR
+.B SiSU
+works with an abstraction of the document based on its structure, which
+comprises its frame[^5] and the objects[^6] it contains. This enables
+.B SiSU
+to represent the document in many different ways, and to take advantage of
+the strengths of different ways of presenting documents. The objects are
+numbered, and these numbers can be used to provide a common base for citing
+material within a document across the different output format types. This is
+significant as page numbers are not suited to the digital age: in web
+publishing, changing a browser\'s default font or using a different browser
+means that text appears on different pages; and in publishing in different
+formats (html, landscape and portrait pdf etc.), again page numbers are of no use
+for citing text in a manner that is relevant across the different output types.
+Dealing with documents at an object level together with object numbering also
+has implications for search.
+
+.BR
+One of the challenges of maintaining documents is to keep them in a format that
+would allow users to use them without depending on proprietary software
+popular at the time. Consider the ease of dealing with legacy proprietary
+formats today and what guarantee you have that old proprietary formats will
+remain (or can be read without proprietary software/equipment) 15 years
+from now, or the way in which html has evolved over its relatively short
+span of existence.
+.B SiSU
+provides the flexibility of outputting documents in multiple non\-proprietary
+open formats including html, pdf[^7] and the ISO standard ODF.[^8] Whilst
+.B SiSU
+relies on software, the markup is uncomplicated and minimalistic, which
+guarantees that future engines can be written to run against it. It is also
+easily converted to other formats, which means documents prepared in
+.B SiSU
+can be migrated to other document formats. Further security is provided by
+the fact that the software,
+.B SiSU
+itself, is available under GPL3, a licence that guarantees that the source code will
+always be open, and free as in libre, which means that the code base can be
+used, updated and further developed as required under the terms of its license.
+Another challenge is to keep up with a moving target.
+.B SiSU
+permits new forms of output to be added as they become important (Open
+Document Format text was added in 2006), and existing output to be updated
+(html has evolved and the related module has been updated repeatedly over the
+years; presumably when the World Wide Web Consortium (W3C) finalises html 5,
+which is currently under development, the html module will again be updated,
+allowing all existing documents to be regenerated as html 5).
+
+.BR
+The document formats are written to the file\-system and available for indexing
+by independent indexing tools, whether off the web like Google and Yahoo or on
+the site like Lucene and Hyperestraier.
+
+.BR
+.B SiSU
+also provides other features such as concordance files and document content
+certificates. Working against an abstraction of document structure opens
+further possibilities for the research and development of other document
+representations; the availability of objects is useful, for example, for topic
+maps and the commercial law thesaurus by Vikki Rogers and Al Kritzer, and
+together with the flexibility of
+.B SiSU
+this offers great possibilities.
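The document content certificates mentioned above can be sketched in the same hedged way. Elsewhere this manual describes them as md5 or sha256 digests of headings, paragraphs, images etc. The sketch below assumes a SHA-256 digest over each object's plain text. It also adds a hypothetical document-level digest computed over the ordered object digests. The names `object_digests` and `document_digest` and the hash layout are illustrative assumptions, not SiSU's certificate format.

```ruby
require 'digest'

# Hypothetical sketch: SiSU's actual certificate layout may differ.
# Each object (heading, paragraph, etc.) is paired with its object number
# and a SHA-256 digest of its text.
def object_digests(objects)
  objects.each_with_index.map { |body, i| { number: i + 1, sha256: Digest::SHA256.hexdigest(body) } }
end

# Hypothetical document-level digest over the ordered object digests:
# changing any object's text, or the order of objects, changes it.
def document_digest(digests)
  Digest::SHA256.hexdigest(digests.map { |d| d[:sha256] }.join("\n"))
end

original = ["1. Introduction", "SiSU is a framework.", "Objects are numbered."]
edited   = ["1. Introduction", "SiSU is a framework!", "Objects are numbered."]

a = object_digests(original)
b = object_digests(edited)

# Compare object by object to locate exactly which numbered objects changed.
changed = a.zip(b).reject { |x, y| x[:sha256] == y[:sha256] }.map { |x, _| x[:number] }
puts "changed objects: #{changed.inspect}"     # prints: changed objects: [2]
puts document_digest(a) == document_digest(b)  # prints: false
```

Digesting per object, rather than per file, lets a change be traced to a particular numbered object instead of merely being detected somewhere in the document.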
+
+.BR
+.B SiSU
+is primarily for published works, which can take advantage of the citation
+system to reliably reference its documents.
+.B SiSU
+works well in a complementary manner with such collaborative technologies as
+Wikis, which can take advantage of and be used to discuss the substance of
+content prepared in
+.B SiSU
+.
+
+.BR
+
+
+.SH
+2. HOW DOES SISU WORK?
+.BR
+
+.BR
+.B SiSU
+markup is fairly minimalistic; it consists of: a (largely optional) document
+header, made up of information about the document (such as when it was
+published, who authored it, and what rights are granted) and any processing
+instructions; and markup within the substantive text of the document, which is
+related to document structure and typeface.
+.B SiSU
+must be able to discern the structure of a document (text headings and their
+levels in relation to each other), either from information provided in the
+document header or from markup within the text (or from a combination of both).
+Processing is done against an abstraction of the document comprising
+information on the document\'s structure and its objects,[2] which the program
+serializes (providing the object numbers) and which are assigned hash sum
+values based on their content. This abstraction of information about document
+structure, objects (and hash sums) provides considerable flexibility in
+representing documents in different ways and for different purposes (e.g. search,
+document layout, publishing, content certification, concordance etc.), and
+makes it possible to take advantage of some of the strengths of established
+ways of representing documents (or indeed to create new ones).
+
+.SH
+3. SUMMARY OF FEATURES
+.BR
+
+.BR
+* sparse/minimal markup (clean utf\-8 source texts). Documents are prepared in
+a single UTF\-8 file using a minimalistic mnemonic syntax. Typical literary
+documents like \(dqWar and Peace\(dq require almost no markup, and most of the
+headers are optional.
+
+.BR
+* markup is easily readable/parsable by the human eye (basic markup is simpler
+and more sparse than the most basic HTML), \ [this \ may \ also \ be \
+converted \ to \ XML \ representations \ of \ the \ same \ input/source \
+document].
+
+.BR
+* markup defines document structure (this may be done once in a header
+pattern\-match description, or for heading levels individually); basic text
+attributes (bold, italics, underscore, strike\-through etc.) as required; and
+semantic information related to the document (header information, extended
+beyond the Dublin Core and easily further extended as required); the headers
+may also contain processing instructions.
+.B SiSU
+markup is primarily an abstraction of document structure and document
+metadata to permit taking advantage of the basic strengths of existing
+alternative practical standard ways of representing documents \ [be \ that \
+browser \ viewing, \ paper \ publication, \ sql \ search \ etc.] (html, xml,
+odf, latex, pdf, sql).
+
+.BR
+* for output produces reasonably elegant output of established industry and
+institutionally accepted open standard formats.[3] It takes advantage of the
+different strengths of various standard formats for representing documents;
+amongst the output formats currently supported are:
+
+.BR
+ * html \- both as a single scrollable text and a segmented document
+
+.BR
+ * xhtml
+
+.BR
+ * XML \- both in sax and dom style xml structures for further development as
+ required
+
+.BR
+ * ODF \- open document format, the ISO standard for document storage
+
+.BR
+ * LaTeX \- used to generate pdf
+
+.BR
+ * pdf (via LaTeX)
+
+.BR
+ * sql \- population of an sql database (at the same object level that is
+ used to cite text within a document)
+
+.BR
+Also produces: concordance files; document content certificates (md5 or sha256
+digests of headings, paragraphs, images etc.) and html manifests (and sitemaps
+of content). It also takes advantage of the strengths implicit in these very
+different output types (e.g. PDFs produced using LaTeX typesetting, and
+databases populated with documents at an individual object/paragraph level,
+making granular search (and related possibilities) possible).
+
+.BR
+* ensuring content can be cited in a meaningful way regardless of the selected
+output format. Online publishing (and publishing in multiple document formats)
+lacks a useful way of citing text internally within documents (important to
+academics generally and to lawyers) as page numbers are meaningless across
+browsers and formats. sisu seeks to provide a common way of pinpointing text
+within a document (which can be utilized for citation and by search engines).
+The outputs share a common numbering system that is meaningful (to man and
+machine) across all digital outputs whether paper, screen, or database
+oriented (pdf, HTML, xml, sqlite, postgresql), and this numbering system can be
+used to reference content.
+
+.BR
+* Granular search within documents. SQL databases are populated at an object
+level (roughly headings, paragraphs, verse, tables) and become searchable with
+that degree of granularity; the output information provides the
+object/paragraph numbers which are relevant across all generated outputs; it is
+also possible to look at just the matching paragraphs of the documents in the
+database; \ [output \ indexing \ also \ works \ well \ with \ search \ indexing
+\ tools \ like \ hyperestraier].
+
+.BR
+* long\-term maintainability of document collections in a world of changing
+formats, having a very sparsely marked\-up source document base. There is a
+considerable degree of future\-proofing: output representations are
+\(dqupgradeable\(dq, and new document formats may be added, e.g. the addition of the odf
+(open document text) module in 2006, and of html5 output at some point in
+future, without modification of existing prepared texts.
+
+.BR
+* SQL search aside, documents are generated as required and static once
+generated.
+
+.BR
+* documents produced are static files and may be batch processed; this needs
+to be done only once but may be repeated for various reasons as desired
+(updated content, addition of new output formats, updated technology document
+presentations/representations)
+
+.BR
+* document source (plaintext utf\-8) if shared on the net may be used as input
+and processed locally to produce the different document outputs
+
+.BR
+* document source may be bundled together (automatically) with associated
+documents (multiple language versions or master document with inclusions) and
+images and sent as a zip file called a sisupod; if shared on the net, these too
+may be processed locally to produce the desired document outputs
+
+.BR
+* generated document outputs may automatically be posted to remote sites.
+
+.BR
+* for basic document generation, the only software dependency is
+.BR Ruby ,
+and a few standard Unix tools (this covers plaintext, HTML, XML, ODF,
+LaTeX). To use a database you of course need one, and to convert the
+generated LaTeX to pdf you need a latex processor like tetex or texlive.
+
+.BR
+* as a tool for developers it is flexible and extensible
+
+.BR
+Syntax highlighting for
+.B SiSU
+markup is available for a number of text editors.
+
+.BR
+.B SiSU
+is less about document layout than about finding a way with little markup to
+construct an abstract representation of a document that makes it
+possible to produce multiple representations of it, which may be rather
+different from each other and used for different purposes, whether layout and
+publishing, or search of content.
+
+.BR
+i.e. to be able to take advantage, from this minimal preparation starting point,
+of some of the strengths of rather different established ways of representing
+documents for different purposes, whether for search (relational database, or
+indexed flat files generated for that purpose whether of complete documents, or
+say of files made up of objects), online viewing (e.g. html, xml, pdf), or
+paper publication (e.g. pdf)...
+
+.BR
+the solution arrived at is to extract structural information about the
+document (about headings within the document) and to track objects (which
+are serialized and also given hash values) in the manner described. It makes
+possible representations that are quite different from those offered at
+present. For example, objects could be saved individually and identified by
+their hashes, with an index of how the objects relate to each other to form a
+document.
+
+.SH
+DOCUMENT INFORMATION (METADATA)
+.BR
+
+.SH
+METADATA
+.BR
+
+.BR
+Document Manifest @
+
+
+.BR
+.B Dublin Core
+(DC)
+
+.BR
+.I DC tags included with this document are provided here.
+
+.BR
+DC Title:
+.I SiSU \- Commands \ [0.58]
+
+.BR
+DC Creator:
+.I Ralph Amissah
+
+.BR
+DC Rights:
+.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
+3
+
+.BR
+DC Type:
+.I information
+
+.BR
+DC Date created:
+.I 2002\-08\-28
+
+.BR
+DC Date issued:
+.I 2002\-08\-28
+
+.BR
+DC Date available:
+.I 2002\-08\-28
+
+.BR
+DC Date modified:
+.I 2007\-09\-16
+
+.BR
+DC Date:
+.I 2007\-09\-16
+
+.BR
+.B Version Information
+
+.BR
+Sourcefile:
+.I sisu_introduction.sst
+
+.BR
+Filetype:
+.I SiSU text 0.58
+
+.BR
+Sourcefile Digest, MD5(sisu_introduction.sst)=
+.I b2a6da5bd22fa1eaa92a08d81f11d1c7
+
+.BR
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu\-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+.I 20fc43cf3eb6590bc3399a1aef65c5a9
+
+.BR
+.B Generated
+
+.BR
+Document (metaverse) last generated:
+.I Sun Sep 23 04:13:42 +0100 2007
+
+.BR
+Generated by:
+.I SiSU
+.I 0.59.0
+of 2007w38/0 (2007\-09\-23)
+
+.BR
+Ruby version:
+.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
+
+.TP
+.BI 1.
+\(dq
+.B SiSU
+information Structuring Universe\(dq or \(dqStructured information, Serialized
+Units\(dq.
+ Also chosen for the meaning of the Finnish term "sisu".
+.TP
+.BI 2.
+Unix command line oriented
+.TP
+.BI 3.
+objects include: headings, paragraphs, verse, tables, images, but not
+footnotes/endnotes which are numbered separately and tied to the object from
+which they are referenced.
+.TP
+.BI 4.
+i.e. the html, pdf, odf outputs are each built individually and optimised for
+that form of presentation, rather than for example the html being a saved
+version of the odf, or the pdf being a saved version of the html.
+.TP
+.BI 5.
+the different heading levels
+.TP
+.BI 6.
+units of text, primarily paragraphs and headings, also any tables, poems,
+code-blocks
+.TP
+.BI 7.
+Specification submitted by Adobe to ISO to become a full open ISO
+specification
+
+.TP
+.BI 8.
+ISO/IEC 26300:2006
+
+.TP
+Other versions of this document:
+.TP
+manifest:
+.TP
+html:
+.TP
+pdf:
+.TP
+pdf:
+." .TP
+." manpage: http://www.jus.uio.no/sisu/sisu_introduction/sisu_introduction.1
+.TP
+at:
+.TP
+.TP
+* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
+.TP
+* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+.TP
+* Last Generated on: Sun Sep 23 04:13:49 +0100 2007
+.TP
+* SiSU http://www.jus.uio.no/sisu
-- 
cgit v1.2.3