Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/man/sisu_search.1')
-rw-r--r-- | data/doc/manuals_generated/sisu_manual/man/sisu_search.1 | 639 |
1 files changed, 0 insertions, 639 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/man/sisu_search.1 b/data/doc/manuals_generated/sisu_manual/man/sisu_search.1
deleted file mode 100644
index a052ee34..00000000
--- a/data/doc/manuals_generated/sisu_manual/man/sisu_search.1
+++ /dev/null
@@ -1,639 +0,0 @@
-.TH "sisu_search" "1" "2007-09-16" "0.59.0" "SiSU - SiSU information Structuring Universe"
-.SH
-SISU \- SISU INFORMATION STRUCTURING UNIVERSE \- SEARCH \ [0.58],
-RALPH AMISSAH
-.BR
-
-.SH
-SISU SEARCH
-.BR
-
-.SH
-1. SISU SEARCH \- INTRODUCTION
-.BR
-
-.BR
-.B SiSU
-output can easily and conveniently be indexed by a number of standalone
-indexing tools, such as Lucene, Hyperestraier.
-
-.BR
-Because the document structure of sites created is clearly defined, and the
-text object citation system is available hypothetically at least, for all forms
-of output, it is possible to search the sql database, and either read results
-from that database, or just as simply map the results to the html output, which
-has richer text markup.
-
-.BR
-In addition to this
-.B SiSU
-has the ability to populate a relational sql type database with documents at
-an object level, with objects numbers that are shared across different output
-types, which make them searchable with that degree of granularity. Basically,
-your match criteria is met by these documents and at these locations within
-each document, which can be viewed within the database directly or in various
-output formats.
-
-.SH
-2. SQL
-.BR
-
-.SH
-2.1 POPULATING SQL TYPE DATABASES
-
-.BR
-.B SiSU
-feeds sisu markupd documents into sql type databases PostgreSQL[^1] and/or
-SQLite[^2] database together with information related to document structure.
-
-.BR
-This is one of the more interesting output forms, as all the structural data of
-the documents are retained (though can be ignored by the user of the database
-should they so choose). All site texts/documents are (currently) streamed to
-four tables:
-
-.BR
- * one containing semantic (and other) headers, including, title, author,
- subject, (the Dublin Core...);
-
-.BR
- * another the substantive texts by individual \"paragraph\" (or object) \-
- along with structural information, each paragraph being identifiable by its
- paragraph number (if it has one which almost all of them do), and the
- substantive text of each paragraph quite naturally being searchable (both in
- formatted and clean text versions for searching); and
-
-.BR
- * a third containing endnotes cross\-referenced back to the paragraph from
- which they are referenced (both in formatted and clean text versions for
- searching).
-
-.BR
- * a fourth table with a one to one relation with the headers table contains
- full text versions of output, eg. pdf, html, xml, and ascii.
-
-.BR
-There is of course the possibility to add further structures.
-
-.BR
-At this level
-.B SiSU
-loads a relational database with documents chunked into objects, their
-smallest logical structurally constituent parts, as text objects, with their
-object citation number and all other structural information needed to construct
-the document. Text is stored (at this text object level) with and without
-elementary markup tagging, the stripped version being so as to facilitate ease
-of searching.
-
-.BR
-Being able to search a relational database at an object level with the
-.B SiSU
-citation system is an effective way of locating content generated by
-.B SiSU
-. As individual text objects of a document stored (and indexed) together with
-object numbers, and all versions of the document have the same numbering,
-complex searches can be tailored to return just the locations of the search
-results relevant for all available output formats, with live links to the
-precise locations in the database or in html/xml documents; or, the structural
-information provided makes it possible to search the full contents of the
-database and have headings in which search content appears, or to search only
-headings etc. (as the Dublin Core is incorporated it is easy to make use of
-that as well).
-
-.SH
-3. POSTGRESQL
-.BR
-
-.SH
-3.1 NAME
-
-.BR
-.B SiSU
-\- Structured information, Serialized Units \- a document publishing system,
-postgresql dependency package
-
-.SH
-3.2 DESCRIPTION
-
-.BR
-Information related to using postgresql with sisu (and related to the
-sisu_postgresql dependency package, which is a dummy package to install
-dependencies needed for
-.B SiSU
-to populate a postgresql database, this being part of
-.B SiSU
-\- man sisu).
-
-.SH
-3.3 SYNOPSIS
-
-.BR
- sisu \-D \ [instruction] \ [filename/wildcard \ if \ required]
-
-.BR
- sisu \-D \-\-pg \-\-[instruction] \ [filename/wildcard \ if \ required]
-
-.SH
-3.4 COMMANDS
-
-.BR
-Mappings to two databases are provided by default, postgresql and sqlite, the
-same commands are used within sisu to construct and populate databases however
-\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql,
-alternatively \-\-sqlite or \-\-pgsql may be used
-
-.BR
-.B \-D or \-\-pgsql
-may be used interchangeably.
-
-.SH
-3.4.1 CREATE AND DESTROY DATABASE
-
-.TP
-.B \ \-\-pgsql \ \-\-createall
-\ initial \ step, \ creates \ required \ relations \ (tables, \ indexes) \ in
-\ existing \ (postgresql) \ database \ (a \ database \ should \ be \ created \
-manually \ and \ given \ the \ same \ name \ as \ working \ directory, \ as \
-requested) \ (rb.dbi) \
-
-.TP
-.B \ sisu \ \-D \ \-\-createdb
-\ creates \ database \ where \ no \ database \ existed \ before \
-
-.TP
-.B \ sisu \ \-D \ \-\-create
-\ creates \ database \ tables \ where \ no \ database \ tables \ existed \
-before \
-
-.TP
-.B \ sisu \ \-D \ \-\-Dropall
-\ destroys \ database \ (including \ all \ its \ content)! \ kills \ data \
-and \ drops \ tables, \ indexes \ and \ database \ associated \ with \ a \
-given \ directory \ (and \ directories \ of \ the \ same \ name). \
-
-.TP
-.B \ sisu \ \-D \ \-\-recreate
-\ destroys \ existing \ database \ and \ builds \ a \ new \ empty \ database
-\ structure \
-
-.SH
-3.4.2 IMPORT AND REMOVE DOCUMENTS
-
-.TP
-.B \ sisu \ \-D \ \-\-import \ \-v \ \ [filename/wildcard]
-populates database with the contents of the file. Imports documents(s)
-specified to a postgresql database (at an object level).
-
-.TP
-.B \ sisu \ \-D \ \-\-update \ \-v \ \ [filename/wildcard]
-updates file contents in database
-
-.TP
-.B \ sisu \ \-D \ \-\-remove \ \-v \ \ [filename/wildcard]
-removes specified document from postgresql database.
-
-.SH
-4. SQLITE
-.BR
-
-.SH
-4.1 NAME
-
-.BR
-.B SiSU
-\- Structured information, Serialized Units \- a document publishing system.
-
-.SH
-4.2 DESCRIPTION
-
-.BR
-Information related to using sqlite with sisu (and related to the sisu_sqlite
-dependency package, which is a dummy package to install dependencies needed for
-.B SiSU
-to populate an sqlite database, this being part of
-.B SiSU
-\- man sisu).
-
-.SH
-4.3 SYNOPSIS
-
-.BR
- sisu \-d \ [instruction] \ [filename/wildcard \ if \ required]
-
-.BR
- sisu \-d \-\-(sqlite|pg) \-\-[instruction] \ [filename/wildcard \ if \
- required]
-
-.SH
-4.4 COMMANDS
-
-.BR
-Mappings to two databases are provided by default, postgresql and sqlite, the
-same commands are used within sisu to construct and populate databases however
-\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql,
-alternatively \-\-sqlite or \-\-pgsql may be used
-
-.BR
-.B \-d or \-\-sqlite
-may be used interchangeably.
-
-.SH
-4.4.1 CREATE AND DESTROY DATABASE
-
-.TP
-.B \ \-\-sqlite \ \-\-createall
-\ initial \ step, \ creates \ required \ relations \ (tables, \ indexes) \ in
-\ existing \ (sqlite) \ database \ (a \ database \ should \ be \ created \
-manually \ and \ given \ the \ same \ name \ as \ working \ directory, \ as \
-requested) \ (rb.dbi) \
-
-.TP
-.B \ sisu \ \-d \ \-\-createdb
-\ creates \ database \ where \ no \ database \ existed \ before \
-
-.TP
-.B \ sisu \ \-d \ \-\-create
-\ creates \ database \ tables \ where \ no \ database \ tables \ existed \
-before \
-
-.TP
-.B \ sisu \ \-d \ \-\-dropall
-\ destroys \ database \ (including \ all \ its \ content)! \ kills \ data \
-and \ drops \ tables, \ indexes \ and \ database \ associated \ with \ a \
-given \ directory \ (and \ directories \ of \ the \ same \ name). \
-
-.TP
-.B \ sisu \ \-d \ \-\-recreate
-\ destroys \ existing \ database \ and \ builds \ a \ new \ empty \ database
-\ structure \
-
-.SH
-4.4.2 IMPORT AND REMOVE DOCUMENTS
-
-.TP
-.B \ sisu \ \-d \ \-\-import \ \-v \ \ [filename/wildcard]
-populates database with the contents of the file. Imports documents(s)
-specified to an sqlite database (at an object level).
-
-.TP
-.B \ sisu \ \-d \ \-\-update \ \-v \ \ [filename/wildcard]
-updates file contents in database
-
-.TP
-.B \ sisu \ \-d \ \-\-remove \ \-v \ \ [filename/wildcard]
-removes specified document from sqlite database.
-
-.SH
-5. INTRODUCTION
-.BR
-
-.SH
-5.1 SEARCH \- DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
-INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
-
-.BR
-Sample search frontend <http://search.sisudoc.org> \ [^3] A small database and
-sample query front\-end (search from) that makes use of the citation system,
-.I object citation numbering
-to demonstrates functionality.[^4]
-
-.BR
-.B SiSU
-can provide information on which documents are matched and at what locations
-within each document the matches are found. These results are relevant across
-all outputs using object citation numbering, which includes html, XML, LaTeX,
-PDF and indeed the SQL database. You can then refer to one of the other outputs
-or in the SQL database expand the text within the matched objects (paragraphs)
-in the documents matched.
-
-.BR
-Note you may set results either for documents matched and object number
-locations within each matched document meeting the search criteria; or display
-the names of the documents matched along with the objects (paragraphs) that
-meet the search criteria.[^5]
-
-.TP
-.B \ sisu \ \-F \ \-\-webserv\-webrick
-\ builds \ a \ cgi \ web \ search \ frontend \ for \ the \ database \ created
-\
-
-.BR
-The following is feedback on the setup on a machine provided by the help
-command:
-
-.BR
- sisu \-\-help sql
-
-
-.nf
- Postgresql
- user: ralph
- current db set: SiSU_sisu
- port: 5432
- dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
- sqlite
- current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
- dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
-.fi
-
-.BR
-Note on databases built
-
-.BR
-By default, \ [unless \ otherwise \ specified] databases are built on a
-directory basis, from collections of documents within that directory. The name
-of the directory you choose to work from is used as the database name, i.e. if
-you are working in a directory called /home/ralph/ebook the database SiSU_ebook
-is used. \ [otherwise \ a \ manual \ mapping \ for \ the \ collection \ is \
-necessary]
-
-.SH
-5.2 SEARCH FORM
-
-.TP
-.B \ sisu \ \-F
-\ generates \ a \ sample \ search \ form, \ which \ must \ be \ copied \ to \
-the \ web\-server \ cgi \ directory \
-
-.TP
-.B \ sisu \ \-F \ \-\-webserv\-webrick
-\ generates \ a \ sample \ search \ form \ for \ use \ with \ the \ webrick \
-server, \ which \ must \ be \ copied \ to \ the \ web\-server \ cgi \ directory
-\
-
-.TP
-.B \ sisu \ \-Fv
-\ as \ above, \ and \ provides \ some \ information \ on \ setting \ up \
-hyperestraier \
-
-.TP
-.B \ sisu \ \-W
-\ starts \ the \ webrick \ server \ which \ should \ be \ available \
-wherever \ sisu \ is \ properly \ installed \
-
-.BR
-The generated search form must be copied manually to the webserver directory as
-instructed
-
-.SH
-6. HYPERESTRAIER
-.BR
-
-.BR
-See the documentation for hyperestraier:
-
-.BR
- <http://hyperestraier.sourceforge.net/>
-
-.BR
- /usr/share/doc/hyperestraier/index.html
-
-.BR
- man estcmd
-
-.BR
-on sisu_hyperestraier:
-
-.BR
- man sisu_hyperestraier
-
-.BR
- /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
-
-.BR
-NOTE: the examples that follow assume that sisu output is placed in the
-directory /home/ralph/sisu_www
-
-.BR
-(A) to generate the index within the webserver directory to be indexed:
-
-.BR
- estcmd gather \-sd \ [index \ name] \ [directory \ path \ to \ index]
-
-.BR
-the following are examples that will need to be tailored according to your
-needs:
-
-.BR
- cd /home/ralph/sisu_www
-
-.BR
- estcmd gather \-sd casket /home/ralph/sisu_www
-
-.BR
-you may use the \'find\' command together with \'egrep\' to limit indexing to
-particular document collection directories within the web server directory:
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep
- \'/home/ralph/sisu_www/sisu/.+?.html$\' |estcmd gather \-sd casket \-
-
-.BR
-Check which directories in the webserver/output directory (~/sisu_www or
-elsewhere depending on configuration) you wish to include in the search index.
-
-.BR
-As sisu duplicates output in multiple file formats, it it is probably
-preferable to limit the estraier index to html output, and as it may also be
-desirable to exclude files \'plain.txt\', \'toc.html\' and
-\'concordance.html\', as these duplicate information held in other html output
-e.g.
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep
- \'/sisu_www/(sisu|bookmarks)/.+?.html$\' | egrep \-v
- \'(doc|concordance).html$\' |estcmd gather \-sd casket \-
-
-.BR
-from your current document preparation/markup directory, you would construct a
-rune along the following lines:
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep \'/home/ralph/sisu_www/([specify \
- first \ directory \ for \ inclusion]|[specify \ second \ directory \ for \
- inclusion]|[another \ directory \ for \ inclusion? \ ...])/.+?.html$\' |
- egrep \-v \'(doc|concordance).html$\' |estcmd gather \-sd
- /home/ralph/sisu_www/casket \-
-
-.BR
-(B) to set up the search form
-
-.BR
-(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
-
-.BR
- sudo cp \-vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi\-bin
-
-.BR
- sudo chmod \-v 755 /usr/lib/cgi\-bin/estseek.cgi
-
-.BR
- sudo cp \-v /usr/share/hyperestraier/estseek.* /usr/lib/cgi\-bin
-
-.BR
- \ [see \ estraier \ documentation \ for \ paths]
-
-.BR
-(ii) edit estseek.conf, with attention to the lines starting \'indexname:\' and
-\'replace:\':
-
-.BR
- indexname: /home/ralph/sisu_www/casket
-
-.BR
- replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
-
-.BR
- replace: /index.html?${{!}}/
-
-.BR
-(C) to test using webrick, start webrick:
-
-.BR
- sisu \-W
-
-.BR
-and try open the url: <http://localhost:8081/cgi\-bin/estseek.cgi>
-
-.SH
-DOCUMENT INFORMATION (METADATA)
-.BR
-
-.SH
-METADATA
-.BR
-
-.BR
-Document Manifest @
-<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
-
-.BR
-.B Dublin Core
-(DC)
-
-.BR
-.I DC tags included with this document are provided here.
-
-.BR
-DC Title:
-.I SiSU \- SiSU information Structuring Universe \- Search \ [0.58]
-
-.BR
-DC Creator:
-.I Ralph Amissah
-
-.BR
-DC Rights:
-.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
-3
-
-.BR
-DC Type:
-.I information
-
-.BR
-DC Date created:
-.I 2002\-08\-28
-
-.BR
-DC Date issued:
-.I 2002\-08\-28
-
-.BR
-DC Date available:
-.I 2002\-08\-28
-
-.BR
-DC Date modified:
-.I 2007\-09\-16
-
-.BR
-DC Date:
-.I 2007\-09\-16
-
-.BR
-.B Version Information
-
-.BR
-Sourcefile:
-.I sisu_search._sst
-
-.BR
-Filetype:
-.I SiSU text insert 0.58
-
-.BR
-Sourcefile Digest, MD5(sisu_search._sst)=
-.I 52c1d6d3c3082e6b236c65debc733a05
-
-.BR
-Skin_Digest:
-MD5(/home/ralph/grotto/theatre/dbld/sisu\-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
-.I 20fc43cf3eb6590bc3399a1aef65c5a9
-
-.BR
-.B Generated
-
-.BR
-Document (metaverse) last generated:
-.I Sun Sep 23 04:13:48 +0100 2007
-
-.BR
-Generated by:
-.I SiSU
-.I 0.59.0
-of 2007w38/0 (2007\-09\-23)
-
-.BR
-Ruby version:
-.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
-
-.TP
-.BI 1.
-<http://www.postgresql.org/>
- <http://advocacy.postgresql.org/>
- <http://en.wikipedia.org/wiki/Postgresql>
-.TP
-.BI 2.
-<http://www.hwaci.com/sw/sqlite/>
- <http://en.wikipedia.org/wiki/Sqlite>
-.TP
-.BI 3.
-<http://search.sisudoc.org>
-.TP
-.BI 4.
-(which could be extended further with current back-end). As regards scaling
-of the database, it is as scalable as the database (here Postgresql) and
-hardware allow.
-.TP
-.BI 5.
-of this feature when demonstrated to an IBM software innovations evaluator in
-2004 he said to paraphrase: this could be of interest to us. We have large
-document management systems, you can search hundreds of thousands of documents
-and we can tell you which documents meet your search criteria, but there is no
-way we can tell you without opening each document where within each your
-matches are found.
-
-.TP
-Other versions of this document:
-.TP
-manifest: <http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html>
-.TP
-html: <http://www.jus.uio.no/sisu/sisu_search/toc.html>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_search/portrait.pdf>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_search/landscape.pdf>
-." .TP
-." manpage: http://www.jus.uio.no/sisu/sisu_search/sisu_search.1
-.TP
-at: <http://www.jus.uio.no/sisu>
-.TP
-.TP
-* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
-.TP
-* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
-.TP
-* Last Generated on: Sun Sep 23 04:13:52 +0100 2007
-.TP
-* SiSU http://www.jus.uio.no/sisu
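The deleted man page explains that SiSU loads each document into the database object by object, with each object keyed by its object citation number. A search can therefore report which documents match and where the matches fall within each one. The following Python sketch uses only the standard library sqlite3 module to illustrate that idea under stated assumptions. It is not SiSU's implementation. The tables metadata and objects, their columns, and the helpers build_db and locate are hypothetical. Only two of the four tables described in the man page are modelled. Matching is a plain LIKE substring test rather than a full-text index.

```python
import sqlite3

# Hypothetical schema for illustration only: table and column names are
# assumptions, not SiSU's actual schema. Only two of the four tables described
# in the man page are modelled: document metadata and text objects.
SCHEMA = """
CREATE TABLE metadata (
    tid   INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE objects (
    metadata_tid INTEGER NOT NULL REFERENCES metadata (tid),
    ocn          INTEGER NOT NULL,  -- object citation number
    clean        TEXT NOT NULL,     -- object text without markup, for searching
    PRIMARY KEY (metadata_tid, ocn)
);
"""


def build_db(docs):
    """Load {title: [object_text, ...]} into an in-memory SQLite database,
    numbering the objects of each document from 1 (a stand-in for OCNs)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for tid, (title, objects) in enumerate(docs.items(), start=1):
        conn.execute("INSERT INTO metadata (tid, title) VALUES (?, ?)", (tid, title))
        conn.executemany(
            "INSERT INTO objects (metadata_tid, ocn, clean) VALUES (?, ?, ?)",
            [(tid, ocn, text) for ocn, text in enumerate(objects, start=1)],
        )
    conn.commit()
    return conn


def locate(conn, term):
    """Return {title: [ocn, ...]}: which documents match, and at which objects.

    Matching is a substring test via LIKE (ASCII case-insensitive in SQLite
    by default); '%' and '_' inside `term` act as wildcards."""
    rows = conn.execute(
        """
        SELECT m.title, o.ocn
        FROM objects AS o
        JOIN metadata AS m ON m.tid = o.metadata_tid
        WHERE o.clean LIKE ?
        ORDER BY m.title, o.ocn
        """,
        ("%" + term + "%",),
    )
    hits = {}
    for title, ocn in rows:
        hits.setdefault(title, []).append(ocn)
    return hits


if __name__ == "__main__":
    conn = build_db({
        "Search": ["Populating SQL databases", "PostgreSQL commands", "SQLite commands"],
        "Markup": ["Headers", "SQLite notes"],
    })
    print(locate(conn, "sqlite"))  # {'Markup': [2], 'Search': [3]}
```

The object numbers are stored alongside the text. The same (title, ocn) pairs could therefore be used to link into any other output that shares the numbering, such as html. This sketch does not show that mapping.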