A Guide To Using rtftohtml and rtftoweb

rtftohtml, written by Chris Hector, is a conversion utility to generate HTML documents, which are used within the World Wide Web, from RTF source. rtftoweb extends rtftohtml by a few additional functions such as automatic splitting of the document, inserting a navigation panel into the HTML files and generating an index with active links. Together these tools provide quite a comfortable and yet powerful means to get your (Word or whatever) documents on the Web.
And the best thing is: Both tools may be freely copied.


This guide describes rtftohtml version 2.7.5, written by Chris Hector (cjh@cray.com), and rtftoweb version 1.6, written by Christian Bolik (zzhibol@rrzn-user.uni-hannover.de), that's me. My email address will probably change during October '95, but I hope these pages may remain in place.

Read about the new features and bug fixes of rtftoweb 1.6, dated 31.08.95, in the ChangeLog.

Some parts of this guide have been copied from Chris' rtftohtml User's Guide. The main purpose of this document is to have all information regarding the conversion from RTF to HTML via rtftohtml/rtftoweb in one place. It also is an example of what you can expect from rtftohtml/rtftoweb, since it has been converted from a Microsoft Word 6.0 document (exported as RTF) to HTML by using the extended rtftohtml.

Christian Bolik (zzhibol@rrzn-user.uni-hannover.de), 04.09.95

What is rtftohtml?

rtftohtml is a tool to turn your, say, Word documents into documents which may be read from within the World Wide Web. The format of these documents is called HyperText Markup Language (HTML). rtftohtml is able to automatically convert documents stored in RTF (Rich Text Format) to HTML. Most word processors in use on UNIX, Macintosh, PC or NeXT systems can export their documents in RTF format (hint: have a look at the "Save as..." dialog box of your favorite word processor).

The author of rtftohtml is Chris Hector. Have a look at his Web pages at Cray.

In processing text, rtftohtml chooses HTML markup based on three characteristics. These are

  1. The destination of the text. Example destinations are header, footer, footnote, picture.
  2. The paragraph style. Paragraph styles are user-definable entities, but some are pre-defined by the word processing package. For Microsoft Word (on the Macintosh) examples are "Normal" and "heading 1" (or "Überschrift 1" when using a German version).
  3. The text attributes. Examples of text styles are bold, courier, 12 point.

The filter has built-in rules for dealing with destinations. For paragraph and text styles, the rules for translation are contained in a file called html-trans. By modifying this file, you can train rtftohtml to perform the correct translations for your documents. The most common change that you will need to make is to add your own paragraph styles to html-trans.

rtftohtml should produce reasonable HTML output for most documents. Here is what you can expect:

What is rtftoweb?

rtftoweb is an extension of Chris Hector's excellent rtftohtml (see section What is rtftohtml), which converts RTF-Documents to HTML, which in turn is used within the World Wide Web Project. rtftohtml converts a linear RTF-Document to an equally linear HTML-Document, with little support for hypertext and without any structuring (well, since version 2.7 this isn't quite true anymore, but that table of contents doesn't turn your documents into hypertext).

This is where rtftoweb comes in.

rtftoweb converts a linear RTF-Document (that may contain cross references, index entries and footnotes) into a fully hypertexted set of HTML-Documents.

This document, that you are reading right now, is an example of what you can expect from using an rtftohtml which has been extended by rtftoweb, since it was created from an RTF-source by rtftohtml 2.7.5 with applied rtftoweb 1.6 patches, with the command

	rtftohtml -h1 -c -x guide.rtf

The rtftoweb-patch adds the following features to rtftohtml:

For information about where to get and how to install rtftoweb, see section Installing rtftohtml and rtftoweb.


New in version 1.6

When I released rtftoweb 1.5, I thought it would be the last version of rtftoweb, since I had agreed with Chris Hector, the author of the original rtftohtml, that rtftoweb should be integrated into rtftohtml. That was at the end of 1994, but no new version of rtftohtml has been released since then.

Chris told me that he currently has (and has had during the last months) too many other things to do, and that he will not be able to continue the development of rtftohtml within the next few months. Because of this, and because I keep getting bug reports and suggestions from rtftoweb users, I decided to release another (intermediate) version of rtftoweb.

Among the new features are:

This version also fixes some bugs of earlier versions, with the more important being:

For a detailed description of all changes have a look at the file CHANGES.

Supported platforms

rtftohtml is available for UNIX, Macintosh and PC systems. Binaries can be copied directly from Chris' ftp directory at Cray, but note that these binaries were not extended by rtftoweb.

rtftoweb is currently only available for UNIX, but you are more than welcome to try and port it to Macs or PCs. This should not be too hard, since rtftoweb does not use any UNIX-specific features, maybe apart from filenames longer than the ridiculous 11 characters (concerning a port to PCs).

For directions on getting and installing rtftohtml and rtftoweb on UNIX systems, read the next section: Installing rtftohtml and rtftoweb.

Installing rtftohtml and rtftoweb

This section describes the installation of rtftohtml and rtftoweb under UNIX. If you are running a different system, have a look at section Supported platforms.

Installing rtftohtml and extending it with rtftoweb is really simple. Just follow these steps:

  1. Get rtftohtml version 2.7.5.
  2. Unpack it with "gzip -cd rtftohtml-2.7.5.tar.gz | tar xvpf -".
    If your shell complains that it cannot find "gzip" you should get and install this GNU utility now.
  3. Get rtftoweb version 1.6.
  4. Move rtftoweb-1.6.tar.gz into the directory rtftohtml-2.7.5 which has been created by step 2, "cd" to it, and unpack it by "gzip -cd rtftoweb-1.6.tar.gz | tar xvpf -".
  5. Patch the original rtftohtml-sources by issuing the command "patch -l <patch.rtftoweb-1.6". If your shell complains about not finding "patch", you can get it here.
  6. Edit makefile.rtftoweb to specify the installation paths and compiler options:
  7. Type "make -f makefile.rtftoweb". (Note that you can still easily make an unpatched rtftohtml, see instructions in makefile.rtftoweb.)
  8. Type "make -f makefile.rtftoweb install" to install your new, patched and full featured rtftohtml.

That's it!

Command line options

rtftohtml is invoked by a command like this:

	rtftohtml [options] file

file is the name of the RTF-file to convert. By default the HTML-output will be written to a file with the same basename, but with the .rtf-extension replaced with .html.
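
The naming rule can be sketched in a few lines of Python. This is only an illustration, not rtftohtml's own code: the helper name default_output_name is made up here, and what happens to a file name without an .rtf extension is an assumption (the sketch simply appends .html).

```python
import os

def default_output_name(rtf_name):
    """Hypothetical helper: derive the default HTML file name from an RTF file name."""
    base, ext = os.path.splitext(rtf_name)
    if ext == ".rtf":
        return base + ".html"   # rule from the text: .rtf is replaced with .html
    return rtf_name + ".html"   # assumption: other names just get .html appended

print(default_output_name("guide.rtf"))
```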

The most common options are (see below for more detailed descriptions of these):

-hlevel
to split the document at headings of level level,
-c
to create a table of contents, and
-x
to create an index (of course, this requires index entries to be in the RTF document)

These should suffice in most cases. When invoking rtftohtml without any arguments, a list of all available options is printed. Following is a complete and detailed list of all available command line options, sorted alphabetically:

-c
Generate a table of contents on a separate page (the title page itself normally only contains the level-1-headings).
-G
Indicates that no graphics files should be written. The hypertext links to the graphics files will still be generated. This is a performance feature for when you are re-translating a document and the graphics have not changed.
-hlevel
This tells rtftohtml to split the generated HTML document at all headings of level level, thus creating a separate HTML file for the contents of every section at that level. If level is 0, only one HTML file will be created with a table of contents at the top of the HTML file. If level is omitted, rtftohtml will create a separate HTML file for every section.
For example, -h1 only splits the HTML-File when a level-1-heading is encountered. All lower level headings will be internally referenced from the top of the respective HTML-File.
This option is also required for the -c and -x options.
-i
Indicates that imbedded graphics should be linked into the main document using an IMG tag. The default is to use an HREF style link.
-N file
If the -h option has been given, this option allows you to tell rtftohtml that it should read the description for the navigation panels it produces from file file. Navigation panels are described in more detail in section Navigation panels.
-o filename
Indicates that the basic output file name should be filename. If any other files are created (such as for graphics), the basename of the other files will be filename without ".rtf" if it is present in the name.
-P extension
Use extension as the extension for any links to graphics files. The default for this is "gif".
-s
Use short filenames when splitting the HTML output. In this case the HTML-files are simply numbered. If this option is not present the file-names contain the first eight characters of the first heading contained in the respective files.
-t
Place external references to headlines near the top of the page. (Default is that they occur at the end of the page.)
-T title
Use title as the document title. This overrides any title supplied within the RTF-file (see also section Supplying a title).
(Note: Stock rtftohtml uses this switch to prevent the generation of a table of contents. Since this conflicts with rtftoweb's ToC-handling anyway, I have redefined this command line switch.)
-x
Generate an index. This works by gathering all the index entries of the RTF-File and inserting an invisible anchor at that place.
-X text
By default, rtftohtml inserts index anchors without any text, thus producing "empty anchors" which most Web browsers have no problems with. Unfortunately there is one prominent exception to this statement: NCSA Mosaic. Mosaic requires anchors to contain at least one non-blank character, otherwise they are not recognized. This option lets you use text as the text of such anchors when using Mosaic as your Web browser, e.g. "-X &#183;".

Using rtftohtml and rtftoweb

(Whenever speaking of rtftohtml in the following I mean rtftohtml extended by rtftoweb.)

To convert a document from RTF (Rich Text Format) to HTML, rtftohtml requires the contents of the RTF-file to be formatted with a certain set of paragraph styles. For example, headings at level 1 must be formatted with the paragraph style "heading 1" (which is the built-in default for headings anyway; German heading styles may be called "Überschrift xy", but they appear in the RTF file as "heading xy", too), lists must be formatted with a paragraph style such as "numbered list" etc. The reason for this is that rtftohtml needs to know which paragraph styles it should map to which HTML tags. This mapping between styles and tags can be customized by editing the file html-trans in rtftohtml's library directory (see section html-trans for more), to create a mapping from your own individual paragraph styles to HTML-tags. Although this is not as complicated as it might seem, I personally prefer to adjust my Word-documents to use only (or at least mostly) the paragraph styles recognized by rtftohtml by default. In this chapter I will stick to this strategy. See section "Adding paragraph styles" for a few words on how to customize rtftohtml to correctly interpret your own paragraph styles.

To make the creation and preparation of Word documents that are to be converted to HTML as easy as possible, I have included a style file for Microsoft Word 6.0, called rtftoweb.dot, in the rtftoweb tar file. Section "A .dot file for WinWord" describes the usage of this file in more detail.

Supplying a title

To determine the HTML-Title for the created HTML-Files (the text between the <title> and </title> tags), rtftohtml looks for the \title-token inside the \info-group of the RTF-File. Thus you should give your RTF-Documents a short, descriptive title in the respective dialog box of your word processor (should be called something like "File information").
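
To check what title a given RTF file carries, a small script can look for the \title token in the \info group. The following is a rough sketch, not how rtftohtml itself parses RTF: the function name rtf_title and the sample document are made up, and the regular expression ignores RTF escapes and nested groups inside the title.

```python
import re

# Matches the first {\title ...} group that follows the start of the {\info group.
TITLE_RE = re.compile(r"\{\\info\b.*?\{\\title\s+([^{}]*)\}", re.DOTALL)

def rtf_title(rtf_text):
    """Hypothetical helper: return the title stored in the info group, or None."""
    found = TITLE_RE.search(rtf_text)
    return found.group(1).strip() if found else None

# Made-up sample document, just large enough to contain an info group.
sample = r"{\rtf1\ansi{\info{\title My work of art}{\author A. Writer}}\pard Hello\par}"
print(rtf_title(sample))
```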

Another way to specify the document title is via the -T command line option. For example:

	rtftohtml -T "My work of art" art.rtf

Note that this title will also be automatically inserted by rtftohtml into the first created HTML-File as a level-1-heading. That's why you should usually delete the very first heading from your RTF-Document (or at least assign a different paragraph format to that line) and use it as the document title. The reason for this is to prevent rtftoweb from interpreting the headline of your RTF-Document as a level 1 heading, where it should split.

Character styles

rtftohtml automatically recognizes and converts bold, italic and underlined text. If a certain range of text is written using a monospaced font such as Courier, it also automatically creates monospaced HTML-output for that range. What fonts are considered to be monospaced can be configured in the file html-trans in section .TMatch ("monospace fonts -> tt"). By default the fonts "Courier", "Courier New" and "Palatino" are expected to be monospaced.

If you get warning messages such as "no output translation for ..." when running rtftohtml you can either replace that character with a less exotic one in your RTF-file or add a translation to the end of rtftohtml's library file html-map, such as "character translation".

The newline character (created by Shift-Return) will be automatically converted to the corresponding HTML-tag, as will the unbreakable space (created by Control-Shift-Space).

Headings

Headings must be formatted with a paragraph style like "heading 1", "heading 2" etc. (or "Überschrift 1" etc.) to be automatically recognized by rtftohtml. rtftohtml uses these styles to determine when it should split the HTML-file. The heading level at which splitting should take place can be configured by the command line switch -hlevel (see section Command line options). If a heading contains no text (i.e. it is empty) it will be ignored by rtftohtml.

If the -h switch was present when rtftohtml was invoked, a navigation panel will be inserted at the top and at the bottom of every generated HTML file. This navigation panel will contain the following elements:

rtftohtml will try to use the language of the RTF-file for labelling the navigation panel. Currently there is support for English, Spanish, French and German. However, if you would like a fancier-looking panel, with buttons etc., you can tell rtftohtml (by writing a simple configuration file) what HTML-code it should use for the individual panel elements. The creation of such configuration files is described in detail in section Navigation panels.

Lists

rtftohtml knows about the following lists (in parentheses is the name of the respective paragraph style it expects such lists to be formatted with):

numbered ("numbered list")
items start with a tab and end with a paragraph mark (numbers before the tab are ignored)
unnumbered ("bullet list")
items start with a tab and end with a paragraph mark (bullets etc. before the tab are ignored)
glossaries ("glossary")
term and definition are separated by a tab, glossary entries are separated by a paragraph mark

Nested lists can be created from an RTF document by using a different style for each level of indentation. The styles "bullet list 1" "numbered list 2" ... represent different levels of nesting, with "bullet list 1" being at nesting level 1. The only rule for use is that no levels of nesting are skipped. For example, a "numbered list 3" paragraph must not appear immediately after a "Normal" paragraph. It must follow a paragraph with a nesting level of 2 or higher.

An example sequence of paragraph styles to produce a nested list might look like this:

numbered list
	bullet list 1
		bullet list 2
		glossary 2
	bullet list 1
		numbered list 2

Tables

rtftohtml is able to automatically convert tables to HTML by generating a range of preformatted text to keep the cells in their place. For this reason only plain text is allowed in tables. Bold and italic text in tables should be possible in the next release of the rtftoweb patches. Tables produced/converted by rtftohtml look something like this:

Column 1, Row 1              Column 2, Row 1              Column 3, Row 1              
Column 1, Row 2              Column 2, Row 2              Column 3, Row 2              
Column 1, Row 3              Column 2, Row 3              Column 3, Row 3              

If I ever have a lot of time on my hands, I plan to add support for tables as defined by the upcoming HTML 3.0 specification. Of course this would require you to use an HTML 3.0 capable browser such as Arena or Netscape.

Images

Graphics are imbedded in RTF in either a binary format or an (ASCII) hex dump of that binary. I have never seen a binary format graphic - I don't think that the filter will process binary correctly. It does handle the hex format of graphics, by converting the hex back into binary and writing the binary to a file. The file extension is chosen by looking at the original type of the graphic. The following list shows the file types and their extensions:

Macintosh PICT
.pict - also, 256 bytes of nulls are prepended to the graphic. This is to conform to the PICT file format.
Windows Meta-files
.wmf
Windows Bit-map
.bmp
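
The hex-to-binary step described above is simple enough to sketch. This is an illustration under assumptions, not the filter's code: both function names are made up, the sample hex string is arbitrary, and the 256-byte figure is taken directly from the list above.

```python
def hex_dump_to_bytes(hex_text):
    """Turn an ASCII hex dump (possibly split over several lines) into binary data."""
    return bytes.fromhex("".join(hex_text.split()))

def pict_file_contents(picture):
    """Prepend the 256 null bytes mentioned above for PICT files."""
    return bytes(256) + picture

data = hex_dump_to_bytes("6865 6c6c\n6f")
print(data)                            # b'hello'
print(len(pict_file_contents(data)))   # 261
```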

In addition, the filter produces a link to the file containing the graphic. Now, since the above graphic formats are not very portable, the filter assumes that you will convert these files to something more useful, like GIF. So the format of the link is:

<a href="basenameN.ext">Click here for a Picture</a>

where

Since most Web browsers only support images in GIF-format, you will have to convert the generated PICT- and WMF-files to GIF. For PICT there is picttoppm/ppmtogif, but for WMF? I don't know of any WMF translators for Unix; for DOS there is wmf2bmp, whose output could then be converted to GIF via the pbmplus-tools. From what I understand, WMF is not a pixel- but a vector-graphic format, so maybe it would be easier to translate WMF to Postscript and then let Ghostscript do the job of converting to GIF. Any volunteers for writing a wmftops utility?

You can also change the link to an IMG form. If you specify the -i command line option, all links to graphics will be of the form:

<IMG src="basenameN.ext">

There is one other special case. If a graphic is encountered when the filter is in the process of generating a link, the IMG form of the link is used even without the -i command line option.

Cross references

All kinds of cross references can be created from within the RTF-file. The reference itself must be formatted with the attributes "double-underline/hidden" and must follow the standard HTML-conventions, such as "http://www.w3.org" or "file.txt" or "#mark1". The "hot" text, that is the text that will appear "clickable" in your Web-browser, immediately follows the reference and must be double-underlined, but not hidden.

Anchors for internal cross references (such as "mark1", corresponding to the example above) must be formatted either with the attributes "hidden/outline" or "hidden/superscript". For example this link will bring you to the list of new features in rtftoweb 1.6.

If you just want to create a reference to a certain heading or section, it is sufficient to simply format the reference with the color red (when using rtftoweb.dot: mark the reference and press Control-Shift-r). The text of the reference must match the first n characters of the heading, so the references "Supplying" and "Supplying a title" point to the same section.
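
The prefix rule for heading references can be pictured like this. It is only a sketch: the function name is made up, and both the case-sensitive comparison and "first matching heading wins" are assumptions, since the text above does not say how ties or case are handled.

```python
def find_section(reference, headings):
    """Return the first heading that starts with `reference`, or None."""
    for heading in headings:
        if heading.startswith(reference):
            return heading
    return None

headings = ["What is rtftohtml?", "Supplying a title", "Character styles"]
print(find_section("Supplying", headings))
print(find_section("Supplying a title", headings))
```

Both calls print "Supplying a title", which is why a short but unambiguous prefix is enough for a reference.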

If an email address such as bolik@irb.uni-hannover.de is colored red, rtftohtml will automatically produce a cross reference of type "mailto". Not all Web browsers support this type of reference (Netscape does).

The same works for all other kinds of URLs, so if the URL ftp://ftp.rrzn.uni-hannover.de/pub/ is colored red, rtftohtml will automatically produce a reference pointing to that URL.

Index entries and footnotes

If your RTF document contains footnotes or endnotes, the filter will place the text of the footnote in a separate HTML document. At the footnote reference mark, the filter will generate a hypertext link to the text of the footnote. This works with either automatically numbered footnotes[1], or user supplied footnote reference marks[+].

If you insert index entries into your RTF-document and give rtftohtml the -x-option, rtftohtml will generate a hypertext'ish index for the generated HTML-documents. Note that when using NCSA-Mosaic as your Web browser you should also tell rtftohtml to insert some text into the generated anchors by using the command line switch -X text (see section Command line options).

Other features

Horizontal lines

The paragraph style "hr" can be used to produce a horizontal line in the HTML output (this will be translated to the <hr> tag).

Discarding Unwanted Text

If you have text that you do not want to appear in the HTML output, simply format the text as Hidden and Plain (that is, no underline, outline...).

If you wish to modify the formatting that discards text, you need to change the entry in html-trans that specifies "_Discard".

Imbedding HTML in a Document

Normally, if your RTF document contained the text "<cite>hello</cite>", the translator would output this as: "&lt;cite&gt;hello&lt;/cite&gt;". This ensures that the text would appear in your HTML output exactly as it appeared in the original RTF document. If, however, you want the <cite></cite> to be interpreted as HTML markup, you must format the tags using Hidden and Shadow or Hidden and Strikethrough. The filter will then send the tags through without translation. It is also possible to use the paragraph style "HTML" to let rtftohtml interpret a whole paragraph as being literal HTML.
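
The difference between translated and literal text can be shown with Python's standard html module. This is a sketch of the idea only: render_runs and the (text, is_literal) pairs are made up for the example, where is_literal stands for text formatted as Hidden and Shadow (or Hidden and Strikethrough).

```python
import html

def render_runs(runs):
    """Join text runs, escaping every run that is not marked as literal HTML.

    `runs` is a list of (text, is_literal) pairs.
    """
    return "".join(text if is_literal else html.escape(text, quote=False)
                   for text, is_literal in runs)

# Plain text: the tags are translated and show up verbatim in the browser.
print(render_runs([("<cite>hello</cite>", False)]))
# Tags marked as literal: they pass through and act as markup.
print(render_runs([("<cite>", True), ("hello", False), ("</cite>", True)]))
```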

When the rtftohtml filter produces HTML markup, it keeps track of the nesting level of tags to ensure that you don't get something like <b><cite>hello</b></cite> which would be incorrect markup. If you imbed HTML markup in your document, the filter will NOT be aware of it. You must ensure that your markup appears correctly nested.

If you wish to modify the formatting for imbedded HTML, you need to change the entry in html-trans that specifies "_Literal".

Other paragraph styles

rtftohtml understands a few other paragraph styles by default. These are (among others):

address
Will be converted to HTML's <address>-environment.
blockquote
Will be converted to HTML's <blockquote>-environment.
pre
Will be converted to HTML's <pre>-environment. This is useful when spacing is important in a paragraph.

A .dot file for WinWord

While using rtftohtml myself I have created a style file for Microsoft Word 6.0 called rtftoweb.dot (I also have a less sophisticated dot-file for Word 2.0 lying around somewhere, mail me (zzhibol@rrzn-user.uni-hannover.de) if you are interested). By using this file as the standard document type for your documents it gets really easy to create RTF-documents which can be translated by rtftohtml without any problems. You make your documents use rtftoweb.dot by following the same procedure as usual when assigning dot-files. In German Word 6.0 this is (sorry, currently there are no English instructions available):

rtftoweb.dot adds all the paragraph styles which are understood by rtftohtml (without modifying html-trans) to your document. Additionally some keyboard shortcuts are now defined (or possibly redefined...):

Ctrl-Shift-1 ... Ctrl-Shift-6
Selects paragraph style "heading 1" ... "heading 6".
Ctrl-Shift-p
Selects paragraph style "pre".
Ctrl-Shift-b
Selects paragraph style "bullet list".
Ctrl-Shift-n
Selects paragraph style "numbered list".
Ctrl-Shift-g
Selects paragraph style "glossary".
Ctrl-Shift-h
Formats the selected text for plain HTML.
Ctrl-Shift-r
Formats the selected text with the color red (for Cross references).
Ctrl-Shift-i
Formats the selected text to be the destination of a Cross reference.
Ctrl-Shift-u
Formats the selected text to be the "hot text" of a Cross reference.
Ctrl-Shift-a
Formats the selected text to be the anchor of a Cross reference.
Ctrl-Shift-c
Formats the selected text to use font "Courier New".
Ctrl-Shift-t
Formats the selected text to use font "Times New Roman".

Customizations

Adding paragraph styles

When converting existing documents with rtftohtml you often get a lot of warning messages telling you that some paragraph styles are unknown. Now you can either

To add a new paragraph style, simply go to the .PMatch table contained in the file html-trans and add an entry to the end. Put the name of the paragraph style (quoted), the nesting level (usually zero) and the name of the .PTag entry that should be used.
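
If you have many styles to add, the entries can be generated rather than typed. A small sketch, with made-up names throughout: pmatch_entry is not part of rtftohtml, "My Quote" is an invented style, and "blockquote" is only assumed to be the name of an existing .PTag entry in your html-trans.

```python
def pmatch_entry(style_name, ptag_name, nesting_level=0):
    """Format one .PMatch record: quoted style name, nesting level, quoted .PTag name."""
    return '"%s",%d,"%s"' % (style_name, nesting_level, ptag_name)

# "My Quote" is a made-up style; "blockquote" is assumed to be an existing .PTag name.
print(pmatch_entry("My Quote", "blockquote"))
```

The printed line is what you would append to the end of the .PMatch table.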

html-trans File Format

The file html-trans is needed by rtftohtml to map character and paragraph styles contained in the RTF-file to corresponding HTML-tags. It must be readable either from rtftohtml's library directory (as set in the file makefile.rtftoweb) or from the directory contained in the environment variable RTFLIBDIR.

In html-trans there are four tables. They are labelled .PTag, .TTag, .TMatch and .PMatch. Each table begins with its name (in column one) and continues until the next table starts. All blank lines and lines beginning with a '#' are discarded. '#' lines are typically used for comments. The tables themselves are composed of records containing a fixed number of fields which are separated by commas. The fields are either strings (which should be quoted), integers or bitmasks.
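
To get a feeling for this layout, here is a rough reader for it in Python. It is a sketch based only on the description above, not rtftohtml's parser: parse_tables is a made-up name, fields are kept as plain strings, and using the csv module for the quoted fields is my own shortcut (it may not handle every quoting case the real file allows).

```python
import csv

def parse_tables(text):
    """Split html-trans style text into {table name: list of records}."""
    tables = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # blank lines and comments are discarded
        if line.startswith("."):
            current = line.strip()        # a table name in column one starts a table
            tables[current] = []
        elif current is not None:
            tables[current].append(next(csv.reader([line])))
    return tables

sample = '''# comment line
.PMatch
"heading 1",0,"h1"

"numbered list 2",2,"ol-d"
.TTag
"b","<b>","</b>"
'''
print(parse_tables(sample))
```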

.PTag Table

Each entry in the .PTag table describes an HTML paragraph markup. The format is:

.PTag

#"name","starttag","endtag","col2mark","tabmark","parmark",allowtext,cannest,DeleteCol1,fold,TocStyl

name
A unique name for this entry. These names are referenced in the .PMatch table.
starttag
This string will be output once at the beginning of any text for this markup.
endtag
This string will be output once at the end of any text for this markup.
col2mark
This string will be output in place of the first tab in every paragraph (used for lists)
tabmark
This string will be output in place of every other tab in a paragraph.
parmark
This string will be output in place of each paragraph mark. (usually <br> or <p>)
allowtext
If 0, no text markup will be allowed within this markup (for example, <pre> or <h1> don't format well if they contain additional markup).
cannest
If 1, other paragraph markup will be allowed to nest within this markup. (used for nesting lists)
DeleteCol1
If 1, all text up to the first tab in a paragraph will be deleted (used to strip out bullets when going to unordered lists (<ul>)).
fold
If 1, the filter will add newlines to the HTML to keep the number of characters in a line to less than 80. For <pre> or <listing> elements, this should be set to 0.
TocStyl
The TOC level. If greater than 0, the filter will create a Table of contents entry for every paragraph using this markup.

Sample .PTag Entries

"h1","<h1>\n","</h1>\n","\t","\t","<br>\n",0,0,0,1
This is a level 1 heading. The "\n" in the start and end-tag fields forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>) its only effect is to make the HTML output more readable. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<br>" followed by a newline (just for looks). Text markup (like <b>) is not allowed within <h1> text, because we leave that up to the HTML client. No nesting is allowed - (see the discussion on nested styles). No text is deleted. Every paragraph using this markup will also generate a level-1 table of contents entry.

"Normal","","\n","\t","\t","<p>\n",1,0,0,0
This is the default for normal text. Regular text in HTML has no required start and end-tags. The "\n" in the end-tag field forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>) its only effect is to make the HTML output more readable. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<p>" followed by a newline (just for looks). Text markup (like <b>) is allowed within Normal text. No nesting is allowed - (see the discussion on nested styles). No text is deleted.

"ul","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,0,0
This is the entry for unordered lists. This generates a "<ul>\n<li>" at the start of the list and "</ul>\n" at the end. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<li>" preceded by a newline (just for looks). Text markup (like <b>) is allowed, and this entry may be nested - and it allows others to be nested within it. This allows nested lists. No text is deleted.

"ul-d","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,1,0
This entry is identical to the previous except that the DeleteCol1 field is set to 1. This is used to remove bullets (which really appear in the RTF) because we don't want to see them in the HTML.

.TTag Table

Each entry in the .TTag table describes an HTML text markup. The format is:

.TTag

"name","starttag","endtag"

name
A unique name for this entry. These names are referenced in the .TMatch table.
starttag
This string will be output once at the beginning of any text for this markup.
endtag
This string will be output once at the end of any text for this markup.
Note that unlike the .PTag table, no text markup should appear more than once. (Of course there is no good reason that it should appear.) If you have two entries with <b></b> start and end tags, it would be possible to get HTML of the form <b><b> text</b></b>. I don't know if this is invalid markup, but it sure is ugly.

.TMatch Table

Each entry in the .TMatch table describes processing for text styles. The format is:

.TMatch

"Font",FontSize,Match,Mask,"TextStyleName"
Font
The name of a Font, or "" if all fonts match this entry.
FontSize
The point-size of the font, or 0 if all point sizes match this entry.
Match
A bit-mask, where each bit represents a text attribute. These bits are compared to the attributes of the style being output. They must match for this entry to be matched. A one in a bit position means that the text attribute is set, a zero means that it is not set.
Mask
A bit-mask, where each bit represents a text attribute. In comparing the style of the text being processed to the Match bit-mask, this field is used to select the bits that matter. If a zero appears in a bit-position, then that style attribute is ignored (for the purpose of matching this entry). Only 1 bits are used in the above comparison.
TextStyleName
This is either the name of an entry in the .TTag table indicating the HTML markup to use, or it is one of "_Discard", "_Name", "_HRef", "_Hot", or "_Literal".
The order of bits in the Match and Mask bit-maps is:
#    v^bDWUHACSOTIB - Bold
#    v^bDWUHACSOTI - Italic
#    v^bDWUHACSOT - StrikeThrough
#    v^bDWUHACSO - Outline
#    v^bDWUHACS - Shadow
#    v^bDWUHAC - SmallCaps
#    v^bDWUHA - AllCaps
#    v^bDWUH - Hidden
#    v^bDWU - Underline
#    v^bDW - Word Underline
#    v^bD - Dotted Underline
#    v^b - Double Underline
#    v^ - SuperScript
#    v - SubScript

Sample .TMatch Entries

# double-underline/not hidden -> hot text
# double-underline/hidden -> href
#    v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00100000000000,00100010000000,"_Hot"
"",0,00100010000000,00100010000000,"_HRef"
The first entry will match any text formatted with double underline EXCEPT if it is hidden text. This is accomplished by using those two bits to compare (the MASK field) and having a 1 in the double underline bit and a zero for the hidden text bit. The second entry will match any text formatted with BOTH double underline and hidden text. Any text that matches the first will be treated as the hot text of a link. Any text that matches the second will be taken as the href itself. (The filter requires that the HRef text immediately precede the Hot text.)

# Regular matches - You can have multiple of these active
# monospace fonts -> tt
"Courier",0,00000000000000,00000000000000,"tt"
This will match any text that uses the Courier font and mark it using the HTML text markup appearing in the .TTag table with the entry name "tt".

# bold -> bold
#    v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00000000000001,00000000000001,"b"
This will match any text that has the bold attribute and will mark it using the HTML text markup appearing in the .TTag table with the entry name "b". Note that bold text using the Courier font would match both this entry and the previous one. This will yield markup of the form <b><tt>hi</tt></b>. Note that "b" is the name of an entry in the .TTag table, not the HTML markup that is used!
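
The Match/Mask comparison described above can be sketched in a few lines of Python. This is only an illustration of the rule, not rtftohtml's actual code: the function names are made up, and treating an empty font name as "any font" and comparing font names exactly are assumptions read off the sample entries.

```python
# Illustrative sketch of the Match/Mask rule; names are hypothetical.
BIT_ORDER = "v^bDWUHACSOTIB"  # leftmost = SubScript, rightmost = Bold


def bits(field):
    """Convert a 14-character Match or Mask field into an integer."""
    if len(field) != len(BIT_ORDER) or set(field) - {"0", "1"}:
        raise ValueError("expected 14 binary digits: %r" % field)
    return int(field, 2)


def attrs(letters):
    """Build a text-attribute bit-mask from letters such as "bH" or "B"."""
    value = 0
    for letter in letters:
        value |= 1 << (len(BIT_ORDER) - 1 - BIT_ORDER.index(letter))
    return value


def entry_matches(entry, font, size, style):
    """Return True if a (font, size, match, mask, name) entry applies."""
    e_font, e_size, match, mask, _name = entry
    if e_font and e_font != font:  # assumption: "" matches any font
        return False
    if e_size and e_size != size:  # 0 matches any point size
        return False
    # Only the bits selected by the mask take part in the comparison.
    return (style & bits(mask)) == (bits(match) & bits(mask))


# The four sample entries from this section.
SAMPLES = [
    ("", 0, "00100000000000", "00100010000000", "_Hot"),
    ("", 0, "00100010000000", "00100010000000", "_HRef"),
    ("Courier", 0, "00000000000000", "00000000000000", "tt"),
    ("", 0, "00000000000001", "00000000000001", "b"),
]


def matching_names(font, size, style):
    """List the TextStyleName of every sample entry that matches."""
    return [e[4] for e in SAMPLES if entry_matches(e, font, size, style)]
```

With these definitions, matching_names("Courier", 10, attrs("B")) returns both "tt" and "b", which corresponds to the bold Courier case mentioned above, while double-underlined text selects "_Hot" or "_HRef" depending on the hidden bit.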

.PMatch Table

Each entry in the .PMatch correlates a paragraph style name to some entry in the .PTag table. The format is:

.PMatch

"Paragraph Style",nesting_level,"PTagName"

Paragraph Style
The paragraph style name that appears in the RTF input.
nesting_level
The nesting level. This should be zero except for nested list entries.
PTagName
The name of the .PTag entry that should be used for paragraphs with this paragraph style.

Sample .PMatch Entries

"heading 1",0,"h1"
This is a level 1 heading. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "h1".

"numbered list",0,"ol-d"
This is used for numbered lists. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "ol-d".

"numbered list 2",2,"ol-d"
This is an entry for a nested paragraph style. The nesting level of two is used to indicate that this paragraph should appear in the HTML nested within two levels of paragraph markups. The paragraph marked with this style may only appear after a paragraph style that has a nesting level of 1 or greater.
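
To make the entry format concrete, here is a small Python sketch that reads .PMatch lines into a lookup table keyed by paragraph style name. It is not part of rtftohtml; the function name is hypothetical, and skipping lines that start with # is an assumption carried over from the comment lines shown in the .TMatch samples.

```python
import csv


def parse_pmatch(lines):
    """Map each paragraph style name to (nesting_level, PTagName)."""
    # Assumption: blank lines and lines starting with '#' are not entries.
    entries = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    table = {}
    for style, level, ptag in csv.reader(entries, skipinitialspace=True):
        table[style] = (int(level), ptag)
    return table


# The three sample entries from this section.
SAMPLE = [
    '"heading 1",0,"h1"',
    '"numbered list",0,"ol-d"',
    '"numbered list 2",2,"ol-d"',
]

PMATCH = parse_pmatch(SAMPLE)
```

Looking up PMATCH["numbered list 2"] then gives the nesting level 2 together with the .PTag entry name "ol-d".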

Navigation panels and Netscape support

If you want the navigation panels produced by rtftohtml (see section Headings) to look more spiffy, e.g. with images as panel buttons, or if you want the generated HTML documents to use images as their background or another text color, this section is for you.

By using the -N command line option when invoking rtftohtml, you can tell rtftohtml exactly how you want the created navigation panels to look. The same configuration file can be used to add a few funny Netscapisms to the generated documents. If no -N option is given, but rtftohtml finds a file named nav-panel in its library directory or in the directory named by the environment variable RTFLIBDIR, it will use this file as the layout customization file. This way you can avoid having to add the -N command line option whenever you use rtftohtml.

An example of such a customization file is the file nav-panel, which was also used when this guide was converted to HTML. By looking at this file you should easily see how the layout of your documents can be adjusted to your taste.

Each line of such a customization file contains the definition of a layout element, as long as the first character is not the hash-character (#), which introduces comments. Everything that follows the first colon (:) in each line will be literally inserted into the HTML-files when needed.

The following elements may be configured:

previous
What to insert into the navigation panel when the "previous" element is to be created.
next
The same for the "next" element.
up
The same for the "up" element.
title
The same for the "title" element.
contents
The same for the "contents" element.
index
The same for the "index" element.
delimiter
What to use as the delimiter between the elements of navigation panels.
hr
What HTML-code to use when it's time to insert a horizontal line beneath or above navigation panels.
bgimage
Specifies an optional background (GIF) image that should be used as the document background (requires Netscape).
bgcolor
Specifies an optional background color that should be used in the document background (requires Netscape). Syntax: #rrggbb (hexadecimal values for red, green, blue).
textcolor
The color to use for normal text. Same syntax as for bgcolor.
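
The line format of the customization file (comment lines starting with #, everything after the first colon taken literally) can be illustrated with a short Python sketch. The parser below is not rtftoweb's code, and the sample values, including the image file names, are invented for illustration only; have a look at the real nav-panel file for working settings.

```python
def parse_nav_panel(text):
    """Return a dict mapping element names to their literal HTML text."""
    elements = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # comment or empty line
            continue
        # Split at the FIRST colon only; later colons belong to the value.
        name, sep, value = line.partition(":")
        if sep:
            elements[name] = value
    return elements


# Hypothetical layout file; none of these values come from the real nav-panel.
SAMPLE = """\
# made-up example settings
previous:<img src="prev.gif" alt="Back: previous page">
next:<img src="next.gif" alt="Next">
delimiter:|
hr:<hr>
bgcolor:#ffffff
textcolor:#000000
"""

ELEMENTS = parse_nav_panel(SAMPLE)
```

Note that a # only starts a comment when it is the first character of a line, so a color value such as #ffffff after the colon is kept as is.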

Plans

I currently do not know what Chris Hector's plans concerning rtftohtml are (he hasn't answered yet), but I plan to do some of the following as soon as time allows:

If you can help me with any of the items in this list, please contact me (zzhibol@rrzn-user.uni-hannover.de). And of course, if you happen to stumble over anything that might look like a bug, or if you have any ideas for future releases, contact me, too!