Path: markup.rb
Last Update: Sun Mar 09 06:49:04 JST 2008
RDoc::Markup parses plain text documents and attempts to decompose them into their constituent parts. Some of these parts are high-level: paragraphs, chunks of verbatim text, list entries and the like. Other parts happen at the character level: a piece of bold text, a word in code font. This markup is similar in spirit to that used on WikiWiki webs, where folks create web pages using a simple set of formatting rules.
RDoc::Markup itself does no output formatting: this is left to a different set of classes.
RDoc::Markup is extendable at runtime: you can add new markup elements to be recognised in the documents that RDoc::Markup parses.
RDoc::Markup is intended to be the basis for a family of tools which share the common requirement that simple plain text should be rendered in a variety of different output formats and media. It is envisaged that RDoc::Markup could be the basis for formatting RDoc-style comment blocks, Wiki entries, and online FAQs.
  *   this is a list with three paragraphs in
      the first item.  This is the first paragraph.

      And this is the second paragraph.

      1.  This is an indented, numbered list.
      2.  This is the second item in that list

      This is the third conventional paragraph in the
      first list item.

  *   This is the second item in the original list
  [cat]  a small furry mammal
         that seems to sleep a lot

  [ant]  a little insect that is known
         to enjoy picnics
A minor variation on labeled lists uses two colons to separate the label from the list body:
  cat::  a small furry mammal
         that seems to sleep a lot

  ant::  a little insect that is known
         to enjoy picnics
This latter style guarantees that the list bodies’ left margins are aligned: think of them as a two column table.
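To make the "two column table" picture concrete, here is a minimal sketch that reads either labeled-list style into label/body pairs. It is a toy, not RDoc's own parser: the method name `parse_labeled_list` and its regular expression are hypothetical, and it assumes continuation lines never contain `::`.

```ruby
# Toy reader for the two labeled-list styles described above.
# NOT RDoc's parser; parse_labeled_list is a hypothetical helper.
def parse_labeled_list(text)
  # Either "[label] body" or "label:: body" at the start of a line.
  item = /\A\s*(?:\[(?<bracket>[^\]]+)\]|(?<colon>\S.*?)::)\s*(?<body>.*)\z/
  items = []
  text.each_line do |line|
    line = line.chomp
    next if line.strip.empty?
    if (m = item.match(line))
      items << [m[:bracket] || m[:colon], m[:body].strip]
    elsif items.any?
      # A line without a label continues the previous item's body.
      items.last[1] = "#{items.last[1]} #{line.strip}".strip
    end
  end
  items
end

source = <<~LIST
  cat::  a small furry mammal
         that seems to sleep a lot

  ant::  a little insect that is known
         to enjoy picnics
LIST

parse_labeled_list(source).each do |label, body|
  puts "#{label.ljust(6)}| #{body}"
end
```

Each pair is one table row: the label is the first column and the joined body text is the second.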
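Word-based markup uses flag characters around individual words:
  *word*  displays word in a bold font
  _word_  displays word in an emphasized font
  +word+  displays word in a code font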
General markup affects text between a start delimiter and an end delimiter. Not surprisingly, these delimiters look like HTML markup.
Unlike conventional Wiki markup, general markup can cross line boundaries. You can turn off the interpretation of markup by preceding the first character with a backslash, so \<b>bold text</b> and \*bold* produce <b>bold text</b> and *bold* respectively.
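The backslash rule can be illustrated with a small sketch. This is a toy substitution written for this example, not the way RDoc::Markup handles attributes internally; `toy_bold` is a hypothetical helper that only understands the `*word*` form.

```ruby
# Toy illustration of backslash escaping for *word* markup.
# NOT RDoc's implementation; toy_bold is a hypothetical helper.
def toy_bold(text)
  text.gsub(/(\\)?\*(\w+)\*/) do
    escaped = $1
    word = $2
    # An escaped sequence is emitted literally, minus the backslash.
    escaped ? "*#{word}*" : "<b>#{word}</b>"
  end
end

puts toy_bold('\*bold* stays literal, *bold* is converted')
```

The optional leading group captures the backslash, so one pattern covers both the escaped and the unescaped case.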
Hyperlinks can also be of the form label[url], in which case the label is used in the displayed text, and url is used as the target. If label contains multiple words, put it in braces: {multi word label}[url].
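As a rough sketch of the two labeled forms, the following toy method rewrites `label[url]` and `{multi word label}[url]` into anchor tags. It is not RDoc's link handling: `toy_links` is a hypothetical helper, the pattern is a simplification, and the example.com URLs are placeholders.

```ruby
# Toy converter for label[url] and {multi word label}[url].
# NOT RDoc's implementation; toy_links is a hypothetical helper.
def toy_links(text)
  # Group 1: braced multi-word label, group 2: single-word label,
  # group 3: the url between the square brackets.
  pattern = /(?:\{([^}]+)\}|(\w+))\[(\S+?)\]/
  text.gsub(pattern) do
    label = $1 || $2
    url = $3
    %(<a href="#{url}">#{label}</a>)
  end
end

puts toy_links('See RDoc[http://example.com/rdoc] or {the home page}[http://example.com/].')
```

The braces are needed only because a bare label ends at the first space; inside braces the label may contain any character except a closing brace.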
This code converts input_string to HTML. The conversion takes place in the convert method, so you can use the same RDoc::Markup object to convert multiple input strings.
  require 'rdoc/markup'
  require 'rdoc/markup/to_html'

  p = RDoc::Markup.new
  h = RDoc::Markup::ToHtml.new

  puts p.convert(input_string, h)
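On more recent RDoc releases the HTML formatter's constructor expects an options object, so the snippet above may raise an ArgumentError as written. The following is a minimal sketch assuming such a release, with `RDoc::Options.new` supplying default options; the sample input strings are made up for illustration. It also reuses the same two objects for several inputs, as described above.

```ruby
require 'rdoc'

# Assumption: a recent RDoc where ToHtml.new takes an RDoc::Options.
markup    = RDoc::Markup.new
formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new)

# Hypothetical sample inputs; any plain-text strings will do.
inputs = [
  "A paragraph with *bold* text.",
  "cat:: a small furry mammal"
]

# One parser and one formatter serve every input string.
pages = inputs.map { |input| markup.convert(input, formatter) }
puts pages
```

The exact tags emitted depend on the RDoc version installed, so the output is not reproduced here.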
You can extend the RDoc::Markup parser to recognise new markup sequences, and to add special processing for text that matches a regular expression. Here we make WikiWords significant to the parser, and also make the sequences {word} and <no>text...</no> signify strike-through text. We then subclass the HTML output class to deal with these:
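  require 'rdoc/markup'
  require 'rdoc/markup/to_html'

  # HTML formatter that also knows how to render WikiWords.
  class WikiHtml < RDoc::Markup::ToHtml
    def handle_special_WIKIWORD(special)
      "<font color=red>" + special.text + "</font>"
    end
  end

  m = RDoc::Markup.new

  # {word} and <no>text...</no> both mark strike-through text.
  m.add_word_pair("{", "}", :STRIKE)
  m.add_html("no", :STRIKE)

  # Text matching this pattern is handed to handle_special_WIKIWORD.
  m.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)

  h = WikiHtml.new
  # Tell the formatter which tags to emit for :STRIKE.
  h.add_tag(:STRIKE, "<strike>", "</strike>")

  puts "<body>" + m.convert(ARGF.read, h) + "</body>"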
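To see which words the WikiWord pattern above picks out, it can be tried on its own, without RDoc. The sample words and the `highlight_wiki_words` helper below are hypothetical and exist only for this sketch; the helper imitates the handler's output with a plain substitution.

```ruby
# The same pattern passed to add_special above.
wiki_word = /\b([A-Z][a-z]+[A-Z]\w+)/

# Hypothetical sample words, chosen to show matches and near misses.
samples = %w[WikiWord RDoc Markup HTMLOutput camelCase FrontPage]
matches = samples.select { |word| word.match?(wiki_word) }
puts matches.inspect

# Stand-alone imitation of the handler: wrap each match in a font tag.
def highlight_wiki_words(text)
  text.gsub(/\b([A-Z][a-z]+[A-Z]\w+)/) { "<font color=red>#{$1}</font>" }
end

puts highlight_wiki_words("See FrontPage for RDoc news")
```

The pattern requires a capital, at least one lowercase letter, and then another capital, which is why RDoc and HTMLOutput are not treated as WikiWords.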