Class: RDoc::Markup
In: markup.rb, markup/fragments.rb, markup/inline.rb, markup/lines.rb, markup/to_flow.rb, doc-tmp/rdoc/markup.rb
Parent: Object
RDoc::Markup parses plain text documents and attempts to decompose them into their constituent parts. Some of these parts are high-level: paragraphs, chunks of verbatim text, list entries and the like. Other parts happen at the character level: a piece of bold text, a word in code font. This markup is similar in spirit to that used on WikiWiki webs, where folks create web pages using a simple set of formatting rules.
RDoc::Markup itself does no output formatting: this is left to a different set of classes.
RDoc::Markup is extendable at runtime: you can add new markup elements to be recognised in the documents that RDoc::Markup parses.
RDoc::Markup is intended to be the basis for a family of tools which share the common requirement that simple, plain text should be rendered in a variety of different output formats and media. It is envisaged that RDoc::Markup could be the basis for formatting RDoc style comment blocks, Wiki entries, and online FAQs.
    * this is a list with three paragraphs in
      the first item. This is the first paragraph.

      And this is the second paragraph.

      1. This is an indented, numbered list.
      2. This is the second item in that list

      This is the third conventional paragraph in the
      first list item.

    * This is the second item in the original list
    [cat]  a small furry mammal
           that seems to sleep a lot

    [ant]  a little insect that is known
           to enjoy picnics
A minor variation on labeled lists uses two colons to separate the label from the list body:
    cat::  a small furry mammal
           that seems to sleep a lot

    ant::  a little insect that is known
           to enjoy picnics
This latter style guarantees that the list bodies’ left margins are aligned: think of them as a two column table.
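Both labeled-list spellings can be recognised with a single pattern. The following is a minimal sketch only: the `LABELED_ENTRY` pattern and the `parse_labeled_entry` helper are hypothetical names made up for illustration, and they are not the `LABEL_LIST_RE` that the parser itself uses.

```ruby
# Hypothetical pattern (not RDoc's LABEL_LIST_RE): matches "[label] body"
# or "label:: body", with an optional body after the label.
LABELED_ENTRY =
  /\A\s*(?:\[(?<bracket>[^\]]+)\]|(?<colon>\S.*?)::)(?:\s+(?<body>.*))?\z/

# Returns [label, body] for a labeled-list line, or nil for any other line.
def parse_labeled_entry(line)
  m = LABELED_ENTRY.match(line)
  return nil unless m
  [m[:bracket] || m[:colon], m[:body].to_s]
end

p parse_labeled_entry("[cat] a small furry mammal")
p parse_labeled_entry("ant:: a little insect")
p parse_labeled_entry("just a paragraph")
```

A label with nothing after it yields an empty body, which is the case where the parser looks at the following line for the text.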
Word-based markup uses flag characters around individual words:
General markup affects text between a start delimiter and an end delimiter. Not surprisingly, these delimiters look like HTML markup.
Unlike conventional Wiki markup, general markup can cross line boundaries. You can turn off the interpretation of markup by preceding the first character with a backslash, so \<b>bold text</b> and \*bold* produce <b>bold text</b> and *bold* respectively.
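The effect of the backslash can be illustrated for the word-based `*bold*` form. This is a rough sketch under simplifying assumptions: `bold_words` is a hypothetical helper, it handles only single words between asterisks, and it is not how the attribute manager is implemented.

```ruby
# Hypothetical helper: turns *word* into <b>word</b>, unless the opening
# asterisk is preceded by a backslash. In that case the backslash is
# dropped and the text is otherwise left untouched.
def bold_words(text)
  text.gsub(/(\\)?\*(\w+)\*/) do
    escaped = $1
    word = $2
    escaped ? "*#{word}*" : "<b>#{word}</b>"
  end
end

puts bold_words('this is *bold* and this is \*bold*')
```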
Hyperlinks can also be of the form label[url], in which case the label is used in the displayed text, and url is used as the target. If label contains multiple words, put it in braces: {multi word label}[url].
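The two hyperlink spellings can be picked apart with one regular expression. The sketch below is illustrative only: the `LINK` pattern and `extract_links` helper are hypothetical, the URLs are placeholders, and the pattern is deliberately simpler than whatever the parser uses internally.

```ruby
# Hypothetical pattern: either {multi word label}[url] or label[url].
LINK = /(?:\{(?<braced>[^}]+)\}|(?<word>\b\w+))\[(?<url>[^\]\s]+)\]/

# Returns an array of [label, url] pairs found in the text.
def extract_links(text)
  text.scan(LINK).map { |braced, word, url| [braced || word, url] }
end

sample = "See {the project page}[http://example.com/project] " \
         "or Docs[http://example.com/docs]."
extract_links(sample).each do |label, url|
  puts "#{label} -> #{url}"
end
```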
This code converts input_string to HTML. The conversion takes place in the convert method, so you can use the same RDoc::Markup object to convert multiple input strings.
    require 'rdoc/markup'
    require 'rdoc/markup/to_html'

    p = RDoc::Markup.new
    h = RDoc::Markup::ToHtml.new

    puts p.convert(input_string, h)
You can extend the RDoc::Markup parser to recognise new markup sequences, and to add special processing for text that matches a regular expression. Here we make WikiWords significant to the parser, and also make the sequences {word} and <no>text...</no> signify strike-through text. We then subclass the HTML output class to deal with these:
    require 'rdoc/markup'
    require 'rdoc/markup/to_html'

    class WikiHtml < RDoc::Markup::ToHtml
      def handle_special_WIKIWORD(special)
        "<font color=red>" + special.text + "</font>"
      end
    end

    m = RDoc::Markup.new
    m.add_word_pair("{", "}", :STRIKE)
    m.add_html("no", :STRIKE)

    m.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)

    h = WikiHtml.new
    h.add_tag(:STRIKE, "<strike>", "</strike>")

    puts "<body>" + m.convert(ARGF.read, h) + "</body>"
Take a block of text and use various heuristics to determine its structure (paragraphs, lists, and so on). Invoke an event handler as we identify significant chunks.
    # File doc-tmp/rdoc/markup.rb, line 194
    def initialize
      @am = RDoc::Markup::AttributeManager.new
      @output = nil
    end
Add to the sequences recognized as general markup.
    # File doc-tmp/rdoc/markup.rb, line 211
    def add_html(tag, name)
      @am.add_html(tag, name)
    end
Add to other inline sequences. For example, we could add WikiWords using something like:
parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
Each wiki word will be presented to the output formatter via the accept_special method.
    # File markup.rb, line 224
    def add_special(pattern, name)
      @am.add_special(pattern, name)
    end
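To see which words the WikiWord pattern from the example picks out, it can be tried on its own with plain `String#scan`, without involving the parser. The sample sentence is made up for illustration.

```ruby
# The pattern from the add_special example above: a capital letter, one or
# more lowercase letters, another capital, then at least one word character.
WIKIWORD = /\b([A-Z][a-z]+[A-Z]\w+)/

text = "See WikiWord and FooBar, but not Foo or foo."

# scan returns one array of captures per match, so flatten the result.
words = text.scan(WIKIWORD).flatten
p words  # => ["WikiWord", "FooBar"]
```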
Add to the sequences used to add formatting to an individual word (such as bold). Matching entries will generate attributes that the output formatters can recognize by their name.
    # File doc-tmp/rdoc/markup.rb, line 204
    def add_word_pair(start, stop, name)
      @am.add_word_pair(start, stop, name)
    end
Look through the text at line indentation. We flag each line as being Blank, a paragraph, a list element, or verbatim text.
    # File markup.rb, line 254
    def assign_types_to_lines(margin = 0, level = 0)
      now_blocking = false
      while line = @lines.next
        if line.blank? then
          line.stamp :BLANK, level
          next
        end

        # if a line contains non-blanks before the margin, then it must belong
        # to an outer level

        text = line.text

        for i in 0...margin
          if text[i] != SPACE
            @lines.unget
            return
          end
        end

        active_line = text[margin..-1]

        #
        # block_exceptions checking
        #
        if @block_exceptions
          if now_blocking
            line.stamp(:PARAGRAPH, level)
            @block_exceptions.each{ |be|
              if now_blocking == be['name']
                be['replaces'].each{ |rep|
                  line.text.gsub!(rep['from'], rep['to'])
                }
              end
              if now_blocking == be['name'] && line.text =~ be['end']
                now_blocking = false
                break
              end
            }
            next
          else
            @block_exceptions.each{ |be|
              if line.text =~ be['start']
                now_blocking = be['name']
                line.stamp(:PARAGRAPH, level)
                break
              end
            }
            next if now_blocking
          end
        end

        # Rules (horizontal lines) look like
        #
        #  ---   (three or more hyphens)
        #
        # The more hyphens, the thicker the rule
        #

        if /^(---+)\s*$/ =~ active_line
          line.stamp :RULE, level, $1.length-2
          next
        end

        # Then look for list entries. First the ones that have to have
        # text following them (* xxx, - xxx, and dd. xxx)

        if SIMPLE_LIST_RE =~ active_line
          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          flag = case prefix
                 when "*","-" then :BULLET
                 when /^\d/ then :NUMBER
                 when /^[A-Z]/ then :UPPERALPHA
                 when /^[a-z]/ then :LOWERALPHA
                 else raise "Invalid List Type: #{self.inspect}"
                 end

          line.stamp :LIST, level+1, prefix, flag
          text[margin, prefix_length] = " " * prefix_length
          assign_types_to_lines(offset, level + 1)
          next
        end

        if LABEL_LIST_RE =~ active_line
          offset = margin + $1.length
          prefix = $2
          prefix_length = prefix.length

          next if handled_labeled_list(line, level, margin, offset, prefix)
        end

        # Headings look like
        # = Main heading
        # == Second level
        # === Third
        #
        # Headings reset the level to 0

        if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
          prefix_length = $1.length
          prefix_length = 6 if prefix_length > 6
          line.stamp :HEADING, 0, prefix_length
          line.strip_leading(margin + prefix_length)
          next
        end

        # If the character's a space, then we have verbatim text,
        # otherwise

        if active_line[0] == SPACE
          line.strip_leading(margin) if margin > 0
          line.stamp :VERBATIM, level
        else
          line.stamp :PARAGRAPH, level
        end
      end
    end
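The order of the checks in the method above (blank, rule, list entry, heading, then verbatim or paragraph) can be seen more easily in a stripped-down form. The sketch below is a hypothetical `classify` helper written for illustration: it looks at a single line in isolation and ignores margins, nesting, labeled lists and block exceptions, and its list pattern is only an approximation of `SIMPLE_LIST_RE`.

```ruby
# Simplified, hypothetical version of the per-line heuristics.
# Returns [type] or [type, parameter] for one line of text.
def classify(line)
  case line
  when /\A\s*\z/
    [:BLANK]
  when /\A(---+)\s*\z/
    [:RULE, $1.length - 2]            # more hyphens, thicker rule
  when /\A(\*|-|\d+\.|[A-Za-z]\.)\s+\S/
    [:LIST]                           # bullet, number or letter prefix
  when /\A(=+)/
    [:HEADING, [$1.length, 6].min]    # heading depth is capped at 6
  when /\A /
    [:VERBATIM]                       # leading space means verbatim text
  else
    [:PARAGRAPH]
  end
end

["", "-----", "* item", "== Second level", "  x = 1", "plain text"].each do |l|
  puts "#{l.inspect} => #{classify(l).inspect}"
end
```

As in the real method, the rule check comes before the list check, so a line of hyphens is never mistaken for a bullet.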
For debugging, we allow access to our line contents as text.
    # File markup.rb, line 489
    def content
      @lines.as_text
    end
We take a string, split it into lines, work out the type of each line, and from there deduce groups of lines (for example all lines in a paragraph). We then invoke the output formatter using a Visitor to display the result.
    # File doc-tmp/rdoc/markup.rb, line 234
    def convert(str, op, block_exceptions=nil)
      lines = str.split(/\r?\n/).map { |line| Line.new line }
      @lines = Lines.new lines
      @block_exceptions = block_exceptions

      return "" if @lines.empty?
      @lines.normalize
      assign_types_to_lines
      group = group_lines
      # call the output formatter to handle the result
      #group.each { |line| p line }
      group.accept @am, op
    end
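The final step, handing the grouped fragments to the output formatter through a Visitor, can be pictured with a toy version. Everything in this sketch is hypothetical: the `Paragraph` and `Verbatim` structs, the `TinyHtml` formatter and the `render` helper are stand-ins invented for illustration, and the real fragment and formatter classes carry more state (including the attribute manager) and do more work, such as escaping.

```ruby
# Toy fragment types. Each one knows which formatter callback to invoke,
# which is the essence of the Visitor arrangement.
Paragraph = Struct.new(:text) do
  def accept(formatter)
    formatter.accept_paragraph(self)
  end
end

Verbatim = Struct.new(:text) do
  def accept(formatter)
    formatter.accept_verbatim(self)
  end
end

# Toy formatter: collects one HTML snippet per fragment (no escaping).
class TinyHtml
  def initialize
    @out = []
  end

  def accept_paragraph(fragment)
    @out << "<p>#{fragment.text}</p>"
  end

  def accept_verbatim(fragment)
    @out << "<pre>#{fragment.text}</pre>"
  end

  def result
    @out.join("\n")
  end
end

# Walk the fragments in order and let each one call back into the formatter.
def render(fragments, formatter)
  fragments.each { |fragment| fragment.accept(formatter) }
  formatter.result
end

html = render([Paragraph.new("Hello"), Verbatim.new("  x = 1")], TinyHtml.new)
puts html
```

Because the parser only ever talks to the formatter through these callbacks, a different output format needs a different formatter class and no change to the parsing code.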
For debugging, return the list of line types.
    # File doc-tmp/rdoc/markup.rb, line 497
    def get_line_types
      @lines.line_types
    end
Return a block consisting of fragments which are paragraphs, list entries or verbatim text. We merge consecutive lines of the same type and level together. We are also slightly tricky with lists: the lines following a list introduction look like paragraph lines at the next level, and we remap them into list entries instead.
    # File markup.rb, line 456
    def group_lines
      @lines.rewind

      in_list = false
      wanted_type = wanted_level = nil

      block = LineCollection.new
      group = nil

      while line = @lines.next
        if line.level == wanted_level and line.type == wanted_type
          group.add_text(line.text)
        else
          group = block.fragment_for(line)
          block.add(group)

          if line.type == :LIST
            wanted_type = :PARAGRAPH
          else
            wanted_type = line.type
          end

          wanted_level = line.type == :HEADING ? line.param : line.level
        end
      end

      block.normalize
      block
    end
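The core idea of merging consecutive lines that share a type and level can be shown in a few lines. This is a simplified sketch, not the method above: `TypedLine` and `merge_runs` are hypothetical names, and the special handling of list entries and headings is left out.

```ruby
# Hypothetical stand-in for a stamped line: its type, nesting level and text.
TypedLine = Struct.new(:type, :level, :text)

# Merge runs of consecutive lines that share a type and level into
# [type, level, text] groups. List and heading special cases are omitted.
def merge_runs(lines)
  lines
    .chunk_while { |a, b| a.type == b.type && a.level == b.level }
    .map { |run| [run.first.type, run.first.level, run.map(&:text).join("\n")] }
end

lines = [
  TypedLine.new(:PARAGRAPH, 0, "first line"),
  TypedLine.new(:PARAGRAPH, 0, "second line"),
  TypedLine.new(:VERBATIM,  0, "  x = 1"),
  TypedLine.new(:PARAGRAPH, 0, "closing line")
]

merge_runs(lines).each { |group| p group }
```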
Handle labeled list entries. We have a special case to deal with. Because the labels can be long, they force the remaining block of text over to the right:
    this is a long label that I wrote:: and here is the
                                        block of text with
                                        a silly margin
So we allow the special case. If the label is followed by nothing, and if the following line is indented, then we take the indent of that line as the new margin.
    this is a long label that I wrote::
        here is a more reasonably indented block which
        will be attached to the label.
    # File markup.rb, line 393
    def handled_labeled_list(line, level, margin, offset, prefix)
      prefix_length = prefix.length
      text = line.text
      flag = nil

      case prefix
      when /^\[/ then
        flag = :LABELED
        prefix = prefix[1, prefix.length-2]
      when /:$/ then
        flag = :NOTE
        prefix.chop!
      else
        raise "Invalid List Type: #{self.inspect}"
      end

      # body is on the next line
      if text.length <= offset then
        original_line = line
        line = @lines.next
        return false unless line
        text = line.text

        for i in 0..margin
          if text[i] != SPACE
            @lines.unget
            return false
          end
        end

        i = margin
        i += 1 while text[i] == SPACE

        if i >= text.length then
          @lines.unget
          return false
        else
          offset = i
          prefix_length = 0

          if text[offset..-1] =~ SIMPLE_LIST_RE then
            @lines.unget
            line = original_line
            line.text = ''
          else
            @lines.delete original_line
          end
        end
      end

      line.stamp :LIST, level+1, prefix, flag
      text[margin, prefix_length] = " " * prefix_length
      assign_types_to_lines(offset, level + 1)
      return true
    end
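The margin rule for the special case can be reduced to a small decision: if the body follows the label on the same line, the body's column is the margin; if the label stands alone and the next line is indented, that line's indent is used instead. The `body_offset` helper below is a hypothetical sketch of that decision for the `label::` form only, written for illustration and not taken from the method above.

```ruby
# Hypothetical helper: returns the column where the body of a "label::"
# entry starts, or nil if no body can be found.
def body_offset(label_line, next_line)
  if (m = /\A(.*?::\s+)\S/.match(label_line))
    m[1].length   # body follows the label on the same line
  elsif label_line =~ /::\s*\z/ && next_line && (m = /\A( +)\S/.match(next_line))
    m[1].length   # label stands alone: use the indent of the next line
  end
end

p body_offset("cat:: a small furry mammal", nil)
p body_offset("this is a long label that I wrote::", "    here is a block")
p body_offset("label::", "not indented")
```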