MarkupToUnstyled

Summary: Converts PmWiki markup into unstyled text
Version: 2009-03-01
Prerequisites: PmWiki2 (developed and tested with 2.2.0-stable)
Status: working
Maintainer: Tontyna
Categories: Markup PHP72

Questions answered by this recipe

How can I extract the plain, unstyled text from a string containing PmWiki markup, the way MarkupToHTML renders it but without any HTML tags?
Links should be converted to the usual link text PmWiki produces.

Description

While developing SlimTableOfContents and extending SectionEdit, I needed a recipe that returns the properly unformatted text of headings.

The result is the function MarkupToUnstyled($pagename, $markuptext), implemented in this cookbook.

SlimTableOfContents uses the text-only result as the link text in the TOC, and SectionEdit uses it as the HTML title of the edit link.

Activation

The cookbooks SlimTableOfContents and SectionEdit (since version 2.2.1-2009-02-26) include this script automatically.
If you are NOT using either of those cookbooks:

  • activate the script as usual by adding the following line to your local/config.php:
  include_once("$FarmD/cookbook/markuptounstyled.php");
  • Customize the $MarkupToUnstyledIgnorePattern array to match the recipes and markup your wiki uses; see Customization.

Usage

Whenever you need unstyled plain text, call the function MarkupToUnstyled():

  $unstyledtext = MarkupToUnstyled($pagename, $markuptext);

$unstyledtext then contains no markup, no links, no formatting, and no HTML <tags>.

How it works

MarkupToUnstyled()

  1. redirects all link functions to suppress the generation of <a href></a> tags and to produce only the regular PmWiki link text
    e.g. [[PageWithTitle|+]] becomes 'TitleOfPageWithTitle'
    e.g. [[PageNotYetCreated|+]] becomes 'PageNotYetCreated'
  2. removes from the input text the markup patterns that should not be evaluated in step 4, i.e. markup whose output is unwanted in the unstyled text (see Customization)
  3. removes HTML tags BEFORE evaluating the markup (e.g. [@..@] might already be wrapped in <code class='escaped'>)
  4. evaluates the markup by calling PmWiki's MarkupToHTML
  5. removes newlines from the result
  6. removes HTML tags from the result
  7. replaces leftover non-styling %...% sequences, which may reappear when $KeepTokens are restored in step 4
  8. restores LinkFunctions to their original functions
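The steps above can be sketched as a small standalone PHP script. This is not the recipe's code: $LinkFunctions here is a one-entry toy version of PmWiki's array of the same name, and fake_markup_to_html(), fake_link_page(), link_text_only() and sketch_markup_to_unstyled() are hypothetical stand-ins. The fake MarkupToHTML only understands '''bold''', ''italic'' and [[Page]] links, the ignore patterns are assumed to be wrapped in '/' delimiters, and step 7 is left out because it depends on PmWiki's $KeepTokens handling.

```php
<?php
// Toy sketch of the MarkupToUnstyled() pipeline; NOT the recipe's code.
// HYPOTHETICAL: $LinkFunctions is a one-entry toy, and fake_markup_to_html()
// stands in for PmWiki's MarkupToHTML() (only '''bold''', ''italic'', [[Page]]).
$LinkFunctions = array('<:page>' => 'fake_link_page');

function fake_link_page(string $target, string $txt): string {
    return "<a href=\"/$target\">$txt</a>";   // normal rendering: a real link
}

function link_text_only(string $target, string $txt): string {
    return $txt;                              // redirected rendering: text only
}

function fake_markup_to_html(string $pagename, string $text): string {
    global $LinkFunctions;
    $text = preg_replace("/'''(.*?)'''/", '<strong>$1</strong>', $text);
    $text = preg_replace("/''(.*?)''/", '<em>$1</em>', $text);
    // Links are rendered through whatever function $LinkFunctions currently holds.
    $text = preg_replace_callback('/\[\[([^\]|]+)\]\]/', function ($m) use ($LinkFunctions) {
        return $LinkFunctions['<:page>']($m[1], $m[1]);
    }, $text);
    return "<p>$text</p>\n";
}

function sketch_markup_to_unstyled(string $pagename, string $markup, array $ignore): string {
    global $LinkFunctions;
    $saved = $LinkFunctions;
    foreach (array_keys($LinkFunctions) as $k) {
        $LinkFunctions[$k] = 'link_text_only';             // step 1: redirect links
    }
    try {
        foreach ($ignore as $pat) {
            $markup = preg_replace("/$pat/", '', $markup); // step 2 ('/' delimiters assumed)
        }
        $markup = strip_tags($markup);                     // step 3: tags already in input
        $html = fake_markup_to_html($pagename, $markup);   // step 4: evaluate markup
        $html = str_replace(array("\r", "\n"), '', $html); // step 5: drop newlines
        return trim(strip_tags($html));                    // step 6 (step 7 omitted)
    } finally {
        $LinkFunctions = $saved;                           // step 8: restore links
    }
}

$ignore = array(
    "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]", // [[target |#]] reference links
    "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]"    // [[#anchor]]
);
echo sketch_markup_to_unstyled('Main.HomePage', "'''Intro''' to [[Main]][[#top]]", $ignore), "\n";
// prints: Intro to Main
```

Restoring the link functions in a finally block means the redirection cannot leak into later rendering even if something in between throws.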

Customization

The array $MarkupToUnstyledIgnorePattern holds regex patterns for markup that should be ignored in unstyled text.
These patterns are removed from the input before calling MarkupToHTML.

By default it holds the patterns for [[target |#]] reference links and [[#anchor]] anchors:

  SDV($MarkupToUnstyledIgnorePattern, array(
        "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]", // [[target |#]] reference links
        "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]" // [[#anchor]]
  ));
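To see what the two default patterns remove, here is a small standalone PHP check. remove_ignored_markup() is a hypothetical helper written for this example, and wrapping each entry in '/' delimiters without modifiers is an assumption about how the script applies the patterns.

```php
<?php
// The two default patterns, copied from the SDV() call above.
$MarkupToUnstyledIgnorePattern = array(
    "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]", // [[target |#]] reference links
    "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]"    // [[#anchor]]
);

// Hypothetical helper: removes every match of every pattern, in array order.
// ASSUMPTION: patterns are wrapped in '/' delimiters without modifiers.
function remove_ignored_markup(string $text, array $patterns): string {
    foreach ($patterns as $p) {
        $text = preg_replace("/$p/", '', $text);
    }
    return $text;
}

// The anchor and the reference link are dropped; the ordinary link
// [[Main]] is kept so that MarkupToHTML can turn it into link text later.
echo remove_ignored_markup("[[#intro]]Intro [[Main]][[Sources |#]]", $MarkupToUnstyledIgnorePattern), "\n";
// prints: Intro [[Main]]
```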

Depending on the cookbooks and markup your wiki uses, you should extend the $MarkupToUnstyledIgnorePattern array after including the script.

For example, if you have the Footnotes cookbook installed, add the following to your config.php:

  $MarkupToUnstyledIgnorePattern[] = '\\[\\^(.*?)\\^\\]';

The SectionEdit cookbook already adds the following pattern:

  $MarkupToUnstyledIgnorePattern[] = '\\(:sectionedit.*:\\)';
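The reason for extending the array after the include is that PmWiki's SDV() assigns a default only when the variable is still unset. The sketch below illustrates this with sdv_standin(), a stand-in written for this example that mirrors that rule, and a hypothetical include_recipe_defaults() in place of the include_once line: appending after the include keeps the defaults, while appending before it leaves only your own pattern.

```php
<?php
// Stand-in for PmWiki's SDV(): assigns $value only if $var is still unset.
function sdv_standin(&$var, $value): void {
    if (!isset($var)) {
        $var = $value;
    }
}

// HYPOTHETICAL: mimics what including markuptounstyled.php does to the array.
function include_recipe_defaults(&$patterns): void {
    sdv_standin($patterns, array(
        "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]", // [[target |#]] reference links
        "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]"    // [[#anchor]]
    ));
}

// Recommended order: include first, then append the Footnotes pattern.
$after = null;
include_recipe_defaults($after);
$after[] = '\\[\\^(.*?)\\^\\]';

// Appending too early: the variable is already set, so no defaults are added.
$before = null;
$before[] = '\\[\\^(.*?)\\^\\]';
include_recipe_defaults($before);

var_dump(count($after));  // int(3): both defaults plus the Footnotes pattern
var_dump(count($before)); // int(1): only the Footnotes pattern
```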

Notes

The default $MarkupToUnstyledIgnorePattern array will be extended in future versions; I'm no PmWiki expert, and there may be many more built-in PmWiki markups that should be ignored.

The recipe is required by the cookbooks SlimTableOfContents and SectionEdit.

Release Notes


  • (2009-03-01) Added markup to $MarkupToUnstyledIgnorePattern
  • (2009-02-26) Initial version

See Also

Contributors

Comments

See discussion at MarkupToUnstyled-Talk
