SaveThePage

Summary: Save an entire web page (the HTML) to a wiki page by using a bookmarklet to send the current page to your wiki for processing and editing.
Version: 2012-09-27
Prerequisites: v2.2+, HTML::WikiConverter (CPAN)
Status: abandoned
Maintainer:
License: GPL2+
Categories: Uncategorized
Discussion: SaveThePage-Talk?

Questions answered by this recipe

How can I save an entire web page to my wiki as a new article?

Description

Save an entire web page (the HTML) to a wiki page by using a bookmarklet to send the current page to your wiki for processing and editing.

Installation

Download savethepage.zip to a safe place and unzip it in your wiki root directory. Then add the following to local/config.php:

   include_once("$FarmD/cookbook/savethepage.php");

Configuration

The following variables can be set before the include statement above to configure the recipe:

  • $STP_PagePrefix: whatever you'd like to insert before the wiki text. Default is nothing.
  • $STP_PageSuffix: whatever you'd like to insert afterwards. Default is nothing.
  • $STP_NewPageNamePrefix: Used for the page name if the page you are saving has no <title>
  • $STP_PageFmt: the template for the page. Variables that will be inserted are:
    • $summary: the page's meta description contents, if any
    • $tags: the page's meta keyword contents, if any
    • $stp_url: the url of the source page
    • $title: the contents of the <title> tag, if any
    • $time: time the page was grabbed
    • $text: converted text of the page's HTML into PmWiki markup
The default value for $STP_PageFmt is:
"
(:linebreaks:)
Summary:\$summary
Tags:\$tags
Source:\$stp_url
Title: \$title
Saved:\$time
(:nolinebreaks:)
(:nolinkwikiwords:)
\$text
(:linkwikiwords:)
"

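As a rough illustration of the settings above, a local/config.php fragment might look like the following. Only the variable names and the include line come from this recipe; every value shown is a hypothetical example. Note that the default format escapes each dollar sign inside a double-quoted string, so this sketch uses single quotes to get the same literal placeholders.

```php
<?php
# Fragment of local/config.php. The variable names come from the list above;
# every value here is a hypothetical example, not a recommended setting.

$STP_PagePrefix = "(:notitle:)\n";      # inserted before the converted wiki text
$STP_PageSuffix = "\n----\n";           # inserted after it (---- is a horizontal rule)
$STP_NewPageNamePrefix = 'SavedPage';   # used for the page name when there is no <title>

# Single quotes keep $title, $text, etc. as literal placeholders, so PHP does
# not try to interpolate them when config.php is loaded.
$STP_PageFmt = '
(:linebreaks:)
Title: $title
Source: $stp_url
Saved: $time
(:nolinebreaks:)
(:nolinkwikiwords:)
$text
(:linkwikiwords:)
';

# The settings must be in place before the recipe is loaded.
include_once("$FarmD/cookbook/savethepage.php");
```

Any variable left unset keeps its default, so a real configuration would typically set only the ones it needs to change.
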
Usage

This recipe relies on a bookmarklet that you drag onto your bookmarks toolbar (or copy and save elsewhere in your bookmark collection). To create it, put the following markup on a page in the group where you want pages to be saved:

   (:savethepage:)

Once you have dragged or otherwise saved the bookmarklet, you can delete the markup.

Notes

This recipe will not work for web pages or sites that require authorization to retrieve the page. It will also not work with pages where the content you want to capture is generated by JavaScript.

HTML::WikiConverter

This is a Perl module, available on CPAN, that performs the conversion from HTML to PmWiki markup. I tried using the Cookbook:ConvertHTML recipe, but it was not complete enough and left much of the original HTML intact.

A patch is required to make the PmWiki conversion work correctly in some cases (see Bug 79778):

In HTML::WikiConverter::PmWiki, make the following change:

152c152
< return $node->attr('src') || '';
---
> return ' ' . ($node->attr('src') || '') . ' ' ; 

(There may be other problems with the converter as well.)

Change log / Release notes


  • 2012-11-23.2b -- Initial Release
  • 2012-12-03 -- Fix a problem where an older PHP version (5.2.12) did not handle UTF-8 characters properly.

See also

Contributors

Comments

See discussion at SaveThePage-Talk?
