FourOhFourCache

Note: The recipes here are for PmWiki versions 0.6 and 1.0 only. For PmWiki 2.0 recipes, see Cookbook.


Goal

Avoid running the PmWiki PHP script if at all possible, by writing HTML files out to a Web directory and serving them from there.

Solution

Attach:pmwiki-404cache-0.1.tar.gz

Attach:pmwiki-404cache-0.1.zip

404-handler caching is a way of creating dynamic Web sites with as little overhead as possible. It comes from two ideas:

  1. It is usually about an order of magnitude or two easier on a Web server to serve static files (like HTML files) than to run a script to generate a file.
  2. Most dynamic Web systems generate the same files over and over and over and over again, with maybe a 10-to-1, or 100-to-1 or 10,000-to-1 ratio of page reads to page changes. Wiki sites are a pretty common example of this.

The goal, then, is to generate Web pages once and write them out to a cache directory. You then let your Web server serve the static files at blinding speed.

To do this, we use a feature of the Apache Web server for handling missing pages. (Other Web servers probably have similar features; this software doesn't support them.) Apache lets you assign a script to handle the error that occurs when a requested file is missing from a particular directory. This is called a ''404 handler'', since it handles the HTTP error for missing files (with code 404).

With 404-handler caching, we set up a directory (the cache directory) where our generated pages will go. Initially, this will be empty. When a request comes in for some page, the file won't be there yet, so the 404 handler will be called. The handler will look in its database for the page that's requested, and convert it to HTML with all the fancy trimmings our dynamic pages need. It then writes the HTML to standard output (and thus to the requesting client), but also to a file in the cache directory. The next time the same page is requested, the file in the cache directory will be served directly, without the handler being called. This is where our big savings come in.
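The request cycle described above can be sketched in Python. This is not the extension's PHP code, just a minimal illustration under some assumptions: a hypothetical in-memory page store (`pages`), a hypothetical `render_page()` standing in for PmWiki's formatting, and a cache layout of one `Group/Title.html` file per page. `serve()` plays the part of Apache: it returns the static file if it exists and only calls the handler otherwise.

```python
import html
import os
import re
import tempfile

# Hypothetical naming rule: only "Group/Title" names made of letters and digits,
# so a crafted request can never write outside the cache directory.
PAGE_RE = re.compile(r"^[A-Za-z0-9]+/[A-Za-z0-9]+$")


def cache_path(cache_dir: str, page: str) -> str:
    if not PAGE_RE.match(page):
        raise ValueError(f"refusing suspicious page name: {page!r}")
    return os.path.join(cache_dir, page + ".html")


def render_page(page: str, source: str) -> str:
    """Hypothetical stand-in for PmWiki's markup-to-HTML conversion."""
    return (f"<html><head><title>{html.escape(page)}</title></head>"
            f"<body>{html.escape(source)}</body></html>")


def handle_404(cache_dir: str, page: str, pages: dict[str, str]) -> str:
    """Render a missing page, save a copy in the cache, and return the HTML."""
    path = cache_path(cache_dir, page)
    body = render_page(page, pages[page])  # raises KeyError if the page doesn't exist at all
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp, path)  # rename within one directory: Apache never sees a partial file
    return body  # this is what the handler prints to standard output


def serve(cache_dir: str, page: str, pages: dict[str, str]) -> tuple[str, str]:
    """Simulate Apache: a cached file is served as-is; otherwise the 404 handler runs."""
    path = cache_path(cache_dir, page)
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            return "static", f.read()
    return "handler", handle_404(cache_dir, page, pages)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pages = {"Main/HomePage": "Welcome to the wiki."}
        print(serve(d, "Main/HomePage", pages)[0])  # handler (first request)
        print(serve(d, "Main/HomePage", pages)[0])  # static (served from the cache)
```

Writing to a temporary file and renaming it is a choice made for this sketch, not something the extension is known to do; it keeps a concurrent request from ever reading a half-written page. The page-name check is likewise an added assumption, in the spirit of the warnings below about writing to the wrong place.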

When a page is changed (in the case of PmWiki, when a user edits and saves the page), the cache file is deleted. The next request for the page should generate the new version. We also remove the cached files for other pages that depend on the contents of the saved page.
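Invalidation on save can be sketched the same way. The dependency map (`dependents`, saying which pages depend on which) is a hypothetical input; this page doesn't say how the extension actually tracks dependencies. The sketch also follows dependencies transitively, so a page that depends on a page that depends on the saved page is cleared too; that is an assumption about what "depend on" should mean here.

```python
import os
import re
import tempfile
from collections import deque

# Same hypothetical naming rule as before: plain "Group/Title" only.
PAGE_RE = re.compile(r"^[A-Za-z0-9]+/[A-Za-z0-9]+$")


def cache_path(cache_dir: str, page: str) -> str:
    if not PAGE_RE.match(page):
        raise ValueError(f"refusing suspicious page name: {page!r}")
    return os.path.join(cache_dir, page + ".html")


def pages_to_invalidate(saved: str, dependents: dict[str, set[str]]) -> set[str]:
    """The saved page plus every page that depends on it, directly or indirectly."""
    seen = {saved}
    queue = deque([saved])
    while queue:
        for dep in dependents.get(queue.popleft(), set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def invalidate(cache_dir: str, saved: str, dependents: dict[str, set[str]]) -> set[str]:
    """Delete the cached HTML for every affected page; return the pages actually removed."""
    removed = set()
    for page in pages_to_invalidate(saved, dependents):
        try:
            os.remove(cache_path(cache_dir, page))
            removed.add(page)
        except FileNotFoundError:
            pass  # not cached yet; the next request will generate it anyway
    return removed


if __name__ == "__main__":
    deps = {"Main/SideBar": {"Main/HomePage"}}
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "Main"))
        for name in ("SideBar", "HomePage", "About"):
            open(os.path.join(d, "Main", name + ".html"), "w").close()
        print(sorted(invalidate(d, "Main/SideBar", deps)))
        # ['Main/HomePage', 'Main/SideBar']; Main/About.html is left alone
```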

Note that 404-handler caching is done by a lot of tools, such as HTML::Mason. It's neither a new technique nor a particularly brilliant one. It just happens to be one that works nicely and isn't well-implemented for wikis.

Discussion

THIS IS EXPERIMENTAL SOFTWARE; TREAT IT WITH FEAR AND DISDAIN. This software writes files to your file system, and deletes files on the file system. There's a non-zero chance that your configuration will trigger a bug that will delete or overwrite the wrong files. So, make backups before installing, and be careful.

These installation instructions aren't particularly easy; I'd like to make the install as easy as PmWiki's is.

Installing files

You should be able to just unpack this tarball (or zipfile) and then copy all its contents recursively to your pmwiki directory:

 
    tar zxvf pmwiki-404cache-0.1.tar.gz
    cp -R pmwiki-404cache-0.1/* /path/to/pmwiki/

Configuring Apache

You'll also need to set up a directory to hold your cache pages. You'll need to create an .htaccess file in that directory, and it has to have the following commands in it:

 
    Options +MultiViews
    ErrorDocument 404 /path/to/pmwiki/pmwiki.php

Here, the path to pmwiki.php is URL-relative, not filesystem-relative. You can also add these commands to a <Directory> stanza in your httpd.conf file; read your Apache docs to see how. (I've had more luck with <Directory> than with .htaccess, by the way.)

The cache directory has to be a directory writable by the Web server user. DON'T use a directory you have other things in; this package will delete all files in that directory on a regular basis!
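Because a wrong cache directory is the dangerous case, it may be worth checking it before turning the extension on. The helper below is hypothetical, not part of the package. It checks that the path is a directory the current user can write to, and flags any file that doesn't look like cached HTML as a sign the directory is shared with other content. Run it as the Web server user, since `os.access()` answers for whoever runs it.

```python
import os
import sys
import tempfile


def check_cache_dir(cache_dir: str) -> list[str]:
    """Return a list of problems with cache_dir; an empty list means it looks safe."""
    if not os.path.isdir(cache_dir):
        return [f"{cache_dir} does not exist or is not a directory"]
    problems = []
    # Creating and deleting files needs both write and search (execute) permission.
    if not os.access(cache_dir, os.W_OK | os.X_OK):
        problems.append(f"{cache_dir} is not writable by this user")
    # Heuristic (an assumption): anything but .html files and the .htaccess
    # suggests the directory holds other content that could get deleted.
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if name != ".htaccess" and not name.endswith(".html"):
                problems.append(f"unexpected file: {os.path.join(root, name)}")
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(check_cache_dir(sys.argv[1]) or "looks OK")
    else:
        # Demo on an empty scratch directory.
        with tempfile.TemporaryDirectory() as d:
            print(check_cache_dir(d) or "looks OK")  # looks OK
```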

Configuring PmWiki

To activate this extension, add the following line to your PmWiki local/config.php file:

 
    include_once('cookbook/404cache.php');

Configuration variables

These are the config variables that affect how the 404-cache works.

$CacheDir
The filesystem directory for the root of the cache. Cache files will be put here. Defaults to 'pub/pages'.
$CachePath
The URL path for the root of the cache. This is where URLs for pages will point. Defaults to '/path/to/pub/pages' (it figures out the path to pub from $PubDirUrl). If you change one of these, you should change the other, too.
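The relationship between the two settings can be sketched as follows. The one-file-per-page layout (`Group/Title.html` under `$CacheDir`, linked as `Group/Title` under `$CachePath`) is an assumption about how the extension names its files; the `Options +MultiViews` line in the Apache setup is what would let an extension-less URL find the `.html` file. The `$CachePath` default is derived from the path part of `$PubDirUrl`, as described above; `example.com` is a placeholder.

```python
import posixpath
from urllib.parse import urlsplit


def default_cache_path(pub_dir_url: str) -> str:
    """Mirror the documented default: the URL path of $PubDirUrl plus 'pages'."""
    return posixpath.join(urlsplit(pub_dir_url).path, "pages")


def page_locations(cache_dir: str, cache_path: str, group: str, title: str) -> tuple[str, str]:
    """Return (file the handler writes, URL that links point to) for one page.

    Assumed layout: one .html file per page. The URL leaves off '.html' and
    relies on MultiViews to find the file.
    """
    file_path = posixpath.join(cache_dir, group, title + ".html")
    url = posixpath.join(cache_path, group, title)
    return file_path, url


if __name__ == "__main__":
    cache_path = default_cache_path("http://example.com/pmwiki/pub")
    print(cache_path)  # /pmwiki/pub/pages
    print(page_locations("pub/pages", cache_path, "Main", "HomePage"))
    # ('pub/pages/Main/HomePage.html', '/pmwiki/pub/pages/Main/HomePage')
```

If the two settings drift apart, the handler presumably writes files to a place Apache never serves at the linked URL, so every request falls through to PmWiki and the cache silently stops helping.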

Template changes

Because of the way this extension works, you can't have links in your template that look like this:

 
     $PageUrl?action=foo

The $PageUrl is going to point to a static HTML page now, so Apache will simply serve that file and the action will never be handled. You need to change links like this to:

 
     $ScriptUrl?title=$Group/$Title&action=foo

There's a template installed by default that does this. You can use it by adding the following line to your local/config.php file:

 
     $PageTemplateFmt = "pub/404cache/404cache.tmpl";

(There's actually a super-tricky way to get around this with mod_rewrite foolishness, but it's best not to mess with that. This is hard enough!)
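If you'd rather fix an existing template than switch to the bundled one, the substitution shown above is mechanical enough to script. This is a hypothetical helper, not part of the package. It only rewrites links of the exact form `$PageUrl?action=NAME`, and reports any other `$PageUrl?` links it leaves behind so you can fix them by hand.

```python
import re

# $PageUrl?action=NAME, where NAME is letters, digits or underscores.
ACTION_LINK = re.compile(r"\$PageUrl\?action=(\w+)")
# Any remaining query string on $PageUrl will also break once pages are static.
LEFTOVER = re.compile(r"\$PageUrl\?")


def rewrite_template(text: str) -> str:
    """Send action links to the script instead of the static page."""
    return ACTION_LINK.sub(r"$ScriptUrl?title=$Group/$Title&action=\1", text)


def leftover_lines(text: str) -> list[int]:
    """1-based line numbers that still put a query string on $PageUrl."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if LEFTOVER.search(line)]


if __name__ == "__main__":
    template = '<a href="$PageUrl?action=edit">Edit</a>\n<a href="$PageUrl?foo=bar">Other</a>'
    fixed = rewrite_template(template)
    print(fixed)
    print(leftover_lines(fixed))  # [2]
```

Plain `$PageUrl` links with no query string are left alone; those are exactly the links that should now point at the cached HTML.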

See Also

  • SimplePageCache

History

  • 30 June 2004: version 0.1.

Comments & Bugs

A common "bug" is that, after installing this extension, the edit, history, and other actions don't work. This is almost certainly because you didn't change your PmWiki page template as described above.

Contributors

Copyright

Copyright (C) 2004 Evan Prodromou <evan at pigdog.org>.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

This software is available under the GNU General Public License (GPL) version 2 or later.
