SimplePageCache

Note: The recipes here are for PmWiki versions 0.6 and 1.0 only. For PmWiki 2.0 recipes, see Cookbook.


Goal

Provide caching functionality for Wiki pages, so that the HTML output does not have to be rebuilt each time.

Solution

Download cache.php.

Installation

Put cache.php in your local/ directory and add

    include_once("local/cache.php");

to your local.php file. Temporarily change the permissions on the directory containing pmwiki.php to 2777, then use your browser to visit the homepage (or any other page) of the Wiki. This will create a wiki.cache directory with the right permissions. Change the permissions back afterwards.

Discussion

This simple script was inspired by the discussion on Development.WikiCaching. To avoid rebuilding the HTML of a page each time it is read, the output is captured and stored in a cache directory. On every request after the first, the cached version is served directly from disk instead.

To avoid serving stale versions of a page, the entire cache is invalidated each time a "post", "postattr" or "postupload" action is performed (or any other action that begins with "post", in order to allow WikiAdministrators to add post functionality of their own without changing the code). Also, the special action "resetcache" is provided to invalidate the cache without modifying files.

The cache is also invalidated if any file in the "local/" subdirectory is changed. This way, one can develop code without having to invalidate the cache manually each time.
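
The invalidation rules above can be sketched roughly as follows. This is only an illustration, not the code in cache.php: the function names and parameters are hypothetical, and the sketch assumes the cache is considered fresh when the cached file is newer than both a timestamp file (such as .touched) and every file in local/.

```php
<?php
// Hypothetical helpers illustrating the invalidation rules; not from cache.php.

// True for "post", "postattr", "postupload", or any other action that
// begins with "post"; the explicit "resetcache" action also counts.
function action_invalidates_cache($action)
{
    return strncmp($action, "post", 4) == 0 || $action == "resetcache";
}

// Newest modification time of any regular file directly inside $dir
// (0 if the directory does not exist or is empty).
function newest_mtime_in($dir)
{
    $newest = 0;
    if (!is_dir($dir)) {
        return $newest;
    }
    foreach (scandir($dir) as $name) {
        $path = "$dir/$name";
        if (is_file($path)) {
            $newest = max($newest, filemtime($path));
        }
    }
    return $newest;
}

// A cached page is usable only if it is newer than both the timestamp
// file and every file in the local/ directory.
function cache_is_fresh($cacheFile, $stampFile, $localDir)
{
    if (!file_exists($cacheFile)) {
        return false;
    }
    $lastChange = file_exists($stampFile) ? filemtime($stampFile) : 0;
    $lastChange = max($lastChange, newest_mtime_in($localDir));
    return filemtime($cacheFile) > $lastChange;
}
```

Invalidating the whole cache then only requires updating the modification time of the timestamp file; no cached page has to be deleted.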

You can use the browser directive [[nocache]] to disable caching for an individual page. You can also enable caching for just a certain group or page by including the script in the configuration file for that group or page.

Note that you probably won't need this script unless you are running on an old machine or one shared among many users, use complex markup, or have long pages. On a modern machine, PmWiki will generally serve a simple page in less than 0.1 seconds without any caching support.

Implementation Details

The caching mechanism creates a wrapper around the HandleBrowse() function that uses ob_start() and its companion functions to capture the results. Because HandlePost() is called from HandleEdit(), the post actions have to be enumerated explicitly, based on $action and $HTTP_POST_VARS.
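
As a rough illustration of the output-buffering technique, the sketch below wraps an arbitrary page-rendering function. It is not the wrapper in cache.php: serve_with_cache() and its parameters are hypothetical, it stores only the raw HTML, and it leaves out invalidation and authentication entirely.

```php
<?php
// Hypothetical helper: run $render (any callable that echoes HTML) and
// cache what it prints. None of these names come from cache.php.
function serve_with_cache($cacheFile, $render)
{
    if (file_exists($cacheFile)) {
        // Cache hit: send the stored HTML without rebuilding the page.
        readfile($cacheFile);
        return true;
    }
    ob_start();                    // start capturing everything echoed
    call_user_func($render);       // build the page as usual
    $html = ob_get_contents();     // copy the captured output
    ob_end_flush();                // pass it on to the browser, stop capturing
    $fp = fopen($cacheFile, "w");
    if ($fp) {
        fwrite($fp, $html);        // store the page for later requests
        fclose($fp);
    }
    return false;                  // cache miss: the page was rebuilt
}
```

Because the output is held in the buffer until ob_end_flush() runs, the browser receives nothing while a page is being rebuilt; only the completed page is sent.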

Configuring the Script

The caching mechanism can be enabled and disabled by setting or clearing the $EnableCache flag. If you set the $DebugCache flag, the page title will include the text [Cached] if a cached version has been served.

A third flag is $CacheAuthCheck. Normally, the caching mechanism assumes that all pages are readable by everybody, and would therefore serve cached pages even to readers that are not authenticated. By setting $CacheAuthCheck, authentication is strictly enforced (even for included pages), at the cost of a small overhead. Given that most Wikis do not have read-protected pages, the flag is off by default to get maximum performance. A compromise between speed and security is to only include the script in groups that do not have read-protected pages.

Important: If you are using the scripts/sessionauth.php script, you have to include it before including cache.php, because cache.php hooks itself into the authentication mechanism and will not know about sessionauth.php if it is included later.
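
Putting the options together, a local.php fragment might look like the sketch below. The flag names and the include order come from the text above; the values, and the choice to set the flags before the include, are assumptions. Whether the flags must be set before or after the include depends on how cache.php initializes its defaults, so check the script first.

```php
<?php
// local.php (sketch; flag values and their placement are assumptions)
$EnableCache    = 1;   // turn caching on
$DebugCache     = 1;   // show [Cached] in the title of cached pages
$CacheAuthCheck = 1;   // enforce read authentication for cached pages

include_once("scripts/sessionauth.php");  // must come before cache.php
include_once("local/cache.php");          // hooks into the authentication mechanism
```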

Contributors

  • Reimer Behrends

Comments

I found that after .touched was created, its modification time (as reported by filemtime()) never changed. The problem went away when I altered InvalidateCache() to write to the file. Apparently under Darwin (Mac OS X), unlike Linux, merely clobbering an empty file does not update its modification time. I also changed InvalidateCache() to use $CacheTimeStamp:

 function InvalidateCache()
 {
 //global $CacheDir;
   global $CacheTimeStamp;
 //$fp = fopen("$CacheDir/.touched", "w+");
   $fp = fopen("$CacheTimeStamp", "w+");
   fwrite($fp, '.');
   fclose($fp);
 }

--Fred Henle -> mailto:henlef [snail] mercersburg [period] edu

Actually, why not simply use touch() to touch .touched?

 function InvalidateCache()
 {
   global $CacheTimeStamp;
   touch("$CacheTimeStamp");
 }

--Fred Henle -> mailto:henlef [snail] mercersburg [period] edu

I've changed cache.php to use touch().

-- Reimer Behrends

Great! I have a suggestion for a slight change in how pages are served after the cache is invalidated. The reason I want caching is that my most complex page takes 13 seconds to generate and serve. Before I started using SimplePageCache, the browser would start rendering the (partial) page immediately, and update it continually until it was done. With SimplePageCache, when that page has to be regenerated, the browser doesn't see anything for 13 seconds, then the whole thing appears when ready. I think this is because the output buffer isn't flushed until ob_end_flush() is called at shutdown. I see two potential solutions:

  1. Use ob_implicit_flush() to send the page to the browser as it is generated. I tried putting in a call to ob_implicit_flush() but I got the infamous "headers already sent" error so I must have been doing something wrong.
  2. Serve the (possibly slightly out of date) cache page with an immediate refresh/redirect to the regenerated page. That way the old page appears almost instantaneously, to be replaced by the new page whenever it's ready. There must be several ways to manage this.

I'm willing to try implementing the second option if there's no easy way to get the first option to work....

Okay, I have a small diff for cache.php which seems to work for me:

 90,91c90,92
 <   if ($cachetime > $lastchange)
 <   {
 ---
 >   if (file_exists("$CacheFile.redo")) {
 >     unlink("$CacheFile.redo");
 >   } else {
 108a110,113
 >       if ($cachetime < $lastchange) {
 >         touch("$CacheFile.redo");
 >         $contents = str_replace("</head>", "<meta http-equiv='Refresh' content='0; URL=$PHP_SELF' /></head>", $contents);
 >       }

It serves the invalidated cache page with an immediate refresh to the regenerated page. I don't know if there's a better way to do this....

I just noticed a problem with my patch, which is that the cached page displays before authentication. I'll have to try to figure out how to prevent that....

The following patch should handle incremental output better. It flushes the output once per input line. I'll fold it into the main file as soon as it has seen some more extensive testing.

 
 38a39,48
 > $DoubleBrackets['/^/e'] = 'IncrementalFlushCache()';
 > $CacheStoreIncremental = '';
 > 
 > function IncrementalFlushCache()
 > {
 >   global $CacheStoreIncremental;
 >   $CacheStoreIncremental .= ob_get_contents();
 >   ob_flush();
 > }
 > 
 66c76,77
 <     $CacheSavedAuth, $CacheReadAuthPages, $CacheFailedAuth;
 ---
 >     $CacheSavedAuth, $CacheReadAuthPages, $CacheFailedAuth,
 >     $CacheStoreIncremental;
 125c136,137
 <       "included" => $CacheReadAuthPages, "html" => ob_get_contents());
 ---
 >       "included" => $CacheReadAuthPages,
 >       "html" => $CacheStoreIncremental . ob_get_contents());

-- Reimer Behrends

I had a problem where, in cached pages, the page name was written just before the <!DOCTYPE ...> declaration.

After searching for a while, the line:

 
 // Send our own headers, not the PHP default.
 PrintFmt("headers:", $pagename);

seemed to be the problem. I'm not sure what it does, but commenting it out helps. Tested with PmWiki 0.6.14.

-- Svogel

Things might work even faster if you set up PmWiki as an Apache ErrorDocument handler, so that it is only called at all if the cached page doesn't exist. See FourOhFourCache for more information about this technique. --EvanProdromou

This recipe works for 0.6.17 except for one thing: the Search function is hijacked and rendered useless when SimplePageCache is enabled. I had to add [[nocache]] to the top of the Search page. -- TreverMiller
