[pmwiki-users] pagelist performance analysis
Martin Fick
fick at fgm.com
Mon Apr 4 17:32:47 CDT 2005
Since I have an interest in getting categories to work
faster, I have done some analysis of (:pagelist:). Pagelist
is the heart of categories. Since pagelist is inherently a
search, it needs to read every single wikipage within the
set defined by the pagelist parameters. This means that as
the page count goes up, the time to compute pagelists goes
up, and that as the content of the pages grows larger, the
search time goes up as well.
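To see why the cost scales this way, here is a rough sketch
(not PmWiki's actual code, just an illustration) of the work
a pagelist with a search term effectively has to do:

## hypothetical illustration: every candidate page is read
## from disk and scanned, so the work grows with both page
## count and page size
function NaivePagelistSearch($pagelist, $term) {
  $out = array();
  foreach ($pagelist as $pagename) {
    $page = ReadPage($pagename);                  ## one disk read per page
    if (stristr(@$page['text'], $term) !== false) ## one full scan per page
      $out[] = $pagename;
  }
  return $out;
}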
I have an old machine, ~Pentium 266MHz, so it should be a
good place to test performance bottlenecks. :)
I have approximately 500 pages in my search criteria for
categories, but they are very small pages, ~10 lines each,
and the performance of pagelist is abysmal. As a reference
point, the wc for my wiki.d dir is:
7138 17987 282663 total
------------
My research has led me to conclude that 2 things are really
slow here:
1) The FmtPageName function, which the comments say is
used to:
## FmtPageName handles $[internationalization] and $Variable
## substitutions in strings based on the $pagename argument.
I have replaced it with this simple hack and get drastic
speed improvements on my pages. For example, a page with a
pagelist went from 39s to 27s to render.
This hack simply hard-codes the defaults that I seem to
need on my site.
Simple hack (pmwiki.php):
function FmtPageName($fmt, $pagename) {
  global $FarmD;
  ## short-circuit the two page-file path formats my site uses
  if ($fmt == 'wiki.d/$FullName') return "wiki.d/$pagename";
  if ($fmt == '$FarmD/wikilib.d/$FullName') return "$FarmD/wikilib.d/$pagename";
  ## everything else falls back to the original function
  return FmtPageNameO($fmt, $pagename);
}
Rename the old FmtPageName to FmtPageNameO.
Now obviously I am not suggesting running things this
way, but it is a serious contender for optimization.
2) Reading many files in PHP. I made many hacks with page
   content caching. On pages with multiple pagelists this
   is a great improvement. The problem is that, of
   course, this only proves that it's slow; it doesn't
   help speed up the simple (probably most important)
   case of one pagelist.
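To give an idea of what I mean by content caching, a
minimal sketch (assuming PmWiki's ReadPage; my actual hack
is messier than this) is to memoize reads for the duration
of the request:

## hypothetical sketch: keep pages in memory so that a second
## pagelist on the same page does not re-read every file
function CachedReadPage($pagename) {
  static $cache = array();
  if (!isset($cache[$pagename]))
    $cache[$pagename] = ReadPage($pagename);
  return $cache[$pagename];
}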
To speed this up, I resorted to brute force: grep.
This hack is interesting because it does not actually
seem to speed things up unless used with hack #1
(FmtPageName); in fact it can slow things down. But
with hack #1, that same page now renders in ~10s!!!!
This hack simply prefilters the pages with a grep.
Add this line in pagelist.php to call my hack
functions:
foreach($incl as $t) $pagelist = StristrPages($pagelist, $t);
It goes in the function FmtPageList, after this if statement:
if (@$opt['trail']) {
$t = ReadTrail($pagename,$opt['trail']);
foreach($t as $pagefile) $pagelist[] = $pagefile['pagename'];
} else $pagelist = ListPages($pats);
The new hack functions:

function StristrPages($pages, $str) {
  ## case-insensitive fixed-string match, like stristr()
  return GrepFilterPages($pages, $str, "-iF");
}

function GrepFilterPages($pages, $str, $args) {
  global $WikiLibDirs;
  $names = array(); $files = ''; $out = array();
  foreach($pages as $pagename)
    foreach((array)$WikiLibDirs as $dir) {
      $pagefile = FmtPageName($dir->dirfmt, $pagename);
      if (!file_exists($pagefile)) continue;
      $names[$pagefile] = $pagename;
      $files .= " '$pagefile'";
    }
  ## grep -l prints only the names of the matching files
  $grep = shell_exec("grep -l $args " . escapeshellarg($str) . $files);
  foreach(explode("\n", $grep) as $pagefile)
    if (isset($names[$pagefile])) $out[] = $names[$pagefile];
  return $out;
}
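For comparison, on hosts where shell_exec is not available,
the same prefiltering idea could be done in pure PHP. A
hypothetical sketch (the function name is mine, not part of
PmWiki), likely slower than grep but with the same effect:

function StristrFilterPages($pages, $str) {
  global $WikiLibDirs;
  $out = array();
  foreach($pages as $pagename)
    foreach((array)$WikiLibDirs as $dir) {
      $pagefile = FmtPageName($dir->dirfmt, $pagename);
      if (!file_exists($pagefile)) continue;
      ## keep the page if its raw file contains the term
      if (stristr(file_get_contents($pagefile), $str) !== false)
        $out[] = $pagename;
      break;   ## only check the first existing copy
    }
  return $out;
}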
------------
So, what does this mean? The grep hack shows that adding
some type of search engine would be a great benefit to
categories. The FmtPageName hack is less obvious. Can the
names be cached somehow? (on disk)
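As one possible starting point, the same renaming trick
from hack #1 could be used to memoize FmtPageName per
request. This is only a hypothetical in-memory sketch, not
the on-disk cache I am asking about, and it assumes the
$Variables being substituted do not change mid-request:

function FmtPageName($fmt, $pagename) {
  static $cache = array();
  ## assumes the substituted globals stay constant for the
  ## duration of the request
  if (!isset($cache[$fmt][$pagename]))
    $cache[$fmt][$pagename] = FmtPageNameO($fmt, $pagename);
  return $cache[$fmt][$pagename];
}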
Any feedback about these hacks is welcome: do they work for
you, are they blatantly incorrect? If anyone wants to know
what kind of caching work I hacked together, I can post
that too (it's ugly).
-Martin