[pmwiki-users] search does not find text with markup
Mikael Nilsson
mini at nada.kth.se
Tue Dec 20 16:56:31 CST 2005
On Tue 2005-12-20 at 16:08 -0600, Patrick R. Michaud wrote:
> Here's where things stand in 2.1.beta14. When a page is saved,
> PmWiki runs the markup text through the MarkupToHTML function
> (excluding things such as (:include:) and (:pagelist:)) and then
> saves the first 600 bytes as an "excerpt" attribute. This leading
> text is then readily available for things like RSS feeds and
> searches, and can be used to provide some idea of a page's contents
> in the absence of an explicit (:description:) directive.
> At the moment the 600 byte limit on excerpts is primarily there
> to prevent the internal $PCache from taking up too much memory,
> and also to keep disk space requirements down.
>
> However, we could modify this somewhat -- we could save the entire
> rendered text, and we could strip the HTML tags from the excerpt.
> This could nicely resolve the problem described above, since the
> excerpt would be searchable as well as the markup text. It
> would also allow searches to easily display the text surrounding
> a found search term.
>
> The downsides of this approach are:
> 1. by removing the HTML from an excerpt we're left with only
> the text -- no structural indications such as paragraphs or lists
> in the excerpt,
> 2. storing the rendered text in the page file increases the
> page file size a bit (although probably not too significantly
> except for large pages),
> 3. PmWiki's memory-based page cache can get too large if each
> page's excerpt attribute is stored there.
>
> Still, these three downsides might be a good trade for the
> extra functionality we might get as a result. Any opinions?
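To make the quoted proposal concrete, here is a rough sketch of stripping the tags from already-rendered HTML so that the result can be searched and a snippet around the match displayed. It is written in Python purely for illustration; PmWiki itself is PHP, and the names strip_tags, snippet, and BLOCK_TAGS are hypothetical, not part of PmWiki. It assumes the rendered HTML is already available as a string.

```python
from html.parser import HTMLParser

# Tags treated as block-level: a space is emitted around them so that
# adjacent paragraphs or list items do not run together. (Assumed set.)
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "td", "tr"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def snippet(text, term, radius=30):
    """Return the text around the first case-insensitive match of `term`,
    or None if the term does not occur."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return None
    start = max(0, pos - radius)
    end = min(len(text), pos + len(term) + radius)
    return text[start:end]


rendered = "<p>Search finds <strong>bold</strong>face text.</p><ul><li>item</li></ul>"
plain = strip_tags(rendered)
```

Because inline tags such as <strong> are dropped without inserting a space, a word that is only partly marked up ("bold" + "face") becomes searchable as one word, which is the problem this thread started with. The paragraph and list structure is lost, as downside 1 says.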
Well, to me it sounds like you need a simple text indexing engine using
flat-file databases, like phpdig: http://www.phpdig.net/index.php or
maybe SEARCpHp: http://www.hansanderson.com/php/search/
Whenever a page is saved, you let the engine re-index that page. It's
very similar to the linkindex files PmWiki maintains.
However, I'm pretty sure you've already considered this....
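As a rough illustration of what I mean by re-indexing on save, here is a minimal sketch of a flat-file word index. It is in Python only to keep it short, it does not use the phpdig or SEARCpHp APIs, and every name in it (load_index, save_index, reindex_page, search, wordindex.json) is made up for the example. A real engine would also want locking, stemming, and stop words.

```python
import json
import os
import re
import tempfile

WORD_RE = re.compile(r"\w+")


def load_index(path):
    """Load the word -> [page names] mapping, or start with an empty one."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_index(path, index):
    """Write the whole index back to a single flat file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh)


def reindex_page(index, pagename, text):
    """Replace all entries for `pagename` with the words found in `text`."""
    # Drop the page's old entries first, so removed words stop matching.
    for word in list(index):
        pages = index[word]
        if pagename in pages:
            pages.remove(pagename)
            if not pages:
                del index[word]
    # Add one entry per distinct word of the new text.
    for word in sorted(set(WORD_RE.findall(text.lower()))):
        index.setdefault(word, []).append(pagename)


def search(index, query):
    """Return the sorted names of pages containing every word of `query`."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Usage: index two pages, then save one of them again with new text.
index_path = os.path.join(tempfile.mkdtemp(), "wordindex.json")
index = load_index(index_path)
reindex_page(index, "Main.HomePage", "Search finds boldface text")
reindex_page(index, "Main.Other", "Plain text only")
reindex_page(index, "Main.HomePage", "Search finds italic text")
save_index(index_path, index)
```

The text passed to reindex_page would be the tag-stripped rendering of the page, so the index sees what the reader sees rather than the raw markup.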
/Mikael
--
Plus ça change, plus c'est la même chose