Sphinx

Summary: How can I search my PmWiki content via the Sphinx engine?

Version: 0.1

Prerequisites: Sphinx, PmWiki, bash, sed

Status:

Maintainer: Utopiah

Categories: Searching

Discussion: Sphinx-Talk

Questions answered by this recipe

Description

Sphinx is a full-text search engine that extends PmWiki's built-in search capability to more efficiently support features such as stemming, indexing of multiple wikis, search of special fields, and custom ranking.

Installation

Have a PmWiki instance running.
Have a Sphinx instance running.
1. Make sure searchd is running and the indexer is properly configured (run search from the command line to check).
2. If this is your first time with Sphinx, try the tests described in its documentation.
Download the bash script sphinx_sources.txt, which locates and parses your wikis.
1. Rename it to sphinx_sources.
2. Change the paths to point to your own directories.
Download the sed script pmwiki-to-sphinxxml.txt into the same directory.
1. Rename it to pmwiki-to-sphinxxml.
Run the sphinx_sources script and check that its output is correct (well-formed XML).
1. Make sure the output is in the right character set; if it is not, see the iconv example given in the first lines of the script.
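
To make the expected output concrete, here is a minimal sketch of a shell script that emits an xmlpipe2 stream for one wiki.d directory. It is not the sphinx_sources script from this recipe: the function names (xml_escape, doc_id, page_text, emit_docset), the two field names (title, content) and the use of cksum to derive IDs are assumptions chosen for illustration. It also assumes PmWiki's usual page-file layout, where the markup sits on a single text= line with newlines stored as %0a.

```shell
#!/usr/bin/env bash
# Sketch only: emit an xmlpipe2 docset for the page files in one wiki.d directory.
# Function and field names are illustrative, not taken from sphinx_sources.

# Escape the three characters that would break XML element content.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Derive a numeric document ID from the file path (CRC via POSIX cksum).
doc_id() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Print the text= line of a page file with PmWiki's %-escapes undone
# (assumed escapes: %0a newline, %3c "<", %25 "%").
page_text() {
    grep -m 1 '^text=' "$1" |
        sed -e 's/^text=//' -e 's/%0a/ /g' -e 's/%3c/</g' -e 's/%25/%/g'
}

# Write the whole docset (schema plus one document per page file) to stdout.
emit_docset() {
    local wiki_dir=$1 page
    printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>' \
        '<sphinx:docset>' \
        '<sphinx:schema>' \
        '<sphinx:field name="title"/>' \
        '<sphinx:field name="content"/>' \
        '</sphinx:schema>'
    for page in "$wiki_dir"/*; do
        [ -f "$page" ] || continue
        printf '<sphinx:document id="%s">\n' "$(doc_id "$page")"
        printf '<title>%s</title>\n' "$(basename "$page" | xml_escape)"
        printf '<content>%s</content>\n' "$(page_text "$page" | xml_escape)"
        printf '%s\n' '</sphinx:document>'
    done
    printf '%s\n' '</sphinx:docset>'
}

# Run only when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    emit_docset "$1"
fi
```

Called with a wiki.d path as its only argument, the script prints the docset on standard output, which is the form an xmlpipe2 source reads. Sphinx requires document IDs to be unique and non-zero, so a hash-based ID like this one trades a small collision risk for not needing a database.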

Once you are satisfied with the output, add the xmlpipe2 command to your sphinx.conf.

If this is your first time, copy and adapt the following:

source pmwikis
{
        type                    = xmlpipe2
        xmlpipe_command         = /path/to/bash/script/sphinx_sources | iconv -f ISO-8859-1 -t utf-8
}

index pmwikis
{
        source                  = pmwikis
        path                    = /var/lib/sphinxsearch/data/pmwikis
        docinfo                 = extern
        mlock                   = 0
        morphology              = stem_en
        min_word_len            = 1
        charset_type            = utf-8
}

See also http://sphinxsearch.com/docs/current.html#xmlpipe2 for details.
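
The iconv stage in xmlpipe_command above assumes the page files are stored as ISO-8859-1. One way to check that assumption, sketched below, is to ask iconv to read a file as UTF-8: it exits with a non-zero status on the first invalid byte sequence. The function names is_utf8 and to_utf8 are made up for this example, and the ISO-8859-1 fallback simply mirrors the configuration above.

```shell
#!/usr/bin/env bash
# Sketch only: check whether input is already valid UTF-8 and convert it if not.

# Succeeds if stdin is valid UTF-8 (iconv fails on invalid sequences).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Pass a file through unchanged if it is valid UTF-8,
# otherwise treat it as ISO-8859-1 and convert it.
to_utf8() {
    if is_utf8 < "$1"; then
        cat "$1"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$1"
    fi
}
```

If the wiki is already stored as UTF-8, piping it through iconv -f ISO-8859-1 would double-encode accented characters, so in that case the iconv stage can be dropped from xmlpipe_command.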

Execute indexer pmwikis --rotate and correct whatever problems come up.
1. Typically these are character set mismatches, or documents too large to fit in the 2 MB XML field (usually these are not actual documents).
Test your newly created index with search pmwikis mykeyword; also use indextool --dumpheader pmwikis to make sure you have indexed most, if not all, of the targeted documents.
Use the Sphinx PHP API to integrate with your PmWiki instance.
1. Download and install the official PHP API.
2. There is a lot of room for creativity here.
3. Note that this is a database-free solution (to stick to PmWiki's flat files), so IDs are generated manually by hashing and are not stored as an attribute (they could, and probably should, be). Consequently sphinx_pmwikis_doc_ids.php plays that role and should be included to get the paths back.
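
Because the IDs are hashes rather than stored attributes, a search result only returns a number, and something has to map it back to a page. The sketch below shows that idea in shell form: hash each page path to an ID, keep an ID-to-path table, and look paths up in it. This is not the contents of sphinx_pmwikis_doc_ids.php; the function names, the tab-separated map file and the choice of cksum as the hash are assumptions for illustration, and the hash would have to match whatever the indexing script uses.

```shell
#!/usr/bin/env bash
# Sketch only: map hashed document IDs back to PmWiki page paths.

# Numeric ID for a path (CRC via POSIX cksum); collisions are possible but rare.
doc_id() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Print one "id<TAB>path" line per page file in a wiki.d directory.
build_id_map() {
    local wiki_dir=$1 page
    for page in "$wiki_dir"/*; do
        [ -f "$page" ] || continue
        printf '%s\t%s\n' "$(doc_id "$page")" "$page"
    done
}

# Look up the path recorded for an ID in a map file; prints nothing if unknown.
path_for_id() {
    awk -F '\t' -v id="$1" '$1 == id { print $2; exit }' "$2"
}
```

A typical use would be to write the output of build_id_map to a file once per indexing run, then call path_for_id with each ID returned by a search and the name of that file.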

Configuration

Usage

Notes

Change log / Release notes

Contributors

Comments

See discussion at Sphinx-Talk