Sphinx

Summary: How can I search my PmWiki content via the Sphinx engine?
Version: 0.1
Prerequisites: Sphinx, PmWiki, bash, sed
Status:
Maintainer: Utopiah
Categories: Searching
Discussion: Sphinx-Talk

Questions answered by this recipe


Description

Sphinx is a full-text search engine that extends PmWiki's built-in search capability to more efficiently support features such as stemming, indexing of multiple wikis, search of special fields, and custom ranking.

Installation

  1. Have a PmWiki instance running
  2. Have a Sphinx instance running
    1. searchd and a properly configured indexer (run the search command-line tool to confirm that both work)
    2. if this is your first time with Sphinx, try the tests described in its documentation
  3. download the bash script sphinx_sources.txt, which locates and parses your wikis
    1. rename it to sphinx_sources
    2. change the paths to point to your own directories
  4. download the sed script pmwiki-to-sphinxxml.txt into the same directory
    1. rename it to pmwiki-to-sphinxxml
  5. run the sphinx_sources script and check that its output is correct (well-formed XML)
    1. make sure the output is in the expected character set (UTF-8); if it is not, see the iconv example given in the first lines of the script
  6. once you are satisfied with the output, add the xmlpipe2 command to your sphinx.conf
    1. if this is your first time, copy and adapt the following
      source pmwikis
      {
              type            = xmlpipe2
              xmlpipe_command = /path/to/bash/script/sphinx_sources | iconv -f ISO-8859-1 -t utf-8
      }

      index pmwikis
      {
              source          = pmwikis
              path            = /var/lib/sphinxsearch/data/pmwikis
              docinfo         = extern
              mlock           = 0
              morphology      = stem_en
              min_word_len    = 1
              charset_type    = utf-8
      }
    2. see also http://sphinxsearch.com/docs/current.html#xmlpipe2 for details
  7. run indexer pmwikis --rotate and fix any problems it reports
    1. typical problems are a charset mismatch or documents that exceed the 2 MB xmlpipe2 field limit (see max_xmlpipe2_field); the oversized files are usually not actual documents
  8. test your newly created index with search -i pmwikis mykeyword; also run indextool --dumpheader pmwikis to check that most, if not all, of the targeted documents were indexed
  9. use the Sphinx PHP API to integrate the index with your PmWiki instance
    1. download and install the official PHP API
    2. there is a lot of room for creativity here
    3. note that this is a database-free solution (in keeping with PmWiki's flat files), so document IDs are generated by hashing and are not stored as an attribute (they could, and probably should, be); sphinx_pmwikis_doc_ids.php therefore provides the mapping from IDs to pages and must be included to get the paths back
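
To illustrate what steps 3 to 5 have to produce, here is a minimal sketch of a script that turns a directory of PmWiki page files into an xmlpipe2 stream. It is not the recipe's sphinx_sources script. It assumes a standard wiki.d directory in which each page is a file named Group.PageName with its markup on a line starting with text= and line breaks encoded as %0a. The function names (xml_escape, doc_id, emit_docset) and the field names pagename and content are hypothetical. cksum is used here only as a simple, widely available hash; the ID scheme used by sphinx_pmwikis_doc_ids.php may well differ.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the recipe's sphinx_sources script.

# Escape the three characters that are unsafe in XML text content.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Derive a numeric document ID from a page name with the POSIX CRC (cksum).
# Assumption: a 32-bit CRC is good enough for illustration; collisions are possible.
doc_id() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Emit an xmlpipe2 stream for every page file in the given wiki.d directory.
emit_docset() {
    local dir=$1 file name
    printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>'
    printf '%s\n' '<sphinx:docset>'
    printf '%s\n' '<sphinx:schema>'
    printf '%s\n' '<sphinx:field name="pagename"/>'
    printf '%s\n' '<sphinx:field name="content"/>'
    printf '%s\n' '</sphinx:schema>'
    for file in "$dir"/*.*; do
        [ -f "$file" ] || continue
        name=${file##*/}
        # Skip deleted pages, which PmWiki keeps as Group.Page,del-<timestamp>.
        case $name in *,del-*) continue ;; esac
        printf '<sphinx:document id="%s">\n' "$(doc_id "$name")"
        printf '<pagename>%s</pagename>\n' "$(printf '%s' "$name" | xml_escape)"
        # Keep only the text= line and turn encoded line breaks into spaces.
        printf '<content>%s</content>\n' \
            "$(sed -n 's/^text=//p' "$file" | sed 's/%0a/ /g' | xml_escape)"
        printf '%s\n' '</sphinx:document>'
    done
    printf '%s\n' '</sphinx:docset>'
}
```

Running emit_docset /path/to/wiki.d prints the stream to standard output, which is the form an xmlpipe_command is expected to deliver. The sketch only decodes %0a and ignores the other escapes PmWiki uses in page files, so treat it as a starting point for checking your own output rather than as a replacement for the downloaded scripts.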

Configuration

Usage

Notes
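
The sample sphinx.conf in the installation steps pipes the script output through iconv -f ISO-8859-1 -t utf-8. That is only right when the wiki pages really are stored in ISO-8859-1; if they are already UTF-8, the extra conversion would double-encode every non-ASCII character. The sketch below is one way to check which case applies. The helper names is_utf8 and suggest_xmlpipe_command are hypothetical and not part of the recipe, and ISO-8859-1 is assumed as the fallback charset only because the sample configuration uses it.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the recipe's scripts.

# Succeed if standard input is valid UTF-8, fail otherwise.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print an xmlpipe_command line suited to the output of the given script:
# unchanged when the output is already UTF-8, piped through iconv otherwise.
# Assumption: non-UTF-8 output is ISO-8859-1, as in the sample sphinx.conf.
suggest_xmlpipe_command() {
    local script=$1
    if "$script" | is_utf8; then
        printf 'xmlpipe_command = %s\n' "$script"
    else
        printf 'xmlpipe_command = %s | iconv -f ISO-8859-1 -t utf-8\n' "$script"
    fi
}
```

For example, suggest_xmlpipe_command /path/to/bash/script/sphinx_sources prints the line to paste into the source pmwikis block. Pure ASCII output is valid in both charsets, so the check only becomes meaningful once the wiki contains accented or other non-ASCII characters.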

Change log / Release notes


See also

Contributors

Comments

See discussion at Sphinx-Talk
