Sphinx
Summary: How can I search my PmWiki content via the Sphinx engine?
Version: 0.1
Prerequisites: Sphinx, PmWiki, bash, sed
Status:
Maintainer: Utopiah
Categories: Searching
Discussion: Sphinx-Talk?
Questions answered by this recipe
Description
Sphinx is a full-text search engine that extends PmWiki's built-in search capability to more efficiently support features such as stemming, indexing of multiple wikis, search of special fields, and custom ranking.
Installation
- Have a PmWiki instance running
- Have a Sphinx instance running: searchd started and the indexer correctly configured (run search from the command line to confirm)
- try the documented tests if this is your first time
- download the bash script sphinx_sources.txt, which locates and parses your wikis
- rename to sphinx_sources
- fix the paths to your own directories
- download the sed script pmwiki-to-sphinxxml.txt into the same directory
- rename to pmwiki-to-sphinxxml
- run the script and check that the output is correct (well-formed XML files)
- make sure the output is in the right character encoding; if it is not, see the iconv example given in the first lines
- once you are satisfied with the output, add the xmlpipe2 command to your sphinx.conf
- if this is your first time, copy and adapt
 source pmwikis
 {
     type            = xmlpipe2
     xmlpipe_command = /path/to/bash/script/sphinx_sources | iconv -f ISO-8859-1 -t utf-8
 }
 index pmwikis
 {
     source          = pmwikis
     path            = /var/lib/sphinxsearch/data/pmwikis
     docinfo         = extern
     mlock           = 0
     morphology      = stem_en
     min_word_len    = 1
     charset_type    = utf-8
 }
- see also http://sphinxsearch.com/docs/current.html#xmlpipe2 for details
- execute indexer pmwikis --rotate and correct any problems that come up, typically a charset mismatch or documents too large for the 2 MB XML field limit (usually these are not actual documents)
- test your newly created index via search pmwikis mykeyword; also use indextool --dumpheader pmwikis to make sure you have indexed most, if not all, of the targeted documents
- use the Sphinx PHP API to integrate with your PmWiki instance
- download and install the official PHP API
- a lot of room for creativity here
- note that this is a no-database solution (to stick to PmWiki flat files), so IDs are generated manually via hashing and are not stored as an attribute (they could, and probably should, be); consequently sphinx_pmwikis_doc_ids.php plays that role and should be included to get the paths back
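The xmlpipe2 stream and the hashed document IDs mentioned in the steps above can be sketched in bash as follows. This is not the recipe's sphinx_sources or pmwiki-to-sphinxxml script (neither is reproduced on this page), only a minimal illustration under stated assumptions: the function names doc_id, xml_escape, page_text and emit_docset are hypothetical, cksum (CRC-32) stands in for whatever hash the recipe really uses, the field name content is an arbitrary choice, and each page file is assumed to keep its body on a line starting with text= with newlines stored as %0a.

```shell
#!/usr/bin/env bash
# Sketch only: emit an xmlpipe2 stream for the page files of one wiki.d directory.
# All function names here are hypothetical, not part of the recipe's scripts.

# Derive a numeric document ID from a page name.
# Assumption: CRC-32 via POSIX cksum is used as the hash; collisions are possible.
doc_id() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Escape the characters that would break XML element content ("&" first).
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Pull the page body out of a PmWiki page file.
# Assumption: the body is on the "text=" line, with newlines stored as "%0a".
page_text() {
  sed -n -e 's/^text=//p' "$1" | sed -e 's/%0a/ /g'
}

# Print one xmlpipe2 docset covering every regular file in the given directory.
emit_docset() {
  local dir=$1 file name
  printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>'
  printf '%s\n' '<sphinx:docset>'
  printf '%s\n' '<sphinx:schema>' '<sphinx:field name="content"/>' '</sphinx:schema>'
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    name=$(basename "$file")
    printf '<sphinx:document id="%s">\n' "$(doc_id "$name")"
    printf '<content>%s</content>\n' "$(page_text "$file" | xml_escape)"
    printf '%s\n' '</sphinx:document>'
  done
  printf '%s\n' '</sphinx:docset>'
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  emit_docset "$1"
fi
```

Saved under a name of your choosing and run with the path to a wiki.d directory as its only argument, the sketch prints the stream to standard output, so it can be inspected by hand before anything similar is wired into xmlpipe_command. Because the ID is derived only from the page name, the same function can be reused at query time to rebuild the ID-to-path mapping, which is the role the recipe gives to sphinx_pmwikis_doc_ids.php.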
Configuration
Usage
Notes
Change log / Release notes
See also
- PmWiki:Search
- PmWiki:SearchImprovements
- plugins with alternative APIs and integration with other CMS
- author notes on Sphinx http://fabien.benetou.fr/Tools/Sphinxsearch
Contributors
Comments
See discussion at Sphinx-Talk?