NAME

wikibackups.pl


SYNOPSIS

wikibackups.pl [-c config] [-n] [-v] [-d]

wikibackups.pl [-h]

wikibackups.pl [-V]


DISCUSSION

wikibackups.pl provides a way of automating backups of PmWiki (http://www.pmwiki.org/) wikis. The script uses rsync(1) to handle the actual file transfer. Various inclusions and exclusions can be set (see CONFIGURATION).

More discussion on backing up your wiki can be found at http://www.pmwiki.org/wiki/PmWiki/BackupAndRestore.


OPTIONS

-c config - specify an alternate configuration file. The default is $HOME/.wbu.rc.

-n - dry run only. If used with -v, shows the list of files which rsync would have transferred.

-v - verbose mode. Will print out what is going on during the backup, including the output from the rsync command.

-d - debug mode. Will print out all sorts of internal structure information. Useful for checking your configuration file. If you want to check your configuration, make sure to set -n and -v as well.

-h - help. Prints a usage message.

-V - version. Prints the script version.


CONFIGURATION

Configuration of wikibackups.pl is handled with a YAML file. The default location is $HOME/.wbu.rc, but a different file can be specified using the -c parameter on the command line.

YAML (see http://yaml.org/) is a handy language for setting up nested configurations. A key either has a value or has nested keys indented beneath it, but not both. It generally has the syntax of:

   item1: value1
   item2:
     item2.1: value2.1
     item2.2:
       item2.2.1: value2.2.1
     item2.3: value2.3

And so on.

The .wbu.rc file is configured thusly:

   wikiname:
      source: user@host:path/to/wikiroot
      target: path/to/backuproot
   # Inclusions are marked by "i"
   # Exclusions are marked by "e"
      exclusions:
          - i /cookbook/
          - i /wiki.d/
          - e /wiki.d/.flock
          - e /wiki.d/.pageindex
          - e /wiki.d/.lastmod
          - e /wiki.d/*,del-*
          - e /wiki.d/*/*,del-*
          - i /pub/
          - i /local/
          - i /uploads/
          - e /.git/*
          - e /*
          - e **~
          - e **.bak
          - e **.tgz
          - e **.zip
          - e **.gz
          - e **.Z

Most of the entries should make sense if you're familiar with the structure of the wiki's directories.

A breakdown of the configuration follows:

wikiname - this is the name you give your particular wiki. It doesn't have to be the same as $WikiTitle, but it's helpful if it at least resembles that. This value will be used as the backup directory underneath the path set in backuppath.

source - this is the rsync(1) spec path for the source of the files. (There are actually few restrictions on this spec as to whether it is local or remote. The assumption is that it will be a remote server that one has ssh access to.)

target - this is the rsync(1) spec path for the root directory of the backup tree.

exclusions - here is where you list what to include and exclude from your backup. See Configuring Exclusions.

Lines that begin with an octothorpe (#) are treated as comments.
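
The layout described above can be sanity-checked once the YAML has been parsed. The following is a minimal sketch in Python, for illustration only, and is not part of wikibackups.pl. The function name check_config and the sample data are hypothetical, and treating source, target and exclusions as all required is an assumption, not something the script is known to enforce.

```python
REQUIRED_KEYS = ("source", "target", "exclusions")


def check_config(config):
    """Return a list of problems found in an already-parsed .wbu.rc mapping."""
    problems = []
    for wikiname, settings in config.items():
        if not isinstance(settings, dict):
            problems.append(f"{wikiname}: expected a nested mapping")
            continue
        for key in REQUIRED_KEYS:
            if key not in settings:
                problems.append(f"{wikiname}: missing '{key}'")
        for rule in settings.get("exclusions") or []:
            # Each rule is "i PATTERN" (include) or "e PATTERN" (exclude).
            marker, _, pattern = str(rule).partition(" ")
            if marker not in ("i", "e") or not pattern:
                problems.append(f"{wikiname}: bad rule {rule!r}")
    return problems


# Hypothetical parsed configuration with two deliberate mistakes.
example = {
    "mywiki": {
        "source": "user@host:path/to/wikiroot",
        "target": "path/to/backuproot",
        "exclusions": ["i /wiki.d/", "e /wiki.d/.flock", "x /oops/", "e /*"],
    },
    "broken": {"source": "user@host:path/to/otherwiki"},
}

for problem in check_config(example):
    print(problem)
```

The sample reports the rule with the unknown marker and the two keys missing from the second wiki.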

Warning about TABS in YAML input

YAML chokes on tab characters in its input. Make sure to always use spaces to perform indenting. (Emacs has an untabify function that's particularly useful for this.)
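
If untabify is not at hand, stray tabs are easy to find with a few lines of code. This is a minimal sketch in Python, for illustration only; the function name find_tabs and the sample text are hypothetical. It flags every tab it finds, not just the ones used for indentation.

```python
def find_tabs(text):
    """Return (line_number, column) pairs for every tab character in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char == "\t":
                hits.append((lineno, col))
    return hits


# Hypothetical config text whose second line is indented with a tab.
sample = (
    "mywiki:\n"
    "\tsource: user@host:path/to/wikiroot\n"
    "      target: path/to/backuproot\n"
)

for lineno, col in find_tabs(sample):
    print(f"line {lineno}, column {col}: tab character")
```

To check a real file, read its contents first and pass the resulting string to find_tabs.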

Configuring Exclusions

This may be the most difficult part of the setup. rsync's method of filtering what gets sent and what doesn't is pretty arcane. The EXAMPLES section below walks through a working set of rules one entry at a time.


EXAMPLES

This will perform a backup of mywiki, on server wiki.example.com, user me, with the path to mywiki in the directory public_html/wiki in the user's login area. The backup will be saved in rolling directories in /mnt/backups/WikiBackups/mywiki:

   mywiki:
      source: me@wiki.example.com:public_html/wiki
      target: /mnt/backups/WikiBackups/mywiki
      exclusions:
        - i /cookbook/
        - i /wiki.d/
        - e /wiki.d/.flock
        - e /wiki.d/.pageindex
        - e /wiki.d/.lastmod
        - e /wiki.d/*,del-*
        - e /wiki.d/*/*,del-*
        - i /pub/
        - i /local/
        - i /uploads/
        - e /*

The exclusions section is as follows:

i /cookbook/ - include the cookbook directory, but only from the top level. This will include the entire contents of the cookbook directory, but it won't include any occurrences of cookbook in other directories lower down.

i /wiki.d/ - include the wiki.d directory, but again, only from the top level.

e /wiki.d/.flock - exclude the .flock file that pmwiki uses to control access. This is a generated file and should not be backed up or restored.

e /wiki.d/.pageindex - exclude the .pageindex file that pmwiki creates to index wiki pages. This is also a generated file and should not be backed up or restored.

e /wiki.d/.lastmod - exclude the .lastmod file that pmwiki creates. This is generated as well, and should not be backed up or restored.

e /wiki.d/*,del-* - this is the pattern used by pmwiki to denote deleted wiki pages. If you do want to keep these in a backup, remove this line.

e /wiki.d/*/*,del-* - this is similar to the above, but works when your pages are stored in group directories.

i /pub/ - the public directory for skins, themes, buttons, images, etc that you use to customize your wiki.

i /local/ - the local directory where you store your local configuration files.

i /uploads/ - the uploads directory where attachments to pages are stored. You definitely want to keep this backed up along with the pages themselves.

e /* - this is the last entry in this set -- it is telling rsync to exclude everything not explicitly included in the copy. This is the magic that makes it all work.
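
The reason the order of the entries matters is that rsync acts on the first rule that matches each name, and it checks a directory before descending into it. The following is a simplified model of that behaviour in Python, for illustration only. It is not rsync's actual implementation and not code from wikibackups.pl. The function names and test paths are hypothetical, and only patterns that start with / are handled.

```python
import re


def to_regex(pattern):
    """Translate a simplified rsync-style pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # ** may cross directory separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # * stays inside one directory level
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def first_match(name, is_dir, rules):
    """Apply the first rule that matches name; keep it if no rule matches."""
    for rule in rules:
        marker, _, pattern = rule.partition(" ")
        if pattern.endswith("/"):
            if not is_dir:
                continue           # a trailing slash only matches directories
            pattern = pattern[:-1]
        if to_regex(pattern).match(name):
            return marker == "i"
    return True


def kept(path, rules):
    """Check every directory on the way down to path, then path itself."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        name = "/" + "/".join(parts[:depth])
        is_dir = depth < len(parts)
        if not first_match(name, is_dir, rules):
            return False
    return True


RULES = [
    "i /cookbook/", "i /wiki.d/",
    "e /wiki.d/.flock", "e /wiki.d/.pageindex", "e /wiki.d/.lastmod",
    "e /wiki.d/*,del-*", "e /wiki.d/*/*,del-*",
    "i /pub/", "i /local/", "i /uploads/",
    "e /*",
]

for path in (
    "/wiki.d/Main.HomePage",
    "/wiki.d/.flock",
    "/wiki.d/Main.OldPage,del-1341843284",
    "/pub/skins/pmwiki/pmwiki.css",
    "/index.php",
    "/scripts/stdconfig.php",
):
    print(path, "->", "keep" if kept(path, RULES) else "skip")
```

In this model the final e /* entry only ever sees top-level names that no earlier include claimed, which is why it has to come after the i entries.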

Some additional helpful exclusions

Sometimes, when working on a wiki installation, cruft gets left behind that you don't really want in your backups: editor and tool backup files, downloaded archives and compressed files, etc. To deal with these, rsync treats a double star (**) as "match anything, regardless of directory". A single * matches only within a directory, that is, between directory separators (/). The double star therefore makes it convenient to match file types anywhere in the tree. Here are some examples:

          - e **~
          - e **.bak
          - e **.tgz
          - e **.zip
          - e **.gz
          - e **.Z

e **~ - typically editor backup files, especially Emacs.

e **.bak - typically utility or filter backup files, such as from sed or perl.

e **.tgz - a tarball (gzipped tar file).

e **.zip - a zip compressed file.

e **.gz - a gzip compressed file.

e **.Z - another type of compressed file.

There are probably other kinds of files that you may have lying around that you don't really want to include in a site backup.
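
The difference between the single and double star can be seen by translating each pattern into a regular expression. This is a simplified approximation in Python, for illustration only, and not how rsync itself is implemented. The names to_regex and matches and the sample paths are hypothetical, and only patterns that begin with / or ** are handled.

```python
import re


def to_regex(pattern):
    """Translate a simplified rsync-style pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # ** matches across directory separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # * stops at the next directory separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern, path):
    """Return True if the whole path matches the pattern."""
    return to_regex(pattern).match(path) is not None


checks = [
    ("/wiki.d/*,del-*", "/wiki.d/Main.OldPage,del-1341843284"),
    ("/wiki.d/*,del-*", "/wiki.d/Main/OldPage,del-1341843284"),
    ("/wiki.d/*/*,del-*", "/wiki.d/Main/OldPage,del-1341843284"),
    ("**.bak", "/local/old/config.php.bak"),
    ("**~", "/cookbook/recipe.php~"),
    ("**.zip", "/uploads/Main/photos.zip"),
]

for pattern, path in checks:
    print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```

Only the second check fails to match, because a single * cannot cross the extra directory level. That is why the configuration carries a separate /wiki.d/*/*,del-* entry for wikis that store pages in group directories.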


PERIODIC BACKUPS

This script performs a rolling forward backup as described in http://www.sanitarium.net/golug/rsync_backups_2010.html. A new timestamped directory is created each time the script is run, linking files that have not changed to the last backup, and only copying or deleting files that have changed on the source.

Before scheduling this script to run automatically, make sure it works! Test the configuration using the -d, -v, and -n switches. When you are satisfied with how it is set up, run it live once with -v turned on and make sure that you get what you expect. PmWiki's data doesn't tend to be very large, unless you have a lot of media stored as attachments. (If that is the case, you may consider another backup solution for your media and add the uploads directory to the exclusions list.)

Once you're satisfied everything is working correctly, you can add the script, without parameters, to your preferred cron-type scheduler (cron, anacron, periodic, launchd, etc.). Set it to run on a backup schedule that you feel comfortable with. For a fairly active wiki, back up at least daily. If your wiki doesn't change much week-to-week or month-to-month, longer backup periods can be justified. It's entirely up to you. For most people, daily will be about right.

Backup Directory Structure

The rolling backups are stored underneath the target: specified in the configuration file. Each time a backup is run, rsync writes into a working directory called .incomplete. When rsync ends without an error, the .incomplete directory is renamed with a timestamp. The path of that directory (TARGET/TIMESTAMP) is then stored in the .lastbackup file in the TARGET directory.

Each periodic backup only stores new copies of the files that actually changed since the previous run. Files that are unchanged from the previous backup are simply hard linked into the new directory, so a backup in which nothing changed takes up almost no additional space.

   TARGET/
      .lastbackup
      2012-07-09T14.28.04/
         (backed up files and directories)
      2012-07-10T14.30.33/
         (linked and changed files and directories)
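
The cycle described above (work in .incomplete, hard link what has not changed, rename on success, record the result in .lastbackup) can be sketched in a few lines. This is a Python illustration of the idea only: the real script leaves the copying and linking to rsync, which uses its own test for whether a file has changed. The function name snapshot, the file names and the temporary directories are all hypothetical.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, target, stamp):
    """Create TARGET/STAMP, hard-linking files unchanged since the last run."""
    marker = os.path.join(target, ".lastbackup")
    previous = None
    if os.path.exists(marker):
        with open(marker) as fh:
            previous = fh.read().strip()

    work = os.path.join(target, ".incomplete")
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(work, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the existing copy
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy

    final = os.path.join(target, stamp)
    os.rename(work, final)               # only completed runs get a timestamp
    with open(marker, "w") as fh:
        fh.write(final + "\n")
    return final


# Build a tiny stand-in wiki in a temporary directory.
base = tempfile.mkdtemp()
source = os.path.join(base, "wiki")
target = os.path.join(base, "backups")
os.makedirs(os.path.join(source, "wiki.d"))
os.makedirs(target)
for name, text in (("Main.HomePage", "home v1\n"), ("Main.News", "news v1\n")):
    with open(os.path.join(source, "wiki.d", name), "w") as fh:
        fh.write(text)

first = snapshot(source, target, "2012-07-09T14.28.04")

# Change one page, then take a second snapshot.
with open(os.path.join(source, "wiki.d", "Main.News"), "w") as fh:
    fh.write("news v2, now longer\n")
second = snapshot(source, target, "2012-07-10T14.30.33")

for name in ("Main.HomePage", "Main.News"):
    shared = os.path.samefile(
        os.path.join(first, "wiki.d", name), os.path.join(second, "wiki.d", name)
    )
    print(name, "hard linked" if shared else "copied")
```

The timestamps are fixed here so that the two runs cannot collide. A live run could build one with time.strftime("%Y-%m-%dT%H.%M.%S"), which matches the directory names shown above.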


OTHER CONSIDERATIONS

rsync relies heavily on ssh, which provides encrypted end-to-end communications between machines. As such, you should have ssh access to any remote machine in this setup. Make sure the remote side already has the ssh public key from the local machine so there won't be any hiccups with authentication or having to supply a password to get rsync to work correctly.

There are other ways of connecting to the remote side in rsync. Such methods are beyond the scope of this project to describe.


TODO


SEE ALSO

http://www.pmwiki.org/wiki/PmWiki/BackupAndRestore, http://www.pmwiki.org/wiki/Cookbook/BackUpPages, Thread (http://thread.gmane.org/gmane.comp.web.wiki.pmwiki.user/20317) on the pmwiki-users group


AUTHOR(S)

Tamara Temple <tamara @ tamaratemple.com> http://www.pmwiki.org/wiki/Profiles/Tamouse