Friday, January 8, 2016

Using a Moodle Yesterday Instance for Self Service Course Repairs and Improved Site Performance

Why might this be a better solution than automated course backups?

The Moodle course backup and restore feature, while a convenient mechanism for moving courses between Moodle sites, is not very efficient. Many people also confuse it with a true disaster recovery solution for Moodle. It is not. The automated course backup facility is best suited to letting teachers self-service restores of individual courses that have been accidentally damaged. The Moodle course backup system stores each course backup as a single zip file. Any time a single file or database entry changes in a course, the entire course backup has to be recreated and zipped into a new archive file, which takes a lot of CPU and I/O. Because the whole archive is rebuilt and recompressed, even the smallest change in a course shows up as a completely new file, so most backup systems cannot see that only a small part of the course actually changed. Because of Moodle’s file structure (by default the backup archives are stored in moodledata’s hashed file pool alongside every other file), it can also be impractical to exclude these files from the server's overall backup system. This can have a very negative impact on the backup system's performance and cost.
The yesterday instance script relies on rsync to intelligently copy only the Moodle data files that have changed since the last run, while still allowing teachers to self-service restores of accidentally damaged courses without IT involvement. It is worth noting, however, that this generic solution isn’t the most efficient method for duplicating the database, since the whole database is dumped and reloaded on every run. For most sites, though, the database is about one tenth the size of Moodle data, so this is generally a good tradeoff.
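Here is a minimal sketch of that rsync behaviour, assuming rsync is installed. Two throwaway temporary folders stand in for moodledata and its yesterday copy (nothing here touches a real site); after an initial copy, one file is edited and the second run reports only that file.

```shell
#!/bin/bash
# Sketch: a second rsync run only transfers files that changed.
# Temporary folders stand in for moodledata and its yesterday copy.
src=$(mktemp -d)
dst=$(mktemp -d)

# Two "course" files in the fake moodledata
echo "course one" > "$src/a.txt"
echo "course two" > "$src/b.txt"

# First run: everything is copied
rsync -a --delete "$src/" "$dst/"

# Edit one file, then sync again and list what rsync actually transferred
echo "edited" >> "$src/b.txt"
changed=$(rsync -a --delete --itemize-changes "$src/" "$dst/")
echo "$changed"    # lists b.txt only; a.txt is unchanged and skipped

rm -rf "$src" "$dst"
```

Adding --itemize-changes to the nightly moodledata rsync in the same way is a cheap way to see how much of your site actually changes from one day to the next.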

Important warning

Be sure you have a good backup before running! Also be careful not to mix up the source and yesterday instance variables: if you swap them, you can potentially delete your production database. If you have a dev environment, you should try it out there first. For example, I did my testing with a copy of the Moodle installer on my local workstation.
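One way to guard against swapped variables is a check near the top of the script that refuses to continue if any yesterday setting is empty or identical to its production counterpart. This is a sketch: check_distinct is a hypothetical helper, and the values assigned below are placeholders.

```shell
#!/bin/bash
# Sketch: refuse to run if a "yesterday" setting is empty or equals production.
check_distinct() {
  # $1 = label, $2 = production value, $3 = yesterday value.
  # Trailing slashes are trimmed so /var/www/moodle and /var/www/moodle/ match.
  local prod="${2%/}" yest="${3%/}"
  if [ -z "$prod" ] || [ -z "$yest" ]; then
    echo "ERROR: $1 is empty" >&2
    return 1
  fi
  if [ "$prod" = "$yest" ]; then
    echo "ERROR: $1 is identical for production and yesterday: $prod" >&2
    return 1
  fi
  return 0
}

# Placeholder values; use the real variables from the yesterday script
SOURCE_DB=moodle
YESTERDAY_DB=moodle_yesterday
SOURCE_MOODLEDATA=/var/moodledata
YESTERDAY_MOODLEDATA=/var/moodledata_yesterday

# Run checks like these before the first destructive step
if ! check_distinct "database name" "$SOURCE_DB" "$YESTERDAY_DB" ||
   ! check_distinct "moodledata folder" "$SOURCE_MOODLEDATA" "$YESTERDAY_MOODLEDATA"; then
  echo "Refusing to run: check the yesterday settings" >&2
  exit 1
fi
echo "Settings look distinct"
```

It only compares strings, so two different paths that reach the same folder through a symlink would still get past it.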

General notes

I have tried to make this very generic so that it will run on most servers with standard unix / unix-like tools installed. I did the testing on my Mac workstation, but it should also work on a Linux-based server. A Windows-based server would need something like Cygwin installed. I have added comments to explain each section. The script is intended to be set up to run once per day via cron.
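Because cron runs the script unattended, usually with a minimal PATH, it can help to confirm that the tools it depends on are reachable before anything is dropped or overwritten. A minimal sketch; require_tools is a hypothetical helper name and the MYSQL_PATH default is a placeholder.

```shell
#!/bin/bash
# Sketch: confirm the tools the yesterday script calls are reachable before
# anything is dropped or overwritten. cron usually runs with a minimal PATH.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v also accepts a full path such as /usr/local/mysql/bin/mysql
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder; use the same MYSQL_PATH value as the yesterday script
MYSQL_PATH=${MYSQL_PATH:-/usr/local/mysql/bin}

# In the yesterday script itself, add "exit 1" inside this if block
if ! require_tools "$MYSQL_PATH/mysql" "$MYSQL_PATH/mysqldump" sed rsync; then
  echo "Fix PATH / MYSQL_PATH or install the missing tools first" >&2
fi
```

A crontab entry such as 30 2 * * * /usr/local/sbin/yesterday.sh >> /var/log/yesterday.log 2>&1 (placeholder paths) would then run the script at 2:30 each night and keep its output for troubleshooting.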

Is it going to nearly double the size of the storage / virtual machine that runs Moodle? 

It depends a bit on the file system sitting behind the VM. Generally speaking, we assume it will at least double the storage used. So one assumption behind this approach is that the extra storage is cheaper than either the labor to do manual restores for teachers, or the extra server resources to run Moodle course backups (which also likely double storage use or more). Also keep in mind that the yesterday instance is going to see a lot less usage, so it could be run on cheaper storage (it could also be a separate VM with fewer resources if you want to isolate things more). Additionally, a lot of high- and mid-tier storage solutions offer built-in de-duplication. With dedup in use, the VM sees double the storage being used, but the backend stores much less than double. This is a key advantage over automated course backups, which compress the data into zip files that are much less likely to match up with the dedup functionality on the backend.
Your storage solution might also support snapshots, which can be used in a similar fashion without taking as much of a storage hit. However, most snapshots are read-only, so you would still end up copying files out when it came time to actually use a snapshot to restore a course. Database copies taken from snapshots can also be less reliable unless the storage solution specifically supports them.
A more sophisticated version of this solution could use hard links to keep more than a single copy of the yesterday instance. You could integrate this with something like rsnapshot, a backup solution that uses rsync and hard links to keep multiple backup copies while saving disk space. The smallest footprint for the yesterday instance would probably come from integrating it into the backup solution itself. At the same time, SATA disks are extremely inexpensive these days, so the engineering time needed to optimize the space used by a single site may not be worth it compared with a drive that costs less than $100.
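To give a rough idea of the hard-link approach, here is a sketch built on rsync's --link-dest option: each run writes a new dated copy, and files that have not changed since the previous copy become hard links to it instead of new data. The snapshot function, folder names, and dates are hypothetical, and only files are handled; each copy would still need its own database and config.php to be usable as a yesterday instance.

```shell
#!/bin/bash
# Sketch: keep several dated copies of a folder, hard-linking unchanged files
# to the previous copy with rsync --link-dest (the idea rsnapshot builds on).
# Function name, folder names and dates are hypothetical.
snapshot() {
  # $1 = source folder, $2 = folder holding the copies, $3 = name of the new copy
  local src="$1" root="$2" name="$3" latest
  mkdir -p "$root"
  # Newest existing copy other than the one being written (names sort by date)
  latest=$(ls -1 "$root" | grep -Fvx -- "$name" | sort | tail -n 1)
  if [ -n "$latest" ]; then
    # Files identical to the previous copy become hard links instead of new data
    rsync -a --delete --link-dest="$root/$latest" "$src/" "$root/$name/"
  else
    rsync -a --delete "$src/" "$root/$name/"
  fi
}

# Demo with temporary folders standing in for moodledata
src=$(mktemp -d)
snaps=$(mktemp -d)
echo "unchanged course file" > "$src/keep.txt"
echo "v1" > "$src/edit.txt"
snapshot "$src" "$snaps" 2016-01-07

echo "version two" > "$src/edit.txt"
snapshot "$src" "$snaps" 2016-01-08

# keep.txt has two names but one copy of its data; edit.txt has two copies
ls -li "$snaps"/*/keep.txt "$snaps"/*/edit.txt
```

Older dated folders can simply be deleted with rm -rf; the data of a hard-linked file is only freed once the last copy that links to it is gone.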

Does this script duplicate the database and the Moodle data files? 

Yes, it duplicates both of these, as well as the Moodle source code.
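If you want to confirm that a run really did duplicate the Moodle data, rsync can compare the two folders without copying anything. This is a sketch: copy_differences is a hypothetical helper, it skips the cache folder because the script clears that on purpose, and the result is only meaningful right after the script runs, since production keeps changing during the day.

```shell
#!/bin/bash
# Sketch: list differences between production moodledata and the yesterday
# copy without changing anything. Empty output means the copies match.
copy_differences() {
  # $1 = production folder, $2 = yesterday folder
  # --dry-run makes rsync report what it would do; /cache/ is skipped because
  # the yesterday script deliberately empties the yesterday cache.
  rsync -a --delete --dry-run --itemize-changes --exclude='/cache/' "$1/" "$2/"
}

# Demo with temporary folders standing in for the two moodledata folders
prod=$(mktemp -d)
yest=$(mktemp -d)
mkdir -p "$prod/filedir" "$prod/cache"
echo "course file" > "$prod/filedir/f1"
echo "cached value" > "$prod/cache/c1"

rsync -a --delete "$prod/" "$yest/"   # what the nightly script does
rm -rf "${yest:?}"/cache/*            # followed by clearing the yesterday cache

diff_out=$(copy_differences "$prod" "$yest")
if [ -z "$diff_out" ]; then
  echo "yesterday copy matches production"
else
  echo "$diff_out"
fi
```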

Does this also create the new yesterday web directory, or will that need to be done manually first?

The script will duplicate the folders to the paths listed. As long as every directory up to the last one exists, rsync will normally create that last directory for you. It's up to you to configure the web server, if needed, to use the new folder. I think in my test case I just made it a subdirectory of the webroot, so I didn't have to make any config changes. I would recommend configuring a new folder that is not a subdirectory of the Moodle folder, as this can cause you problems when you try to do upgrades and can also have backup implications. I would then make the new folder a virtual host, something like http://yesterday.my_moodle.domain, so that it's easy for users to remember.
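A quick check can catch the subdirectory layout recommended against above before it causes trouble; with the yesterday code inside the Moodle folder, for example, the source code rsync would pick up the yesterday folder itself and copy it into itself. A minimal sketch; is_inside is a hypothetical helper that compares plain path strings, and the paths are placeholders.

```shell
#!/bin/bash
# Sketch: detect a yesterday folder placed inside the production Moodle folder.
# Paths are compared as plain strings (trailing slashes trimmed); symlinks and
# ".." components are not resolved, so treat this as a sanity check only.
is_inside() {
  # Succeeds when $1 is the same folder as $2 or nested somewhere under it
  local child="${1%/}" parent="${2%/}"
  case "$child/" in
    "$parent"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder paths; this layout is the one to avoid
SOURCE_MOODLE=/var/www/html/moodle
YESTERDAY_MOODLE=/var/www/html/moodle/yesterday

if is_inside "$YESTERDAY_MOODLE" "$SOURCE_MOODLE"; then
  echo "WARNING: $YESTERDAY_MOODLE is inside $SOURCE_MOODLE" >&2
fi
```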

Let me know if you end up using this or a variation. I would value feedback on how the generic version works out for you.

Example Yesterday Script
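#!/bin/bash
# Stop on the first error, on unset variables, and on failures inside pipelines
set -euo pipefail
# Yesterday instance script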
# Copyright Jonathan Moore 
# eLearning Consultancy
# Released under GPLv3 license
# Script to copy a Moodle site to yesterday instance on the same server
# Install script into a non-web folder, run daily from cron during off hours
# A more sophisticated version could use data from a backup at least 12-24 hours old so that the
# 'yesterday' instance always lags the site by a set interval. This basic example is meant to be run at night during
# off hours to provide a safety net for accidental course edits / errors made during the next day.
# This is intended as an alternative to using Moodle's automated course backups, which are very I/O and CPU intensive
# Requires a working Moodle environment running MySQL, plus sed and rsync (all common in unix / unix-like environments)

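# Fill in locations and variables here for your site
# Every value below is a placeholder; replace each one before running
# MYSQL_PATH is set because my dev environment doesn't put mysql on the PATH
MYSQL_PATH=/usr/local/mysql/bin
# Note that a .my.cnf file with 0600 permissions is a more secure option for the DB password; a variable is used here for simplicity
DB_USER=moodle_user
DB_PASS=change_me
SOURCE_DB=moodle
YESTERDAY_DB=moodle_yesterday
URL=http://my_moodle.domain
YESTERDAY_URL=http://yesterday.my_moodle.domain
SOURCE_MOODLE=/var/www/moodle
YESTERDAY_MOODLE=/var/www/moodle_yesterday
SOURCE_MOODLEDATA=/var/moodledata
YESTERDAY_MOODLEDATA=/var/moodledata_yesterday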

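# Make a fresh copy of the database and update the site URL
echo "Dropping yesterday database"
# The DB user needs rights to drop and create the yesterday database;
# add a CHARACTER SET clause if your server default differs from production
"$MYSQL_PATH/mysql" -u "$DB_USER" -p"$DB_PASS" -e "DROP DATABASE IF EXISTS $YESTERDAY_DB; CREATE DATABASE $YESTERDAY_DB;"
echo "Copying database"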
# Note mysqldump isn't the most efficient way to duplicate a DB, but it is a very general approach and works well as long as the DB isn't too large
# --single-transaction gives a consistent copy of InnoDB tables without locking the live site
# sed rewrites the site URL inline while the database is copied
"$MYSQL_PATH/mysqldump" --single-transaction -u "$DB_USER" -p"$DB_PASS" "$SOURCE_DB" | sed -e "s|$URL|$YESTERDAY_URL|g" | "$MYSQL_PATH/mysql" -u "$DB_USER" -p"$DB_PASS" "$YESTERDAY_DB"

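# Rsync Moodle source to yesterday source (--delete prevents orphaned files; -a already includes -r and -p)
# Syncing "dir/" rather than "dir/*" also copies dotfiles and lets --delete remove top-level files deleted from the source
echo "Copying source code"
rsync -a --delete "$SOURCE_MOODLE/" "$YESTERDAY_MOODLE/"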

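# Set correct config.php values for the yesterday instance: database name, site URL, and moodledata path
echo "Updating config.php to path of yesterday instance"
# Only the dbname line is changed, so a DB user that shares the database name is left alone
# -i.bak works with both GNU and BSD (Mac) sed; the backup is deleted so it isn't left in the web folder
sed -i.bak \
  -e "/dbname/s|'$SOURCE_DB'|'$YESTERDAY_DB'|" \
  -e "s|$URL|$YESTERDAY_URL|g" \
  -e "s|$SOURCE_MOODLEDATA|$YESTERDAY_MOODLEDATA|g" \
  "$YESTERDAY_MOODLE/config.php"
rm -f "$YESTERDAY_MOODLE/config.php.bak"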

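# Rsync moodledata to yesterday moodledata (--delete prevents orphaned files)
echo "Copying Moodle data"
# rsync exit code 24 (source files vanished mid-copy) is expected on a live site, so it is not treated as a failure
rsync -a --delete "$SOURCE_MOODLEDATA/" "$YESTERDAY_MOODLEDATA/" || [ $? -eq 24 ]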

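# Delete the yesterday moodledata cache to avoid weird errors in the Moodle UI (similar to running purge caches from the UI)
# Only the yesterday copy is cleared; ${VAR:?} aborts if the variable is empty instead of expanding to /cache/*
echo "Clearing cache"
rm -rf "${YESTERDAY_MOODLEDATA:?}"/cache/*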