We use S3 to backup various kind of files on MB. We use the very convenient backup gem for that (we still use 3.9.0).
But at some point it appeared that backing up our audio recording was hammering disk IO on our server, because the syncer is calculating md5 footprint for each file each time a backup happens. When you get thousands of big files that is pretty expensive process (in our case 20k files and 50G total).
So I added a small trick there:
in Backup/models/backup_audio.rb
module Backup::Syncer::Cloud
class Base < Syncer::Base
def process_orphans
if @orphans.is_a?(Queue)
@orphans = @orphans.size.times.map { @orphans.shift }
end
"Older Files: #{ @orphans.count }"
end
end
end
Backup::Model.new(:backup_audio, 'Audio files Backup to S3') do
before do
system("/Backup/latest_audio.sh")
end
after do
FileUtils.rm_rf("/tmp/streams")
end
##
# Amazon Simple Storage Service [Syncer]
#
sync_with Cloud::S3 do |s3|
s3.access_key_id = "xxx"
s3.secret_access_key = "xxx"
s3.bucket = "bucket_backup"
s3.region = "us-east-1"
s3.path = "/mb_audio_backup"
s3.mirror = false
s3.thread_count = 50
s3.directories do |directory|
directory.add "/tmp/streams"
end
end
end
and in Backup/latest_audio.sh
#!/bin/sh
# isolate files changed in the last 3 days
TMPDIR=/tmp/streams
mkdir $TMPDIR
for i in `find /storage/audio/ -type f -cmin -4320`; do
ln -s $i $TMPDIR
done
It creates a fake backup dir with links to the files that actually changed in the last 3 days and patches the syncer to avoid flooding the logs with orphan files. When sometimes S3 upload fails on one file (and it happens from time to time for ‘amazonian’ reason) it will be caught on the next daily backup.
The result was pretty obvious on our disk usage with our daily backups:
in the logs:
[2014/06/10 07:00:25][info] Summary:
[2014/06/10 07:00:25][info] Transferred Files: 5
[2014/06/10 07:00:25][info] Older Files: 22371
[2014/06/10 07:00:25][info] Unchanged Files: 16
[2014/06/10 07:00:25][info] Syncer::Cloud::S3 Finished!
[2014/06/10 07:00:25][info] Backup for 'Audio files Backup to S3 (backup_audio)' Completed Successfully in 00:00:22
[2014/06/10 07:00:25][info] After Hook Starting...
[2014/06/10 07:00:25][info] After Hook Finished.