Fractal backup schedules
I have been working on some code to handle the aging and expiry of backups. The
idea is that I want to keep a library of backups that maintains a sensible distribution
of recent and old backups.
What's a sensible distribution? It stands to reason that I'm more
likely to want a recent backup than an old, old backup. So I ought
to maintain my library such that the bulk of my online storage is
devoted to new backups and a smaller portion to my old backups.
Ideally, I would have a smooth transition between old and new. A
fairly well-known distribution that does just that follows a
power-of-2 rule. You have yesterday's backup (1 day old), the day
before yesterday's (2 days old), then two days before that (4 days
old), then four days before that (8 days old), and so on:
o ============================================
o BACKUP RULE = POWER OF 2
o
o Day No. Backup #
o of 1111111111222222222233
o Bkps 01234567890123456789012345678901
o ============================================
o 0 1 v
o 1 2 vv
o 2 3 vvv
o 3 3 v vv
o 4 4 v vvv
o 5 4 v v vv
o 6 4 v vvv
o 7 4 v v vv
o 8 5 v v vvv
o 9 5 v v v vv
o 10 5 v v vvv
o 11 5 v v v vv
o 12 5 v v vvv
o 13 5 v v v vv
o 14 5 v v vvv
o 15 5 v v v vv
o 16 6 v v v vvv
o 17 6 v v v v vv
o 18 6 v v v vvv
o 19 6 v v v v vv
o 20 6 v v v vvv
o 21 6 v v v v vv
o 22 6 v v v vvv
o 23 6 v v v v vv
o 24 6 v v v vvv
o 25 6 v v v v vv
o 26 6 v v v vvv
o 27 6 v v v v vv
o 28 6 v v v vvv
o 29 6 v v v v vv
o 30 6 v v v vvv
o 31 6 v v v v vv
o ============================================
This works out nicely because the number of backups you must maintain
is proportional to log2 of the days you have been following the
power-of-2 schedule, which is easy on your storage resources. For
example, looking at the plot above, after a month of following the
schedule, we need only keep 6 backups on disk. Three years from
now, we'll need only 12.
An interesting thing to note is the fractal nature of the schedule.
Since most of the software I have seen uses the power-of-2 schedule,
I was curious if other powers would also work well. So I wrote a more
general scheduling algorithm and made some more plots of the
resulting fractals.
When the power is close to 1.0, you get a poor distribution of
backups. Far too many are clustered around the current backup.
On those times when you must get an old backup, you'll likely
have to go
way back to get it. In contrast, the power
2 gives you a more gradual tapering from new to old. For comparison
I have plotted both 1.25 and 2 below. (You may have to widen
your window to see them side by side.)
o ============================================ ============================================
o BACKUP RULE = POWER OF 1.25 BACKUP RULE = POWER OF 2
o
o Day No. Backup # Day No. Backup #
o of 1111111111222222222233 of 1111111111222222222233
o Bkps 01234567890123456789012345678901 Bkps 01234567890123456789012345678901
o ============================================ ============================================
o 0 1 v 0 1 v
o 1 2 vv 1 2 vv
o 2 3 vvv 2 3 vvv
o 3 3 v vv 3 3 v vv
o 4 4 v vvv 4 4 v vvv
o 5 4 v vvv 5 4 v v vv
o 6 5 v vvvv 6 4 v vvv
o 7 5 v vvvv 7 4 v v vv
o 8 6 v vvvvv 8 5 v v vvv
o 9 6 v vvvvv 9 5 v v v vv
o 10 7 v vvvvvv 10 5 v v vvv
o 11 7 v vvvvvv 11 5 v v v vv
o 12 8 v vvvvvvv 12 5 v v vvv
o 13 8 v vvvvvvv 13 5 v v v vv
o 14 9 v vvvvvvvv 14 5 v v vvv
o 15 9 v vvvvvvvv 15 5 v v v vv
o 16 9 v vvvvv vvv 16 6 v v v vvv
o 17 9 v vvvv vvvv 17 6 v v v v vv
o 18 9 v vvvv v vvv 18 6 v v v vvv
o 19 10 v vvvv v vvvv 19 6 v v v v vv
o 20 9 v vvv v vv vv 20 6 v v v vvv
o 21 10 v vvv v vv vvv 21 6 v v v v vv
o 22 10 v vv v vv vvvv 22 6 v v v vvv
o 23 10 v vv v vv vvvv 23 6 v v v v vv
o 24 9 v v v vv v vvv 24 6 v v v vvv
o 25 10 v v v vv v vvvv 25 6 v v v v vv
o 26 9 v v vv v v vvv 26 6 v v v vvv
o 27 10 v v vv v v vvvv 27 6 v v v v vv
o 28 10 v v vv v v vvvv 28 6 v v v vvv
o 29 10 v v vv v v v vvv 29 6 v v v v vv
o 30 10 v vv v v v vvvv 30 6 v v v vvv
o 31 10 v vv v v v vvvv 31 6 v v v v vv
o ============================================ ============================================
For fun I tried out two well-known mathematical constants, the golden
ratio and the base of the natural logarithm
e. Both are
interesting, but neither seems better than 2, although the golden ratio's plot does look
pretty good:
o ============================================ ============================================
o BACKUP RULE = POWER OF GOLDEN RATIO ~ 1.6180 BACKUP RULE = POWER OF EXP(1) ~ 2.718281
o
o Day No. Backup # Day No. Backup #
o of 1111111111222222222233 of 1111111111222222222233
o Bkps 01234567890123456789012345678901 Bkps 01234567890123456789012345678901
o ============================================ ============================================
o 0 1 v 0 1 v
o 1 2 vv 1 2 vv
o 2 3 vvv 2 3 vvv
o 3 3 v vv 3 4 vvvv
o 4 4 v vvv 4 4 v vvv
o 5 5 v vvvv 5 4 v vvv
o 6 4 v v vv 6 5 v vvvv
o 7 5 v v vvv 7 5 v v vvv
o 8 5 v vvvv 8 5 v v vvv
o 9 6 v vvvvv 9 5 v v vvv
o 10 5 v v v vv 10 5 v v vvv
o 11 6 v v v vvv 11 6 v v vvvv
o 12 7 v v v vvvv 12 5 v v vvv
o 13 6 v v v v vv 13 5 v v vvv
o 14 6 v v v vvv 14 6 v v vvvv
o 15 6 v v vvvv 15 6 v v v vvv
o 16 7 v v vvvvv 16 6 v v v vvv
o 17 6 v v v v vv 17 6 v v v vvv
o 18 7 v v v v vvv 18 6 v v v vvv
o 19 6 v v vvvv 19 6 v v v vvv
o 20 7 v v vvvvv 20 5 v v vvv
o 21 6 v v v v vv 21 5 v v vvv
o 22 7 v v v v vvv 22 5 v v vvv
o 23 8 v v v v vvvv 23 5 v v vvv
o 24 7 v v v v v vv 24 6 v v vvvv
o 25 7 v v v v vvv 25 6 v v v vvv
o 26 7 v v v vvvv 26 7 v v v vvvv
o 27 8 v v v vvvvv 27 7 v v v v vvv
o 28 7 v v v v v vv 28 7 v v v v vvv
o 29 8 v v v v v vvv 29 7 v v v v vvv
o 30 9 v v v v v vvvv 30 6 v v v vvv
o 31 8 v v v v v v vv 31 7 v v v vvvv
o ============================================ ============================================
The power-of-2 rule does have one edge over the other, fractional powers: The number of
backups you have on hand is non-decreasing. You never "lose" a
backup, which might make the system easier to explain to sysadmins. But the best reason to use the power of 2 is that it seems to give the best distribution between old and new backups.
Well, enough contemplating. Back to coding.