Good stuff for programming geeks
[ start | index | login or register ]
start > 2004-03-17

2004-03-17

Created by tmoertel. Last edited by tmoertel 2359 days ago. Viewed 1539 times. #1
[edit] [rdf]
labels
attachments

Fractal backup schedules

I have been working on some code to handle the aging and expiry of backups. The idea is that I want to keep a library of backups that maintains a sensible distribution of recent and old backups.

What's a sensible distribution? It stands to reason that I'm more likely to want a recent backup than an old, old backup. So I ought to maintain my library such that the bulk of my online storage is devoted to new backups and a smaller portion to my old backups.

Ideally, I would have a smooth transition between old and new. A fairly well-known distribution that does just that follows a power-of-2 rule. You have yesterday's backup (1 day old), the day before yesterday's (2 days old), then two days before that (4 days old), then four days before that (8 days old), and so on:

o  ============================================
o  BACKUP RULE = POWER OF 2
o  
o  Day   No.   Backup #
o        of              1111111111222222222233
o        Bkps  01234567890123456789012345678901
o  ============================================
o    0     1   v
o    1     2   vv
o    2     3   vvv
o    3     3   v vv
o    4     4   v vvv
o    5     4   v v vv
o    6     4   v   vvv
o    7     4   v   v vv
o    8     5   v   v vvv
o    9     5   v   v v vv
o   10     5   v   v   vvv
o   11     5   v   v   v vv
o   12     5   v       v vvv
o   13     5   v       v v vv
o   14     5   v       v   vvv
o   15     5   v       v   v vv
o   16     6   v       v   v vvv
o   17     6   v       v   v v vv
o   18     6   v       v   v   vvv
o   19     6   v       v   v   v vv
o   20     6   v       v       v vvv
o   21     6   v       v       v v vv
o   22     6   v       v       v   vvv
o   23     6   v       v       v   v vv
o   24     6   v               v   v vvv
o   25     6   v               v   v v vv
o   26     6   v               v   v   vvv
o   27     6   v               v   v   v vv
o   28     6   v               v       v vvv
o   29     6   v               v       v v vv
o   30     6   v               v       v   vvv
o   31     6   v               v       v   v vv
o  ============================================

This works out nicely because the number of backups you must maintain is proportional to log2 of the days you have been following the power-of-2 schedule, which is easy on your storage resources. For example, looking at the plot above, after a month of following the schedule, we need only keep 6 backups on disk. Three years from now, we'll need only 12.

An interesting thing to note is the fractal nature of the schedule. Since most of the software I have seen uses the power-of-2 schedule, I was curious if other powers would also work well. So I wrote a more general scheduling algorithm and made some more plots of the resulting fractals.

When the power is close to 1.0, you get a poor distribution of backups. Far too many are clustered around the current backup. On those times when you must get an old backup, you'll likely have to go way back to get it. In contrast, the power 2 gives you a more gradual tapering from new to old. For comparison I have plotted both 1.25 and 2 below. (You may have to widen your window to see them side by side.)

o  ============================================    ============================================
o  BACKUP RULE = POWER OF 1.25                     BACKUP RULE = POWER OF 2
o 
o  Day   No.   Backup #                            Day   No.   Backup #
o        of              1111111111222222222233          of              1111111111222222222233
o        Bkps  01234567890123456789012345678901          Bkps  01234567890123456789012345678901
o  ============================================    ============================================
o    0     1   v                                     0     1   v
o    1     2   vv                                    1     2   vv
o    2     3   vvv                                   2     3   vvv
o    3     3   v vv                                  3     3   v vv
o    4     4   v vvv                                 4     4   v vvv
o    5     4   v  vvv                                5     4   v v vv
o    6     5   v  vvvv                               6     4   v   vvv
o    7     5   v   vvvv                              7     4   v   v vv
o    8     6   v   vvvvv                             8     5   v   v vvv
o    9     6   v    vvvvv                            9     5   v   v v vv
o   10     7   v    vvvvvv                          10     5   v   v   vvv
o   11     7   v     vvvvvv                         11     5   v   v   v vv
o   12     8   v     vvvvvvv                        12     5   v       v vvv
o   13     8   v      vvvvvvv                       13     5   v       v v vv
o   14     9   v      vvvvvvvv                      14     5   v       v   vvv
o   15     9   v       vvvvvvvv                     15     5   v       v   v vv
o   16     9   v       vvvvv vvv                    16     6   v       v   v vvv
o   17     9   v        vvvv vvvv                   17     6   v       v   v v vv
o   18     9   v        vvvv v vvv                  18     6   v       v   v   vvv
o   19    10   v        vvvv v vvvv                 19     6   v       v   v   v vv
o   20     9   v         vvv v vv vv                20     6   v       v       v vvv
o   21    10   v         vvv v vv vvv               21     6   v       v       v v vv
o   22    10   v          vv v vv vvvv              22     6   v       v       v   vvv
o   23    10   v          vv v vv  vvvv             23     6   v       v       v   v vv
o   24     9   v           v v vv  v vvv            24     6   v               v   v vvv
o   25    10   v           v v vv  v vvvv           25     6   v               v   v v vv
o   26     9   v             v vv  v v vvv          26     6   v               v   v   vvv
o   27    10   v             v vv  v v vvvv         27     6   v               v   v   v vv
o   28    10   v             v vv  v v  vvvv        28     6   v               v       v vvv
o   29    10   v             v vv  v v  v vvv       29     6   v               v       v v vv
o   30    10   v               vv  v v  v vvvv      30     6   v               v       v   vvv
o   31    10   v               vv  v v  v  vvvv     31     6   v               v       v   v vv
o  ============================================    ============================================

For fun I tried out two well-known mathematical constants, the golden ratio and the base of the natural logarithm e. Both are interesting, but neither seems better than 2, although the golden ratio's plot does look pretty good:

o  ============================================    ============================================
o  BACKUP RULE = POWER OF GOLDEN RATIO ~ 1.6180    BACKUP RULE = POWER OF EXP(1) ~ 2.718281
o 
o  Day   No.   Backup #                            Day   No.   Backup #
o        of              1111111111222222222233          of              1111111111222222222233
o        Bkps  01234567890123456789012345678901          Bkps  01234567890123456789012345678901
o  ============================================    ============================================
o    0     1   v                                     0     1   v
o    1     2   vv                                    1     2   vv
o    2     3   vvv                                   2     3   vvv
o    3     3   v vv                                  3     4   vvvv
o    4     4   v vvv                                 4     4   v vvv
o    5     5   v vvvv                                5     4   v  vvv
o    6     4   v  v vv                               6     5   v  vvvv
o    7     5   v  v vvv                              7     5   v  v vvv
o    8     5   v    vvvv                             8     5   v  v  vvv
o    9     6   v    vvvvv                            9     5   v  v   vvv
o   10     5   v    v v vv                          10     5   v  v    vvv
o   11     6   v    v v vvv                         11     6   v  v    vvvv
o   12     7   v    v v vvvv                        12     5   v       v vvv
o   13     6   v    v v  v vv                       13     5   v       v  vvv
o   14     6   v      v  v vvv                      14     6   v       v  vvvv
o   15     6   v      v    vvvv                     15     6   v       v  v vvv
o   16     7   v      v    vvvvv                    16     6   v       v  v  vvv
o   17     6   v      v    v v vv                   17     6   v       v  v   vvv
o   18     7   v      v    v v vvv                  18     6   v       v  v    vvv
o   19     6   v           v   vvvv                 19     6   v       v  v     vvv
o   20     7   v           v   vvvvv                20     5   v       v         vvv
o   21     6   v           v   v v vv               21     5   v       v          vvv
o   22     7   v           v   v v vvv              22     5   v       v           vvv
o   23     8   v           v   v v vvvv             23     5   v       v            vvv
o   24     7   v           v   v v  v vv            24     6   v       v            vvvv
o   25     7   v           v     v  v vvv           25     6   v       v            v vvv
o   26     7   v           v     v    vvvv          26     7   v       v            v vvvv
o   27     8   v           v     v    vvvvv         27     7   v       v            v v vvv
o   28     7   v           v     v    v v vv        28     7   v       v            v v  vvv
o   29     8   v           v     v    v v vvv       29     7   v       v            v v   vvv
o   30     9   v           v     v    v v vvvv      30     6   v                    v v    vvv
o   31     8   v           v     v    v v  v vv     31     7   v                    v v    vvvv
o  ============================================    ============================================

The power-of-2 rule does have one edge over the other, fractional powers: The number of backups you have on hand is non-decreasing. You never "lose" a backup, which might make the system easier to explain to sysadmins. But the best reason to use the power of 2 is that it seems to give the best distribution between old and new backups.

Well, enough contemplating. Back to coding.

no comments | post comment
community.moertel.com | Copyright © 2003–07 Moertel Consulting