Re: [Bacula-devel] [Bacula-users] Improving job scheduling flexibility
25.02.2008 18:47, mark.bergman@xxxxxxxxxxxxxx wrote:
> In the message dated: Sat, 23 Feb 2008 12:40:43 +0100,
> Kern Sibbald used the subject line
> <[Bacula-users] Improving job scheduling flexibility>
> and wrote:
> => Finally Job Proximity is to allow a bit of overlap. For example, if a job has
> => been running 20 minutes or ran 20 minutes ago, you might want to not apply
> => the rules.
> Could you elaborate on what this means to you a bit more?
> I see the distinction here being mainly in terms of jobs that take a "long"
> time vs. a "short" time. If the entire job normally takes 30 minutes, I don't
> really care whether there's a duplicate, and it doesn't matter to me if the
> duplicate starts 1 minute after the original or 29 minutes after.
> However, if the job normally takes 18 hours, then the conditions are very
> different. In this case, I really, really, really don't want a duplicate running
> if there's a lot of overlap--this would have a major effect on disk loads on the
> client, on network traffic, and on disk/cpu/media resource on the bacula server.
> However, if the original job is almost near completion when the duplicate is
> launched, then I don't want to cancel the duplicate. In this case, the reasoning
> is that canceling the duplicate would result in a long window with no backups,
> in an effort to close a small window of duplicate (simultaneous) backups
That's a very important distinction, and Kern's proposal will work only
if the person setting up the jobs correctly estimates the expected job
run time. As such, it's a "good enough" solution IMO.
> Here's a very complicated proposal, which will almost certainly be rejected,
Let's wait... I would vote for it :-)
> that really leverages Bacula's database backend and gives a really powerful
> mechanism:
> if the job historically takes over $DURATION [minutes|hours|days]
> and the current job is at least $PERCENTAGE complete, then allow the duplicate
> to run, otherwise kill the duplicate
> in this case, $DURATION would be determined from database stats,
> as an average of previous runs of the same job at the same level.
> I could also see an algorithm that
> gives more weight to the duration of the most recent backups when they
> deviate from the historical average by more than a specified
> value. This is because a given backup is
> more likely to take "almost as much" time as the most recent backup
> of the same level than as much time as a much earlier backup.
> similarly, the $PERCENTAGE value could be expressed as a range,
> incorporating the standard deviation in the backup duration
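
The decision rule proposed above could be sketched roughly as follows.
All names and thresholds here are invented for illustration; this is
not Bacula code, just the logic of the proposal in Python:

```python
from statistics import mean, stdev

def allow_duplicate(past_durations, elapsed, min_duration=3600,
                    completion_threshold=0.8, recent_weight=2.0):
    """Decide whether a duplicate job may run alongside the original.

    past_durations: run times (seconds) of previous successful jobs of
    the same name and level, oldest first.  All parameter names and
    default values are hypothetical.
    """
    if not past_durations:
        return True  # no history: let the duplicate run

    avg = mean(past_durations)
    # Weight the most recent run more heavily when history is volatile,
    # since the next run is likely to resemble the latest one.
    if len(past_durations) > 1 and stdev(past_durations) > 0.25 * avg:
        expected = ((avg + recent_weight * past_durations[-1])
                    / (recent_weight + 1))
    else:
        expected = avg

    if expected <= min_duration:
        return True  # short jobs: duplicates are harmless
    # Long job: only allow the duplicate if the original is nearly done.
    return elapsed / expected >= completion_threshold
```

With these defaults, a job that historically takes 30 minutes always
admits duplicates, while an 18-hour job only admits one once the
original is at least 80% through its expected duration.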
Yes. Actually, I don't think this is so hard to get out of the
catalog. Of course, I would need quite a number of attempts in SQL to
get reasonable results, but still it should be possible with a
reasonable amount of work invested.
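
As a rough sketch of the catalog query involved: the Bacula catalog
keeps a Job table with Name, Level, StartTime, EndTime and JobStatus
columns. The snippet below uses SQLite as a stand-in (the real catalog
runs on MySQL or PostgreSQL, where date arithmetic differs), so treat
it as an illustration of the idea, not working catalog code:

```python
import sqlite3

def recent_durations(conn, name, level, limit=10):
    """Return run times (seconds) of the last successful jobs with the
    given name and level, oldest first.  SQLite strftime('%s', ...)
    converts a timestamp to Unix epoch seconds; real catalog backends
    would use their own date functions."""
    rows = conn.execute(
        """SELECT strftime('%s', EndTime) - strftime('%s', StartTime)
           FROM Job
           WHERE Name = ? AND Level = ? AND JobStatus = 'T'
           ORDER BY StartTime DESC LIMIT ?""",
        (name, level, limit)).fetchall()
    return [r[0] for r in reversed(rows)]
```

The resulting list feeds directly into an average/standard-deviation
computation of the kind discussed above.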
> [As an aside, I'd like to see this kind of predictive/AI capability put into
> more of bacula, particularly in the scheduling. It would be wonderful to use
> the historic records to allow bacula to schedule jobs most efficiently, in a
> way similar to Amanda, rather than hard-coding specific times in each job
I also agree, though I find that in many cases, especially due to
auditing requirements or scheduled removal of media to off-site
storage, a fixed backup schedule is required despite the easier setup
that such automatic scheduling would offer. I would like to be able to
express something like "job retention time = 8 months", "keep jobs =
6" and "job max interval = 5 weeks" (for full backups), and have
Bacula decide when to run full backups and which old ones to purge...
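
In a Job resource of bacula-dir.conf, such a policy might look roughly
like the following. These directives are entirely hypothetical and do
not exist in Bacula; this is only a sketch of the idea:

```
# Hypothetical sketch only -- these directives do not exist in Bacula.
Job {
  Name = "client-fs-backup"       # invented example name
  # Constraints only; the Director picks run times and purges:
  Full Job Retention = 8 months   # never keep a full longer than this
  Keep Full Jobs = 6              # always retain six full backups
  Max Full Interval = 5 weeks     # run a full at least every 5 weeks
}
```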