
Re: [Bacula-devel] [Bacula-users] Improving job scheduling flexibility

On Monday 25 February 2008 18.47:50 mark.bergman@xxxxxxxxxxxxxx wrote:
> In the message dated: Sat, 23 Feb 2008 12:40:43 +0100,
> Kern Sibbald  used the subject line
> 	<[Bacula-users] Improving job scheduling flexibility>
> and wrote:
> => Hello,
> =>
> => As you know, current job scheduling has a few deficiencies, particularly
> => if for some reason your backups get blocked (a bad tape driver or
> => operator intervention required), which can lead to a big pile of
> => duplicate jobs being scheduled.
> Or if a job takes so long that it is still running when the next instance
> of the same job is launched (i.e., a backup that takes more than 24 hours).
> 	[SNIP!]
> =>
> => My current idea is to create a new "DuplicateJobs" resource and a new
> => Duplicate Jobs directive which would point to the duplicate jobs
> => resource.
> Sounds great!
> => The reason for the resource is that there are just too many different
> => variations that it would require a lot of new directives, and it seems a
> => shame to add them to every Job.
> =>
> => My current design calls for a Duplicate Jobs resource that looks
> => something like the following:
> =>
> => DuplicateJobs {
> 	[SNIP!]
> =>
> =>   Job Proximity = <time-interval>  (0)
> =>
> => }
> =>
> 	[SNIP!]
> =>
> => Finally Job Proximity is to allow a bit of overlap.  For example, if a
> => job has been running 20 minutes or ran 20 minutes ago, you might want
> => to not apply the rules.
> Could you elaborate on what this means to you a bit more?

I think I was confused and stated it backwards.  Anyway, the Job Proximity 
directive was proposed by David Boyes, so perhaps he could give us a 
definitive definition :-)

> I see the distinction here being mainly in terms of jobs that take a "long"
> time vs. a "short" time. If the entire job normally takes 30 minutes, I
> don't really care whether there's a duplicate, and it doesn't matter to me
> if the duplicate starts 1 minute after the original or 29 minutes after.
> However, if the job normally takes 18 hours, then the conditions are very
> different. In this case, I really, really, really don't want a duplicate
> running if there's a lot of overlap--this would have a major effect on disk
> loads on the client, on network traffic, and on disk/cpu/media resource on
> the bacula server. However, if the original job is almost near completion
> when the duplicate is launched, then I don't want to cancel the duplicate.
> In this case, the reasoning is that canceling the duplicate would result in
> a long window with no backups, in an effort to close a small window of
> duplicate (simultaneous) backups running.

I can see the usefulness of the above, and don't want to rule it out, but it 
probably requires more time to implement than I have for the current 
enhancement.  This time around, I am really targeting the problem of 
multiple jobs being scheduled and piling up waiting for execution due to 
something "blocking" or taking too long.
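
To make the pile-up problem concrete, here is a minimal Python sketch of one 
possible rule: refuse to queue a job that is already waiting or running.  The 
function name and the "skip the newcomer" policy are assumptions for 
illustration only; the actual directives are still under discussion.

```python
def enqueue(queue, job_name, running=()):
    """Queue job_name unless an instance is already waiting or running.

    Hypothetical helper: returns True if the job was queued, False if it
    was skipped as a duplicate.
    """
    if job_name in queue or job_name in running:
        return False          # duplicate: do not let it pile up
    queue.append(job_name)
    return True
```

With this rule a blocked tape drive leaves at most one waiting instance per 
job, no matter how many times the schedule fires in the meantime.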

> Here's a very complicated proposal, which will almost certainly be
> rejected, that really leverages Bacula's database backend and gives a
> really powerful feature:
> 	if the job historically takes over $DURATION [minutes|hours|days]
> 	and the current job is at least $PERCENTAGE complete, then allow the
> 	duplicate to run, otherwise kill the duplicate
> 		In this case, $DURATION would be determined from database stats,
> 		as an average of previous runs of the same job at the same level.
> 		I could also see an algorithm that gives more weight to the
> 		duration of the most recent backups if the standard deviation of
> 		the average vs. the most recent backups is greater than a
> 		specified value. This is because a given backup is more likely to
> 		take "almost as much" time as the most recent backup of the same
> 		level than as much time as a much earlier backup.
> 		Similarly, the $PERCENTAGE value could be expressed as a range,
> 		incorporating the standard deviation in the backup duration.

I think you have something there, so you might want to put the above into a 
Feature Request. I don't think it will get implemented in the near future due 
to the long list of big, important projects that we have, but it would be a 
good way to ensure that the idea is not lost.
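
For anyone writing up that Feature Request, the rule Mark describes could be 
sketched roughly as follows in Python.  This is only one reading of the 
proposal: the function names, the parameters, the "use only the most recent 
runs when the spread is large" simplification, and the choice to let 
duplicates of short jobs run are all assumptions, and the durations are 
assumed to come from the catalog for the same job at the same level.

```python
from statistics import mean, pstdev

def expected_duration(history, stdev_threshold, recent_n=3):
    """Estimate how long the job takes, from past durations (oldest first).

    If past durations vary by more than stdev_threshold, trust only the
    most recent recent_n runs; otherwise use the plain average.
    """
    if not history:
        raise ValueError("no previous runs to estimate from")
    if len(history) > 1 and pstdev(history) > stdev_threshold:
        return mean(history[-recent_n:])
    return mean(history)

def allow_duplicate(history, elapsed, long_threshold, min_fraction,
                    stdev_threshold=0.0):
    """Decide whether a duplicate may start while the original is running.

    Short jobs (expected duration <= long_threshold): always allow
    (assumption).  Long jobs: allow only if the running job is at least
    min_fraction complete, judged by elapsed time against the estimate.
    """
    estimate = expected_duration(history, stdev_threshold)
    if estimate <= long_threshold:
        return True
    return elapsed / estimate >= min_fraction
```

For example, with past durations of 10, 10, 10, 18, 18 and 18 hours and a 
spread threshold of 1 hour, the estimate is taken from the last three runs, 
so a duplicate would be allowed 17 hours into the current run but killed 
2 hours in (with min_fraction set to 0.9).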

> [As an aside, I'd like to see this kind of predictive/AI capability put
> into more of bacula, particularly in the scheduling. It would be wonderful
> to use the historic records to allow bacula to schedule jobs most
> efficiently, in a way similar to Amanda, rather than hard-coding specific
> times in each job resource.]

Virtually everyone that I have talked to, especially in companies, says that 
they do not like Amanda's way of scheduling jobs.  That said, I don't rule 
out doing something like they do, and certainly the new "Max Full Age" 
directive goes in that direction.  

However, at the current time, I would suggest that if you would like AI 
features, by all means turn off Bacula scheduling and implement a Perl script 
that does the scheduling.  After you have a bit of experience with your 
system, I would be really interested in hearing about it.  I suspect that you 
will find that it takes a lot of work and many iterations to get AI-type 
features working correctly -- at least that would be the case for me.
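
As a starting point, such an external scheduler might look something like the 
sketch below (in Python rather than Perl).  It assumes jobs are defined 
without a Schedule, that bconsole is on the PATH and already configured, and 
that job names contain no spaces; the function names and the "last Full is 
too old" criterion are illustrative only.

```python
import subprocess
from datetime import datetime, timedelta

def jobs_needing_full(last_full, now, max_age):
    """Return the names of jobs whose last Full is missing or too old.

    last_full maps job name -> datetime of the last Full (or None).
    """
    return sorted(name for name, when in last_full.items()
                  if when is None or now - when > max_age)

def run_command(job, level="Full"):
    """Build the console command that starts a job without prompting."""
    return f"run job={job} level={level} yes\n"

def submit(job, level="Full"):
    """Feed the run command to bconsole on standard input."""
    subprocess.run(["bconsole"], input=run_command(job, level),
                   text=True, check=True)
```

A cron entry could then call jobs_needing_full() with dates taken from the 
catalog and submit() each job it returns; anything smarter can be layered on 
top of that decision function.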

Best regards,


> =>
> => As you can see, there is a lot of room for clarification of what should
> => be done, and also a need for a bit more functionality ... -- in other
> => words a bit more design is needed before beginning the implementation.
> =>
> => Comments?
> =>
> => Best regards,
> =>
> => Kern
> =>
> ----
> Mark Bergman     mark.bergman@xxxxxxxxxxxxxx     215-662-7310
> System Administrator     Section of Biomedical Image Analysis
> Department of Radiology            University of Pennsylvania
>        PGP Key at: https://www.rad.upenn.edu/sbia/bergman
