[Bacula-devel] Concurrency and priority

I've been looking at the logic that decides which priorities get run
first.  Here's the scenario I want to improve the behaviour for:

        Eight backup jobs are running at priority 10, and several more
        are queued (also at priority 10).  The number eight reflects the max
        concurrency of the director and storage.  Then a request comes
        in to perform a restore.  Naively, to make this run as soon as
        possible, the operator sets the priority to 1.  But it doesn't
        work -- the restore job waits until all eight jobs are done.
        This can take a long time, and for a good part of this time the
        system is basically idling, since only a couple of the job slots
        are in use.  If the priority is set to the same as the other
        jobs, it will be run when the *whole* queue is done.  The only
        workaround is to cancel all running jobs.

The behaviour is documented -- Bacula will never run two jobs with
different priorities at the same time.  The docs don't offer a
rationale, though.  Alan Brown (IIRC) mentioned that mixing priorities
can cause problems when backing up the Catalog, since that backup should
always run on its own so that other jobs don't fail spuriously.  That's
a good point, but it seems like a very heavy-handed solution to the
problem, so I considered some alternatives.

One is to allow priorities of *running* jobs to be changed
interactively, so that they get the same high priority as the restore
job.  It's a bit counter-intuitive, and the concurrency of the system
will suffer since no new backup jobs will start until the running ones
(and the restore) are done.

Another solution is to add an option to allow higher priority jobs to
run alongside jobs with lower priorities.  If the option is a global
setting, you need to schedule the Catalog backup manually outside the
normal backup time slots to avoid the potential problem.

This leads to my proposal: make the priority override option an
attribute for each Job.  This way an installation can turn it on
via JobDefs, and turn it off for the Catalog backup.  The attached patch
adds the keyword AllowMixedPriority (default false) to the Job resource.
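
With the patch applied, a configuration along these lines would enable
mixing for ordinary jobs and keep it off for the Catalog backup.  This
is only a sketch: the resource names are placeholders, and the other
directives a real Job needs (Client, FileSet, Storage, Pool and so on)
are left out.

```
# Hypothetical fragment; names are placeholders, required directives omitted.
JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Priority = 10
  # Keyword added by the attached patch (default is false)
  AllowMixedPriority = yes
}

Job {
  Name = "BackupCatalog"
  JobDefs = "DefaultJob"
  # The Catalog backup should still run on its own
  AllowMixedPriority = no
}
```

A restore started at priority 1 could then run alongside the ordinary
backups, but not alongside a running Catalog backup.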

(If accepted, I can write a documentation patch.)

Kjetil T. Homme
Linpro AS
Index: src/dird/dird_conf.c
--- src/dird/dird_conf.c	(revision 7263)
+++ src/dird/dird_conf.c	(working copy)
@@ -331,6 +331,7 @@
    {"selectiontype",      store_migtype, ITEM(res_job.selection_type), 0, 0, 0},
    {"usestatistics",      store_bool,  ITEM(res_job.stats_enabled), 0, 0, 0},
    {"accurate",           store_bool, ITEM(res_job.accurate), 0,0,0},
+   {"allowmixedpriority", store_bool, ITEM(res_job.allow_mixed_priority), 0, ITEM_DEFAULT, false},
    {"allowduplicatejobs", store_bool, ITEM(res_job.AllowDuplicateJobs), 0, ITEM_DEFAULT, false},
    {"allowhigherduplicates",   store_bool, ITEM(res_job.AllowHigherDuplicates), 0, ITEM_DEFAULT, true},
    {"cancelqueuedduplicates",  store_bool, ITEM(res_job.CancelQueuedDuplicates), 0, ITEM_DEFAULT, true},
Index: src/dird/jobq.c
--- src/dird/jobq.c	(revision 7263)
+++ src/dird/jobq.c	(working copy)
@@ -504,11 +504,22 @@
       Dmsg0(2300, "Done check ready, now check wait queue.\n");
       if (!jq->waiting_jobs->empty() && !jq->quit) {
          int Priority;
+         bool one_priority = true;
          je = (jobq_item_t *)jq->waiting_jobs->first();
          jobq_item_t *re = (jobq_item_t *)jq->running_jobs->first();
          if (re) {
             Priority = re->jcr->JobPriority;
             Dmsg2(2300, "JobId %d is running. Look for pri=%d\n", re->jcr->JobId, Priority);
+            one_priority = false;
+            for ( ; re;  ) {
+               if (!re->jcr->job->allow_mixed_priority) {
+                  one_priority = true;
+                  break;
+               }
+               re = (jobq_item_t *)jq->running_jobs->next(re);
+            }
+            Dmsg1(2300, "The running job(s) %s mixing priorities.\n",
+                  one_priority ? "don't allow" : "allow");
          } else {
             Priority = je->jcr->JobPriority;
             Dmsg1(2300, "No job running. Look for Job pri=%d\n", Priority);
@@ -526,7 +537,8 @@
                jcr->JobId, jcr->JobPriority, Priority);
             /* Take only jobs of correct Priority */
-            if (jcr->JobPriority != Priority) {
+            if (jcr->JobPriority > Priority ||
+                (jcr->JobPriority < Priority && one_priority)) {
                set_jcr_job_status(jcr, JS_WaitPriority);
Index: src/dird/dird_conf.h
--- src/dird/dird_conf.h	(revision 7263)
+++ src/dird/dird_conf.h	(working copy)
@@ -429,6 +429,7 @@
    bool enabled;                      /* Set if job enabled */
    bool OptimizeJobScheduling;        /* Set if we should optimize Job scheduling */
    bool stats_enabled;                /* Keep job records in a table for long term statistics */
+   bool allow_mixed_priority;         /* Allow jobs with higher priority concurrently with this */
    bool accurate;                     /* Set if it is an accurate backup job */
    bool AllowDuplicateJobs;           /* Allow duplicate jobs */
    bool AllowHigherDuplicates;        /* Permit Higher Level */
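
For illustration, the selection rule in the jobq.c hunk can be modelled
in isolation.  This is a minimal standalone sketch, not Director code:
struct running_job, must_keep_one_priority() and must_wait() are names
invented for the example, and ref_priority stands for the priority of
the first running job (or of the first waiting job when nothing is
running), as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a running job (not Bacula's JCR). */
struct running_job {
    int priority;               /* lower number = more urgent */
    bool allow_mixed_priority;  /* mirrors the proposed Job attribute */
};

/*
 * Corresponds to one_priority in the patch: true when nothing is running
 * (the patch's initial value) or when at least one running job does not
 * allow mixed priorities.
 */
static bool must_keep_one_priority(const struct running_job *running, size_t n)
{
    assert(running != NULL || n == 0);
    if (n == 0) {
        return true;
    }
    for (size_t i = 0; i < n; i++) {
        if (!running[i].allow_mixed_priority) {
            return true;
        }
    }
    return false;
}

/*
 * Corresponds to the rewritten priority test: a waiting job is held back
 * if it is less urgent than the reference priority, or if it is more
 * urgent but mixing is not permitted.
 */
static bool must_wait(int job_priority, int ref_priority, bool one_priority)
{
    return job_priority > ref_priority ||
           (job_priority < ref_priority && one_priority);
}
```

In the scenario from the mail, a restore at priority 1 arriving while
priority-10 backups are running is not held back by this test when every
running job allows mixing, and is held back as soon as one of them does
not.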