
Re: [Bacula-devel] Autoloader issue (was: Patch: Migration jobmedia table insert incomplete)


Hello,

I am sorry, but this email is not something we can act on because:

1. The background is snipped, and the problem is not described sufficiently 
(see www.bacula.org -> Bugs for information; e.g. we don't even know what 
version of Bacula you are using).

2. Unfortunately your patch will not help you.  It is a NOOP -- i.e. it 
doesn't change anything since at the points you are testing NumConcurrentJobs 
is *guaranteed* to be non-zero (at least in the current code base).


Best regards,

Kern

On Sunday 09 March 2008 01.34:06 Peter Much wrote:
> <al@xxxxxxxxxxxxxx> aka Arno Lehmann wrote
>
> on Sun, 02 Mar 2008 12:50:17 +0100 in m2n.bacula.devel:
> |> 2. This is the thing that I have been worrying the most about. I
> |>    have been following various theories about what might happen
> |>    there, yet to no avail. The last of my theories was that it might
> |>    have to do with the migrations, but currently I tend to dismiss
> |>    this theory also. In fact, I am still clueless.
> |>    What happens is that the Director puts all jobs (and all newly
> |>    started jobs) into either "waiting on max Storage jobs" or
> |>    "waiting execution", while there is no job running on any client
> |>    and no job running on the SD. It just does nothing and has to
> |>    be restarted.
> |
> |That definitely qualifies as a bug... have you tried looking at the
> |debug output, once the DIR is in this state?
>
> This was a good hint. The debug shows this:
> >BxDir: jcr.c:603-0 OnEntry JobStatus=s set=s
> >BxDir: jcr.c:623-0 OnExit JobStatus=s set=s
> >BxDir: jobq.c:701-0 Wstore=Files
> >BxDir: jobq.c:723-0 Fail wncj=-2
>
> And what I also have seen is rncj=-2, and rncj=3.
>
> Looking into jobq.c, I find that rncj is never supposed to take any
> value except 0 and 1 (maximum one read job per device).
> OTOH, I find that rncj is not a unique entity - it is just the
> NumConcurrentJobs of any Storage device.
>
> So, this seems not to be a migration issue, it seems to be a problem
> with multidrive autoloaders.
> According to the manual, since Bacula version 1.whatever an
> autoloader has to be defined as a single device in the DIR.
> So, if this autoloader has multiple drives, it is well possible
> that these drives are used for reading AND writing at the same time.
>
> And this seems to break the rncj/wncj logic. My current, most likely
> interpretation is this: Suppose we have one restore running:
> rncj=1. Then we get two backups running: wncj=rncj=3. Then the
> restore terminates and sets rncj=0. So, when the two backup
> jobs terminate, it goes to -2  - and this is where the show ends.
>
> I am now trying the following as a fix, to see if it helps.
>
> rgds,
> PMc
>
> --- src/dird/jobq.c.orig        Mon Dec 10 18:54:41 2007
> +++ src/dird/jobq.c     Sun Mar  9 00:27:02 2008
> @@ -478,7 +478,8 @@
>            */
>           if (jcr->acquired_resource_locks) {
>              if (jcr->rstore) {
> -               jcr->rstore->NumConcurrentJobs = 0;
> +               if (jcr->rstore->NumConcurrentJobs > 0)
> +                  jcr->rstore->NumConcurrentJobs--;
>                 Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>              }
>              if (jcr->wstore) {
> @@ -738,7 +739,8 @@
>           Dmsg1(200, "Dec wncj=%d\n", jcr->wstore->NumConcurrentJobs);
>        }
>        if (jcr->rstore) {
> -         jcr->rstore->NumConcurrentJobs = 0;
> +         if (jcr->rstore->NumConcurrentJobs > 0)
> +            jcr->rstore->NumConcurrentJobs--;
>           Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>        }
>        set_jcr_job_status(jcr, JS_WaitClientRes);
> @@ -753,7 +755,8 @@
>           Dmsg1(200, "Dec wncj=%d\n", jcr->wstore->NumConcurrentJobs);
>        }
>        if (jcr->rstore) {
> -         jcr->rstore->NumConcurrentJobs = 0;
> +         if (jcr->rstore->NumConcurrentJobs > 0)
> +            jcr->rstore->NumConcurrentJobs--;
>           Dmsg1(200, "Dec rncj=%d\n", jcr->rstore->NumConcurrentJobs);
>        }
>        jcr->client->NumConcurrentJobs--;
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Bacula-devel mailing list
> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/bacula-devel




