Re: [Bacula-devel] Director bug when using two storage daemons?


On Thursday 20 November 2008 17:13:07 Robin O'Leary wrote:
> On Mon, Nov 17, 2008 at 05:24:13PM +0100, Kern Sibbald wrote:
> > On Monday 17 November 2008 16:51:31 Graham Keeling wrote:
> > > On Mon, Nov 17, 2008 at 02:05:53PM +0100, Kern Sibbald wrote:
> > > > Solution:
> > > > - I don't have one, because we have no way to "lock" a volume from
> > > > being purged.  Anything we might do would be prone to errors if the
> > > > SD should fail while the volume was "locked".
> > > > Bottom line: it is easy to work around this problem, and unless we
> > > > are lucky and come up with a good idea, I don't see that there is any
> > > > easy way to resolve the problem.
>
> Luckily, Kern did come up with a good idea!
> 	http://bugs.bacula.org/view.php?id=1188
> Thanks.  This fixes the problem in the test case and in our real-world
> situation.
>
> As Kern points out above, this fix does open up the possibility that
> certain sorts of abnormal failure can leave a volume in a state where
> it won't ever be re-used without manual intervention, but that is
> certainly preferable to having volumes silently overwritten (and we
> have a suggestion to improve that in another post and bug report).
>
> On Mon, Nov 17, 2008 at 05:24:13PM +0100, Kern Sibbald wrote:
> > On Monday 17 November 2008 16:51:31 Graham Keeling wrote:
> > > I do not think that it is easy to work around this problem. I also
> > > think that the problem is very serious and that it is quite likely that
> > > other people have triggered it without noticing - it is hard to realise
> > > that it has happened if you are not watching very closely indeed.
> >
> > You have certainly found a bug, but it is a rather artificial problem
> > that virtually no one is likely to have, so I do not consider it at this
> > point to be too serious.
>
> Looking back through the mailing list discussions for similar topics,
> I think this bug is possibly related to Ulrich Leodolter's query "Maximum
> Volume Jobs ignored":
> 	http://sourceforge.net/mailarchive/message.php?msg_id=1211877887.7417.17.camel%40leodolter.bibvb.ac.at
> and surely also the cause of Kevin Keane's
> "Bacula ignores Max Volume Jobs":
> 	http://sourceforge.net/mailarchive/forum.php?thread_name=491E952B.5070608%40kkeane.com&forum_name=bacula-users
>
> The documentation referred to in the latter thread (Configuring the
> Director, The Pool Resource, Maximum Volume Jobs) says:
>
> 	If you are running multiple simultaneous jobs, this directive
> 	[Maximum Volume Jobs] may not work correctly because when a drive
> 	is reserved for a job, this directive is not taken into account,
> 	so multiple jobs may try to start writing to the Volume. At some
> 	point, when the Media record is updated, multiple simultaneous
> 	jobs may fail since the Volume can no longer be written.
>
> It sounds as if this might be alluding to the same issue, so should this
> caveat be removed from the documentation now we have a fix?  

If I could be completely sure the problem is fixed, I would like nothing
better than to remove the caveat. I haven't looked at the problem reports you
mention above closely enough to form a good opinion.  Since I have long been
aware of these kinds of constraints, I have, where possible, been adding new
code that removes these restrictions or at least reduces them. In the current
SVN it may well be that most of these features now work correctly, because I
have recently added code to explicitly check for these conditions in both the
Dir and the SD.
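
As a rough illustration only of the difference such a check makes, here is a
toy Python model. It is not the actual Dir or SD code: ToyVolume,
reserve_unchecked, reserve_checked and count_accepted are hypothetical names
invented for this sketch, and it assumes the limit is enforced by counting
pending reservations together with jobs already recorded on the Volume.

```python
import threading


class ToyVolume:
    """Hypothetical stand-in for a Volume's Media record (not Bacula code)."""

    def __init__(self, max_volume_jobs):
        self.max_volume_jobs = max_volume_jobs
        self.vol_jobs = 0   # jobs already recorded on the Volume
        self.reserved = 0   # jobs that have reserved it but not yet finished
        self._lock = threading.Lock()

    def reserve_unchecked(self):
        # Behaviour described by the caveat: the limit is not consulted
        # when a job reserves the Volume, so every job is accepted.
        with self._lock:
            self.reserved += 1
            return True

    def reserve_checked(self):
        # Explicit check (an assumption of this sketch): pending
        # reservations count against Maximum Volume Jobs.
        with self._lock:
            if self.vol_jobs + self.reserved >= self.max_volume_jobs:
                return False
            self.reserved += 1
            return True


def count_accepted(reserve, n_jobs):
    """Run n_jobs concurrent reservation attempts; return how many succeed."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(reserve()))
        for _ in range(n_jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print(count_accepted(ToyVolume(1).reserve_unchecked, 3))
    print(count_accepted(ToyVolume(1).reserve_checked, 3))
```

With a limit of one job, the unchecked variant lets all three simultaneous
jobs reserve the Volume, while the checked variant accepts only one and the
other two have to look for a different Volume.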

> If it needs 
> to stay (to document the behaviour of earlier versions before the fix, or
> because there are other problematic conditions not covered by the fix),
> perhaps the wording of the last bit should be amended to point out that
> even jobs reported as successful could be damaged (if that's still the
> case), and perhaps it should be moved (or copied) to the section of the
> manual specifically related to concurrent jobs: "Basic Volume Management",
> "Concurrent Disk Jobs".

Generally, we don't keep information in the current document if it only 
pertains to old versions -- at least when we find it.  We do, however, often 
tell users in what version a particular feature or correction was 
implemented.

Regards,

Kern



