
Re: [Bacula-devel] Director bug when using two storage daemons?

On Mon, Nov 17, 2008 at 05:24:13PM +0100, Kern Sibbald wrote:
> On Monday 17 November 2008 16:51:31 Graham Keeling wrote:
> > On Mon, Nov 17, 2008 at 02:05:53PM +0100, Kern Sibbald wrote:
> > > Solution:
> > > - I don't have one, because we have no way to "lock" a volume from being
> > > purged.  Any thing we might do would be prone to errors if the SD should
> > > fail while the volume was "locked".
> > > Bottom line: it is easy to work around this problem, and unless we are
> > > lucky and come up with a good idea, I don't see that there is any easy
> > > way to resolve the problem.

Luckily, Kern did come up with a good idea!
Thanks.  This fixes the problem in the test case and in our real-world

As Kern points out above, this fix does open up the possibility that
certain sorts of abnormal failure can leave a volume in a state where
it won't ever be re-used without manual intervention, but that is
certainly preferable to having volumes silently overwritten (and we
have a suggestion to improve that in another post and bug report).

On Mon, Nov 17, 2008 at 05:24:13PM +0100, Kern Sibbald wrote:
> On Monday 17 November 2008 16:51:31 Graham Keeling wrote:
> > I do not think that it is easy to work around this problem. I also think
> > that the problem is very serious and that it is quite likely that other
> > people have triggered it without noticing - it is hard to realise that it
> > has happened if you are not watching very closely indeed.
> You have certainly found a bug, but it is a rather artificial problem that 
> virtually no one is likely to have, so I do not consider it at this point to 
> be too serious.

Looking back through the mailing list discussions for similar topics,
I think this bug is possibly related to Ulrich Leodolter's query "Maximum
Volume Jobs ignored":
and surely also the cause of Kevin Keane's "Bacula ignores Max Volume Jobs":

The documentation referred to in the latter thread (Configuring the Director,
The Pool Resource, Maximum Volume Jobs) says:

	If you are running multiple simultaneous jobs, this directive
	[Maximum Volume Jobs] may not work correctly because when a drive
	is reserved for a job, this directive is not taken into account,
	so multiple jobs may try to start writing to the Volume. At some
	point, when the Media record is updated, multiple simultaneous
	jobs may fail since the Volume can no longer be written.
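
The behaviour that caveat describes is a check-then-act race. A toy model
may make it clearer; this is only a sketch under stated assumptions, not
Bacula's code. The names Volume, reserve_unchecked, reserve_counted and
run are all hypothetical, and the jobs are interleaved by hand rather than
run concurrently. It shows a limit of one job per volume being exceeded
when the reservation step looks only at the recorded job count, and being
respected when in-flight reservations are counted too.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    # Hypothetical stand-in for a volume's media record; not a Bacula structure.
    max_jobs: int            # plays the role of Maximum Volume Jobs
    recorded_jobs: int = 0   # job count as stored in the media record
    reserved: int = 0        # reservations made but not yet recorded
    writers: list = field(default_factory=list)


def reserve_unchecked(vol, job):
    """Reserve using only the recorded count, ignoring in-flight jobs."""
    if vol.recorded_jobs < vol.max_jobs:
        vol.writers.append(job)
        return True
    return False


def reserve_counted(vol, job):
    """Reserve while also counting reservations that are still in flight."""
    if vol.recorded_jobs + vol.reserved < vol.max_jobs:
        vol.reserved += 1
        vol.writers.append(job)
        return True
    return False


def run(reserve):
    """Two jobs reserve the same volume before the media record is updated."""
    vol = Volume(max_jobs=1)
    accepted = [job for job in ("job-A", "job-B") if reserve(vol, job)]
    # The record is updated only after both reservations were attempted.
    vol.recorded_jobs = len(accepted)
    return accepted


print(run(reserve_unchecked))  # both jobs get the volume despite the limit
print(run(reserve_counted))    # only the first job gets the volume
```

In this model the second job is turned away at reservation time instead of
discovering later that the volume can no longer be written.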

It sounds as if this might be alluding to the same issue, so should this
caveat be removed from the documentation now that we have a fix?  If it needs
to stay (to document the behaviour of earlier versions before the fix, or
because there are other problematic conditions not covered by the fix),
perhaps the wording of the last bit should be amended to point out that
even jobs reported as successful could be damaged (if that's still the
case).  Perhaps it should also be moved (or copied) to the section of the
manual specifically related to concurrent jobs: "Basic Volume Management",
"Concurrent Disk Jobs".

Robin O'Leary.
email: robin@xxxxxxxxxxxx    Equiinet Ltd., Edison Road, Dorcan,
Tel.:  +44 1793 603708       Swindon, SN3 5JX, U.K.  51.5558N,1.7286W

