Re: [Bacula-devel] Remaining dual changer problems
Kern Sibbald wrote:
> On Saturday 24 May 2008 15:54:51 Eric Bollengier wrote:
>> On Saturday 24 May 2008 15:08:19 Kern Sibbald wrote:
>>> Hello Eric,
>>> I assume that since I haven't heard anything from you that the last fixes
>>> (2.2.10-b4) to the reservation system, fixed the problems you were
>>> having. For others on the list, the company Eric works for runs 270
>>> nightly jobs so they have a tendency to run into Bacula SD bugs.
>> Yes everything is ok right now.
>>> There still remains one outstanding problem of which I am aware that
>>> fortunately is not hitting you, and that is bug #1083 "SD attempts to
>>> load volume already loaded in another drive for multi-drive disk
>>> autochanger". This bug shows up only during swapping of a volume from one
>>> drive to another and typically is created when PreferMountedVolumes=no.
>>> I know what is causing the problem and after thinking about different
>>> solutions, I think the best one is to add a new autochanger script query.
>>> As a reminder, the current commands are:
>>> # The commands are:
>>> #   Command   Function
>>> #   unload    unload a given slot
>>> #   load      load a given slot
>>> #   loaded    which slot is loaded?
>>> #   list      list Volume names (requires barcode reader)
>>> #   slots     how many slots total?
>>> The new one would be "where" and would be called
>>> mtx-changer "changer-device" where "slot-number"
>>> the other two arguments would be ignored. This new function asks where a
>>> Volume with "slot-number" is located.
>>> The answer can be:
>>> slot nnn
>>> drive nnn
>> If you ask "where 10" you will have
>> slot 10
>> drive 1
> Yes, you would get one or the other but not both. I forgot to mention that it
> could also return an error in case the volume for slot 10 does not exist.
> Otherwise if it does exist, it is either in its slot and returns "slot 10" or
> in a drive and returns "drive n" where n is the drive index (zero based).
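A minimal sketch of how a caller might interpret the reply of the proposed "where" query, assuming the reply is exactly one line of the form "slot nnn" or "drive nnn" as described above. The function name parse_where_reply is hypothetical and is not part of Bacula; anything that does not match the two forms is treated as the error case.

```python
import re
from typing import Tuple

# Hypothetical helper: the reply format follows the proposal quoted above
# ("slot nnn" or "drive nnn"). It is not an existing Bacula function.
_REPLY = re.compile(r"^(slot|drive)\s+(\d+)$")


def parse_where_reply(reply: str) -> Tuple[str, int]:
    """Return ("slot", n) or ("drive", n); raise ValueError otherwise."""
    match = _REPLY.match(reply.strip())
    if match is None:
        # Covers the error case, e.g. no volume exists for the slot.
        raise ValueError(f"unexpected 'where' reply: {reply!r}")
    return match.group(1), int(match.group(2))
```

For "where 10" this yields either ("slot", 10) or ("drive", n) with a zero-based n, never both.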
>> ok, or we can change the "load" function to do the work.
>> load drive0 slot1
>> - check if slot1 is already loaded (do nothing if already in drive0)
>> - load the slot1 to drive0 if slot1 is unloaded
>> - exit with error code 1,2 or 3 and with a message "already loaded 2"
>> And we have to handle the third case in the SD.
> Yes, that would be possible, but since we have to handle the third case, I
> prefer to keep each of the commands to mtx-changer as simple as possible.
> This means a few more calls from the SD, but it makes the code much cleaner,
> and I would strongly prefer not to change any of the existing commands.
>>> This command would be issued before each load request, and if the Volume
>>> is already in the correct drive, nothing more would be done; if the
>>> volume is in its slot, it would be loaded; and if the volume is in a
>>> different drive, it would be unloaded, then loaded into the desired drive.
>>> This would allow a simple interface for the SD to ensure that it takes
>>> the right action to load a particular slot in a particular drive without
>>> the need for trying to track it within the SD. For me it makes the most
>>> sense because it is the changer device that definitively knows where the
>>> volume is.
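The three cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: plan_load and the tuple-based command list are hypothetical names, the location argument is whatever "where" reported, and the target drive is assumed to be empty or already handled by the caller.

```python
def plan_load(location, slot, target_drive):
    """Return the changer commands needed to get `slot` into `target_drive`.

    `location` is ("slot", n) or ("drive", n) as reported by "where".
    Assumption: `target_drive` does not hold some other volume.
    """
    kind, index = location
    if kind == "drive" and index == target_drive:
        # Already in the correct drive: nothing more to do.
        return []
    if kind == "slot":
        # Volume is in its slot: a plain load is enough.
        return [("load", slot, target_drive)]
    # Volume is in a different drive: unload it first, then load it.
    return [("unload", slot, index), ("load", slot, target_drive)]
```

The point of the design is that the SD does not track locations itself; it asks the changer each time and acts on the answer.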
>> It makes sense and will simplify the code, it's just a bit strange to lose
>> volume location across the code, but I know that it's a very complex part
>> of the SD.
> Actually, we wouldn't lose any Volume location compared to what happens today.
> The problem today is that once the SD knows that a Volume is no longer going
> to be used on a particular drive, the info about that Volume is lost in the
> SD.
> For example, we currently have Vol001 on drive 0. The SD wants to load Vol002
> on drive 0. Before that operation, the SD has Vol001 in the Volume list, and
> after that operation it has only Vol002. I thought about keeping both, but
> that is a real nightmare from a coding standpoint -- you need to know that
> Vol001 must be unloaded and that Vol002 must be loaded, and somehow the drive
> must point to both Volumes. The situation becomes even more complicated when
> moving a Volume from one drive to another.
> In the end, I decided that the SD will keep track of what Volumes it has
> mounted and is using, and will ask the autochanger questions when it wants to
> move them around rather than trying to have the SD duplicate all the info
> that is kept by the autochanger. Duplicating the info is dangerous because
> the SD generally knows a Volume name and the Slot number, and possibly a
> drive if it is loaded. The Autochanger generally does not know the Volume
> name (unless barcodes are enabled and being used).
The "where" could be emulated by doing a "loaded" on each drive. The
only difference is that a "where" followed by a "load" requires only two
locks of the mutex, where with using "loaded" it requires a lock for
each drive plus one for the "load" command. The problem is, requiring
two locks is not much better than requiring 4 or 5, because it still
introduces a race condition. Imagine two simultaneous jobs using the
same pool when the next available volume for that pool is in slot 3. Job
1 is trying to use drive 0 and job 2 is trying to use drive 1. The
following is likely to happen.
1. Both jobs begin by issuing "where" nearly simultaneously.
2. Job 1 happens to get the lock first and calls "where", forcing job 2
to wait on the mutex.
3. When "where" returns, job 1 releases the mutex and sees the volume is
in slot 3.
4. Job 2 locks the mutex it has been waiting on and calls "where".
5. Job 1 decides to perform a "load" from slot 3 into drive 0, so begins
waiting on the mutex.
6. When "where" returns, job 2 releases the mutex and sees the volume is
in slot 3.
7. Job 1 locks the mutex it has been waiting on and loads slot 3 into
drive 0.
8. Job 2 decides to perform a "load" from slot 3 into drive 1, so begins
waiting on the mutex.
9. When "load" returns, job 1 releases the mutex.
10. Job 2 locks the mutex it has been waiting on and attempts to load
slot 3 into drive 1.
This results in job 2 failing because the volume is no longer in slot 3.
Its "where" essentially gave it an incorrect answer, though the answer
was correct at the time the command was issued.
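The interleaving in steps 1 to 10 can be replayed deterministically against a toy model. This is a sketch only: the Changer class below is a hypothetical stand-in for the real changer, and each method call stands for one command issued while the per-command mutex is held.

```python
class Changer:
    """Toy changer model: maps drive index -> slot currently loaded."""

    def __init__(self):
        self.drives = {0: None, 1: None}

    def where(self, slot):
        for drive, loaded in self.drives.items():
            if loaded == slot:
                return ("drive", drive)
        return ("slot", slot)

    def load(self, slot, drive):
        if self.where(slot) != ("slot", slot):
            raise RuntimeError(f"volume for slot {slot} is not in its slot")
        self.drives[drive] = slot


def replay_race():
    changer = Changer()
    answer_job1 = changer.where(3)  # steps 2-3: job 1 sees slot 3
    answer_job2 = changer.where(3)  # steps 4 and 6: job 2 sees slot 3 too
    changer.load(3, 0)              # steps 5, 7, 9: job 1 loads into drive 0
    try:
        changer.load(3, 1)          # steps 8 and 10: job 2 acts on a stale answer
    except RuntimeError as exc:
        return answer_job1, answer_job2, str(exc)
    return answer_job1, answer_job2, None
```

Both jobs receive the same, momentarily correct answer, and the second load fails even though every individual command was atomic.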
My understanding from looking at autochanger.c is that calls to the
autochanger script are serialized by locking/unlocking a mutex defined
for each autochanger resource, thus making autochanger commands atomic.
My humble thought is that making the autochanger commands atomic is not
sufficient when more than one command is required to make the needed
volume available. The autochanger needs to be locked the entire time,
from the start of the search for the next available volume until an
appendable volume is marked in use. If this can be done, then there
would be no need to change either the Autochanger API or any existing
scripts.
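A minimal sketch of that idea, assuming one lock per autochanger that is held from the start of the volume search until the chosen volume is marked in use. The class, its fields and run_two_jobs are hypothetical and only model the behaviour; they do not reflect the actual code in autochanger.c.

```python
import threading


class Changer:
    """Toy model: one lock covers search, load and mark-in-use."""

    def __init__(self, pool_slots):
        self.lock = threading.Lock()
        self.drives = {0: None, 1: None}
        self.pool_slots = list(pool_slots)  # slots holding appendable volumes
        self.in_use = set()

    def acquire_volume(self, drive):
        with self.lock:  # held for the whole search-and-load sequence
            for slot in self.pool_slots:
                if slot not in self.in_use:
                    self.drives[drive] = slot
                    self.in_use.add(slot)
                    return slot
            raise RuntimeError("no appendable volume available")


def run_two_jobs(pool_slots):
    changer = Changer(pool_slots)
    results = {}

    def job(drive):
        results[drive] = changer.acquire_volume(drive)

    threads = [threading.Thread(target=job, args=(d,)) for d in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Whichever job gets the lock first takes slot 3; the other job then sees that volume as in use and moves on to the next one, so neither acts on a stale answer.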
>>> Aside from wanting feedback on this idea, the big question is when to
>>> implement this. Clearly it should be done before the next major version
>>> as it will allow us to eliminate a class of annoying little problems.
>>> I am also considering the possibility of implementing it in Branch-2.2,
>>> but I really don't like that idea too much because it means that it will
>>> break all the autochanger scripts implemented by users (at least one
>>> virtual disk changer and the FreeBSD chio based script). In any case, it
>>> is *very* unlikely to be implemented before the 2.2.10 release.
>> I agree with you, changing the mtx script during a 2.2.X release is not a
>> very good idea, but if the changelog and the error message are clear, I
>> think that it can be done.
> Yes, that is my feeling too. I think once 2.2.10 is out (hopefully at the end
> of next week), we should encourage anyone having big problems with the
> autochanger to try the development version, where we can implement this idea
> (providing it still seems like a good idea after a longer reflection on the
> Bacula-devel mailing list