[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] Possible autochanger logic problem in 2.2.10-b1 SD


On Thursday 01 May 2008 18:30:51 Josh Fisher wrote:
> Kern Sibbald wrote:
> > On Wednesday 30 April 2008 23:02:10 Josh Fisher wrote:
> >> Kern Sibbald wrote:
> >>> On Wednesday 30 April 2008 18:49:38 Josh Fisher wrote:
> >>>> Is this a bug in version 2.2.10-b1?
> >>>
> >>> That is possible, but 2.2.10-b1 was *supposed* to fix most of these
> >>> kinds of problems.
> >>>
> >>>> Should the SD not have recognized that
> >>>> the volume it wanted was already loaded in another drive?
> >>>
> >>> Yes, though if the Volume was loaded in a drive at SD start time, it is
> >>> possible that the SD did not recognize it as loaded. For tapes, it will
> >>> check what is in each drive, but for a "disk" it doesn't expect any
> >>> Volume "mounted", so probably does not check.
> >>
> >> So far, I believe that is exactly what happened. Stopping SD, then
> >> making sure that the autochanger script thinks all drives are unloaded,
> >> then starting the SD seems to have fixed the problem. I will let you
> >> know if that turns out not to be the case.
> >
> > OK, please do give me your feedback either positive or negative ...  If
> > you are able to reproduce the problem, it would be best to submit it as a
> > bug and include the -d100 or possibly better -d150 of the SD.  You can
> > also turn on debug with the console setdebug command.
>
> Looks like it is a bug, and I have submitted a bug report including
> -d150 log. I don't believe it is related to the problem of starting SD
> with drives not empty, as it is reproducible even when starting the SD
> with both drives empty.

Oh, so you are jaybus.  I was wondering why this problem was showing up for 
more than one person, which isn't really that odd :-)

I haven't taken the time to actually duplicate the bug yet, but I have taken 
the time to look over your debug listing in detail, and I found a really 
interesting bug.  While a Volume is being swapped from one drive to another, 
it can get released from the Volume list, which means that the SD loses track 
of where it is.  This is very likely at least part of your problem (hopefully 
all of it).  

I'm testing the patch now, and will post it when the regression tests finish 
(probably within an hour).  Obviously the current tests are not finding this 
problem, but at least running them makes me more sure the fix doesn't cause 
bigger problems ...

Since I haven't yet duplicated the problem I cannot be sure this fixes 
everything, but I'll try to work on confirming it tomorrow or Saturday.

Thanks for the nice debug listing and the simple instructions on how to 
duplicate it.

With the new 2.2.10 code, the fix was actually quite trivial.  Most of the 
patch is comments/documentation.

Regards,

Kern

>
> >>> It this is what happened to you, I would recommend some procedure that
> >>> unloads all the drives when starting the SD.
> >>
> >> I think what likely happened is that when I stopped 2.2.8 to try
> >> 2.2.10-b1 all (two) drives were loaded. I didn't realize tape
> >> autoloaders were treated differently in this way. I wonder, though, if
> >> it would be better to unload the drives at SD shutdown, rather than
> >> startup.
> >
> > Yes, I agree. I wanted to find a work around for you, but the real
> > solution, IMO, is for the SD to unload all the virtual disk volumes at
> > startup, which is something I will look into doing.  It would even be
> > better if the SD would query, but currently there is no good way for the
> > code to remember what is mounted for disk volumes.  We are working on
> > that though for a future release ...
> >
> > Best regards,
> >
> > Kern
> >
> > PS: for anyone using 2.2.10-b1, it *should* fix most of the two drive
> > changer problems, though it is almost certain that I not handled all
> > possible problems. In any case, with 2.2.10-b1, it should be much easier
> > to fix any remaining problems ...
> >
> > You should know that one user reported that 2.2.10-b1 is unstable and
> > crashes, so I would appreciate any feedback from those using it (either
> > positive or negative).  That said, in all my tests, I find the SD far
> > more stable that 2.2.9 in 2 drive situations.  I have been running it in
> > production here with no problems, but then again, my backup needs are
> > relatively modest ...
> >
> >>>> Or is it the
> >>>> autochanger script's responsibility to swap the volume from one drive
> >>>> to another in this situation?
> >>>
> >>> No.
> >>>
> >>>> Looks like it unloaded drive 0, then checked
> >>>> what was in drive 0 (why?).
> >>>
> >>> It always checks again that the drive is empty before attempting to
> >>> load.
> >>>
> >>>> When it saw that drive 0 was empty it tried to
> >>>> load a volume into drive 0 that was already loaded in drive 1.
> >>>
> >>> That is what it looks like it tried to do -- incorrectly.
> >>>
> >>> To diagnose this kind of thing, I need it submitted as a bug report
> >>> with a debug listing of -d100 on the SD.   Also after the problem
> >>> occurs it would be nice (not mandatory) to have the output of a "status
> >>> storage=xxx" so I can see what Volumes are in the Volume reservation
> >>> list.
> >>>
> >>> Kern
> >>>
> >>>> 29-Apr 23:51 s3-sd JobId 7094: 3301 Issuing autochanger "loaded? drive
> >>>> 0" command. 29-Apr 23:51 s3-sd JobId 7094: 3302 Autochanger "loaded?
> >>>> drive 0", result is Slot 17. 29-Apr 23:51 s3-sd JobId 7094: 3301
> >>>> Issuing autochanger "loaded? drive 0" command. 29-Apr 23:51 s3-sd
> >>>> JobId 7094: 3302 Autochanger "loaded? drive 0", result is Slot 17.
> >>>> 29-Apr 23:51 s3-sd JobId 7094: Volume "m03s017" previously written,
> >>>> moving to end of data. 29-Apr 23:51 s3-sd JobId 7094: Ready to append
> >>>> to end of Volume "m03s017" size=2459827940 s1-fd JobId 7094:      /dev
> >>>> is a different filesystem. Will not descend from / into /dev 30-Apr
> >>>> 00:00 s3-sd JobId 7094: User defined maximum volume capacity
> >>>> 8,420,065,280 exceeded on device "usb-changer-1-drive-0"
> >>>> (/var/lib/bacula/vchanger1/drive0). 30-Apr 00:00 s3-sd JobId 7094: End
> >>>> of medium on Volume "m03s017" Bytes=8,420,027,000 Blocks=130,536 at
> >>>> 30-Apr-2008 00:00. 30-Apr 00:00 s3-sd JobId 7094: 3307 Issuing
> >>>> autochanger "unload slot 17, drive 0" command. 30-Apr 00:00 s3-sd
> >>>> JobId 7094: 3301 Issuing autochanger "loaded? drive 0" command. 30-Apr
> >>>> 00:00 s3-sd JobId 7094: 3302 Autochanger "loaded? drive 0", result:
> >>>> nothing loaded. 30-Apr 00:00 s3-sd JobId 7094: 3304 Issuing
> >>>> autochanger "load slot 16, drive 0" command. 30-Apr 00:00 s3-sd JobId
> >>>> 7094: Fatal error: 3992 Bad autochanger "load slot 16, drive 0":
> >>>> ERR=Child exited with code 1. Results=Storage Element 16 Empty (loaded
> >>>> in drive 1)
> >>>>
> >>>> --- Josh Fisher
> >>>>
> >>>>
> >>>> ----------------------------------------------------------------------
> >>>>-- - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> >>>> Don't miss this year's exciting event. There's still time to save
> >>>> $100. Use priority code J8TL2D2.
> >>>> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com
> >>>>/j ava one _______________________________________________
> >>>> Bacula-devel mailing list
> >>>> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
> >>>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
> >>
> >> ------------------------------------------------------------------------
> >>- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't
> >> miss this year's exciting event. There's still time to save $100. Use
> >> priority code J8TL2D2.
> >> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/j
> >>ava one _______________________________________________
> >> Bacula-devel mailing list
> >> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
> >> https://lists.sourceforge.net/lists/listinfo/bacula-devel



-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilotco.