
Re: [Bacula-devel] bacula-sd hanging after tape gets full + unload (2.5.19)


On Thu, Dec 04, 2008 at 03:31:08PM +0200, Pasi Kärkkäinen wrote:
> On Thu, Dec 04, 2008 at 02:47:45PM +0200, Pasi Kärkkäinen wrote:
> > On Thu, Dec 04, 2008 at 01:41:21PM +0100, Eric Bollengier wrote:
> > > Hello,
> > > 
> > > I think that I have a fix; could you please try it?
> > > 
> > > cd trunk
> > > patch -p0 < sd-hang.patch
> > > make && make install
> > > 
> > 
> > Thanks! I'll try it now. 
> > 
> 
> And it seems this patch fixed my problem !!
> 
> Now let's see what happens when the tape gets full and it needs to be
> unloaded and swapped to some other tape.. that's when I first met this
> problem.. 
> 
> But yeah, now at least the job got started and bacula-sd didn't get
> stuck/hang/crash.
> 
> Thanks a lot!
> 

And it still seems to work! Running fine after tape unload/swap.. at least
so far :) 

Will you commit this patch to SVN? 

-- Pasi

> 
> > 
> > > 
> > > On Thursday 04 December 2008 13:38:12 Pasi Kärkkäinen, you wrote:
> > > > On Thu, Dec 04, 2008 at 01:33:46PM +0100, Eric Bollengier wrote:
> > > > > > > Could you stop all daemons with a sigsegv to force a backtrace ?
> > > > > > > killall -SEGV bacula-sd bacula-dir
> > > > > > >
> > > > > > > (you will find 2 kinds of files, *traceback and *bactrace, in the
> > > > > > > working directory)
> > > > > > >
> > > > > > > Afterwards, if you can put the results on pastebin, it will give
> > > > > > > information about your problem.
> > > > > >
> > > > > > Ok, problems again.. here are the tracebacks:
> > > > > >
> > > > > > http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
> > > > > > http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt
> > > > > >
> > > > > > Here's what I did to make bacula-sd hang:
> > > > > >
> > > > > > 1. Rebooted the bacula server and the tape library
> > > > > > 2. Fresh after the reboot made sure mtx and bacula mtx-changer work OK.
> > > > > > 3. Started bacula
> > > > > > 4. Ran a job that copies jobs from disk pool to tape pool
> > > > > > 5. Bacula starts a bunch of jobs, but nothing happens.. bacula-sd is
> > > > > > stuck.
> > > > > >
> > > > > > Any ideas how to debug this further?
> > > > >
> > > > > Thanks for this traceback, it's very useful; I have found a problem
> > > > > in the code.
> > > > >
> > > > > In bool DCR::can_i_write_volume() we have:
> > > > >    lock_read_volumes();
> > > > >    vol = find_read_volume(VolumeName);
> > > > >
> > > > > The first step of find_read_volume() is to call lock_read_volumes(),
> > > > > and this lock is not recursive, so the second call waits forever on a
> > > > > lock that the same thread already holds.
> > > > >
> > > > > I will take a look now.
> > > >
> > > > Thanks. Good to hear those tracebacks helped.
> > > >
> > > > If you want me to test patches, I'm happy to help.
> > > >
> > > > -- Pasi
> > > >
> > 
> > > Index: src/stored/vol_mgr.c
> > > ===================================================================
> > > --- src/stored/vol_mgr.c        (revision 8100)
> > > +++ src/stored/vol_mgr.c        (working copy)
> > > @@ -681,9 +681,7 @@
> > >  {
> > >     VOLRES *vol;
> > > 
> > > -   lock_read_volumes();
> > >     vol = find_read_volume(VolumeName);
> > > -   unlock_read_volumes();
> > >     if (vol) {
> > >        Dmsg1(100, "Found in read list; cannot write vol=%s\n", VolumeName);
> > >        return false;
> > 
> 

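For anyone reading this thread later, the self-deadlock Eric describes (a caller takes a non-recursive lock, then calls a helper that takes the same lock) can be reproduced outside Bacula with a small sketch. Everything below is an assumption for illustration only: read_vol_lock, find_volume_locked_inside(), caller_before_patch() and caller_after_patch() are hypothetical stand-ins, not the real code in src/stored/vol_mgr.c. The sketch also uses a POSIX error-checking mutex, so the second lock attempt returns EDEADLK instead of hanging the way bacula-sd did.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the lock protecting the read-volume list. */
static pthread_mutex_t read_vol_lock;

/* Create the lock as an error-checking mutex: relocking from the same
 * thread then fails with EDEADLK rather than blocking forever. */
int init_read_vol_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) {
        rc = pthread_mutex_init(&read_vol_lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Helper that takes the lock itself, as find_read_volume() is said to do. */
int find_volume_locked_inside(void)
{
    int rc = pthread_mutex_lock(&read_vol_lock);
    if (rc != 0) {
        return rc;              /* EDEADLK if this thread already holds it */
    }
    /* ... the list lookup would happen here ... */
    pthread_mutex_unlock(&read_vol_lock);
    return 0;
}

/* Pre-patch shape: the caller locks, then the helper locks again. */
int caller_before_patch(void)
{
    int rc = pthread_mutex_lock(&read_vol_lock);
    if (rc != 0) {
        return rc;
    }
    rc = find_volume_locked_inside();   /* second lock by the same thread */
    pthread_mutex_unlock(&read_vol_lock);
    return rc;
}

/* Post-patch shape: only the helper takes the lock. */
int caller_after_patch(void)
{
    return find_volume_locked_inside();
}
```

After a successful init_read_vol_lock(), caller_before_patch() reports the relock as an error, while caller_after_patch() completes normally. That mirrors the shape of the patch above, which drops the outer lock/unlock pair and leaves the locking to the helper. With a default (non-error-checking) mutex the pre-patch shape would typically just block.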