Re: [Bacula-devel] bacula-sd hanging after tape gets full + unload (2.5.19)

On Thu, Dec 04, 2008 at 02:13:56PM +0200, Pasi Kärkkäinen wrote:

> > Could you stop all daemons with a SIGSEGV to force a backtrace?
> > killall -SEGV bacula-sd bacula-dir
> > 
> > (You will find two kinds of files, *traceback and *bactrace, in the working directory.)
> > 
> > Then, if you can put the results on a pastebin, they will give some information about your
> > problem.
> > 
> OK, problems again. Here are the tracebacks:
> http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
> http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt
> Here's what I did to make bacula-sd hang:
> 1. Rebooted the bacula server and the tape library
> 2. Right after the reboot, made sure mtx and the Bacula mtx-changer script work OK.
> 3. Started bacula
> 4. Ran a job that copies jobs from disk pool to tape pool
> 5. Bacula starts a bunch of jobs, but nothing happens; bacula-sd is stuck.
> Any ideas how to debug this further? 
> At the moment I'm running Bacula 2.5.20 (svn rev 8083) on CentOS 5.2 x86 32-bit.
> I also tried applying 2.4.3-sd-deadlock.patch (from bug #1192), but it didn't
> seem to help.
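To collect the files mentioned above after running `killall -SEGV`, a minimal Python sketch could look like this. It assumes only what the thread says: the files land in the daemons' working directory and their names end in `traceback` or `bactrace`. The function name `find_trace_files` is hypothetical.

```python
from pathlib import Path


def find_trace_files(working_dir):
    """Return the regular files in working_dir whose names end in
    'traceback' or 'bactrace' (the suffixes named in this thread),
    sorted by path."""
    suffixes = ("traceback", "bactrace")
    return sorted(
        p
        for p in Path(working_dir).iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )
```

For example, `find_trace_files("/var/bacula/working")` returns the matching paths. That path is only a placeholder; use the `WorkingDirectory` set in your own daemon configuration.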

Here's how I verified that bacula-sd is stuck/hung:

- Checking SCSI device activity with "iostat 1": I don't see any disk activity.
- Nothing happens in bconsole.
- Checking the status of the Storage (tape pool) in bconsole makes bconsole hang:


The output stops there; nothing appears after "Used Volume status:".
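Since the symptom is that the storage status request never returns, one way to detect the hang without tying up an interactive console is to pipe the command into bconsole and give up after a deadline. A minimal Python sketch, assuming bconsole is on the `PATH` and reads commands from standard input; the helper name `run_with_timeout`, the storage name `Tape`, and the 60-second limit are hypothetical.

```python
import subprocess


def run_with_timeout(cmd, stdin_text, timeout_s):
    """Run cmd, feed it stdin_text, and return (timed_out, stdout).

    If the command does not finish within timeout_s seconds,
    subprocess.run kills it and raises TimeoutExpired; we report
    that as timed_out=True along with any output read so far.
    """
    try:
        result = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return False, result.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output on timeout may be None or bytes even with text=True.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return True, partial
```

Usage: `timed_out, out = run_with_timeout(["bconsole"], "status storage=Tape\n", 60)`. A `timed_out` of `True` corresponds to the hang described here; `out` holds whatever bconsole printed before stalling, which may be empty if its output was buffered.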

I've noticed that whenever this crash/hang happens, the tape drive status looks like this:

Device "IBM-LTO3-Drive" (/dev/nst0) is not open.
    Device is being initialized.
    Drive 0 status unknown.
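A rough heuristic for spotting this state automatically, assuming the storage status output is laid out as above: find the block that starts with `Device "<name>"` and check that it contains all three messages seen here. This is a Python sketch; the function name is hypothetical and the marker list comes only from this one report, so other variants of the hang may not match.

```python
# Messages observed in the device status whenever the hang occurred.
STUCK_MARKERS = (
    "is not open",
    "Device is being initialized",
    "status unknown",
)


def device_looks_stuck(status_text, device_name):
    """Return True if the block for device_name in a storage status
    dump contains every marker in STUCK_MARKERS."""
    header = f'Device "{device_name}"'
    block = []
    in_block = False
    for line in status_text.splitlines():
        if line.startswith(header):
            in_block = True
            block.append(line)
        elif in_block and line.startswith("Device "):
            break  # an unindented header means the next device's block
        elif in_block:
            block.append(line)
    text = "\n".join(block)
    return in_block and all(marker in text for marker in STUCK_MARKERS)
```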

The physical tape drive/library doesn't show any errors on its display, and everything looks OK.
I don't see any SCSI errors in dmesg or logs.

-- Pasi

