
Re: [Bacula-devel] bacula-sd hanging after tape gets full + unload (2.5.19)


On Thu, Dec 04, 2008 at 03:32:37PM +0100, Ulrich Leodolter wrote:
> On Thu, 2008-12-04 at 14:35 +0200, Pasi Kärkkäinen wrote:
> > On Thu, Dec 04, 2008 at 02:13:56PM +0200, Pasi Kärkkäinen wrote:
> > 
> > > > Could you stop all daemons with a SIGSEGV to force a backtrace?
> > > > killall -SEGV bacula-sd bacula-dir
> > > > 
> > > > (you will find two kinds of files, *traceback and *bactrace, in the working directory)
> > > > 
> > > > Afterwards, if you can put the results on pastebin, they will give us information
> > > > about your problem.
> > > > 
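The backtrace procedure suggested above can be scripted roughly as follows. This is a minimal sketch, not something shipped with Bacula: the helper names force_backtrace and collect_traces are hypothetical, and the directory you pass in is an assumption that must match the "Working Directory" set in your bacula-dir.conf / bacula-sd.conf. It only bundles the *traceback and *bactrace files into one text file that is easy to paste.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Bacula.

force_backtrace() {
  # Same command as suggested in the thread: stop both daemons with
  # SIGSEGV so they leave *traceback and *bactrace files behind.
  killall -SEGV bacula-sd bacula-dir
}

collect_traces() {
  # $1: daemon working directory (assumption: taken from your config)
  # $2: output file that collects every trace file for pasting
  workdir=$1
  out=$2
  : > "$out"                      # start with an empty output file
  for f in "$workdir"/*traceback "$workdir"/*bactrace; do
    [ -f "$f" ] || continue       # skip patterns that matched nothing
    printf '===== %s =====\n' "$f" >> "$out"
    cat "$f" >> "$out"
  done
}
```

Example use, with a placeholder path: run force_backtrace, wait a few seconds for the daemons to write their files, then run collect_traces /path/to/working /tmp/bacula-traces.txt and paste the resulting file.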
> > > 
> > > OK, the problem happened again. Here are the tracebacks:
> > > 
> > > http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
> > > http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt
> > > 
> > > Here's what I did to make bacula-sd hang:
> > > 
> > > 1. Rebooted the bacula server and the tape library
> > > 2. Right after the reboot, made sure mtx and the Bacula mtx-changer script work OK.
> > > 3. Started bacula
> > > 4. Ran a job that copies jobs from disk pool to tape pool
> > > 5. Bacula starts a bunch of jobs, but nothing happens; bacula-sd is stuck.
> > > 
> > > Any ideas how to debug this further? 
> > > 
> > > At the moment I'm running Bacula 2.5.20 (svn rev 8083) on CentOS 5.2 x86 32bit.
> > > 
> > > I also tried applying 2.4.3-sd-deadlock.patch (from bug #1192) but it didn't
> > > seem to help.
> > > 
> > 
> > And here is how I verified that bacula-sd is stuck/hung:
> > 
> > - Checking what's happening on the SCSI devices with "iostat 1": I don't see any disk activity.
> > - Nothing happens in bconsole
> > - Checking the status of the Storage (tape pool) in bconsole makes bconsole hang as well:
> > 
> > http://pasik.reaktio.net/bacula/debug/bconsole-sd-hang.txt
> > 
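The last check above (a bconsole status request that never returns) can be turned into a bounded test, so a hung storage daemon shows up as a timeout instead of a stuck terminal. This is a sketch under assumptions: check_responsive is a hypothetical helper, it needs the timeout utility from GNU coreutils (which exits with status 124 when the limit is reached), and "Tape" below is a placeholder for your own Storage resource name.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report
# whether it completed, failed, or had to be killed.
check_responsive() {
  secs=$1
  shift
  timeout "$secs" "$@" > /dev/null 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "hung"                 # time limit hit: command never returned
  elif [ "$rc" -eq 0 ]; then
    echo "responsive"
  else
    echo "failed (exit $rc)"    # command returned, but with an error
  fi
}
```

Example use: check_responsive 60 sh -c 'echo "status storage=Tape" | bconsole' prints "hung" if the status request has not come back within 60 seconds.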
> 
> Hi,
> 
> Did you notice broken "Terminated Jobs:" list in bconsole-sd-hang.txt?
>

Yeah, I did actually. I didn't pay too much attention to that, because it was
more important to get the hang fixed :)
 
> 
> Here is my output of "status dir" (after upgrading to current svn):
> 

Mine was from "status Storage", but yeah, something is wrong in the list.

-- Pasi

> 
> Connecting to Director troll:9101
> 1000 OK: troll-dir Version: 2.5.22 (01 December 2008)
> Enter a period to cancel a command.
> *status dir
> troll-dir Version: 2.5.22 (01 December 2008) i686-pc-linux-gnu redhat Enterprise release
> Daemon started 04-Dec-08 15:18, 0 Jobs run since started.
>  Heap: heap=274,432 smbytes=129,912 max_bytes=130,465 bufs=1,260 max_bufs=1,294
> 
> ...
> 
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished        Name 
> ====================================================================
>   8696  Incr        366    6.058 G  OK       03-Feb-27 18:17
> belix.2008-12-04_09
> 1228072365  ??? (2          1    5.275 E  Other    29-Sep-21 15:51
> 008-12-04_09
> 104123763  8 (5   1,228,379,050    8.299 E  Other    15-Jan-94 19:04
> 4_09
> 1228378991         1,802,725,700    3.615 E  Other    28-May-00 01:50 56
> 1801675074  ??? (1   841,903,973    3.471 E  Other    28-Sep-97 12:49 
> 1632923476  D (1   808,268,337    3.328 E  Other    01-Jan-70 01:00 
> 758657072  i (8   774,975,534    232.8 G  Other    01-Jan-70 01:00 
> 959471412  1 (8         53         0   Other    01-Jan-70 01:00 
> 775304494  9 (8          0         0   Other    01-Jan-70 01:00 
> 
> 

_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

