
[Bacula-devel] bacula-sd hanging after tape gets full + unload (2.5.19)

Hello list!

I'm using Bacula 2.5.19 and trying the 'copy jobs' feature to copy jobs from
disk volumes/pools to tape.

Sometimes bacula-sd gets stuck: it hangs without doing anything.
This time it happened when the tape got full and Bacula started to change the
tape in the drive (using the autoloader):

bacula-sd JobId 3082: Start Copying JobId 3082, Job=CopyPool4UncopiedToTape.2008-11-13_10.53.04.54
bacula-sd JobId 3082: Using Device "IBM-LTO3-Drive"
bacula-sd JobId 3082: Ready to read from volume "Pool4-Vol-0127" on device "FSDevice4" (/mnt/backup1/pool04).
bacula-sd JobId 3082: Forward spacing Volume "Pool4-Vol-0127" to file:block 0:218.
bacula-sd JobId 3082: End of Volume "756NNNL3" at 764:10067 on device "IBM-LTO3-Drive" (/dev/nst0). Write of 64512 bytes got -1.
bacula-sd JobId 3082: Re-read of last block succeeded.
bacula-sd JobId 3082: End of medium on Volume "756NNNL3" Bytes=725,237,130,240 Blocks=11,241,894 at 13-Nov-2008 11:51.
bacula-sd JobId 3082: 3307 Issuing autochanger "unload slot 3, drive 0" command.

<nothing happens after this>

Running 'status' in bconsole and selecting Storage shows:

Status available for:
     1: Director
     2: Storage
     3: Client
     4: All
Select daemon type for status (1-4): 2


Device status:
Autochanger "IBM-LTO3-AutoChanger" with devices:
   "IBM-LTO3-Drive" (/dev/nst0)
Device "FSDevice0" (/mnt/backup1/pool00) is not open.
Device "FSDevice1" (/mnt/backup1/pool01) is not open.
Device "FSDevice2" (/mnt/backup1/pool02) is not open.
Device "FSDevice3" (/mnt/backup1/pool03) is not open.
Device "FSDevice4" (/mnt/backup1/pool04) is mounted with:
    Volume:      Pool4-Vol-0127
    Pool:        Pool4
    Media type:  File4
    Total Bytes Read=1,649,507,328 Blocks Read=25,569 Bytes/block=64,512
    Positioned at File=0 Block=1,649,507,534
Device "IBM-LTO3-Drive" (/dev/nst0) is not open.
    Device is being initialized.
    Drive 0 is not loaded.

Used Volume status:

<hangs here and nothing happens>

I can exit bconsole by pressing CTRL+C multiple times. If I restart
bconsole and run the status command again, it gets stuck again.

I tried 'strace -p <pid>' to see what bacula-sd is doing:

# strace -p 7339
Process 7339 attached - interrupt to quit
select(5, [4], NULL, NULL, NULL <unfinished ...>
Process 7339 detached

So bacula-sd seems to be stuck in select().
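
Note that 'strace -p' attaches only to the one thread ID it is given, and
bacula-sd is multithreaded, so the select() above may just be the main thread
waiting for connections rather than the thread that is actually stuck. Below is
a rough sketch for looking at every thread of the process. It assumes a Linux
/proc filesystem, and the function name list_thread_states is made up for
illustration (it is not part of Bacula):

```shell
# list_thread_states PID
# Print one line per thread: thread id, scheduler state, kernel wait channel.
# Reads Linux /proc only; the function name is hypothetical, not a Bacula tool.
list_thread_states() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # Skip the unexpanded glob if the pid does not exist.
        [ -d "$task" ] || continue
        tid="${task##*/}"
        # State letter from the per-thread status file (S = sleeping, D = uninterruptible, ...).
        state="$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null || true)"
        # Kernel function the thread is sleeping in, if the kernel exposes it.
        wchan="$(cat "$task/wchan" 2>/dev/null || true)"
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}"
    done
}

# Example, using the bacula-sd pid from the strace run above:
# list_thread_states 7339
# strace -f -p 7339    # -f follows all threads instead of only one
```

A thread sitting in state D, or one waiting on something other than select(),
would be the more interesting one to look at.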

Running 'mtx' works fine at the same time, while bacula-sd is stuck:

# mtx -f /dev/sg3 status
  Storage Changer /dev/sg3:1 Drives, 8 Slots ( 0 Import/Export )
Data Transfer Element 0:Empty
      Storage Element 1:Full :VolumeTag=179MMML3
      Storage Element 2:Full :VolumeTag=658NNNL3
      Storage Element 3:Full :VolumeTag=756NNNL3
      Storage Element 4:Full :VolumeTag=177MMML3
      Storage Element 5:Full :VolumeTag=655NNNL3
      Storage Element 6:Full :VolumeTag=656NNNL3
      Storage Element 7:Full :VolumeTag=657NNNL3
      Storage Element 8:Full :VolumeTag=CLNU38L1

Any ideas how to fix this, other than restarting Bacula?

I don't see any I/O errors in dmesg or in the messages log.

-- Pasi

Bacula-devel mailing list
