[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bacula-devel] Problem in 2.2.9-p3 (was: Patch: Migration media table update ...)


Hi all, 

  I now can say for certain that there is a problem in 2.2.9-p3 
which was not present in 2.2.8, and it will hurt people who
use a (real) 2+ drive autochanger.

  This is *not* one of my "highly-ambitions-migration stuff" - it
will give problems with plain normal backups.

  I demonstrate it from the logfile:


1. This is my autochanger emulation as defined to the SD:

>Device status:
>Autochanger "FileSwap" with devices:
>   "Files0" (/var/bxpool/files0)
>   "Files1" (/var/bxpool/files1)
>   "Files2" (/var/bxpool/files2)
>   "Files3" (/var/bxpool/files3)
>Device "Files0" (/var/bxpool/files0) is not open.
>    Drive 0 status unknown.
>Device "Files1" (/var/bxpool/files1) is not open.
>    Drive 1 is not loaded.
>Device "Files2" (/var/bxpool/files2) is not open.
>    Drive 2 status unknown.
>Device "Files3" (/var/bxpool/files3) is not open.
>    Drive 3 status unknown.


2. This is my "undelete" Function: a plain incremental backup
   of the homedirs running every 10 minutes, at xx:x4. We can see 
   that it currently writes to Volume DISK-004 (aka slot 4) mounted
   to "Files0" (aka drive 0):

29-Mar 19:04 BxDir JobId 19445: Start Backup JobId 19445, Job=HomeDirsEdge.2008-03-29_19.04.30
29-Mar 19:04 BxDir JobId 19445: Using Device "Files0" 
29-Mar 19:04 BxSdGate JobId 19445: 3301 Issuing autochanger "loaded? drive 0" command.
29-Mar 19:04 BxSdGate JobId 19445: 3302 Autochanger "loaded? drive 0", result is Slot 4.
29-Mar 19:04 BxSdGate JobId 19445: 3301 Issuing autochanger "loaded? drive 0" command.
29-Mar 19:04 BxSdGate JobId 19445: 3302 Autochanger "loaded? drive 0", result is Slot 4.
29-Mar 19:04 BxSdGate JobId 19445: Volume "DISK-004" previously written, moving to end of data.
29-Mar 19:04 BxSdGate JobId 19445: Ready to append to end of Volume "DISK-004" size=89145203
29-Mar 19:04 BxSdGate JobId 19445: Job write elapsed time = 00:00:07, Transfer rate = 162 bytes/second
29-Mar 19:04 BxDir JobId 19445: Bacula BxDir 2.2.9-b3 (27Mar08): 29-Mar-2008 19:04:08
  Build OS:               i386-portbld-freebsd6.3 freebsd 6.3-RELEASE-p1
  JobId:                  19445
  Job:                    HomeDirsEdge.2008-03-29_19.04.30
  Backup Level:           Incremental, since=2008-03-29 18:54:01
  Client:                 "Edge" 2.2.8 (26Jan08) i386-portbld-freebsd6.3,freebsd,6.3-RELEASE-p1
  FileSet:                "HomeDirs" 2008-02-07 23:28:19
  Pool:                   "DiskOnly" (From Run pool override)
  Storage:                "Files" (From Pool resource)
  Scheduled time:         29-Mar-2008 19:04:00
  Start time:             29-Mar-2008 19:04:01
  End time:               29-Mar-2008 19:04:08
  Elapsed time:           7 secs
  Priority:               10
  FD Files Written:       2
  SD Files Written:       2
  FD Bytes Written:       956 (956 B)
  SD Bytes Written:       1,134 (1.134 KB)
  Rate:                   0.1 KB/s
  Software Compression:   65.8 %
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         DISK-004
  Volume Session Id:      139
  Volume Session Time:    1206772246
  Last Volume Bytes:      89,358,477 (89.35 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK


3. This is the regular daily incremental backup running at 19:12. 
   It currently writes to DISK-009 (aka slot 9), and we can see
   that it moves the DISK-004 out of slot 0 and mounts DISK-009
   there (and later switches to DISK-026):

29-Mar 19:14 BxDir JobId 19448: Start Backup JobId 19448, Job=BackupEdge.2008-03-29_19.12.33
29-Mar 19:14 BxDir JobId 19448: Using Device "Files0"
BxFdEdge JobId 19448:  /usr is a different filesystem. Will not descend from / into /usr
[etc.etc.]
29-Mar 19:21 BxSdGate JobId 19448: User defined maximum volume capacity 266,291,200 exceeded on device "Files0" (/var/bxpool/files0).
29-Mar 19:21 BxSdGate JobId 19448: End of medium on Volume "DISK-009" Bytes=266,270,170 Blocks=4,130 at 29-Mar-2008 19:21.
29-Mar 19:21 BxDir JobId 19448: There are no more Jobs associated with Volume "DISK-026". Marking it purged.
29-Mar 19:21 BxDir JobId 19448: All records pruned from Volume "DISK-026"; marking it "Purged"
29-Mar 19:21 BxDir JobId 19448: Recycled volume "DISK-026" 
29-Mar 19:21 BxSdGate JobId 19448: 3301 Issuing autochanger "loaded? drive 0" command.
29-Mar 19:21 BxSdGate JobId 19448: 3302 Autochanger "loaded? drive 0", result is Slot 9.
29-Mar 19:21 BxSdGate JobId 19448: 3307 Issuing autochanger "unload slot 9, drive 0" command.
29-Mar 19:21 BxSdGate JobId 19448: 3304 Issuing autochanger "load slot 26, drive 0" command.
29-Mar 19:21 BxSdGate JobId 19448: 3305 Autochanger "load slot 26, drive 0", status is OK.
29-Mar 19:21 BxSdGate JobId 19448: 3301 Issuing autochanger "loaded? drive 0" command.
29-Mar 19:21 BxSdGate JobId 19448: 3302 Autochanger "loaded? drive 0", result is Slot 26.
29-Mar 19:21 BxSdGate JobId 19448: Recycled volume "DISK-026" on device "Files0" (/var/bxpool/files0), all previous data lost.
29-Mar 19:21 BxSdGate JobId 19448: New volume "DISK-026" mounted on device "Files0" (/var/bxpool/files0) at 29-Mar-2008 19:21.
BxFdEdge JobId 19448:  /j/serv/home is a different filesystem. Will not descend from /j/serv into /j/serv/home
[etc.etc.]
29-Mar 19:23 BxSdGate JobId 19448: Job write elapsed time = 00:08:04, Transfer rate = 79.71 K bytes/second
29-Mar 19:23 BxDir JobId 19448: Bacula BxDir 2.2.9-b3 (27Mar08): 29-Mar-2008 19:23:30 Build OS:  i386-portbld-freebsd6.3 freebsd 6.3-RELEASE-p1
  JobId:                  19448
  Job:                    BackupEdge.2008-03-29_19.12.33
  Backup Level:           Incremental, since=2008-03-28 19:13:20
  Client:                 "Edge" 2.2.8 (26Jan08) i386-portbld-freebsd6.3,freebsd,6.3-RELEASE-p1
  FileSet:                "SysBack" 2008-02-10 06:46:30
  Pool:                   "SysBackDisk" (From Job resource)
  Storage:                "Files" (From Pool resource)
  Scheduled time:         29-Mar-2008 19:12:00
  Start time:             29-Mar-2008 19:14:56
  End time:               29-Mar-2008 19:23:30
  Elapsed time:           8 mins 34 secs
  Priority:               10
  FD Files Written:       2,602
  SD Files Written:       2,602
  FD Bytes Written:       38,215,529 (38.21 MB)
  SD Bytes Written:       38,583,617 (38.58 MB)
  Rate:                   74.3 KB/s
  Software Compression:   61.2 %
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         DISK-009|DISK-026
  Volume Session Id:      144
  Volume Session Time:    1206772246
  Last Volume Bytes:      14,063,002 (14.06 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK


4. This is the next "undelete" Backup at 19:14, it runs *while* the 
   regular daily is still running. We can see that now 
   -as drive 0 is occupied-
   it uses drive 1 and mounts it's DISK-004 there. That's fine so
   far:

29-Mar 19:14 BxDir JobId 19449: Start Backup JobId 19449, Job=HomeDirsEdge.2008-03-29_19.14.34
29-Mar 19:14 BxDir JobId 19449: Using Device "Files1"
29-Mar 19:14 BxSdGate JobId 19449: 3301 Issuing autochanger "loaded? drive 1" command.
29-Mar 19:14 BxSdGate JobId 19449: 3302 Autochanger "loaded? drive 1", result: nothing loaded.
29-Mar 19:14 BxSdGate JobId 19449: 3304 Issuing autochanger "load slot 4, drive 1" command.
29-Mar 19:14 BxSdGate JobId 19449: 3305 Autochanger "load slot 4, drive 1", status is OK.
29-Mar 19:14 BxSdGate JobId 19449: 3301 Issuing autochanger "loaded? drive 1" command.
29-Mar 19:14 BxSdGate JobId 19449: 3302 Autochanger "loaded? drive 1", result is Slot 4.
29-Mar 19:14 BxSdGate JobId 19449: Volume "DISK-004" previously written, moving to end of data.
29-Mar 19:14 BxSdGate JobId 19449: Ready to append to end of Volume "DISK-004" size=89358477
29-Mar 19:14 BxSdGate JobId 19449: Job write elapsed time = 00:00:30, Transfer rate = 3.094 K bytes/second
29-Mar 19:14 BxDir JobId 19449: Bacula BxDir 2.2.9-b3 (27Mar08): 29-Mar-2008 19:14:41
  Build OS:  i386-portbld-freebsd6.3 freebsd 6.3-RELEASE-p1
  JobId:                  19449
  Job:                    HomeDirsEdge.2008-03-29_19.14.34
  Backup Level:           Incremental, since=2008-03-29 19:04:01
  Client:                 "Edge" 2.2.8 (26Jan08) i386-portbld-freebsd6.3,freebsd,6.3-RELEASE-p1
  FileSet:                "HomeDirs" 2008-02-07 23:28:19
  Pool:                   "DiskOnly" (From Run pool override)
  Storage:                "Files" (From Pool resource)
  Scheduled time:         29-Mar-2008 19:14:00
  Start time:             29-Mar-2008 19:14:02
  End time:               29-Mar-2008 19:14:41
  Elapsed time:           39 secs
  Priority:               10
  FD Files Written:       4
  SD Files Written:       4
  FD Bytes Written:       92,443 (92.44 KB)
  SD Bytes Written:       92,823 (92.82 KB)
  Rate:                   2.4 KB/s
  Software Compression:   74.8 %
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         DISK-004
  Volume Session Id:      142
  Volume Session Time:    1206772246
  Last Volume Bytes:      89,462,810 (89.46 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK


5. This is the next "undelete" Backup at 19:24, it runs *after* the
   regular daily has terminated. Now the situation is: 
     in drive 0 there is still DISK-026 from the regular daily
     in drive 1 is DISK-004 where this Backup should go to.
   Now what happens is this:

29-Mar 19:24 BxDir JobId 19451: Start Backup JobId 19451, Job=HomeDirsEdge.2008-03-29_19.24.36
29-Mar 19:24 BxDir JobId 19451: Using Device "Files0"
29-Mar 19:24 BxSdGate JobId 19451: 3301 Issuing autochanger "loaded? drive 0" command.
29-Mar 19:24 BxSdGate JobId 19451: 3302 Autochanger "loaded? drive 0", result is Slot 26.
29-Mar 19:24 BxSdGate JobId 19451: 3307 Issuing autochanger "unload slot 26, drive 0" command.
29-Mar 19:24 BxSdGate JobId 19451: 3304 Issuing autochanger "load slot 4, drive 0" command.
29-Mar 19:24 BxSdGate JobId 19451: Fatal error: 3992 Bad autochanger "load slot 4, drive 0": ERR=Child exited with code 1.
Results=
29-Mar 19:24 BxFdEdge JobId 19451: Fatal error: job.c:1811 Bad response to Append Data command. Wanted 3000 OK data
, got 3903 Error append data

29-Mar 19:24 BxDir JobId 19451: Error: Bacula BxDir 2.2.9-b3 (27Mar08): 29-Mar-2008 19:24:08
  Build OS:               i386-portbld-freebsd6.3 freebsd 6.3-RELEASE-p1
  JobId:                  19451
  Job:                    HomeDirsEdge.2008-03-29_19.24.36
  Backup Level:           Incremental, since=2008-03-29 19:14:02
  Client:                 "Edge" 2.2.8 (26Jan08) i386-portbld-freebsd6.3,freebsd,6.3-RELEASE-p1
  FileSet:                "HomeDirs" 2008-02-07 23:28:19
  Pool:                   "DiskOnly" (From Run pool override)
  Storage:                "Files" (From Pool resource)
  Scheduled time:         29-Mar-2008 19:24:00
  Start time:             29-Mar-2008 19:24:02
  End time:               29-Mar-2008 19:24:08
  Elapsed time:           6 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         
  Volume Session Id:      145
  Volume Session Time:    1206772246
  Last Volume Bytes:      89,462,810 (89.46 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***


We see: it unloads the DISK-026 from drive 0, and then wants to load DISK-004
_from_it's_slot_ - while DISK004 actually is in drive 1 where it was
written the last time! 
This is reproducible, i.e. it seems to happen quite always under
such circumstances.

This problem will then persist until _either_
 - you do a manual umount on drive 1 _or_
 - some other backup job comes along and does that umount.

This did not happen with Release 2.2.8 - there the SD would remember
it's mounts (and then either move the Volume correctly from drive 1
to drive 0, or use drive 1 as write-target right away). 

At least it did work for *writing* jobs. *Reading* jobs had exactly 
this problem all the time - this was one of the issues on my list which 
I were not yet talking about (due to my lack in understanding that there
is a lot of difference in handling reading and writing jobs).


rgds,
PMc

P.S. If you use the "disk-changer" script as it is in the
     distribution, you will most likely *not* see this problem. 
     That is because the "disk-changer" can mount the same volume
     into two (virtual) drives at the same time.

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilotco.