[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] Backup Corruption with Certain Types of Drive


Hello,
 
This is a support issue and is either a:

1. Configuration error (most likely)
or
2. An OS (driver) bug (less likely)

It is not a Bacula bug.  From the output, I can see that the OS returned a 
status indicating that no bytes were written to the tape, but in fact, it had 
written the last record anyway.

I am on vacation and it is a bit detailed to explain, so I will leave it to 
you to:

- Read the tape testing chapter of the manual, and in particular
  - Run the btape fill test
  - Ensure that you are using the correct device name so that the device
    behaves as Bacula expects.
 - Ensure you have any HBA caching turned off
 - Ensure the driver is running in synchronous mode.

Best regards,

Kern


On Sunday 07 December 2008 14:20:58 Allan Black wrote:
> Hi, all,
>
> After noticing a curious warning message from a catalog backup, I have
> been doing a little investigation and I now think that, with certain
> types of drive, a backup which spans more than one tape can corrupt the
> data being written at the point where the tape change takes place.
>
> When a tape reaches the end of medium, there are two ways in which the
> drive may act:
>
> 1) The drive writes until it hits the end of the tape, i.e. the write
>   fails because the drive cannot pull any more tape past the heads.
>   In this case the write does not take place, and an EOM sense code
>   is returned.
>
> 2) The drive detects *impending* end of medium. In this case, the
>   write takes place, but an EOM is still returned.
>
> In case 1, everything is fine, and the backup proceeds as expected (DLT
> and LTO both appear to behave in this way). In case 2, however, the same
> data appears to be written twice: as the last block of the first tape,
> and then again as the first block of the second.
>
> Here is what happened to me:
>
> My catalog backup produced this output:
>
> 23-Nov 22:03 gershwin-dir JobId 506: Using Device "DDS3-0"
> 23-Nov 22:06 gershwin-sd JobId 506: End of Volume "MainCatalog-004" at
> 57:4963 on device "DDS3-0" (/dev/rmt/1cbn). Write of 64512 bytes got 0.
> 23-Nov 22:06 gershwin-sd JobId 506: Error: Re-read of last block OK, but
> block numbers differ. Last block=4963 Current block=4963. 23-Nov 22:06
> gershwin-sd JobId 506: End of medium on Volume "MainCatalog-004"
> Bytes=28,148,843,520 Blocks=436,334 at 23-Nov-2008 22:06. 23-Nov 22:06
> gershwin-sd JobId 506: 3307 Issuing autochanger "unload slot 4, drive 0"
> command.
>
> And, after my initial confusion at being told that the block numbers
> differed because the last block was 4963 instead of 4963 ;-) I decided to
> investigate, and I found from the source code that the SD was actually
> expecting to find block 4962 at the end of the tape, hence the error.
>
> I ran btape on the volume and got this:
>
> [....]
> Block: 4959 size=64512
> Block: 4960 size=64512
> Block: 4961 size=64512
> Block: 4962 size=64512
> Block: 4963 size=64512
> 26-Nov 13:44 bls JobId 0: End of file 58 on device "dds3" (/dev/rmt/1cbn),
> Volume "MainCatalog-004"
>
> So it looks as if block 4963 has indeed been successfully written to the
> tape. The next step was to restore the job which spanned the two tapes:
>
> 29-Nov 03:14 gershwin-sd JobId 510: End of all volumes.
> 29-Nov 03:14 gershwin-fd JobId 510: Error: attribs.c:421 File size of
> restored file /tmp/bacula-restores/var/bacula/bacula.sql not correct.
> Original 538130868, restored 538195332.
>
> The difference between the two sizes is 64464. I am guessing this would be
> a 64512 byte block, minus a 48 byte header. Does anyone know is this is
> correct?
>
> It looks as if one block of data has indeed been written twice, resulting
> in "bad" data being backed up. If there is a general agreement that I am
> correct here, then I think I will have a go at fixing it. It is an easy
> condition to detect, since the last block number read back from the tape
> tells us whether the current block has been written or not.
>
> Allan
>
> PS I ran a btape test on a pair of blank tapes, and it passed the test,
> although it actually generated the same error as I got when the backup
> changed tapes:
>
> [....]
> Wrote blk_block=150000, dev_blk_num=6500 VolBytes=9,676,735,488 rate=841.6
> KB/s Wrote blk_block=155000, dev_blk_num=11500 VolBytes=9,999,295,488
> rate=842.6 KB/s 04-Dec 20:56 btape JobId 0: End of Volume "TestVolume1" at
> 14:13639 on device "DDS3-0" (/dev/rmt/1cbn). Write of 64512 bytes got 0.
> 04-Dec 20:56 btape JobId 0: Error: Re-read of last block OK, but block
> numbers differ. Last block=157139 Current block=157139. btape: btape.c:2349
> Last block at: 14:13638 this_dev_block_num=13639 btape: btape.c:2383 End of
> tape 14:0. VolumeCapacity=10,137,286,656. Write rate = 842.5 KB/s [....]
> 20:59:34 Done filling tapes at 0:13. Now beginning re-read of first tape
> ... 04-Dec 20:59 btape JobId 0: 3307 Issuing autochanger "unload slot 2,
> drive 0" command. 04-Dec 21:00 btape JobId 0: 3304 Issuing autochanger
> "load slot 1, drive 0" command. 04-Dec 21:00 btape JobId 0: 3305
> Autochanger "load slot 1, drive 0", status is OK. 04-Dec 21:01 btape JobId
> 0: Ready to read from volume "TestVolume1" on device "DDS3-0"
> (/dev/rmt/1cbn). Rewinding.
> Reading the first 10000 records from 0:0.
> 10000 records read now at 1:5084
> Reposition from 1:5084 to 14:13638
> Reading block 13638.
>
> The last block of the first tape matches.
> [....]
> Reposition from 0:0 to 0:1
> Reading block 1.
>
> The first block on the second tape matches.
>
> Reposition from 0:2 to 0:11
> Reading block 11.
>
> The last block on the second tape matches. Test succeeded.
>
> PPS I was lucky, since this was a catalog backup. I simply ran the backup
> again manually. I'd still like to fix it, though! :-)
>
> ARB
>
> ---------------------------------------------------------------------------
>--- SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas,
> Nevada. The future of the web can't happen without you.  Join us at MIX09
> to help pave the way to the Next Web now. Learn more and register at
> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com
>/ _______________________________________________
> Bacula-devel mailing list
> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/bacula-devel



------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilotco.