[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] SIGSEV with btape


On Friday 28 November 2008 22:44:55 Willians Vivanco wrote:
> After some failed attemps to report a bug using the BTS

I suggest you ask for help on the bacula-users email list, since to the best 
of my knowledge no one has problems submitting bug reports.  You do have to 
be careful about using foreign character sets that are not UTF-8 as the 
database seems to always want UTF-8 data.

> (http://bugs.bacula.org/) I decided to write to your the mail. I will
> follow the indications explained in http://www.bacula.org/en/?page=bugs.

Yes, thanks for taking the pains to document your problem carefully.
However, what you are reporting here seems to me to be a case of hardware 
problems and not a bug, and as such you should ask on the bacula-users email 
list for additional help, but mostly you need to take a careful look at your 
hardware.

See more specific responses below ...

>
> ----
>
> *Operating System*: debian/GNU Linux [Lenny (kernel 2.6.26-1-686)]
>
> *Version of Bacula*: 2.4.2 (26 July 2008)
>
> *Tape drive*: HP StorageWorks MSL6000 series
>
> *Description of the problem*:
> I have a HP StorageWorks MSL6000 series tape library. My bacula-sd (and
> the rest of the daemons) are running in a blade server connected to the
> tape library trough a FibreChannel SAN. My problem is that the btape
> "fill" test fails. Find below the output of all commands I've tried and
> some other useful information.

Well, the first thing you should check is to ensure that the HBA is not 
caching data from the tape drive.  Most HBAs do so to improve throughput, but 
it will also cause Bacula to write too much data to the tape.  Basically 
Bacula needs to be run in synchronous mode with the tape drive rather than 
asynchronous.

>
> *Results of "_test_" btape test*:
>
> *test
>
> <output command="test">
> === Write, rewind, and re-read test ===
>
> I'm going to write 1000 records and an EOF
> then write 1000 records and an EOF, then rewind,
> and re-read the data to verify that it is correct.
>
> This is an *essential* feature ...
>
> btape: btape.c:831 Wrote 1000 blocks of 64412 bytes.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:847 Wrote 1000 blocks of 64412 bytes.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:856 Rewind OK.
> 1000 blocks re-read correctly.
> Got EOF on tape.
> 1000 blocks re-read correctly.
> === Test Succeeded. End Write, rewind, and re-read test ===
>
>
> === Write, rewind, and position test ===
>
> I'm going to write 1000 records and an EOF
> then write 1000 records and an EOF, then rewind,
> and position to a few blocks and verify that it is correct.
>
> This is an *essential* feature ...
>
> btape: btape.c:943 Wrote 1000 blocks of 64412 bytes.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:959 Wrote 1000 blocks of 64412 bytes.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:968 Rewind OK.
> Reposition to file:block 0:4
> Block 5 re-read correctly.
> Reposition to file:block 0:200
> Block 201 re-read correctly.
> Reposition to file:block 0:999
> Block 1000 re-read correctly.
> Reposition to file:block 1:0
> Block 1001 re-read correctly.
> Reposition to file:block 1:600
> Block 1601 re-read correctly.
> Reposition to file:block 1:999
> Block 2000 re-read correctly.
> === Test Succeeded. End Write, rewind, and re-read test ===
>
>
>
> === Append files test ===
>
> This test is essential to Bacula.
>
> I'm going to write one record  in file 0,
>                    two records in file 1,
>              and three records in file 2
>
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:372 open device "Drive-0" (/dev/nst0): OK
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1097 Now moving to end of medium.
> btape: btape.c:526 Moved to end of medium.
> We should be in file 3. I am at file 3. This is correct!
>
> Now the important part, I am going to attempt to append to the tape.
>
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> Done appending, there should be no I/O errors
>
> Doing Bacula scan of blocks:
> 1 block of 64448 bytes in file 1
> End of File mark.
> 2 blocks of 64448 bytes in file 2
> End of File mark.
> 3 blocks of 64448 bytes in file 3
> End of File mark.
> 1 block of 64448 bytes in file 4
> End of File mark.
> Total files=4, blocks=7, bytes = 451,136
> End scanning the tape.
> We should be in file 4. I am at file 4. This is correct!
>
> The above Bacula scan should have output identical to what follows.
> Please double check it ...
> === Sample correct output ===
> 1 block of 64448 bytes in file 1
> End of File mark.
> 2 blocks of 64448 bytes in file 2
> End of File mark.
> 3 blocks of 64448 bytes in file 3
> End of File mark.
> 1 block of 64448 bytes in file 4
> End of File mark.
> Total files=4, blocks=7, bytes = 451,136
> === End sample correct output ===
>
> If the above scan output is not identical to the
> sample output, you MUST correct the problem
> or Bacula will not be able to write multiple Jobs to
> the tape.
>
>
> === Write, backup, and re-read test ===
>
> I'm going to write three records and an EOF
> then backup over the EOF and re-read the last record.
> Bacula does this after writing the last block on the
> tape to verify that the block was written correctly.
>
> This is not an *essential* feature ...
>
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:710 Wrote first record of 64412 bytes.
> btape: btape.c:721 Wrote second record of 64412 bytes.
> btape: btape.c:732 Wrote third record of 64412 bytes.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:748 Backspaced over EOF OK.
> btape: btape.c:753 Backspace record OK.
> btape: btape.c:771
> Block re-read correct. Test succeeded!
> === End Write, backup, and re-read test ===
>
>
>
> === Forward space files test ===
>
> This test is essential to Bacula.
>
> I'm going to write five files then test forward spacing
>
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:1577 Wrote one record of 64412 bytes.
> btape: btape.c:1579 Wrote block to device.
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1311 Now forward spacing 1 file.
> We should be in file 1. I am at file 1. This is correct!
> btape: btape.c:1323 Now forward spacing 2 files.
> We should be in file 3. I am at file 3. This is correct!
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1336 Now forward spacing 4 files.
> We should be in file 4. I am at file 4. This is correct!
>
> btape: btape.c:1354 Now forward spacing 1 more file.
> We should be in file 5. I am at file 5. This is correct!
>
> === End Forward space files test ===
>
>
> Ah, I see you have an autochanger configured.
> To test the autochanger you must have a blank tape
>  that I can write on in Slot 1.
>
> Do you wish to continue with the Autochanger test? (y/n): y
>
>
> === Autochanger test ===
>
> 3301 Issuing autochanger "loaded" command.
> Slot 1 loaded. I am going to unload it.
> 3302 Issuing autochanger "unload 1 0" command.
> unload status=OK 0
> 3303 Issuing autochanger "load 1 0" command.
> 3303 Autochanger "load 1 0" status is OK.
> btape: btape.c:372 open device "Drive-0" (/dev/nst0): OK
> btape: btape.c:1241 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:1248 Wrote EOF to "Drive-0" (/dev/nst0)
>
> The test autochanger worked!!
> </output>
>
>
> I got this in dmesg output after running the btape test command:
>
> [ 1471.898470] st0: Sense Key : Aborted Command [current]
> [ 1471.898470] st0: Add. Sense: Information unit iuCRC error detected
> [ 1476.199199] st0: Sense Key : Aborted Command [current]
> [ 1476.199364] st0: Add. Sense: Information unit iuCRC error detected
> [ 1479.457267] st0: Sense Key : Aborted Command [current]
> [ 1479.457431] st0: Add. Sense: Information unit iuCRC error detected
> [ 1482.704076] st0: Sense Key : Aborted Command [current]
> [ 1482.704240] st0: Add. Sense: Information unit iuCRC error detected*

I am not a hardware expert, but this clearly indicates that either you have a 
hardware error (bad hardware, inappropriate cables, ...) or you have a bad 
kernel tape driver.  From the message, I imagine that somewhere along the 
path between the computer and the tape, the hardware has detected a data 
transmission error (lost or corrupted bit(s)) because of the CRC error 
detection.

>
>
> *Results of "_qfill_" btape test* (with 1000 blocks of bytes):
>
> *qfill
> btape: btape.c:2431 Test writing blocks of 64512 bytes to tape.
> How many blocks do you want to write? (1000):
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:2448 Begin writing 1000 Bacula blocks to tape ...
> ++++++++++
> btape: btape.c:505 Wrote 1 EOF to "Drive-0" (/dev/nst0)
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> 1000 blocks of 64448 bytes in file 1
> End of File mark.
> Total files=1, blocks=1000, bytes = 64,448,000
>
>
>
> *Results of "_qfill_" btape test* (with 1001 blocks of bytes, and here
> comes the error, when i try to write more tan 64Mb to the tape):
>
> *qfill
> btape: btape.c:2431 Test writing blocks of 64512 bytes to tape.
> How many blocks do you want to write? (1000): 1001
> btape: btape.c:475 Rewound "Drive-0" (/dev/nst0)
> btape: btape.c:2448 Begin writing 1001 Bacula blocks to tape ...
> +28-Nov 15:33 btape: ERROR in record.c:253 Failed ASSERT: block->binbuf
> == (uint32_t) (block->bufp - block->buf)

The above error message indicates a seriously inconsistent state in the 
storage daemon when attempting to write a block to the tape.  Either it is a 
bug (very unlikely given the amount of time that code has worked without 
failing), or you are getting a memory or CPU error.

I suggest running memory tests on your computer.  If you don't have a UPS you 
might also think about getting a good quality one for your server.  It will 
protect your computer from power fluxuations that often cause hardware 
problems.

Regards,

Kern

> 28-Nov 15:33 btape: Fatal Error because: Bacula interrupted by signal
> 11: Segmentation violation
> Kaboom! btape, btape got signal 11 - Segmentation violation. Attempting
> traceback.
> Kaboom! exepath=/root
> Calling: /root/btraceback /root/btape 7106
> execv: /root/btraceback failed: ERR=No such file or directory
> Traceback complete, attempting cleanup ...
> #
>
>
>
>
> *Results of "fill" btape test* (It returns the same error of above)
>
> *fill
>
> This command simulates Bacula writing to a tape.
> It requires either one or two blank tapes, which it
> will label and write.
>
> If you have an autochanger configured, it will use
> the tapes that are in slots 1 and 2, otherwise, you will
> be prompted to insert the tapes when necessary.
>
> It will print a status approximately
> every 322 MB, and write an EOF every 3.2 GB.  If you have
> selected the simple test option, after writing the first tape
> it will rewind it and re-read the last block written.
>
> If you have selected the multiple tape test, when the first tape
> fills, it will ask for a second, and after writing a few more
> blocks, it will stop.  Then it will begin re-reading the
> two tapes.
>
> This may take a long time -- hours! ...
>
> Do you want to run the simplified test (s) with one tape
> or the complete multiple tape (m) test: (s/m) s
> Simple test (single tape) selected.
> Wrote Volume label for volume "TestVolume1".
> Wrote Start of Session label.
> 15:29:29 Begin writing Bacula records to tape ...
> 28-Nov 15:29 btape JobId 0: Error: block.c:568 Write error at 1:2 on
> device "Drive-0" (/dev/nst0). ERR=Input/output error.
> 28-Nov 15:29 btape JobId 0: Error: block.c:318 Volume data error at
> 1:4294967295!
> Block checksum mismatch in block=4018 len=64512: calc=a0cac187
> blk=a16bca86
> 28-Nov 15:29 btape JobId 0: Error: Re-read last block at EOT failed.
> ERR=block.c:318 Volume data error at 1:4294967295!
> Block checksum mismatch in block=4018 len=64512: calc=a0cac187
> blk=a16bca86
> btape: btape.c:2349 Last block at: 1:1 this_dev_block_num=2
> btape: btape.c:2383 End of tape 1:-1. VolumeCapacity=129,024. Write
> rate
> = 6.5 KB/s
> 28-Nov 15:29 btape: ERROR in record.c:332 Failed ASSERT: remlen >=
> WRITE_RECHDR_LENGTH
> 28-Nov 15:29 btape: Fatal Error because: Bacula interrupted by signal
> 11: Segmentation violation
> Kaboom! btape, btape got signal 11 - Segmentation violation. Attempting
> traceback.
> Kaboom! exepath=/root
> Calling: /root/btraceback /root/btape 6197
> execv: /root/btraceback failed: ERR=No such file or directory
> Traceback complete, attempting cleanup ...
> #
>
> --
> Some days ago I had reported this issue on IRC. I maded a pastebin with
> the output of dmesg after running btape test. On that moment I was
> getting a kernel oops, I was using 2.6.18.
>
> I recently upgraded to 2.6.26. This time I got no oops but bacula keeps
> failing the same it did with 2.6.18.
>
> I'm curious about the cause of the failed assert.
>
> Regards,
> Willians Vivanco
>
>
>
>
>
> ---------------------------------------
>     Red Telematica de Salud - Cuba
>     	  CNICM - Infomed



-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilot Consulting.