[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] Trying out the VirtualFull feature.


Hello,

On Monday 18 August 2008 17:48:30 Graham Keeling wrote:
> On Sat, Aug 16, 2008 at 08:30:04AM +0200, Kern Sibbald wrote:
> > I stumbled into a problem with NULL volume names:
> >
> > localhost-sd: askdir.c:339 NULL Volume name. This shouldn't happen!!!
> >
> > which several users have reported, but I could never reproduce.  Now I
> > have 100% reproduction of the problem (perhaps not exactly the same
> > scenario) so I am working on fixing that -- and hopefully knocking off a
> > couple of unexplained bugs ...
>
> I was getting that problem too, when I was allowing multiple backups to go
> into each volume.

Hmm. The bug I found has nothing to do with multiple backups but is caused by 
switching drives while reading, which happens because the Storage definition 
is not totally correct, possibly because I was using two different Media 
Types.  

So if you can reproduce the problem without reading (i.e. no VirtualFull and 
no Migration), I would like to know more about this problem.  See my note 
below about the seg fault ...

>
> > For those who want a real test case of Virtual Backup that works or if
> > you are doing heavy production use of Bacula, I recommend implementing
> > the Bacula regression testing.
>
> OK, I've set it up with sqlite, using the bacula development trunk, and run
> it long enough to see the first three or so disk tests work (accurate-test,
> ansi-label-test, backup-bacula-test)...
>
> > The test that shows Virtual Backup working and for the moment also fails
> > on a second Virtual Backup, is virtual-backup-test.  Once you have
> > regress install and your "config" file properly edited, you can run the
> > test by doing:
> >
> > make setup
> > export REGRESS_DEBUG=1
> > tests/virtual-backup-test
> >
> > You might want to redirect the output of the last line above to a file
> > since there is a good deal of output and for the last test that fails, I
> > have debug logging turned on ...  You can find the config
> > (bacula-dir.conf and bacula-sd.conf) files in the <regress>/bin
> > directory.
>
> ...but when I try the virtual-backup-test, I get a segfault in my storage
> daemon. The output is as follows, and after that is the gdb backtrace that
> it emailed to me.
> I will run 'do_all' overnight, and see if anything else comes up.
>
>
>  === Starting virtual-backup-test at 16:34:52 ===
>
> Starting the Bacula Storage daemon
> Starting the Bacula File daemon
> Starting the Bacula Director daemon
> Connecting to Director localhost:8101
> 1000 OK: localhost-dir Version: 2.5.3 (12 Aug 2008)
> Enter a period to cancel a command.
> @output
> messages
> 18-Aug 16:33 localhost-dir JobId 2: No prior Full backup Job record found.
> 18-Aug 16:33 localhost-dir JobId 2: No prior or suitable Full backup found
> in catalog. Doing FULL backup. 18-Aug 16:33 localhost-dir JobId 2: Start
> Backup JobId 2, Job=Vbackup.2008-08-18_16.33.04 @tee
> /home/graham/bacula/regress/tmp/log1.out
> @#setdebug level=100 storage=File
> label storage=File volume=FileVolume001 Pool=Default
> Automatically selected Catalog: MyCatalog
> Using Catalog "MyCatalog"
> Connecting to Storage daemon File at localhost:8103 ...
> Sending label command for Volume "FileVolume001" Slot 0 ...
> 3000 OK label. VolBytes=201 DVD=0 Volume="FileVolume001"
> Device="FileStorage" (/home/graham/bacula/regress/tmp) Catalog record for
> Volume "FileVolume001", Slot 0  successfully created. Requesting to mount
> FileStorage ...
> 3906 File device "FileStorage" (/home/graham/bacula/regress/tmp) is always
> mounted. label storage=DiskChanger volume=ChangerVolume001 slot=1 Pool=Full
> drive=0 Connecting to Storage daemon DiskChanger at localhost:8103 ...
> Sending label command for Volume "ChangerVolume001" Slot 1 ...
> 3301 Issuing autochanger "loaded? drive 0" command.
> 3302 Autochanger "loaded? drive 0", result: nothing loaded.
> 3304 Issuing autochanger "load slot 1, drive 0" command.
> 3305 Autochanger "load slot 1, drive 0", status is OK.
> 3000 OK label. VolBytes=213 DVD=0 Volume="ChangerVolume001"
> Device="Drive-0" (/home/graham/bacula/regress/tmp/disk-changer/drive0)
> Catalog record for Volume "ChangerVolume001", Slot 1  successfully created.
> Requesting to mount DiskChanger ...
> 3002 Device "Drive-0" (/home/graham/bacula/regress/tmp/disk-changer/drive0)
> is mounted. label storage=DiskChanger volume=ChangerVolume002 slot=2
> Pool=Full drive=0 Connecting to Storage daemon DiskChanger at
> localhost:8103 ...
> Sending label command for Volume "ChangerVolume002" Slot 2 ...
> 3307 Issuing autochanger "unload slot 1, drive 0" command.
> 3304 Issuing autochanger "load slot 2, drive 0" command.
> 3305 Autochanger "load slot 2, drive 0", status is OK.
> 3000 OK label. VolBytes=213 DVD=0 Volume="ChangerVolume002"
> Device="Drive-0" (/home/graham/bacula/regress/tmp/disk-changer/drive0)
> Catalog record for Volume "ChangerVolume002", Slot 2  successfully created.
> Requesting to mount DiskChanger ...
> 3002 Device "Drive-0" (/home/graham/bacula/regress/tmp/disk-changer/drive0)
> is mounted. @# run several jobs
> run job=Vbackup level=Full yes
> Job queued. JobId=1
> wait
> Kaboom! bacula-sd, localhost-sd got signal 11 - Segmentation violation.
> Attempting traceback. Kaboom! exepath=/home/graham/bacula/regress/bin/
> Calling: /home/graham/bacula/regress/bin/btraceback
> /home/graham/bacula/regress/bin/bacula-sd 8850 Traceback complete,
> attempting cleanup ...
> Orphaned buffer:  localhost-sd    528 bytes buf=80c3418 allocated at
> bnet.c:677 Orphaned buffer:  localhost-sd    272 bytes buf=80c2228
> allocated at jcr.c:314 Orphaned buffer:  localhost-sd    528 bytes
> buf=80c3d38 allocated at jcr.c:316 Orphaned buffer:  localhost-sd    528
> bytes buf=80d6d88 allocated at bnet.c:676 Orphaned buffer:  localhost-sd  
> 1040 bytes buf=80d8ca0 allocated at record.c:207 Orphaned buffer: 
> localhost-sd    528 bytes buf=80d90d0 allocated at bnet.c:677 Orphaned
> buffer:  localhost-sd   1040 bytes buf=80d9300 allocated at spool.c:261
> Orphaned buffer:  localhost-sd    152 bytes buf=80c6230 allocated at
> bnet.c:670 Orphaned buffer:  localhost-sd      7 bytes buf=80c6058
> allocated at bnet.c:678 Orphaned buffer:  localhost-sd     10 bytes
> buf=80c6030 allocated at bnet.c:679 Orphaned buffer:  localhost-sd      8
> bytes buf=80c6008 allocated at workq.c:167 Orphaned buffer:  localhost-sd  
>   16 bytes buf=80d7110 allocated at jcr.c:308 Orphaned buffer: 
> localhost-sd     24 bytes buf=80c1340 allocated at dircmd.c:190 Orphaned
> buffer:  localhost-sd    146 bytes buf=80d8718 allocated at job.c:129
> Orphaned buffer:  localhost-sd    146 bytes buf=80d87c8 allocated at
> job.c:132 Orphaned buffer:  localhost-sd    146 bytes buf=80d8878 allocated
> at job.c:135 Orphaned buffer:  localhost-sd    146 bytes buf=80d8928
> allocated at job.c:144 Orphaned buffer:  localhost-sd     40 bytes
> buf=80d8ab0 allocated at job.c:158 Orphaned buffer:  localhost-sd     24
> bytes buf=80d8af8 allocated at reserve.c:690 Orphaned buffer:  localhost-sd
>     40 bytes buf=80d8b68 allocated at alist.c:53 Orphaned buffer: 
> localhost-sd     24 bytes buf=80d6b58 allocated at reserve.c:713 Orphaned
> buffer:  localhost-sd     12 bytes buf=80d9ef8 allocated at reserve.c:728
> Orphaned buffer:  localhost-sd     40 bytes buf=80d9f28 allocated at
> alist.c:53 Orphaned buffer:  localhost-sd     80 bytes buf=80da478
> allocated at block.c:118 Orphaned buffer:  localhost-sd  64528 bytes
> buf=80c6c40 allocated at block.c:130 Orphaned buffer:  localhost-sd     84
> bytes buf=80d6870 allocated at record.c:205 Orphaned buffer:  localhost-sd 
>   152 bytes buf=80da598 allocated at bnet.c:670 Orphaned buffer: 
> localhost-sd      7 bytes buf=80da650 allocated at bnet.c:678 Orphaned
> buffer:  localhost-sd     10 bytes buf=80da678 allocated at bnet.c:679
> Orphaned buffer:  localhost-sd  65652 bytes buf=80dfda0 allocated at
> bsock.c:629 Orphaned buffer:  localhost-sd   1040 bytes buf=80da6a0
> allocated at record.c:207 Orphaned buffer:  localhost-sd   2184 bytes
> buf=80daad0 allocated at spool.c:257 Orphaned buffer:  localhost-sd     24
> bytes buf=80d8b30 allocated at spool.c:259 Orphaned buffer:  localhost-sd  
> 1036 bytes buf=80db378 allocated at acquire.c:566 Orphaned buffer: 
> localhost-sd     80 bytes buf=80d6a58 allocated at block.c:118 Orphaned
> buffer:  localhost-sd  64528 bytes buf=80efe38 allocated at block.c:130
> Orphaned buffer:  localhost-sd     84 bytes buf=80d9730 allocated at
> record.c:205 You have messages.
> messages
> 18-Aug 16:34 localhost-dir JobId 1: Start Backup JobId 1,
> Job=Vbackup.2008-08-18_16.34.03 18-Aug 16:34 localhost-dir JobId 1: Using
> Device "FileStorage"
> 18-Aug 16:34 localhost-sd JobId 1: Wrote label to prelabeled Volume
> "FileVolume001" on device "FileStorage" (/home/graham/bacula/regress/tmp)
> 18-Aug 16:34 localhost-sd JobId 1: Spooling data ...
> 18-Aug 16:35 localhost-sd JobId 1: Job write elapsed time = 00:00:03,
> Transfer rate = 40.56 M bytes/second 18-Aug 16:35 localhost-sd JobId 1:
> Committing spooled data to Volume "FileVolume001". Despooling 122,109,560
> bytes ... 18-Aug 16:35 localhost-sd: Fatal Error because: Bacula
> interrupted by signal 11: Segmentation violation 18-Aug 16:35 localhost-dir
> JobId 1: Error: Bacula localhost-dir 2.5.3 (12Aug08): 18-Aug-2008 16:35:01
> Build OS:               i686-pc-linux-gnu debian 3.1
>   JobId:                  1
>   Job:                    Vbackup.2008-08-18_16.34.03
>   Backup Level:           Full
>   Client:                 "localhost-fd" 2.5.3 (12Aug08)
> i686-pc-linux-gnu,debian,3.1 FileSet:                "Full Set" 2008-08-18
> 16:34:55
>   Pool:                   "Default" (From Job resource)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "File" (From Pool resource)
>   Scheduled time:         18-Aug-2008 16:34:55
>   Start time:             18-Aug-2008 16:34:58
>   End time:               18-Aug-2008 16:35:01
>   Elapsed time:           3 secs
>   Priority:               10
>   FD Files Written:       9,677
>   SD Files Written:       0
>   FD Bytes Written:       120,153,844 (120.1 MB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   40051.3 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):
>   Volume Session Id:      1
>   Volume Session Time:    1219073692
>   Last Volume Bytes:      201 (201 B)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  OK
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
> list jobs
> +-------+---------+---------------------+------+-------+----------+--------
>---+-----------+
>
> | JobId | Name    | StartTime           | Type | Level | JobFiles |
> | JobBytes  | JobStatus |
>
> +-------+---------+---------------------+------+-------+----------+--------
>---+-----------+
>
> | 1     | Vbackup | 2008-08-18 16:34:58 | B    | F     | 9677     |
> | 120153844 | E         |
>
> +-------+---------+---------------------+------+-------+----------+--------
>---+-----------+ @exec "sh -c 'touch
> /home/graham/bacula/regress/build/src/dird/*.c'" run job=Vbackup
> level=Incremental yes
> Job queued. JobId=2
> You have messages.
> wait
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Using host libthread_db library "/lib/libthread_db.so.1".
> [Thread debugging using libthread_db enabled]
> [New Thread 16384 (LWP 8850)]
> [New Thread 32769 (LWP 8857)]
> [New Thread 98306 (LWP 8927)]
> [New Thread 32771 (LWP 8865)]
> 0x40239081 in select () from /lib/libc.so.6
> $1 = "localhost-sd", '\0' <repeats 17 times>
> $2 = 0x80c11c8 "bacula-sd"
> $3 = 0x80c11f0 "/home/graham/bacula/regress/bin/bacula-sd"
> $4 = 0x0
> $5 = 0x80b94cc "2.5.3 (12 Aug 2008)"
> $6 = 0x80b4b46 "i686-pc-linux-gnu"
> $7 = 0x80b4b3f "debian"
> $8 = 0x80b4b76 "3.1"
> #0  0x40239081 in select () from /lib/libc.so.6
> #1  0x00000005 in ?? ()
> #2  0x40297c40 in ?? () from /lib/libc.so.6
> #3  0x00000000 in ?? ()
> #4  0x00000000 in ?? ()
> #5  0x08087b8c in bnet_thread_server (addrs=0x80c1550, max_clients=-514,
>     client_wq=0x80bf880, handle_client_request=0xfffffdfe) at
> bnet_server.c:161 #6  0x0804b05e in main (argc=0, argv=0x80c1380) at
> stored.c:271
>
> Thread 4 (Thread 32771 (LWP 8865)):
> #0  0x4003a456 in nanosleep () from /lib/libpthread.so.0
> #1  0x00000001 in ?? ()
> #2  0x4003652a in __pthread_timedsuspend_new () from /lib/libpthread.so.0
> #3  0x40033122 in pthread_cond_timedwait_relative () from
> /lib/libpthread.so.0 #4  0x080a195d in watchdog_thread (arg=0x0) at
> watchdog.c:307
> #5  0x40033e51 in pthread_start_thread () from /lib/libpthread.so.0
> #6  0x4023f92a in clone () from /lib/libc.so.6
>
> Thread 3 (Thread 98306 (LWP 8927)):
> #0  0x4003ac0b in waitpid () from /lib/libpthread.so.0
> #1  0xbf7fefd0 in ?? ()
> #2  0x00000001 in ?? ()
> #3  0x0809dcbc in signal_handler (sig=11) at signal.c:167
> #4  0x40039825 in __pthread_sighandler () from /lib/libpthread.so.0
> #5  <signal handler called>
> #6  0x080db304 in ?? ()
> #7  0x080db378 in ?? ()
> #8  0x080db378 in ?? ()
> #9  0x0807e6ee in despool_data (dcr=0x80d9f70, commit=true) at spool.c:275
> #10 0x0807e064 in commit_data_spool (dcr=0x80d9f70) at spool.c:137
> #11 0x08054705 in do_append_data (jcr=0x80d8358) at append.c:320
> #12 0x0806a8ae in append_data_cmd (jcr=0x80d8358) at fd_cmds.c:194
> #13 0x0806a7e1 in do_fd_commands (jcr=0x80d8358) at fd_cmds.c:165
> #14 0x0806a5f0 in run_job (jcr=0x80d8358) at fd_cmds.c:128
> #15 0x0806be29 in run_cmd (jcr=0x80d8358) at job.c:212
> #16 0x08064ad8 in handle_connection_request (arg=0x80c6230) at dircmd.c:230
> #17 0x080a2359 in workq_server (arg=0x80bf880) at workq.c:357
> #18 0x40033e51 in pthread_start_thread () from /lib/libpthread.so.0
> #19 0x4023f92a in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 32769 (LWP 8857)):
> #0  0x40236ada in poll () from /lib/libc.so.6
> #1  0x40033b50 in __pthread_manager () from /lib/libpthread.so.0
> #2  0x4023f92a in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 16384 (LWP 8850)):
> #0  0x40239081 in select () from /lib/libc.so.6
> #1  0x00000005 in ?? ()
> #2  0x40297c40 in ?? () from /lib/libc.so.6
> #3  0x00000000 in ?? ()
> #4  0x00000000 in ?? ()
> #5  0x08087b8c in bnet_thread_server (addrs=0x80c1550, max_clients=-514,
>     client_wq=0x80bf880, handle_client_request=0xfffffdfe) at
> bnet_server.c:161 #6  0x0804b05e in main (argc=0, argv=0x80c1380) at
> stored.c:271
> #0  0x40239081 in select () from /lib/libc.so.6
> #0  0x40239081 in select () from /lib/libc.so.6
> No symbol table info available.
> #1  0x00000005 in ?? ()
> No symbol table info available.
> #2  0x40297c40 in ?? () from /lib/libc.so.6
> No symbol table info available.
> #3  0x00000000 in ?? ()
> No symbol table info available.
> #4  0x00000000 in ?? ()
> No symbol table info available.
> #5  0x08087b8c in bnet_thread_server (addrs=0x80c1550, max_clients=-514,
>     client_wq=0x80bf880, handle_client_request=0xfffffdfe) at
> bnet_server.c:161 161	      if ((stat = select(maxfd + 1, &sockset, NULL,
> NULL, NULL)) < 0) { Current language:  auto; currently c++
> maxfd = 6
> sockset = {fds_bits = {32, 0 <repeats 31 times>}}
> newsockfd = 0
> stat = 0
> clilen = 16
> cli_addr = {sa_family = 2,
>   sa_data = "á\037\177\000\000\001\000\000\000\000\000\000\000"}
> tlog = 0
> turnon = 1
> p = (IPADDR *) 0x0
> fd_ptr = (s_sockfd *) 0x0
> buf =
> "127.0.0.1\000ÿ¿\032Q\b\b\204\a\f\b\200\023\f\b`y\003@+\033\n\b\021{\003@Èö
>\v\b\000\000\000\000 \206\001\000\000\000\000\000\000Í\003@\000\000\000\000à
>ó\v\bhûÿ¿\0223\003@àó\v\b0ù\r\b\200\023\f\b\000\000\000\000xûÿ¿_\022\n\b\000
>\000\000\000d\000\000\000¨ûÿ¿\a\025\n\bÐø\r\b0ù\r\b¨ûÿ¿\027\024\n\b" sockfds
> = {<SMARTALLOC> = {<No data fields>}, head = 0xbfffefb0,
>   tail = 0xbfffefb0, loffset = 0, num_items = 1}
> allbuf =
> "H\a\f\b\214l\v\b(ñÿ¿zM\003@X\a\f\b\000\000\000\000\002\231\004\bM\000\000\
>000\000\000\000\000\200'\f\bxñÿ¿\032Q\b\bH\a\f\bh'\f\bXñÿ¿zM\003@\020\203)@X
>\000\000\000\000\203)@\200x)@\200x)@\000\203)@xñÿ¿Ò\217\035@~\177\003@M\000\
>000\000\000\203)@\000\000\000\000\200'\f\b\000\000\000\000¨ñÿ¿\000Í\003@H\a\
>f\bk¸\n\b¨ñÿ¿zM\003@X\a\f\b\000\000\000\000Èñÿ¿8'\f\b8'\f\bª\000\000\000øñÿ¿
>\032Q\b\bH\a\f\b\030,\f\bØñÿ¿zM\003@\020\203)@X\000\000\000"... #6 
> 0x0804b05e in main (argc=0, argv=0x80c1380) at stored.c:271
> 271	   bnet_thread_server(me->sdaddrs, me->max_concurrent_jobs * 2 + 1,
> ch = -514
> no_signals = false
> test_config = false
> thid = 16386
> uid = 0x0
> gid = 0x0
> #0  0x00000000 in ?? ()
> No symbol table info available.

The above segmentation problem is because you are running a relatively old 
kernel, which is broken -- or at least the header files and OS calls on your 
system do not correspond to the reality.  

The problem occurs because on your system, posix_fadvise() is configured, but 
a call to it seg faults.  To fix it, I recommend upgrading or fixing your 
kernel.  Alternatively you can manually turn off the HAVE_POSIX_FADVISE 
#define in src/config.h after each ./configure that you run, and then rebuild 
Bacula with:

  make clean
  make
  ...
  make install

Until you fix this, you will always have some seg fault someplace in Bacula -- 
either in the SD or in the FD.



Best regards,

Kern

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilotco.