Re: [Bacula-devel] Bacula 2.2.8 segfaulting with multiple autochangers
Hi Kern.
Maybe I was misunderstood. I'm not asking for support with running
bacula. I have a working, healthy system. Rather, I was asking if you
guys were interested in debugging a seg faulting process.
No, I didn't read the kaboom section, so I didn't know that bacula will
force itself to segfault. I generally jump to the conclusion that a
segfault in one version that doesn't happen in the previous, working
version, is a coding error, and want to help in debugging/fixing that. A
crash caused solely by an upgrade performed according to the
documentation is probably not really a "support issue", is it?
I guess if you guys don't want help finding and fixing bugs, I can
respect that.
It will probably take me a couple of weeks to fix or at least fully
identify this problem by myself. Would it be out of line to post patches
here, should I find that there is, in fact, a problem with the code?
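For the re-run, a minimal sketch of one way to keep the debug output from
scrolling off the terminal again. It assumes the daemon writes its debug
messages to stdout/stderr when started in the foreground with -f; the helper
name run_and_log and the log path are hypothetical, not anything Bacula ships.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, copying its combined stdout/stderr to the terminal and to log_path.

    Returns the exit status. On POSIX, a negative value -N means the
    process was terminated by signal N.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current in case the process dies
        return proc.wait()


# Hypothetical usage; the log path is an assumption:
# status = run_and_log(
#     ["bacula-sd", "-f", "-d99", "-c", "/etc/bacula/bacula-sd2.conf",
#      "-u", "bacula", "-g", "bacula"],
#     "/tmp/bacula-sd2-debug.log",
# )
```

Since Bacula installs its own handler for the crash, the returned status may
not reflect the signal directly; the log file is the part that matters here.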
On Wed, 2008-02-20 at 23:20 +0100, Kern Sibbald wrote:
> Hello,
>
> I would suggest that you start by reading www.bacula.org -> Support and then
> asking for help on the support list. Off hand, I would say that your build
> was perhaps broken, but most likely you just have your bacula-sd.conf file
> incorrectly configured, and we don't deal with those issues on this list.
> You are also talking about modifying the port addresses, so you are
> treading in very dangerous waters.
>
> Then if the above doesn't help, there is plenty of info in the Kaboom chapter
> of the manual.
>
> On Wednesday 20 February 2008 19.29:50 Clint Byrum wrote:
> > Hi everyone. This is my first post to this list.
> >
> > I think I found a bug in Bacula 2.2.8 after I spent a couple of days
> > struggling with two autochangers before regressing to 2.2.7, which
> > solved my problems.
> >
> > Recently I rebuilt my director and main storage daemon box after a RAID
> > array died on the old one. I set up a new box (HP DL380 G4) with CentOS
> > 5, x86_64, and 8G RAM. I also added a second Quantum PX502 autochanger.
> >
> > I downloaded the 2.2.8 source RPM and built it using this commandline:
> >
> > rpmbuild --rebuild --define nobuild_gconsole=1 --define build_mysql=1
> > --define build_centos5=1
> >
> > The RPM built fine, and everything seemed to be working fine. I used
> > the binaries to restore the old catalog from tape (some of you may have
> > seen me and my fun with MySQL index building in the IRC channel as
> > MapspaM).
> >
> > Anyway, after getting things running again as they were with the old
> > configurations, I wanted to start using my new, second autochanger,
> > which is on /dev/sg0 and /dev/nst0 (the old one ended up on /dev/sg2
> > and /dev/nst1). Things got ugly at this point.
> >
> > (In case you're wondering, my config files are at the bottom)
> >
> > Any time I would try to run a backup job on the new Storage resource,
> > bacula-sd would disappear.
> >
> > I tried a few more things, including running separate SDs and such.
> > Something about this new storage resource (In the attached configs as
> > LTO3-2) was broken.
> >
> > I finally ran a separate bacula-sd with
> >
> > bacula-sd -f -d99 -c /etc/bacula/bacula-sd2.conf -u bacula -g bacula
> >
> > And ran my backup job against it, which caused this output:
> >
> > 19-Feb 16:33 snap2-sd: ABORTING due to ERROR in dev.c:724
> > dev.c:723 Bad call to rewind. Device "PX502-2-Drive-1" (/dev/nst0) not open
> > Kaboom! bacula-sd, snap2-sd got signal 11 - Segmentation violation.
> > Attempting traceback.
> >
> > The traceback failed due to "Permission Denied". I'm sorry but I
> > accidentally let the other debug messages scroll off my terminal, though
> > they looked normal. I can re-run the test after my backups for this week
> > finish if the debug output before/after that would be helpful. The
> > problems are very reproducible, so it won't be hard.
> >
> > Here are the configs. A few notes:
> >
> > * I removed all the clients and jobs that didn't cause issues from the
> > configs.
> > * The segfault happened whether I was running with a separate bacula-sd
> > or not, so bacula-sd.conf, and bacula-sd2.conf both had the problem. It
> > would seem though, that bacula-sd2.conf has something in it that
> > isolates this issue.
> > * When I was running bacula-sd2.conf, LTO3-2 had the port number changed
> > to 9104 to connect to the second sd, and the PX502-2 and its associated
> > resources were commented out of bacula-sd.conf.
> > * Of course, this could be the result of something in my catalog too,
> > I'm aware of that.
> >
> > I'm hoping I can work with you all to figure out why 2.2.8 segfaults,
> > and 2.2.7 does not. I'm at your disposal and quite experienced with C/C++
> > development, though not bacula's internals. Please just tell me what
> > other information and procedures would be helpful to you all to
> > debug/reproduce this.
> >
> > ----------------- bacula-dir.conf -----------------
> >
> > Director { # define myself
> > Name = snap-dir
> > DIRport = 9101 # where we listen for UA connections
> > QueryFile = "/etc/bacula/query.sql"
> > WorkingDirectory = "/var/lib/bacula"
> > PidDirectory = "/var/run"
> > #Password = "xxxxxxxxxx" # Console password
> > Password = "xxxxxxxxxx"
> > Messages = Daemon
> > Maximum Concurrent Jobs = 3
> > }
> >
> > Console {
> > Name = lavache-mon
> > Password = "xxxxxxxxxx"
> > CatalogACL = MyCatalog
> > CommandACL = status, .status
> > }
> >
> >
> > JobDefs {
> > Name = "DefaultJob"
> > Type = Backup
> > Level = Incremental
> > FileSet = "Full Set"
> > Schedule = "WeeklyCycle"
> > Storage = LTO3
> > Storage = LTO3-2
> > Messages = Standard
> > Full Backup Pool = Monthly
> > Differential Backup Pool = Weekly
> > Incremental Backup Pool = Daily
> > Pool = Default
> > Priority = 10
> > SpoolData = yes
> > Maximum Concurrent Jobs = 2
> > }
> >
> > #Snap's local backups -- should go after production stuff
> > JobDefs {
> > Name = "DefaultJobNoSpool"
> > Type = Backup
> > Level = Incremental
> > FileSet = "Full Set"
> > Schedule = "WeeklyCycle"
> > Storage = LTO3
> > Messages = Standard
> > Full Backup Pool = Monthly
> > Differential Backup Pool = Weekly
> > Incremental Backup Pool = Daily
> > Pool = Default
> > Priority = 11
> > SpoolData = no
> > Maximum Concurrent Jobs = 1
> > }
> > JobDefs {
> > Name = "DefaultJobHulkExt"
> > Type = Backup
> > Level = Full
> > FileSet = "External Set"
> > Storage = Hulk-LTO-1
> > Messages = Standard
> > Pool = DefaultHulk
> > Priority = 10
> > SpoolData = no
> > Maximum Concurrent Jobs = 1
> > }
> >
> > JobDefs {
> > Name = "DefaultJobHulk"
> > Type = Backup
> > Level = Incremental
> > FileSet = "Full Set"
> > Schedule = "WeeklyCycle"
> > Storage = Hulk-LTO-1
> > Messages = Standard
> > Pool = DefaultHulk
> > Priority = 10
> > SpoolData = no
> > Maximum Concurrent Jobs = 1
> > }
> >
> > #
> > # Define the main nightly save backup job
> > #
> > Job {
> > Name = "snap"
> > Client = snap-fd
> > JobDefs = "DefaultJobNoSpool"
> > Write Bootstrap = "/mnt/remote/pull/snap/bootstraps/snap.bsr"
> > }
> >
> > Job {
> > Name = "clapSnapshot"
> > Client = clap-fd
> > JobDefs = "DefaultJob"
> > FileSet = "clapSnapshot"
> > Maximum Concurrent Jobs = 2
> > Write Bootstrap = "/mnt/remote/pull/snap/bootstraps/clap.bsr"
> > Storage = LTO3
> > Schedule = "DailyFullDaytime"
> > }
> >
> >
> > # Backup the catalog database (after the nightly save)
> > Job {
> > Name = "BackupCatalog"
> > JobDefs = "DefaultJob"
> > Level = Full
> > FileSet="Catalog"
> > Schedule = "DailyFull"
> > # This creates an ASCII copy of the catalog
> > RunBeforeJob = "/etc/bacula/make_tmp_catalog_backup bacula bacula xxxxxxxxxxxxx"
> > # This deletes the copy of the catalog
> > #RunAfterJob = "/etc/bacula/delete_catalog_backup"
> > Write Bootstrap = "/home/tmp/bacula.bsr"
> > Client = snap-fd
> > Priority = 11 # run after main backup
> > }
> >
> > #
> > # Standard Restore template, to be changed by Console program
> > # Only one such job is needed for all Jobs/Clients/Storage ...
> > #
> > Job {
> > Name = "RestoreFiles"
> > Storage = LTO3
> > Fileset = "Full Set"
> > Pool = Default
> > Client = snap-fd
> > Type = Restore
> > Messages = Standard
> > Where = /back/bacula/restores
> > }
> >
> >
> > # List of files to be backed up
> > FileSet {
> > Name = "Full Set"
> > Include {
> > Options {
> > signature = MD5
> > }
> > File = /
> > File = /home
> > File = /home/vpopmail
> > File = /back
> > File = /usr
> > File = /var
> > }
> > Exclude {
> > File = /proc
> > File = *swap.file*
> > File = /tmp
> > File = /sys
> > File = /home/tmp
> > File = /var/tmp
> > File = /var/log/lastlog
> > File = /back/bacula
> > File = /.journal
> > File = /.fsck
> > }
> > }
> >
> > FileSet {
> > Name = "External Set"
> > Include {
> > Options {
> > signature = MD5
> > }
> > File = /external
> > }
> > }
> >
> > FileSet {
> > Name = "NFS Set"
> > Include {
> > Options {
> > signature = MD5
> > }
> > File = /
> > File = /home
> > File = /home/ftp
> > File = /home/files
> > File = /back
> > File = /usr
> > File = /var
> > }
> > Exclude {
> > File = /proc
> > File = /tmp
> > File = /sys
> > File = /home/tmp
> > File = /var/tmp
> > File = /var/log/lastlog
> > File = /.journal
> > File = /.fsck
> > }
> > }
> >
> > FileSet {
> > Name = "Xen Host Set"
> > Include {
> > Options {
> > signature = MD5
> > }
> > File = /
> > File = /home
> > File = /back
> > File = /usr
> > File = /var
> > }
> > Exclude {
> > File = /proc
> > File = /tmp
> > File = /sys
> > File = /home/tmp
> > File = /home/xen/boxes
> > File = /var/tmp
> > File = /var/log/lastlog
> > File = /.journal
> > File = /.fsck
> > }
> > }
> >
> > FileSet {
> > Name = "clapSnapshot"
> > Include {
> > File = /mnt/snapshots/latest
> > }
> > }
> >
> > FileSet {
> > Name = "homeLogs"
> > Include {
> > File = /home/logs
> > }
> > }
> > FileSet {
> > Name = "quickbooks"
> > Include {
> > File = "C:/Quickbooks Data"
> > }
> > }
> >
> >
> >
> > #
> > # When to do the backups, full backup on first sunday of the month,
> > # differential (i.e. incremental since full) every other sunday,
> > # and incremental backups other days
> > Schedule {
> > Name = "WeeklyCycle"
> > Run = Full 1st friday at 21:05
> > Run = Differential 2nd-5th friday at 21:05
> > Run = Incremental sat-thu at 21:05
> > }
> > Schedule {
> > Name = "AfterBusinessHours"
> > Run = Full 1st sat at 17:00
> > Run = Differential 2nd-5th sat at 17:00
> > Run = Incremental sun-fri at 17:00
> > }
> >
> >
> > # This schedule does the catalog. It starts after the WeeklyCycle
> > Schedule {
> > Name = "DailyFull"
> > Run = Full sun-sat at 23:10
> > }
> > Schedule {
> > Name = "DailyFullDaytime"
> > Run = Full sat at 14:30
> > Run = Full tue at 14:30
> > Run = Full thu at 14:30
> > }
> >
> >
> > # This is the backup of the catalog
> > FileSet {
> > Name = "Catalog"
> > Include {
> > Options {
> > signature = MD5
> > }
> > File = /mnt/remote/pull/snap/catalog
> > }
> > }
> >
> > # Client (File Services) to backup
> > Client {
> > Name = snap-fd
> > Address = snap
> > FDPort = 9102
> > Catalog = MyCatalog
> > Password = "xxxxxxxxxx"
> > File Retention = 2 months # 30 days
> > Job Retention = 60 months # six months
> > AutoPrune = yes # Prune expired Jobs/Files
> > }
> >
> > Client {
> > Name = clap-fd
> > Address = clap
> > FDPort = 9102
> > Catalog = MyCatalog
> > Password = "xxxxxxxxxx"
> > File Retention = 30 days
> > Job Retention = 18 months
> > AutoPrune = yes
> > }
> >
> > Storage {
> > Name = LTO3
> > Address = snap
> > SDPort = 9103
> > Password = "xxxxxxxxxx"
> > Device = PX502
> > Media Type = Ultrium-LTO-3
> > Autochanger = yes
> > Maximum Concurrent Jobs = 2
> > }
> >
> > Storage {
> > Name = LTO3-2
> > Address = snap
> > SDPort = 9103
> > Password = "xxxxxxxxxx"
> > Device = PX502-2
> > Media Type = Ultrium-LTO-3
> > Autochanger = yes
> > Maximum Concurrent Jobs = 2
> > }
> >
> > Storage {
> > Name = Hulk-LTO-1
> > Address = hulk
> > SDPort = 9103
> > Password = "xxxxxxxxxx"
> > Device = LTO-Drive-1
> > Media Type = Ultrium-LTO-3
> > }
> >
> > # Generic catalog service
> > Catalog {
> > Name = MyCatalog
> > dbname = bacula; user = bacula; Password = "xxxxxxxxxx"
> > }
> >
> > # Reasonable message delivery -- send most everything to email address
> > # and to the console
> > Messages {
> > Name = Standard
> > #
> > # NOTE! If you send to two email or more email addresses, you will need
> > # to replace the %r in the from field (-f part) with a single valid
> > # email address in both the mailcommand and the operatorcommand.
> > #
> > mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) %r\" -s \"Bacula: %t %e of %c %l\" %r"
> > operatorcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) %r\" -s \"Bacula: Intervention needed for %j\" %r"
> > mail = bacula@xxxxxxxxxx = all, !skipped
> > operator = bacula@xxxxxxxxxx = mount
> > console = all, !skipped, !saved
> > #
> > # WARNING! the following will create a file that you must cycle from
> > # time to time as it will grow indefinitely. However, it will
> > # also keep all your messages if they scroll off the console.
> > #
> > append = "/var/lib/bacula/log" = all, !skipped
> > }
> >
> > #
> > # Message delivery for daemon messages (no job).
> > Messages {
> > Name = Daemon
> > mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) %r\" -s \"Bacula daemon message\" %r"
> > mail = bacula@xxxxxxxxxx = all, !skipped
> > console = all, !skipped, !saved
> > append = "/var/lib/bacula/log" = all, !skipped
> > }
> >
> > # Default pool definition
> > Pool {
> > Name = Default
> > Pool Type = Backup
> > Recycle = no # Bacula can automatically recycle Volumes
> > AutoPrune = no # Prune expired volumes
> > Volume Retention = 365 days # one year
> > # Accept Any Volume = yes # write on any volume in the pool
> > }
> > Pool {
> > Name = DefaultHulk
> > Pool Type = Backup
> > Recycle = yes # Bacula can automatically recycle Volumes
> > AutoPrune = yes # Prune expired volumes
> > Volume Retention = 365 days # one year
> > # Accept Any Volume = yes # write on any volume in the pool
> > }
> >
> > # Disk backups - mostly incrementals
> > Pool {
> > Name = Daily
> > Pool Type = Backup
> > Recycle = yes
> > AutoPrune = yes
> > Volume Retention = 8 days
> > # Accept Any Volume = yes
> > }
> > Pool {
> > Name = Weekly
> > Pool Type = Backup
> > Recycle = yes
> > AutoPrune = yes
> > Volume Retention = 6 weeks
> > # Accept Any Volume = yes
> > }
> > Pool {
> > Name = Monthly
> > Pool Type = Backup
> > Recycle = no # We don't want these recycled, they will be retained for a long time
> > AutoPrune = no
> > Volume Retention = 5 years # Some are taken offsite, others will be recycled
> > # Accept Any Volume = yes
> > Cleaning Prefix = "CLN";
> > }
> >
> > # Tapes for dumps of backup server
> > Pool {
> > Name = BackMonthlyArchives
> > Pool Type = Backup
> > Recycle = no # We don't want these recycled, they will be retained for a long time
> > AutoPrune = no
> > Volume Retention = 5 years # Some are taken offsite, others will be recycled
> > # Accept Any Volume = yes
> > }
> >
> >
> >
> > #
> > # Restricted console used by tray-monitor to get the status of the director
> > #
> > Console {
> > Name = snap-mon
> > Password = "xxxxxxxxxx"
> > CommandACL = status, .status
> > }
> >
> > -------------------- bacula-sd.conf ------------------------
> >
> > Storage { # definition of myself
> > Name = snap-sd
> > SDPort = 9103 # Director's port
> > WorkingDirectory = "/var/lib/bacula"
> > Pid Directory = "/var/run"
> > Maximum Concurrent Jobs = 20
> > }
> >
> > #
> > # List Directors who are permitted to contact Storage daemon
> > #
> > Director {
> > Name = snap-dir
> > Password = "xxxxxxxxxx"
> > }
> >
> > #
> > # Restricted Director, used by tray-monitor to get the
> > # status of the storage daemon
> > #
> > Director {
> > Name = lavache-mon
> > Password = "xxxxxxxxxx"
> > Monitor = yes
> > }
> >
> > #
> > # Devices supported by this Storage daemon
> > # To connect, the Director's bacula-dir.conf must have the
> > # same Name and MediaType.
> > #
> >
> > #
> > # An autochanger device with two drives
> > #
> > Autochanger {
> > Name = PX502
> > Device = PX502-Drive-1
> > #Device = PX502-Drive-2 # uncomment when second drive is installed
> > Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> > Changer Device = /dev/sg2
> > }
> >
> > Autochanger {
> > Name = PX502-2
> > Device = PX502-2-Drive-1
> > Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> > Changer Device = /dev/sg0
> > }
> >
> > Device {
> > Name = PX502-Drive-1;
> > Drive Index = 0;
> > Media Type = Ultrium-LTO-3;
> > Archive Device = /dev/nst1;
> > AutomaticMount = yes;
> > Always Open = yes;
> > RemovableMedia = yes;
> > Random Access = no;
> > AutoChanger = yes;
> > Changer Device = /dev/sg2;
> > Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'";
> > Maximum Job Spool Size = 100G;
> > Maximum Spool Size = 120G;
> > Spool Directory = /var/spool/bacula
> > }
> >
> > Device {
> > Name = PX502-2-Drive-1;
> > Drive Index = 0;
> > Media Type = Ultrium-LTO-3;
> > Archive Device = /dev/nst0;
> > AutomaticMount = yes;
> > Always Open = yes;
> > RemovableMedia = yes;
> > Random Access = no;
> > AutoChanger = yes;
> > Changer Device = /dev/sg0;
> > Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'";
> > Maximum Job Spool Size = 100G;
> > Maximum Spool Size = 120G;
> > Spool Directory = /var/spool/bacula
> > }
> >
> > #
> > # Send all messages to the Director,
> > # mount messages also are sent to the email address
> > #
> > Messages {
> > Name = Standard
> > director = snap-dir = all
> > }
> >
> > ------------------ bacula-sd2.conf -----------------
> >
> > Storage { # definition of myself
> > Name = snap2-sd
> > SDPort = 9104 # Director's port
> > WorkingDirectory = "/var/lib/bacula"
> > Pid Directory = "/var/run"
> > Maximum Concurrent Jobs = 20
> > }
> >
> > #
> > # List Directors who are permitted to contact Storage daemon
> > #
> > Director {
> > Name = snap-dir
> > Password = "xxxxxxxxxx"
> > }
> >
> > #
> > # Restricted Director, used by tray-monitor to get the
> > # status of the storage daemon
> > #
> > Director {
> > Name = lavache-mon
> > Password = "xxxxxxxxxx"
> > Monitor = yes
> > }
> >
> > Autochanger {
> > Name = PX502-2
> > Device = PX502-2-Drive-1
> > Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> > Changer Device = /dev/sg0
> > }
> >
> > Device {
> > Name = PX502-2-Drive-1;
> > Drive Index = 0;
> > Media Type = Ultrium-LTO-3;
> > Archive Device = /dev/nst0;
> > AutomaticMount = yes;
> > Always Open = yes;
> > RemovableMedia = yes;
> > Random Access = no;
> > AutoChanger = yes;
> > Changer Device = /dev/sg0;
> > Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'";
> > Maximum Job Spool Size = 100G;
> > Maximum Spool Size = 120G;
> > Spool Directory = /var/spool/bacula
> > }
> >
> >
> > #
> > # Send all messages to the Director,
> > # mount messages also are sent to the email address
> > #
> > Messages {
> > Name = Standard
> > director = snap-dir = all
> > }
>
>
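To help rule out the "incorrectly configured bacula-sd.conf" theory for the
files quoted above, here is a rough sketch of a consistency check. It is not
Bacula's own parser and only looks at two things: that every Device named in
an Autochanger resource has a matching Device resource, and that no two
Device resources share an Archive Device. The function names are hypothetical.
It assumes one directive per line (or ';'-separated), compares names exactly,
and will mis-split values that contain a ';' inside quotes.

```python
import re


def normalize(key):
    """Fold case and drop spaces, since Bacula ignores both in keywords."""
    return re.sub(r"\s+", "", key).lower()


def strip_comment(line):
    """Drop a trailing '#' comment unless the '#' is inside double quotes."""
    out, in_quote, escaped = [], False, False
    for ch in line:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quote = not in_quote
        elif ch == "#" and not in_quote:
            break
        out.append(ch)
    return "".join(out)


def parse_resources(text):
    """Return [(resource_type, {directive: [values]})] for top-level resources."""
    resources, depth = [], 0
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if line.endswith("{"):
            if depth == 0:
                resources.append((normalize(line[:-1]), {}))
            depth += 1
        elif line == "}":
            depth = max(depth - 1, 0)
        elif depth == 1:
            # Only directives directly inside a top-level resource are kept.
            for part in line.split(";"):
                if "=" in part:
                    key, value = part.split("=", 1)
                    directives = resources[-1][1]
                    directives.setdefault(normalize(key), []).append(
                        value.strip().strip('"'))
    return resources


def check_sd_config(text):
    """Return a list of human-readable problems found in a bacula-sd.conf."""
    resources = parse_resources(text)
    devices = {d["name"][0]: d for kind, d in resources
               if kind == "device" and "name" in d}
    problems = []
    for kind, d in resources:
        if kind != "autochanger":
            continue
        changer = d.get("name", ["?"])[0]
        for entry in d.get("device", []):
            for dev in (n.strip() for n in entry.split(",")):
                if dev and dev not in devices:
                    problems.append(
                        f"Autochanger {changer}: no Device named {dev}")
    owner = {}
    for name, d in devices.items():
        for path in d.get("archivedevice", []):
            if path in owner:
                problems.append(
                    f"{path} is used by both {owner[path]} and {name}")
            owner[path] = name
    return problems


# Hypothetical usage:
# with open("/etc/bacula/bacula-sd.conf") as fh:
#     for problem in check_sd_config(fh.read()):
#         print(problem)
```

An empty result only means these two checks passed; it says nothing about
whether the daemon itself accepts the file.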
--
Clint Byrum <clint@xxxxxxxxxx>