Re: [Bacula-devel] sql query with PathId=0 during verify job
On Tuesday 25 March 2008 09:09:19 Ralf Gross wrote:
> Kern Sibbald wrote:
> > On Monday 24 March 2008 23:28:07 Ralf Gross wrote:
> > > > The workaround is either to turn off heartbeat in your FD for your
> > > > Verify jobs (not possible on a job by job basis) or set it longer
> > > > than the time it takes to run the verify.
> > By the way, when using your localhost loopback device, you should never
> > need to enable heartbeat. It is needed only when you have a router
> > between two machines and that router does not correctly implement the
> > Internet keepalive standard.
> The heartbeat option is part of my template file for all fd configs.
> But it's obviously not needed here.
> > > > The longterm solution is that Bacula should not use the heartbeat
> > > > code during a Verify.
> > >
> > > I'm really glad that you found the reason, as you can see in bug
> > > report #1061 it was slowly driving me crazy ;)
> > Sorry it was driving you crazy, but often some of the most difficult
> > problems are uncovered and resolved when people are upset with it and
> > determined to find a resolution, which was your case here :-)
> The symptom with the bsock error was misleading. Looking at the bacula
> logs, a message about a missing or different file wasn't always present
> when a bsock error occurred.
> 15-Mär 12:32 VUMEM004-sd JobId 1580: Forward spacing Volume "vu0em003-inc-0120" to file:block 0:226.
> 15-Mär 12:40 VUMEM004-dir JobId 1580: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.
> 22-Mär 10:31 VUMEM004-sd JobId 1663: Forward spacing Volume "itd-diff-0133" to file:block 0:227.
> 22-Mär 10:35 VUMEM004-dir JobId 1663: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.
> in contrast to this message:
> 22-Mär 21:11 VUMEM004-sd JobId 1674: Forward spacing Volume "itd-diff-0133" to file:block 0:227.
> 22-Mär 21:14 VUMEM004-dir JobId 1674: New file: /var/www_etas/howto/fit????/pics/.xvpics/email3.jpg
> 22-Mär 21:14 VUMEM004-dir JobId 1674: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.
Yes, it can be very confusing if you are not as familiar with the code as I am.
Once the packets start getting corrupted, you can get all kinds of problems:
simple job failures as in your Verify, "packet size too big" errors, and,
unfortunately, in a few cases crashes. Bacula tries to protect itself from bad
data, but it doesn't always succeed. In your case, I still don't understand why
the mutex error occurred, but I am 95% sure that the root cause is the bad data
arriving from the FD.
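
To illustrate why corrupted data can surface as "packet size too big", here is
a minimal sketch of a generic length-prefixed stream reader. It is not Bacula's
bsock code: the 4-byte signed big-endian header, the MAX_PACKET_SIZE limit, and
the names frame() and read_packets() are all assumptions made up for this
example. The point is only that one stray byte in the stream shifts every
following header, so the reader interprets payload bytes as a length.

```python
import struct

# Hypothetical sanity limit for this sketch; not Bacula's actual value.
MAX_PACKET_SIZE = 1_000_000


class PacketTooBig(Exception):
    """Raised when a length header fails the sanity check."""


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte signed big-endian length (assumed format)."""
    return struct.pack("!i", len(payload)) + payload


def read_packets(stream: bytes) -> list:
    """Split a byte stream into payloads, rejecting implausible lengths."""
    packets = []
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < 4:
            raise ValueError("truncated header")
        (size,) = struct.unpack_from("!i", stream, offset)
        offset += 4
        # The reader's only defence against a desynchronised stream.
        if size < 0 or size > MAX_PACKET_SIZE:
            raise PacketTooBig(f"implausible packet size {size}")
        if offset + size > len(stream):
            raise ValueError("truncated payload")
        packets.append(stream[offset:offset + size])
        offset += size
    return packets
```

With a clean stream, read_packets(frame(b"attr1") + frame(b"attr2")) returns
both payloads. If a single unexpected byte is inserted between the two frames,
the second header is read one byte early and the size check raises
PacketTooBig, even though neither original packet was oversized. A check like
this catches only some corruptions, which is consistent with the protection not
always working.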
I've got a fix that I am testing for the problem, but it is definitely too
large to go into 2.2.9, so I will probably release a patch for it next week.