[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] sql query with PathId=0 during verify job


Kern Sibbald schrieb:
> >
> > sql_get.c:73-0 db_get_file_att_record
> > fname=/server/cvsroot/iprep/ANTSRT/SRC/Components/UTA2/ImageRectificationOp
> >enGl/ip/ip_hi<FF><FF><FF><FA>ghlevel/src/pelCont.cc,v sql_get.c:127-0
> > Get_file_record JobId=1683 FilenameId=539133 PathId=0 sql_get.c:129-0
> > Query=SELECT FileId, LStat, MD5 FROM File WHERE File.JobId=1683 AND
> > File.PathId=0 AND File.FilenameId=539133 sql_get.c:133-0 get_file_record
> > num_rows=0
> > verify.c:580-0 File not in catalog:
> > /server/cvsroot/iprep/ANTSRT/SRC/Components/UTA2/ImageRectificationOpenGl/i
> >p/ip_hi<FF><FF><FF><FA>ghlevel/src/pelCont.cc,v
> >
> > I don't know where the <FF><FF><FF><FA> characters are coming from.
> 
> Well, I would say that the most likely causes of those characters are:
> 
> 1. You have a bad network card or the loopback interface is screwed up.

There is no traffic going over eth0-eth2, only lo. I've no idea how I
could screw up a loopback device?


> 2. You are not running Bacula with UTF-8 turned on and you have German accents 
> in some filenames.

In the above example the complete filename is:

/server/cvsroot/iprep/ANTSRT/SRC/Components/UTA2/ImageRectificationOpenGl/ip/ip_highlevel/src/pelCont.cc,v

So no accents in that name, but there are surly filnames with accents
in the backup.

But I've no problem in restoring this backup job. I tried it
yesterday. The whole differential backup was restored without error. I
also checked the md5sum of the file that failed during the verify
before that restore, it was identical to the md5sum of the original
file.

> 3. There is some strange bug in Bacula that is causing the filename record to 
> get clobbered.

This is not the first time I've this problem, but it happens more
often now (more backup clients were added).

I found one of my old posts from last August:

http://article.gmane.org/gmane.comp.bacula.user/37076/match=FF

Maybe it's not 100% related to the recent one, but there were these
characters too. And since then bacula was migrated to new hardware, as
well as a dump/import of the pgsql database happend. I run dbcheck
once each month.


> To me the most likely is #1 because it looks to me like a negative integer (4 
> characters) has been inserted in the middle of the line.

Always the same characters as it seems.


> > Interesting is, that the successful verify job did a different sql
> > query with an additional File.PathId=750055 instead of File.PathId=0.
> > This is the only query with File.PathId=0 in the debug file.
> 
> Well, the Pathid=0 is to be expected if the path is messed up as is the above 
> case.  That simply means that the path was not found.
> 
> However, in looking over the code, I can see that it is not really optimal and 
> it really should avoid submitting the sql with the 0 because it is clear that 
> the sql will fail, so why bother submitting it in the first place.
 
Ok.

> > Is the File.PathId in the failed job 0 because of the extra characters
> > in the path to the file? Where are these characters are coming from?
> > Bad memory?  But wouldn't I then get similar errors during a backup
> > job that runs 10 hours? I haven't had a backup error for weeks, but
> > many verify error (100% bsock errors).
> 
> If hypothesis #1 is correct, I would worry about my backups too, but as you 
> point out, it doesn't quite make sense that only verify fails.
 
Yes, that's what I don't understand at all. I did some test restores a
few weeks ago and compared the md5sum with the backup source. No
difference.


> > All daemons that are involved in this verify job are running on the
> > same server. All traffic is going over the lo interface. postgres 8.1,
> > bacula 2.2.8.
> >
> > I've no problems in backing up TB's of data, but verify jobs are
> > giving me a hard time at the moment.
> 
> I've attached a patch for 2.2.8 that will prevent Bacula from submitting an 
> SQL command when the PathId (or FilenameId) is zero.  I recommend that you 
> apply it and see if it fixes your problem.  I give it a probability of about 
> 1% that it will solve your problem (i.e. I don't see why it should help 
> unless the PathId=0 makes PostgreSQL go off the deep end.
> 
> The real problem lies in why you are getting the FFFFFFFA in the middle of 
> your data record.

Thanks, I'll try it tomrrow. But I think it won't help much, because
the problem is (as you say) the additional FFFFFFFA.

> Another interesting thing you might want to try is to get debug output
> from the DIR and from the FD.  When you find the FFFFFFFA garbage in
> the middle of your DIR record, it would be interesting to see what the
> FD actually sent.  If you set the FD debug level to 20 the FD should
> dump the record just after it sent it.

I'm running all daemons with d100 at the moment. It's a bit hard to
compare the output of all daemons, because they hvae no timestamps.

Ok, this is what I find for the last error (other post, FFFF..
characters in the attribute field, not path):

bacula-dir debug:

getmsg.c:110-0 bget_dirmsg 140: 130489 1 pinsug5 /server/cvsroot/iprep/lane_recog/simulation/simulator.pro,v
verify.c: 570-0 dird<filed: stream=1 /server/cvsroot/iprep/lane_recog/simulation/simulator.pro,v
verify.c:571-0 dird<filed: attr=P0C CgF<FF><FF><FF><FA>9Wr IEk B BQu Sg A u4 BAA I BH2lKA BH2lKA BH2lKA A A
sql_get.c:73-0 db_get_file_att_record fname=/server/cvsroot/iprep/lane_recog/simulation/simulator.pro,v 
sql_get.c:127-0 Get_file_record JobId=1683 FilenameId=1929406 PathId=753722
sql_get.c:129-0 Query=SELECT FileId, LStat, MD5 FROM File WHERE File.JobId=1683 AND File.PathId=753722 AND File.FilenameId=1929406
sql_get.c:133-0 get_file_record num_rows=1


bacula-fd debug:

I guess I've to search for the string 130489 (bget_dirmsg) in the fd's
log...

[...]
verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=130485 3 0QhSdd9glJtiGnzCnsmSXw *MD5-130485*
verify_vol.c:102-0 Got hdr: FilInx=130486 Stream=1.
verify_vol.c:115-0 Got stream data, len=121
verify_vol.c:149-0 Got Attr: FilInx=130486 type=5
verify_vol.c:102-0 Got hdr: FilInx=130486 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=130487 Stream=1.
verify_vol.c:115-0 Got stream data, len=129
verify_vol.c:149-0 Got Attr: FilInx=130487 type=3
verify_vol.c:102-0 Got hdr: FilInx=130487 Stream=2.
verify_vol.c:115-0 Got stream data, len=392
verify_vol.c:102-0 Got hdr: FilInx=130487 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=130487 Stream=3.
verify_vol.c:115-0 Got stream data, len=16
verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=130487 3 P6w5mCzRap5HQbxKGr6hEg *MD5-130487*
verify_vol.c:102-0 Got hdr: FilInx=130488 Stream=1.
verify_vol.c:115-0 Got stream data, len=121
verify_vol.c:149-0 Got Attr: FilInx=130488 type=5
verify_vol.c:102-0 Got hdr: FilInx=130488 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
---->
verify_vol.c:102-0 Got hdr: FilInx=130489 Stream=1.
verify_vol.c:115-0 Got stream data, len=133
verify_vol.c:149-0 Got Attr: FilInx=130489 type=3
verify_vol.c:102-0 Got hdr: FilInx=130489 Stream=2.
verify_vol.c:115-0 Got stream data, len=3000
verify_vol.c:102-0 Got hdr: FilInx=130489 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=130489 Stream=3.
verify_vol.c:115-0 Got stream data, len=16
verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=130489 3 Wu88Xoj8IWwr88rOaGbyZg *MD5-130489*
verify_vol.c:102-0 Got hdr: FilInx=130490 Stream=1.
verify_vol.c:115-0 Got stream data, len=118
verify_vol.c:149-0 Got Attr: FilInx=130490 type=5
verify_vol.c:102-0 Got hdr: FilInx=130490 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=130491 Stream=1.
verify_vol.c:115-0 Got stream data, len=160
verify_vol.c:149-0 Got Attr: FilInx=130491 type=3
verify_vol.c:102-0 Got hdr: FilInx=130491 Stream=2.
verify_vol.c:115-0 Got stream data, len=5563
verify_vol.c:102-0 Got hdr: FilInx=130491 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=130491 Stream=3.
verify_vol.c:115-0 Got stream data, len=16
[...]
verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=132004 3 QSFr1HBEqjFTDUYIAYjoZg *MD5-132004*
verify_vol.c:102-0 Got hdr: FilInx=132005 Stream=1.
verify_vol.c:115-0 Got stream data, len=152
verify_vol.c:149-0 Got Attr: FilInx=132005 type=3
verify_vol.c:102-0 Got hdr: FilInx=132005 Stream=2.
verify_vol.c:115-0 Got stream data, len=44224
verify_vol.c:102-0 Got hdr: FilInx=132005 Stream=15.
verify_vol.c:115-0 Got stream data, len=33
verify_vol.c:102-0 Got hdr: FilInx=132005 Stream=3.
verify_vol.c:115-0 Got stream data, len=16
verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=132005 3 2b5COfMkFXZVgiAp/NBU5A *MD5-132005*
verify_vol.c:102-0 Got hdr: FilInx=132006 Stream=1.
verify_vol.c:115-0 Got stream data, len=149
verify_vol.c:149-0 Got Attr: FilInx=132006 type=3
jcr.c:603-0 OnEntry JobStatus=R set=f
jcr.c:623-0 OnExit JobStatus=f set=f
jcr.c:603-0 OnEntry JobStatus=f set=E
jcr.c:623-0 OnExit JobStatus=f set=E
verify_vol.c:266-0 End Verify-Vol. Files=132006 Bytes=0
heartbeat.c:139-0 Send kill to heartbeat id
job.c:252-0 Quit command loop. Canceled=1
[...]



Ralf

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilotco.