
Re: [Bacula-devel] Marking Files for Restore


On Wednesday 19 November 2008 17:52:13 Tim Frank wrote:
> Dan Langille wrote on 11/18/2008 10:43 PM:
> > On Nov 18, 2008, at 3:06 PM, Tim Frank wrote:
> >> I will not pretend to understand all of the complexities of the system
> >> for multiple jobID's and file versions, or which queries are or are not
> >> compatible across PostgreSQL, MySQL and SQLite, but it would seem that
> >> some optimization may be possible.
> >
> > At first glance, I agree.  I'd have to look deeper into this to know
> > more.
> >
> > We have traditionally sought to keep the SQL the same wherever we
> > can.  But I'm thinking there are some things we can do to speed this up.
> >
> > I have only glanced at your comments and don't have the time to
> > investigate now.
>
> I have dug a little deeper into the issue and I believe that the large
> number of queries is related to hardlinks. 

Yes, that is true.  If you mark a hard link, then to do things
correctly, Bacula must go back to the original file that was backed up (the
first one found) and mark it for restoration as well. This is because Bacula
normally saves only one copy of a hard-linked file: there is only one
physical copy on disk, and the other names are just pointers to the same
data.
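
To illustrate the point about hard links (this is only a sketch and has
nothing to do with Bacula's own code): on a filesystem that supports hard
links, every name of a hard-linked file reports the same inode number and
the same link count. The file names below are borrowed from the /bin
example later in this message; the helper names link_info and demo are
made up for the illustration.

```python
import os
import tempfile


def link_info(path):
    """Return (inode number, link count) for the given path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink


def demo():
    """Create one file with two extra hard links and inspect all three names."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "gzip")
        with open(original, "w") as f:
            f.write("data")

        names = [original]
        for name in ("gunzip", "zcat"):
            path = os.path.join(d, name)
            os.link(original, path)  # new directory entry, same inode
            names.append(path)

        # Stat every name only after all links exist.
        return [link_info(p) for p in names]


if __name__ == "__main__":
    for inode, nlink in demo():
        print(inode, nlink)
```

All three names share one inode, which is why a backup only needs to store
the data once and record the other names as references to it.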

> I noticed that when marking 
> various directories that sometimes there would be no hits to the
> database when marking 12,000 or 27,000 files. Other times there would be
> multiple hits to the database when marking 100 files.
>
> The simplest examples are the files "/bin/gzip", "/bin/gunzip", and
> "/bin/zcat", which are all hardlinks. When issuing "mark bin" in
> bconsole, there are two database hits for "/bin/gunzip" and "/bin/zcat".
>
> The large number of database hits when marking the /usr directory come
> from the fact that I have 44,479 hardlinks in that directory. So, there
> was not quite a query for each of the 79,979 files, but one or more for
> each of the 22,639 unique hardlinks. (A few kernel source packages are
> installed).

To handle hard links, Bacula must read the full metadata stored in the
database.

>
> The only thing I could think of to speed this up would be to keep a list
> of the hardlinks and execute a single query or a smaller number of
> queries to mark these entries.

Well, the simple way to eliminate this would be to start with all files
marked, which is how I originally implemented the code, and then unmark the
ones you do not want to restore.  The users complained a lot about this and
voted to start with all files unmarked.  Thus, when you mark files, instead of
making one pass through all the data in one fell swoop as Bacula does at the
beginning, it is forced to issue individual SQL statements for each file it
encounters.
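
For illustration only, the difference between one statement per hard link
and the single batched query suggested above can be sketched as follows.
This is not Bacula's code or catalog schema: the table "file", its columns,
and the function names are all hypothetical, and SQLite is used in memory
purely so that the example is self-contained.

```python
import sqlite3


def make_catalog():
    """Build a tiny in-memory catalog (hypothetical schema, not Bacula's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT, link_target INTEGER)"
    )
    rows = [
        (1, "/bin/gzip", None),  # the copy that actually holds the data
        (2, "/bin/gunzip", 1),   # hard link pointing at id 1
        (3, "/bin/zcat", 1),     # hard link pointing at id 1
    ]
    conn.executemany("INSERT INTO file VALUES (?, ?, ?)", rows)
    return conn


def mark_one_by_one(conn, link_ids):
    """Resolve each hard link with its own SELECT; returns (targets, query count)."""
    targets, queries = set(), 0
    for file_id in link_ids:
        row = conn.execute(
            "SELECT link_target FROM file WHERE id = ?", (file_id,)
        ).fetchone()
        queries += 1
        if row and row[0] is not None:
            targets.add(row[0])
    return targets, queries


def mark_batched(conn, link_ids):
    """Resolve all hard links with one SELECT ... IN (...) statement."""
    ids = list(link_ids)
    if not ids:
        return set(), 0
    # One placeholder per id; very long lists would need chunking because
    # databases limit the number of bound parameters per statement.
    placeholders = ",".join("?" * len(ids))
    sql = (
        "SELECT DISTINCT link_target FROM file "
        f"WHERE id IN ({placeholders}) AND link_target IS NOT NULL"
    )
    rows = conn.execute(sql, ids).fetchall()
    return {r[0] for r in rows}, 1


if __name__ == "__main__":
    conn = make_catalog()
    print(mark_one_by_one(conn, [2, 3]))
    print(mark_batched(conn, [2, 3]))
```

Both functions find the same original file to mark; the batched version
simply trades many round trips for one larger statement. Whether that form
is portable across PostgreSQL, MySQL and SQLite at Bacula's scale is not
something this sketch establishes.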

If you want to avoid this problem, you can simply add the keyword "all" 
(without quotes) on the restore command line, and unless something is broken, 
you will start with all files marked.  Unmarking them should be a rapid 
process (I think -- without looking at the code).

>
> Thanks again for any insight.

Best regards,

Kern

