
Re: [Bacula-devel] Selective restore when files are pruned [patch]


Hello,

Here are a couple of notes about this feature, then some related ideas ...

- There is a clear need for a feature like this.  If you have a Job that has 
the File records pruned, and it was a backup of 1TB but you only want a tiny 
portion of that, the only alternative to a solution like this is to scan the 
Volume, which is terribly slow.

- As Martin points out, this code gives the SD a bit more knowledge of the 
records it has stored, but unless someone has a better idea, I see no 
alternative.

- One aspect of this code I haven't looked at yet is whether it is really 
required to add it in read_record.c rather than match_bsr.c, where all the 
other bsr filtering code is located.  To be investigated ...

========

On a similar but slightly different subject: one user brought up a problem 
that we are likely to see quite a lot in the near future.  He has 600 
million File records in his Bacula catalog, and he is required to have at 
least a 7 year retention period, which means the database (I think it is 
currently at 100GB) will continue to grow.

To improve performance, he has proposed having a separate File table for each 
client.  This would very likely improve performance quite a lot because, if 
you have say 60 clients, the one gigantic File table would be split into 60 
smaller tables.  For example, for clients named FD1 and FD2, Bacula would 
reference FD1Files and FD2Files instead of File, and so on.  The tables would 
have identical structure, but each would contain only the data for a single 
client.
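
To make the proposal concrete, here is a rough sketch in Python with SQLite.  
It is not Bacula code: the three-column schema and the helper names are 
invented for illustration, and only the FD1Files/FD2Files naming comes from 
the proposal.  It shows that every statement touching File records would 
have to be built from the client name.

```python
import re
import sqlite3


def file_table_for(client):
    """Map a client name such as 'FD1' to its table name 'FD1Files'."""
    # Table names cannot be bound as SQL parameters, so the name is
    # validated strictly before it is pasted into a statement.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", client):
        raise ValueError(f"unsafe client name: {client!r}")
    return f"{client}Files"


def create_client_table(db, client):
    # Hypothetical, much simplified stand-in for the real File table.
    db.execute(
        f"CREATE TABLE IF NOT EXISTS {file_table_for(client)} "
        "(FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT)"
    )


def insert_file(db, client, job_id, name):
    db.execute(
        f"INSERT INTO {file_table_for(client)} (JobId, Name) VALUES (?, ?)",
        (job_id, name),
    )


def files_for_job(db, client, job_id):
    rows = db.execute(
        f"SELECT Name FROM {file_table_for(client)} "
        "WHERE JobId = ? ORDER BY FileId",
        (job_id,),
    )
    return [name for (name,) in rows]


db = sqlite3.connect(":memory:")
for client in ("FD1", "FD2"):
    create_client_table(db, client)
insert_file(db, "FD1", 1, "/etc/passwd")
insert_file(db, "FD2", 2, "/var/log/messages")
print(files_for_job(db, "FD1", 1))
```

Each query only ever scans one client's table, which is where the speedup 
would come from, but the table name now has to be threaded through every 
call.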

The problem I have with the suggestion is that it would require rather massive 
changes to the current SQL code, and it would break all external programs 
that reference the File table of the database.

The first important piece of information is that in version 3.0.0 we are 
planning to switch to using a 64 bit Id for the File table by default.  This 
will remove the current restriction of 4G files (it can be enabled manually 
in the current version, so the main change is to make it automatic).

The second thing that could help a lot is the "Selective restore" patch 
submitted by Kjetil.  Although a user may have a requirement for long 
retention periods, that does not necessarily mean that all the File records 
must be kept.  What is probably most important is retaining the data and 
being able to extract it in a reasonable amount of time.  Implementation of 
this patch will allow some users to prune the File records even though the 
Volumes must be kept a long time.  Obviously this will not satisfy all 
requirements.

Another suggestion that I have for the problem of growing File tables is a 
sort of compromise.  Suppose that we implement two File retention periods: 
the one that currently exists, which defines when the records are deleted, 
and a new period that defines when the records are moved out of the File 
table and placed in a secondary table, perhaps called OldFiles.  This would 
allow users to keep the efficiency for active files high while allowing the 
delete retention period to be quite long.  The database would still grow, but 
there would be a lot less overhead.  The name of the table for these 
"expired" File records could even be defined on a client by client or Job by 
Job basis, which would allow for having multiple "OldFiles" tables.
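
A minimal sketch of the two retention periods, again in Python with SQLite 
and not taken from Bacula.  The schema is a simplified assumption: the 
timestamp is stored on each row as JobTDate so the example stays 
self-contained, and the function and parameter names are hypothetical.

```python
import sqlite3

DAY = 86400  # seconds

# Hypothetical, simplified schema.  AUTOINCREMENT keeps FileId values from
# being reused after rows leave File, so later moves cannot collide.
SCHEMA = """
CREATE TABLE File     (FileId INTEGER PRIMARY KEY AUTOINCREMENT,
                       JobId INTEGER, Name TEXT, JobTDate INTEGER);
CREATE TABLE OldFiles (FileId INTEGER PRIMARY KEY,
                       JobId INTEGER, Name TEXT, JobTDate INTEGER);
"""


def age_file_records(db, now, move_after, delete_after):
    """Apply both retention periods; return (moved, deleted) row counts."""
    move_cutoff = now - move_after
    delete_cutoff = now - delete_after
    with db:  # one transaction: a record is never lost halfway through a move
        db.execute(
            "INSERT INTO OldFiles SELECT * FROM File WHERE JobTDate < ?",
            (move_cutoff,),
        )
        moved = db.execute(
            "DELETE FROM File WHERE JobTDate < ?", (move_cutoff,)
        ).rowcount
        deleted = db.execute(
            "DELETE FROM OldFiles WHERE JobTDate < ?", (delete_cutoff,)
        ).rowcount
    return moved, deleted


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
now = 10000 * DAY
db.executemany(
    "INSERT INTO File (JobId, Name, JobTDate) VALUES (?, ?, ?)",
    [
        (1, "/etc/passwd", now - 10 * DAY),    # recent: stays in File
        (2, "/etc/hosts", now - 100 * DAY),    # past the move period
        (3, "/etc/fstab", now - 3000 * DAY),   # past the delete period
    ],
)
db.commit()
result = age_file_records(db, now, move_after=30 * DAY,
                          delete_after=7 * 365 * DAY)
print(result)
```

With a 30 day move period and a 7 year delete period, the File table in 
this example keeps only the recent record, while the 100 day old one 
remains available in OldFiles.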

Another advantage of my suggestion would be that within Bacula itself, 
switching from using the File table to using the OldFiles table could be made 
totally automatic (it will require a bit of code, but no massive changes).  
External programs would still function normally in most cases, but if they 
wanted to access older data, they would need some modification.
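
One way the automatic switch could look, sketched in Python with SQLite.  
This is an assumption about the approach, not existing Bacula code, and the 
schema and function name are hypothetical: the lookup tries File first and 
falls back to OldFiles only when the Job has no records there.

```python
import sqlite3


def files_for_job(db, job_id):
    """Return (table, names), looking in File first and then in OldFiles."""
    for table in ("File", "OldFiles"):  # fixed names, never user input
        rows = db.execute(
            f"SELECT Name FROM {table} WHERE JobId = ? ORDER BY Name",
            (job_id,),
        ).fetchall()
        if rows:
            return table, [name for (name,) in rows]
    return None, []  # pruned from both tables


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE File     (FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT);
CREATE TABLE OldFiles (FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT);
INSERT INTO File     (JobId, Name) VALUES (2, '/etc/hosts');
INSERT INTO OldFiles (JobId, Name) VALUES (1, '/etc/passwd');
""")
print(files_for_job(db, 2))
print(files_for_job(db, 1))
```

An external program that only queries File would still see Job 2 here, but 
it would need this kind of fallback to find Job 1.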

We could also envision moving the "expired" File records to a different 
database, which would in the end be much more efficient, but would require 
considerably more work to implement.

Whatever is finally decided, it is clear to me that it is unlikely to be 
implemented in time for the next major release (planned for the end of the 
year).

I would appreciate your comments on the "Selective restore" feature and/or 
the "multiple File table" feature.

Best regards,

Kern



On Friday 15 August 2008 14:00:12 Kjetil Torgrim Homme wrote:
> I needed to restore a subset of some old backups.  Restoring the full
> backups would need a terabyte of temporary storage, which seemed a bit
> wasteful (and inconvenient to get hold of) since the data I was
> interested in took less than a gigabyte.
>
> Anyway -- I implemented a simple regex to filter the files to restore.
> It works like this:
>
>     Building directory tree for JobId(s) 28644 ...
>     There were no files inserted into the tree, so file selection
>     is not possible.Most likely your retention policy pruned the files
>
>     Do you want to restore all the files? (yes|no): no
>
>     Regexp matching files to restore? (empty to abort): ^/var/log
>
> The patch adds a new keyword to the bootstrap file, FilePattern, which
> the storage daemon will apply to all files before deciding whether to
> send the file over to the fd.  The fd doesn't need any changes, btw.
>
> This is just a quick hack, and there is some polishing left to do:
>
>    * Only available interactively in the specific case above, but
>      could be useful as an alternative/supplement to marking files and
>      directories manually.
>
>    * Can not be modified like the other job parameters.
>
>    * Bacula will complain that the number of restored files is
>      different from what it expected in the final report.
>
>    * Documentation is not updated.
>
> The patch is against revision 7469, which we are now running in
> production (don't tell my boss ;-).  I hope others will find it
> useful.
>
> PS. Kern, the GPL paperwork is on its way to Switzerland.
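
As a rough illustration of the filtering step described above, here is a 
sketch in Python.  It is not the patch itself: the storage daemon is not 
written in Python, and the record layout and function name are assumptions 
made for the example.  It only shows a regex being applied to each file 
name before the record is passed on.

```python
import re


def filter_records(records, pattern):
    """Yield only the (name, data) records whose file name matches pattern."""
    regex = re.compile(pattern)
    for name, data in records:
        # search() honours a leading ^ anchor, so "^/var/log" matches only
        # names that start with /var/log.
        if regex.search(name):
            yield name, data


# Hypothetical stand-in for the records read back from a Volume.
records = [
    ("/var/log/messages", b"log data"),
    ("/etc/passwd", b"account data"),
    ("/var/log/auth.log", b"log data"),
    ("/home/var/log/notes", b"user data"),
]

selected = [name for name, _ in filter_records(records, r"^/var/log")]
print(selected)
print(len(selected), "of", len(records), "files selected")
```

The final count is smaller than the number of files in the Job, which is 
consistent with the mismatch in the final report mentioned in the list 
above.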


