Re: [Bacula-devel] Selective restore when files are pruned [patch]
Here are a couple of notes about this feature, then some related ideas ...
- There is a clear need for a feature like this. If you have a Job that has
the File records pruned, and it was a backup of 1TB but you only want a tiny
portion of that, the only alternative to a solution like this is to scan the
Volume, which is terribly slow.
- As Martin points out, this code gives the SD a bit more knowledge of the
records it has stored, but unless someone has a better idea, I see no
- One aspect of this code I haven't looked at yet is whether it is really
required to add it in read_record.c rather than match_bsr.c, where all the
other bsr filtering code is located. To be investigated ...
On a similar but slightly different subject: one user brought up a problem
that we are likely to see quite often in the near future. He has 600
million File records in his Bacula catalog, and he is required to have at
least a 7 year retention period, which means the database is growing (I think
it is currently at 100GB), and it will continue to grow.
To improve performance, he has proposed having a separate File table for each
client. This would very likely improve performance quite a lot because,
if you have say 60 clients, instead of having one gigantic File table it
would be split into 60 smaller tables. For example, instead of referencing
File, Bacula would, for clients named FD1 and FD2, reference FD1Files and
FD2Files, and so on, each of which would be an identical table containing
only the data for a single client.
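To make the per-client idea concrete, here is a minimal sketch in Python using SQLite, not Bacula's actual catalog code. Only the FD1Files/FD2Files naming comes from the proposal; the reduced column set and the helper names file_table_for, create_client_table and files_for_job are assumptions made for illustration.

```python
import re
import sqlite3


def file_table_for(client_name):
    """Map a client name to its (hypothetical) per-client table name."""
    # Table names cannot be bound as SQL parameters, so restrict the
    # name to a safe character set before interpolating it.
    if not re.fullmatch(r"[A-Za-z0-9_]+", client_name):
        raise ValueError("unsafe client name: %r" % client_name)
    return client_name + "Files"


def create_client_table(db, client_name):
    """Create the per-client table; the columns are a simplified assumption."""
    table = file_table_for(client_name)
    db.execute(
        "CREATE TABLE IF NOT EXISTS %s ("
        "FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT)" % table
    )
    return table


def files_for_job(db, client_name, job_id):
    """List the file names of one job, touching only that client's table."""
    table = file_table_for(client_name)
    rows = db.execute(
        "SELECT Name FROM %s WHERE JobId = ? ORDER BY FileId" % table,
        (job_id,),
    )
    return [name for (name,) in rows]


db = sqlite3.connect(":memory:")
for client in ("FD1", "FD2"):
    create_client_table(db, client)
db.execute("INSERT INTO FD1Files (JobId, Name) VALUES (1, '/etc/hosts')")
db.execute("INSERT INTO FD2Files (JobId, Name) VALUES (1, '/var/log/messages')")
```

Note that every query has to build its table name at run time, which is the kind of change that would ripple through all the existing SQL code.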
The problem I have with the suggestion is that it would require rather massive
changes to the current SQL code, and it would break all external programs
that reference the File table of the database.
The first important point is that in version 3.0.0 we are planning to
switch to using a 64 bit Id for the File table by default -- this will remove
the current restriction of 4G files (it can be enabled manually in the
current version, so the main change is to make it automatic).
The second thing that could help a lot is the "Selective restore" patch
submitted by Kjetil, because although a user may have a requirement for long
retention periods, that does not necessarily mean that all the File records
must be kept -- what is probably the most important is retaining the data and
being able to extract it in a reasonable amount of time. Implementation of
this patch will allow some users to prune the File records even though the
Volumes must be kept a long time. Obviously this will not satisfy all
Another suggestion that I have for the problem of growing File tables is a
sort of compromise. Suppose that we implement two File retention periods.
One as currently exists that defines when the records are deleted, and a new
period that defines when the records are moved out of the File table and
placed in a secondary table perhaps called OldFiles. This would allow users
to keep the efficiency for active files high but at the same time allow the
delete retention period to be quite long. The database would still grow, but
there would be a lot less overhead. Actually the name of the table for
these "expired" File records could even be defined on a client by client or
Job by Job basis which would allow for having multiple "OldFiles" tables.
Another advantage of my suggestion would be that within Bacula itself,
switching from using the File table to using the OldFiles table could be made
totally automatic (it will require a bit of code, but no massive changes).
External programs would still function normally in most cases, but if they
wanted to access older data, they would need some modification.
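As a rough illustration of the two retention periods, here is a minimal Python/SQLite sketch; it is not Bacula code. The BackupDay column (a plain day number), the simplified schema, and the names create_tables, apply_retention and find_files are all assumptions; only the File and OldFiles table names come from the text above.

```python
import sqlite3


def create_tables(db):
    """Create File and OldFiles with the same (simplified, assumed) layout."""
    for table in ("File", "OldFiles"):
        db.execute(
            "CREATE TABLE %s (FileId INTEGER PRIMARY KEY, "
            "JobId INTEGER, Name TEXT, BackupDay INTEGER)" % table
        )


def apply_retention(db, today, move_after_days, delete_after_days):
    """Move expired records to OldFiles, then delete the very old ones."""
    move_cutoff = today - move_after_days
    delete_cutoff = today - delete_after_days
    with db:  # one transaction: commit on success, roll back on error
        db.execute(
            "INSERT INTO OldFiles SELECT * FROM File WHERE BackupDay < ?",
            (move_cutoff,),
        )
        db.execute("DELETE FROM File WHERE BackupDay < ?", (move_cutoff,))
        db.execute("DELETE FROM OldFiles WHERE BackupDay < ?", (delete_cutoff,))


def find_files(db, job_id):
    """Look in File first and fall back to OldFiles automatically."""
    for table in ("File", "OldFiles"):
        rows = db.execute(
            "SELECT Name FROM %s WHERE JobId = ? ORDER BY FileId" % table,
            (job_id,),
        ).fetchall()
        if rows:
            return [name for (name,) in rows]
    return []


db = sqlite3.connect(":memory:")
create_tables(db)
db.executemany(
    "INSERT INTO File (JobId, Name, BackupDay) VALUES (?, ?, ?)",
    [(1, "/etc/hosts", 2990), (2, "/home/a", 2000), (3, "/tmp/x", 100)],
)
# Move records after one year, delete them after roughly seven years.
apply_retention(db, today=3000, move_after_days=365, delete_after_days=2555)
```

In this sketch job 1 stays in File, job 2 ends up in OldFiles, and job 3 is deleted; a caller of find_files does not need to know which table a record lives in.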
We could also envision moving the "expired" File records to a different
database, which would in the end be much more efficient, but would require
considerably more work to implement.
Whatever is finally decided, it is clear to me that it is unlikely to be
implemented in time for the next major release (planned for the end of the
I would appreciate your comments on the "Selective restore" feature
and/or the "multiple File table" feature.
On Friday 15 August 2008 14:00:12 Kjetil Torgrim Homme wrote:
> I needed to restore a subset of some old backups. Restoring the full
> backups would need a terabyte of temporary storage, which seemed a bit
> wasteful (and inconvenient to get hold of) since the data I was
> interested in took less than a gigabyte.
> Anyway -- I implemented a simple regex to filter the files to restore.
> It works like this:
> Building directory tree for JobId(s) 28644 ...
> There were no files inserted into the tree, so file selection
> is not possible. Most likely your retention policy pruned the files
> Do you want to restore all the files? (yes|no): no
> Regexp matching files to restore? (empty to abort): ^/var/log
> The patch adds a new keyword to the bootstrap file, FilePattern, which
> the storage daemon will apply to all files before deciding whether to
> send the file over to the fd. The fd doesn't need any changes, btw.
> This is just a quick hack, and there is some polishing left to do:
> * Only available interactively in the specific case above, but
> could be useful as an alternative/supplement to marking files and
> directories manually.
> * Can not be modified like the other job parameters.
> * Bacula will complain that the number of restored files is
> different from what it expected in the final report.
> * Documentation is not updated.
> The patch is against revision 7469, which we are now running in
> production (don't tell my boss ;-). I hope others will find it
> PS. Kern, the GPL paperwork is on its way to Switzerland.
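The filtering step described in the quoted message (apply a regex to each file name before deciding whether to send the file to the fd) can be sketched roughly as follows. This is a Python illustration, not the patch itself: the select_files name, the (name, data) record shape and the sample records are assumptions, and Python's re syntax is not necessarily identical to the regex library the storage daemon uses.

```python
import re


def select_files(records, file_pattern):
    """Yield only the records whose file name matches file_pattern.

    records is assumed to be an iterable of (name, data) pairs read
    sequentially from a Volume; nothing is looked up in the catalog.
    """
    regex = re.compile(file_pattern)
    for name, data in records:
        if regex.search(name):
            yield name, data


# Hypothetical records standing in for what is stored on the Volume.
volume_records = [
    ("/etc/passwd", b"..."),
    ("/var/log/messages", b"..."),
    ("/home/user/var/log/notes", b"..."),
    ("/var/log/syslog", b"..."),
]

restored = [name for name, _ in select_files(volume_records, r"^/var/log")]
```

Because the pattern is anchored with ^, /home/user/var/log/notes is skipped even though it contains the string /var/log.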