
Re: [Bacula-devel] Selective restore when files are pruned [patch]

Kern Sibbald <kern@xxxxxxxxxxx> writes:
> - As Martin points out, this code gives the SD a bit more knowledge
> of the records it has stored, but unless someone has a better idea,
> I see no alternative.

the SD has this knowledge already, even if it ignores it.

> - One aspect of this code I haven't looked at yet is whether it is
> really required to add it in read_record.c rather than match_bsr.c,
> where all the other bsr filtering code is located.  To be
> investigated ...

as far as I could tell, match_bsr is only called once per volume, and
changing that design decision seemed more obtrusive.

> On a similar but slightly different subject: one user brought up a
> problem that we are likely to see quite a lot in the near
> future.  He has 600 million File records in his Bacula catalog, and
> he is required to have at least a 7 year retention period, which
> means the database is growing (I think it is currently at 100GB),
> and it will continue to grow.
> He has proposed improving performance by having a separate File
> table for each client.  This would very likely improve performance
> quite a lot because if you have, say, 60 clients, instead of having
> one gigantic File table it would be split into 60 smaller tables.
> For example, instead of referencing File, Bacula would, for clients
> named FD1 and FD2, reference FD1Files and FD2Files, and so on, each
> of which would be an identical table containing only the data for
> a single client.
> The problem I have with the suggestion is that it would require
> rather massive changes to the current SQL code, and it would break
> all external programs that reference the File table of the database.

this would be very awkward.  we have hundreds of clients, and they
have very long names in Bacula (based on FQDN, often more than 40
characters), so I dread typing in the SQL table names by hand :-)

> The first important piece of information is that in version 3.0.0
> we are planning to switch to using a 64-bit Id for the File table
> by default -- this will remove the current restriction of 4G files
> (it can be enabled manually in the current version, so the main
> change is to make it automatic).

oops, I just noticed our database schema still uses "int(10) unsigned"
for FileId, I'll need to change that for sure ...
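
to put numbers on that limit, here is a quick back-of-the-envelope
sketch.  it assumes "int(10) unsigned" is a 32-bit unsigned column and
that the 64-bit replacement is a signed 64-bit integer; the 600 million
figure is the one quoted above, everything else is plain arithmetic.

```python
# Largest FileId a 32-bit unsigned column can hold (the "4G files" limit).
MAX_UINT32 = 2**32 - 1
# Largest value of a signed 64-bit integer (assumed replacement type).
MAX_INT64 = 2**63 - 1

# File record count quoted in the thread.
current_rows = 600_000_000

# How many catalogs of that size fit under the 32-bit ceiling.
multiples_32bit = MAX_UINT32 // current_rows

print(MAX_UINT32)
print(multiples_32bit)
print(MAX_INT64 > MAX_UINT32)
```

note that ids are not reused after pruning, so the ceiling applies to
every FileId ever assigned, not just to the rows currently in the table.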

> The second thing that could help a lot is the "Selective restore"
> patch submitted by Kjetil, because although a user may have a
> requirement for long retention periods, that does not necessarily
> mean that all the File records must be kept -- what is probably the
> most important is retaining the data and being able to extract it in
> a reasonable amount of time.  Implementation of this patch will
> allow some users to prune the File records even though the Volumes
> must be kept a long time.  Obviously this will not satisfy all
> requirements.

yes, it's a bit of a hack.  it's also a bit contradictory --
typically, a full restore is only useful from a very recent backup.
when restoring files from old backups, the user will want to
cherrypick files, so it would definitely be best to not prune File
information as long as the backup data is available.  but we're living
in an imperfect world, and I think Bacula should try to cater for home
users who have many files, but no beefy database server.

that said, building the tree prior to the restore can take a long
time, so even when full File information is available, entering a
regexp can be much more convenient.
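
as an illustration of what selecting by regexp means, here is a minimal
sketch.  it is not the code from the patch (which does its filtering in
the SD while reading records); the function name select_paths and the
sample paths are made up for the example.

```python
import re

def select_paths(paths, pattern):
    """Return the paths that match the regular expression `pattern`."""
    rx = re.compile(pattern)
    # search() matches anywhere in the path; anchor with ^ and $ if needed.
    return [p for p in paths if rx.search(p)]

# Hypothetical file names as they might appear in a backup.
paths = [
    "/etc/passwd",
    "/home/kjetil/thesis.tex",
    "/home/kjetil/notes.txt",
    "/var/log/messages",
]

selected = select_paths(paths, r"^/home/kjetil/.*\.tex$")
print(selected)
```

the point is that the match needs only the file name of each record as
it is read, so no directory tree has to be built first.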

> Another suggestion that I have for the problem of growing File
> tables is a sort of compromise.  Suppose that we implement two File
> retention periods.  One as currently exists that defines when the
> records are deleted, and a new period that defines when the records
> are moved out of the File table and placed in a secondary table
> perhaps called OldFiles.  This would allow users to keep the
> efficiency for active files high but at the same time allow the
> delete retention period to be quite long.  The database would still
> grow, but there would be a lot less overhead.  Actually the name of
> the table for these "expired" File records could even be defined on
> a client by client or Job by Job basis which would allow for having
> multiple "OldFiles" tables.
> Another advantage of my suggestion would be that within Bacula
> itself, switching from using the File table to using the OldFiles
> table could be made totally automatic (it will require a bit of
> code, but no massive changes).  External programs would still
> function normally in most cases, but if they wanted to access older
> data, they would need some modification.

using this scheme, an admin could configure Bacula to only keep the
most current full backup and incrementals in the main (fast) table,
and move the historic information to the OldFiles table.  this would
allow more optimisation for the DBA than basing it on partitioning in
the database, I think?
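
a rough sketch of how such a move could work, using an in-memory SQLite
database.  the three-column schema is a drastically simplified stand-in
for the real File table, and the function names and the "JobId below a
cutoff" rule are assumptions for the example, not anything that exists
in Bacula.

```python
import sqlite3

def archive_old_files(conn, cutoff_job_id):
    """Move File rows with JobId below cutoff_job_id into OldFiles."""
    with conn:  # single transaction: copy first, then delete
        conn.execute(
            "INSERT INTO OldFiles SELECT * FROM File WHERE JobId < ?",
            (cutoff_job_id,),
        )
        cur = conn.execute(
            "DELETE FROM File WHERE JobId < ?", (cutoff_job_id,)
        )
    return cur.rowcount

def find_files(conn, job_id):
    """Look in File first and fall back to OldFiles if nothing is found."""
    for table in ("File", "OldFiles"):
        rows = conn.execute(
            f"SELECT Name FROM {table} WHERE JobId = ? ORDER BY FileId",
            (job_id,),
        ).fetchall()
        if rows:
            return table, [r[0] for r in rows]
    return None, []

conn = sqlite3.connect(":memory:")
schema = "(FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT)"
conn.execute("CREATE TABLE File " + schema)
conn.execute("CREATE TABLE OldFiles " + schema)
conn.executemany(
    "INSERT INTO File (JobId, Name) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d")],
)

moved = archive_old_files(conn, 3)  # jobs 1 and 2 become "old"
print(moved, find_files(conn, 1), find_files(conn, 3))
```

the fallback in find_files is the part that could be made automatic
inside Bacula; an external program that only knows about File would
simply stop seeing the older jobs.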

Kjetil T. Homme
Linpro AS
