
Re: [Bacula-devel] Selective restore when files are pruned [patch]

Kern Sibbald wrote:
> Hello,
> Here are a couple of notes about this feature, then some related ideas ...
> - There is a clear need for a feature like this.  If you have a Job that has 
> the File records pruned, and it was a backup of 1TB but you only want a tiny 
> portion of that, the only alternative to a solution like this is to scan the 
> Volume, which is terribly slow.
> - As Martin points out, this code gives the SD a bit more knowledge of the 
> records it has stored, but unless someone has a better idea, I see no 
> alternative.
> - One aspect of this code I haven't looked at yet is whether it is really 
> required to add it in read_record.c rather than match_bsr.c, where all the 
> other bsr filtering code is located.  To be investigated ...
> ========
> On a similar but slightly different subject: one user brought up a problem 
> that we are likely to see quite a lot in the near future.  He has 600 
> million File records in his Bacula catalog, and he is required to have at 
> least a 7 year retention period, which means the database is growing (I think 
> it is currently at 100GB), and it will continue to grow.
> He has proposed improving performance by having a separate File table for each 
> client.  This would very likely improve the performance quite a lot because 
> if you have say 60 clients, instead of having one gigantic File table it 
> would be split into 60 smaller tables.  For example, instead of referencing 
> File, Bacula would, for clients named FD1 and FD2, reference FD1Files and 
> FD2Files, and so on, each of which would be identical tables but containing 
> only the data for a single client.
> The problem I have with the suggestion is that it would require rather massive 
> changes to the current SQL code, and it would break all external programs 
> that reference the File table of the database.
> The first important piece of information is that in version 3.0.0 we are 
> planning to switch to using a 64 bit Id for the File table by default -- this 
> will remove the current restriction of 4G files (it can be enabled manually in 
> the current version, so the main change is to make it automatic).
> The second thing that could help a lot is the "Selective restore" patch 
> submitted by Kjetil, because although a user may have a requirement for long 
> retention periods, that does not necessarily mean that all the File records 
> must be kept -- what is probably the most important is retaining the data and 
> being able to extract it in a reasonable amount of time.  Implementation of 
> this patch will allow some users to prune the File records even though the 
> Volumes must be kept a long time.  Obviously this will not satisfy all 
> requirements.
> Another suggestion that I have for the problem of growing File tables is a 
> sort of compromise.  Suppose that we implement two File retention periods.  
> One as currently exists that defines when the records are deleted, and a new 
> period that defines when the records are moved out of the File table and 
> placed in a secondary table perhaps called OldFiles.  This would allow users 
> to keep the efficiency for active files high but at the same time allow the 
> delete retention period to be quite long.  The database would still grow, but 
> there would be a lot less overhead.  Actually the name of the table for 
> these "expired" File records could even be defined on a client by client or 
> Job by Job basis which would allow for having multiple "OldFiles" tables.
> Another advantage of my suggestion would be that within Bacula itself, 
> switching from using the File table to using the OldFiles table could be made 
> totally automatic (it will require a bit of code, but no massive changes).  
> External programs would still function normally in most cases, but if they 
> wanted to access older data, they would need some modification.
> We could also envision moving the "expired" File records to a different 
> database, which would in the end be much more efficient, but would require 
> considerably more work to implement.
> Whatever is finally decided, it is clear to me that it is unlikely to be 
> implemented in time for the next major release (planned for the end of the 
> year).
> I would appreciate your comments on the "Selective restore" feature 
> and/or the "multiple File table" feature.
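
As a rough illustration of the two-retention-period idea quoted above, here is
a minimal Python sketch using SQLite.  The simplified File/OldFiles schema, the
BackedUpAt column, and the helper names open_catalog, move_expired and
find_file are all assumptions made for this example; none of them is Bacula's
actual catalog schema or code.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE File (
            FileId INTEGER PRIMARY KEY, JobId INTEGER,
            Name TEXT, BackedUpAt INTEGER);
        CREATE TABLE OldFiles (
            FileId INTEGER PRIMARY KEY, JobId INTEGER,
            Name TEXT, BackedUpAt INTEGER);
        """
    )
    return conn


def move_expired(conn, cutoff):
    """Move records older than `cutoff` from File to OldFiles.

    Both statements run in one transaction, so a record is never
    lost or duplicated if the move fails part way through.
    Returns the number of records moved.
    """
    with conn:
        conn.execute(
            "INSERT INTO OldFiles SELECT * FROM File WHERE BackedUpAt < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM File WHERE BackedUpAt < ?", (cutoff,)
        ).rowcount
    return moved


def find_file(conn, name):
    """Look in the active File table first, then fall back to OldFiles."""
    for table in ("File", "OldFiles"):
        rows = conn.execute(
            f"SELECT FileId, JobId, Name, BackedUpAt FROM {table} WHERE Name = ?",
            (name,),
        ).fetchall()
        if rows:
            return rows
    return []
```

The fallback in find_file is the part that would make the switch "automatic"
inside Bacula: callers ask for a file and never name the table it lives in.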

The cleanest way of splitting up tables is probably to use partitioning.  This 
lets you create a single logical table that is backed by multiple physical 
partitions, with a ruleset on the logical table that determines which 
partition a given row is stored in.  This might let you, for example, sort the 
File table rows into partitions on a per-year basis.

To the majority of the standard insert/select statements, the partitioning 
isn't visible, so it should have far less impact on the SQL code in Bacula 
than managing multiple tables manually.
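
To make the "single logical table, multiple physical partitions" idea concrete,
here is a small Python sketch.  It only emulates partitioning in SQLite (which
has no native support) with a view plus INSTEAD OF triggers acting as the
ruleset; PostgreSQL and MySQL would use their own native partitioning features
instead.  The per-year tables, the BackupYear column, and the build_catalog
helper are assumptions made for this example, not Bacula's schema.

```python
import sqlite3

YEARS = (2007, 2008)  # hypothetical partition keys, one table per year
COLUMNS = "FileId, JobId, Name, BackupYear"


def build_catalog(years=YEARS):
    """Build a logical File 'table' backed by one physical table per year."""
    conn = sqlite3.connect(":memory:")
    selects = []
    for year in years:
        # One physical partition per year.
        conn.execute(
            f"CREATE TABLE file_{year} ("
            "FileId INTEGER PRIMARY KEY, JobId INTEGER, "
            "Name TEXT, BackupYear INTEGER)"
        )
        selects.append(f"SELECT {COLUMNS} FROM file_{year}")

    # The logical table: callers read and write File, never file_<year>.
    conn.execute("CREATE VIEW File AS " + " UNION ALL ".join(selects))

    for year in years:
        # The ruleset: route each inserted row to the partition for its year.
        conn.execute(
            f"CREATE TRIGGER file_insert_{year} INSTEAD OF INSERT ON File "
            f"WHEN NEW.BackupYear = {year} "
            "BEGIN "
            f"INSERT INTO file_{year} ({COLUMNS}) "
            "VALUES (NEW.FileId, NEW.JobId, NEW.Name, NEW.BackupYear); "
            "END"
        )
    return conn


if __name__ == "__main__":
    conn = build_catalog()
    conn.executemany(
        f"INSERT INTO File ({COLUMNS}) VALUES (?, ?, ?, ?)",
        [(1, 10, "/etc/passwd", 2007), (2, 11, "/etc/hosts", 2008)],
    )
    # The query names only the logical table.
    print(conn.execute("SELECT Name FROM File ORDER BY FileId").fetchall())
```

The INSERT and SELECT statements mention only File, which is the property that
should keep the impact on Bacula's existing SQL small.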

Partitioning is supported by PostgreSQL and is already in the 5.1 development 
version of MySQL.

I don't see any support for it in SQLite, but if you're worrying about a 
catalog that large you probably shouldn't be using SQLite anyway.

Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
     GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC
