
Re: [Bacula-devel] Selective restore when files are pruned [patch]


Kern Sibbald wrote:
> Hello,
> 
> Here are a couple of notes about this feature, then some related ideas ...
> 
> - There is a clear need for a feature like this.  If you have a Job that has 
> the File records pruned, and it was a backup of 1TB but you only want a tiny 
> portion of that, the only alternative to a solution like this is to scan the 
> Volume, which is terribly slow.
> 
> - As Martin points out, this code gives the SD a bit more knowledge of the 
> records it has stored, but unless someone has a better idea, I see no 
> alternative.
> 
> - One aspect of this code I haven't looked at yet is whether it is really 
> required to add it in read_record.c rather than match_bsr.c, where all the 
> other bsr filtering code is located.  To be investigated ...
> 
> ========
> 
> On a similar but slightly different subject: one user brought up a problem 
> that we are surely likely to see quite a lot in the near future.  He has 600 
> million File records in his Bacula catalog, and he is required to have at 
> least a 7 year retention period, which means the database is growing (I think 
> it is currently at 100GB), and it will continue to grow.
> 
> He has proposed improving performance by having a separate File table for each 
> client.  This would very likely improve the performance quite a lot because 
> if you have say 60 clients, instead of having one gigantic File table it 
> would be split into 60 smaller tables.  For example, instead of referencing 
> File, Bacula would, for clients named FD1 and FD2, reference FD1Files and 
> FD2Files, and so on, each of which would be identical tables but containing 
> only the data for a single client.
> 
> The problem I have with the suggestion is that it would require rather massive 
> changes to the current SQL code, and it would break all external programs 
> that reference the File table of the database.
> 
> The first important piece of information is that in version 3.0.0 we are 
> planning to switch to using a 64 bit Id for the File table by default -- this 
> will remove the current restriction of 4G files (it can be enabled manually in 
> the current version, so the main change is to make it automatic).
> 
> The second thing that could help a lot is the "Selective restore" patch 
> submitted by Kjetil, because although a user may have a requirement for long 
> retention periods, that does not necessarily mean that all the File records 
> must be kept -- what is probably the most important is retaining the data and 
> being able to extract it in a reasonable amount of time.  Implementation of 
> this patch will allow some users to prune the File records even though the 
> Volumes must be kept a long time.  Obviously this will not satisfy all 
> requirements.
> 
> Another suggestion that I have for the problem of growing File tables is a 
> sort of compromise.  Suppose that we implement two File retention periods.  
> One as currently exists that defines when the records are deleted, and a new 
> period that defines when the records are moved out of the File table and 
> placed in a secondary table perhaps called OldFiles.  This would allow users 
> to keep the efficiency for active files high but at the same time allow the 
> delete retention period to be quite long.  The database would still grow, but 
> there would be a lot less overhead.  Actually the name of the table for 
> these "expired" File records could even be defined on a client by client or 
> Job by Job basis which would allow for having multiple "OldFiles" tables.
> 
> Another advantage of my suggestion would be that within Bacula itself, 
> switching from using the File table to using the OldFiles table could be made 
> totally automatic (it will require a bit of code, but no massive changes).  
> External programs would still function normally in most cases, but if they 
> wanted to access older data, they would need some modification.
> 
> We could also envision moving the "expired" File records to a different 
> database, which would in the end be much more efficient, but would require 
> considerably more work to implement.
> 
> Whatever is finally decided, it is clear to me that it is unlikely to be 
> implemented in time for the next major release (planned for the end of the 
> year).
> 
> I would appreciate your comments on the "Selective restore" feature 
> and/or the "multiple File table" feature.

The cleanest way of splitting up tables is probably to use partitioning.  This 
lets you create a single logical table that is backed by multiple physical 
partitions, with a ruleset on the logical table that determines which 
partition a given row is stored in.  This might let you, for example, sort the 
File table rows into partitions on a per-year basis.

To the majority of the standard insert/select statements, the partitioning 
isn't visible, so it should have far less impact on the SQL code in Bacula 
than managing multiple tables manually.
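
As a rough illustration of the idea (not of how PostgreSQL or MySQL implement 
it), here is a minimal Python sketch that emulates per-year partitions by hand 
using the standard sqlite3 module.  The table names (file_y2006 and so on), the 
file_all view, the job_year and name columns, and the insert_file helper are 
all hypothetical and do not match the real Bacula schema.  With real 
partitioning the database does the routing itself, so inserts would also go 
through the logical table rather than a helper like this one.

```python
import sqlite3

# Hypothetical partition keys: one physical table per backup year.
YEARS = (2006, 2007, 2008)


def create_partitions(conn, years=YEARS):
    """Create one table per year plus a view that acts as the logical table."""
    for year in years:
        # The CHECK constraint documents (and enforces) the partition rule.
        conn.execute(
            f"CREATE TABLE file_y{year} ("
            f"job_year INTEGER NOT NULL CHECK (job_year = {year}), "
            "name TEXT NOT NULL)"
        )
    # The view unions all partitions so readers see a single table.
    union = " UNION ALL ".join(
        f"SELECT job_year, name FROM file_y{year}" for year in years
    )
    conn.execute(f"CREATE VIEW file_all AS {union}")


def insert_file(conn, job_year, name, years=YEARS):
    """Route a row to the partition selected by its year."""
    if job_year not in years:
        raise ValueError(f"no partition for year {job_year}")
    conn.execute(
        f"INSERT INTO file_y{job_year} (job_year, name) VALUES (?, ?)",
        (job_year, name),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_partitions(conn)
    insert_file(conn, 2007, "/etc/passwd")
    insert_file(conn, 2008, "/etc/hosts")
    # Readers query the logical table and never name a partition.
    print(conn.execute(
        "SELECT name FROM file_all WHERE job_year = 2008").fetchall())
```

The point of the sketch is only that select statements against file_all do 
not need to know which physical table holds a row, which is the property that 
would keep the changes to the SQL code in Bacula small.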

Partitioning is supported by PostgreSQL

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

and is already in the 5.1 development version of MySQL

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

I don't see any support for it in SQLite, but if you're worrying about a 
catalog that large you probably shouldn't be using SQLite anyway.

-- 
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
     GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC
