
[Bacula-devel] Partitions (Was: Selective restore when files are pruned)


Hello,

(As the original message was apparently lost, I am resending it.)

Having a solid background in Oracle (including partitioning) and some
experience with PostgreSQL (though I have never used its partitioning),
I decided to do a short study.
Partitioning would be the easiest and most reasonable solution to the
"file table becomes huge" problem *if* it worked well (that is, as well
as in Oracle). The "jobid" field in the "file" table seems to me to be
the obvious partitioning key for range partitions (as far as I can tell
from sqlcmds.c, at least), perhaps with a date limit added to the "job"
table in queries.

However, for PostgreSQL, partitioning is far from convenient and
complete; the developers are currently discussing whether to add
Oracle-like partitioning support, but the current state is not good.
Managing partitions looks crazy to me: to route inserted rows into the
right partition, triggers (or rules - more expensive per row but
cheaper per transaction) are required. Worst of all, the documentation
(for 8.3) states that only constant values in queries can be used for
"partition pruning" (i.e. scanning only the needed partitions), so
join conditions cannot be used for this purpose.
I believe it will take a year or more for partitioning on PostgreSQL
to become reasonable (though it can still be used in spite of being
ugly).
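
To make the trigger-based scheme concrete, here is roughly what range
partitioning of the "file" table by jobid looks like on PostgreSQL
8.3, using table inheritance plus an insert trigger. The table is
heavily simplified and the range boundaries are made up:

```sql
-- Simplified parent table (the real "file" table has more columns).
CREATE TABLE file (
    fileid   bigserial,
    jobid    integer NOT NULL,
    filename text
);

-- One child table per jobid range; the CHECK constraints are what
-- enables partition pruning ("constraint exclusion").
CREATE TABLE file_p1 (CHECK (jobid >= 1     AND jobid < 10000))
    INHERITS (file);
CREATE TABLE file_p2 (CHECK (jobid >= 10000 AND jobid < 20000))
    INHERITS (file);

-- Trigger function routing each insert into the right child table.
CREATE OR REPLACE FUNCTION file_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.jobid < 10000 THEN
        INSERT INTO file_p1 VALUES (NEW.*);
    ELSE
        INSERT INTO file_p2 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has already been stored in a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_file BEFORE INSERT ON file
    FOR EACH ROW EXECUTE PROCEDURE file_insert_trigger();
```

With constraint_exclusion = on, "SELECT ... FROM file WHERE jobid =
12345" scans only file_p2, but a join condition such as "file.jobid =
job.jobid" is not pruned - which is exactly the limitation above.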

MySQL, on the other hand, is much smarter about partitions (at least
judging by the documentation). Defining and managing partitions sounds
reasonable. Partition pruning has been implemented since 5.1.6, but it
too works only for constants.
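
For comparison, the MySQL 5.1 equivalent is declarative and much
shorter (again with a made-up, simplified column list):

```sql
CREATE TABLE file (
    fileid   bigint NOT NULL AUTO_INCREMENT,
    jobid    int    NOT NULL,
    filename varchar(255),
    -- MySQL requires every unique key to include the partition key.
    PRIMARY KEY (fileid, jobid)
)
PARTITION BY RANGE (jobid) (
    PARTITION p1   VALUES LESS THAN (10000),
    PARTITION p2   VALUES LESS THAN (20000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

Here too, "WHERE jobid = 12345" is pruned to partition p2, while a
join condition on jobid is not.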

SQLite does not support partitioning at all AFAIK.

So, partitioning the file table by jobid won't work. The only way to
utilize partitions would be to introduce a new column in the "file"
table (don't shoot me! just an integer, perhaps). Say, a year number
as the partition key, or something based on the retention period?
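
As a sketch of that idea, in MySQL syntax (the column name is
hypothetical, and the primary key would have to be extended to include
the new column):

```sql
-- Hypothetical integer key derived from the year of the backup.
ALTER TABLE file ADD COLUMN partyear smallint NOT NULL DEFAULT 2008;

ALTER TABLE file PARTITION BY RANGE (partyear) (
    PARTITION y2007 VALUES LESS THAN (2008),
    PARTITION y2008 VALUES LESS THAN (2009),
    PARTITION ymax  VALUES LESS THAN MAXVALUE
);
```

Queries would then have to supply the constant themselves, e.g.
"... WHERE partyear = 2008 AND jobid = ...", so that pruning works
without relying on a join condition.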

And here is the thing I should have started with ;-). What are the
worst queries in question? What are the problems (problematic
scenarios)? I cannot answer this myself because my first setup is less
than 10 MB so far... If we could assume that clientid is always known
(at least in the "problematic" queries), it might be of some help to
add it to the "file" table again...

Alex

P.S. I have not found any database model picture in the Development
documentation, so I created one (by reverse engineering a PostgreSQL
setup and adding the foreign keys to the picture manually). There seem
to be many unused tables (marked yellow; at least, they are empty in
my setup) and unused fields (especially "unused id fields"; their
foreign keys are drawn in red or simply missing from the picture); the
"red" table "mediatype" is just a total lie. Have I missed something?
P.P.S. I am afraid it was the picture attachment that blocked my first
attempt to send this, so I have removed it. Does anybody need the data
model picture?

Kern Sibbald wrote:
>>> Another suggestion that I have for the problem of growing File
>>> tables is a sort of compromise.  Suppose that we implement two File
>>> retention periods.  One as currently exists that defines when the
>>> records are deleted, and a new period that defines when the records
>>> are moved out of the File table and placed in a secondary table
>>> perhaps called OldFiles.  This would allow users to keep the
>>> efficiency for active files high but at the same time allow the
>>> delete retention period to be quite long.  The database would still
>>> grow, but there would be a lot less overhead.  Actually the name of
>>> the table for these "expired" File records could even be defined on
>>> a client by client or Job by Job basis which would allow for having
>>> multiple "OldFiles" tables.
>>>
>>> Another advantage of my suggestion would be that within Bacula
>>> itself, switching from using the File table to using the OldFiles
>>> table could be made totally automatic (it will require a bit of
>>> code, but no massive changes).  External programs would still
>>> function normally in most cases, but if they wanted to access older
>>> data, they would need some modification.
>>>       
>> Using this scheme, an admin could configure Bacula to keep only the
>> most current full backup and incrementals in the main (fast) table,
>> and move the historic information into the OldFiles table.  This
>> would allow more optimisation for the DBA than basing it on
>> partitioning in the database, I think?
>>     
>
> Yes, I need to look at how partitioning works.  I have a feeling it
> will not solve any of the problems in a really gigantic database
> where some of the data is used all the time, and other data is
> almost never used.  I haven't given up on the File and OldFiles
> table idea.  However, I need to research partitioning because if it
> really solves the problem correctly, all the better -- we can focus
> on the many missing features ...
>
> Kern
>   
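
For what it is worth, the File/OldFiles move described above could be
done with a periodic pair of statements inside one transaction; the
six-month cutoff and the join on job.endtime below are only a sketch
(PostgreSQL syntax, assuming "oldfiles" has the same structure as
"file"):

```sql
BEGIN;

-- Copy File records whose job ended before the "move" retention
-- period into the secondary table.
INSERT INTO oldfiles
    SELECT f.*
      FROM file f
      JOIN job j ON j.jobid = f.jobid
     WHERE j.endtime < now() - interval '6 months';

-- Then remove them from the main (fast) table.
DELETE FROM file
 WHERE jobid IN (SELECT jobid FROM job
                  WHERE endtime < now() - interval '6 months');

COMMIT;
```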



_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

