
Re: [Bacula-devel] space saving in the database

> Send me your Volume, Job, and File retention periods and the scheme you use
> for doing Full, Differential, and Incremental backups and your desired
> granularity of recovery, and I will tell you whether or not this idea will
> help you.
> I suspect that just doing strict pruning will reduce your database size by
> 50%.
Maybe Eric will; he's the one who sets up pruning. But I think what our
administrators want is to limit pruning to what's really no longer on tape,
because the tapes have been erased.
We're not worried about having a very big database. We're just trying to keep
it as small as possible for our current use.

> Yes, for someone outside of Bacula that would be a natural reaction, but I
> am not likely to accept code that complicates or kludges the current
> situation. I would prefer that if any changes are going to be made to
> re-evaluate the current design.

There will be no changes in Bacula; that's the whole point. It's more along
the lines of creating 'dedicated' data types in PostgreSQL to
handle 'wasteful' base64-encoded fields a bit better. That's just a
short-term optimization.

> One hash is not possible without seriously restricting user's flexibility
> -- the MD5 field though not totally used as planned in 2.2. (hopefully it
> will be in 2.2) is a critical field for security and certain government
> legal requirements for the storage of data.  As legal requirements become
> more strict with increasing computer speed/technology we can expect the
> need for larger and larger hash codes.  In any case, it certainly needs to
> be user configurable without rebuilding the database -- thus variable.  In
> fact, as it stands, the user can have multiple different MD5 sizes in any
> given Job. I.e. some files may be more important to verify than others.

I understand that md5 is required, as it's the only way of reliably checking
that a file has not been modified. But having only one type of checksum might
be more efficient from a database point of view, as the field could be fixed
size (no need to waste 4 bytes, for instance, in PostgreSQL telling the
engine: be careful, the next field is variable length, here is its size). Of
course, going from md5 to sha256 sacrifices 16 bytes... I don't know if there
could be an efficient way of doing this. Anyway, base64 "wastes" more space in
this scheme, so a transparent conversion at the database level may be useful.
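
To make the base64 overhead concrete, here is a minimal sketch comparing the
raw digest size with its unpadded base64 text size. It uses the standard
base64 alphabet from Python's standard library, which is an assumption: the
encoding Bacula actually writes to the catalog may differ, so treat the
round-trip helper as an illustration of the idea rather than a converter for
real catalog data. The names encoded_sizes and to_raw are hypothetical.

```python
import base64
import hashlib
from typing import Tuple


def encoded_sizes(digest: bytes) -> Tuple[int, int]:
    """Return (raw size, unpadded base64 size) in bytes for a digest."""
    text = base64.b64encode(digest).rstrip(b"=")
    return len(digest), len(text)


def to_raw(text: bytes) -> bytes:
    """Decode an unpadded standard-base64 string back to raw bytes."""
    # Restore the '=' padding that was stripped before decoding.
    padded = text + b"=" * (-len(text) % 4)
    return base64.b64decode(padded)


data = b"example file contents"
md5 = hashlib.md5(data).digest()
sha256 = hashlib.sha256(data).digest()

md5_sizes = encoded_sizes(md5)        # (16, 22)
sha256_sizes = encoded_sizes(sha256)  # (32, 43)
```

Under these assumptions, an md5 digest takes 22 characters as text against 16
bytes raw, and a sha256 digest 43 against 32, before counting any
variable-length header.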

> As already explained, I would be very reluctant to make it a requirement to
> be a multiple of 32 bits.  It just takes one genius to come up with a new
> super fast algorithm that uses 129 bits to break the code.
Okay, I'll experiment with both. For us right now, a byte per record is only 
300MB in database size :)
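
As a rough back-of-the-envelope sketch: if one byte per record is 300MB, the
table holds on the order of 300 million records (taking 1MB as 10^6 bytes).
The figures below assume every record carries an md5 digest stored as 22
characters of unpadded base64, and they ignore headers, alignment, and
indexes, so they are only an order-of-magnitude estimate. The function name
savings_mb is hypothetical.

```python
def savings_mb(records: int, bytes_saved_per_record: int) -> float:
    """Estimate space saved in MB (10^6 bytes) across all records."""
    return records * bytes_saved_per_record / 1_000_000


# Assumption: record count inferred from "a byte per record is 300MB".
RECORDS = 300_000_000

one_byte = savings_mb(RECORDS, 1)          # 300.0
md5_as_raw = savings_mb(RECORDS, 22 - 16)  # 1800.0
```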

Bacula-devel mailing list
