
Re: [Bacula-devel] space saving in the database


I too think that we should defer to Kern and his list of priorities.  

Kern has been doing a great job of prioritizing and running this
project.  I think that people who are that tight on storage should
prune the files a few days earlier and let Kern work on functionality;
disk space is fairly inexpensive.


On Tue, 2008-02-12 at 09:52 -0500, Bill Moran wrote:
> In response to Cousin Marc <mcousin@xxxxxxxx>:
> [snip]
> > > One hash is not possible without seriously restricting user's flexibility
> > > -- the MD5 field though not totally used as planned in 2.2. (hopefully it
> > > will be in 2.2) is a critical field for security and certain government
> > > legal requirements for the storage of data.  As legal requirements become
> > > more strict with increasing computer speed/technology we can expect the
> > > need for larger and larger hash codes.  In any case, it certainly needs to
> > > be user configurable without rebuilding the database -- thus variable.  In
> > > fact, as it stands, the user can have multiple different MD5 sizes in any
> > > given Job. I.e. some files may be more important to verify than others.
> > 
> > I understand that md5 is required, as it's the only way of reliably checking 
> > that a file has not been modified. But only one type of checksum may be more 
> > efficient from a database point of view, as it could be fixed size (no need 
> > to waste 4 bytes for instance in postgresql telling the engine: be careful, 
> > next field is variable length, here is its size). Of course, going from md5 
> > to sha256 sacrifices 16 bytes... I don't know if there could be an efficient 
> > way of doing this. Anyway, base64 "wastes" more space in this scheme, so a 
> > transparent conversion at database level may be useful.
> > 
> > > As already explained, I would be very reluctant to make it a requirement to
> > > be a multiple of 32 bits.  It just takes one genius to come up with a new
> > > super fast algorithm that uses 129 bits to break the code.
> > Okay, I'll experiment with both. For us right now, a byte per record is only 
> > 300MB in database size :)
> Any time you look at complicating things to improve efficiency, there's
> the question "is it worth it".
> On the larger of our two Bacula servers, the database size is 8.5G.  The
> file table contains 35 million rows.  If you can save 16 bytes per row,
> that means an on-disk savings of 1/2G.
> My reaction to that would be "big friggin deal".  Considering the fact
> that we've got 750G of file volumes on a RAID 5, saving 500M on the
> database doesn't really seem worth the effort to me.
> Let's say Bacula moves to using SHA-256 hashes instead of md5.  Now the
> savings in storage space is 32 bytes instead of 16 bytes.  So, I'd be
> saving a whole G on the total database size.  I still say, "why bother?"
> Bacula works just dandy for us with these sizes.  If I do a "list jobs
> where a given file is saved" on the largest of our servers, the response
> is fast enough that I don't even consider it a wait.  Quite honestly, it's
> fast enough that I have trouble believing that it doesn't take longer.
> Nearly instantaneous.
> Just my opinion, of course.  I'd be interested to hear how much effect
> this would have on others and whether they think it's worthwhile to even
> investigate.
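
For anyone who wants to rerun the numbers quoted above with their own
row counts, here is a rough sketch of the arithmetic.  The 35 million
rows and the 16/32 bytes per row come from Bill's message; the helper
names are made up for illustration, and comparing raw digests against
base64 with the padding stripped is only an assumption about how the
values might be stored, not a statement about Bacula's schema.

```python
import base64
import hashlib

ROWS = 35_000_000  # file table rows quoted in Bill's message


def savings_gib(rows, bytes_per_row):
    """On-disk savings in GiB if every row shrinks by bytes_per_row."""
    return rows * bytes_per_row / 2**30


def digest_sizes(name, data=b""):
    """Return (raw length, unpadded base64 length) for a hash algorithm."""
    raw = hashlib.new(name, data).digest()
    # Assumption: base64 is stored without its trailing '=' padding.
    b64 = base64.b64encode(raw).rstrip(b"=")
    return len(raw), len(b64)


if __name__ == "__main__":
    for per_row in (16, 32):
        gib = savings_gib(ROWS, per_row)
        print(f"{per_row} bytes/row over {ROWS} rows: {gib:.2f} GiB")
    for name in ("md5", "sha256"):
        raw_len, b64_len = digest_sizes(name)
        print(f"{name}: {raw_len} bytes raw, {b64_len} chars as base64")
```

With these inputs the savings come out to roughly half a GiB for 16
bytes per row and roughly one GiB for 32, which matches the estimates
in the quoted message.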
Jason A. Kates (jason@xxxxxxxxx) 
Fax:    208-975-1514
Phone:  212-400-1670 x2

Bacula-devel mailing list
