
Re: [Bacula-devel] space saving in the database


On Tuesday 12 February 2008 15.59:10 Jason A. Kates wrote:
> Bill,
>
> I too think that we should defer to Kern and his list of priorities.

Thanks for your vote of confidence :-)

It is interesting because this is the first time I have not worked on the 
highest priority job (Item 1: Accurate restoration of renamed/deleted files), 
which was also the project that interested me the most.  

Instead, I am working on plugins (item 12: Add Plug-ins to the FileSet Include 
statements), which I decided to work on because it is the #1 most requested 
feature for enterprises (they want to be able to backup MS Exchange with 
a "module").  Well, I wasn't much enjoying the project, because it is a lot 
of *really* heavy design and delicate integration with Bacula, but now that I 
am into it, it is getting really interesting.

And the real nice part is that a very kind programmer came along and is making 
very good progress on the Accurate Backup project :-)

Also another kind programmer came along and is making great progress on Item
7: Commercial database support, which is the #2 most demanded enterprise
feature -- I still have to figure out how to make this work legally/morally 
with our Open Source license ...

Kern

>
> Kern has been doing a great job of prioritizing and running this
> project.  I think that people who are that tight on storage should
> prune the files a few days earlier and let Kern work on functionality;
> disk space is fairly inexpensive.
>
> 			-Jason
>
> On Tue, 2008-02-12 at 09:52 -0500, Bill Moran wrote:
> > In response to Cousin Marc <mcousin@xxxxxxxx>:
> >
> > [snip]
> >
> > > > One hash is not possible without seriously restricting users'
> > > > flexibility -- the MD5 field, though not totally used as planned in
> > > > 2.2 (hopefully it will be in 2.2), is a critical field for security
> > > > and certain government legal requirements for the storage of data.
> > > > As legal requirements become more strict with increasing computer
> > > > speed/technology, we can expect the need for larger and larger hash
> > > > codes.  In any case, it certainly needs to be user-configurable
> > > > without rebuilding the database -- thus variable.  In fact, as it
> > > > stands, the user can have multiple different MD5 sizes in any given
> > > > Job. I.e. some files may be more important to verify than others.
> > >
> > > I understand that MD5 is required, as it's the only way of reliably
> > > checking that a file has not been modified. But only one type of
> > > checksum may be more efficient from a database point of view, as it
> > > could be fixed size (no need to waste 4 bytes, for instance, in
> > > PostgreSQL telling the engine: be careful, the next field is variable
> > > length, here is its size). Of course, going from MD5 to SHA-256
> > > sacrifices 16 bytes... I don't know if there could be an efficient way
> > > of doing this. Anyway, base64 "wastes" more space in this scheme, so a
> > > transparent conversion at the database level may be useful.
> > >
> > > > As already explained, I would be very reluctant to make it a
> > > > requirement to be a multiple of 32 bits.  It just takes one genius to
> > > > come up with a new super fast algorithm that uses 129 bits to break
> > > > the code.
> > >
> > > Okay, I'll experiment with both. For us right now, a byte per record is
> > > only 300MB in database size :)
> >
> > Any time you look at complicating things to improve efficiency, there's
> > the question, "Is it worth it?"
> >
> > On the larger of our two Bacula servers, the database size is 8.5G.  The
> > File table contains 35 million rows.  If you can save 16 bytes per row,
> > that means an on-disk savings of roughly 1/2G.
> >
> > My reaction to that would be "big friggin deal".  Considering that
> > we've got 750G of file volumes on a RAID 5, saving 500M on the
> > database doesn't really seem worth the effort to me.
> >
> > Let's say Bacula moves to using SHA-256 hashes instead of MD5.  Now the
> > saving is 32 bytes per row instead of 16 bytes.  So, I'd be
> > saving a whole G on the total database size.  I still say, "why bother?"
> >
> > Bacula works just dandy for us with these sizes.  If I do a "list jobs
> > where a given file is saved" on the largest of our servers, the response
> > is fast enough that I don't even consider it a wait.  Quite honestly,
> > it's fast enough that I have trouble believing that it doesn't take
> > longer. Nearly instantaneous.
> >
> > Just my opinion, of course.  I'd be interested to hear how much effect
> > this would have on others and whether they think it's worthwhile to even
> > investigate.
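Kern argues in the quoted thread that digest size has to stay variable. The following Python sketch (not Bacula code) makes that concrete. It uses the standard `hashlib` module to print the raw digest length of a few common algorithms. The choice of algorithms is an assumption for illustration only. It is not a statement of which signatures a given Bacula version supports.

```python
import hashlib

# Illustrative set of algorithms; not a list of what Bacula supports.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_sizes(data: bytes) -> dict:
    """Map each algorithm name to the length of its raw digest in bytes."""
    return {name: len(hashlib.new(name, data).digest()) for name in ALGORITHMS}


if __name__ == "__main__":
    for name, size in digest_sizes(b"example file contents").items():
        print(f"{name:>6}: {size:2d} bytes ({size * 8} bits)")
```

The lengths range from 16 bytes (MD5) to 64 bytes (SHA-512). A fixed-width column sized for one algorithm therefore cannot hold the others without waste or truncation.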
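Marc's trade-off in the quoted thread compares a fixed-size binary column against variable-length base64 text. A sketch like the one below can estimate the per-row cost of each. It makes two assumptions:

- standard padded base64 (RFC 4648);
- the 4-byte length header from Marc's message, taken at face value.

PostgreSQL's actual per-value overhead and Bacula's exact encoding may differ, so treat the numbers as rough.

```python
import base64
import hashlib

# Length-header cost of a variable-length value, as quoted in the thread.
# Assumption: real PostgreSQL overhead depends on version and value size.
VARLENA_HEADER = 4


def stored_sizes(algorithm: str) -> dict:
    """Approximate bytes per row for three ways of storing one digest."""
    digest = hashlib.new(algorithm, b"example").digest()
    encoded = base64.b64encode(digest)  # standard base64 with '=' padding
    return {
        "binary, fixed width": len(digest),
        "binary, variable length": len(digest) + VARLENA_HEADER,
        "base64, variable length": len(encoded) + VARLENA_HEADER,
    }


if __name__ == "__main__":
    for algorithm in ("md5", "sha256"):
        for scheme, size in stored_sizes(algorithm).items():
            print(f"{algorithm:>6}  {scheme:<24} {size:3d} bytes")
```

Under these assumptions, storing fixed-width binary instead of padded base64 text saves:

- 12 bytes per row for MD5 (16 versus 28);
- 16 bytes per row for SHA-256 (32 versus 48).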
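Bill's back-of-the-envelope figures in the quoted thread can be reproduced in a few lines of Python. The row count and database size are the numbers from his message. Reading "G" as 10^9 bytes is an assumption. The estimate also ignores indexes, row alignment, and free space, so it is only approximate.

```python
# Rough estimate of how much a per-row saving shrinks the File table.
# Row count and database size are the figures quoted in this thread;
# "G" is read as 10**9 bytes (an assumption, the messages don't say).

DB_SIZE = 8.5e9       # Bill's total database size
ROWS = 35_000_000     # Bill's File table row count


def savings_bytes(rows: int, bytes_per_row: int) -> int:
    """Total bytes saved if every row shrinks by bytes_per_row."""
    return rows * bytes_per_row


if __name__ == "__main__":
    for per_row in (16, 32):  # an MD5-sized vs a SHA-256-sized saving
        saved = savings_bytes(ROWS, per_row)
        share = saved / DB_SIZE
        print(f"{per_row:2d} bytes/row: {saved / 1e9:.2f} GB saved, "
              f"{share:.1%} of the database")
```

The results are consistent with Bill's "1/2G" and "a whole G":

- 16 bytes per row saves about 0.56 GB (6.6% of the database).
- 32 bytes per row saves about 1.12 GB (13.2% of the database).

By the same arithmetic, Marc's estimate that one byte per record costs 300MB implies roughly 300 million File rows on his system.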



_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel


This mailing list archive is a service of Copilot Consulting.