
Re: [Bacula-devel] Alternative DB structures-Proposal

On Wednesday 01 October 2008 19:56:54 Kjetil Torgrim Homme wrote:
> "David Boyes" <dboyes@xxxxxxxxxxxxxx> writes:
> > Partially, but I've been working on USS on z/OS and OpenVMS Bacula
> > clients, where the filesystems are block oriented rather than byte
> > oriented. One *can* obtain a precise dataset size in bytes, but the
> > cost is reading the entire file to determine where the file data
> > actually ends, which is very expensive on terabyte- or
> > petabyte-scale datasets.
> Bacula *must* read the entire file in order to back it up, so the
> exact byte count is known with no extra cost.
> > It also doesn't really take sparse files or structured files (like
> > VMS indexed datasets or VSAM data spaces) into account very well, so
> > if this proposal is added to the "standard" Bacula database
> > structure, you will encounter problems when you deal with these
> > platforms (or anything more complicated than a simple sequential
> > file).
> as Kern mentions, Bacula already has code which expects a simple byte
> count to be sufficient to describe the size of an object.  as a Unix
> person, it is very hard for me to imagine a data object whose byte
> count can not be summed up.
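
The disagreement above comes down to whether one number can describe an object's size. A size expressed as a count of units plus a unit size (the alternative suggested further down in this thread) can be sketched as a small record type. This is a minimal illustration, not Bacula's schema: the class and field names (ObjectSize, units, unit_size, valid_bytes) are hypothetical. It keeps the size that is cheap to read from metadata separate from an exact byte count that, on block-oriented filesystems, may only be known after the data has been read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectSize:
    """Hypothetical size record: a count of units plus the size of one unit.

    unit_size is 1 for a byte-oriented file, 4096 for 4K blocks,
    32768 for a 32K VSAM cluster, and so on.
    """
    units: int
    unit_size: int
    # Exact byte count, if known; None when it would require reading the data.
    valid_bytes: Optional[int] = None

    def allocated_bytes(self) -> int:
        """Space occupied in whole units, computable from metadata alone."""
        return self.units * self.unit_size

    def best_byte_estimate(self) -> int:
        """Exact byte count when measured, otherwise the allocated size."""
        if self.valid_bytes is not None:
            return self.valid_bytes
        return self.allocated_bytes()
```

For example, ObjectSize(units=10, unit_size=32768) describes ten 32K clusters, and its best_byte_estimate() is 327680 until a backup measures the real length. An ordinary Unix file is simply units equal to its byte count with unit_size=1, so byte-oriented clients lose nothing. Sparse files and structured datasets are not modeled here.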

I am not arguing on either side of this issue at this point, but ...

There is one important point I would like to bring up: Bacula writes the 
attributes record (which contains the LStat) before it backs up the file, 
i.e. before it reads any of the file data.  This is because on the restore 
side, the File daemon needs the attributes record (parts of the LStat) in 
order to create the file.

So if you want to calculate the file size while reading the file and save it 
at the end of the backup, that is possible (that is how the MD5 works), but it 
would require some changes to the current algorithm.
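
The ordering described above (attributes first, then the data, then anything that can only be computed from the data) can be sketched as a generator. This is a hedged illustration, not Bacula's File daemon code: the function backup_stream and the record labels ATTRIBUTES, DATA and TRAILER are hypothetical, and Bacula's real stream format is not modeled.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple


def backup_stream(name: str, src: BinaryIO,
                  chunk_size: int = 65536) -> Iterator[Tuple[str, object]]:
    """Yield (record_type, payload) tuples in the order a restore needs them.

    The record type labels are hypothetical, for illustration only.
    """
    # 1. Attributes go first: the restore side needs them to create the file,
    #    so they cannot contain anything learned while reading the data.
    yield ("ATTRIBUTES", {"name": name})

    # 2. Stream the data, updating the byte count and the digest as we go.
    digest = hashlib.md5()
    size = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
        yield ("DATA", chunk)

    # 3. Values known only after the last read go into a trailing record,
    #    just as a digest can only be emitted once all the data has been seen.
    yield ("TRAILER", {"size": size, "md5": digest.hexdigest()})


if __name__ == "__main__":
    for kind, payload in backup_stream("hello.txt", io.BytesIO(b"hello world")):
        print(kind, payload)
```

In this sketch the restore side can create the file from the first record alone and verify size and digest when the trailer arrives; the cost is that any catalog field filled from the trailer is only available after the whole file has been read.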



> > If the idea is modified to be "number of units" and adds a "unit
> > size" factor of bytes per unit (ie, is the unit 1 byte or 4K block,
> > or VSAM cluster size of 32K, etc) I'd be more inclined to go for it.
> I think we need to come up with what use cases we want to solve before
> going on.  there's no point in making changes for change's sake.
> I mentioned earlier in another thread that an exact byte count
> combined with the MD5 sum can be used to implement incremental backups
> of append-only files, *however* since the decision whether to backup
> or not is based on ctime anyway, extracting the information from LStat
> as needed is quite acceptable.
> the reason I brought up size as a separate column in that other thread
> was to do exactly what bfileview does (thanks, Eric!), and the main
> bottleneck with bfileview is really not extracting the information
> from LStat, but "flattening" the directory tree inside the
> brestore_pathvisibility (as the name indicates, that table must be
> added in addition to the core Bacula DB schema, and it must be kept
> up-to-date using a batch job while no backups are running).
> having mtime available during restore may be more useful, both to
> provide more information interactively when selecting files, and to
> pick which files to restore ("everything newer than a week from the
> full backup which ran last night").
> I don't see any use case for ctime.
> now it's your and/or John's turn :-)
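
The append-only idea mentioned above can be made concrete: if the file is at least as long as it was at the previous backup, and the MD5 of its first old_size bytes still matches the digest recorded then, only the tail needs to be saved. This is a minimal sketch under stated assumptions, not existing Bacula behavior: the function names are hypothetical, and it assumes the catalog holds both the exact byte count and the MD5 of the whole file as it was at the previous backup.

```python
import hashlib
import io
from typing import BinaryIO, Optional


def md5_of_prefix(f: BinaryIO, length: int, chunk_size: int = 65536) -> str:
    """MD5 of the first `length` bytes of an open binary file."""
    digest = hashlib.md5()
    f.seek(0)
    remaining = length
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:
            break  # file is shorter than `length`
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()


def appended_tail_offset(f: BinaryIO, old_size: int, old_md5: str) -> Optional[int]:
    """Return the offset where new data starts if the file only grew, else None.

    old_size and old_md5 stand for the exact byte count and MD5 recorded at
    the previous backup; they are hypothetical catalog values here.
    """
    new_size = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    if new_size < old_size:
        return None  # truncated or rewritten: back up the whole file
    if md5_of_prefix(f, old_size) != old_md5:
        return None  # earlier contents changed: back up the whole file
    return old_size  # only bytes [old_size, new_size) need to be saved
```

With a file opened via open(path, "rb"), a return value of 7 means only the bytes from offset 7 onward are new. The check still reads the first old_size bytes, so it saves tape and network traffic rather than disk reads, and, as noted above, the decision whether to look at the file at all would still come from ctime.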

Bacula-devel mailing list
