Re: [Bacula-devel] Alternative DB structures-Proposal

"David Boyes" <dboyes@xxxxxxxxxxxxxx> writes:

> Partially, but I've been working on USS on z/OS and OpenVMS Bacula
> clients, where the filesystems are block oriented rather than byte
> oriented. One *can* obtain a precise dataset size in bytes, but the
> cost is reading the entire file to determine where the file data
> actually ends, which is very expensive on terabyte- or
> petabyte-scale datasets.

Bacula *must* read the entire file in order to back it up, so the
exact byte count is known with no extra cost.

> It also doesn't really take sparse files or structured files (like
> VMS indexed datasets or VSAM data spaces) into account very well, so
> if this proposal is added to the "standard" Bacula database
> structure, you will encounter problems when you deal with these
> platforms (or anything more complicated than a simple sequential
> file).

As Kern mentions, Bacula already has code which expects a simple byte
count to be sufficient to describe the size of an object.  As a Unix
person, I find it very hard to imagine a data object whose size
cannot be summed up as a byte count.

> If the idea is modified to be "number of units" and adds a "unit
> size" factor of bytes per unit (ie, is the unit 1 byte or 4K block,
> or VSAM cluster size of 32K, etc) I'd be more inclined to go for it.

I think we need to agree on which use cases we want to solve before
going on.  There's no point in making changes for change's sake.
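
For discussion, the "number of units" plus "unit size" idea quoted
above might look roughly like this. This is only a sketch: the names
ObjectSize, units, unit_size and max_bytes are made up for
illustration and do not exist in Bacula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSize:
    """Hypothetical size record: a count of units plus bytes per unit."""
    units: int          # number of units (bytes, blocks, clusters, ...)
    unit_size: int = 1  # bytes per unit; 1 for a byte-oriented filesystem

    def max_bytes(self) -> int:
        # Upper bound on the size in bytes. On a block-oriented
        # filesystem the last unit may be only partly filled.
        return self.units * self.unit_size


# A plain Unix file: the unit is one byte, so the count is exact.
unix_file = ObjectSize(units=1_234_567)

# A dataset allocated in 32K clusters: only an upper bound is known.
vsam_like = ObjectSize(units=10, unit_size=32 * 1024)
```

With unit_size left at 1 this degenerates to the plain byte count
that existing code expects.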

I mentioned earlier in another thread that an exact byte count
combined with the MD5 sum can be used to implement incremental backups
of append-only files.  *However*, since the decision whether to back
up or not is based on ctime anyway, extracting the information from
LStat as needed is quite acceptable.
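
To illustrate the append-only idea: if the first old_size bytes of
the file still hash to the MD5 sum stored at the previous backup, only
the tail needs to be saved. The following is a minimal sketch, not
Bacula code; md5_prefix and appended_range are hypothetical helpers,
and old_size and old_md5 are assumed to come from the catalog.

```python
import hashlib
import os


def md5_prefix(path, length, chunk=1 << 20):
    """MD5 hex digest of the first `length` bytes of the file at `path`."""
    h = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                break  # file is shorter than `length`
            h.update(data)
            remaining -= len(data)
    return h.hexdigest()


def appended_range(path, old_size, old_md5):
    """Return (offset, length) of the appended tail, or None if the
    file cannot be treated as append-only and needs a full backup."""
    new_size = os.path.getsize(path)
    if new_size < old_size:
        return None  # file shrank or was replaced
    if md5_prefix(path, old_size) != old_md5:
        return None  # previously saved bytes were modified
    return (old_size, new_size - old_size)
```

Note that this still reads the old part of the file to verify it, so
it saves storage and network traffic rather than disk reads.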

The reason I brought up size as a separate column in that other thread
was to do exactly what bfileview does (thanks, Eric!).  The main
bottleneck with bfileview is really not extracting the information
from LStat, but "flattening" the directory tree inside the
brestore_pathvisibility table (as the name indicates, that table must
be added in addition to the core Bacula DB schema, and it must be kept
up-to-date using a batch job while no backups are running).

Having mtime available during restore may be more useful, both to
provide more information interactively when selecting files, and to
pick which files to restore ("everything newer than a week from the
full backup which ran last night").
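
As a sketch of that kind of selection, assuming the catalog can hand
us (name, mtime) pairs directly; select_recent and its parameters are
made up for illustration and are not part of Bacula or brestore:

```python
from datetime import datetime, timedelta


def select_recent(entries, backup_time, window=timedelta(days=7)):
    """Return the names of files modified within `window` before
    `backup_time`. `entries` is an iterable of (name, mtime) pairs."""
    cutoff = backup_time - window
    return [name for name, mtime in entries if mtime >= cutoff]


# Example data, invented for this sketch.
last_full = datetime(2008, 9, 20, 23, 0)
files = [
    ("/etc/passwd", datetime(2008, 9, 19, 8, 30)),
    ("/etc/fstab", datetime(2008, 1, 2, 12, 0)),
]
to_restore = select_recent(files, last_full)
```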

I don't see any use case for ctime.

Now it's your and/or John's turn :-)

regards,          | Redpill  _
Kjetil T. Homme   | Linpro  (_)
