
Re: [Bacula-devel] Alternative DB structures-Proposal

> > Partially, but I've been working on USS on z/OS and OpenVMS Bacula
> > clients, where the filesystems are block oriented rather than byte
> > oriented. One *can* obtain a precise dataset size in bytes, but the
> > cost is reading the entire file to determine where the file data
> > actually ends, which is very expensive on terabyte- or
> > petabyte-scale datasets.
> Bacula *must* read the entire file in order to back it up, so the
> exact byte count is known with no extra cost.

Umm, not on the OSes I mentioned. If you fstat the file or read the
directory inode with Unix compatibility on, the underlying OS reads the
entire file once just to determine its size in bytes, so that it can
fill in the stat structure and stay compatible with the assumption that
files are streams of bytes. You then get to read it again to get the
actual data blocks.

Reading a 20 TB file twice is nontrivial. I have LOTS of files that
large, and a few that will grow to exabyte scale in the not too
distant future.

> > It also doesn't really take sparse files or structured files (like
> > VMS indexed datasets or VSAM data spaces) into account very well, so
> > if this proposal is added to the "standard" Bacula database
> > structure, you will encounter problems when you deal with these
> > platforms (or anything more complicated than a simple sequential
> > file).
> as Kern mentions, Bacula already has code which expects a simple byte
> count to be sufficient to describe the size of an object.  as a Unix
> person, it is very hard for me to imagine a data object whose byte
> count can not be summed up.

See above. It's not that we can't get a byte count, but that there are
systems where it's very expensive to get that byte count and very cheap
to get the number of allocation units used, and also the size of the
allocation unit. If the allocation unit size happens to be 1 byte (as it
is on most Unix and Windows systems), you lose nothing and it's not a
problem. If the allocation unit size is > 1, you win big by skipping the
additional read of the file needed to report the size in the stat structure.
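
To make the allocation-unit idea concrete, here is a minimal sketch of what recording "units used" plus "unit size" could look like. It is not Bacula code and not a proposed schema; the names AllocatedSize and allocated_size are hypothetical. It assumes a POSIX host, where stat reports st_blocks in 512-byte units. On such a host st_size is already cheap, so the sketch only shows the shape of the record, not the cost saving on z/OS or OpenVMS.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocatedSize:
    """Hypothetical size record: allocation units in use plus the unit size."""
    units: int        # number of allocation units the file occupies
    unit_bytes: int   # size of one allocation unit, in bytes

    @property
    def allocated_bytes(self) -> int:
        # Space occupied on disk; this is not the exact end-of-data offset.
        return self.units * self.unit_bytes


def allocated_size(path: str) -> AllocatedSize:
    """Read the allocation count from file metadata, without reading file data."""
    st = os.stat(path)
    # Assumption: POSIX semantics, where st_blocks counts 512-byte units.
    return AllocatedSize(units=st.st_blocks, unit_bytes=512)
```

With a unit size of 1, such a record degenerates to a plain byte count, e.g. AllocatedSize(units=7, unit_bytes=1).allocated_bytes is 7, which is the "you lose nothing" case described above.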

But, I think a bit differently than most folks using Bacula, so I'll be
interested to see what John says. John -- your rock, sir...8-)
