
Re: [Bacula-devel] Alternative DB structures - Proposal


On Thursday 25 September 2008 14:19:46 David Boyes wrote:
> > On Wednesday 24 September 2008 23:40:03 David Boyes wrote:
> > > > ALTER TABLE file  add column size bigint default 0;
> > >
> > > This seems to assume that files/objects are byte streams. This is not
> > > always true on all the platforms that Bacula supports.
> >
> > I am not sure I understand why this assumes that files/objects are byte
> > streams.  I am not arguing, but I find your statement interesting ...
> >
> > One point is that no platform is obligated to provide size (st_mtime is
> > needed), so it may not always be available -- perhaps that is what you
> > (David) mean above?
>
> Partially, but I've been working on USS on z/OS and OpenVMS Bacula
> clients, where the filesystems are block oriented rather than byte
> oriented. One *can* obtain a precise dataset size in bytes, but the cost
> is reading the entire file to determine where the file data actually
> ends, which is very expensive on terabyte- or petabyte-scale datasets.
> It also doesn't really take sparse files or structured files (like VMS
> indexed datasets or VSAM data spaces) into account very well, so if this
> proposal is added to the "standard" Bacula database structure, you will
> encounter problems when you deal with these platforms (or anything more
> complicated than a simple sequential file).

Bacula does not need the field to be precise, though some users may want it 
to be.  In general, Bacula doesn't need an exact byte count: it uses the 
value to warn you if the file size changed during the backup (if the user 
enables that check), and it always uses it to verify the resulting size 
during a restore.  All Bacula really cares about is that the size in the 
catalog is consistent with the size the FD sees during the restore.  If it 
is rounded to the system block size, that is fine with me -- that is how a 
lot of OSes work.
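
To make that concrete, here is a minimal sketch (hypothetical names, not 
actual FD code) of the two uses just described -- warning when the size 
changes during the backup, and checking the restored size against the 
catalog:

   /* Hypothetical illustration only -- not actual Bacula FD code. */
   #include <stdint.h>
   #include <stdio.h>

   /* Warn (if enabled) when the size observed at close differs from the
    * size observed when the file was opened for backup. */
   void check_backup_size(const char *fname, uint64_t size_at_open,
                          uint64_t size_at_close, int warn_enabled)
   {
      if (warn_enabled && size_at_close != size_at_open) {
         fprintf(stderr, "Warning: %s changed size during backup "
                 "(%llu -> %llu)\n", fname,
                 (unsigned long long)size_at_open,
                 (unsigned long long)size_at_close);
      }
   }

   /* At restore time the only requirement is that the size the FD obtains
    * is consistent with the catalog value -- exact or block-rounded, as
    * long as both sides compute it the same way. */
   int restore_size_ok(uint64_t catalog_size, uint64_t restored_size)
   {
      return catalog_size == restored_size;
   }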

This is an entirely different set of code, but the only place where the size 
is really critical is when the SD wants to append to a disk file.  It needs 
to be able to seek to the end of the disk file and check that the position 
corresponds to what is in the catalog.  Again, as long as the value stored 
in the catalog is consistent with the value obtained when seeking to the end 
of the file, Bacula doesn't really care whether it is rounded or not.  
However, if there are "holes" in the file because you cannot properly seek 
to the byte corresponding to the end of the file, I doubt that a Volume 
Bacula wrote will be readable by Bacula.  As I say, this is a totally 
different question from the one at hand, but it may be an issue when porting 
the SD.
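
Just to illustrate what I mean (a rough sketch with made-up names, not the 
actual SD code), the append-time check is simply that seeking to the end of 
the Volume file must land on the offset the catalog claims:

   /* Hypothetical sketch of the append-time check, not the actual SD code. */
   #include <sys/types.h>
   #include <unistd.h>

   /* fd is an open disk Volume file; returns 1 if its end-of-file offset
    * matches what the catalog recorded, 0 otherwise. */
   int volume_end_matches_catalog(int fd, off_t catalog_end)
   {
      off_t end = lseek(fd, 0, SEEK_END);   /* seek to the actual end */
      if (end < 0) {
         return 0;                          /* cannot seek: treat as mismatch */
      }
      return end == catalog_end;
   }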

>
> Example: I can easily tell you that a file on VMS takes 4M 4K blocks in
> a few nanoseconds, but telling you that it takes exactly 22,239,394
> bytes would take order of several seconds. 

> I propose for my client work 
> to store the number the OS returns from stat as units of allocation, 
> and include a scaling factor. (BTW this concept is built into the virtual
> storage manager I talked about a few weeks ago).

I propose that each port simply compute the number as best it can, either by 
multiplying the block size by the block count or by obtaining the real size, 
whichever suits the particular port best.  As I say, at the current time, 
this number is not critical.  There will undoubtedly be users who want it 
exact, but if we document it properly that is OK ...
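
On Unix-like ports that could be nothing more than the ordinary stat fields; 
a minimal sketch of what I have in mind (each port substitutes whatever its 
OS provides):

   /* Minimal POSIX-style sketch; a block-oriented port would substitute
    * its native allocation count and unit size. */
   #include <stdint.h>
   #include <sys/stat.h>

   uint64_t size_for_catalog(const struct stat *st, int have_exact_size)
   {
      if (have_exact_size) {
         return (uint64_t)st->st_size;        /* exact byte count */
      }
      /* POSIX defines st_blocks in 512-byte units; a rounded value is fine
       * as long as backup and restore compute it the same way. */
      return (uint64_t)st->st_blocks * 512;
   }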

>
> I'm not objecting to the idea per se, but all the world is not Unix or
> Windows, and you'll get some very strange results if you ran scripts
> against my database and you don't understand what units you're talking
> about...8-)

Every system is different, and I am quite familiar with porting issues.  
Each case must be dealt with individually, and I don't see any problem here.

> If the idea is modified to be "number of units" and adds a "unit size"
> factor of bytes per unit (ie, is the unit 1 byte or 4K block, or VSAM
> cluster size of 32K, etc) I'd be more inclined to go for it.

For the moment, I don't see any reason to add that complication as a core 
part of Bacula, especially since we have as yet received no code 
contributions for such machines.  Besides, if I am not mistaken, that 
information is already available in the Lstat packet, so it is just a 
question of using it.
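
For what it is worth, on Unix-like systems a "number of units" and a unit 
size are already visible through plain lstat(), which is the kind of 
information I mean; a small illustrative sketch (not Bacula code):

   /* Hedged sketch: for Unix-style filesystems the allocation information
    * is already in the ordinary stat fields; block-oriented ports would
    * report their own allocation unit instead. */
   #include <stdio.h>
   #include <sys/types.h>
   #include <sys/stat.h>

   void print_allocation_info(const char *path)
   {
      struct stat st;
      if (lstat(path, &st) == 0) {
         /* st_blocks counts 512-byte units; st_blksize is the preferred
          * I/O block size, not the allocation unit of st_blocks. */
         printf("%s: %lld 512-byte blocks allocated, blksize %lld\n",
                path, (long long)st.st_blocks, (long long)st.st_blksize);
      }
   }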

>
> So, while the original idea works if the base assumption is
> non-structured byte oriented filesystems, it's not easily expanded when
> more complex filesystem constructs are in play. Thus my objection.

OK.

Kern

_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

