
Re: [Bacula-devel] storing file size in the catalog


On Monday 25 August 2008 17:33:44, Kjetil Torgrim Homme wrote:
> I wanted to make a du(1) style report for Bacula, but was a bit
> surprised to see that this information is not readily available in the
> File table -- it's encoded as quasi-base64 in the LStat column.  I
> modified base64.sql[1] to support Bacula's format, but it's running
> too slow to be useful, ie. less than 10k filesizes extracted per
> second on my relatively beefy database server.  For a largish fileset
> this means a couple of minutes CPU time will be spent on that alone,
> so it's impractical to do it without heavy caching -- and then we
> might as well add the information while doing the backup.

The bweb interface has a view like the "filelight" tool that displays used 
space as circular graphs: http://www.methylblue.com/filelight/ (the bfileview 
module from bweb isn't as pretty :)

To get good performance, I've used a new table that stores directory sizes, 
and I also have a PL procedure that extracts the file size from the "base64" 
field (much faster than a piece of C code that retrieves all the rows). I can 
then do something like:

  SELECT SUM(extract_lstat('size', LStat)), FilenameId FROM File ...
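For reference, the decoding itself is simple. Here is a rough Python sketch of 
what such an extract_lstat procedure does, assuming Bacula's field layout from 
encode_stat() (space-separated fields, each a base64-style big-endian number, 
with st_size as the 8th field) -- an illustration, not the shipped code:

```python
# Bacula's base64 alphabet (see lib/base64.c); note it is NOT the RFC 4648
# ordering, so standard base64 decoders will give wrong results.
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUE = {c: i for i, c in enumerate(BASE64_CHARS)}

def from_base64(field):
    """Decode one LStat field: a big-endian number, 6 bits per character.
    (Bacula can prefix negative values with '-'; sizes are non-negative.)"""
    neg = field.startswith("-")
    n = 0
    for c in field.lstrip("-"):
        n = (n << 6) | VALUE[c]
    return -n if neg else n

def lstat_size(lstat):
    """Extract st_size from an LStat string.

    encode_stat() writes the fields in this order:
    dev ino mode nlink uid gid rdev size blksize blocks atime mtime ctime ...
    so st_size is the 8th space-separated field (index 7).
    """
    return from_base64(lstat.split()[7])
```

A PL/pgSQL version of the same loop is what makes the SUM(...) query above 
fast, since the rows never leave the database server.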

> Does anyone else thinks it's a good idea to extend the table?

Extending the File table isn't always possible: in this case you would add 8 
bytes per row, which in my case means 500,000,000 * 8 bytes (nearly 4GB).

We will modify the File table in the next major version (to increase the 
FileId size), and perhaps we will add a Size field, but that needs some 
testing first.

> A graphical disk usage browser would make it easier to visualise which
> directories are big, or are growing the fastest -- or to spot files
> which should be omitted, or even directories which are inadvertently
> missing.
> Here's the current definition:
> +------------+------------------+------+-----+---------+----------------+
> | Field      | Type             | Null | Key | Default | Extra          |
> +------------+------------------+------+-----+---------+----------------+
> | FileId     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
> | FileIndex  | int(10) unsigned | YES  |     | 0       |                |
> | JobId      | int(10) unsigned | NO   | MUL | NULL    |                |
> | PathId     | int(10) unsigned | NO   | MUL | NULL    |                |
> | FilenameId | int(10) unsigned | NO   | MUL | NULL    |                |
> | MarkId     | int(10) unsigned | YES  |     | 0       |                |
> | LStat      | tinyblob         | NO   |     | NULL    |                |
> | MD5        | tinyblob         | YES  |     | NULL    |                |
> +------------+------------------+------+-----+---------+----------------+

I see that you are using MySQL, so you won't be able to use the bweb module :( 
(at this time MySQL doesn't support this kind of feature)

> As you can see, the MD5 sum is already stored there, and as a bonus
> the combination of the file size and MD5 would make it possible to
> implement incremental storage of files which grows by appending (logs,
> mbox).  Just calculate the MD5 sum of the first N bytes, and if it
> matches, don't store the start of the file (this needs a new record
> type, too).  You don't even waste CPU time, since the calculation has
> to be done anyway.  Append-only files may be too rare to be worth the
> special case code, though.

IMHO, it's a bit too specific, but quite interesting :)

