
[Bacula-devel] storing file size in the catalog

I wanted to make a du(1) style report for Bacula, but was a bit
surprised to see that this information is not readily available in the
File table -- it's encoded as quasi-base64 in the LStat column.  I
modified base64.sql[1] to support Bacula's format, but it runs
too slowly to be useful, i.e. fewer than 10k file sizes extracted per
second on my relatively beefy database server.  For a largish fileset
this means a couple of minutes CPU time will be spent on that alone,
so it's impractical to do it without heavy caching -- and then we
might as well add the information while doing the backup.
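For illustration, here is a minimal Python sketch of the decoding step
done outside the database.  It assumes that LStat is a space-separated
list of integers, each written most-significant digit first in the
alphabet A-Z a-z 0-9 + / with no padding, and that st_size is the
eighth field.  Both are my reading of the format, not something taken
from the catalog documentation, and the names decode_b64_int and
lstat_size are made up for this example.

```python
# Digit alphabet assumed for Bacula's LStat encoding (6 bits per character).
B64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {ch: i for i, ch in enumerate(B64_DIGITS)}

# Assumption: fields follow struct stat order, so st_size is the 8th field.
ST_SIZE_FIELD = 7


def decode_b64_int(text):
    """Decode one LStat field: base-64 digits, most significant first."""
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    value = 0
    for ch in text:
        value = (value << 6) + B64_VALUE[ch]
    return -value if negative else value


def lstat_size(lstat):
    """Return the file size encoded in an LStat string."""
    fields = lstat.split()
    return decode_b64_int(fields[ST_SIZE_FIELD])
```

With these assumptions, lstat_size() on a row whose eighth field is
"BA" gives 64.  This only moves the CPU cost to the client; it does
not remove the need to decode every row for a report.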

Does anyone else think it's a good idea to extend the table?

A graphical disk usage browser would make it easier to visualise which
directories are big, or are growing the fastest -- or to spot files
which should be omitted, or even directories which are inadvertently

Here's the current definition:

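+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| FileId     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| FileIndex  | int(10) unsigned | YES  |     | 0       |                |
| JobId      | int(10) unsigned | NO   | MUL | NULL    |                |
| PathId     | int(10) unsigned | NO   | MUL | NULL    |                |
| FilenameId | int(10) unsigned | NO   | MUL | NULL    |                |
| MarkId     | int(10) unsigned | YES  |     | 0       |                |
| LStat      | tinyblob         | NO   |     | NULL    |                |
| MD5        | tinyblob         | YES  |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+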

As you can see, the MD5 sum is already stored there, and as a bonus
the combination of the file size and MD5 would make it possible to
implement incremental storage of files which grow by appending (logs,
mbox).  Just calculate the MD5 sum of the first N bytes, and if it
matches, don't store the start of the file (this needs a new record
type, too).  You don't even waste CPU time, since the calculation has
to be done anyway.  Append-only files may be too rare to be worth the
special case code, though.
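The prefix check could be sketched roughly as follows, in Python for
brevity.  This is not Bacula code: prefix_md5 and appended_offset are
hypothetical names, and the stored checksum is assumed to be available
as a hex string, which is not how the catalog encodes it.

```python
import hashlib
import os


def prefix_md5(path, length, chunk_size=65536):
    """MD5 (hex) of the first `length` bytes of the file at `path`."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than `length`
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def appended_offset(path, old_size, old_md5_hex):
    """Return the offset to resume storing from, or None.

    None means the file shrank or its first old_size bytes changed,
    so the whole file has to be stored again.
    """
    if os.path.getsize(path) < old_size:
        return None
    if prefix_md5(path, old_size) == old_md5_hex:
        return old_size
    return None
```

In a real implementation the prefix digest would fall out of the same
pass that computes the full-file MD5, by snapshotting the running
digest once old_size bytes have been read, so the file is not read
twice.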

Kjetil T. Homme
Linpro AS
