
Re: [Bacula-devel] small dbcheck patch for mysql


Here are a few comments on this thread:

On Wednesday 10 September 2008 14:01:22 Kjetil Torgrim Homme wrote:
> Bill Moran <wmoran@xxxxxxxxxxxxxxxxxxxxxxx> writes:
> > My understanding of what happens is this:
> > 1) Job runs and creates a filename entry for a filename not used before
> >    (let's say something really unique, such as would be created by
> >    mktemp)
> > 2) Time goes by and that file is deleted, never to be seen again.
> >    Eventually, Bacula gets to a point where all the jobs that backed
> >    up that filename have been pruned, thus there are no longer any
> >    file entries referencing that row.
> > 3) Now we have an orphaned filename row in that table.
> indeed.  ideally Bacula would check for this, but it means pruning
> will be more database intensive (today's DELETE FROM File WHERE JobId
> = N won't do).

We are not planning to check for orphaned file and path records during a prune; 
it is simply too expensive.  Pruning can be triggered when a new tape is 
needed, and that is no time to have the DB cranking away for 3 hours, or even 
20 minutes if it is optimized.
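
To make the cost difference concrete, here is a minimal sketch of the two operations, using Python's sqlite3 module and a deliberately simplified schema. The table layout, the helper names (make_catalog, prune_job, find_orphaned_filenames) and the sample data are assumptions for illustration only; these are not the queries dbcheck actually runs.

```python
import sqlite3


def make_catalog():
    """Build a tiny in-memory stand-in for the catalog (simplified, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Path     (PathId     INTEGER PRIMARY KEY, Path TEXT);
        CREATE TABLE File     (FileId     INTEGER PRIMARY KEY,
                               JobId      INTEGER,
                               PathId     INTEGER,
                               FilenameId INTEGER);
    """)
    return conn


def prune_job(conn, job_id):
    """The cheap prune: touches only the File rows of a single job."""
    conn.execute("DELETE FROM File WHERE JobId = ?", (job_id,))


def find_orphaned_filenames(conn):
    """The expensive check: every Filename row must be matched against File."""
    rows = conn.execute("""
        SELECT Filename.FilenameId
          FROM Filename
          LEFT JOIN File ON File.FilenameId = Filename.FilenameId
         WHERE File.FilenameId IS NULL
         ORDER BY Filename.FilenameId
    """).fetchall()
    return [row[0] for row in rows]


def demo():
    conn = make_catalog()
    conn.executemany("INSERT INTO Filename VALUES (?, ?)",
                     [(1, "tmp.Xa81kQ"), (2, "passwd")])
    # Job 1 backed up both names; job 2 backed up only "passwd".
    conn.executemany(
        "INSERT INTO File (JobId, PathId, FilenameId) VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 1, 2), (2, 1, 2)])
    prune_job(conn, 1)
    return find_orphaned_filenames(conn)
```

The prune is bounded by the size of one job, while the orphan check has to consider the whole Filename table (tens of millions of rows in the figures quoted below), which is why it is left to a separate dbcheck run.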

> BTW, dbcheck is slightly unsafe since it runs without transactions
> today: a filename entry can be orphaned during the SELECT, but reused
> by a backup job inserting data before the DELETE.  this means you
> should not run dbcheck while backups are running (or are likely to
> insert attributes).

Yes, that is correct.  We probably should use some sort of table lock, but for 
the moment, you just need to be careful.
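
As a rough illustration of what closing that window could look like, the sketch below does the check and the delete in one statement inside an explicit write transaction. It uses Python's sqlite3 module; BEGIN IMMEDIATE is SQLite-specific, and MySQL or PostgreSQL would need their own locking statements. The schema and the function names open_catalog and delete_orphaned_filenames are assumptions for illustration, not dbcheck code.

```python
import sqlite3


def open_catalog():
    """In-memory stand-in for the catalog (simplified, assumed schema)."""
    # isolation_level=None puts the module in autocommit mode, so the
    # transaction below is controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE File     (FileId     INTEGER PRIMARY KEY,
                               JobId      INTEGER,
                               FilenameId INTEGER);
    """)
    return conn


def delete_orphaned_filenames(conn):
    """Delete unreferenced Filename rows while holding the write lock."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can insert File rows between the check and the delete.
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute("""
            DELETE FROM Filename
             WHERE NOT EXISTS (SELECT 1 FROM File
                                WHERE File.FilenameId = Filename.FilenameId)
        """)
        deleted = cur.rowcount
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return deleted
```

With a single in-memory connection nothing actually contends for the lock, so this only shows the shape of the approach; the cost is that backups inserting attributes would block for as long as the cleanup runs.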

> > The same thing can happen with path.
> as a datapoint, this is today's new orphans:
> Found 311 orphaned Path records. (out of 6,363,370 rows)
> Found 28311 orphaned Filename records. (out of 28,833,369 rows)
> > The canonical way to solve this would be to keep a reference counter in
> > the filename/path tables that keeps track of how many file entries
> > reference that row.  When it hits 0, it can be deleted.  But this
> > creates other issues:
> > 1) What is the overhead of maintaining the reference counter?
> it's more I/O of course, since you need to do UPDATE on two tables in
> addition to the INSERT into File.  storage for the column itself is
> rather modest compared to the space used for the text string holding
> the path itself.  it can be done with just one SQL statement,
> something like:
>   UPDATE Filename SET RefCount = RefCount + 1
>     WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = N);
> new rows need to be added with RefCount = 0 if this UPDATE is to
> be done at the end.  in a way, this means Bacula gets its own
> "transaction" support -- a backup which crashes halfway through will
> be cleaned away.
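
For readers following the proposal, here is a minimal sketch of how such a counter would behave, using Python's sqlite3 module. The RefCount column is the hypothetical one from the quoted text; the simplified schema, the assumption that File and Filename are joined on FilenameId, and the function names finish_job and prune_job are all illustrative assumptions, not existing Bacula code.

```python
import sqlite3


def open_catalog():
    """In-memory stand-in for the catalog with a hypothetical RefCount column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY,
                               Name       TEXT,
                               RefCount   INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE File     (FileId     INTEGER PRIMARY KEY,
                               JobId      INTEGER,
                               FilenameId INTEGER);
    """)
    return conn


def finish_job(conn, job_id):
    """At the end of a job, count it against every filename it referenced."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("""
            UPDATE Filename SET RefCount = RefCount + 1
             WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = ?)
        """, (job_id,))


def prune_job(conn, job_id):
    """Undo the job's counts, drop its File rows, then drop unreferenced names."""
    with conn:
        # The decrement must run before the File rows disappear.
        conn.execute("""
            UPDATE Filename SET RefCount = RefCount - 1
             WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = ?)
        """, (job_id,))
        conn.execute("DELETE FROM File WHERE JobId = ?", (job_id,))
        conn.execute("DELETE FROM Filename WHERE RefCount = 0")
```

Because the UPDATE uses IN, each filename is counted once per job rather than once per File row. Rows inserted by a job that is still running also sit at RefCount = 0 in this sketch, so the final DELETE would have to be kept away from active backups.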

We won't be adding reference counts to the database -- it is hard enough to 
make them work correctly in a direct memory situation and there are so many 
ways they could fail that such an implementation could seriously damage the 
reliability of our database, which has been perfect to this point.

> > 2) In the case of a crash, we _must_ fix all reference counters
> >    immediately, otherwise records could be deleted that still need
> >    to be used.
> well, depends on how you want Bacula to behave.  I'm not convinced
> it's useful to support "partial" backups.

In the future, partial backups will probably become *very* important in order 
to restart long-running jobs that fail.

> > Referential integrity doesn't even gain us much here, as that only
> > guarantees that records can't be created that don't have proper
> > references, it doesn't automatically clean up after deletes.
> indeed, the FK reference would be in the other direction.
> > Transactions, if properly implemented, can guarantee that the
> > reference counters will _always_ be correct, even in the event of a
> > crash, but what does that mean for platforms that don't have
> > transaction support or have it turned off?
> people who are worried about data integrity will use a system
> supporting transactions.

Perhaps, but with the possible exception of dbcheck, Bacula does not have a 
problem with data integrity nor with referential integrity.  In 2000 when 
this project started, not all databases we supported had transactions, and if 
I am not mistaken this is still the case (some older MySQLs still in use do 
not have transactions).  We just don't have the resources to maintain 
different code for all the databases supported (a lot more of them with the 
DBI interface in 3.0.0), so we program to a reasonable but limited set of 
functionality trying to balance performance (in Bacula) versus 

By the way, we do use transactions in some cases and over the years their use 
will surely increase.  Much of it is related to priorities and programming 
manpower.  The unfortunate part of Open Source projects like Bacula with a 
lack of programmers is that if someone does not step forward, and the work is 
not critical, it must wait.  This is pretty much the case with dbcheck.  I think 
only one time did I make a quick pass through it improving the performance.  
This dbcheck pass will certainly clean up about 90% or more of the serious 
performance problems (well, perhaps it does not address potential SQLite 



Bacula-devel mailing list

This mailing list archive is a service of Copilot Consulting.