
Re: [Bacula-devel] small dbcheck patch for mysql

Bill Moran <wmoran@xxxxxxxxxxxxxxxxxxxxxxx> writes:
> My understanding of what happens is this:
> 1) Job runs and creates a filename entry for a filename not used before
>    (let's say something really unique, such as would be created by
>    mktemp)
> 2) Time goes by and that file is deleted, never to be seen again.
>    Eventually, Bacula gets to a point where all the jobs that backed
>    up that filename have been pruned, thus there are no longer any
>    file entries referencing that row.
> 3) Now we have an orphaned filename row in that table.

indeed.  ideally Bacula would check for this, but that makes pruning
more database intensive (today's simple DELETE FROM File WHERE JobId
= N would no longer be enough).
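To make the pruning cost concrete, here is a minimal sketch that uses SQLite through Python's `sqlite3` module as a stand-in for the real catalog database. The schema is a simplified, hypothetical version of the Filename/File tables with only the columns needed here, and the helpers (`record_file`, `prune_job_naive`, `prune_job_checked`) are illustrative names, not Bacula code. The naive prune is the single DELETE described above. The checked prune also removes any filename the pruned job used that no remaining File row references, which costs an extra existence check per name.

```python
import sqlite3

# Hypothetical, trimmed schema: only the columns this illustration needs.
SCHEMA = """
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE NOT NULL);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER NOT NULL,
                   FilenameId INTEGER NOT NULL);
"""

def record_file(db, job_id, name):
    """Insert a File row, creating the Filename row the first time a name is seen."""
    db.execute("INSERT OR IGNORE INTO Filename (Name) VALUES (?)", (name,))
    (fid,) = db.execute("SELECT FilenameId FROM Filename WHERE Name = ?", (name,)).fetchone()
    db.execute("INSERT INTO File (JobId, FilenameId) VALUES (?, ?)", (job_id, fid))

def prune_job_naive(db, job_id):
    # Only the File rows go; Filename rows they pointed at are left behind.
    db.execute("DELETE FROM File WHERE JobId = ?", (job_id,))

def prune_job_checked(db, job_id):
    # Remember which names the job used, delete its File rows, then drop each
    # of those names only if no remaining File row still references it.
    ids = [r[0] for r in db.execute(
        "SELECT DISTINCT FilenameId FROM File WHERE JobId = ?", (job_id,))]
    db.execute("DELETE FROM File WHERE JobId = ?", (job_id,))
    db.executemany("""DELETE FROM Filename WHERE FilenameId = ?
                      AND NOT EXISTS (SELECT 1 FROM File
                                      WHERE File.FilenameId = Filename.FilenameId)""",
                   [(i,) for i in ids])

def orphan_count(db):
    return db.execute("""SELECT COUNT(*) FROM Filename f
                         WHERE NOT EXISTS (SELECT 1 FROM File
                                           WHERE File.FilenameId = f.FilenameId)""").fetchone()[0]

def demo(prune):
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    record_file(db, 1, "passwd")
    record_file(db, 1, "tmp.Xq81zP")   # mktemp-style name, backed up only once
    record_file(db, 2, "passwd")
    prune(db, 1)
    return orphan_count(db)

print(demo(prune_job_naive))    # 1: tmp.Xq81zP is now orphaned
print(demo(prune_job_checked))  # 0
```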

BTW, dbcheck is slightly unsafe since it runs without transactions
today: a filename entry can be found orphaned by the SELECT, but then
be reused by a backup job inserting data before the DELETE runs.  this
means you should not run dbcheck while backups are running (or are
likely to insert attributes).
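The race can be reproduced deterministically in a sketch by simulating the interleaving in one process. This again uses SQLite via Python's `sqlite3` with a trimmed, hypothetical schema; it does not reproduce dbcheck's actual queries, only the SELECT-then-DELETE shape described above. The second variant re-tests the orphan condition inside the DELETE itself. That narrows the window but, depending on the database's isolation level and locking, may not close it completely (a backup could look up the name just before the DELETE and insert its File row just after), so the advice not to run dbcheck during backups still holds.

```python
import sqlite3

ORPHANS = """SELECT FilenameId FROM Filename f
             WHERE NOT EXISTS (SELECT 1 FROM File WHERE File.FilenameId = f.FilenameId)"""

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER, FilenameId INTEGER);
        INSERT INTO Filename (FilenameId, Name) VALUES (7, 'core');  -- currently orphaned
    """)
    return db

def backup_reuses_filename(db):
    # A concurrent backup job picks up the "orphaned" row and references it.
    db.execute("INSERT INTO File (JobId, FilenameId) VALUES (42, 7)")

def dangling_files(db):
    # File rows whose Filename row no longer exists.
    return db.execute("""SELECT COUNT(*) FROM File
                         WHERE FilenameId NOT IN (SELECT FilenameId FROM Filename)""").fetchone()[0]

def racy_cleanup(db):
    ids = [row[0] for row in db.execute(ORPHANS)]
    backup_reuses_filename(db)            # simulated interleaving between the two steps
    db.executemany("DELETE FROM Filename WHERE FilenameId = ?", [(i,) for i in ids])
    db.commit()

def rechecking_cleanup(db):
    ids = [row[0] for row in db.execute(ORPHANS)]
    backup_reuses_filename(db)            # same interleaving
    # Re-test the orphan condition at DELETE time, so a row that gained a
    # reference in the meantime is kept.
    db.executemany("""DELETE FROM Filename WHERE FilenameId = ?
                      AND NOT EXISTS (SELECT 1 FROM File
                                      WHERE File.FilenameId = Filename.FilenameId)""",
                   [(i,) for i in ids])
    db.commit()

racy = make_db()
racy_cleanup(racy)
safer = make_db()
rechecking_cleanup(safer)
print(dangling_files(racy), dangling_files(safer))  # 1 0
```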

> The same thing can happen with path.

as a datapoint, these are today's new orphans:

Found 311 orphaned Path records. (out of 6,363,370 rows)
Found 28311 orphaned Filename records. (out of 28,833,369 rows)

> The canonical way to solve this would be to keep a reference counter in
> the filename/path tables that keeps track of how many file entries
> reference that row.  When it hits 0, it can be deleted.  But this
> creates other issues:
> 1) What is the overhead of maintaining the reference counter?

it's more I/O of course, since you need to do an UPDATE on two tables
in addition to the INSERT into File.  storage for the column itself is
rather modest compared to the space used for the text string holding
the name or path itself.  it can be done with just one SQL statement
per table, something like:

  UPDATE Filename SET RefCount = RefCount + 1
    WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = N);

new rows need to be added with RefCount = 0 if this UPDATE is to
be done at the end of the job.  in a way, this means Bacula gets its
own "transaction" support: rows added by a backup which crashes
halfway through never get a non-zero RefCount, so they will be
cleaned away.
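The whole counter lifecycle can be sketched end to end. This is a minimal illustration, assuming SQLite via Python's `sqlite3`, a hypothetical `RefCount` column on a trimmed Filename table, and hypothetical helpers (`finish_job`, `prune_job`, `discard_job`, `sweep`). Because the UPDATE uses `IN (SELECT ...)`, each name is bumped once per job rather than once per File row, so the counter here counts referencing jobs, and pruning must decrement with the same shape of statement. A crashed job never reaches `finish_job`, so names it introduced stay at 0 and are swept.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE NOT NULL,
                       RefCount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER NOT NULL,
                   FilenameId INTEGER NOT NULL);
"""

def add_file(db, job_id, name):
    # New names start at RefCount = 0; nothing is counted until the job finishes.
    db.execute("INSERT OR IGNORE INTO Filename (Name) VALUES (?)", (name,))
    (fid,) = db.execute("SELECT FilenameId FROM Filename WHERE Name = ?", (name,)).fetchone()
    db.execute("INSERT INTO File (JobId, FilenameId) VALUES (?, ?)", (job_id, fid))

def finish_job(db, job_id):
    db.execute("""UPDATE Filename SET RefCount = RefCount + 1
                  WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = ?)""", (job_id,))
    db.commit()

def prune_job(db, job_id):
    db.execute("""UPDATE Filename SET RefCount = RefCount - 1
                  WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = ?)""", (job_id,))
    db.execute("DELETE FROM File WHERE JobId = ?", (job_id,))
    db.commit()

def discard_job(db, job_id):
    # A job that never reached finish_job(): drop its File rows, no decrement.
    db.execute("DELETE FROM File WHERE JobId = ?", (job_id,))
    db.commit()

def sweep(db):
    # Only safe while no backup is running: a running job's new names also
    # sit at RefCount = 0 until it finishes.
    db.execute("DELETE FROM Filename WHERE RefCount = 0")
    db.commit()

def names(db):
    return sorted(r[0] for r in db.execute("SELECT Name FROM Filename"))

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_file(db, 1, "passwd")
add_file(db, 1, "tmp.a1")
finish_job(db, 1)
add_file(db, 2, "passwd")
finish_job(db, 2)            # passwd is now referenced by two jobs
add_file(db, 3, "passwd")
add_file(db, 3, "tmp.b2")    # job 3 crashes before finish_job()
discard_job(db, 3)
prune_job(db, 1)
sweep(db)
print(names(db))             # ['passwd']
```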

> 2) In the case of a crash, we _must_ fix all reference counters
>    immediately, otherwise records could be deleted that still need
>    to be used.

well, that depends on how you want Bacula to behave.  I'm not
convinced it's useful to support "partial" backups.
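If counters ever are in doubt after a crash, they can be recomputed from the File table rather than trusted. The sketch below assumes SQLite via Python's `sqlite3` and a hypothetical `Job.Finished` flag standing in for however the catalog records that a job completed; it counts only finished jobs, so names referenced solely by a partial backup end up at 0, matching the view that partial backups need not be kept. A full rebuild touches every File row, so its cost grows with the size of the File table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Job (JobId INTEGER PRIMARY KEY, Finished INTEGER NOT NULL);  -- hypothetical flag
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                       RefCount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER NOT NULL,
                   FilenameId INTEGER NOT NULL);
INSERT INTO Job VALUES (1, 1), (2, 1), (3, 0);      -- job 3 crashed mid-backup
INSERT INTO Filename (FilenameId, Name, RefCount) VALUES
    (1, 'passwd', 7),    -- counters left in an arbitrary state by the crash
    (2, 'hosts', 0),
    (3, 'tmp.b2', 1);
INSERT INTO File (JobId, FilenameId) VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 3);
""")

def rebuild_refcounts(db):
    """Set every counter to the number of finished jobs that reference the name."""
    db.execute("""
        UPDATE Filename SET RefCount = (
            SELECT COUNT(DISTINCT File.JobId)
            FROM File JOIN Job ON Job.JobId = File.JobId
            WHERE File.FilenameId = Filename.FilenameId AND Job.Finished = 1)""")
    db.commit()

rebuild_refcounts(db)
rows = db.execute("SELECT Name, RefCount FROM Filename ORDER BY FilenameId").fetchall()
print(rows)   # [('passwd', 2), ('hosts', 1), ('tmp.b2', 0)]
```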

> Referential integrity doesn't even gain us much here, as that only
> guarantees that records can't be created that don't have proper
> references; it doesn't automatically clean up after deletes.

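indeed, the FK reference would be in the other direction: File
references Filename and Path, so a constraint can stop File rows from
pointing at missing names, but it can't notice a name nobody points at.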
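A short SQLite sketch (via Python's `sqlite3`, trimmed hypothetical schema) shows both halves of that: a foreign key from File to Filename rejects a File row pointing at a nonexistent name, yet pruning a job's File rows is allowed and leaves the Filename row in place. Cascading deletes would not help either, since they run from parent to child (deleting a Filename row would delete its File rows), which is the opposite of what is wanted here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
db.executescript("""
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER NOT NULL,
                   FilenameId INTEGER NOT NULL REFERENCES Filename (FilenameId));
INSERT INTO Filename VALUES (1, 'tmp.Xq81zP');
INSERT INTO File (JobId, FilenameId) VALUES (5, 1);
""")

# The constraint stops a File row from pointing at a missing Filename ...
try:
    db.execute("INSERT INTO File (JobId, FilenameId) VALUES (5, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# ... but pruning the job is allowed and simply leaves the Filename row behind.
db.execute("DELETE FROM File WHERE JobId = 5")
db.commit()
left_over = db.execute("SELECT COUNT(*) FROM Filename").fetchone()[0]
print(rejected, left_over)   # True 1
```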

> Transactions, if properly implemented, can guarantee that the
> reference counters will _always_ be correct, even in the event of a
> crash, but what does that mean for platforms that don't have
> transaction support or have it turned off?

people who are worried about data integrity will use a system
supporting transactions.
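On a database with transactions, the File inserts, any new Filename rows and the counter UPDATE can be committed as one unit, so a crash leaves the counters exactly as they were. The sketch below assumes SQLite via Python's `sqlite3`, whose connection context manager commits on success and rolls back when an exception escapes; `run_job` and `SimulatedCrash` are hypothetical names. It ignores the practical cost of holding a single transaction open for the length of a long backup (locks, log space).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE NOT NULL,
                       RefCount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER NOT NULL,
                   FilenameId INTEGER NOT NULL);
""")

class SimulatedCrash(Exception):
    pass

def run_job(db, job_id, names, crash_after=None):
    # `with db:` commits if the block completes and rolls back if an exception
    # escapes, so rows and counter updates land together or not at all.
    with db:
        for i, name in enumerate(names):
            db.execute("INSERT OR IGNORE INTO Filename (Name) VALUES (?)", (name,))
            (fid,) = db.execute("SELECT FilenameId FROM Filename WHERE Name = ?",
                                (name,)).fetchone()
            db.execute("INSERT INTO File (JobId, FilenameId) VALUES (?, ?)", (job_id, fid))
            if i == crash_after:
                raise SimulatedCrash(f"job {job_id} died after {name}")
        db.execute("""UPDATE Filename SET RefCount = RefCount + 1
                      WHERE FilenameId IN (SELECT FilenameId FROM File WHERE JobId = ?)""",
                   (job_id,))

run_job(db, 1, ["passwd", "hosts"])
try:
    run_job(db, 2, ["passwd", "tmp.b2"], crash_after=1)
except SimulatedCrash:
    pass

rows = db.execute("SELECT Name, RefCount FROM Filename ORDER BY FilenameId").fetchall()
print(rows)   # [('passwd', 1), ('hosts', 1)]
```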
Kjetil T. Homme
Linpro AS

Bacula-devel mailing list
