
Re: [Bacula-devel] small dbcheck patch for mysql


In response to Martin Simmons <martin@xxxxxxxxxxxxx>:

> >>>>> On Tue, 9 Sep 2008 11:40:02 -0400, Bill Moran said:
> > 
> > In response to Bob Hetzel <beh@xxxxxxxx>:
> > 
> > > Sorry to jump in late here... but if both of the following hold:
> > > a) dbcheck takes a long time without the index
> > > b) the index builds quickly, and with it dbcheck runs quickly
> > > 
> > > then I would have to agree with Kern's earlier suggestion that perhaps 
> > > somebody should look at the SQL code in dbcheck to optimize it.  This 
> > > also has the added benefit of not adding more db-specific code, 
> > > especially given the course MySQL Inc. is taking.
> > > 
> > > Aren't there any SQL gurus interested in diving into it?
> > 
> > After spending about 5 minutes looking at the code, I can tell you that
> > (from an SQL standpoint) dbcheck is enormously inefficient, and will
> > not get much better without a huge rewrite.  Additionally, doing that
> > rewrite so that it results in improvements to all SQL platforms is
> > probably going to require that the code be broken into SQL-dependent
> > and SQL-independent sections.
> > 
> > For example, the process that deletes orphaned path records first does
> > a SELECT to get a list of IDs, then loops over that list, deleting each
> > record individually.  If you wanted to speed this up in PostgreSQL,
> > you could do the following:
> > 1) Wrap all the DELETE statements in a transaction
> > 2) Make 1 big DELETE statement: DELETE FROM Path WHERE PathId IN
> >    ([list of IDs here])
> > 3) Make fewer DELETE statements (much like #2, but instead of doing
> >    all IDs in one IN list, do, say, 1000 at a clip)
> > 4) A combination of 1, 2, and 3.
> > 5) Combine the original select with the delete in a subselect.
> > 
> > Any one of these is going to improve performance.  My guess would be
> > that #5 will be the fastest on PG, but I can't be sure until I actually
> > test it.
> 
> You are right in theory, but there is no current evidence that the DELETEs
> are significantly slow.

Not really.  I have done lots of experiments, and can say that it's
_always_ faster to do bulk actions like this inside a transaction in
PostgreSQL.  It's just a fact of how PostgreSQL is designed.
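
To make options 1-3 concrete, the per-row DELETE loop could be replaced
with something along these lines.  This is a rough, untested sketch;
the literal ids are placeholders for the batches the earlier SELECT
would collect:

    BEGIN;
    -- One DELETE per batch of up to ~1000 orphaned PathIds, all
    -- inside a single transaction (placeholder ids shown).
    DELETE FROM Path WHERE PathId IN (101, 102, 103);
    DELETE FROM Path WHERE PathId IN (2042, 2043, 2044);
    -- ...one DELETE per remaining batch...
    COMMIT;

Even if nothing else changes, wrapping the existing loop in
BEGIN/COMMIT saves a commit (and its WAL flush) for every deleted row.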

... and running dbcheck on one of my systems now, it's taken somewhere
around 30 minutes just to do the delete portion for 336046 orphaned
records.

> For a start, Path.PathId is a primary key and the
> suggested new index on File.PathId must have sped up the SELECT, not the
> DELETEs.

While you bring up a valid point about the select possibly being a
larger problem than the delete, it doesn't change my overall premise:
that tweaking the SQL to make it faster on one platform might make it
slower on another.
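
That said, to make option #5 concrete: on PG it might look something
like the sketch below.  This is untested, the index name is made up,
and I'm guessing at the exact form dbcheck would need:

    -- The suggested index on File.PathId (hypothetical name):
    CREATE INDEX file_pathid_idx ON File (PathId);

    -- Option #5: fold the SELECT into the DELETE, so one statement
    -- removes every Path row that no File row references.
    DELETE FROM Path
     WHERE NOT EXISTS (SELECT 1
                         FROM File
                        WHERE File.PathId = Path.PathId);

Whether MySQL's optimizer handles that subselect as efficiently is
exactly the kind of platform difference I mean.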

> Secondly, unless the Path table is very much out of date relative to
> the File table, you would not expect to find a huge number of orphaned path
> records.

Define "huge"?  is 336046 huge?  I don't consider it to be all that
large when you consider a path table of almost 1 million, a file
table of over 43 million and a filename table of over 11 million.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@xxxxxxxxxxxxxxxxxxxxxxx
Phone: 412-422-3463x4023

_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

