
[Bacula-devel] Alternative DB structures - Proposal

Ok I've got a better handle on this.
First, a side issue that anyone dealing with PG needs to know about.

PG uses MVCC. This allows many transactions to be in flight at once, and readers
never block writers (nor writers readers). So it has great concurrency.

But there is a drawback. Edit a row, and we get another version of that row.
Every index on that table has to be updated too, even if the field you are
updating is not indexed (!!!!!!!!!)

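PG 8.3 has an optimisation, Heap-Only Tuples (HOT), which skips the index
updates when no indexed column changes and the new row version fits on the
same page. That only helps when the number of updates is much smaller than the
number of rows.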
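A toy Python model of the behaviour described above may make it concrete. It is not PostgreSQL code: `Table`, `use_hot`, `page_has_room` and the index set are hypothetical simplifications. It only counts how many row versions and index entries a single update produces, with and without HOT.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy MVCC heap plus indexes; a conceptual model, not PostgreSQL code."""
    indexed_cols: tuple
    use_hot: bool = False         # False behaves like PG before 8.3
    page_has_room: bool = True    # HOT also needs free space on the same page
    versions: list = field(default_factory=list)  # every row version written
    index_entries: int = 0        # total entries across all indexes

    def insert(self, row: dict) -> int:
        self.versions.append(dict(row))
        self.index_entries += len(self.indexed_cols)  # one entry per index
        return len(self.versions) - 1

    def update(self, tid: int, **changes) -> int:
        # An update never overwrites in place: it appends a new version,
        # and the old one stays until VACUUM reclaims it.
        self.versions.append({**self.versions[tid], **changes})
        touches_index = any(col in self.indexed_cols for col in changes)
        is_hot = self.use_hot and self.page_has_room and not touches_index
        if not is_hot:
            # Non-HOT update: every index gets a new entry pointing at the
            # new version, even if only a non-indexed column changed.
            self.index_entries += len(self.indexed_cols)
        return len(self.versions) - 1


cols = ("fileid", "jobid", "pathid")   # hypothetical index set
row = {"fileid": 1, "jobid": 1, "pathid": 1, "md5": ""}

pre83 = Table(cols)
pre83.update(pre83.insert(row), md5="abc")
print(len(pre83.versions), pre83.index_entries)   # 2 6

pg83 = Table(cols, use_hot=True)
pg83.update(pg83.insert(row), md5="abc")
print(len(pg83.versions), pg83.index_entries)     # 2 3
```

In this model, a mass update of every row soon leaves no free space on the pages, so `page_has_room` becomes false and even the HOT path falls back to adding index entries; that is roughly the "updates much smaller than rows" condition above.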

Fortunately Bacula doesn't do that, but you might while developing.

I proposed two additional tables for general attributes.
However, when I think through the use cases, the only fields of interest
are size, ctime and mtime, and we can add these to the table 'file'.

ALTER TABLE file ADD COLUMN size bigint DEFAULT 0;
ALTER TABLE file ADD COLUMN ctime timestamp without time zone;
ALTER TABLE file ADD COLUMN mtime timestamp without time zone;

There is no need to remove lstat so compatibility is assured.
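A minimal sketch of the compatibility claim, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The `file` table here is a hypothetical, heavily simplified version of Bacula's, and SQLite has no `timestamp without time zone` type, so plain `timestamp` is used. It shows that a query that only knows about `lstat` is unaffected, while existing rows pick up the default size and NULL timestamps.

```python
import sqlite3


def add_attribute_columns(conn: sqlite3.Connection) -> None:
    """Apply the proposed ALTERs (SQLite stand-in for the PostgreSQL ones)."""
    conn.execute("ALTER TABLE file ADD COLUMN size bigint DEFAULT 0")
    conn.execute("ALTER TABLE file ADD COLUMN ctime timestamp")
    conn.execute("ALTER TABLE file ADD COLUMN mtime timestamp")


conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema; the real Bacula 'file' table has more columns.
conn.execute("CREATE TABLE file (fileid integer PRIMARY KEY, jobid integer, lstat text)")
conn.execute("INSERT INTO file (jobid, lstat) VALUES (1, 'placeholder-lstat')")
add_attribute_columns(conn)

# An old-style query that only reads lstat works unchanged.
old_query = conn.execute("SELECT jobid, lstat FROM file").fetchall()
# Rows that existed before the ALTER get size 0 and NULL timestamps.
new_cols = conn.execute("SELECT size, ctime, mtime FROM file").fetchall()
print(old_query)  # [(1, 'placeholder-lstat')]
print(new_cols)   # [(0, None, None)]
```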

The speed impact on database inserts is too small to notice.
The cost of collecting these fields in the FD (File Daemon) is trivial, since
the files are already stat'd.
The space impact is minor.
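A back-of-the-envelope check of the space claim, assuming PostgreSQL's 8-byte storage for both `bigint` and `timestamp`. The row count is the 2.7M-row file table from the quoted test further down; the result is a rough estimate of raw column data only, not a measurement, and ignores alignment padding, page overhead and the fact that NULL timestamps take no data space.

```python
# Raw per-row payload of the proposed columns: bigint + 2 x timestamp,
# each assumed to be 8 bytes as in PostgreSQL.
BYTES_PER_ROW = 8 + 8 + 8


def added_bytes(rows: int) -> int:
    """Rough estimate of extra column data for a given number of rows."""
    return rows * BYTES_PER_ROW


rows = 2_700_000  # size of the file table in the quoted test
extra = added_bytes(rows)
print(extra)                     # 64800000
print(round(extra / 2**20, 1))   # 61.8  (MiB)
```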

Benefits: it's now possible to estimate the size of a restore and to find
versions of files by mtime.
Ad hoc queries also become more useful.
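A sketch of the two queries those columns enable, again with sqlite3 standing in for PostgreSQL. The schema and data are hypothetical: `name` replaces Bacula's path/filename joins, and `estimate_restore_bytes` and `versions_modified_before` are illustrative helper names. SQLite compares the ISO-formatted mtime strings lexically, which orders them correctly; PostgreSQL would compare real timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: 'name' stands in for Bacula's path/filename join.
conn.execute("""CREATE TABLE file (
    fileid integer PRIMARY KEY, jobid integer, name text,
    size bigint DEFAULT 0, mtime timestamp)""")
rows = [
    (1, "/etc/hosts", 200, "2008-09-01 10:00:00"),
    (2, "/etc/hosts", 250, "2008-09-15 12:00:00"),
    (2, "/var/log/big.log", 1_000_000, "2008-09-15 12:30:00"),
]
conn.executemany("INSERT INTO file (jobid, name, size, mtime) VALUES (?, ?, ?, ?)", rows)


def estimate_restore_bytes(conn, jobid):
    """Total bytes a restore of one job would write (0 if the job is unknown)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM file WHERE jobid = ?", (jobid,)
    ).fetchone()
    return total


def versions_modified_before(conn, name, cutoff):
    """Backed-up versions of a file with mtime before the cutoff, newest first."""
    return conn.execute(
        "SELECT jobid, mtime FROM file WHERE name = ? AND mtime < ? ORDER BY mtime DESC",
        (name, cutoff),
    ).fetchall()


print(estimate_restore_bytes(conn, 2))                              # 1000250
print(versions_modified_before(conn, "/etc/hosts", "2008-09-10"))   # [(1, '2008-09-01 10:00:00')]
```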

I propose that this DB change be made with the next major release of Bacula.
Code updates to use it can then be done at leisure without user-visible 


On Mon, 22 Sep 2008 08:43:48 John Huttley wrote:
> I've been running some tests and I have some postgresql performance
> problems.
> On the file table with 2.7M records in it, updating a non-indexed text
> field is taking 570sec with a huge IO load. I can do a complete db load in
> only 94sec!
> It may take me a while to figure this out. I'll report back when I have
> some believable figures.
> --john
> John Huttley wrote:
> > Kern, I think you are on the wrong foot here. You mention 100G as if
> > it has some significance. It doesn't.
> > You can be sure that if the DB is size N then they will have 5xN
> > storage available. The % change and speed impact are likely
> > to be the same for any N where N > the system ram.
> >
> >
> >
> > We don't actually know what the space change will be or the speed
> > impact on inserts.
> >
> > I'll run some tests tonight and see if I can get some numbers.
> >
> > --John
> >
> > Kern Sibbald wrote:
> >> On Saturday 20 September 2008 13:40:29 Yuri Timofeev wrote:
> >>
> >> <Snip>
> >>
> >>> But for me it is obvious that we should remove LStat and make multiple
> >>> fields (FileSize, atime, mtime, etc) instead of LStat.
> >>
> >> That might be nice for people who want to access the database directly,
> >> but for someone who has a 100GB Bacula database, it would be rather
> >> catastrophic.
> >>
> >> Concerning the possibility of normalizing some of the subfields of the
> >> LStat record -- that is a possibility, but someone other than myself
> >> would need to do some careful testing on the size (should shrink the DB)
> >> and the performance of the DB particularly on multi-GB database.

Bacula-devel mailing list

This mailing list archive is a service of Copilot Consulting.