
Re: [Bacula-devel] Alternative DB structures-Proposal


On Wednesday 24 September 2008 22:56:17 John Huttley wrote:
> Ok I've got a better handle on this.
> First, a side issue that anyone dealing with PG needs to know about.
>
> PG uses MVCC. This allows many transactions in flight, and readers never
> block writers, so it has great concurrency.
>
> But there is a drawback. Edit a row, and we get another version of that
> row. Any indexes on that table have to be updated, even if the field you are
> updating is not indexed (!!!!!!!!!)
>
> PG 8.3 has an optimisation (Heap Only Tuples), but that only helps when the
> number of updates << number of rows.
>
> Fortunately, Bacula doesn't do that, but you might while developing.
>
> I proposed two additional tables for general attributes.
> However, when I think of use cases, the only fields of interest
> are size, ctime and mtime, and we can add these to the table 'file'.
>
> ALTER TABLE file ADD COLUMN size bigint DEFAULT 0;
> ALTER TABLE file ADD COLUMN ctime timestamp without time zone;
> ALTER TABLE file ADD COLUMN mtime timestamp without time zone;
>
> There is no need to remove lstat so compatibility is assured.
>
> Speed impact on database inserts is too small to notice.

Can you show the timing difference from some real tests, say on a 10GB
database with an insert of 10 million file records in a Job?

> Speed impact on retrieving the fields from the FD: trivial, since the files
> are already stat'd.

The fields are already retrieved from the FD, so the only difference would be
in extracting them from the LStat packet before the insert. The time for
this should be included in the above time comparison.
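A rough sketch of that extraction step, in Python for illustration only. It
assumes LStat is a space-separated list of integers, each written in Bacula's
own base64 digit encoding (six bits per character, most significant first,
optional leading '-'), with size, atime, mtime and ctime in positions 8, 11,
12 and 13. That field order is an assumption to check against the Bacula
source, and the helper names are hypothetical.

```python
# Hypothetical helpers for pulling size/mtime/ctime out of an LStat string.
# ASSUMPTION: fields are space-separated integers in Bacula's base64 digit
# encoding, in the order listed in LSTAT_FIELDS (verify against the source).
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}

LSTAT_FIELDS = ["dev", "ino", "mode", "nlink", "uid", "gid", "rdev",
                "size", "blksize", "blocks", "atime", "mtime", "ctime"]


def decode_b64_int(token: str) -> int:
    """Decode one LStat field: 6 bits per character, optional leading '-'."""
    negative = token.startswith("-")
    if negative:
        token = token[1:]
    value = 0
    for ch in token:
        value = (value << 6) + B64_VALUE[ch]
    return -value if negative else value


def decode_lstat(lstat: str) -> dict:
    """Map the first 13 LStat fields to names; trailing fields are ignored."""
    tokens = lstat.split()
    return {name: decode_b64_int(tok) for name, tok in zip(LSTAT_FIELDS, tokens)}


if __name__ == "__main__":
    # Synthetic LStat where field i is the single digit with value i.
    sample = "A B C D E F G H I J K L M"
    attrs = decode_lstat(sample)
    print(attrs["size"], attrs["mtime"], attrs["ctime"])  # 7 11 12
```

The per-record work is one split() plus a short loop per field, which is the
cost that would need to be added to the insert timings requested above.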

> Space impact is minor.

What would the *exact* change in database size be for a 100GB database 
containing 600,000,000 file records?  (this is a real database, not one that 
I imagined).
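A back-of-envelope lower bound for that figure, assuming PostgreSQL stores
bigint and timestamp in 8 bytes each. Alignment padding, null bitmaps, index
growth and dead row versions are not counted, so this is not the exact number
asked for, only a floor under it.

```python
# Lower bound on raw bytes added by the three proposed columns.
# ASSUMPTION: bigint and timestamp each take 8 bytes in PostgreSQL; padding,
# null bitmaps, indexes and dead (MVCC) row versions are ignored.
COLUMN_BYTES = {"size": 8, "ctime": 8, "mtime": 8}


def added_bytes(rows: int) -> int:
    """Raw column payload added across `rows` File records."""
    return rows * sum(COLUMN_BYTES.values())


if __name__ == "__main__":
    rows = 600_000_000      # File records in the database described above
    db_size = 100e9         # 100 GB database, in (decimal) bytes
    extra = added_bytes(rows)
    print(f"{extra:,} bytes = {extra / 1e9:.1f} GB")        # 14,400,000,000 bytes = 14.4 GB
    print(f"{100 * extra / db_size:.1f}% of the database")  # 14.4% of the database
```

Since the added bytes grow linearly with the row count, this is consistent
with John's point, quoted further down, that the percentage change should not
depend much on the database size N.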

>
> Benefits: it's now possible to estimate a restore and possible to find
> versions of files based on mtime.
> Ad hoc queries are now more useful.

The above mentioned things are already possible, so can you be more specific 
about the benefits, in particular to the current use of Bacula as it stands.  
Lots of external programs already access these fields, so aside from making 
it just slightly easier for them, I am not able to identify any real 
benefits.  Can you fill me in here?
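For concreteness, the two queries John mentions might look like this once size
and mtime are plain columns. This sketch uses Python's built-in sqlite3 module
as a stand-in for PostgreSQL, and the single flattened file table (jobid,
filename, size, mtime as Unix seconds) is a simplifying assumption, not
Bacula's real schema.

```python
import sqlite3

# Stand-in schema (ASSUMPTION): a simplified File table carrying the proposed
# size and mtime columns; Bacula's real tables are laid out differently.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE file (
    fileid INTEGER PRIMARY KEY, jobid INTEGER, filename TEXT,
    size INTEGER DEFAULT 0, mtime INTEGER)""")
conn.executemany(
    "INSERT INTO file (jobid, filename, size, mtime) VALUES (?, ?, ?, ?)",
    [(1, "report.txt", 1000, 100), (1, "data.bin", 5000, 200),
     (2, "report.txt", 1200, 300), (3, "report.txt", 1500, 400)])


def estimate_restore_bytes(jobid: int) -> int:
    """Total bytes a full restore of one job would write."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(size), 0) FROM file WHERE jobid = ?",
        (jobid,)).fetchone()
    return total


def versions_modified_before(filename: str, mtime: int) -> list:
    """Backed-up versions of a file whose mtime is earlier than `mtime`."""
    return conn.execute(
        "SELECT jobid, size, mtime FROM file "
        "WHERE filename = ? AND mtime < ? ORDER BY mtime",
        (filename, mtime)).fetchall()


print(estimate_restore_bytes(1))                    # 6000
print(versions_modified_before("report.txt", 350))  # [(1, 1000, 100), (2, 1200, 300)]
```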

>
> Proposal.
> That this DB change be done with the next major release of Bacula.

This is possible, but I would like to see the results of some real tests, 
noted above, of what these changes would do. 

> Code updates to use it can then be done at leisure without user-visible
> changes.

Agreed,

Another way of doing this would be for those who need these features to modify 
the Bacula tables (or create new tables), and to run a nightly job that 
inserts the new data.  If I am not mistaken, this is how brestore works its 
magic -- though, I believe they have not modified any tables, but rather 
added information in new tables they have created.
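A minimal sketch of such a nightly job, assuming a separate side table (here
called file_attr, a hypothetical name) keyed by FileId, so that Bacula's own
File table is never modified. It registers a small LStat decoder as an SQL
function and fills only rows not yet processed, using INSERT rather than
UPDATE, which avoids creating a new version of every File row (the MVCC cost
John describes above). It again uses sqlite3 as a stand-in for PostgreSQL, and
the LStat layout (Bacula base64 digits, size at index 7, mtime at index 11) is
an assumption.

```python
import sqlite3

# ASSUMPTION: LStat fields are Bacula base64 integers; index 7 is st_size,
# index 11 is st_mtime (verify against the Bacula source).
B64_VALUE = {c: i for i, c in enumerate(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")}


def lstat_field(lstat: str, index: int) -> int:
    """Decode one space-separated LStat field (optional leading '-')."""
    token = lstat.split()[index]
    sign = -1 if token.startswith("-") else 1
    value = 0
    for ch in token.lstrip("-"):
        value = (value << 6) + B64_VALUE[ch]
    return sign * value


conn = sqlite3.connect(":memory:")
conn.create_function("lstat_field", 2, lstat_field)
conn.executescript("""
    CREATE TABLE file (fileid INTEGER PRIMARY KEY, lstat TEXT);
    -- Hypothetical side table; the File table itself is left untouched.
    CREATE TABLE file_attr (fileid INTEGER PRIMARY KEY,
                            size INTEGER, mtime INTEGER);
""")
conn.executemany("INSERT INTO file (lstat) VALUES (?)",
                 [("A B C D E F G BA I J K L M",),
                  ("A B C D E F G H I J K BB M",)])


def nightly_fill() -> int:
    """Insert attributes for File rows not yet in file_attr; return count."""
    before = conn.total_changes
    conn.execute("""
        INSERT INTO file_attr (fileid, size, mtime)
        SELECT f.fileid, lstat_field(f.lstat, 7), lstat_field(f.lstat, 11)
        FROM file f
        WHERE NOT EXISTS (SELECT 1 FROM file_attr a WHERE a.fileid = f.fileid)
    """)
    conn.commit()
    return conn.total_changes - before


print(nightly_fill())  # 2: both File rows processed
print(nightly_fill())  # 0: nothing new since the last run
print(conn.execute("SELECT * FROM file_attr ORDER BY fileid").fetchall())
```

Because each run only touches rows missing from the side table, it can be
scheduled after the backup jobs without rewriting anything Bacula depends on.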

Kern

>
> --John
>
> On Mon, 22 Sep 2008 08:43:48 John Huttley wrote:
> > I've been running some tests and I have some PostgreSQL performance
> > problems.
> > On the file table with 2.7M records in it, updating a non-indexed text
> > field is taking 570 sec with a huge IO load. I can do a complete DB load
> > in only 94 sec!
> >
> >
> > It may take me a while to figure this out. I'll report back when I have
> > some believable figures.
> >
> >
> > --john
> >
> > John Huttley wrote:
> > > Kern, I think you are on the wrong track here. You mention 100G as if
> > > it has some significance. It doesn't.
> > > You can be sure that if the DB is size N then they will have 5xN
> > > storage available. The % change and speed impact is likely
> > > to be the same for any N where N > the system ram.
> > >
> > >
> > >
> > > We don't actually know what the space change will be or the speed
> > > impact on inserts.
> > >
> > > I'll run some tests tonight and see if I can get some numbers.
> > >
> > > --John
> > >
> > > Kern Sibbald wrote:
> > >> On Saturday 20 September 2008 13:40:29 Yuri Timofeev wrote:
> > >>
> > >> <Snip>
> > >>
> > >>> But for me it is obvious that we should remove LStat and make
> > >>> multiple fields (FileSize, atime, mtime, etc.) instead of LStat.
> > >>
> > >> That might be nice for people who want to access the database
> > >> directly, but for someone who has a 100GB Bacula database, it would be
> > >> rather catastrophic.
> > >>
> > >> Concerning the possibility of normalizing some of the subfields of the
> > >> LStat record -- that is a possibility, but someone other than myself
> > >> would need to do some careful testing on the size (should shrink the
> > >> DB) and the performance of the DB particularly on multi-GB database.
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge Build the coolest Linux based applications with Moblin SDK & win
> great prizes Grand prize is a trip for two to an Open Source event anywhere
> in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Bacula-devel mailing list
> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/bacula-devel


