
Re: [Bacula-devel] Tokyo Cabinet DBM candidate for the accurate project


On Friday 28 March 2008 08:53:52 Eric Bollengier wrote:
> Hello,
>
> On Friday 28 March 2008 00:16:11 Kern Sibbald wrote:
> > I ran a small test of Tokyo Cabinet DBM compared to our htable routines.
> >
> > Inputting 5 million records to htable and then reading them all back
> > takes 7.7 seconds, and uses up 240MB .
> >
> > Inputting 5 million records to TCDBM (using the same records as above)
> > and reading them back takes 1 minute 33 seconds.
>
> In our case, 1 or 2 minutes in a 5 million file backup is not such a
> big cost :) But during this time, the Director has to lock the db
> connection... so many things can be frozen.

If that becomes a problem, we can simply write the data to a file, then reread 
the file and send it.  That will avoid waiting for the FD to do its indexing.  
The other alternative would be to spool the data in the FD.  For the moment, 
I would say to ignore this problem.

>
> Have you run this test with valid filenames?

No, it was run with dummy data.

>
> > Using 1 million records, it runs in 8.8 seconds.
> >
> > So, it is a bit slower at a million records, and quite a bit slower at 5
> > million, but that can probably be tuned.  In those tests, I did tune it
> > to use something like 40MB of memory.  In addition, it mallocs and frees
> > each record returned.  He has calls to allow the records to be returned
> > in our own buffers, so this would probably reduce the time a lot.
>
> Yes, we can also probably skip some bucket re-allocation if we know in
> advance how many files we have.

Yes, he has some configuration and tuning APIs -- I used them but did not 
optimize them.

I'll send you my little test program offline (actually it is his example 
program that I modified to add the same data creation loop as in htable).

I think he has an API that allows you to search the system, but in any case, 
it would only take about a half hour to feed all the filenames from the 
system into either htable or tcdbm for doing some tests.  

I think we have a lot of tests to run before we decide to use this package, 
but the advantage is that if it is relatively fast, we can replace the 
in-memory part of the tree code with the tcdbm.

I think the first thing to do is to use a different call to retrieve the data 
so that tcdbm doesn't malloc (and we don't have to free) all 5 million 
records.  Then it might be worth sending the program to the author and asking 
if he has any suggestions for making it run faster ...

Kern

_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

