[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bacula-devel] Accurate file project hash tables

>>>>> On Tue, 25 Mar 2008 17:55:50 +0100, Kern Sibbald said:
> Hello Eric,
> Background (somewhat simplified):
> The Accurate delete/restore project (item #1) is now implemented (thanks to 
> Eric).  For it to work, the client machine must maintain a temporary table 
> supplied by the Director for each job. This table contains all the files that 
> are in the current backup from the Director's point of view at the beginning 
> of the job.  The table allows the FD to determine what files have been added 
> and/or deleted to/from the system that are not yet saved (or deleted).
> As currently implemented this table is a hash table using the hash class that 
> I wrote 3 or 4 years ago for this particular project. It is fast and 
> efficient.
> Problem:
> Currently the hash table is entirely kept in memory, which means the client 
> uses a rather huge amount of memory, which for systems with 10 million files 
> will probably exceed even the largest amounts of RAM currently used.  We 
> would like to have some solution that allows the number of files to grow to 
> 20 million or more and still use less than 1GB of RAM.
> Recently (the last few days), I implemented what I call BIG_MALLOC for the 
> htable class that allocates memory in 1MB chunks.  This makes the code run 
> several times faster with datasets of the order of 5 million by reducing the 
> number of system malloc requests from 5-6 million to less than 500.  However, 
> it still does not reduce the memory foot print.
> Suggestions for reducing the memory requirements:
> ... Any other ideas?

How about not sending all of the info to the FD at once :-)

I'm thinking of something like "merge" on ordered lists.  If the Director
could generate the file list breadth-first and in alphabetical order (like ls
-R), and the FD could traverse the file tree in the same order, then it would
be easy to spot the missing files as they occur.  This would allow the
Director to send the list to the FD in chunks during the backup, rather than
sending it all at the start of the job.


Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
Bacula-devel mailing list

This mailing list archive is a service of Copilotco.