
[Bacula-devel] Using the OS to get the list of files changed

With incremental and differential backups, the goal is a list of files 
that have changed since a given point in time.   Traditionally, this 
list is obtained by using find (AFAIK).  On filesystems with large 
numbers of files, creating the list can take considerable time.  How can 
we improve the performance of this stage?
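
To make the cost concrete, here is a minimal sketch of the traditional 
approach: walk the whole tree and stat every file.  This is only an 
illustration, not how Bacula itself is implemented; the function name 
changed_since and the choice to compare both mtime and ctime are my 
assumptions.

```python
import os

def changed_since(root, since):
    """Yield paths under root modified after `since` (epoch seconds).

    Every file is visited and stat'ed, so the cost grows with the total
    number of files, not with the number of files that changed.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished while we were walking
            # A content change bumps mtime; a metadata change bumps ctime.
            if max(st.st_mtime, st.st_ctime) > since:
                yield path
```

The work done is the same whether one file changed or a million did, 
which is the part worth improving.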

For some time, I have wanted the filesystem (or OS) to give me a list of 
files changed since time X.  I've just heard Pawel Jakub Dawidek speak 
at NYCBSDCon about ZFS.  I asked him if he thought getting such a list 
was possible.  He suggested it may be.  Strategy: ZFS is able to return 
a file name given a particular block.  That is, if you know a particular 
block has changed, ZFS will tell you what file it belongs to.

When using ZFS snapshots, the system keeps a list of blocks that have 
changed since the last snapshot.  It may be feasible to traverse this 
list of blocks to obtain the list of files required by Bacula.
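
As a toy model of that traversal (not a real ZFS interface): assume we 
are handed the changed block ids and some lookup that maps a block to 
its owning file.  Both block_owner and the sample data below are 
hypothetical stand-ins for whatever the filesystem would actually 
provide.

```python
def files_for_changed_blocks(changed_blocks, block_owner):
    """Map changed block ids to the sorted, de-duplicated owning paths.

    block_owner is a hypothetical callable: block id -> path, or None
    when the block belongs to no file (metadata, freed space, ...).
    """
    files = set()
    for block in changed_blocks:
        path = block_owner(block)
        if path is not None:
            files.add(path)  # many blocks may map to the same file
    return sorted(files)

# Toy data: four changed blocks, two in the same file, one unowned.
owners = {
    101: "/home/user/notes.txt",
    102: "/home/user/notes.txt",
    250: "/var/log/messages",
}
print(files_for_changed_blocks([101, 102, 250, 999], owners.get))
```

The point of the model is that the work is proportional to the number 
of changed blocks rather than the number of files on the filesystem.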

My thoughts:

1 - I'm not proposing to apply this strategy to Bacula internally. 
However, an external script would be very useful.  Bacula already 
provides a mechanism for a FileSet to be specified by an external 
script.  I'm thinking: combine the two approaches to provide faster 
incremental/differential backups.

2 - This strategy may be applicable to other filesystems that provide 
similar mechanisms (list of blocks changed, getting file names for those 
blocks).  It was suggested during the talk that auditing, etc., may 
provide the needed hooks.
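
For point 1, a rough sketch of what the external script could look 
like.  A FileSet can name a program with a leading "|" in a File 
directive and use its output, one path per line, as the file list. 
The script below only handles the output side: it takes candidate paths 
from whatever source produces them (here, stdin) and prints each path 
under the backup root once.  The name emit_file_list and the 
command-line usage are assumptions for illustration, and the path 
handling assumes a POSIX system.

```python
import os
import sys

def emit_file_list(paths, root, out=sys.stdout):
    """Write each changed path under root exactly once, one per line."""
    root = os.path.abspath(root)
    seen = set()
    for p in paths:
        if not p:
            continue  # skip blank input lines
        p = os.path.abspath(p)
        # Drop duplicates and anything outside the backup root.
        if p in seen or os.path.commonpath([root, p]) != root:
            continue
        seen.add(p)
        out.write(p + "\n")

# Hypothetical usage: changed-files.py ROOT < candidate-paths
if __name__ == "__main__" and len(sys.argv) == 2:
    emit_file_list((line.rstrip("\n") for line in sys.stdin), sys.argv[1])
```

Whatever produces the candidate paths (a snapshot comparison, an audit 
log, ...) can then be swapped in without touching the Bacula side.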

I hope this triggers some brainstorming.
