Re: [Bacula-devel] Selective restore when files are pruned [patch]


On Tuesday 19 August 2008 22:22:38 Martin Simmons wrote:
> >>>>> On Fri, 15 Aug 2008 23:32:17 +0200, Kern Sibbald said:
> >
> > On Friday 15 August 2008 23:12:51 Martin Simmons wrote:
> > > >>>>> On Fri, 15 Aug 2008 18:06:31 +0200, Kern Sibbald said:
> > > >
> > > > On Friday 15 August 2008 14:00:12 Kjetil Torgrim Homme wrote:
> > > > > I needed to restore a subset of some old backups.  Restoring the
> > > > > full backups would need a terabyte of temporary storage, which
> > > > > seemed a bit wasteful (and inconvenient to get hold of) since the
> > > > > data I was interested in took less than a gigabyte.
> > > > >
> > > > > Anyway -- I implemented a simple regex to filter the files to
> > > > > restore. It works like this:
> > > > >
> > > > >     Building directory tree for JobId(s) 28644 ...
> > > > >     There were no files inserted into the tree, so file selection
> > > > > >     is not possible.Most likely your retention policy pruned the
> > > > > >     files
> > > > >
> > > > >     Do you want to restore all the files? (yes|no): no
> > > > >
> > > > >     Regexp matching files to restore? (empty to abort): ^/var/log
> > > > >
> > > > > The patch adds a new keyword to the bootstrap file, FilePattern,
> > > > > which the storage daemon will apply to all files before deciding
> > > > > whether to send the file over to the fd.  The fd doesn't need any
> > > > > changes, btw.
> > > > >
> > > > > This is just a quick hack, and there is some polishing left to do:
> > > >
> > > > The code looks pretty clean and in the spirit of the current code ...
> > > > :-)
> > >
> > > IMHO, the read_records function is the very worst place to decode the
> > > records looking for regexps, because it breaks the abstraction
> > > boundaries. Wouldn't it be much cleaner if only the Director and FD
> > > needed to know about the contents of the records?
> >
> > Yes, it would be cleaner, but I don't see how you can do it any other
> > way.  In any case, the bsr code already knows about Job Type, JobId, Job
> > names, and Job Levels, so it isn't too much of an extension to make it
> > know about File names -- though, in this case, the matching is done in
> > the read_record subroutine rather than match_bsr.
> >
> > > How about adding a markregex operation to the existing tree building
> > > phase in the Director (c.f. markdir)?  I know this was an old backup,
> > > so probably pruned from the catalog, but that is a different problem
> > > from the one of selecting files by regex.
> >
> > When the File records are pruned from the catalog, there is no way to get
> > to the filenames, so there is no tree to be built.  At the point this
> > code kicks in the filenames exist only on the Volume.  The only other way
> > to do this would be for the SD to send all the records to the FD and the
> > FD decides which ones to restore.  However, that is extremely inefficient
> > if a user wants one file from a 10GB backup that has been pruned.
>
> True, but the whole 10GB has to be read off the backup medium anyway, so
> sending it to the FD might not be a problem.  If this kind of restore
> happens often, then you clearly have the wrong file retention periods :-)

Well, if you have an LTO3 drive and a 100Mb comm line to the FD, reading the
tape and making the decision in the SD before sending the data to the FD will
probably be hundreds of times faster than sending the full amount of data
(1TB in the original case) to the FD.
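
For a rough sense of scale, here is a back-of-the-envelope sketch.  The
numbers are assumptions, not measurements: 80 MB/s is the LTO-3 native rate,
the 100Mb line is taken as 12.5 MB/s with no protocol overhead, and the whole
Volume is assumed to be read sequentially in both cases.

```python
# Back-of-the-envelope transfer times; every figure here is an assumption.
TAPE_MB_PER_S = 80.0        # LTO-3 native (uncompressed) streaming rate
LINK_MB_PER_S = 100 / 8     # 100 Mb/s link = 12.5 MB/s, overhead ignored

BACKUP_MB = 1_000_000       # about 1 TB stored on the Volume
WANTED_MB = 1_000           # about 1 GB of files actually wanted


def seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s


tape_read = seconds(BACKUP_MB, TAPE_MB_PER_S)    # read the whole Volume
send_all = seconds(BACKUP_MB, LINK_MB_PER_S)     # ship everything to the FD
send_wanted = seconds(WANTED_MB, LINK_MB_PER_S)  # ship only the matches

print(f"read tape:          {tape_read / 3600:.1f} h")
print(f"send everything:    {send_all / 3600:.1f} h")
print(f"send matches only:  {send_wanted:.0f} s")
```

Under these assumptions the amount of data crossing the comm line shrinks by
a factor of 1000, and the link is no longer the bottleneck; the time to read
the tape itself is unchanged.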

>
> > If you have some other way to add the feature, I would like to hear
> > about it.
>
> I would probably use bextract for this, which already contains a partial FD
> supporting wildcards.  

The difference between bextract and the patch in question is that bextract
works by reading the Volume directly and writing the data directly to the
filesystem on which it is running.  There is no comm line in between.

> I think the "partial" status of this FD (and bscan 
> and bls) is the main reason why I didn't like the idea of making yet
> another partial FD in the SD.

I don't understand what you mean by "partial status of this FD (and bscan and
bls)".

I don't see this as having an FD in the SD; all it adds is some extra
filtering via the .bsr file, where there is already quite a lot of filtering
capability.
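
To illustrate the kind of filtering meant here, a minimal sketch follows.  It
is not the patch and not Bacula code: the Record type and filter_records
function are hypothetical names, and the only things taken from this thread
are the idea of matching file names on the reading side and the example
pattern ^/var/log.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a file record read from a Volume."""
    filename: str
    data: bytes


def filter_records(records: Iterable[Record], pattern: str) -> Iterator[Record]:
    """Yield only the records whose file name matches the regex."""
    regex = re.compile(pattern)
    for rec in records:
        # Non-matching records are dropped here, before anything is sent on.
        if regex.search(rec.filename):
            yield rec


# Made-up records standing in for what is read off the Volume.
volume = [
    Record("/etc/passwd", b"..."),
    Record("/var/log/messages", b"..."),
    Record("/home/user/var/log", b"..."),
    Record("/var/log/maillog", b"..."),
]

to_send = [r.filename for r in filter_records(volume, r"^/var/log")]
print(to_send)  # ['/var/log/messages', '/var/log/maillog']
```

Because the pattern is anchored with ^, /home/user/var/log is not selected;
an unanchored pattern would pick it up as well.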

Regards,

Kern
