
Re: [Bacula-devel] Patch: Migration jobmedia table insert incomplete


Hello Kern!

<kern@xxxxxxxxxxx> aka Kern Sibbald wrote
on Tue, 26 Feb 2008 21:57:10 +0100 in m2n.bacula.devel:

|Yes, there are some problems with migration, but you don't explicitly mention 
|what deficiencies you expect to run into.

Oops... should I?
To be honest, I am a little bit worried, because I do not at all want
to frustrate you.
And I am in a dilemma: on the one hand, Bacula is something like a
dream come true: I often thought of fetching a copy of IBM's TSM for
my installation (I think I could get an employee's evaluation copy),
but 1.) it is just too bulky for a couple of home computers, and 2.)
it doesn't run on FreeBSD. So now I have what I wished to have...
On the other hand, when trying to get the most out of Bacula, I find
many of these small things that might still need a little fix-up.
Now, if I report all of these, then I am the guy who is always
criticising. :-/

But ok, now just four things that come to my mind:

1. Every time a job is migrated, the "Run=" directives in
   the job resource are executed again. This is almost never what
   one wants to happen, and in fact tends to disrupt backup cycles
   severely.

2. This is the thing that has worried me the most. I
   have followed various theories about what might be happening
   there, so far to no avail. My latest theory was that it might
   have to do with the migrations, but currently I tend to dismiss
   this theory as well. In fact, I am still clueless.
   What happens is that the Director puts all jobs (including all newly
   started jobs) into either "waiting on max Storage jobs" or
   "waiting execution", while there is no job running on any client
   and no job running on the SD. It just does nothing and has to
   be restarted.
   What I have learned from reading bacula-users is that most
   people do not run as many jobs as I do. So maybe this
   is the reason.

3. When running a migration that will move multiple jobs, there is
   a kind of "envelope" job: the "g" job that is started first will
   start all the other "g" jobs that are needed. After this, the
   "envelope" job itself will also do one of the migrations. But
   occasionally this job just disappears silently and its activity
   is not to be found in the logfile.
   On one occasion it gave me a sig-11, which might give some hint
   as to what is going on there. From the logfile:

25-Feb 08:56 BxDir JobId 9595: The following 163 JobIds were chosen to 
	be migrated: 7705,7714,7723,7732,7741,7750,7759,...
25-Feb 08:56 BxDir JobId 9595: Job queued. JobId=9596
25-Feb 08:56 BxDir JobId 9595: Migration JobId 9596 started.
25-Feb 08:56 BxDir JobId 9595: Job queued. JobId=9597
25-Feb 08:56 BxDir JobId 9595: Migration JobId 9597 started.
..

   The interesting thing here is that this output is not held back
   until job 9595 finishes; instead it is written to the logfile
   immediately at the start of the job. And it ends in the middle of a
   line:

25-Feb 08:57 BxDir JobId 9595: Migration JobId 9742 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9743
25-Feb 08:57 BxDir Jo25-Feb 08:57 BxDir JobId 9773: The following 163
	JobIds were chosen to be migrated: 7706,7715,7724,7733,...
25-Feb 08:57 BxDir JobId 9773: Job queued. JobId=9774
25-Feb 08:57 BxDir JobId 9773: Migration JobId 9774 started.
25-Feb 08:57 BxDir JobId 9773: Job queued. JobId=9775
25-Feb 08:57 BxDir JobId 9773: Migration JobId 9775 started.
..

   The remaining part of the log of job 9595 follows a couple
   of hours later:

25-Feb 10:52 BxDir: Fatal Error because: Bacula interrupted by signal
	11: Segmentation violation
bId 9595: Migration JobId 9743 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9744
25-Feb 08:57 BxDir JobId 9595: Migration JobId 9744 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9745
25-Feb 08:57 BxDir JobId 9595: Migration JobId 9745 started.
..

   At that point I decided that there is some problem, but that it is
   not all that easy to find and fix. So I decided to postpone the
   issue for now (indefinitely), and instead redesign my schedules
   so that they would create fewer jobs. (I was saving
   database redo-logs via a Bacula schedule, which means checking
   every quarter of an hour whether there are any to save, and every
   such check creates an empty job that will qualify for later
   migration and that will nicely disappear during that migration.
   Now I have allowed the database to call bconsole on demand, only
   after it has batched up a couple of logs.)

4. When migrating from disk to tape, there should be no need to do
   SD data spooling: as the data is already packed up, it will flow
   quickly to the tape, and data spooling would only slow down the
   process.
   But in that case it is quite possible that multiple jobs write
   simultaneously to the tape. When later restoring such jobs, each
   job must be restored by a separate restore command, which can
   make the process very slow. Otherwise, that is, if multiple jobs that
   have intermingled on tape are restored by one and the same restore
   command, then the names of the restored files will all be correct,
   but the sizes may be wrong and the contents may be garbage.
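
To illustrate the on-demand trigger mentioned at the end of point 3,
here is a minimal sketch, not the script actually in use. It assumes
that archived redo-logs collect in one directory and that bconsole
accepts a "run job=... yes" command on standard input. The directory
layout, the job name, the batch threshold and all function names are
made up for illustration.

```python
import subprocess
from pathlib import Path


def pending_logs(archive_dir, pattern="*"):
    """Return the files waiting in the (hypothetical) archive directory."""
    return sorted(p for p in Path(archive_dir).glob(pattern) if p.is_file())


def should_start_backup(pending_count, min_batch=4):
    """Start a job only once enough logs have batched up."""
    return pending_count >= min_batch


def bconsole_input(job_name):
    """Console commands to feed to bconsole; the job name is an assumption."""
    return f"run job={job_name} yes\nquit\n"


def maybe_start_backup(archive_dir, job_name, min_batch=4, bconsole="bconsole"):
    """Call bconsole only when a batch is ready; return True if a job was requested."""
    logs = pending_logs(archive_dir)
    if not should_start_backup(len(logs), min_batch):
        # Nothing worth saving yet: no job is created, so no empty job
        # shows up later as a migration candidate.
        return False
    subprocess.run([bconsole], input=bconsole_input(job_name),
                   text=True, check=True)
    return True
```

The database side would call something like
maybe_start_backup("/var/db/archive", "RedoLogs") after archiving a
log (both arguments are placeholders), instead of a Bacula schedule
creating a job every quarter of an hour.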


So, this is more or less the background which led me to my statement
that pervasive use of migration would currently show some
deficiencies... I hope you understand...

best regards,
PMc
