Re: [Bacula-devel] Patch: Migration jobmedia table insert incomplete
Hello Kern!
<kern@xxxxxxxxxxx> aka Kern Sibbald wrote
on Tue, 26 Feb 2008 21:57:10 +0100 in m2n.bacula.devel:
|Yes, there are some problems with migration, but you don't explicitly mention
|what deficiencies you expect to run into.
Oops... should I?
To be honest, I am a little bit worried, because I do not at all want
to frustrate you.
And I am in a dilemma: on one hand Bacula is something like a
dream-come-true: I often thought of fetching a copy of IBM's TSM for
my installation (I think I could get an employee's evaluation copy),
but 1.) it is just too bulky for a couple of home computers, and 2.)
it doesn't run on FreeBSD. So now I have what I wished to have...
On the other hand, when trying to get the most out of Bacula, I find
many of these small things that might still need a little fix-up.
Now, if I do report all of these, then I am the guy who is always
criticising. :-/
But ok, now just four things that come to my mind:
1. Every time a job is migrated, the "Run=" directives in
the job resource are executed again. This is almost never what
one wants to happen, and in fact tends to disrupt backup cycles
severely.
2. This is the thing that I have been worrying the most about. I
have been following various theories about what might happen
there, yet to no avail. The last of my theories was that it might
have to do with the migrations, but currently I tend to dismiss
this theory also. In fact, I am still clueless.
What happens is that the Director puts all jobs (and all newly
started jobs) into either "waiting on max Storage jobs" or
"waiting execution", while there is no job running on any client
and no job running on the SD. It just does nothing and has to
be restarted.
What I have learned from reading bacula-users is that most
people do not run such quantities of jobs as I do. So maybe this
is the reason.
3. When running a migration that will move multiple jobs, there is
a kind of "envelope" job: the "g" job that is started first will
start all the other "g" jobs that are needed. After this, this
"envelope" job itself will also do one of the migrations. But
occasionally this job just disappears silently and its activity
is not to be found in the logfile.
On one occasion it gave me a sig-11, which might give some hint
at what is going on there. From the logfile:
25-Feb 08:56 BxDir JobId 9595: The following 163 JobIds were chosen to
be migrated: 7705,7714,7723,7732,7741,7750,7759,...
25-Feb 08:56 BxDir JobId 9595: Job queued. JobId=9596
25-Feb 08:56 BxDir JobId 9595: Migration JobId 9596 started.
25-Feb 08:56 BxDir JobId 9595: Job queued. JobId=9597
25-Feb 08:56 BxDir JobId 9595: Migration JobId 9597 started.
..
The interesting thing here is that this output is not held back
until job 9595 finishes; instead it is written to the logfile
immediately at the start of the job. And it ends in the middle of a
line:
25-Feb 08:57 BxDir JobId 9595: Migration JobId 9742 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9743
25-Feb 08:57 BxDir Jo25-Feb 08:57 BxDir JobId 9773: The following 163
JobIds were chosen to be migrated: 7706,7715,7724,7733,...
25-Feb 08:57 BxDir JobId 9773: Job queued. JobId=9774
25-Feb 08:57 BxDir JobId 9773: Migration JobId 9774 started.
25-Feb 08:57 BxDir JobId 9773: Job queued. JobId=9775
25-Feb 08:57 BxDir JobId 9773: Migration JobId 9775 started.
..
The remaining part of the log of job 9595 follows a couple
of hours later:
25-Feb 10:52 BxDir: Fatal Error because: Bacula interrupted by signal
11: Segmentation violation
bId 9595: Migration JobId 9743 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9744
25-Feb 08:57 BxDir JobId 9595: Migration JobId 9744 started.
25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9745
25-Feb 08:57 BxDir JobId 9595: Migration JobId 9745 started.
..
At that point I concluded that there is some problem, but that it is
not all too easy to find and fix. So I decided to postpone the issue
for now (indefinitely), and instead redesign my schedules so that
they would create fewer jobs. (I was saving database redo-logs via a
Bacula schedule, which means checking every quarter of an hour
whether there are any to save. Every such check creates an empty job
that will qualify for later migration, and that will nicely disappear
during that migration. Now I have allowed the database to call
bconsole on demand, only after it has batched up a couple of logs.)
4. When migrating from disk to tape, there should be no need to do
SD data spooling - as the data is already packed up, it will flow
quickly to the tape, and data spooling would only slow down the
process.
But in that case it is quite possible that multiple jobs write
simultaneously to the tape. When later restoring such jobs, each
job must be restored by a separate restore command, which can
make the process very slow. If not, that is, if multiple jobs that
have intermingled on tape are restored by one and the same restore
command, then the names of the restored files will all be correct,
but the sizes may be wrong and the contents may be garbage.
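As an aside to item 3: a minimal sketch (Python, not part of Bacula) of
how one might scan a Director logfile for lines like the one above, where
a second message starts in the middle of another. The timestamp pattern
is an assumption taken from the excerpts quoted here ("25-Feb 08:57 "),
and the function name find_interleaved is made up for illustration.

```python
import re

# Assumed timestamp prefix of a log line, as seen in the excerpts above,
# e.g. "25-Feb 08:57 ". Other log formats would need a different pattern.
STAMP = re.compile(r"\d{2}-[A-Z][a-z]{2} \d{2}:\d{2} ")


def find_interleaved(lines):
    """Return (line_number, column) pairs where a timestamp starts mid-line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in STAMP.finditer(line):
            # A timestamp at column 0 is a normal line start; anything
            # later suggests that two messages were written into one line.
            if match.start() > 0:
                hits.append((number, match.start()))
    return hits


sample = [
    "25-Feb 08:57 BxDir JobId 9595: Job queued. JobId=9743",
    "25-Feb 08:57 BxDir Jo25-Feb 08:57 BxDir JobId 9773: The following 163",
]
print(find_interleaved(sample))
```

This only flags the places where output got mixed up; it says nothing
about why the Director wrote it that way.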
So, this is more or less the background which led me to my statement
that pervasive use of migration would currently show some
deficiencies... I hope you understand...
best regards,
PMc