
Re: [Bacula-devel] Slow CopyDiskToTape, was: MaximumBlockSize Problem and Question


> On Fri, 2008-11-07 at 17:17 +0100, Kern Sibbald wrote:
>> On Friday 07 November 2008 16:38:51 Ulrich Leodolter wrote:
>> > On Fri, 2008-11-07 at 14:47 +0100, Eric Bollengier wrote:
>> > > Le Friday 07 November 2008 14:36:46 Kern Sibbald, vous avez écrit :
>> > > > On Friday 07 November 2008 13:19:45 Ralf Gross wrote:
>> > > > > Kern Sibbald schrieb:
>> > > > > > On Thursday 06 November 2008 22:47:41 Ralf Gross wrote:
>> > > > > > > Alex Chekholko schrieb:
>> > > > > > > > On Wed, 5 Nov 2008 16:12:51 +0100
>> > > > > > > >
>> > > > > > > > Kern Sibbald <kern@xxxxxxxxxxx> wrote:
>> > > > > > > > > For writing to tape (providing it is LTO-n) I strongly
>> > > > > > > > > recommend a block size not to exceed 256K.
>> > > > > > > >
>> > > > > > > > Hi Kern,
>> > > > > > > >
>> > > > > > > > Why do you say that?  Is this thread relevant?:
>> > > > > > > > http://www.mail-archive.com/bacula-devel@xxxxxxxxxxxxxxxxxxxxx/msg01246.html
>> > > > > > > >
>> > > > > > > > Also, I would like to corroborate the OP's experiences; I had
>> > > > > > > > an almost identical thread about small block size and slow
>> > > > > > > > write speed:
>> > > > > > > > http://www.nabble.com/LTO-4-performance--td17407840.html
>> > > > > > > >
>> > > > > > > > In fact, I was unable to get higher block sizes working at all
>> > > > > > > > with btape:
>> > > > > > > > http://www.adsm.org/lists/html/Bacula-users/2008-05/msg00504.html
>> > > > > > > >
>> > > > > > > > So I am still stuck at ~22MB/s writing to LTO-4 with the
>> > > > > > > > default block size.
>> > > > > > >
>> > > > > > > I don't think that the blocksize is the problem. I did some
>> > > > > > > tests but couldn't get higher results with larger blocksizes.
>> > > > > > > I get 75-85 MB/s with the default bs and no additional tuning.
>> > > > > >
>> > > > > > That is probably correct, but most likely only because you have
>> > > > > > a bottleneck elsewhere -- probably in one of the points I
>> > > > > > mentioned.  The speed is always capped by the slowest component.
>> > > > > > Once you remove the other bottlenecks on your system, the
>> > > > > > blocksize will very likely become the bottleneck and then you
>> > > > > > can measure the difference.
>> > > > >
>> > > > > I didn't want to complain, just show the original poster that his
>> > > > > 22 MB/s are likely not a blocksize issue.
>> > > > >
>> > > > > That being said, I started a thread on the user list a while ago
>> > > > > where I asked what throughput people are getting when writing to
>> > > > > tape.  Nobody involved in this thread got higher numbers than
>> > > > > 80-85 MB/s for a single job.
>> > > >
>> > > > That is probably reasonable for one job, but if you are writing to
>> > > > an LTO-2, 3, or 4, we know that with multiple simultaneous jobs it
>> > > > is possible to get write speeds of 150 MB/sec.
>> > > >
>> > > > Kern
>> > >
>> > > This is with a good hardware compression rate, but it's a very good
>> > > test, IMHO -- better than using random data and getting only 80 MB/sec.
>> > > If you are able to write at 150 MB/s, I'm pretty sure that you will
>> > > be able to write at 80 MB/s... Even if your source file is made with
>> > > dd if=/dev/zero, your hard disk, your network, and your SCSI/SAS or
>> > > whatever controller have to handle it like real data.
>> > >
>> > > Bye
>> >
>> > Hi
>> >
>> > I think we are talking about different things.
>> >
>> > I have a performance problem with only one particular job:
>> >
>> > Copy Disk to Tape
>> >
>> > My Disk Backup jobs run concurrently and are spread over
>> > size-limited Disk Volumes.
>> >
>> > Bacula runs the Copy Disk to Tape jobs in sequential order
>> > (Selection Type = PoolUncopiedJobs):
>> >
>> >
>> > foreach job in PoolUncopiedJobs
>> > {
>> >    Read Job Data from Disk Volumes
>> >    Optionally Spool Data
>> >    Write Job Data to Tape
>> > }
>> >
>> >
>> > Better performance inside a single storage daemon is
>> > only possible if the Read from Disk and the (Spooling)
>> > Write to Tape run buffered and in parallel
>> > (multithreading, which AFAIK is not used in this situation).
>>
>> Yes, the above is more or less correct.  That is the way it works
>> today; some day we will have multiple buffering ...
>>
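To make the "multiple buffering" idea concrete, here is a minimal
standalone sketch (this is not Bacula code; the two-buffer ping-pong
scheme, all names, and the use of stdin/stdout as stand-ins for the disk
and tape devices are my own illustrative assumptions).  One thread fills
a buffer from the "disk" while a second thread drains the previously
filled buffer to the "tape", so the two devices overlap instead of
strictly alternating:

/* Minimal double-buffering sketch (illustrative, not Bacula source).
 * The reader (main) fills one buffer while the writer thread drains
 * the other, so disk reads and tape writes overlap in time.
 * Build with: cc -pthread pingpong.c -o pingpong
 */
#include <pthread.h>
#include <unistd.h>

#define BUFSZ (256 * 1024)               /* e.g. one tape block */

struct pingpong {
    char    data[2][BUFSZ];
    ssize_t len[2];                      /* valid bytes; -1 signals EOF */
    int     full[2];                     /* 1 = buffer waits to be written */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
};

static void *writer(void *arg)           /* "tape" side */
{
    struct pingpong *pp = arg;
    for (int i = 0; ; i ^= 1) {
        pthread_mutex_lock(&pp->lock);
        while (!pp->full[i])             /* wait for a filled buffer */
            pthread_cond_wait(&pp->cond, &pp->lock);
        ssize_t n = pp->len[i];
        pthread_mutex_unlock(&pp->lock);
        if (n < 0)
            break;                       /* reader reached end of input */
        write(STDOUT_FILENO, pp->data[i], n);  /* stands in for tape I/O */
        pthread_mutex_lock(&pp->lock);
        pp->full[i] = 0;                 /* hand the buffer back */
        pthread_cond_broadcast(&pp->cond);
        pthread_mutex_unlock(&pp->lock);
    }
    return NULL;
}

int main(void)                           /* "disk" side */
{
    static struct pingpong pp;           /* zero-initialized */
    pthread_mutex_init(&pp.lock, NULL);
    pthread_cond_init(&pp.cond, NULL);
    pthread_t tid;
    pthread_create(&tid, NULL, writer, &pp);

    for (int i = 0; ; i ^= 1) {
        pthread_mutex_lock(&pp.lock);
        while (pp.full[i])               /* wait until writer drained it */
            pthread_cond_wait(&pp.cond, &pp.lock);
        pthread_mutex_unlock(&pp.lock);
        ssize_t n = read(STDIN_FILENO, pp.data[i], BUFSZ);  /* "disk" read */
        pthread_mutex_lock(&pp.lock);
        pp.len[i]  = n > 0 ? n : -1;
        pp.full[i] = 1;
        pthread_cond_broadcast(&pp.cond);
        pthread_mutex_unlock(&pp.lock);
        if (n <= 0)
            break;
    }
    pthread_join(tid, NULL);
    return 0;
}

Run as e.g. ./pingpong < volume-file > /dev/null to try it (stand-ins
only).  The point of the design is that the slower device is never idle
waiting for the other one to finish its I/O.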
>> >
>> >
>> > But there is another issue:
>> >
>> > Bacula reads much more data (up to 10 times more) from Disk than
>> > it writes to Tape (verified with the sysstat tools).
>> >
>> > It looks like Bacula first does an lseek() to the start position
>> > and then always reads to the end of the disk volume.
>> > I see "End of Volume" messages for each disk volume that is used.
>>
>> Yes, it is possible that it is reading to the end of the volume.  The
>> original code did that.  I added code a long time ago that keeps track
>> of when the bsr is consumed and then stops.  However, perhaps it doesn't
>> work, or doesn't work in some situations.  With a few restores, extra
>> reading is not very noticeable or important, but for Copy or Migration
>> jobs it could be a real performance problem.
>>
>> >
>> > 07-Nov 11:10 troll-sd JobId 5572: Forward spacing Volume "Backup-0358" to file:block 0:1963856407.
>> > 07-Nov 11:11 troll-sd JobId 5572: End of Volume at file 0 on device "FileStorage" (/disk0/bacula/files), Volume "Backup-0358"
>> > 07-Nov 11:11 troll-sd JobId 5572: Ready to read from volume "Backup-0359" on device "FileStorage" (/disk0/bacula/files).
>> > 07-Nov 11:11 troll-sd JobId 5572: Forward spacing Volume "Backup-0359" to file:block 0:387281.
>> > 07-Nov 11:12 troll-sd JobId 5572: End of Volume at file 0 on device "FileStorage" (/disk0/bacula/files), Volume "Backup-0359"
>> > 07-Nov 11:12 troll-sd JobId 5572: Ready to read from volume "Backup-0360" on device "FileStorage" (/disk0/bacula/files).
>> > 07-Nov 11:12 troll-sd JobId 5572: Forward spacing Volume "Backup-0360" to file:block 0:580817.
>> > 07-Nov 11:13 troll-sd JobId 5572: End of Volume at file 0 on device "FileStorage" (/disk0/bacula/files), Volume "Backup-0360"
>> > 07-Nov 11:13 troll-sd JobId 5572: Ready to read from volume "Backup-0361" on device "FileStorage" (/disk0/bacula/files).
>> > 07-Nov 11:13 troll-sd JobId 5572: Forward spacing Volume "Backup-0361" to file:block 0:258257.
>> >
>> > Remember, my Disk Volumes hold data for up to 15 backup jobs.
>> >
>> > Other people have reported the same issue for Migration jobs.
>> >
>> > I will try to verify and track this down at home
>> > over the weekend.  debug = 999 should help :-)
>>
>> Well, the problem is very likely to be in src/stored/match_bsr.c, which
>> currently is set to debug level 500, so 999 will turn it on, but you
>> will also get a lot of other data, particularly from read_record.c.
>> Depending on the output, you might want to set the dbglevel variable at
>> the top of match_bsr.c to something like 50 and then use debug=50,
>> simply to reduce the output a bit.
>>
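In other words, the pattern is roughly as follows (a paraphrase, not a
verbatim excerpt from match_bsr.c; the message text and variable use
here are invented for illustration).  Bacula's Dmsg macros print only
when the level set with "setdebug" is at least the level in the call:

/* Paraphrase of the convention -- not real match_bsr.c code. */
static int dbglevel = 500;   /* lower this to 50 as suggested above */
...
/* Prints only when "setdebug level=NNN" gives NNN >= dbglevel: */
Dmsg1(dbglevel, "bsr rejected block, sesstime=%u\n", sesstime);
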
>
> Hi,
>
> I tried your suggestions (dbglvl=50 in match_bsr.c and also in
> read_record.c).
>
> debug settings while one CopyDiskToTape job was running:
>
> setdebug level=250 storage=Tape trace=1
>
> I used level 250 because I was interested in raw block reads.
> The total job size was 5.5 GB, and the job is spread over 13 4 GB disk
> volumes (because of the concurrent disk backup).
> I verified that each of the 13 4 GB volumes is read to the end after
> the first seek.  There are about 50 GB of reads for this job.
> Our full backup size (30 clients) is about 700 GB, and the copy to tape
> runs for more than 24 hours.  This is a performance killer for our
> backup strategy:
>
> 	backup to disk pool 1 month retention (fast restore)
> 	copy disk to tape pool 6 month retention
>
>
> fgrep "Read device got 64512 bytes at " troll-sd.trace
> ...
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294563835
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294628347
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294692859
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294757371
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294821883
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:4294886395
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:0
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:0
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1096908
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1161419
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1225931
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1290443
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1354955
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1419467
> troll-sd: block.c:1010-6512 Read device got 64512 bytes at 0:1483979
> ...
>
> corresponding messages:
> 13-Nov 12:27 troll-sd JobId 6512: Forward spacing Volume "Backup-0054" to file:block 0:1161420.
> 13-Nov 12:28 troll-sd JobId 6512: End of Volume at file 0 on device "FileStorage" (/disk0/bacula/files), Volume "Backup-0054"
> 13-Nov 12:28 troll-sd JobId 6512: Ready to read from volume "Backup-0055" on device "FileStorage" (/disk0/bacula/files).
> 13-Nov 12:28 troll-sd JobId 6512: Forward spacing Volume "Backup-0055" to file:block 0:1096908.
>
>
> If you are interested, I can give you access to the debug file
> troll-sd.trace (very large file :-)
>
> I looked a little at read_record.c and match_bsr.c and
> found some comments like *FIXME* or
>
> #ifdef if_and_when_FAST_BLOCK_REJECTION_is_working
> #endif
>
> It would be nice if you (core developers) could have a look at this.

I don't see anything above that indicates that Bacula is reading more than
it should.  If you are running multiple concurrent jobs into the same
volume and you are not using disk spooling, you are going to have a very
large number of interleaved records, so Bacula will be doing a lot of
seeking, which on a disk volume is a very low-cost affair.
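
To illustrate why interleaving causes seeking rather than necessarily
extra reading, here is a standalone sketch (the block header layout and
all names are invented for illustration; Bacula's real disk-volume
format differs).  A copy job can lseek() past every block that belongs
to another session instead of reading it:

/* Illustrative sketch only -- hypothetical block layout, not Bacula's
 * actual volume format.  Blocks from concurrent jobs are interleaved;
 * a copy job skips foreign blocks with a cheap lseek() on disk.
 */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

struct blk_hdr {                /* assumed per-block header */
    uint32_t block_len;         /* payload bytes following the header */
    uint32_t session_id;        /* job/session that wrote the block */
};

/* Copy every block of want_session from disk_fd to tape_fd. */
static int copy_session(int disk_fd, int tape_fd, uint32_t want_session)
{
    struct blk_hdr h;
    static char buf[256 * 1024];

    while (read(disk_fd, &h, sizeof h) == (ssize_t)sizeof h) {
        if (h.session_id != want_session) {
            /* Foreign block: seek past the payload, no read needed. */
            if (lseek(disk_fd, (off_t)h.block_len, SEEK_CUR) < 0)
                return -1;
            continue;
        }
        if (h.block_len > sizeof buf)
            return -1;          /* corrupt or oversized block */
        if (read(disk_fd, buf, h.block_len) != (ssize_t)h.block_len)
            return -1;
        if (write(tape_fd, buf, h.block_len) != (ssize_t)h.block_len)
            return -1;
    }
    return 0;                   /* end of volume reached */
}

Note that even this sketch examines a header for every block up to the
end of the volume; stopping earlier requires knowing the bsr has been
consumed, which is the point below.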

However, what would not be good is if Bacula is reading past the last
data record on a particular Volume, and the output above does not show
that happening.

If you can clearly show me that Bacula reads beyond the last valid record
for a job on a Volume (or always all the way to the end of the volume),
then I'll be happy to look at it; otherwise not.
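
For reference, the "stop when the bsr is consumed" test amounts to
something like the following sketch (field names are hypothetical; the
real logic lives in src/stored/match_bsr.c, and this assumes the disk
volume is traversed in increasing block-address order):

/* Hypothetical sketch, not Bacula source.  If every address range the
 * bsr asks for ends before the block just read, nothing further on the
 * volume can match, so the read loop may stop instead of scanning on
 * to "End of Volume".
 */
#include <stdbool.h>
#include <stdint.h>

struct bsr_range {
    uint64_t start_block;       /* first wanted block address */
    uint64_t end_block;         /* last wanted block address */
    struct bsr_range *next;
};

static bool bsr_is_consumed(const struct bsr_range *bsr, uint64_t cur_block)
{
    for (; bsr; bsr = bsr->next)
        if (cur_block <= bsr->end_block)
            return false;       /* this range may still match later */
    return true;                /* every range is behind us: stop reading */
}

If a test like this misfires (or is never reached) for Copy jobs, you
would see exactly the read-to-End-of-Volume pattern reported above.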

>
>
>
>
> I am thinking about changing our backup strategy to overcome
> this performance issue:
>
>
> 1) Backup to tape with spooling and concurrency enabled.
> This will produce continuous jobs on tape, right?
>
> 2) Copy tape to disk; this will produce continuous
> disk volumes.

First I am not sure what you mean by "continuous jobs" on a tape, and
second I am not sure what you mean by "continuous disk volumes".  These
are not terms we use.  I believe that the way Bacula works with spooling
on/off and multiple simultaneous jobs is documented someplace in the
manual.


>
> What do you think about that?

You have not clearly identified the cause of the problem (I assume Bacula
is too slow).  You need to do that first.  Then, if it is due to
interleaved blocks (a possibility), you need to try turning on spooling
to disk and see if that corrects the problem -- or a better solution would
probably be to use one disk volume per job, which completely avoids the
spooling time and any interleaving of disk blocks.

If the problem is that Bacula is not stopping after it has read all the
necessary records, then we have a bug that we can work on, and in the
meantime you can minimize the impact somewhat by using smaller disk volume
sizes.
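
For example (directive names as in the Bacula manual; the resource names
and values are illustrative only), one volume per job and smaller volumes
are Pool settings, and data spooling is a Job setting:

# Illustrative values only -- adapt to your setup.
Pool {
  Name = DiskPool
  Pool Type = Backup
  Maximum Volume Jobs = 1     # one disk volume per job: no interleaving
  Maximum Volume Bytes = 2G   # smaller volumes bound any over-reading
}

Job {
  Name = NightlyBackup
  Spool Data = yes            # spool to disk before despooling to tape
  ...                         # other required Job directives
}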

Regards,

Kern
>
>
> Thanks
> Ulrich
>
>> Regards,
>>
>> Kern
>>
>
> --
> Ulrich Leodolter <ulrich.leodolter@xxxxxxxx>
> OBVSG
>
>


Best regards, Kern
