
Re: [Bacula-devel] concurrent multidrive use


Hi again,
This and a few other emails regarding this request arrived while I was 
preparing my further justification, but I'll add it here just for 
completeness:
All,
Thanks for giving this request your consideration.
Can I make a few points before it's consigned to oblivion?

I wasn't envisaging writing blocks of the *same file* to different drives.
I was thinking more of whole files to different drives, but from a single
job (FD/pool combo). Something like this:
(And please excuse me if I'm being an utter moron here...)

FD scans the filesystem gathering a list (not necessarily the entire list)
of files to be backed up, accumulating the size total.
When this total reaches some configurable limit (or the list is complete),
FD asks SD for an SD thread and waits.
When SD replies with another thread, FD then starts an FD child and passes
the list to it. The child then opens a connection to that SD and proceeds
as now.
Meanwhile FD master continues to scan the filesystem, accumulates another
list and asks for another SD thread. When it gets one it starts another FD
child and hands this list to the second child.
And so on until all available SDs are in use (in which case the request
from FD for another SD will block - timeout issues?) or until the backup
is complete.
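In rough Python pseudocode (all names here are invented for illustration; none of this is actual Bacula code), the FD-side batching idea is something like:

```python
# Sketch of the FD master's scan loop: accumulate files until a
# configurable size limit is reached, then hand the batch off to a
# child.  batch_files() and the (name, size) tuples are invented.

def batch_files(files, size_limit):
    """Yield lists of (name, size) whose total size reaches size_limit."""
    batch, total = [], 0
    for name, size in files:
        batch.append((name, size))
        total += size
        if total >= size_limit:
            yield batch          # master would hand this to an FD child
            batch, total = [], 0
    if batch:                    # final, possibly short, batch
        yield batch

# Example: ten 1 GB files with a 3 GB limit -> four batches (3+3+3+1).
files = [("f%d" % i, 10**9) for i in range(10)]
batches = list(batch_files(files, 3 * 10**9))
```

The point being that the master never holds more than one partial list in memory while the children do the actual transfers.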

Meanwhile, SD knows how many tapedrives it has at its disposal.
Each time it is asked by FD to start a new thread, it looks to see if there
is an idle drive. If so, SD claims that drive, starts a new SD child and
hands the drive to it. It then tells FD the details of the new thread
(port number etc?) which by now is waiting for data to arrive on that port
from the child FD.
If there is no idle drive, the SD parent waits until one becomes available
before it starts a new child. (Timeout issues?)

When the backup of that set of files is complete, the SD child marks the
tapedrive as idle and exits. The drive then becomes available to any
waiting SD master.
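The drive bookkeeping on the SD side is the standard condition-variable pattern; a toy Python sketch (again with invented names, not Bacula internals), including the timeout question raised above:

```python
# Sketch of SD drive allocation: claim_drive() blocks until a drive is
# idle (or the optional timeout expires); release_drive() marks a drive
# idle again and wakes one waiter.
import threading

class DrivePool:
    def __init__(self, n_drives):
        self.idle = list(range(n_drives))
        self.cond = threading.Condition()

    def claim_drive(self, timeout=None):
        with self.cond:
            # This wait is where the "timeout issues?" decision lives.
            if not self.cond.wait_for(lambda: self.idle, timeout):
                return None          # no drive became idle in time
            return self.idle.pop()

    def release_drive(self, drive):
        with self.cond:
            self.idle.append(drive)  # SD child exits, drive goes idle
            self.cond.notify()

pool = DrivePool(2)
a = pool.claim_drive()
b = pool.claim_drive()
pool.release_drive(a)
c = pool.claim_drive()   # succeeds only because a drive was released
```

Bacula itself is C, so the real thing would be pthread mutexes and condition variables, but the shape is the same.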

Once the child FD and child SD are communicating, isn't it more-or-less
the same as now?


Regarding performance:
We will be backing up a 16TB raid system to LTO4, both hosted by the same
machine.
dd'ing a 100GB file from our raid to /dev/null I can achieve > 400Mbytes/sec.

dd'ing the same file to /dev/nst0 with bs=256*512 I can get 100Mbytes/sec.
(This drops to 75 Mbytes/sec with a blocksize of 128*512 ...)
I get a similar speed if I tar the file to tape with the same blocksize.
The documented (streaming?) speed of the drive is 120Mbytes/sec.

Backing up this file with Bacula (again with bs=256*512) I get 70Mbytes/sec.

(from Bacula's email:
   FD Files Written: 1
   SD Files Written: 1
   FD Bytes Written: 100,000,000,000 (100.0 GB)
   SD Bytes Written: 100,000,000,106 (100.0 GB)
   Rate: 70028.0 KB/s)

16 Tbytes at 70 Mbytes/sec is about 63 hours.
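The arithmetic, for the record (taking Bacula's reported rate as decimal bytes):

```python
# Back-of-envelope: 16 TB at a sustained 70 MB/s.
total_bytes = 16 * 10**12
rate = 70 * 10**6              # bytes/sec, roughly Bacula's 70028 KB/s
hours = total_bytes / rate / 3600
# ~63.5 hours; two drives at the same per-drive rate would halve that.
```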
 
Using two copies of this file, and backing each one up with two separate
jobs onto two separate tapedrives, achieves about the same rate *per
drive*, so I think that for me it would be worthwhile.
And for users who are trying to make use of a number of slower cheaper
drives instead of fast drives in an expensive tapechanger, this would be
even more beneficial.

I agree that such rates might not be achievable over a network.
There is no network involved for our specific case, but our emerging
10GbE net would easily sustain multiple 70Mbyte/sec streams.

Cheers,
Terry




> Hello,
>
> After having discussed this a bit on the list, and re-reading your note here, 
> I realize that yes splitting to separate tapes would be possible on a file by 
> file basis.  However, that feature would work much better if the multiple 
> threads for a given job were implemented, which is already a project listed 
> in the Projects file.
>
> Regards,
>
> Kern
>
>
> On Monday 21 July 2008 19:55:20 T. Horsnell wrote:
>   
>> OK, thanks. And sorry to have pestered the development *and* the users
>> list about this. I just wanted to be sure you understood what I meant.
>>
>> Actually, I would have said that striping was the process of spreading a
>> single *file* across multiple drives simultaneously (just like
>> disk-striping). To my mind, spreading the files of a single *job* across
>> multiple drives doesn't mean that part of each *file* is written to
>> multiple drives, but instead, that when the storage daemon is writing to
>> a device which has been declared as an autochanger with multiple drives,
>> it would take the next file from its input stream and write it to an
>> idle drive in that autochanger.
>>
>> I guess if the storage-daemon scheme has one thread per tapedrive,
>> it doesn't lend itself to this mode of operation.
>>
>> Thanks again,
>> Terry
>>
>> Kern Sibbald wrote:
>>     
>>> Yes, sorry, I did not understand.  Bacula does not have the ability to
>>> write a single job to multiple drives -- normally that is called
>>> striping.  It is unlikely that it will be implemented any time in the
>>> near future.
>>>
>>> On Monday 21 July 2008 18:27:43 T. Horsnell wrote:
>>>       
>>>> Thank you for that quick reply, and once again, apologies for the
>>>> interruption, but I don't want to split the backup across multiple jobs,
>>>> (with part of the filesystem being handled by one job and part by the
>>>> other), because however I make the split, the content (and hence the
>>>> size) of each part of the filesystem will be continually changing (this
>>>> is a 16TB multiuser raid system), and so one tapedrive may well be
>>>> mostly idle whilst the other one is continually busy. So I want, if
>>>> possible, to do this with a single job.
>>>>
>>>> Cheers,
>>>> Terry
>>>>
>>>> Kern Sibbald wrote:
>>>>         
>>>>> Hello,
>>>>>
>>>>> We generally do not supply support help on this list, but I will give a
>>>>> few tips ...
>>>>>
>>>>> Bacula has been able to write to multiple drives simultaneously for a
>>>>> very long time now -- many years and many versions.
>>>>>
>>>>> The simplest way to do it is to use different pools for each job.
>>>>>
>>>>> A not so satisfactory way of doing it is to use "Prefer Mounted Volumes
>>>>> = no". I don't recommend this as it leads to many operational problems.
>>>>>
>>>>> In general, if you are backing up raid disks, you should be able to tune
>>>>> your hardware so that it will write approximately 150 MB/sec with Bacula
>>>>> to an LTO3 drive, and so splitting jobs is not generally necessary.
>>>>> Getting your hardware tuned to run at those speeds is not easy and
>>>>> requires professional help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Kern
>>>>>
>>>>> On Monday 21 July 2008 17:20:11 T. Horsnell wrote:
>>>>>           
>>>>>> Apologies for pestering the developers list, but I can't determine from
>>>>>> the user docs whether what I want to do is failing because I'm doing it
>>>>>> wrongly, or simply that it's not supported.
>>>>>>
>>>>>> I want to define a single job which will backup a single (big) raid
>>>>>> filesystem to an autochanger which contains multiple tapedrives, and I
>>>>>> want this job to use all the tapedrives simultaneously. This would seem
>>>>>> to me to be a pretty standard requirement, but I can't get it to work
>>>>>> with Bacula version 2.4.1.
>>>>>>
>>>>>> Looking at the Storage Daemon section (6.4) in the new developers docs
>>>>>> for 2.5.2 (dated 20th July 2008!) I see that this may not yet be
>>>>>> possible:
>>>>>>
>>>>>>
>>>>>> ---cut---
>>>>>> Today three jobs (threads), two physical devices each job writes to
>>>>>> only one device:
>>>>>> Job1 -> DCR1 -> DEVICE1
>>>>>> Job2 -> DCR2 -> DEVICE1
>>>>>> Job3 -> DCR3 -> DEVICE2
>>>>>>
>>>>>> To be implemented three jobs, three physical devices, but job1 is
>>>>>> writing simultaneously to three devices:
>>>>>>
>>>>>> Job1 -> DCR1 -> DEVICE1
>>>>>>        -> DCR4 -> DEVICE2
>>>>>>        -> DCR5 -> DEVICE3
>>>>>> Job2 -> DCR2 -> DEVICE1
>>>>>> Job3 -> DCR3 -> DEVICE2
>>>>>> ---cut---
>>>>>>
>>>>>> Is what I want possible in 2.5.2, or should I wait?
>>>>>>
>>>>>> Cheers,
>>>>>> Terry.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>> -- This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>>>> challenge Build the coolest Linux based applications with Moblin SDK &
>>>>>> win great prizes Grand prize is a trip for two to an Open Source event
>>>>>> anywhere in the world
>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>>> _______________________________________________
>>>>>> Bacula-devel mailing list
>>>>>> Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>>>>>>             




This mailing list archive is a service of Copilotco.