
Re: [Bacula-devel] [Bacula-users] Virtual backup


09.09.2008 02:31, Michael Heim wrote:
>   Kern Sibbald wrote:
>> On Sunday 07 September 2008 03:04:21 Michael Heim wrote:
>>> Hi Kern,
>>> can't you use something like data spooling for this?
>>> First step: Make a virtFull to a temp spool file on disk
>>> Second step: Spool this to the destination
>> Yes, interesting idea.  It could possibly help in some cases, but the problem
>> is that it doesn't scale well enough.  Take an extreme example:
>> someone has a backup to tape amounting to 10TB.  With this suggestion, it
>> would be necessary to spool 10TB to disk and then write it to tape.  Someone
>> with this much data will require a solution that is tape to tape ...
> Hi Kern,
> I think spooling a 10 TB backup should be no problem today for a company
> with such a requirement.

Possible... but the idea is to find the most versatile approach here,
useful to big and small installations alike.

> But first, such a big filesystem should be split into
> several smaller parts to be more flexible for a restore.

This has been discussed on the -users mailing list from time to time,
and I believe it's reasonable to say that, while this might be a good
approach, sometimes it's not possible.

> In the
> datacenter where I worked for over 7 years, we had a TSM cluster
> with two SUN L500 tape libraries, each equipped with 8 LTO3 FC drives and
> about 500 slots. To stage backup data first (spool + migration), the
> cluster was connected to two EMC Clariion CX3-40 with 32 TB SATA
> capacity each (64 TB total). I think in environments with 10 TB
> filesystems a 10-20 TB spool area cannot be a problem.

I tend to disagree... in many enterprises, the available disk space is
used almost completely for production data, and management won't
agree to spend money on twice the disk space for intermediate backup
use only.

> But filesystems with several TBs are very rare. In my opinion only
> video/multimedia applications can utilize them.

Let's see if some other users comment on this - I believe I recall a 
number of Bacula users stating they have TBs of data. Research data of 
all sorts, for example, tends to increase steadily and will not be 

> For normal file data the
> average file size is between 50 KB and 300 KB. This means about 3-20
> million files per TB.
> A normal filesystem with more than about 500-1000 GB of data shouldn't be
> used, because the time to restore is too high. A real life example (from
> my experience):
>     A restore of an NTFS filesystem (on an HP DL385G2 Windows 2003
> cluster, 2x2.4 GHz Opteron-DC, 8 GB RAM) with 400 GB and 10 million
> files would take about 3 days on an EMC Clariion CX700 with 4 GB cache
> (measured on a 400 GB slice of a Raid5 with 8x300 GB FC 10,000 rpm
> hard disks - no other IO is using the same Raid) with TSM.

This sounds ridiculously slow...
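
As a rough sanity check on the figures quoted above, here is a small Python sketch. It assumes decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes) and a constant restore rate over the whole 3 days; the function names are made up for illustration and are not part of Bacula or TSM.

```python
# Back-of-the-envelope arithmetic for the numbers quoted in the thread.
# Assumption: decimal units and a perfectly steady restore rate.
KB = 10**3
GB = 10**9
TB = 10**12


def files_per_tb(avg_file_size_bytes):
    """How many files of the given average size fit into 1 TB."""
    return TB / avg_file_size_bytes


def restore_rate(total_bytes, file_count, seconds):
    """Return (bytes per second, files per second) for a restore."""
    return total_bytes / seconds, file_count / seconds


# Average file size of 50 KB to 300 KB, as stated in the quoted mail.
print("files per TB at 50 KB: ", round(files_per_tb(50 * KB)))
print("files per TB at 300 KB:", round(files_per_tb(300 * KB)))

# 400 GB and 10 million files restored in about 3 days.
three_days = 3 * 24 * 3600
bytes_per_s, files_per_s = restore_rate(400 * GB, 10_000_000, three_days)
print("implied throughput: %.2f MB/s" % (bytes_per_s / 10**6))
print("implied file rate:  %.1f files/s" % files_per_s)
```

Under these assumptions the file counts match the quoted 3-20 million files per TB, and the 3-day restore corresponds to roughly 1.5 MB/s, or about 39 files per second.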

> For such big filesystems (>1 TB or several million files) a file by file
> restore isn't really possible for a company with an SLA or a max. restore
> time.

Don't forget there are setups where the actual time to recovery is not 
that important.

> To handle those filesystems, image backups (a TSM feature to back up
> a whole filesystem with its own snapshot method) are used, so the restore
> of the same filesystem (400 GB, 10 million files) will only take 2h,
> because the whole filesystem image is restored.
> A good strategy is to create only filesystems below one TB and with fewer
> than 3-5 million files, because bigger filesystems can't be handled
> properly with normal hardware.

I tend to disagree.

> In a normal IT environment with several 
> smaller filesystems a spool area shouldn't be any problem.

I also disagree, because the requirements might be defined by the
total amount of data to be saved, not by the biggest job only. Imagine
having ten 1 TB file sets and wanting to save them all concurrently -
having ten jobs instead of one can be useful to get more efficient
hardware usage, running more jobs in parallel, but you might still
require spool space / temporary pool space for all ten TB at the same time.
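
To make the point concrete, a minimal Python sketch follows. It assumes, as in the proposal discussed in this thread, that each job is staged on disk in full before being written to its destination; this is an illustration of the argument, not a description of how Bacula's data spooling actually allocates space. The function name and parameters are hypothetical.

```python
def spool_space_needed(job_sizes_tb, max_concurrent):
    """Worst-case staging space in TB.

    Assumption: every running job is staged in full, and the
    `max_concurrent` largest jobs may all be running at once.
    """
    largest_first = sorted(job_sizes_tb, reverse=True)
    return sum(largest_first[:max_concurrent])


# Ten file sets of 1 TB each, as in the example above.
jobs = [1] * 10

print("one job at a time: ", spool_space_needed(jobs, 1), "TB")
print("all ten in parallel:", spool_space_needed(jobs, 10), "TB")
```

Splitting one big job into ten smaller ones only reduces the staging requirement if the jobs run one after another; run concurrently, they need the same total space.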

> The only 
> requirement to such a spool area should be very fast concurrent reads 
> and writes, so a Raid 10 (or perhaps SSDs) should be considered.

Well, that's merely a technical problem :-)


Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück

Bacula-devel mailing list
