
Re: [Bacula-devel] patch for presence of file daemon





"Kern Sibbald" wrote on 14/08/2008 21:28:
> Hello,
>
> I have a few questions, please see below ...
>
> On Thursday 14 August 2008 17:23:39 Jean-Sébastien Hederer wrote:
>   
>> Hi,
>>
>> Maxime Rousseau has created a new feature for bacula. This patch has been
>> created in order to optimize the communications between the File daemon and
>> the Director daemon. It has been written for 2.4.0. All regression tests
>> and function tests have been passed for 2.4.0. Patch is ready for 2.4.0.
>> Maxime is making patch for 2.4.2 and trunk before sending it. Here are some
>> explanations:
>>
>>
>>
>> With the new features, Bacula can backup clients who change their IP like
>> laptops.
>>     
>
> Bacula already has a means to backup clients, which change their IP address. 
> However, I admit that it could be optimized a bit.
>
>   
The IP change is dynamic: the FD gives its address to the DIR when it
signals its presence. We can add a parameter to enable or disable this
feature. Should this parameter go in the Director resource or in the
Client resource of the DIR configuration?

I had never seen SETIP. With this patch, it is the DIR that controls
whether an FD's IP may change, however the console is defined on the FD.


>> There are less error messages when a job is canceled because of
>> the absence of the File Daemon.
>>     
>
> Yes, there are probably too many error messages, but then it depends on how
> you configure Bacula ...
>   
Yes, but here, if the file daemon is not there, we will be able to send a
clear message about its status, because we know it is not available
(rather than a message saying the job was cancelled because of a network
communication failure).


>   
>> The communication between FD and DIR become
>> bidirectional so connections are more frequent.
>>     
>
> Maybe I am misunderstanding you, but more frequent connections are bad not
> good. 
>
>   
Yes, there are a few more connections: when the DIR starts and tries to
detect the FDs that have the parameter set, and when an FD starts or stops
and tells the DIR whether it is available. In return, this avoids
connections from the DIR to the FD that pollute the network when the FD is
not available.

For each of these communications, we open a socket, exchange the messages,
and close the socket.

We reused standard Bacula functions as much as possible throughout this
feature.

>> New features for the DIR:
>>         - when the DIR start, he tries to connect to the FD. If the connection is
>> successful, a presence parameter in the Client ressource change to "yes".
>> Else the presence parameter keep his value "no".
>>     
>
> What happens if the Director has 2,000 clients? 
>   
It will try to reach every client for which the presence parameter is
set. We kept in mind that the DIR is stopped and started only a few times
a year; that is how our customers operate. This keeps the number of
network communications low.

> Does the DIR stall until it contacts them all?  How many resources does it
> take to contact them?
>   
The DIR contacts all the FDs for which the parameter is set before continuing.

We will look at how to parallelize the FD communications.
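One possible way to parallelize the startup contact, as a minimal sketch
only (this is not code from the patch). It assumes a plain TCP connect is
enough to stand in for the presence check, and the names probe_fd and
detect_presence are hypothetical:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def probe_fd(address, port, timeout=2.0):
    """Return True if a TCP connection to the FD can be opened (assumed check)."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_presence(clients, probe=probe_fd, max_workers=20):
    """Probe all clients concurrently.

    clients maps a client name to an (address, port) tuple.
    Returns a dict mapping each client name to True (present) or False.
    """
    names = list(clients)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order, so names and results line up.
        results = pool.map(lambda name: probe(*clients[name]), names)
        return dict(zip(names, results))
```

With a bounded pool, the total startup delay is roughly the number of
clients divided by max_workers, times the connect timeout, instead of one
timeout per absent client.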

> How long will it take for it to contact the last of the 2,000 clients? 
>   
The timers are configurable.



> If a scheduled job starts for the 2,000th client before the 2,000th client is
> contacted by the startup routine, will the job be retarded in starting?
>   
yes.


> What happens if I have clients that I don't want the director contacting
> because they are very infrequently used and the jobs are only manually
> started?
>   
You either do not set the presence parameter or you set it to false (the
default value corresponds to the current behavior).
>   
>> - when the DIR is going to
>> start a new job, he checks the presence parameter. If the client is
>> present, the DIR starts the job, else he waits for him during a time
>> specified in the Client ressource in the bacula-dir.conf (this parameter is
>> named "WaitTimer"). He checks if the client is connected at each interval
>> of a time (attribute "PresenceTimer" in bacula-dir.conf). If the client
>> never connect himself during the "WaitTimer" time, the job is marked as
>> "JSAutomaticallyCanceled" in the Catalog. "JSAutomaticallyCanceled" is a
>> new parameter defined in jcr.h and it means that the job is canceled
>> because the File daemon has never been connected.
>>     
>
> Is it possible to turn off this behavior?  I don't want it for my setup,
> because it is not always possible for my clients to contact the Director.
>   
Yes, sure. This is already done: old configurations remain fully compatible, with no change in behavior.
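The wait described above (WaitTimer and PresenceTimer) can be sketched as
follows. This is an illustration under assumptions, not the patch itself:
the function name wait_for_client and the injected is_present, clock and
sleep callables are hypothetical, and both timers are taken in seconds.
Unlike the pseudocode in the original mail, the loop returns as soon as the
client is seen, so a job that runs is never also marked as canceled.

```python
import time

JS_AUTOMATICALLY_CANCELED = "JSAutomaticallyCanceled"
RUN_JOB = "run job"


def wait_for_client(is_present, wait_timer, presence_timer,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_present() every presence_timer seconds, for at most wait_timer seconds.

    Returns RUN_JOB as soon as the client is present, otherwise
    JS_AUTOMATICALLY_CANCELED once wait_timer has elapsed.
    """
    deadline = clock() + wait_timer
    while True:
        if is_present():
            return RUN_JOB
        remaining = deadline - clock()
        if remaining <= 0:
            return JS_AUTOMATICALLY_CANCELED
        # Never sleep past the deadline.
        sleep(min(presence_timer, remaining))
```

A client without the Presence parameter would simply skip this function and
run as it does today.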

>   
>> - I have created a new
>> file named fd_server.c. It allow the DIR to listen to the File Daemon
>> connections (the default port is 9104, parameter DIRportFD in Director
>> ressource of bacula.dir.conf). The parameter MaxClientsPresence defined in
>> Director ressource in bacula-dir.conf decide how many File Daemons the DIR
>> can listen simultaneously. - Authentifications fonctions are also
>> implemented in authenticate.c in src/dird and src/filed.
>>     
>
> The IANA will never approve of a fourth port for Bacula.  There is no reason
> to have the Director listening on two ports.  These connections should be
> multiplexed on port 9101.
>   
The problem is that we would have to change how incoming requests on port 9101 are handled, so that two types of communication are supported: console and FD.

This might not be compatible with existing consoles.
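To illustrate what multiplexing on a single port could look like, here is a
rough sketch that dispatches on the first line a peer sends. Both greeting
formats are assumptions made for this example: "Hello <name> calling" for a
console, and a hypothetical "Presence <name>" for an FD announcing itself.
Nothing here is taken from the patch.

```python
def classify_greeting(line):
    """Classify the first line received on the shared port.

    Returns "console", "fd-presence" or "unknown".
    """
    words = line.split()
    # Assumed console greeting: "Hello <name> calling"
    if len(words) >= 3 and words[0] == "Hello" and words[-1] == "calling":
        return "console"
    # Hypothetical FD presence greeting: "Presence <client-name>"
    if len(words) == 2 and words[0] == "Presence":
        return "fd-presence"
    return "unknown"
```

Because the console greeting is matched exactly as before and the presence
greeting uses a different first word, existing consoles would not need to
change under this assumption.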


>   
>> New features for the FD:
>>         - the FD must know the address of the Director which is stocked in the
>> Director ressource in bacula-fd.conf.
>>     
>
> What happens if there are two or three Directors that can contact a File
> daemon as is the case at my site?
>   
Each Director resource is separate in the FD configuration file, so each resource can be configured separately.

>   
>> Also, he knows on which port he is able to contact the DIR (default 9104).
>>     
>
>   
>> - when the FD start, he tries to
>> connect to the DIR. If the connection is successful, a presence parameter
>> in the Client ressource of the Director daemon changes to "yes". Else the
>> presence parameter keep his default value "no". For the authentification he
>> uses the existing password between the File Daemon and the Director. The
>> File Daemon gives his new address to the DIR so if the client is a laptop,
>> jobs can be run with any IP.
>>     
>
> As I mentioned above this capability already exists with SETIP.
>   
This is not exactly the same feature: it is not a console feature.

>   
>> - when the File Daemon stops, he warns the DIR he is going away. After this
>>     
> warning, presence_parameter = 0 : the DIR
>   
>> knows the client is absent.
>>     
>
> So let's say that the client goes away and notifies the Director, then when
> the client starts again, because of some temporary problem it cannot notify
> the director.  Is the client then essentially disabled?
>   
Yes. This could be improved by having the FD periodically tell the DIR that it is present.
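A periodic announcement could be combined with an expiry on the DIR side,
so that a missed notification or a crashed FD corrects itself. The sketch
below is only an illustration: the class PresenceTable, its methods and the
ttl value are hypothetical and not part of the patch.

```python
import time


class PresenceTable:
    """Track which clients announced themselves within the last ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.last_seen = {}

    def announce(self, client):
        """Record a presence message from the client (sent at start and periodically)."""
        self.last_seen[client] = self.clock()

    def leave(self, client):
        """Record that the client said it is going away."""
        self.last_seen.pop(client, None)

    def is_present(self, client):
        """A client is present only if its last announcement is recent enough."""
        seen = self.last_seen.get(client)
        return seen is not None and self.clock() - seen <= self.ttl
```

With this approach a client that stops without reaching its shutdown code
is treated as absent once ttl seconds pass without a new announcement.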


>   
>> This feature doesn't work on Windows system.
>> Perhaps the FD not finished in the same way as it stop on Linux. At least,
>> on Windows, bacula does not go in the fonction "terminate_filed" in filed.c
>> so the presence parameter keep his value at 1. ----> Perhaps there is a
>> possible upgrade to do.
>>     
>
> It is possible that the Win32 FD gets some serious error on termination so it
> never gets to the terminate_filed() code.  This is also possible any time any
> FD crashes.
>
>   
>> For the connections at the start of the two Daemons, there is a
>> retry_interval defined at 10 seconds (if connection fail, retry after 10
>> seconds) and a max_retry_time defined at 20 seconds (abandon connection
>> after 20 seconds).
>>
>> Normally, the old configurations works fine even though files are patched.
>>
>> If configuration files not exist when we apply the patch, they are created
>> with a new configuration (Presence parameter, PresenceTimer, WaitTimer,
>> Address of the Director...). Else you must modify the configuration files:
>> if the Presence parameter in Client ressource in bacula-dir.conf and the
>> address attribute in Director ressource in bacula-fd.conf not exist, bacula
>> will run like an old configuration.
>>
>>
>>
>> Exemple of a new configuration:
>>
>>
>> 1/ In "bacula-dir.conf"
>>
>> Director {                            # define myself
>>   Name = localhost-dir
>>   DIRport = 9101            # where we listen for UA connections
>>   DIRportFD = 9104          # where we listen for FD connections -----> NEW
>>   QueryFile = "/home/rousseaum/bacula/bin/query.sql"
>>   WorkingDirectory = "/home/rousseaum/bacula/working"
>>   PidDirectory = "/home/rousseaum/bacula/working"
>>   Maximum Concurrent Jobs = 1
>>   Password = "6V2ghmC6A0YUfncxiF5wJJ1x+WAT2BpUD55l1tfaOury"         # Console password
>>   Messages = Daemon
>>     
>
>   
>>   MaxClientsPresence = 20   # How many clients the DIR can listen to simultaneously -----> NEW
>>     
>
> Why is this needed?
>

We reused existing functions, and the function we reused needs a number as an argument, so we made it a parameter.


> When the client connects to the Dir does it remain connected or does it
> disconnect after announcing its presence?
>   

It disconnects (through bnet_close).
>   
>> }
>>
>> Client {
>>   Name = localhost-fd
>>   Address = localhost
>>   FDPort = 9102
>>   Catalog = MyCatalog
>>   Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"          # password for FileDaemon
>>   File Retention = 30 days            # 30 days
>>   Job Retention = 6 months            # six months
>>   AutoPrune = yes                     # Prune expired Jobs/Files
>>   Presence = yes            # The presence parameter exists -----> NEW
>>   PresenceTimer = 15        # Maximum time to verify the client presence -----> NEW
>>   WaitTimer = 60 minutes    # Maximum time to wait for the client -----> NEW
>>   # PresenceTimer and WaitTimer are defined in seconds by default. We can use
>>   # minutes, hours, days... like the other temporal parameters in Bacula.
>> }
>>
>>
>> 2/ In "bacula-fd.conf"
>>
>> Director {
>>   Name = localhost-dir
>>   Address = localhost
>>   DIRport = 9104            # -----> NEW
>>   Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"
>> }
>>
>>
>>
>>
>>
>> Exemple of a typical communication between the FD and the DIR:
>>
>> 1/ Starting daemons:
>>
>> 1.1/ DIR starts before FD (most frequent situations)
>>
>> DIR starts;
>> DIR tries to connect to FD;
>> if (FD connected) {
>>         presence_parameter = 1;
>> }
>> FD starts;
>> FD tries to connect to DIR;
>> if (DIR connected) {
>>         presence_parameter = 1;
>>         FD give his new address to DIR;
>>     
>
> How does the FD pass his address to the DIR?
>   

fd->host()

>   
>> }
>>
>> 1.2/ FD starts before DIR
>>
>> FD starts;
>> FD tries to connect to DIR;
>> if (DIR connected) {
>>         presence_parameter = 1;
>>         FD give his new address to DIR;
>> }
>> DIR starts;
>> DIR tries to connect to FD;
>> if (FD connected) {
>>         presence_parameter = 1;
>> }
>>
>>
>> 2/ Starting job (Backup, Restore):
>>
>> DIR check FD presence;
>> if (FD hasn't got presence_parameter) {                  ----> old configuration
>>         run job like old configuration;
>> }
>> else {                                                   ----> new configuration
>>         if (FD present) {
>>                 run job;
>>         }
>>         else {
>>                 while (WaitTimer isn't terminate) {
>>                         check FD connection all the PresenceTimer interval;
>>                         if (FD connect) {
>>                                 run job;
>>                         }
>>                 }
>>                 Job mark at JSAutomaticallyCanceled;
>>         }
>> }
>>
>>
>>
>>
>> *Any remarks are welcome. We hope this feature to be included in bacula, so
>> we made it with existing clients configuration in mind in order not to
>> disturb existing configurations. *
>>     
>
> My questions are above. Aside from the one remark I made above, the only other
> remark I have for the moment (until I see the answers) is to say, it is
> always preferable to announce and discuss a project prior to coding it -- it
> can possibly save you a lot of time recoding it or the horrible frustration
> of having it rejected after you've spent a lot of time on it.
>   

Yes, I know we should have done so. We will try not to forget that for future features.


> Once I have your responses to my questions, I will make my remarks.
>
>
> Best regards,
>
> Kern
>
>
>   


to be completed tomorrow


"Kern Sibbald" wrote on 14/08/2008 21:28:
Hello,

I have a few questions, please see below ...

On Thursday 14 August 2008 17:23:39 Jean-Sébastien Hederer wrote:
  
Hi,

Maxime Rousseau has created a new feature for bacula. This patch has been
created in order to optimize the communications between the File daemon and
the Director daemon. It has been written for 2.4.0. All regression tests
and function tests have been passed for 2.4.0. Patch is ready for 2.4.0.
Maxime is making patch for 2.4.2 and trunk before sending it. Here are some
explanations:



With the new features, Bacula can backup clients who change their IP like
laptops. 
    

Bacula already has a means to backup clients, which change their IP address.  
However, I admit that it could be optimized a bit.

  
The IP change is dynamic: the FD gives its address to the DIR when it signals its presence. We have a parameter to enable or disable this feature.
I had never seen SETIP. Here it is the DIR that controls whether it allows an FD's IP to change, however the console is defined on the FD.

If an FD is contacted by three DIRs, it is possible for only one of them to allow the IP change.

There are less error messages when a job is canceled because of 
the absence of the File Daemon. 
    

Yes, there are probably too many error messages, but then it depends on how 
you configure Bacula ...
  
Yes, but here, if the file daemon is not there, we will be able to send a clear message about its status, because we know it is not available.


  
The communication between FD and DIR become 
bidirectional so connections are more frequent.
    

Maybe I am misunderstanding you, but more frequent connections are bad not 
good.  

  
Yes, there are a few more connections: when the DIR starts and tries to detect the FDs that have the parameter set, and when an FD starts or stops and tells the DIR whether it is available. In return, this avoids connections from the DIR to the FD that pollute the network when the FD is not available.

New features for the DIR:
	- when the DIR start, he tries to connect to the FD. If the connection is
successful, a presence parameter in the Client ressource change to "yes".
Else the presence parameter keep his value "no". 
    

What happens if the Director has 2,000 clients?  
  
It will try to reach every client for which the presence parameter is set. We kept in mind that the DIR is stopped and started only a few times a year; that is how our customers operate. This keeps the number of network communications low.

Does the DIR stall until it contacts them all?  How many resources does it 
take to contact them?
  
The DIR contacts all the FDs for which the parameter is set before continuing.

Further answers tomorrow about parallelizing the contacts.

How long will it take for it to contact the last of the 2,000 clients?  
  
The timers are configurable. We will give an example tomorrow.

If a scheduled job starts for the 2,000th client before the 2,000th client is 
contacted by the startup routine, will the job be retarded in starting?
  
yes.
What happens if I have clients that I don't want the director contacting 
because they are very infrequently used and the jobs are only manually 
started?
  
You either do not set the presence parameter or you set it to false (the default value corresponds to the current behavior).
  
- when the DIR is going to 
start a new job, he checks the presence parameter. If the client is
present, the DIR starts the job, else he waits for him during a time
specified in the Client ressource in the bacula-dir.conf (this parameter is
named "WaitTimer"). He checks if the client is connected at each interval
of a time (attribute "PresenceTimer" in bacula-dir.conf). If the client
never connect himself during the "WaitTimer" time, the job is marked as
"JSAutomaticallyCanceled" in the Catalog. "JSAutomaticallyCanceled" is a
new parameter defined in jcr.h and it means that the job is canceled
because the File daemon has never been connected. 
    

Is it possible to turn off this behavior?  I don't want it for my setup, 
because it is not always possible for my clients to contact the Director.
  
yes, sure.

  
- I have created a new 
file named fd_server.c. It allow the DIR to listen to the File Daemon
connections (the default port is 9104, parameter DIRportFD in Director
ressource of bacula.dir.conf). The parameter MaxClientsPresence defined in
Director ressource in bacula-dir.conf decide how many File Daemons the DIR
can listen simultaneously. - Authentifications fonctions are also
implemented in authenticate.c in src/dird and src/filed.
    

The IANA will never approve of a fourth port for Bacula.  There is no reason 
to have the Director listening on two ports.  These connections should be 
multiplexed on port 9101.
  
OK, we will look into fixing that.

  
New features for the FD:
	- the FD must know the address of the Director which is stocked in the
Director ressource in bacula-fd.conf. 
    

What happens if there are two or three Directors that can contact a File 
daemon as is the case at my site?
  
Answer to follow tomorrow.

  
Also, he knows on which port he is able to contact the DIR (default 9104). 
    

  
- when the FD start, he tries to 
connect to the DIR. If the connection is successful, a presence parameter
in the Client ressource of the Director daemon changes to "yes". Else the
presence parameter keep his default value "no". For the authentification he
uses the existing password between the File Daemon and the Director. The
File Daemon gives his new address to the DIR so if the client is a laptop,
jobs can be run with any IP. 
    

As I mentioned above this capability already exists with SETIP.
  
This is not exactly the same feature.

  
- when the File Daemon stops, he warns the DIR he is going away. After this 
    
warning, presence_parameter = 0 : the DIR
  
knows the client is absent. 
    

So let's say that the client goes away and notifies the Director, then when 
the client starts again, because of some temporary problem it cannot notify 
the director.  Is the client then essentially disabled?
  
Not at all: it notifies the DIR of its presence again.
  
This feature doesn't work on Windows system. 
Perhaps the FD not finished in the same way as it stop on Linux. At least,
on Windows, bacula does not go in the fonction "terminate_filed" in filed.c
so the presence parameter keep his value at 1. ----> Perhaps there is a
possible upgrade to do.
    

It is possible that the Win32 FD gets some serious error on termination so it 
never gets to the terminate_filed() code.  This is also possible any time any 
FD crashes.

  
For the connections at the start of the two Daemons, there is a
retry_interval defined at 10 seconds (if connection fail, retry after 10
seconds) and a max_retry_time defined at 20 seconds (abandon connection
after 20 seconds).

Normally, the old configurations works fine even though files are patched.

If configuration files not exist when we apply the patch, they are created
with a new configuration (Presence parameter, PresenceTimer, WaitTimer,
Address of the Director...). Else you must modify the configuration files:
if the Presence parameter in Client ressource in bacula-dir.conf and the
address attribute in Director ressource in bacula-fd.conf not exist, bacula
will run like an old configuration.



Exemple of a new configuration:


1/ In "bacula-dir.conf"

Director {                            # define myself
  Name = localhost-dir
  DIRport = 9101            # where we listen for UA connections
  DIRportFD = 9104          # where we listen for FD connections -----> NEW
  QueryFile = "/home/rousseaum/bacula/bin/query.sql"
  WorkingDirectory = "/home/rousseaum/bacula/working"
  PidDirectory = "/home/rousseaum/bacula/working"
  Maximum Concurrent Jobs = 1
  Password = "6V2ghmC6A0YUfncxiF5wJJ1x+WAT2BpUD55l1tfaOury"         # Console password
  Messages = Daemon
    

  
  MaxClientsPresence = 20   # How many clients the DIR can listen to simultaneously -----> NEW
    

Why is this needed?

When the client connects to the Dir does it remain connected or does it 
disconnect after announcing its presence?
  

to be discussed
  
}

Client {
  Name = localhost-fd
  Address = localhost
  FDPort = 9102
  Catalog = MyCatalog
  Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
  Presence = yes            # The presence parameter exists -----> NEW
  PresenceTimer = 15        # Maximum time to verify the client presence -----> NEW
  WaitTimer = 60 minutes    # Maximum time to wait for the client -----> NEW
  # PresenceTimer and WaitTimer are defined in seconds by default. We can use
  # minutes, hours, days... like the other temporal parameters in Bacula.
}


2/ In "bacula-fd.conf"

Director {
  Name = localhost-dir
  Address = localhost
  DIRport = 9104            # -----> NEW
  Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"
}





Exemple of a typical communication between the FD and the DIR:

1/ Starting daemons:

1.1/ DIR starts before FD (most frequent situations)

DIR starts;
DIR tries to connect to FD;
if (FD connected) {
	presence_parameter = 1;
}
FD starts;
FD tries to connect to DIR;
if (DIR connected) {
	presence_parameter = 1;
	FD give his new address to DIR;
    

How does the FD pass his address to the DIR? 
  
through network communication on port 9101

  
}

1.2/ FD starts before DIR

FD starts;
FD tries to connect to DIR;
if (DIR connected) {
	presence_parameter = 1;
	FD give his new address to DIR;
}
DIR starts;
DIR tries to connect to FD;
if (FD connected) {
	presence_parameter = 1;
}


2/ Starting job (Backup, Restore):

DIR check FD presence;
if (FD hasn't got presence_parameter) {                  ----> old configuration
	run job like old configuration;
}
else {                                                   ----> new configuration
	if (FD present) {
		run job;
	}
	else {
		while (WaitTimer isn't terminate) {
			check FD connection all the PresenceTimer interval;
			if (FD connect) {
				run job;
			}
		}
		Job mark at JSAutomaticallyCanceled;
	}
}




*Any remarks are welcome. We hope this feature to be included in bacula, so
we made it with existing clients configuration in mind in order not to
disturb existing configurations. *
    

My questions are above. Aside from the one remark I made above, the only other 
remark I have for the moment (until I see the answers) is to say, it is 
always preferable to announce and discuss a project prior to coding it -- it 
can possibly save you a lot of time recoding it or the horrible frustration 
of having it rejected after you've spent a lot of time on it.
  

Yes, I know we should have done so.
Once I have your responses to my questions, I will make my remarks.


Best regards,

Kern


  

_______________________________________________
Bacula-devel mailing list
Bacula-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/bacula-devel

