Re: [Bacula-devel] patch for presence of file daemon


I have a few questions, please see below ...

On Thursday 14 August 2008 17:23:39 Jean-Sébastien Hederer wrote:
> Hi,
> Maxime Rousseau has created a new feature for Bacula. This patch was
> created to optimize communication between the File daemon and
> the Director daemon. It was written for 2.4.0, and all regression tests
> and functional tests pass on 2.4.0. The patch is ready for 2.4.0.
> Maxime is preparing patches for 2.4.2 and trunk before sending it. Here are some
> explanations:
> With the new features, Bacula can back up clients that change their IP address,
> such as laptops.

Bacula already has a means to back up clients that change their IP address.
However, I admit that it could be optimized a bit.

> There are fewer error messages when a job is canceled because of
> the absence of the File Daemon.

Yes, there are probably too many error messages, but then it depends on how 
you configure Bacula ...

> The communication between FD and DIR becomes
> bidirectional, so connections are more frequent.

Maybe I am misunderstanding you, but more frequent connections are bad not 

> New features for the DIR:
> 	- when the DIR starts, it tries to connect to the FD. If the connection is
> successful, a presence parameter in the Client resource changes to "yes".
> Otherwise the presence parameter keeps its value "no".

What happens if the Director has 2,000 clients?  

Does the DIR stall until it contacts them all?  How many resources does it 
take to contact them?

How long will it take for it to contact the last of the 2,000 clients?  

If a scheduled job starts for the 2,000th client before the 2,000th client is 
contacted by the startup routine, will the job be delayed in starting?
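
To make the concern concrete, here is a back-of-the-envelope sketch. It assumes the worst case in which no client answers, and it takes the 20-second max_retry_time quoted further down as the time spent on each absent client. Whether the patch contacts clients one at a time or in parallel is not stated, so both cases are shown; the batch size of 20 is an arbitrary figure chosen for illustration.

```python
def worst_case_sweep_seconds(clients, per_client_timeout, parallel=1):
    """Upper bound on the time to try every client once when none answers."""
    # Clients are handled in batches of `parallel`; each batch can block
    # for the full timeout before the Director gives up on it.
    batches = -(-clients // parallel)  # ceiling division
    return batches * per_client_timeout


serial = worst_case_sweep_seconds(2000, 20)
batched = worst_case_sweep_seconds(2000, 20, parallel=20)
print("one at a time: %.1f hours" % (serial / 3600))
print("20 at a time:  %.1f minutes" % (batched / 60))
```

Under these assumptions a strictly serial sweep of 2,000 absent clients could take about 11 hours, and even 20 at a time it would take over half an hour.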

What happens if I have clients that I don't want the director contacting 
because they are very infrequently used and the jobs are only manually 

> - when the DIR is going to
> start a new job, it checks the presence parameter. If the client is
> present, the DIR starts the job; otherwise it waits for the client for a time
> specified in the Client resource in bacula-dir.conf (this parameter is
> named "WaitTimer"). It checks whether the client is connected at a fixed
> interval (attribute "PresenceTimer" in bacula-dir.conf). If the client
> never connects during the "WaitTimer" time, the job is marked as
> "JSAutomaticallyCanceled" in the Catalog. "JSAutomaticallyCanceled" is a
> new status defined in jcr.h; it means that the job was canceled
> because the File daemon never connected.

Is it possible to turn off this behavior?  I don't want it for my setup, 
because it is not always possible for my clients to contact the Director.

> - I have created a new
> file named fd_server.c. It allows the DIR to listen for File Daemon
> connections (the default port is 9104, parameter DIRportFD in the Director
> resource of bacula-dir.conf). The parameter MaxClientsPresence, defined in the
> Director resource in bacula-dir.conf, sets how many File Daemons the DIR
> can listen to simultaneously.
> - Authentication functions are also
> implemented in authenticate.c in src/dird and src/filed.

The IANA will never approve of a fourth port for Bacula.  There is no reason 
to have the Director listening on two ports.  These connections should be 
multiplexed on port 9101.
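
As a rough illustration of what I mean by multiplexing, the Director could decide which handler a new connection belongs to from its first message. The greeting strings below are invented for this sketch and are not Bacula's actual wire protocol; the function name and the handler labels are hypothetical too.

```python
def classify_peer(hello_line):
    """Pick a handler for a new connection on the single shared port."""
    # Hypothetical greeting formats, invented for this sketch:
    #   "Hello console <name>"  -> user agent session
    #   "Hello fd <name>"       -> File daemon announcing its presence
    parts = hello_line.strip().split()
    if len(parts) != 3 or parts[0] != "Hello":
        return ("reject", None)
    kind, name = parts[1], parts[2]
    if kind == "console":
        return ("console", name)
    if kind == "fd":
        return ("fd_presence", name)
    return ("reject", None)


print(classify_peer("Hello fd localhost-fd\n"))
```

With a dispatch of this kind there is one listening socket and one port to open in firewalls, whatever the number of connection types.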

> New features for the FD:
> 	- the FD must know the address of the Director, which is stored in the
> Director resource in bacula-fd.conf.

What happens if there are two or three Directors that can contact a File 
daemon as is the case at my site?

> It also knows on which port it can contact the DIR (default 9104).

> - when the FD starts, it tries to
> connect to the DIR. If the connection is successful, a presence parameter
> in the Client resource of the Director daemon changes to "yes". Otherwise the
> presence parameter keeps its default value "no". For authentication it
> uses the existing password between the File Daemon and the Director. The
> File Daemon gives its new address to the DIR, so if the client is a laptop,
> jobs can be run with any IP.

As I mentioned above, this capability already exists with SETIP.

> - when the File Daemon stops, it warns the DIR that it is going away. After this
> warning, presence_parameter = 0: the DIR
> knows the client is absent.

So let's say that the client goes away and notifies the Director, then when 
the client starts again, because of some temporary problem it cannot notify 
the director.  Is the client then essentially disabled?

> This feature doesn't work on Windows systems.
> Perhaps the FD does not terminate the same way as it does on Linux. At least
> on Windows, Bacula does not enter the function "terminate_filed" in filed.c,
> so the presence parameter keeps its value of 1. ----> Perhaps there is a
> possible improvement to make here.

It is possible that the Win32 FD gets some serious error on termination so it 
never gets to the terminate_filed() code.  This is also possible any time any 
FD crashes.

> For the connections at the start of the two daemons, there is a
> retry_interval defined at 10 seconds (if the connection fails, retry after 10
> seconds) and a max_retry_time defined at 20 seconds (abandon the connection
> after 20 seconds).
> Normally, old configurations work fine even when the files are patched.
> If the configuration files do not exist when the patch is applied, they are
> created with a new configuration (Presence parameter, PresenceTimer, WaitTimer,
> address of the Director...). Otherwise you must modify the configuration files:
> if the Presence parameter in the Client resource in bacula-dir.conf and the
> Address attribute in the Director resource in bacula-fd.conf do not exist, Bacula
> will run as with an old configuration.
> Example of a new configuration:
> 1/ In "bacula-dir.conf"
> Director {                            # define myself
>   Name = localhost-dir
>   DIRport = 9101            # where we listen for UA connections
>   DIRportFD = 9104          # where we listen for FD connections -----> NEW
>   QueryFile = "/home/rousseaum/bacula/bin/query.sql"
>   WorkingDirectory = "/home/rousseaum/bacula/working"
>   PidDirectory = "/home/rousseaum/bacula/working"
>   Maximum Concurrent Jobs = 1
>   Password = "6V2ghmC6A0YUfncxiF5wJJ1x+WAT2BpUD55l1tfaOury"         # Console password
>   Messages = Daemon

>   MaxClientsPresence = 20   # How many clients the DIR can listen to simultaneously -----> NEW

Why is this needed?

When the client connects to the Dir does it remain connected or does it 
disconnect after announcing its presence?

> }
> Client {
>   Name = localhost-fd
>   Address = localhost
>   FDPort = 9102
>   Catalog = MyCatalog
>   Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"          # password for FileDaemon
>   File Retention = 30 days            # 30 days
>   Job Retention = 6 months            # six months
>   AutoPrune = yes                     # Prune expired Jobs/Files
>   Presence = yes            # The presence parameter exists -----> NEW
>   PresenceTimer = 15        # Maximum time to verify the client presence -----> NEW
>   WaitTimer = 60 minutes    # Maximum time to wait for the client -----> NEW
>   # PresenceTimer and WaitTimer are defined in seconds by default. We can use
>   # minutes, hours, days... like the other time parameters in Bacula.
> }
> 2/ In "bacula-fd.conf"
> Director {
>   Name = localhost-dir
>   Address = localhost
>   DIRport = 9104            # -----> NEW
>   Password = "VfCC+e5Lp87mlgdW58PqkxLRvyM2jcwhGCkBMNOOuzXz"
> }
> Example of a typical communication between the FD and the DIR:
> 1/ Starting daemons:
> 1.1/ DIR starts before FD (most frequent situations)
> DIR starts;
> DIR tries to connect to FD;
> if (FD connected) {
> 	presence_parameter = 1;
> }
> FD starts;
> FD tries to connect to DIR;
> if (DIR connected) {
> 	presence_parameter = 1;
> 	FD gives its new address to DIR;

How does the FD pass his address to the DIR? 

> }
> 1.2/ FD starts before DIR
> FD starts;
> FD tries to connect to DIR;
> if (DIR connected) {
> 	presence_parameter = 1;
> 	FD gives its new address to DIR;
> }
> DIR starts;
> DIR tries to connect to FD;
> if (FD connected) {
> 	presence_parameter = 1;
> }
> 2/ Starting job (Backup, Restore):
> DIR checks FD presence;
> if (FD hasn't got presence_parameter) {    ----> old configuration
> 	run job like old configuration;
> }
> else {    ----> new configuration
> 	if (FD present) {
> 		run job;
> 	}
> 	else {
> 		while (WaitTimer hasn't expired) {
> 			check FD connection every PresenceTimer interval;
> 			if (FD connects) {
> 				run job;
> 			}
> 		}
> 		Job marked as JSAutomaticallyCanceled;
> 	}
> }
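
To check that I read the wait logic correctly, here is a minimal sketch of the loop in Python rather than in Bacula's own code. The function name and the is_present callback are hypothetical; wait_timer and presence_timer stand for WaitTimer and PresenceTimer, in seconds. Unlike the pseudocode as written, it returns as soon as the job can run, so a job that ran cannot also fall through to JSAutomaticallyCanceled.

```python
import time


def wait_for_client(is_present, wait_timer, presence_timer,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll for the client; return "run" or "JSAutomaticallyCanceled"."""
    deadline = clock() + wait_timer
    while True:
        if is_present():
            return "run"
        remaining = deadline - clock()
        if remaining <= 0:
            # WaitTimer expired without the client ever showing up.
            return "JSAutomaticallyCanceled"
        # Sleep one polling interval, but never past the deadline.
        sleep(min(presence_timer, remaining))
```

With the values from the example configuration (PresenceTimer = 15, WaitTimer = 60 minutes), a job slot would poll up to 241 times over the hour before the job is canceled.
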
> *Any remarks are welcome. We hope this feature will be included in Bacula, so
> we wrote it with existing client configurations in mind in order not to
> disturb them. *

My questions are above. Aside from the one remark I made above, the only other 
remark I have for the moment (until I see the answers) is that it is 
always preferable to announce and discuss a project before coding it. Doing so 
can save you a lot of time recoding it, or the horrible frustration 
of having it rejected after you've spent a lot of time on it.

Once I have your responses to my questions, I will make my remarks.

Best regards,


