fl4health.servers.base_server module¶
- class FlServer(client_manager, fl_config, strategy=None, reporters=None, checkpoint_and_state_module=None, on_init_parameters_config_fn=None, server_name=None, accept_failures=True)[source]¶
Bases:
Server
- __init__(client_manager, fl_config, strategy=None, reporters=None, checkpoint_and_state_module=None, on_init_parameters_config_fn=None, server_name=None, accept_failures=True)[source]¶
Base server for the library, used to facilitate attaching additional, useful machinery to the base flwr server.
- Parameters:
client_manager (ClientManager) – Determines the mechanism by which clients are sampled by the server, if they are to be sampled at all.
fl_config (Config) – This should be the configuration that was used to set up the federated training. In most cases it should be the “source of truth” for how FL training/evaluation should proceed. For example, the config used to produce the on_fit_config_fn and on_evaluate_config_fn for the strategy. NOTE: This config is DISTINCT from the Flwr server config, which is extremely minimal.
strategy (Strategy | None, optional) – The aggregation strategy to be used by the server to handle client updates and other information potentially sent by the participating clients. If None, the strategy is FedAvg as set by the flwr Server. Defaults to None.
reporters (Sequence[BaseReporter] | None, optional) – Sequence of FL4Health reporters to which the server should send data before and after each round. Defaults to None.
checkpoint_and_state_module (BaseServerCheckpointAndStateModule | None, optional) – This module is used to handle both model checkpointing and state checkpointing. The former is aimed at saving model artifacts to be used or evaluated after training. The latter is used to preserve training state (including models) such that if FL training is interrupted, the process may be restarted. If no module is provided, no checkpointing or state preservation will happen. Defaults to None.
on_init_parameters_config_fn (Callable[[int], dict[str, Scalar]] | None, optional) – Function used to configure how one asks a client to provide parameters from which to initialize all other clients by providing a Config dictionary. If this is none, then a blank config is sent with the parameter request (which is default behavior for flower servers). Defaults to None.
server_name (str | None, optional) – An optional string name to uniquely identify the server. This name is also used as part of any state checkpointing done by the server. Defaults to None.
accept_failures (bool, optional) – Determines whether the server should accept failures during training or evaluation from clients or not. If set to False, this will cause the server to shutdown all clients and throw an exception. Defaults to True.
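Below is a minimal construction sketch. The SimpleClientManager, FedAvg strategy, and config keys used here are illustrative assumptions, not requirements of this API.

```python
# Minimal, hedged construction sketch for FlServer. The fl_config keys
# below are placeholders; in practice this should be the config used to
# set up federated training.
from flwr.server.client_manager import SimpleClientManager
from flwr.server.strategy import FedAvg

from fl4health.servers.base_server import FlServer

fl_config = {"n_server_rounds": 3, "batch_size": 32}  # assumed example keys
server = FlServer(
    client_manager=SimpleClientManager(),
    fl_config=fl_config,
    strategy=FedAvg(),
    server_name="example_server",
)
```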
- evaluate_round(server_round, timeout)[source]¶
Validate the current global model on a number of clients.
- fit(num_rounds, timeout)[source]¶
Run federated learning for a number of rounds. This function also allows the server to perform some operations prior to the start of fitting. This is useful, for example, if you need to communicate with the clients to initialize anything prior to FL starting (see the nnunet server for an example).
- Parameters:
num_rounds (int) – The number of rounds of federated training to perform.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results in each round. If None, the server will wait indefinitely.
- Returns:
- The first element of the tuple is a History object containing the full set of
FL training results, including things like aggregated losses and metrics. The second element of the tuple is the elapsed time in seconds.
- Return type:
tuple[History, float]
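As a hedged usage sketch, training is typically driven through the standard flwr entry point, which invokes fit for the configured number of rounds; the server address and round count below are placeholders.

```python
# Hedged usage sketch: flwr's start_server drives FlServer.fit for the
# configured number of rounds. Address and round count are placeholders.
import flwr as fl
from flwr.server.client_manager import SimpleClientManager

from fl4health.servers.base_server import FlServer

server = FlServer(client_manager=SimpleClientManager(), fl_config={})
fl.server.start_server(
    server_address="0.0.0.0:8080",
    server=server,
    config=fl.server.ServerConfig(num_rounds=3),
)
```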
- fit_round(server_round, timeout)[source]¶
This function is called at each round of federated training. The flow is generally the same as a flower server, where clients are sampled and client-side training is requested from the chosen clients. This function simply adds a bit of logging and post-processing of the results.
- Parameters:
server_round (int) – The current round of federated training.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results. If None, the server will wait indefinitely.
- Returns:
- The results of training
on the client side. The first set of parameters are the AGGREGATED parameters from the strategy. The second is a dictionary of AGGREGATED metrics. The third component holds the individual (non-aggregated) parameters, loss, and metrics for successful and unsuccessful client-side training.
- Return type:
tuple[Parameters | None, dict[str, Scalar], FitResultsAndFailures] | None
- fit_with_per_round_checkpointing(num_rounds, timeout)[source]¶
Runs federated learning for a number of rounds. Heavily based on the fit method from the base server provided by flower (flwr.server.server.Server), except that it is resilient to preemptions. It accomplishes this by checkpointing the server state each round. In the case of preemption, when the server is restarted, it will load from the most recent checkpoint.
- Parameters:
num_rounds (int) – The number of rounds of federated training to perform.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results in each round. If None, the server will wait indefinitely.
- Returns:
- The first element of the tuple is a history object containing the losses and
metrics computed during training and validation. The second element of the tuple is the elapsed time in seconds.
- Return type:
tuple[History, float]
- poll_clients_for_sample_counts(timeout)[source]¶
Poll clients for sample counts from their training sets. If you want to use this functionality, your strategy needs to inherit from the StrategyWithPolling ABC and implement a configure_poll function, as sketched below.
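The sketch below illustrates the shape of a polling-capable strategy. The import path of StrategyWithPolling and the exact configure_poll signature are assumptions here; consult the fl4health source for the authoritative interface.

```python
# Hedged sketch of a strategy that supports server-side polling. The
# StrategyWithPolling import path and configure_poll signature are
# assumptions, not confirmed API.
from flwr.common import GetPropertiesIns
from flwr.server.client_manager import ClientManager
from flwr.server.strategy import FedAvg

from fl4health.strategies.strategy_with_polling import StrategyWithPolling  # assumed path


class FedAvgWithPolling(FedAvg, StrategyWithPolling):
    def configure_poll(self, server_round: int, client_manager: ClientManager):
        # Ask every available client for properties (e.g. its training
        # sample count) via the standard flwr get_properties mechanism.
        ins = GetPropertiesIns(config={"current_server_round": server_round})
        return [(client, ins) for client in client_manager.all().values()]
```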
- shutdown()[source]¶
Currently just records termination of the server process and disconnects any reporters that need to be disconnected.
- Return type:
None
- update_before_fit(num_rounds, timeout)[source]¶
Hook method to allow the server to do some work before starting the fit process. In the base server, it is a no-op function, but it can be overridden in child classes for custom functionality. For example, the NnUNetServer class uses this method to ask a client to initialize the global nnunet plans if one is not provided in the config. This can only be done after the clients have started up and are ready to train.
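A minimal sketch of overriding this hook in a subclass follows; the subclass name and logging are illustrative only.

```python
# Hedged sketch: overriding the update_before_fit hook to run one-time
# setup after clients have connected but before fitting begins.
import logging

from fl4health.servers.base_server import FlServer


class MyServer(FlServer):  # hypothetical subclass
    def update_before_fit(self, num_rounds: int, timeout: float | None) -> None:
        # Custom pre-fit work goes here, e.g. polling clients for metadata.
        logging.info("Preparing to run %d rounds of FL", num_rounds)
```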