fl4health.servers.base_server module¶
- class FlServer(client_manager, fl_config, strategy=None, reporters=None, checkpoint_and_state_module=None, on_init_parameters_config_fn=None, server_name=None, accept_failures=True)[source]¶
Bases:
Server
- __init__(client_manager, fl_config, strategy=None, reporters=None, checkpoint_and_state_module=None, on_init_parameters_config_fn=None, server_name=None, accept_failures=True)[source]¶
Base server for the library, used to facilitate attaching additional, useful machinery to the base flwr server.
- Parameters:
client_manager (ClientManager) – Determines the mechanism by which clients are sampled by the server, if they are to be sampled at all.
fl_config (Config) – This should be the configuration that was used to set up the federated training. In most cases it should be the “source of truth” for how FL training/evaluation should proceed. For example, the config used to produce the on_fit_config_fn and on_evaluate_config_fn for the strategy. NOTE: This config is DISTINCT from the Flwr server config, which is extremely minimal.
strategy (Strategy | None, optional) – The aggregation strategy to be used by the server to handle client updates and other information potentially sent by the participating clients. If None, the strategy is FedAvg as set by the flwr Server. Defaults to None.
reporters (Sequence[BaseReporter] | None, optional) – Sequence of FL4Health reporters to which the server should send data before and after each round. Defaults to None.
checkpoint_and_state_module (BaseServerCheckpointAndStateModule | None, optional) – This module is used to handle both model checkpointing and state checkpointing. The former is aimed at saving model artifacts to be used or evaluated after training. The latter is used to preserve training state (including models) such that if FL training is interrupted, the process may be restarted. If no module is provided, no checkpointing or state preservation will happen. Defaults to None.
on_init_parameters_config_fn (Callable[[int], dict[str, Scalar]] | None, optional) – Function used to configure how one asks a client to provide parameters from which to initialize all other clients by providing a Config dictionary. If this is none, then a blank config is sent with the parameter request (which is default behavior for flower servers). Defaults to None.
server_name (str | None, optional) – An optional string name to uniquely identify the server. This name is also used as part of any state checkpointing done by the server. Defaults to None.
accept_failures (bool, optional) – Determines whether the server should accept failures during training or evaluation from clients or not. If set to False, this will cause the server to shutdown all clients and throw an exception. Defaults to True.
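Below is a minimal construction sketch. The SimpleClientManager, FedAvg strategy, and config keys used here are illustrative assumptions, not requirements of this API.

```python
# Minimal, hedged construction sketch for FlServer. The fl_config keys
# below are placeholders; in practice this should be the config used to
# set up federated training.
from flwr.server.client_manager import SimpleClientManager
from flwr.server.strategy import FedAvg

from fl4health.servers.base_server import FlServer

fl_config = {"n_server_rounds": 3, "batch_size": 32}  # assumed example keys
server = FlServer(
    client_manager=SimpleClientManager(),
    fl_config=fl_config,
    strategy=FedAvg(),
    server_name="example_server",
)
```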
- evaluate_round(server_round, timeout)[source]¶
Validate the current global model on a number of clients.
- fit(num_rounds, timeout)[source]¶
Run federated learning for a number of rounds. This function also allows the server to perform some operations prior to the start of fitting. This is useful, for example, if you need to communicate with the clients to initialize anything prior to FL starting (see the nnunet server for an example).
- Parameters:
num_rounds (int) – The number of rounds of federated training to perform.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results in each round. If None, the server will wait indefinitely.
- Returns:
- The first element of the tuple is a History object containing the full set of
FL training results, including things like aggregated losses and metrics. The second element of the tuple is the elapsed time in seconds.
- Return type:
tuple[History, float]
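As a hedged usage sketch, training is typically driven through the standard flwr entry point, which invokes fit for the configured number of rounds; the server address and round count below are placeholders.

```python
# Hedged usage sketch: flwr's start_server drives FlServer.fit for the
# configured number of rounds. Address and round count are placeholders.
import flwr as fl
from flwr.server.client_manager import SimpleClientManager

from fl4health.servers.base_server import FlServer

server = FlServer(client_manager=SimpleClientManager(), fl_config={})
fl.server.start_server(
    server_address="0.0.0.0:8080",
    server=server,
    config=fl.server.ServerConfig(num_rounds=3),
)
```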
- fit_round(server_round, timeout)[source]¶
This function is called at each round of federated training. The flow is generally the same as a flower server, where clients are sampled and client-side training is requested from the chosen clients. This function simply adds a bit of logging and post-processing of the results.
- Parameters:
server_round (int) – The current round of federated training.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results. If None, the server will wait indefinitely.
- Returns:
- The results of training
on the client side. The first set of parameters are the AGGREGATED parameters from the strategy. The second is a dictionary of AGGREGATED metrics. The third component holds the individual (non-aggregated) parameters, loss, and metrics for successful and unsuccessful client-side training.
- Return type:
tuple[Parameters | None, dict[str, Scalar], FitResultsAndFailures] | None
- fit_with_per_round_checkpointing(num_rounds, timeout)[source]¶
Runs federated learning for a number of rounds. Heavily based on the fit method from the base server provided by flower (flwr.server.server.Server), except that it is resilient to preemptions. It accomplishes this by checkpointing the server state each round. In the case of preemption, when the server is restarted, it will load from the most recent checkpoint.
- Parameters:
num_rounds (int) – The number of rounds of federated training to perform.
timeout (float | None) – The timeout, in seconds, that the server should wait for clients to return results in each round. If None, the server will wait indefinitely.
- Returns:
- The first element of the tuple is a history object containing the losses and
metrics computed during training and validation. The second element of the tuple is the elapsed time in seconds.
- Return type:
tuple[History, float]
- poll_clients_for_sample_counts(timeout)[source]¶
Poll clients for sample counts from their training sets. If you want to use this functionality, your strategy needs to inherit from the StrategyWithPolling ABC and implement a configure_poll function, as sketched below.
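The sketch below illustrates the shape of a polling-capable strategy. The import path of StrategyWithPolling and the exact configure_poll signature are assumptions here; consult the fl4health source for the authoritative interface.

```python
# Hedged sketch of a strategy that supports server-side polling. The
# StrategyWithPolling import path and configure_poll signature are
# assumptions, not confirmed API.
from flwr.common import GetPropertiesIns
from flwr.server.client_manager import ClientManager
from flwr.server.strategy import FedAvg

from fl4health.strategies.strategy_with_polling import StrategyWithPolling  # assumed path


class FedAvgWithPolling(FedAvg, StrategyWithPolling):
    def configure_poll(self, server_round: int, client_manager: ClientManager):
        # Ask every available client for properties (e.g. its training
        # sample count) via the standard flwr get_properties mechanism.
        ins = GetPropertiesIns(config={"current_server_round": server_round})
        return [(client, ins) for client in client_manager.all().values()]
```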
- shutdown()[source]¶
Currently just records termination of the server process and disconnects any reporters that need to be disconnected.
- Return type:
None
- update_before_fit(num_rounds, timeout)[source]¶
Hook method to allow the server to do some work before starting the fit process. In the base server, it is a no-op function, but it can be overridden in child classes for custom functionality. For example, the NnUNetServer class uses this method to ask a client to initialize the global nnunet plans if one is not provided in the config. This can only be done after the clients have started up and are ready to train.
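A minimal sketch of overriding this hook in a subclass follows; the subclass name and logging are illustrative only.

```python
# Hedged sketch: overriding the update_before_fit hook to run one-time
# setup after clients have connected but before fitting begins.
import logging

from fl4health.servers.base_server import FlServer


class MyServer(FlServer):  # hypothetical subclass
    def update_before_fit(self, num_rounds: int, timeout: float | None) -> None:
        # Custom pre-fit work goes here, e.g. polling clients for metadata.
        logging.info("Preparing to run %d rounds of FL", num_rounds)
```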