fl4health.servers.evaluate_server module

class EvaluateServer(client_manager, fraction_evaluate, model_checkpoint_path=None, evaluate_config=None, evaluate_metrics_aggregation_fn=None, accept_failures=True, min_available_clients=1, reporters=None)[source]

Bases: Server

__init__(client_manager, fraction_evaluate, model_checkpoint_path=None, evaluate_config=None, evaluate_metrics_aggregation_fn=None, accept_failures=True, min_available_clients=1, reporters=None)[source]
Parameters:
  • client_manager (ClientManager) – Determines the mechanism by which clients are sampled by the server, if they are to be sampled at all.

  • fraction_evaluate (float) – Fraction of clients used during evaluation.

  • model_checkpoint_path (Path | None, optional) – Server-side checkpoint path from which to load the global model. Defaults to None.

  • evaluate_config (dict[str, Scalar] | None, optional) – Configuration dictionary to configure evaluation on clients. Defaults to None.

  • evaluate_metrics_aggregation_fn (MetricsAggregationFn | None, optional) – Metrics aggregation function. Defaults to None.

  • accept_failures (bool, optional) – Whether or not to accept rounds containing failures. Defaults to True.

  • min_available_clients (int, optional) – Minimum number of total clients in the system. Defaults to 1.

  • reporters (Sequence[BaseReporter], optional) – A sequence of FL4Health reporters to which the server should send data.
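
As a usage illustration, the following is a minimal sketch of constructing an EvaluateServer with Flower's SimpleClientManager. The checkpoint path and config keys are hypothetical placeholders, not values prescribed by this module.

    from pathlib import Path

    from flwr.server.client_manager import SimpleClientManager

    from fl4health.servers.evaluate_server import EvaluateServer

    # Hypothetical setup: evaluate a checkpointed global model on every
    # available client, requiring at least two clients to be connected.
    server = EvaluateServer(
        client_manager=SimpleClientManager(),
        fraction_evaluate=1.0,
        model_checkpoint_path=Path("checkpoints/global_model.pt"),  # Illustrative path.
        evaluate_config={"batch_size": 32},  # Hypothetical client-side setting.
        min_available_clients=2,
    )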

aggregate_evaluate(results, failures)[source]

Aggregate evaluation results using the evaluate_metrics_aggregation_fn provided. Note that a dummy loss is returned as we assume that it was packed into the metrics dictionary for this functionality.

Parameters:
  • results (list[tuple[ClientProxy, EvaluateRes]]) – List of result objects that hold the metrics returned from each client, if successful, along with the number of samples used in the evaluation.

  • failures (list[tuple[ClientProxy, EvaluateRes] | BaseException]) – Failures reported by the clients along with the client id, the results that were passed, if any, and the associated exception if one was raised.

Returns:

A dummy float for the “loss” (these are packed with the metrics) and the aggregated metrics dictionary.

Return type:

tuple[float | None, dict[str, Scalar]]
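
As an illustration, a sample-weighted aggregator of the kind that could be supplied as evaluate_metrics_aggregation_fn might look like the sketch below; it assumes every metric value is numeric.

    from flwr.common.typing import Metrics

    def weighted_average(results: list[tuple[int, Metrics]]) -> Metrics:
        # Weight each client's metrics by the number of evaluation samples it used.
        total_examples = sum(num_examples for num_examples, _ in results)
        keys = results[0][1].keys()
        return {
            key: sum(num * metrics[key] for num, metrics in results) / total_examples
            for key in keys
        }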

configure_evaluate()[source]

Configure the next round of evaluation. This handles the two different ways that a set of clients might be sampled.

Returns:

List of configuration instructions for the clients selected by the client manager for evaluation. These configuration objects are sent to the clients to customize evaluation.

Return type:

list[tuple[ClientProxy, EvaluateIns]]
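
Conceptually, each selected client is paired with an EvaluateIns object carrying the current parameters and the evaluate_config. The sketch below illustrates the shape of that output with stand-in values; it is not the library's exact sampling logic.

    import numpy as np
    from flwr.common import EvaluateIns, ndarrays_to_parameters
    from flwr.server.client_manager import SimpleClientManager

    client_manager = SimpleClientManager()  # Assumes clients have already registered.
    parameters = ndarrays_to_parameters([np.zeros(3)])  # Stand-in global weights.

    # Each sampled client receives the same instructions object.
    clients = client_manager.sample(num_clients=2, min_num_clients=2)
    evaluate_ins = EvaluateIns(parameters=parameters, config={"batch_size": 32})
    instructions = [(client, evaluate_ins) for client in clients]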

federated_evaluate(timeout)[source]

Validate current global model on a number of clients.

Parameters:

timeout (float | None) – Timeout in seconds that the server should wait for the clients to respond. If None, then it will wait indefinitely for the minimum number of clients to respond.

Returns:

The first value is the loss, which is ignored since we pack the losses from the global and local models into the metrics dictionary. The second is the aggregated metrics passed from the clients. The third is the set of raw results and failure objects returned by the clients.

Return type:

tuple[float | None, dict[str, Scalar], EvaluateResultsAndFailures] | None
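
A sketch of unpacking the result, assuming the server constructed in the earlier sketch with clients connected; note that EvaluateResultsAndFailures is itself a (results, failures) pair.

    # Hypothetical direct call; in normal use, fit() drives federated_evaluate.
    evaluate_round = server.federated_evaluate(timeout=30.0)
    if evaluate_round is not None:
        dummy_loss, aggregated_metrics, (results, failures) = evaluate_round
        print(aggregated_metrics)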

fit(num_rounds, timeout)[source]

In order to head off training and only run evaluation, we have to override the fit function, as it is essentially the entry point for federated learning from the app.

Parameters:
  • num_rounds (int) – Not used.

  • timeout (float | None) – Timeout in seconds that the server should wait for the clients to respond. If none, then it will wait for the minimum number to respond indefinitely.

Returns:

The first element of the tuple is a History object containing the aggregated metrics returned from the clients. The second element is the elapsed time in seconds for the round.

Return type:

tuple[History, float]
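
Because fit is the entry point invoked when the app starts, an EvaluateServer is typically launched through flwr.server.start_server; a minimal sketch with an illustrative address follows.

    import flwr as fl

    fl.server.start_server(
        server_address="0.0.0.0:8080",  # Illustrative address.
        server=server,  # The EvaluateServer instance from the earlier sketch.
        config=fl.server.ServerConfig(num_rounds=1),  # num_rounds is not used by fit.
    )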

load_model_checkpoint_to_parameters()[source]

Load the checkpointed global model from model_checkpoint_path and return its weights as Parameters.

Return type:

Parameters
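
A minimal sketch of the kind of conversion this method performs, assuming a PyTorch model was pickled whole to the checkpoint path; this is an illustration, not necessarily the library's exact implementation.

    import torch
    from flwr.common import Parameters, ndarrays_to_parameters

    def checkpoint_to_parameters(checkpoint_path: str) -> Parameters:
        # Load the pickled model and convert its weights to Flower Parameters.
        model = torch.load(checkpoint_path)
        ndarrays = [tensor.cpu().numpy() for tensor in model.state_dict().values()]
        return ndarrays_to_parameters(ndarrays)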