fl4health.datasets.rxrx1.preprocess module

filter_and_save_data(metadata, top_sirna_ids, cell_type, output_path)[source]

Filters the metadata to rows of the given cell type whose sirna_id is among the most frequent values (top_sirna_ids), then saves the filtered rows to a CSV file.

Parameters:
  • metadata (pd.DataFrame) – Metadata containing information about all images.

  • top_sirna_ids (list[int]) – Top sirna_id values to filter by.

  • cell_type (str) – Cell type to filter by.

  • output_path (Path) – Path to save the filtered metadata.

Return type:

None
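
The filtering described above can be sketched with pandas. This is a minimal illustration, not the module's actual implementation: the function name filter_and_save_sketch is hypothetical, and the column names "cell_type" and "sirna_id" are assumptions about the metadata layout.

```python
import tempfile
from pathlib import Path

import pandas as pd


def filter_and_save_sketch(metadata, top_sirna_ids, cell_type, output_path):
    """Hypothetical stand-in for filter_and_save_data."""
    # Assumed column names: "cell_type" and "sirna_id".
    mask = (metadata["cell_type"] == cell_type) & metadata["sirna_id"].isin(top_sirna_ids)
    # Write only the matching rows, without the DataFrame index.
    metadata[mask].to_csv(output_path, index=False)


# Toy metadata for illustration only.
metadata = pd.DataFrame(
    {
        "cell_type": ["HUVEC", "HUVEC", "U2OS", "HUVEC"],
        "sirna_id": [1, 2, 1, 3],
    }
)
out_path = Path(tempfile.mkdtemp()) / "filtered.csv"
filter_and_save_sketch(metadata, [1, 2], "HUVEC", out_path)
result = pd.read_csv(out_path)
```

Here only the two HUVEC rows with sirna_id 1 and 2 are kept; the U2OS row and the HUVEC row with sirna_id 3 are dropped.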

load_image(row, root)[source]

Load an image tensor for a given row of metadata.

Parameters:
  • row (dict[str, Any]) – A row of metadata containing experiment, plate, well, and site information.

  • root (Path) – Root directory containing the image files.

Returns:

The loaded image tensor.

Return type:

torch.Tensor
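
As a rough illustration of how a metadata row could map to image files on disk, the sketch below builds one path per channel. The helper name channel_paths_sketch, the directory layout (images/<experiment>/Plate<plate>/<well>_s<site>_w<channel>.png), and the default of three channels are all assumptions; this page does not confirm them, and the sketch stops short of reading the files into a torch.Tensor.

```python
from pathlib import Path
from typing import Any


def channel_paths_sketch(row: dict[str, Any], root: Path, channels: int = 3) -> list[Path]:
    """Hypothetical helper: list the per-channel image paths for one metadata row."""
    # Assumed layout: <root>/images/<experiment>/Plate<plate>/<well>_s<site>_w<channel>.png
    plate_dir = root / "images" / str(row["experiment"]) / f"Plate{row['plate']}"
    return [
        plate_dir / f"{row['well']}_s{row['site']}_w{channel}.png"
        for channel in range(1, channels + 1)
    ]


# Example row with the four fields named in the parameter description.
row = {"experiment": "HEPG2-01", "plate": 1, "well": "B02", "site": 1}
paths = channel_paths_sketch(row, Path("data"))
```

A loader built on this would then open each path and stack the per-channel arrays into a single tensor.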

main(dataset_dir)[source]

Return type:

None

process_data(metadata, input_dir, output_dir, client_num, type_data)[source]

Process the entire dataset, loading image tensors for each row.

Parameters:
  • metadata (pd.DataFrame) – Metadata containing information about all images.

  • input_dir (Path) – Input directory containing the image files.

  • output_dir (Path) – Output directory where the processed image files are saved.

  • client_num (int) – Client number to load data for.

  • type_data (str) – ‘train’ or ‘test’ to specify dataset type.

Return type:

None

save_to_pkl(data, output_path)[source]

Save data to a pickle file.

Parameters:
  • data (torch.Tensor) – Data to save.

  • output_path (str) – Path to the output pickle file.

Return type:

None
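
A pickle round trip of this kind can be sketched with the standard library alone. The function name save_to_pkl_sketch is hypothetical, and a nested list stands in for the torch.Tensor so the example has no dependencies beyond the standard library; whether the real function uses pickle.dump directly is an assumption.

```python
import pickle
import tempfile
from pathlib import Path


def save_to_pkl_sketch(data, output_path):
    """Hypothetical stand-in for save_to_pkl."""
    # Pickle files must be opened in binary mode.
    with open(output_path, "wb") as f:
        pickle.dump(data, f)


pkl_path = str(Path(tempfile.mkdtemp()) / "example.pkl")
payload = [[0.0, 1.0], [2.0, 3.0]]  # stand-in for a torch.Tensor
save_to_pkl_sketch(payload, pkl_path)

# Read the file back to confirm the data survives the round trip.
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.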