mmlearn.datasets.core.samplers.CombinedDatasetRatioSampler
- class CombinedDatasetRatioSampler(dataset, ratios=None, num_samples=None, replacement=False, shuffle=True, rank=None, num_replicas=None, drop_last=False, seed=0)
Sampler for weighted sampling from a CombinedDataset.
- Parameters:
dataset (CombinedDataset) – An instance of CombinedDataset to sample from.
ratios (Optional[Sequence[float]], optional, default=None) – A sequence of ratios for sampling from each dataset in the combined dataset. The length of the sequence must equal the number of datasets in the combined dataset (dataset). If None, the length of each dataset in the combined dataset is used as its ratio. The ratios are normalized to sum to 1.
num_samples (Optional[int], optional, default=None) – The number of samples to draw from the combined dataset. If None, the sampler draws as many samples as there are in the combined dataset. When multiplied by each dataset's ratio, this number must yield at least one sample for that dataset.
replacement (bool, default=False) – Whether to sample with replacement or not.
shuffle (bool, default=True) – Whether to shuffle the sampled indices or not. If False, the indices of each dataset will appear in the order they are stored in the combined dataset. This is similar to sequential sampling from each dataset. The datasets that make up the combined dataset are still sampled randomly.
rank (Optional[int], optional, default=None) – Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group.
num_replicas (Optional[int], optional, default=None) – Number of processes participating in distributed training. By default, num_replicas is retrieved from the current distributed group.
drop_last (bool, default=False) – Whether to drop the last incomplete batch or not. If True, the sampler will drop samples to make the number of samples evenly divisible by the number of replicas in distributed mode.
seed (int, default=0) – Random seed used when sampling from the combined dataset and shuffling the sampled indices.
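The interaction between ratios and num_samples described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names normalize_ratios and samples_per_dataset are hypothetical, and rounding down to an integer count is an assumption (the sampler's actual rounding may differ).

```python
from typing import List, Optional, Sequence


def normalize_ratios(
    dataset_lengths: Sequence[int],
    ratios: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return per-dataset sampling probabilities that sum to 1 (hypothetical helper)."""
    if ratios is None:
        # Documented default: each dataset's length serves as its ratio.
        ratios = [float(n) for n in dataset_lengths]
    if len(ratios) != len(dataset_lengths):
        raise ValueError("ratios must have one entry per dataset")
    total = sum(ratios)
    return [r / total for r in ratios]


def samples_per_dataset(
    dataset_lengths: Sequence[int],
    ratios: Optional[Sequence[float]] = None,
    num_samples: Optional[int] = None,
) -> List[int]:
    """Return how many samples each dataset would contribute (hypothetical helper)."""
    probs = normalize_ratios(dataset_lengths, ratios)
    if num_samples is None:
        # Documented default: as many samples as the combined dataset holds.
        num_samples = sum(dataset_lengths)
    # Assumption: counts are rounded down; the library may round differently.
    counts = [int(p * num_samples) for p in probs]
    if any(c < 1 for c in counts):
        # Documented constraint: every dataset must receive at least one sample.
        raise ValueError("num_samples is too small for the given ratios")
    return counts
```

For two datasets of lengths 100 and 300 with no explicit ratios, normalize_ratios gives probabilities 0.25 and 0.75. With ratios=[1, 1] and num_samples=10, samples_per_dataset assigns 5 samples to each dataset regardless of their lengths.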
- dataset
The dataset to sample from.
- Type:
CombinedDataset
- probs
The probabilities for sampling from each dataset in the combined dataset. This is computed from the ratios argument and is normalized to sum to 1.
- Type:
- rank
Rank of the current process within num_replicas.
- Type:
- drop_last
Whether to drop samples to make the number of samples evenly divisible by the number of replicas in distributed mode.
- Type:
bool
- seed
Random seed used when sampling from the combined dataset and shuffling the sampled indices.
- Type:
int
- epoch
Current epoch number, used to set the random seed. In distributed mode, this ensures that each process receives a different random ordering of the samples.
- Type:
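The roles of rank, drop_last, seed and epoch can be illustrated with a small standard-library sketch. It is modeled on the pattern commonly used by distributed samplers such as PyTorch's DistributedSampler and is not taken from this class's source. The function shard_indices is hypothetical, seeding with seed + epoch is an assumption, and padding by repetition when drop_last is False is also an assumption.

```python
import random
from typing import List


def shard_indices(
    indices: List[int],
    rank: int,
    num_replicas: int,
    drop_last: bool = False,
    shuffle: bool = True,
    seed: int = 0,
    epoch: int = 0,
) -> List[int]:
    """Return the slice of indices assigned to one process (hypothetical helper)."""
    indices = list(indices)
    if shuffle:
        # Assumption: the generator is seeded with seed + epoch, so the order
        # changes between epochs but is reproducible for a given epoch.
        random.Random(seed + epoch).shuffle(indices)
    remainder = len(indices) % num_replicas
    if remainder:
        if drop_last:
            # Drop trailing samples so the total divides evenly across replicas.
            indices = indices[: len(indices) - remainder]
        else:
            # Assumption: pad by repeating indices from the start.
            pad = num_replicas - remainder
            indices += [indices[i % len(indices)] for i in range(pad)]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank::num_replicas]
```

With 10 indices and 4 replicas, drop_last=True leaves each process with 2 indices and no overlap between processes, while drop_last=False pads the list to 12 so each process receives 3. Updating the epoch value before each epoch changes the seed and therefore the shuffled order.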