mmlearn.datasets.core.example

Module for example-related classes and functions.

Functions

find_matching_indices(first_example_ids, second_example_ids)[source]

Find the indices of matching examples given two tensors of example ids.

Matching examples are defined as examples with the same value in both tensors. This function is useful for finding pairs of examples from different modalities that are related to each other in a batch.

Parameters:
  • first_example_ids (torch.Tensor) – A tensor of example ids of shape (N, 2), where N is the number of examples.

  • second_example_ids (torch.Tensor) – A tensor of example ids of shape (M, 2), where M is the number of examples.

Returns:

A tuple of tensors containing the indices of matching examples in the first and second tensor, respectively.

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:
  • TypeError – If either first_example_ids or second_example_ids is not a tensor.

  • ValueError – If either first_example_ids or second_example_ids is not a 2D tensor with the second dimension having a size of 2.

Examples

>>> import torch
>>> img_example_ids = torch.tensor([(0, 0), (0, 1), (1, 0), (1, 1)])
>>> text_example_ids = torch.tensor([(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)])
>>> find_matching_indices(img_example_ids, text_example_ids)
(tensor([2, 3]), tensor([0, 1]))
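The matching logic described above can be sketched with plain tensor operations. This is a hypothetical re-implementation for illustration, not the library's actual code; the name find_matching_indices_sketch is made up here:

```python
import torch

def find_matching_indices_sketch(first_example_ids, second_example_ids):
    # Hypothetical sketch of the matching logic, assuming rows are
    # (dataset_index, example_index) pairs as described in the docs.
    for ids in (first_example_ids, second_example_ids):
        if not isinstance(ids, torch.Tensor):
            raise TypeError("example ids must be tensors")
        if ids.ndim != 2 or ids.shape[1] != 2:
            raise ValueError("example ids must be 2D tensors of shape (N, 2)")
    # matches[i, j] is True when row i of the first tensor equals
    # row j of the second tensor in both positions.
    matches = (first_example_ids.unsqueeze(1) == second_example_ids.unsqueeze(0)).all(dim=-1)
    first_indices, second_indices = torch.nonzero(matches, as_tuple=True)
    return first_indices, second_indices
```

Broadcasting the comparison over an (N, 1, 2) and a (1, M, 2) view gives an (N, M) boolean match matrix, and torch.nonzero recovers the paired row indices.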

Classes

Example

A representation of a single example from a dataset.

class Example(init_dict=None)[source]

A representation of a single example from a dataset.

This class is a subclass of OrderedDict and provides attribute-style access. This means that example["text"] and example.text are equivalent. All datasets in this library return examples as Example objects.

Parameters:

init_dict (Optional[MutableMapping[Hashable, Any]], optional, default=None) – Dictionary to initialize the Example object with.

Examples

>>> import torch
>>> example = Example({"text": torch.tensor(2)})
>>> example.text.zero_()
tensor(0)
>>> example.context = torch.tensor(4)  # set custom attributes after initialization
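The attribute-style access described above can be sketched as a small OrderedDict subclass. This is a minimal illustration of how example["text"] and example.text can be made equivalent, not the library's actual implementation; the class name AttrDict is made up here:

```python
from collections import OrderedDict

class AttrDict(OrderedDict):
    # Hypothetical sketch: an OrderedDict whose keys are also readable
    # and writable as attributes, mirroring the behaviour described above.
    def __init__(self, init_dict=None):
        super().__init__(init_dict or {})

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict
        # internals are unaffected; missing keys become AttributeError.
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc

    def __setattr__(self, key, value):
        # Route attribute assignment into the underlying mapping.
        self[key] = value

example = AttrDict({"text": "hello"})
example.context = 4  # set custom attributes after initialization
```

Because __getattr__ is only invoked after normal lookup fails, dunder and method lookups still resolve through the class as usual.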
create_ids()[source]

Create a unique id for the example from the dataset and example index.

This method combines the dataset index and example index to create an attribute called example_ids, which is a dictionary of tensors. The dictionary keys are all the keys in the example except for example_ids, example_index, and dataset_index. The values are tensors of shape (2,) containing the tuple (dataset_index, example_index) for each key. The example_ids attribute is used to (re-)identify pairs of examples from different modalities after they have been combined into a batch.

Warns:

UserWarning – If the example_index and dataset_index attributes are not set.

Return type:

None

Notes

  • The Example must have the following attributes set before calling this method: example_index (usually set/returned by the dataset) and dataset_index (usually set by the CombinedDataset object)

  • The find_matching_indices() function can be used to find matching examples given two tensors of example ids.
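The id-building step described above can be sketched on a plain dict standing in for an Example. This is a hypothetical illustration of the documented behaviour, not the library's code; the name create_ids_sketch is made up here:

```python
import torch

def create_ids_sketch(example):
    # 'example' is a plain dict standing in for an Example object.
    # It must already contain "example_index" and "dataset_index",
    # as the documented method warns when they are missing.
    reserved = {"example_ids", "example_index", "dataset_index"}
    pair = torch.tensor([example["dataset_index"], example["example_index"]])
    # One (dataset_index, example_index) tensor of shape (2,) per
    # remaining key, as described in the method documentation.
    example["example_ids"] = {
        key: pair.clone() for key in example if key not in reserved
    }

ex = {"text": "a caption", "example_index": 3, "dataset_index": 1}
create_ids_sketch(ex)
```

After the call, ex["example_ids"] maps "text" to the tensor [1, 3], which find_matching_indices() can later use to pair this example with its counterpart from another modality.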
