
Knowledge Node


NodeContent

Bases: TypedDict

A TypedDict representing the content of a node.

Attributes:

text_content (str | None): The text content of the node, if any.
image_content (bytes | None): The binary image content of the node, if any.

Source code in src/fed_rag/data_structures/knowledge_node.py
class NodeContent(TypedDict):
    """A TypedDict representing the content of a node.

    Attributes:
        text_content: The text content of the node, if any.
        image_content: The binary image content of the node, if any.
    """

    text_content: str | None
    image_content: bytes | None
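
To illustrate how a NodeContent value behaves at runtime, here is a minimal sketch. It re-declares the TypedDict locally (a copy of the definition above, so the snippet runs without fed_rag installed); the variable names are illustrative only.

```python
from __future__ import annotations

from typing import TypedDict


# Local copy of the TypedDict shown above, so the snippet is standalone.
class NodeContent(TypedDict):
    text_content: str | None
    image_content: bytes | None


# Both keys must be present for a type checker to accept the value,
# but either value may be None.
text_only: NodeContent = {"text_content": "hello", "image_content": None}

# At runtime a TypedDict instance is an ordinary dict.
is_plain_dict = isinstance(text_only, dict)
keys = sorted(text_only)
```

Because the type is purely a static annotation, nothing checks the keys or value types at runtime; that is left to a type checker.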

NodeType

Bases: str, Enum

Type of node.

Attributes:

TEXT: Text node.
IMAGE: Image node.
MULTIMODAL: Multimodal node.

Source code in src/fed_rag/data_structures/knowledge_node.py
class NodeType(str, Enum):
    """Type of node.

    Attributes:
        TEXT: Text node.
        IMAGE: Image node.
        MULTIMODAL: Multimodal node.

    """

    TEXT = "text"
    IMAGE = "image"
    MULTIMODAL = "multimodal"
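
Because NodeType mixes in str, its members can be looked up by their raw value and compared directly with plain strings. The sketch below re-declares the enum locally (a copy of the definition above) so that it runs on its own; the variable names are illustrative.

```python
from enum import Enum


# Local copy of the enum shown above, so the snippet is standalone.
class NodeType(str, Enum):
    TEXT = "text"
    IMAGE = "image"
    MULTIMODAL = "multimodal"


# Lookup by value returns the existing member.
parsed = NodeType("image")

# Members compare equal to their raw string values because of the str mixin.
equals_plain_string = NodeType.TEXT == "text"

# All raw values, in declaration order.
all_values = [member.value for member in NodeType]
```

Looking up a value that is not defined, such as NodeType("audio"), raises ValueError.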

KnowledgeNode

Bases: BaseModel

Represents a knowledge node with metadata, content, and embeddings.

A KnowledgeNode stores text, image, or multimodal content along with an optional embedding and metadata. Validators enforce the fields each node type requires: text_content for TEXT and MULTIMODAL nodes, and image_content for IMAGE and MULTIMODAL nodes. Metadata is serialized to a JSON string when the model is dumped, and a JSON string passed as metadata is deserialized back into a dictionary.

Attributes:

node_id (str): Unique identifier for the node. Defaults to a randomly generated UUID4 string.
embedding (list[float] | None): Encoded semantic representation. In multimodal nodes, a single embedding is shared by the text and the image. Defaults to None.
node_type (NodeType): Type of node (TEXT, IMAGE, or MULTIMODAL).
text_content (str | None): Text content of the node. Required for TEXT and MULTIMODAL nodes.
image_content (bytes | None): Binary image data. Required for IMAGE and MULTIMODAL nodes.
metadata (dict): Arbitrary key-value metadata associated with the node. Defaults to an empty dictionary.
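
The per-type requirements can be restated as a plain function. The sketch below is not part of fed_rag: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and plain strings stand in for NodeType members so that the snippet runs without the library. It only mirrors the None checks that the two content validators perform.

```python
from __future__ import annotations

# Which content fields each node type requires, per the attribute
# descriptions above. Keys are the NodeType values.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "text": ("text_content",),
    "image": ("image_content",),
    "multimodal": ("text_content", "image_content"),
}


def missing_fields(
    node_type: str,
    text_content: str | None = None,
    image_content: bytes | None = None,
) -> list[str]:
    """Hypothetical helper: names of required fields that are None."""
    provided = {"text_content": text_content, "image_content": image_content}
    return [
        name for name in REQUIRED_FIELDS[node_type] if provided[name] is None
    ]
```

For example, missing_fields("image", text_content="hi") reports that image_content is missing. As in the validators, the check is against None only, so an empty string or empty bytes value passes.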

Source code in src/fed_rag/data_structures/knowledge_node.py
class KnowledgeNode(BaseModel):
    """Represents a knowledge node with metadata, content, and embeddings.

    A KnowledgeNode can store text, image, or multimodal content types, along
    with semantic embeddings and metadata. Validation rules enforce correctness
    of required fields based on node type. Metadata can be serialized to or
    deserialized from JSON for storage or communication.

    Attributes:
        node_id (str): Unique identifier for the node, generated by default.
        embedding (list[float] | None): Encoded semantic representation. Shared
            embedding for text and image in multimodal nodes.
        node_type (NodeType): Type of node (TEXT, IMAGE, or MULTIMODAL).
        text_content (str | None): Text content of the node. Required for TEXT
            and MULTIMODAL nodes.
        image_content (bytes | None): Binary image data for IMAGE and MULTIMODAL nodes.
        metadata (dict): Arbitrary key-value metadata associated with the node.
    """

    model_config = ConfigDict(validate_default=True)

    node_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    embedding: list[float] | None = Field(
        description="Encoded representation of node. Shared between image and text in multimodal nodes.",
        default=None,
    )
    node_type: NodeType = Field(description="Type of node.")
    text_content: str | None = Field(
        description="Text content. Required for TEXT and MULTIMODAL nodes.",
        default=None,
    )
    image_content: bytes | None = Field(
        description="Binary image data for IMAGE and MULTIMODAL nodes.",
        default=None,
    )
    metadata: dict = Field(
        description="Metadata for node.", default_factory=dict
    )

    @field_validator("text_content", mode="before")
    @classmethod
    def validate_text_content(
        cls, value: str | None, info: ValidationInfo
    ) -> str | None:
        """
        Validates the `text_content` field based on the `node_type` before assignment. Ensures
        that when certain `node_type` values are provided, `text_content` is not None.

        Parameters:
        value: str | None
            The value of the `text_content` field to validate.
        info: ValidationInfo
            Additional context about the data being validated.

        Returns:
        str | None
            The validated `text_content` value.

        Raises:
        ValueError
            If `node_type` is `TEXT` or `MULTIMODAL` and `text_content` is None.
        """
        node_type = cast(NodeType, info.data.get("node_type"))
        if node_type in {NodeType.TEXT, NodeType.MULTIMODAL} and value is None:
            raise ValueError(
                f"NodeType == '{node_type.value}', but text_content is None."
            )
        return value

    @field_validator("image_content", mode="after")
    @classmethod
    def validate_image_content(
        cls, value: bytes | None, info: ValidationInfo
    ) -> bytes | None:
        """
        Validates the `image_content` field based on the associated `node_type`.

        This method ensures that when the `node_type` of a node is either `IMAGE` or
        `MULTIMODAL`, the `image_content` field is not left empty. If `image_content`
        remains unset, a `ValueError` exception is raised. Otherwise, the method
        returns the validated `image_content`.

        Parameters:
        value (bytes | None): The value of the `image_content` field to be validated.
        info (ValidationInfo): Additional validation information containing metadata
        about the node, including the `node_type`.

        Returns:
        bytes | None: The validated `image_content` field.

        Raises:
        ValueError: If the `node_type` is `IMAGE` or `MULTIMODAL` and the
        `image_content` field is None.
        """
        node_type = cast(NodeType, info.data.get("node_type"))
        if (
            node_type in {NodeType.IMAGE, NodeType.MULTIMODAL}
            and value is None
        ):
            raise ValueError(
                f"NodeType == '{node_type.value}', but image_content is None."
            )
        return value

    def get_content(self) -> NodeContent:
        """Return the node's content.

        Returns:
            NodeContent: Dictionary with `image_content` and `text_content`.
        """
        return {
            "image_content": self.image_content,
            "text_content": self.text_content,
        }

    @field_serializer("metadata")
    def serialize_metadata(
        self, metadata: dict[Any, Any] | None
    ) -> str | None:
        """
        Serializes the metadata dictionary into a JSON string.

        This method serves as a serializer for the `metadata` field, converting
        a dictionary into its JSON representation. If the input dictionary is None,
        the method will return None. Such serialized data can be utilized in situations
        where JSON representation of metadata is required.

        Parameters:
        metadata: dict[Any, Any] | None
            A dictionary containing metadata to be serialized. It can also be None.

        Returns:
        str | None
            The JSON string representation of the metadata dictionary, or None if the
            metadata input was None.
        """
        return json.dumps(metadata) if metadata else None

    @field_validator("metadata", mode="before")
    @classmethod
    def deserialize_metadata(
        cls, metadata: dict[Any, Any] | str | None
    ) -> dict[Any, Any]:
        """Custom validator for the metadata field.

        Will deserialize the metadata from a json string if it's a string.

        Args:
            metadata: Metadata to validate. If it is a json string, it will be deserialized into a dictionary.
        Returns: Validated metadata.

        """
        if isinstance(metadata, str):
            deserialized_metadata = json.loads(metadata)
            return cast(dict[Any, Any], deserialized_metadata)
        return metadata or {}

    def model_dump_without_embeddings(self) -> dict[str, Any]:
        """
        Returns a dictionary representation of the model excluding the embeddings.

        This method is used to generate a dictionary dump of the current model's
        state while specifically excluding the 'embedding' field. It is particularly
        useful for exporting or serializing model data without including large
        or sensitive attributes.

        Returns:
            dict[str, Any]: A dictionary containing the model's data excluding
            attributes related to embeddings.
        """
        return self.model_dump(exclude={"embedding"})

validate_text_content classmethod

validate_text_content(value, info)

Validates the text_content field against the node_type before assignment. Ensures that text_content is not None when node_type is TEXT or MULTIMODAL.

Parameters:

value (str | None): The value of the text_content field to validate.
info (ValidationInfo): Additional context about the data being validated, including the already-validated node_type.

Returns:

str | None: The validated text_content value.

Raises:

ValueError: If node_type is TEXT or MULTIMODAL and text_content is None.

Source code in src/fed_rag/data_structures/knowledge_node.py
@field_validator("text_content", mode="before")
@classmethod
def validate_text_content(
    cls, value: str | None, info: ValidationInfo
) -> str | None:
    """
    Validates the `text_content` field based on the `node_type` before assignment. Ensures
    that when certain `node_type` values are provided, `text_content` is not None.

    Parameters:
    value: str | None
        The value of the `text_content` field to validate.
    info: ValidationInfo
        Additional context about the data being validated.

    Returns:
    str | None
        The validated `text_content` value.

    Raises:
    ValueError
        If `node_type` is `TEXT` or `MULTIMODAL` and `text_content` is None.
    """
    node_type = cast(NodeType, info.data.get("node_type"))
    if node_type in {NodeType.TEXT, NodeType.MULTIMODAL} and value is None:
        raise ValueError(
            f"NodeType == '{node_type.value}', but text_content is None."
        )
    return value

validate_image_content classmethod

validate_image_content(value, info)

Validates the image_content field based on the associated node_type.

This method ensures that when the node_type of a node is either IMAGE or MULTIMODAL, the image_content field is not left empty. If image_content remains unset, a ValueError exception is raised. Otherwise, the method returns the validated image_content.

Parameters:

value (bytes | None): The value of the image_content field to be validated.
info (ValidationInfo): Additional validation information containing metadata about the node, including the node_type.

Returns:

bytes | None: The validated image_content field.

Raises:

ValueError: If the node_type is IMAGE or MULTIMODAL and the image_content field is None.

Source code in src/fed_rag/data_structures/knowledge_node.py
@field_validator("image_content", mode="after")
@classmethod
def validate_image_content(
    cls, value: bytes | None, info: ValidationInfo
) -> bytes | None:
    """
    Validates the `image_content` field based on the associated `node_type`.

    This method ensures that when the `node_type` of a node is either `IMAGE` or
    `MULTIMODAL`, the `image_content` field is not left empty. If `image_content`
    remains unset, a `ValueError` exception is raised. Otherwise, the method
    returns the validated `image_content`.

    Parameters:
    value (bytes | None): The value of the `image_content` field to be validated.
    info (ValidationInfo): Additional validation information containing metadata
    about the node, including the `node_type`.

    Returns:
    bytes | None: The validated `image_content` field.

    Raises:
    ValueError: If the `node_type` is `IMAGE` or `MULTIMODAL` and the
    `image_content` field is None.
    """
    node_type = cast(NodeType, info.data.get("node_type"))
    if (
        node_type in {NodeType.IMAGE, NodeType.MULTIMODAL}
        and value is None
    ):
        raise ValueError(
            f"NodeType == '{node_type.value}', but image_content is None."
        )
    return value

get_content

get_content()

Return the node's content.

Returns:

NodeContent: Dictionary with image_content and text_content.
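
A caller typically branches on which of the two keys is populated. The following sketch shows one way to consume the returned dictionary; `summarize` is a hypothetical helper, and NodeContent is re-declared locally so that the snippet runs without fed_rag.

```python
from __future__ import annotations

from typing import TypedDict


# Local copy of the TypedDict returned by get_content().
class NodeContent(TypedDict):
    text_content: str | None
    image_content: bytes | None


def summarize(content: NodeContent) -> str:
    """Hypothetical consumer of the dictionary get_content() returns."""
    parts = []
    if content["text_content"] is not None:
        parts.append(f"text ({len(content['text_content'])} chars)")
    if content["image_content"] is not None:
        parts.append(f"image ({len(content['image_content'])} bytes)")
    return " + ".join(parts) if parts else "empty"
```

Both keys are always present in the result, so indexing with square brackets is safe; only the values may be None.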

Source code in src/fed_rag/data_structures/knowledge_node.py
def get_content(self) -> NodeContent:
    """Return the node's content.

    Returns:
        NodeContent: Dictionary with `image_content` and `text_content`.
    """
    return {
        "image_content": self.image_content,
        "text_content": self.text_content,
    }

serialize_metadata

serialize_metadata(metadata)

Serializes the metadata dictionary into a JSON string.

This method is the serializer for the metadata field: it converts the dictionary into its JSON string representation whenever the model is dumped. If the metadata is None or an empty dictionary, it returns None.

Parameters:

metadata (dict[Any, Any] | None): A dictionary containing metadata to be serialized. It can also be None.

Returns:

str | None: The JSON string representation of the metadata dictionary, or None if the metadata was None or empty.
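
The behavior for empty input is easy to miss, so here is a standalone sketch. It copies the serializer's one-line body into a free function (the real method is bound to the model through pydantic's field_serializer decorator); the sample metadata is made up.

```python
from __future__ import annotations

import json
from typing import Any


def serialize_metadata(metadata: dict[Any, Any] | None) -> str | None:
    """Standalone copy of the serializer's body, for illustration."""
    return json.dumps(metadata) if metadata else None


# A non-empty dictionary becomes a JSON string.
serialized = serialize_metadata({"source": "wiki", "page": 3})

# The truthiness check means an empty dictionary serializes to None,
# exactly like None itself.
empty = serialize_metadata({})
```

Here `serialized` is the string '{"source": "wiki", "page": 3}' and `empty` is None.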

Source code in src/fed_rag/data_structures/knowledge_node.py
@field_serializer("metadata")
def serialize_metadata(
    self, metadata: dict[Any, Any] | None
) -> str | None:
    """
    Serializes the metadata dictionary into a JSON string.

    This method serves as a serializer for the `metadata` field, converting
    a dictionary into its JSON representation. If the input dictionary is None,
    the method will return None. Such serialized data can be utilized in situations
    where JSON representation of metadata is required.

    Parameters:
    metadata: dict[Any, Any] | None
        A dictionary containing metadata to be serialized. It can also be None.

    Returns:
    str | None
        The JSON string representation of the metadata dictionary, or None if the
        metadata input was None.
    """
    return json.dumps(metadata) if metadata else None

deserialize_metadata classmethod

deserialize_metadata(metadata)

Custom validator for the metadata field.

If metadata is a JSON string, it is deserialized into a dictionary. None or an empty dictionary yields an empty dictionary; a non-empty dictionary is returned unchanged.

Parameters:

metadata (dict[Any, Any] | str | None): Metadata to validate. If it is a JSON string, it will be deserialized into a dictionary. Required.

Returns:

dict[Any, Any]: Validated metadata.
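
The three input cases can be exercised with a standalone sketch. It copies the validator's body into a free function (the real method is a pydantic field validator running in "before" mode); the sample values are made up.

```python
from __future__ import annotations

import json
from typing import Any, cast


def deserialize_metadata(
    metadata: dict[Any, Any] | str | None,
) -> dict[Any, Any]:
    """Standalone copy of the validator's body, for illustration."""
    if isinstance(metadata, str):
        return cast(dict[Any, Any], json.loads(metadata))
    return metadata or {}


from_string = deserialize_metadata('{"page": 3}')
from_none = deserialize_metadata(None)

# JSON object keys are always strings, so non-string keys do not
# survive a serialize/deserialize round trip unchanged.
round_tripped = deserialize_metadata(json.dumps({1: "a"}))
```

Here `from_string` is {"page": 3}, `from_none` is {}, and `round_tripped` is {"1": "a"} rather than {1: "a"}. A string that is not valid JSON makes json.loads raise json.JSONDecodeError.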

Source code in src/fed_rag/data_structures/knowledge_node.py
@field_validator("metadata", mode="before")
@classmethod
def deserialize_metadata(
    cls, metadata: dict[Any, Any] | str | None
) -> dict[Any, Any]:
    """Custom validator for the metadata field.

    Will deserialize the metadata from a json string if it's a string.

    Args:
        metadata: Metadata to validate. If it is a json string, it will be deserialized into a dictionary.
    Returns: Validated metadata.

    """
    if isinstance(metadata, str):
        deserialized_metadata = json.loads(metadata)
        return cast(dict[Any, Any], deserialized_metadata)
    return metadata or {}

model_dump_without_embeddings

model_dump_without_embeddings()

Returns a dictionary representation of the model excluding the embeddings.

This method is used to generate a dictionary dump of the current model's state while specifically excluding the 'embedding' field. It is particularly useful for exporting or serializing model data without including large or sensitive attributes.

Returns:

dict[str, Any]: A dictionary containing the model's data, excluding the embedding field.

Source code in src/fed_rag/data_structures/knowledge_node.py
def model_dump_without_embeddings(self) -> dict[str, Any]:
    """
    Returns a dictionary representation of the model excluding the embeddings.

    This method is used to generate a dictionary dump of the current model's
    state while specifically excluding the 'embedding' field. It is particularly
    useful for exporting or serializing model data without including large
    or sensitive attributes.

    Returns:
        dict[str, Any]: A dictionary containing the model's data excluding
        attributes related to embeddings.
    """
    return self.model_dump(exclude={"embedding"})