afnio.cognitive.modules.exact_match_evaluator

afnio.cognitive.modules.exact_match_evaluator.ExactMatchEvaluator

Bases: Module

Evaluates predictions using an exact match criterion.

This module leverages the ExactMatchEvaluator operation from afnio.autodiff.evaluator and is a specialized version of the DeterministicEvaluator that uses an exact matching function to compare the prediction and target. It returns an evaluation score (1 for exact match, 0 otherwise) and an explanation describing the evaluation result.

Examples:

>>> import afnio
>>> from afnio import cognitive as cog
>>> from afnio import set_backward_model_client
>>> set_backward_model_client("openai/gpt-4o")
>>> class ExactColor(cog.Module):
...     def __init__(self):
...         super().__init__()
...         self.exact_match = cog.ExactMatchEvaluator()
...     def forward(self, prediction, target):
...         return self.exact_match(prediction, target)
>>> prediction = afnio.Variable(
...     data="green",
...     role="color prediction",
...     requires_grad=True
... )
>>> target = "red"
>>> evaluator = ExactColor()
>>> score, explanation = evaluator(prediction, target)
>>> print(score.data)
0
>>> print(explanation.data)
The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable, resulting in a score: 0.
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

Raises:

TypeError: If the types of prediction, target, reduction_fn, or reduction_fn_purpose are not as expected.

ValueError: If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string.
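
The error conditions above can be illustrated with a minimal sketch. The helpers `check_batch_lengths` and `check_purpose` are hypothetical names invented for this illustration and are not part of afnio. The sketch also assumes plain Python values rather than `Variable` objects, so it only approximates the checks the library performs.

```python
from typing import Any


def check_batch_lengths(prediction_data: Any, target_data: Any) -> None:
    """Hypothetical helper: reject list inputs of unequal length."""
    # The length check only applies when both inputs are lists (batch mode).
    if isinstance(prediction_data, list) and isinstance(target_data, list):
        if len(prediction_data) != len(target_data):
            raise ValueError(
                f"Length mismatch: {len(prediction_data)} predictions "
                f"vs {len(target_data)} targets."
            )


def check_purpose(purpose: Any) -> None:
    """Hypothetical helper: reject wrong types and empty descriptions."""
    # Simplification: the real module also accepts a Variable here.
    if purpose is not None and not isinstance(purpose, str):
        raise TypeError("reduction_fn_purpose must be a string or None.")
    if purpose == "":
        raise ValueError("reduction_fn_purpose must not be an empty string.")
```

Under these assumptions, `check_batch_lengths(["red"], ["red", "blue"])` and `check_purpose("")` both raise `ValueError`, while `check_purpose(42)` raises `TypeError`.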

See Also

afnio.autodiff.evaluator.ExactMatchEvaluator for the underlying operation.

Source code in afnio/cognitive/modules/exact_match_evaluator.py
class ExactMatchEvaluator(Module):
    """
    Evaluates predictions using an exact match criterion.

    This module leverages the [`ExactMatchEvaluator`][afnio.autodiff.evaluator.ExactMatchEvaluator]
    operation from `afnio.autodiff.evaluator` and is a specialized version of the
    [`DeterministicEvaluator`][afnio.cognitive.modules.deterministic_evaluator.DeterministicEvaluator]
    that uses an exact matching function to compare the `prediction` and `target`.
    It returns an evaluation `score` (`1` for exact match, `0` otherwise)
    and an `explanation` describing the evaluation result.

    Examples:
        >>> import afnio
        >>> from afnio import cognitive as cog
        >>> from afnio import set_backward_model_client
        >>> set_backward_model_client("openai/gpt-4o")
        >>> class ExactColor(cog.Module):
        ...     def __init__(self):
        ...         super().__init__()
        ...         self.exact_match = cog.ExactMatchEvaluator()
        ...     def forward(self, prediction, target):
        ...         return self.exact_match(prediction, target)
        >>> prediction = afnio.Variable(
        ...     data="green",
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = "red"
        >>> evaluator = ExactColor()
        >>> score, explanation = evaluator(prediction, target)
        >>> print(score.data)
        0
        >>> print(explanation.data)
        The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable, resulting in a score: 0.
        >>> explanation.backward()
        >>> prediction.grad[0].data
        "Reassess the criteria that led to the initial prediction of 'green'."

    Raises:
        TypeError: If the types of `prediction`, `target`, `reduction_fn`, or
            `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or `target.data`,
            when `target` is a `Variable`) do not match when both are lists, or if
            `reduction_fn_purpose` (or `reduction_fn_purpose.data`) is an empty string.

    See Also:
        [`afnio.autodiff.evaluator.ExactMatchEvaluator`][afnio.autodiff.evaluator.ExactMatchEvaluator]
        for the underlying operation.
    """  # noqa: E501

    reduction_fn: Optional[Callable[[List[Any]], Any]]
    reduction_fn_purpose: Optional[Union[str, Variable]]

    def __init__(self):
        super().__init__()

        self.register_function("reduction_fn", None)
        self.register_buffer("reduction_fn_purpose", None)

    def forward(
        self,
        prediction: Variable,
        target: Union[str, List[str], Variable],
        reduction_fn: Optional[Callable[[List[Any]], Any]] = sum,
        reduction_fn_purpose: Optional[Union[str, Variable]] = "summation",
    ) -> Tuple[Variable, Variable]:
        """
        Forward pass for the exact match evaluator function.

        Warning:
            Users should not call this method directly. Instead, they should call the
            module instance itself, which will internally invoke this `forward` method.

        Args:
            prediction: The predicted variable to evaluate, which can have scalar or
                list [`data`][afnio.Variable.data] (supporting both individual and
                batch processing).
            target: The target (ground truth) to compare against, which can be a string,
                a list of strings, or a `Variable`.
            reduction_fn: An optional function to aggregate scores across a batch of
                predictions and targets. If `None`, no aggregation is applied.
            reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
                used by the autodiff engine to generate explanations. Required if
                `reduction_fn` is provided.

        Returns:
            score: A variable containing the evaluation score(s),
                or their aggregation if `reduction_fn` is provided.
            explanation: A variable containing the explanation(s) of the evaluation,
                or their aggregation if `reduction_fn` is provided.

        Raises:
            TypeError: If the types of `prediction`, `target`, `reduction_fn`,
                or `reduction_fn_purpose` are not as expected.
            ValueError: If the lengths of `prediction.data` and `target` (or
                `target.data`, when `target` is a `Variable`) do not match when
                both are lists, or if `reduction_fn_purpose` (or
                `reduction_fn_purpose.data`) is an empty string.
        """
        self.reduction_fn = reduction_fn
        self.reduction_fn_purpose = (
            None
            if reduction_fn_purpose is None
            else (
                reduction_fn_purpose
                if isinstance(reduction_fn_purpose, Variable)
                else Variable(reduction_fn_purpose)
            )
        )
        return ExactMatchEvaluatorOp.apply(
            prediction, target, self.reduction_fn, self.reduction_fn_purpose
        )
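
The conditional expression in `forward` that prepares `reduction_fn_purpose` can be read as a small normalization step: `None` passes through, an existing `Variable` is reused, and anything else is wrapped. The sketch below reproduces that logic in isolation. `FakeVariable` and `normalize_purpose` are hypothetical stand-ins so the snippet runs without afnio installed; the real `Variable` carries more state than a single `data` field.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class FakeVariable:
    """Hypothetical stand-in for afnio.Variable; it only holds `data`."""

    data: Any


def normalize_purpose(
    purpose: Optional[Union[str, FakeVariable]],
) -> Optional[FakeVariable]:
    """Mirror the three branches used when storing reduction_fn_purpose."""
    if purpose is None:
        return None  # no description supplied
    if isinstance(purpose, FakeVariable):
        return purpose  # already wrapped, reuse the same object
    return FakeVariable(purpose)  # wrap a plain string


wrapped = normalize_purpose("summation")
```

Wrapping the string presumably lets the downstream operation treat the description uniformly, whether the caller passed a plain string or a variable.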

forward(prediction, target, reduction_fn=sum, reduction_fn_purpose='summation')

Forward pass for the exact match evaluator function.

Warning

Users should not call this method directly. Instead, they should call the module instance itself, which will internally invoke this forward method.
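
The pattern behind this warning can be sketched as follows. `SketchModule` and `Echo` are hypothetical classes written for this illustration; they are not afnio's `Module`, whose `__call__` may do additional work that is not shown here.

```python
class SketchModule:
    """Hypothetical stand-in for a module base class (not afnio's Module)."""

    def __call__(self, *args, **kwargs):
        # A framework can run hooks or bookkeeping at this point
        # before delegating to the subclass's forward method.
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Echo(SketchModule):
    def forward(self, prediction, target):
        # Trivial comparison, used only to show the dispatch path.
        return prediction == target


module = Echo()
result = module("red", "red")  # preferred: call the instance, not forward()
```

Calling the instance keeps any logic in `__call__` on the execution path, which calling `forward` directly would bypass.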

Parameters:

prediction (Variable, required): The predicted variable to evaluate, which can have scalar or list data (supporting both individual and batch processing).

target (str | list[str] | Variable, required): The target (ground truth) to compare against, which can be a string, a list of strings, or a Variable.

reduction_fn (Callable[[List[Any]], Any] | None, default: sum): An optional function to aggregate scores across a batch of predictions and targets. If None, no aggregation is applied.

reduction_fn_purpose (str | Variable | None, default: 'summation'): A brief description of the purpose of reduction_fn, used by the autodiff engine to generate explanations. Required if reduction_fn is provided.

Returns:

score (Variable): A variable containing the evaluation score(s), or their aggregation if reduction_fn is provided.

explanation (Variable): A variable containing the explanation(s) of the evaluation, or their aggregation if reduction_fn is provided.
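
To make the effect of `reduction_fn` on the score concrete, here is a minimal pure-Python sketch of batch scoring. The functions `exact_match_scores` and `evaluate` are hypothetical and do not use afnio. The sketch assumes that "exact match" means plain `==` equality on strings, which the documentation does not spell out, and it omits the explanation output entirely.

```python
from typing import Any, Callable, List, Optional


def exact_match_scores(predictions: List[str], targets: List[str]) -> List[int]:
    """Score each pair: 1 for an exact match, 0 otherwise (assumes ==)."""
    return [1 if p == t else 0 for p, t in zip(predictions, targets)]


def evaluate(
    predictions: List[str],
    targets: List[str],
    reduction_fn: Optional[Callable[[List[Any]], Any]] = sum,
) -> Any:
    """Return per-item scores, or their aggregate when reduction_fn is set."""
    scores = exact_match_scores(predictions, targets)
    return scores if reduction_fn is None else reduction_fn(scores)


predictions = ["red", "green", "blue"]
targets = ["red", "red", "blue"]

total = evaluate(predictions, targets)  # default sum, gives 2
per_item = evaluate(predictions, targets, reduction_fn=None)  # gives [1, 0, 1]
```

With the default `sum`, the batch collapses to a single count of matches; passing `None` keeps one score per prediction.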

Raises:

TypeError: If the types of prediction, target, reduction_fn, or reduction_fn_purpose are not as expected.

ValueError: If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string.

Source code in afnio/cognitive/modules/exact_match_evaluator.py
def forward(
    self,
    prediction: Variable,
    target: Union[str, List[str], Variable],
    reduction_fn: Optional[Callable[[List[Any]], Any]] = sum,
    reduction_fn_purpose: Optional[Union[str, Variable]] = "summation",
) -> Tuple[Variable, Variable]:
    """
    Forward pass for the exact match evaluator function.

    Warning:
        Users should not call this method directly. Instead, they should call the
        module instance itself, which will internally invoke this `forward` method.

    Args:
        prediction: The predicted variable to evaluate, which can have scalar or
            list [`data`][afnio.Variable.data] (supporting both individual and
            batch processing).
        target: The target (ground truth) to compare against, which can be a string,
            a list of strings, or a `Variable`.
        reduction_fn: An optional function to aggregate scores across a batch of
            predictions and targets. If `None`, no aggregation is applied.
        reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
            used by the autodiff engine to generate explanations. Required if
            `reduction_fn` is provided.

    Returns:
        score: A variable containing the evaluation score(s),
            or their aggregation if `reduction_fn` is provided.
        explanation: A variable containing the explanation(s) of the evaluation,
            or their aggregation if `reduction_fn` is provided.

    Raises:
        TypeError: If the types of `prediction`, `target`, `reduction_fn`,
            or `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or
            `target.data`, when `target` is a `Variable`) do not match when
            both are lists, or if `reduction_fn_purpose` (or
            `reduction_fn_purpose.data`) is an empty string.
    """
    self.reduction_fn = reduction_fn
    self.reduction_fn_purpose = (
        None
        if reduction_fn_purpose is None
        else (
            reduction_fn_purpose
            if isinstance(reduction_fn_purpose, Variable)
            else Variable(reduction_fn_purpose)
        )
    )
    return ExactMatchEvaluatorOp.apply(
        prediction, target, self.reduction_fn, self.reduction_fn_purpose
    )