afnio.cognitive.modules.deterministic_evaluator

afnio.cognitive.modules.deterministic_evaluator.DeterministicEvaluator

Bases: Module

Evaluates predictions deterministically using a user-defined evaluation function.

This module utilizes the DeterministicEvaluator operation from afnio.autodiff.evaluator to compute evaluation scores and explanations. The forward method takes a prediction, a target, an evaluation function (eval_fn), and its purpose description (eval_fn_purpose). It also accepts a success function (success_fn), which checks whether all evaluations succeeded so the backward pass can skip unnecessary gradient computations, and a reduction function (reduction_fn) with its purpose description (reduction_fn_purpose) to aggregate scores if needed. The method outputs an evaluation score and an explanation, both as Variable instances.

Examples:

>>> import afnio
>>> from afnio import cognitive as cog
>>> from afnio import set_backward_model_client
>>> set_backward_model_client("openai/gpt-4o")
>>> class ExactColor(cog.Module):
...     def __init__(self):
...         super().__init__()
...         def exact_match_fn(pred: str, tgt: str) -> int:
...             return 1 if pred == tgt else 0
...         self.exact_match_fn = exact_match_fn
...         self.fn_purpose = "exact match"
...         self.reduction_fn = sum
...         self.reduction_fn_purpose = "summation"
...         self.exact_match = cog.DeterministicEvaluator()
...     def forward(self, prediction, target):
...         return self.exact_match(
...             prediction,
...             target,
...             self.exact_match_fn,
...             self.fn_purpose,
...             None,  # success_fn: no success check
...             self.reduction_fn,
...             self.reduction_fn_purpose,
...         )
>>> prediction = afnio.Variable(
...     data=["the color is green", "blue"],
...     role="color prediction",
...     requires_grad=True
... )
>>> target = ["green", "blue"]
>>> evaluator = ExactColor()
>>> score, explanation = evaluator(prediction, target)
>>> print(score.data)
1
>>> print(explanation.data)
The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1.
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

Raises:

Type Description
TypeError

If the types of prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, or reduction_fn_purpose are not as expected.

ValueError

If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if eval_fn_purpose (or eval_fn_purpose.data) is an empty string, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string, or if the number of scores returned by eval_fn does not match the number of samples in the batch.

See Also

afnio.autodiff.evaluator.DeterministicEvaluator for the underlying operation.
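The scoring and aggregation contract the module expects can be sketched in plain Python (no afnio required; `exact_match_fn`, `batch_scores`, and the sample data are illustrative, mirroring the example above):

```python
from typing import Any, List

def exact_match_fn(pred: str, tgt: str) -> int:
    # Score a single prediction/target pair: 1 on exact match, else 0.
    return 1 if pred == tgt else 0

def batch_scores(preds: List[str], tgts: List[str]) -> List[Any]:
    # One score per sample; DeterministicEvaluator raises ValueError when the
    # number of scores does not match the number of samples in the batch.
    return [exact_match_fn(p, t) for p, t in zip(preds, tgts)]

preds = ["the color is green", "blue"]
tgts = ["green", "blue"]

scores = batch_scores(preds, tgts)   # per-sample scores: [0, 1]
aggregated = sum(scores)             # reduction_fn = sum -> 1
```

This reproduces the aggregated score of 1 shown in the doctest: only the second sample matches exactly, and `sum` plays the role of `reduction_fn`.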

Source code in afnio/cognitive/modules/deterministic_evaluator.py
class DeterministicEvaluator(Module):
    """
    Evaluates predictions deterministically using a user-defined evaluation function.

    This module utilizes the [`DeterministicEvaluator`][afnio.autodiff.evaluator.DeterministicEvaluator]
    operation from `afnio.autodiff.evaluator` to compute evaluation scores and
    explanations. The `forward` method takes a `prediction`, a `target`, an
    evaluation function (`eval_fn`), and its purpose description (`eval_fn_purpose`).
    It also accepts a success function (`success_fn`), which checks whether all
    evaluations succeeded so the `backward` pass can skip unnecessary gradient
    computations, and a reduction function (`reduction_fn`) with its purpose
    description (`reduction_fn_purpose`) to aggregate scores if needed. The method
    outputs an evaluation `score` and an `explanation`, both as `Variable` instances.

    Examples:
        >>> import afnio
        >>> from afnio import cognitive as cog
        >>> from afnio import set_backward_model_client
        >>> set_backward_model_client("openai/gpt-4o")
        >>> class ExactColor(cog.Module):
        ...     def __init__(self):
        ...         super().__init__()
        ...         def exact_match_fn(pred: str, tgt: str) -> int:
        ...             return 1 if pred == tgt else 0
        ...         self.exact_match_fn = exact_match_fn
        ...         self.fn_purpose = "exact match"
        ...         self.reduction_fn = sum
        ...         self.reduction_fn_purpose = "summation"
        ...         self.exact_match = cog.DeterministicEvaluator()
        ...     def forward(self, prediction, target):
        ...         return self.exact_match(
        ...             prediction,
        ...             target,
        ...             self.exact_match_fn,
        ...             self.fn_purpose,
        ...             None,  # success_fn: no success check
        ...             self.reduction_fn,
        ...             self.reduction_fn_purpose,
        ...         )
        >>> prediction = afnio.Variable(
        ...     data=["the color is green", "blue"],
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = ["green", "blue"]
        >>> evaluator = ExactColor()
        >>> score, explanation = evaluator(prediction, target)
        >>> print(score.data)
        1
        >>> print(explanation.data)
        The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1.
        >>> explanation.backward()
        >>> prediction.grad[0].data
        "Reassess the criteria that led to the initial prediction of 'green'."

    Raises:
        TypeError: If the types of `prediction`, `target`, `eval_fn`, `eval_fn_purpose`,
            `success_fn`, `reduction_fn`, or `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or `target.data`,
            when `target` is a `Variable`) do not match when both are lists, or if
            `eval_fn_purpose` (or `eval_fn_purpose.data`) is an empty string, or if
            `reduction_fn_purpose` (or `reduction_fn_purpose.data`) is an empty string,
            or if the number of scores returned by `eval_fn` does not match the number
            of samples in the batch.

    See Also:
        [`afnio.autodiff.evaluator.DeterministicEvaluator`][afnio.autodiff.evaluator.DeterministicEvaluator]
        for the underlying operation.
    """  # noqa: E501

    eval_fn: Callable[[Variable, Union[str, Variable]], List[Any]]
    eval_fn_purpose: Union[str, Variable]
    success_fn: Optional[Callable[[List[Any]], bool]]
    reduction_fn: Optional[Callable[[List[Any]], Any]]
    reduction_fn_purpose: Optional[Union[str, Variable]]

    def __init__(self):
        super().__init__()

        self.register_function("eval_fn", None)
        self.register_buffer("eval_fn_purpose", None)
        self.register_function("success_fn", None)
        self.register_function("reduction_fn", None)
        self.register_buffer("reduction_fn_purpose", None)

    def forward(
        self,
        prediction: Variable,
        target: Union[str, List[str], Variable],
        eval_fn: Callable[[Variable, Union[str, Variable]], List[Any]],
        eval_fn_purpose: Union[str, Variable],
        success_fn: Optional[Callable[[List[Any]], bool]],
        reduction_fn: Optional[Callable[[List[Any]], Any]],
        reduction_fn_purpose: Optional[Union[str, Variable]],
    ) -> Tuple[Variable, Variable]:
        """
        Forward pass for the deterministic evaluator function.

        Warning:
            Users should not call this method directly. Instead, they should call the
            module instance itself, which will internally invoke this `forward` method.

        Args:
            prediction: The predicted variable to evaluate, which can have scalar or
                list [`data`][afnio.Variable.data] (supporting both individual and
                batch processing).
            target: The target (ground truth) to compare against, which can be a string,
                a list of strings, or a `Variable`.
            eval_fn: A user-defined function that takes a prediction and a target
                and returns a list of scores for each sample. If `target` is a
                [`Variable`][afnio.Variable], the function should compare the
                [`data`][afnio.Variable.data] fields of `prediction` and `target`.
            eval_fn_purpose: A brief description of the purpose of `eval_fn`,
                used by the autodiff engine to generate the explanations.
            success_fn: A user-defined function that takes the list of scores returned
                by `eval_fn` and returns `True` if all predictions are considered
                successful, or `False` otherwise.
            reduction_fn: An optional function to aggregate scores across a batch of
                predictions and targets. If `None`, no aggregation is applied.
            reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
                used by the autodiff engine to generate explanations. Required if
                `reduction_fn` is provided.

        Returns:
            score: A variable containing the evaluation score(s),
                or their aggregation if `reduction_fn` is provided.
            explanation: A variable containing the explanation(s) of the evaluation,
                or their aggregation if `reduction_fn` is provided.

        Raises:
            TypeError: If the types of `prediction`, `target`, `eval_fn`,
                `eval_fn_purpose`, `success_fn`, `reduction_fn`,
                or `reduction_fn_purpose` are not as expected.
            ValueError: If the lengths of `prediction.data` and `target` (or
                `target.data`, when `target` is a `Variable`) do not match when
                both are lists, or if `eval_fn_purpose` (or `eval_fn_purpose.data`)
                is an empty string, or if `reduction_fn_purpose` (or
                `reduction_fn_purpose.data`) is an empty string,
                or if the number of scores returned by `eval_fn`
                does not match the number of samples in the batch.
        """
        self.eval_fn = eval_fn
        self.eval_fn_purpose = (
            None
            if eval_fn_purpose is None
            else (
                eval_fn_purpose
                if isinstance(eval_fn_purpose, Variable)
                else Variable(eval_fn_purpose)
            )
        )
        self.success_fn = success_fn
        self.reduction_fn = reduction_fn
        self.reduction_fn_purpose = (
            None
            if reduction_fn_purpose is None
            else (
                reduction_fn_purpose
                if isinstance(reduction_fn_purpose, Variable)
                else Variable(reduction_fn_purpose)
            )
        )
        return DeterministicEvaluatorOp.apply(
            prediction,
            target,
            self.eval_fn,
            self.eval_fn_purpose,
            self.success_fn,
            self.reduction_fn,
            self.reduction_fn_purpose,
        )

forward(prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, reduction_fn_purpose)

Forward pass for the deterministic evaluator function.

Warning

Users should not call this method directly. Instead, they should call the module instance itself, which will internally invoke this forward method.

Parameters:

Name Type Description Default
prediction Variable

The predicted variable to evaluate, which can have scalar or list data (supporting both individual and batch processing).

required
target str | list[str] | Variable

The target (ground truth) to compare against, which can be a string, a list of strings, or a Variable.

required
eval_fn Callable[[Variable, Union[str, Variable]], list[Any]]

A user-defined function that takes a prediction and a target and returns a list of scores for each sample. If target is a Variable, the function should compare the data fields of prediction and target.

required
eval_fn_purpose str | Variable

A brief description of the purpose of eval_fn, used by the autodiff engine to generate the explanations.

required
success_fn Callable[[List[Any]], bool] | None

A user-defined function that takes the list of scores returned by eval_fn and returns True if all predictions are considered successful, or False otherwise.

required
reduction_fn Callable[[List[Any]], Any] | None

An optional function to aggregate scores across a batch of predictions and targets. If None, no aggregation is applied.

required
reduction_fn_purpose str | Variable | None

A brief description of the purpose of reduction_fn, used by the autodiff engine to generate explanations. Required if reduction_fn is provided.

required

Returns:

Name Type Description
score Variable

A variable containing the evaluation score(s), or their aggregation if reduction_fn is provided.

explanation Variable

A variable containing the explanation(s) of the evaluation, or their aggregation if reduction_fn is provided.

Raises:

Type Description
TypeError

If the types of prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, or reduction_fn_purpose are not as expected.

ValueError

If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if eval_fn_purpose (or eval_fn_purpose.data) is an empty string, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string, or if the number of scores returned by eval_fn does not match the number of samples in the batch.
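One way to picture the `success_fn` gate described above, as a standalone sketch rather than afnio internals (`needs_backward` and `all_correct` are hypothetical helpers):

```python
from typing import Any, Callable, List, Optional

def needs_backward(
    scores: List[Any],
    success_fn: Optional[Callable[[List[Any]], bool]],
) -> bool:
    # When success_fn reports that every evaluation succeeded, the backward
    # pass can skip gradient computation entirely; otherwise it proceeds.
    if success_fn is not None and success_fn(scores):
        return False
    return True

def all_correct(scores: List[Any]) -> bool:
    # Example success criterion: every per-sample score is 1.
    return all(s == 1 for s in scores)

skip_case = needs_backward([1, 1], all_correct)  # False: all samples passed
run_case = needs_backward([0, 1], all_correct)   # True: a sample failed
```

Passing `success_fn=None` always leaves the backward pass enabled, which matches the module treating the function as optional.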

Source code in afnio/cognitive/modules/deterministic_evaluator.py
def forward(
    self,
    prediction: Variable,
    target: Union[str, List[str], Variable],
    eval_fn: Callable[[Variable, Union[str, Variable]], List[Any]],
    eval_fn_purpose: Union[str, Variable],
    success_fn: Optional[Callable[[List[Any]], bool]],
    reduction_fn: Optional[Callable[[List[Any]], Any]],
    reduction_fn_purpose: Optional[Union[str, Variable]],
) -> Tuple[Variable, Variable]:
    """
    Forward pass for the deterministic evaluator function.

    Warning:
        Users should not call this method directly. Instead, they should call the
        module instance itself, which will internally invoke this `forward` method.

    Args:
        prediction: The predicted variable to evaluate, which can have scalar or
            list [`data`][afnio.Variable.data] (supporting both individual and
            batch processing).
        target: The target (ground truth) to compare against, which can be a string,
            a list of strings, or a `Variable`.
        eval_fn: A user-defined function that takes a prediction and a target
            and returns a list of scores for each sample. If `target` is a
            [`Variable`][afnio.Variable], the function should compare the
            [`data`][afnio.Variable.data] fields of `prediction` and `target`.
        eval_fn_purpose: A brief description of the purpose of `eval_fn`,
            used by the autodiff engine to generate the explanations.
        success_fn: A user-defined function that takes the list of scores returned
            by `eval_fn` and returns `True` if all predictions are considered
            successful, or `False` otherwise.
        reduction_fn: An optional function to aggregate scores across a batch of
            predictions and targets. If `None`, no aggregation is applied.
        reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
            used by the autodiff engine to generate explanations. Required if
            `reduction_fn` is provided.

    Returns:
        score: A variable containing the evaluation score(s),
            or their aggregation if `reduction_fn` is provided.
        explanation: A variable containing the explanation(s) of the evaluation,
            or their aggregation if `reduction_fn` is provided.

    Raises:
        TypeError: If the types of `prediction`, `target`, `eval_fn`,
            `eval_fn_purpose`, `success_fn`, `reduction_fn`,
            or `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or
            `target.data`, when `target` is a `Variable`) do not match when
            both are lists, or if `eval_fn_purpose` (or `eval_fn_purpose.data`)
            is an empty string, or if `reduction_fn_purpose` (or
            `reduction_fn_purpose.data`) is an empty string,
            or if the number of scores returned by `eval_fn`
            does not match the number of samples in the batch.
    """
    self.eval_fn = eval_fn
    self.eval_fn_purpose = (
        None
        if eval_fn_purpose is None
        else (
            eval_fn_purpose
            if isinstance(eval_fn_purpose, Variable)
            else Variable(eval_fn_purpose)
        )
    )
    self.success_fn = success_fn
    self.reduction_fn = reduction_fn
    self.reduction_fn_purpose = (
        None
        if reduction_fn_purpose is None
        else (
            reduction_fn_purpose
            if isinstance(reduction_fn_purpose, Variable)
            else Variable(reduction_fn_purpose)
        )
    )
    return DeterministicEvaluatorOp.apply(
        prediction,
        target,
        self.eval_fn,
        self.eval_fn_purpose,
        self.success_fn,
        self.reduction_fn,
        self.reduction_fn_purpose,
    )