afnio.cognitive.functional

afnio.cognitive.functional.add(x, y)

Implements an addition operation for Variable instances within the afnio framework, supporting automatic differentiation.

The Add function supports both scalar and list data fields:

  • Scalars: Adds numerical values (int, float) or concatenates strings.
  • Lists: Performs element-wise addition of corresponding elements from the lists. Lists must be of the same length.

It automatically handles type-based operations:

  • For numerical data (int, float), it performs arithmetic addition.
  • For strings, it concatenates the values.
  • Mixed types (e.g., string and number) are converted appropriately before performing the addition.

This operation also tracks Variable dependencies, enabling automatic gradient computation through backpropagation.
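The type rules above can be sketched in plain Python. The helper below is hypothetical and only illustrates the data semantics; it is not part of the afnio API:

```python
def add_data(x, y):
    """Sketch of Add's data rule: numbers add, strings concatenate,
    lists combine element-wise and must have equal length."""
    if isinstance(x, list) and isinstance(y, list):
        if len(x) != len(y):
            raise ValueError("list data fields must have the same length")
        return [add_data(a, b) for a, b in zip(x, y)]
    if isinstance(x, str) or isinstance(y, str):
        # mixed types are converted to strings before concatenation
        return str(x) + str(y)
    return x + y

add_data("abc", "def")          # -> 'abcdef'
add_data([1, 2, 3], [4, 5, 6])  # -> [5, 7, 9]
```

In afnio itself the same rules apply to the `data` fields of the two input `Variable` instances, with role aggregation and gradient tracking layered on top.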

Parameters:

Name Type Description Default
x Variable

The first input Variable.

required
y Variable

The second input Variable.

required

Returns:

Type Description
Variable

A new Variable instance representing the result of the addition, with appropriately aggregated data, role, and requires_grad attributes.

Raises:

Type Description
TypeError

If either input is not an instance of Variable.

TypeError

If addition between the input types is not allowed.

ValueError

If scalar data is added to list data.

ValueError

If list data fields have mismatched lengths.

Examples:

Example with scalar inputs:

>>> x = Variable(data="abc", role="first input", requires_grad=True)
>>> y = Variable(data="def", role="second input", requires_grad=False)
>>> result = F.add(x, y)
>>> result.data
'abcdef'
>>> result.role
'first input and second input'
>>> result.requires_grad
True
>>> g = Variable(data="MY_FEEDBACK", role="add gradient")
>>> result.backward(g)
>>> x.grad.data
'Here is the combined feedback we got for this specific first input and other variables: MY_FEEDBACK'
>>> x.grad.role
'feedback to first input'

Example with batched inputs:

>>> x = Variable(data=[1, 2, 3], role="first input", requires_grad=True)
>>> y = Variable(data=[4, 5, 6], role="second input", requires_grad=False)
>>> result = F.add(x, y)
>>> result.data
[5, 7, 9]
>>> result.role
'first input and second input'
>>> result.requires_grad
True
See Also

afnio.autodiff.basic_ops.Add for the underlying operation.

Source code in afnio/cognitive/functional.py
def add(x: Variable, y: Variable) -> Variable:
    """
    Implements an addition operation for [`Variable`][afnio.Variable] instances within
    the `afnio` framework, supporting automatic differentiation.

    The `Add` function supports both scalar and list [`data`][afnio.Variable.data] fields:

    - **Scalars**: Adds numerical values (`int`, `float`) or concatenates strings.
    - **Lists**: Performs element-wise addition of corresponding elements from the lists.
      Lists must be of the same length.

    It automatically handles type-based operations:

    - For numerical data (`int`, `float`), it performs arithmetic addition.
    - For strings, it concatenates the values.
    - Mixed types (e.g., string and number) are converted appropriately before performing
      the addition.

    This operation also tracks [`Variable`][afnio.Variable] dependencies,
    enabling automatic gradient computation through backpropagation.

    Args:
        x: The first input `Variable`.
        y: The second input `Variable`.

    Returns:
        A new `Variable` instance representing the result of the addition, \
        with appropriately aggregated [`data`][afnio.Variable.data], \
        [`role`][afnio.Variable.role], and \
        [`requires_grad`][afnio.Variable.requires_grad] attributes.

    Raises:
        TypeError: If either input is not an instance
            of [`Variable`][afnio.Variable].
        TypeError: If addition between the input types is not allowed.
        ValueError: If scalar [`data`][afnio.Variable.data] is added
            to list [`data`][afnio.Variable.data].
        ValueError: If list [`data`][afnio.Variable.data] fields
            have mismatched lengths.

    Examples:
        Example with scalar inputs:
        >>> x = Variable(data="abc", role="first input", requires_grad=True)
        >>> y = Variable(data="def", role="second input", requires_grad=False)
        >>> result = F.add(x, y)
        >>> result.data
        'abcdef'
        >>> result.role
        'first input and second input'
        >>> result.requires_grad
        True
        >>> g = Variable(data="MY_FEEDBACK", role="add gradient")
        >>> result.backward(g)
        >>> x.grad.data
        'Here is the combined feedback we got for this specific first input and other variables: MY_FEEDBACK'
        >>> x.grad.role
        'feedback to first input'

        Example with batched inputs:
        >>> x = Variable(data=[1, 2, 3], role="first input", requires_grad=True)
        >>> y = Variable(data=[4, 5, 6], role="second input", requires_grad=False)
        >>> result = F.add(x, y)
        >>> result.data
        [5, 7, 9]
        >>> result.role
        'first input and second input'
        >>> result.requires_grad
        True

    See Also:
        [`afnio.autodiff.basic_ops.Add`][afnio.autodiff.basic_ops.Add]
        for the underlying operation.
    """  # noqa: E501
    return Add.apply(x, y)

afnio.cognitive.functional.sum(x)

Implements a summation operation for a list of Variable instances within the afnio framework, supporting automatic differentiation.

The Sum function aggregates the data, role, and requires_grad attributes of all input Variable instances into a single Variable. It supports both scalar and list data fields:

  • Scalars: Computes the arithmetic sum for numerical data (int, float) or concatenates all string values, wrapping each in <ITEM></ITEM> tags.
  • Lists: Aggregates the corresponding elements of the lists. For numerical data, it sums the corresponding elements. For string data, it concatenates them, wrapping each element in <ITEM></ITEM> tags.

During backpropagation, the function distributes the gradient to all input Variable instances that require gradients.
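The aggregation rule can be sketched in plain Python. The helper below is hypothetical and mirrors only the scalar-data behavior described above, not the afnio API:

```python
def sum_data(values):
    """Sketch of Sum's aggregation rule: numeric values are summed,
    string values are wrapped in <ITEM></ITEM> tags and concatenated."""
    if all(isinstance(v, (int, float)) for v in values):
        return sum(values)  # arithmetic sum for numerical data
    # string data: tagged concatenation
    return "".join(f"<ITEM>{v}</ITEM>" for v in values)

sum_data([1, 2, 3.5])     # -> 6.5
sum_data(["abc", "def"])  # -> '<ITEM>abc</ITEM><ITEM>def</ITEM>'
```

For list data fields, afnio applies the same rule position by position across the input lists.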

Parameters:

Name Type Description Default
x list[Variable]

A list of Variable instances to be summed.

required

Returns:

Type Description
Variable

A new Variable instance representing the result of the summation, with appropriately aggregated data, role, and requires_grad attributes.

Raises:

Type Description
TypeError

If any element in x is not an instance of Variable or a sequence of Variable instances, or if addition between the data types is not allowed.

Examples:

Example with scalar inputs:

>>> x = Variable(data="abc", role="first input", requires_grad=True)
>>> y = Variable(data="def", role="second input", requires_grad=False)
>>> result = F.sum([x, y])
>>> result.data
'<ITEM>abc</ITEM><ITEM>def</ITEM>'
>>> result.role
'first input and second input'
>>> result.requires_grad
True
>>> g = Variable(data="MY_FEEDBACK", role="add gradient")
>>> result.backward(g)
>>> x.grad.data
'Here is the combined feedback we got for this specific first input and other variables: MY_FEEDBACK'
>>> x.grad.role
'feedback to first input'

Example with batched inputs:

>>> x = Variable(data=[1, 2, 3.5], role="first input", requires_grad=True)
>>> y = Variable(data=[4, 5, 6], role="second input", requires_grad=False)
>>> result = F.sum([x, y])
>>> result.data
[5, 7, 9.5]
>>> result.role
'first input and second input'
>>> result.requires_grad
True
See Also

afnio.autodiff.basic_ops.Sum for the underlying operation.

Source code in afnio/cognitive/functional.py
def sum(x: List[Variable]) -> Variable:
    """
    Implements a summation operation for a list of [`Variable`][afnio.Variable]
    instances within the `afnio` framework, supporting automatic differentiation.

    The `Sum` function aggregates the [`data`][afnio.Variable.data],
    [`role`][afnio.Variable.role], and [`requires_grad`][afnio.Variable.requires_grad]
    attributes of all input [`Variable`][afnio.Variable] instances into a single
    [`Variable`][afnio.Variable]. It supports both scalar and list
    [`data`][afnio.Variable.data] fields:

    - **Scalars**: Computes the arithmetic sum for numerical data (`int`, `float`)
      or concatenates all string values, wrapping each in `<ITEM></ITEM>` tags.
    - **Lists**: Aggregates the corresponding elements of the lists. For numerical
      data, it sums the corresponding elements. For string data, it concatenates them,
      wrapping each element in `<ITEM></ITEM>` tags.

    During backpropagation, the function distributes the gradient to all input
    [`Variable`][afnio.Variable] instances that require gradients.

    Args:
        x: A list of `Variable` instances to be summed.

    Returns:
        A new `Variable` instance representing the result of the summation, \
        with appropriately aggregated [`data`][afnio.Variable.data], \
        [`role`][afnio.Variable.role], and \
        [`requires_grad`][afnio.Variable.requires_grad] attributes.

    Raises:
        TypeError: If any element in `x` is not an instance
            of [`Variable`][afnio.Variable] or a sequence
            of [`Variable`][afnio.Variable] instances, or if addition between
            the [`data`][afnio.Variable.data] types is not allowed.

    Examples:
        Example with scalar inputs:
        >>> x = Variable(data="abc", role="first input", requires_grad=True)
        >>> y = Variable(data="def", role="second input", requires_grad=False)
        >>> result = F.sum([x, y])
        >>> result.data
        '<ITEM>abc</ITEM><ITEM>def</ITEM>'
        >>> result.role
        'first input and second input'
        >>> result.requires_grad
        True
        >>> g = Variable(data="MY_FEEDBACK", role="add gradient")
        >>> result.backward(g)
        >>> x.grad.data
        'Here is the combined feedback we got for this specific first input and other variables: MY_FEEDBACK'
        >>> x.grad.role
        'feedback to first input'

        Example with batched inputs:
        >>> x = Variable(data=[1, 2, 3.5], role="first input", requires_grad=True)
        >>> y = Variable(data=[4, 5, 6], role="second input", requires_grad=False)
        >>> result = F.sum([x, y])
        >>> result.data
        [5, 7, 9.5]
        >>> result.role
        'first input and second input'
        >>> result.requires_grad
        True

    See Also:
        [`afnio.autodiff.basic_ops.Sum`][afnio.autodiff.basic_ops.Sum]
        for the underlying operation.
    """  # noqa: E501
    return Sum.apply(x)

afnio.cognitive.functional.split(x, sep=None, maxsplit=-1)

Implements a split operation for Variable instances within the afnio framework, supporting automatic differentiation.

The Split function divides the data of the input Variable into multiple parts using a specified delimiter sep. If maxsplit is specified, the split operation is limited to a maximum number of splits. It handles both scalar and list data fields:

  • Scalars: The scalar data (a single string) is split into substrings based on the specified sep and maxsplit parameters.
  • Lists: Each element of the list data (strings) is split individually. If splits of varying lengths occur, shorter splits are automatically padded with empty strings to ensure consistent dimensions.

During backpropagation, feedback is collected and aggregated across all split parts. The combined feedback is propagated back to the original input Variable, allowing for the proper computation of gradients.
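The padding behavior for batched (list) data can be sketched in plain Python. This is a hypothetical helper showing the splitting and padding semantics, not the afnio implementation:

```python
def split_batch(items, sep=" ", maxsplit=-1):
    """Sketch of Split on list data: each string is split, shorter
    results are padded with empty strings, and the result is transposed
    so each output holds one split position across the whole batch."""
    parts = [s.split(sep, maxsplit) for s in items]
    width = max(len(p) for p in parts)
    padded = [p + [""] * (width - len(p)) for p in parts]
    return [list(col) for col in zip(*padded)]

split_batch(["afnio is great!", "Deep learning"], sep=" ", maxsplit=2)
# -> [['afnio', 'Deep'], ['is', 'learning'], ['great!', '']]
```

This matches the batched example below: the shorter split of "Deep learning" is padded with an empty string so all outputs have consistent dimensions.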

Parameters:

Name Type Description Default
x Variable

The input Variable to be split.

required
sep str | Variable | None

The delimiter to use for splitting the string. If None, splits on whitespace. Can be a string or a Variable containing a string.

None
maxsplit int | Variable | None

The maximum number of splits to perform. If -1, there is no limit on the number of splits. Can be an integer or a Variable containing an integer.

-1

Returns:

Type Description
list[Variable]

A list of Variable instances resulting from the split operation, each with appropriately assigned data, role, and requires_grad attributes.

Raises:

Type Description
TypeError

If x is not an instance of Variable whose data attribute is a string or a list of strings, or if sep is not a string or a Variable containing a string, or if maxsplit is not an integer or a Variable containing an integer.

Examples:

Example with scalar inputs:

>>> x = Variable(data="afnio is great!", role="sentence", requires_grad=True)
>>> result = Split.apply(x, sep=" ", maxsplit=1)
>>> [var.data for var in result]
['afnio', 'is great!']
>>> result[0].role
'split part 0 of sentence'
>>> g_1 = Variable(data="MY_FIRST_FEEDBACK", role="gradient")
>>> g_2 = Variable(data="MY_SECOND_FEEDBACK", role="gradient")
>>> result[0].backward(g_1, retain_graph=True)
>>> result[1].backward(g_2)
>>> x.grad[0].data
'Here is the combined feedback we got for this specific sentence and other variables: <ITEM>MY_FIRST_FEEDBACK</ITEM><ITEM></ITEM>'
>>> x.grad[0].role
'feedback to sentence'
>>> x.grad[1].data
'Here is the combined feedback we got for this specific sentence and other variables: <ITEM></ITEM><ITEM>MY_SECOND_FEEDBACK</ITEM>'
>>> x.grad[1].role
'feedback to sentence'

Example with batched inputs:

>>> x = Variable(
...     data=["afnio is great!", "Deep learning"],
...     role="sentences",
...     requires_grad=True
... )
>>> result = Split.apply(x, sep=" ", maxsplit=2)
>>> [var.data for var in result]
[['afnio', 'Deep'], ['is', 'learning'], ['great!', '']]
>>> g = Variable(data="MY_FEEDBACK", role="gradient")
>>> result[1].backward(g)
>>> x.grad[0].data
'Here is the combined feedback we got for this specific sentences and other variables: <ITEM></ITEM><ITEM>MY_FEEDBACK</ITEM><ITEM></ITEM>'
>>> x.grad[0].role
'feedback to sentences'
See Also

afnio.autodiff.basic_ops.Split for the underlying operation.

Source code in afnio/cognitive/functional.py
def split(
    x: Variable,
    sep: Optional[Union[str, Variable]] = None,
    maxsplit: Optional[Union[int, Variable]] = -1,
) -> List[Variable]:
    """
    Implements a split operation for [`Variable`][afnio.Variable] instances within the
    `afnio` framework, supporting automatic differentiation.

    The `Split` function divides the [`data`][afnio.Variable.data] of the input
    [`Variable`][afnio.Variable] into multiple parts using a specified delimiter `sep`.
    If `maxsplit` is specified, the split operation is limited to a maximum number of
    splits. It handles both scalar and list [`data`][afnio.Variable.data] fields:

    - **Scalars**: The scalar [`data`][afnio.Variable.data] (a single string) is split
        into substrings based on the specified `sep` and `maxsplit` parameters.
    - **Lists**: Each element of the list [`data`][afnio.Variable.data] (strings) is
        split individually. If splits of varying lengths occur, shorter splits are
        automatically padded with empty strings to ensure consistent dimensions.

    During backpropagation, feedback is collected and aggregated across all split parts.
    The combined feedback is propagated back to the original input
    [`Variable`][afnio.Variable], allowing for the proper computation of gradients.

    Args:
        x: The input `Variable` to be split.
        sep: The delimiter to use for splitting the string. If `None`, splits on
            whitespace. Can be a string or a `Variable` containing a string.
        maxsplit: The maximum number of splits to perform. If `-1`, there is no
            limit on the number of splits. Can be an integer or a `Variable`
            containing an integer.

    Returns:
        A list of `Variable` instances resulting from the split operation, \
        each with appropriately assigned [`data`][afnio.Variable.data], \
        [`role`][afnio.Variable.role], and \
        [`requires_grad`][afnio.Variable.requires_grad] attributes.

    Raises:
        TypeError: If `x` is not an instance of [`Variable`][afnio.Variable] whose
            [`data`][afnio.Variable.data] attribute is a string or a list of
            strings, or if `sep` is not a string or `Variable` containing a string,
            or if `maxsplit` is not an integer or `Variable` containing an integer.

    Examples:
        Example with scalar inputs:
        >>> x = Variable(data="afnio is great!", role="sentence", requires_grad=True)
        >>> result = Split.apply(x, sep=" ", maxsplit=1)
        >>> [var.data for var in result]
        ['afnio', 'is great!']
        >>> result[0].role
        'split part 0 of sentence'
        >>> g_1 = Variable(data="MY_FIRST_FEEDBACK", role="gradient")
        >>> g_2 = Variable(data="MY_SECOND_FEEDBACK", role="gradient")
        >>> result[0].backward(g_1, retain_graph=True)
        >>> result[1].backward(g_2)
        >>> x.grad[0].data
        'Here is the combined feedback we got for this specific sentence and other variables: <ITEM>MY_FIRST_FEEDBACK</ITEM><ITEM></ITEM>'
        >>> x.grad[0].role
        'feedback to sentence'
        >>> x.grad[1].data
        'Here is the combined feedback we got for this specific sentence and other variables: <ITEM></ITEM><ITEM>MY_SECOND_FEEDBACK</ITEM>'
        >>> x.grad[1].role
        'feedback to sentence'

        Example with batched inputs:
        >>> x = Variable(
        ...     data=["afnio is great!", "Deep learning"],
        ...     role="sentences",
        ...     requires_grad=True
        ... )
        >>> result = Split.apply(x, sep=" ", maxsplit=2)
        >>> [var.data for var in result]
        [['afnio', 'Deep'], ['is', 'learning'], ['great!', '']]
        >>> g = Variable(data="MY_FEEDBACK", role="gradient")
        >>> result[1].backward(g)
        >>> x.grad[0].data
        'Here is the combined feedback we got for this specific sentences and other variables: <ITEM></ITEM><ITEM>MY_FEEDBACK</ITEM><ITEM></ITEM>'
        >>> x.grad[0].role
        'feedback to sentences'

    See Also:
        [`afnio.autodiff.basic_ops.Split`][afnio.autodiff.basic_ops.Split]
        for the underlying operation.
    """  # noqa: E501
    return Split.apply(x, sep, maxsplit)

afnio.cognitive.functional.chat_completion(forward_model_client, messages, inputs=None, **completion_args)

Implements a chat completion operation using the specified language model within the afnio framework, supporting automatic differentiation.

Features
  • Mini-Batching: Processes multiple input dictionaries simultaneously to improve throughput.
  • Asynchronous Execution: Both the forward and backward passes are optimized to run asynchronous calls for each mini-batch, reducing latency.
  • Gradient Computation: Supports automatic differentiation for all Variables in messages and inputs arguments, maintaining the order of gradients.

The ChatCompletion function generates a Variable response by passing a composite prompt, built from a list of messages and optional inputs, to the forward_model_client. Each message is a dictionary with a 'role' (e.g., 'system', 'user') and a list of Variable objects as 'content'. inputs is a dictionary containing strings, lists of strings, or Variables providing dynamic values to fill placeholders within message templates. If inputs contains lists of strings or Variables whose data field is a list, the response's data field will be a list, corresponding to the batched results. Otherwise, the data field will be a scalar string. Additional behavior, such as temperature or token limits, can be customized through completion_args.
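How placeholders and batching interact can be sketched in plain Python. The helper below is hypothetical (afnio performs this internally, and its message content holds Variable objects rather than plain strings): list-valued inputs define the batch size, while scalar inputs are broadcast to every batch element.

```python
def render_prompts(messages, inputs):
    """Sketch of placeholder filling and mini-batching: one rendered
    prompt per batch element, scalars broadcast across the batch."""
    lists = [v for v in inputs.values() if isinstance(v, list)]
    batch = len(lists[0]) if lists else 1
    prompts = []
    for i in range(batch):
        filled = {k: (v[i] if isinstance(v, list) else v)
                  for k, v in inputs.items()}
        prompts.append(
            [{"role": m["role"], "content": m["content"].format(**filled)}
             for m in messages]
        )
    return prompts

render_prompts(
    [{"role": "user", "content": "Translate 'Hello' to {language}."}],
    {"language": ["Italian", "Spanish"]},
)
# -> two rendered prompts, one per language
```

With a scalar `inputs` value the helper yields a single prompt, matching the scalar-versus-batched response shapes described above.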

Parameters:

Name Type Description Default
forward_model_client ChatCompletionModel | None

The LM model client used for generating chat completions.

required
messages MultiTurnMessages

A list of messages that compose the prompt/context for the LM. Each message is a dictionary with a "role" (e.g., "system", "user", "assistant") and a "content" field, which is a list of Variable objects. The Variable objects in the "content" can contain placeholders (e.g., {prediction}, {target}) that will be populated with the corresponding values from the inputs dictionary.

required
inputs dict[str, str | list[str] | Variable] | None

A dictionary mapping placeholder names to their corresponding values, which can be strings or Variable instances. These values will be used to populate the placeholders in the messages content before sending the prompt to the LM. For example, if a message "content" field contains the placeholder {color}, the inputs dictionary should have a key "color" with the value to substitute in the prompt. Optional if there are no placeholders in the messages or if all placeholders are directly related to prediction and target.

None
**completion_args

Additional keyword arguments to pass to the LM model client's chat method, such as temperature, max tokens, or seed values, to customize the LLM's behavior during the evaluation.

{}

Returns:

Name Type Description
response Variable

A Variable containing the LM's response. The data field of the returned Variable will be a string if all inputs are scalar, or a list of strings if any input is a list. The role field will indicate that this is a response to the input messages, and the requires_grad field will be set to True if any of the input Variable objects in messages require gradients, otherwise False.

Raises:

Type Description
TypeError

If the types of forward_model_client, messages, or inputs are not as expected.

Examples:

Example with scalar inputs:

>>> system = Variable(
...     "You are a helpful assistant.",
...     role="system instruction",
...     requires_grad=True
... )
>>> user = Variable("Translate 'Hello' to {language}.", role="user query")
>>> messages = [
...     {"role": "system", "content": [system]},
...     {"role": "user", "content": [user]},
... ]
>>> inputs = {"language": Variable("Italian", role="language")}
>>> response = F.chat_completion(
...     model_client,
...     messages,
...     inputs=inputs,
...     temperature=0.7
... )
>>> print(response.data)
Ciao
>>> feedback = Variable("Use only capital letters.", role="feedback")
>>> response.backward(feedback)
>>> system.grad[0].data
'The system instruction should enforce the use of capital letters only.'

Example with batched inputs:

>>> system = Variable(
...     "You are a helpful assistant.",
...     role="system instruction",
...     requires_grad=True
... )
>>> user = Variable("Translate 'Hello' to {language}.", role="user query")
>>> messages = [
...     {"role": "system", "content": [system]},
...     {"role": "user", "content": [user]},
... ]
>>> inputs = {
...     "language": [
...         Variable("Italian", role="language"),
...         Variable("Spanish", role="language")
...     ]
... }
>>> response = F.chat_completion(
...     model_client,
...     messages,
...     inputs=inputs,
...     temperature=0.7
... )
>>> print(response.data)
['Ciao', 'Hola']
See Also

afnio.autodiff.lm_ops.ChatCompletion for the underlying operation.

Source code in afnio/cognitive/functional.py
def chat_completion(
    forward_model_client: Optional[ChatCompletionModel],
    messages: MultiTurnMessages,
    inputs: Optional[Dict[str, Union[str, List[str], Variable]]] = None,
    **completion_args,
) -> Variable:
    """
    Implements a chat completion operation using the specified language model within
    the ``afnio`` framework, supporting automatic differentiation.

    Features:
        - **Mini-Batching**: Processes multiple input dictionaries simultaneously
            to improve throughput.
        - **Asynchronous Execution**: Both the forward and backward passes are optimized
            to run asynchronous calls for each mini-batch, reducing latency.
        - **Gradient Computation**: Supports automatic differentiation for all
            [`Variable`][afnio.Variable]s in `messages` and `inputs` arguments,
            maintaining the order of gradients.

    The `ChatCompletion` function generates a [`Variable`][afnio.Variable] response by
    passing a composite prompt, built from a list of `messages` and optional `inputs`,
    to the `forward_model_client`. Each message is a dictionary with a `'role'` (e.g.,
    `'system'`, `'user'`) and a list of [`Variable`][afnio.Variable] objects as
    `'content'`. `inputs` is a dictionary containing strings, lists of strings, or
    [`Variable`][afnio.Variable]s providing dynamic values to fill placeholders within
    message templates. If `inputs` contains lists of strings or
    [`Variable`][afnio.Variable]s whose [`data`][afnio.Variable.data] field is a list,
    the response's [`data`][afnio.Variable.data] field will be a list, corresponding to
    the batched results. Otherwise, the [`data`][afnio.Variable.data] field will be a
    scalar string. Additional behavior, such as temperature or token limits, can be
    customized through `completion_args`.

    Args:
        forward_model_client: The LM model client used for generating
            chat completions.
        messages: A list of messages that compose the prompt/context for the LM.
            Each message is a dictionary with a `"role"` (e.g., `"system"`, `"user"`,
            `"assistant"`) and a `"content"` field, which is a list of `Variable`
            objects. The `Variable` objects in the `"content"` can contain placeholders
            (e.g., `{prediction}`, `{target}`) that will be populated with the
            corresponding values from the `inputs` dictionary.
        inputs: A dictionary mapping placeholder names to their corresponding
            values, which can be strings or `Variable` instances. These values
            will be used to populate the placeholders in the `messages` content
            before sending the prompt to the LM. For example, if a message
            `"content"` field contains the placeholder `{color}`, the `inputs`
            dictionary should have a key `"color"` with the value to substitute
            in the prompt. Optional if there are no placeholders in the messages or
            if all placeholders are directly related to `prediction` and `target`.
        **completion_args: Additional keyword arguments to pass to the LM model
            client's `chat` method, such as temperature, max tokens, or seed values,
            to customize the LLM's behavior during the evaluation.

    Returns:
        response: A `Variable` containing the LM's response. \
            The [`data`][afnio.Variable.data] field of the returned `Variable` \
            will be a string if all inputs are scalar, or a list of strings if \
            any input is a list. The `role` field will indicate that this is a \
            response to the input messages, and the `requires_grad` field will \
            be set to `True` if any of the input `Variable` objects in `messages` \
            require gradients, otherwise `False`.

    Raises:
        TypeError: If the types of `forward_model_client`, `messages`,
            or `inputs` are not as expected.

    Examples:
        Example with scalar inputs:
        >>> system = Variable(
        ...     "You are a helpful assistant.",
        ...     role="system instruction",
        ...     requires_grad=True
        ... )
        >>> user = Variable("Translate 'Hello' to {language}.", role="user query")
        >>> messages = [
        ...     {"role": "system", "content": [system]},
        ...     {"role": "user", "content": [user]},
        ... ]
        >>> inputs = {"language": Variable("Italian", role="language")}
        >>> response = F.chat_completion(
        ...     model_client,
        ...     messages,
        ...     inputs=inputs,
        ...     temperature=0.7
        ... )
        >>> print(response.data)
        Ciao
        >>> feedback = Variable("Use only capital letters.", role="feedback")
        >>> response.backward(feedback)
        >>> system.grad[0].data
        'The system instruction should enforce the use of capital letters only.'

        Example with batched inputs:
        >>> system = Variable(
        ...     "You are a helpful assistant.",
        ...     role="system instruction",
        ...     requires_grad=True
        ... )
        >>> user = Variable("Translate 'Hello' to {language}.", role="user query")
        >>> messages = [
        ...     {"role": "system", "content": [system]},
        ...     {"role": "user", "content": [user]},
        ... ]
        >>> inputs = {
        ...     "language": [
        ...         Variable("Italian", role="language"),
        ...         Variable("Spanish", role="language")
        ...     ]
        ... }
        >>> response = F.chat_completion(
        ...     model_client,
        ...     messages,
        ...     inputs=inputs,
        ...     temperature=0.7
        ... )
        >>> print(response.data)
        ['Ciao', 'Hola']

    See Also:
        [`afnio.autodiff.lm_ops.ChatCompletion`][afnio.autodiff.lm_ops.ChatCompletion]
        for the underlying operation.
    """
    return ChatCompletion.apply(
        forward_model_client,
        messages,
        inputs,
        **completion_args,
    )

afnio.cognitive.functional.lm_judge_evaluator(forward_model_client, messages, prediction, target=None, inputs=None, success_fn=None, reduction_fn=builtins.sum, reduction_fn_purpose='summation', eval_mode=True, **completion_args)

Implements an evaluation of a model prediction using a language model (LM) as the judge within the afnio framework, supporting automatic differentiation.

This function returns a score and an explanation, both as Variable objects, by comparing a prediction against a target (when present) using a composite prompt. The prompt is constructed from a list of messages and optional inputs, which can dynamically populate placeholders in the message templates. The evaluation process leverages the specified forward_model_client to perform the LM-based assessment.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Similarly, the inputs dictionary can include strings, lists of strings, or Variables. Each Variable passed as an input argument can have either a scalar or a list data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction, target, and any batched inputs must match.

The success_fn parameter is a user-defined function that returns True when all predictions evaluated by the LM as Judge are considered successful, and False otherwise. If success_fn returns True, the backward pass will skip gradient calculations and directly return an empty gradient, optimizing computational time.

If you are processing a batch of predictions and targets, you can use the reduction_fn to aggregate individual scores (e.g., using sum to compute a total score). The reduction_fn_purpose parameter is a brief description of the aggregation's purpose (e.g., "summation"). If you don't want any aggregation, set both reduction_fn and reduction_fn_purpose to None.
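As an illustration, a `success_fn` and an alternative `reduction_fn` might be sketched as follows (the names and the mean aggregation are our own examples, not afnio-provided helpers):

```python
# Illustrative callbacks for lm_judge_evaluator; the helper names below
# are hypothetical, not part of the afnio API.

def all_successful(scores):
    """success_fn: True only when every LM Judge score is truthy.

    Returning True lets the backward pass skip gradient computation.
    """
    return all(bool(s) for s in scores)


def mean_score(scores):
    """reduction_fn: aggregate per-sample scores into their mean."""
    return sum(scores) / len(scores)
```

They would be passed as `success_fn=all_successful, reduction_fn=mean_score, reduction_fn_purpose="mean"`.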

The function operates in two modes controlled by eval_mode:

  • eval_mode=True (default) – Computes gradients for prediction only. Use it for direct feedback on predictions.
  • eval_mode=False – Computes gradients for messages and inputs. Use it to optimize the evaluator or align with human evaluation datasets.

Additional model parameters, such as temperature, max tokens, or seed values, can be passed through completion_args to customize the LLM's behavior.

Parameters:

Name Type Description Default
forward_model_client ChatCompletionModel | None

The LM model client used for the forward pass evaluation.

required
messages MultiTurnMessages

A list of messages that compose the prompt/context for the LM. Each message is a dictionary with a "role" (e.g., "system", "user", "assistant") and a "content" field, which is a list of Variable objects. The Variable objects in the "content" can contain placeholders (e.g., {prediction}, {target}) that will be populated with the corresponding values from the inputs dictionary.

required
prediction Variable

The predicted variable to evaluate, which can have scalar or list data (supporting both individual and batch processing).

required
target str | list[str] | Variable | None

The target (ground truth) to compare against, which can be a string, a list of strings, or a Variable. Optional if the evaluation does not require a target and only relies on the correctness of the LM Judge's assessment of the prediction.

None
inputs dict[str, str | Variable] | None

A dictionary mapping placeholder names to their corresponding values, which can be strings or Variable instances. These values will be used to populate the placeholders in the messages content before sending the prompt to the LM. For example, if a message "content" field contains the placeholder {color}, the inputs dictionary should have a key "color" with the value to substitute in the prompt. Optional if there are no placeholders in the messages or if all placeholders are directly related to prediction and target.

None
success_fn Callable[[List[Any]], bool] | None

A user-defined function that takes the list of scores returned by the LM Judge and returns True if all predictions are considered successful, or False otherwise.

None
reduction_fn Callable[[List[Any]], Any] | None

An optional function to aggregate scores across a batch of predictions and targets. If None, no aggregation is applied.

sum
reduction_fn_purpose str | Variable | None

A brief description of the purpose of reduction_fn, used by the autodiff engine to generate explanations. Required if reduction_fn is provided.

'summation'
eval_mode bool | Variable

Indicates the evaluation mode. If True, the backward pass will compute gradients for the prediction variable only. If False, the backward pass will compute gradients for the messages and inputs, allowing optimization of the evaluator itself or alignment with human evaluation datasets.

True
**completion_args

Additional keyword arguments to pass to the LM model client's chat method, such as temperature, max tokens, or seed values, to customize the LLM's behavior during the evaluation.

{}

Returns:

Name Type Description
score Variable

A variable containing the evaluation score(s), or their aggregation if reduction_fn is provided.

explanation Variable

A variable containing the explanation(s) of the evaluation, or their aggregation if reduction_fn is provided.

Raises:

Type Description
RuntimeError

If the LM response to generate the evaluation score and explanation cannot be parsed as valid JSON.

TypeError

If the types of forward_model_client, messages, prediction, target, inputs, success_fn, reduction_fn, reduction_fn_purpose, or eval_mode are not as expected.

ValueError

If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string, or if inputs contains keys that conflict with prediction or target.

Examples:

Example with scalar inputs:

>>> task = Variable(
...     "Evaluate if the translation is accurate.",
...     role="evaluation task",
...     requires_grad=True
... )
>>> format = Variable(
...     "Provide 'score' (true/false) and 'explanation' in JSON.",
...     role="output format"
... )
>>> user = Variable(
...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
...     role="user query"
... )
>>> prediction = Variable(
...     "Hola Mundo",
...     role="translated text",
...     requires_grad=True
... )
>>> target = Variable("Ciao Mondo", role="expected output")
>>> messages = [
...     {"role": "system", "content": [task, format]},
...     {"role": "user", "content": [user]}
... ]
>>> score, explanation = F.lm_judge_evaluator(
...     model,
...     messages,
...     prediction,
...     target,
...     temperature=0.5,
... )
>>> score.data
False
>>> explanation.data
'The translated text is in Spanish, but the expected is in Italian.'
>>> explanation.backward()
>>> prediction.grad[0].data
'The translated text should be in Italian.'

Example with batched inputs:

>>> task = Variable(
...     "Evaluate if the translation is accurate.",
...     role="evaluation task",
...     requires_grad=True
... )
>>> format = Variable(
...     "Provide 'score' (true/false) and 'explanation' in JSON.",
...     role="output format"
... )
>>> user = Variable(
...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
...     role="user query"
... )
>>> prediction = Variable(
...     data=["Hola Mundo", "Salve a tutti"],
...     role="translated text",
...     requires_grad=True,
... )
>>> target = ["Ciao Mondo", "Salve a tutti"]
>>> score, explanation = F.lm_judge_evaluator(
...     model,
...     messages,
...     prediction,
...     target,
...     reduction_fn=sum,
...     reduction_fn_purpose="summation",
... )
>>> score.data
1
>>> explanation.data
"The evaluation function, designed using an LM as the judge, compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
>>> explanation.backward()
>>> prediction.grad[0].data
'The translated text should be in Italian.'
See Also:

afnio.autodiff.evaluator.LMJudgeEvaluator for the underlying operation.

Source code in afnio/cognitive/functional.py
def lm_judge_evaluator(
    forward_model_client: Optional[ChatCompletionModel],
    messages: MultiTurnMessages,
    prediction: Variable,
    target: Optional[Union[str, List[str], Variable]] = None,
    inputs: Optional[Dict[str, Union[str, Variable]]] = None,
    success_fn: Optional[Callable[[List[Any]], bool]] = None,
    reduction_fn: Optional[Callable[[List[Any]], Any]] = builtins.sum,
    reduction_fn_purpose: Optional[Union[str, Variable]] = "summation",
    eval_mode: Union[bool, Variable] = True,
    **completion_args,
) -> Tuple[Variable, Variable]:
    """
    Implements an evaluation of a model prediction using a language model (LM) as the
    judge within the `afnio` framework, supporting automatic differentiation.

    This function returns a `score` and an `explanation`, both as
    [`Variable`][afnio.Variable] objects, by comparing a `prediction` against a `target`
    (when present) using a composite prompt. The prompt is constructed from a list of
    `messages` and optional `inputs`, which can dynamically populate placeholders in the
    message templates. The evaluation process leverages the specified
    `forward_model_client` to perform the LM-based assessment.

    The `prediction` is a [`Variable`][afnio.Variable]. The `target` can be a string,
    a list of strings, or a [`Variable`][afnio.Variable]. Similarly, the `inputs`
    dictionary can include strings, lists of strings, or [`Variable`][afnio.Variable]s.
    Each [`Variable`][afnio.Variable] passed as an input argument can have either
    a scalar or a list [`data`][afnio.Variable.data] field, supporting both individual
    samples and batch processing. For batch processing, the lengths of `prediction`,
    `target`, and any batched `inputs` must match.

    The `success_fn` parameter is a user-defined function that returns `True` when
    all predictions evaluated by the LM as Judge are considered successful, and `False`
    otherwise. If `success_fn` returns `True`, the `backward` pass will skip gradient
    calculations and directly return an empty gradient, optimizing computational time.

    If you are processing a batch of predictions and targets, you can use the
    `reduction_fn` to aggregate individual scores (e.g., using `sum` to compute a total
    score). The `reduction_fn_purpose` parameter is a brief description of the
    aggregation's purpose (e.g., `"summation"`). If you don't want any aggregation, set
    both `reduction_fn` and `reduction_fn_purpose` to `None`.

    The function operates in two modes controlled by `eval_mode`:

    - **eval_mode=True (default)** – Computes gradients for `prediction` only. Use it
      for direct feedback on predictions.
    - **eval_mode=False** – Computes gradients for `messages` and `inputs`. Use it to
      optimize the evaluator or align with human evaluation datasets.

    Additional model parameters, such as temperature, max tokens, or seed values, can
    be passed through `completion_args` to customize the LLM's behavior.

    Args:
        forward_model_client: The LM model client used for the forward
            pass evaluation.
        messages: A list of messages that compose the prompt/context for the LM.
            Each message is a dictionary with a `"role"` (e.g., `"system"`, `"user"`,
            `"assistant"`) and a `"content"` field, which is a list of `Variable`
            objects. The `Variable` objects in the `"content"` can contain placeholders
            (e.g., `{prediction}`, `{target}`) that will be populated with the
            corresponding values from the `inputs` dictionary.
        prediction: The predicted variable to evaluate, which can have scalar or
            list [`data`][afnio.Variable.data] (supporting both individual and
            batch processing).
        target: The target (ground truth) to compare against, which can be a string,
            a list of strings, or a `Variable`. Optional if the evaluation does not
            require a target and only relies on the correctness of the LM Judge's
            assessment of the `prediction`.
        inputs: A dictionary mapping placeholder names to their corresponding
            values, which can be strings or `Variable` instances. These values
            will be used to populate the placeholders in the `messages` content
            before sending the prompt to the LM. For example, if a message
            `"content"` field contains the placeholder `{color}`, the `inputs`
            dictionary should have a key `"color"` with the value to substitute
            in the prompt. Optional if there are no placeholders in the messages or
            if all placeholders are directly related to `prediction` and `target`.
        success_fn: A user-defined function that takes the list of scores returned
            by the LM Judge and returns `True` if all predictions are considered
            successful, or `False` otherwise.
        reduction_fn: An optional function to aggregate scores across a batch of
            predictions and targets. If `None`, no aggregation is applied.
        reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
            used by the autodiff engine to generate explanations. Required if
            `reduction_fn` is provided.
        eval_mode: Indicates the evaluation mode. If `True`, the `backward` pass
            will compute gradients for the `prediction` variable only. If `False`,
            the `backward` pass will compute gradients for the `messages` and
            `inputs`, allowing optimization of the evaluator itself or alignment
            with human evaluation datasets.
        **completion_args: Additional keyword arguments to pass to the LM model
            client's `chat` method, such as temperature, max tokens, or seed values,
            to customize the LLM's behavior during the evaluation.

    Returns:
        score: A variable containing the evaluation score(s),
            or their aggregation if `reduction_fn` is provided.
        explanation: A variable containing the explanation(s) of the evaluation,
            or their aggregation if `reduction_fn` is provided.

    Raises:
        RuntimeError: If the LM response to generate the evaluation `score` and
            `explanation` cannot be parsed as valid JSON.
        TypeError: If the types of `forward_model_client`, `messages`, `prediction`,
            `target`, `inputs`, `success_fn`, `reduction_fn`,
            `reduction_fn_purpose`, or `eval_mode` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or
            `target.data`, when `target` is a `Variable`) do not match when both are
            lists, or if `reduction_fn_purpose` (or `reduction_fn_purpose.data`) is
            an empty string, or if `inputs` contains keys that conflict with
            `prediction` or `target`.

    Examples:
        Example with scalar inputs:
        >>> task = Variable(
        ...     "Evaluate if the translation is accurate.",
        ...     role="evaluation task",
        ...     requires_grad=True
        ... )
        >>> format = Variable(
        ...     "Provide 'score' (true/false) and 'explanation' in JSON.",
        ...     role="output format"
        ... )
        >>> user = Variable(
        ...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
        ...     role="user query"
        ... )
        >>> prediction = Variable(
        ...     "Hola Mundo",
        ...     role="translated text",
        ...     requires_grad=True
        ... )
        >>> target = Variable("Ciao Mondo", role="expected output")
        >>> messages = [
        ...     {"role": "system", "content": [task, format]},
        ...     {"role": "user", "content": [user]}
        ... ]
        >>> score, explanation = F.lm_judge_evaluator(
        ...     model,
        ...     messages,
        ...     prediction,
        ...     target,
        ...     temperature=0.5,
        ... )
        >>> score.data
        False
        >>> explanation.data
        'The translated text is in Spanish, but the expected is in Italian.'
        >>> explanation.backward()
        >>> prediction.grad[0].data
        'The translated text should be in Italian.'

        Example with batched inputs:
        >>> task = Variable(
        ...     "Evaluate if the translation is accurate.",
        ...     role="evaluation task",
        ...     requires_grad=True
        ... )
        >>> format = Variable(
        ...     "Provide 'score' (true/false) and 'explanation' in JSON.",
        ...     role="output format"
        ... )
        >>> user = Variable(
        ...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
        ...     role="user query"
        ... )
        >>> prediction = Variable(
        ...     data=["Hola Mundo", "Salve a tutti"],
        ...     role="translated text",
        ...     requires_grad=True,
        ... )
        >>> target = ["Ciao Mondo", "Salve a tutti"]
        >>> score, explanation = F.lm_judge_evaluator(
        ...     model,
        ...     messages,
        ...     prediction,
        ...     target,
        ...     reduction_fn=sum,
        ...     reduction_fn_purpose="summation",
        ... )
        >>> score.data
        1
        >>> explanation.data
        "The evaluation function, designed using an LM as the judge, compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
        >>> explanation.backward()
        >>> prediction.grad[0].data
        'The translated text should be in Italian.'

    See Also:
        [`afnio.autodiff.evaluator.LMJudgeEvaluator`][afnio.autodiff.evaluator.LMJudgeEvaluator]
        for the underlying operation.
    """  # noqa: E501
    return LMJudgeEvaluator.apply(
        forward_model_client,
        messages,
        prediction,
        target,
        inputs,
        success_fn,
        reduction_fn,
        reduction_fn_purpose,
        eval_mode,
        **completion_args,
    )

afnio.cognitive.functional.deterministic_evaluator(prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, reduction_fn_purpose)

Evaluates predictions deterministically using a user-defined evaluation function within the afnio framework, supporting automatic differentiation.

The DeterministicEvaluator function computes a score and an explanation based on the prediction and target inputs using a user-defined evaluation function (eval_fn). The evaluation function's purpose is described by eval_fn_purpose. Outputs include a numerical or textual score and a textual explanation, both wrapped as Variable objects.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Each Variable passed as an input argument can have either a scalar or a list data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction and target must match.

The success_fn parameter is a user-defined function that returns True when all predictions evaluated by eval_fn are considered successful, and False otherwise. If success_fn returns True, the backward pass will skip gradient calculations and directly return an empty gradient, optimizing computational time.

The reduction_fn parameter specifies the aggregation function to use for scores across a batch of predictions and targets. When specified, the reduction function's purpose is described using reduction_fn_purpose. If aggregation is not desired, set reduction_fn and reduction_fn_purpose to None.
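For example, a slightly more forgiving `eval_fn` than the exact match used in the examples below could normalize case and whitespace before comparing; the function name here is illustrative, not part of the afnio API:

```python
def normalized_match_fn(p: str, t: str) -> int:
    # Per-sample score: 1 when prediction and target match after
    # trimming whitespace and lowercasing, 0 otherwise.
    return 1 if p.strip().lower() == t.strip().lower() else 0
```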

Parameters:

Name Type Description Default
prediction Variable

The predicted variable to evaluate, which can have scalar or list data (supporting both individual and batch processing).

required
target str | list[str] | Variable

The target (ground truth) to compare against, which can be a string, a list of strings, or a Variable.

required
eval_fn Callable[[Variable, Union[str, Variable]], list[Any]]

A user-defined function that takes a prediction and a target and returns a list of scores for each sample. If target is a Variable, the function should compare the data fields of prediction and target.

required
eval_fn_purpose str | Variable

A brief description of the purpose of eval_fn, used by the autodiff engine to generate the explanations.

required
success_fn Callable[[List[Any]], bool] | None

A user-defined function that takes the list of scores returned by eval_fn and returns True if all predictions are considered successful, or False otherwise.

required
reduction_fn Callable[[List[Any]], Any] | None

An optional function to aggregate scores across a batch of predictions and targets. If None, no aggregation is applied.

required
reduction_fn_purpose str | Variable | None

A brief description of the purpose of reduction_fn, used by the autodiff engine to generate explanations. Required if reduction_fn is provided.

required

Returns:

Name Type Description
score Variable

A variable containing the evaluation score(s), or their aggregation if reduction_fn is provided.

explanation Variable

A variable containing the explanation(s) of the evaluation, or their aggregation if reduction_fn is provided.

Raises:

Type Description
TypeError

If the types of prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, or reduction_fn_purpose are not as expected.

ValueError

If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if eval_fn_purpose (or eval_fn_purpose.data) is an empty string, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string, or if the number of scores returned by eval_fn does not match the number of samples in the batch.

Examples:

Example with scalar inputs:

>>> prediction = Variable(
...     data="green",
...     role="color prediction",
...     requires_grad=True
... )
>>> target = "red"
>>> def exact_match_fn(p: str, t: str) -> int:
...     return 1 if p == t else 0
>>> score, explanation = F.deterministic_evaluator(
...     prediction,
...     target,
...     exact_match_fn,
...     "exact match",
...     success_fn=None,
...     reduction_fn=None,
...     reduction_fn_purpose=None,
... )
>>> score.data
0
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

Example with batched inputs:

>>> prediction = Variable(
...     data=["green", "blue"],
...     role="color prediction",
...     requires_grad=True
... )
>>> target = ["red", "blue"]
>>> def exact_match_fn(p: str, t: str) -> int:
...     return 1 if p == t else 0
>>> score, explanation = F.deterministic_evaluator(
...     prediction,
...     target,
...     exact_match_fn,
...     "exact match",
...     success_fn=None,
...     reduction_fn=sum,
...     reduction_fn_purpose="summation"
... )
>>> score.data
1
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."
See Also:

afnio.autodiff.evaluator.DeterministicEvaluator for the underlying operation.

Source code in afnio/cognitive/functional.py
def deterministic_evaluator(
    prediction: Variable,
    target: Union[str, List[str], Variable],
    eval_fn: Callable[[Variable, Union[str, Variable]], List[Any]],
    eval_fn_purpose: Union[str, Variable],
    success_fn: Optional[Callable[[List[Any]], bool]],
    reduction_fn: Optional[Callable[[List[Any]], Any]],
    reduction_fn_purpose: Optional[Union[str, Variable]],
) -> Tuple[Variable, Variable]:
    """
    Evaluates predictions deterministically using a user-defined evaluation function
    within the `afnio` framework, supporting automatic differentiation.

    The `DeterministicEvaluator` function computes a `score` and an `explanation` based
    on the `prediction` and `target` inputs using a user-defined evaluation function
    (`eval_fn`). The evaluation function's purpose is described by `eval_fn_purpose`.
    Outputs include a numerical or textual score and a textual explanation, both wrapped
    as [`Variable`][afnio.Variable] objects.

    The `prediction` is a [`Variable`][afnio.Variable]. The `target` can be a string,
    a list of strings, or a [`Variable`][afnio.Variable].
    Each [`Variable`][afnio.Variable] passed as an input argument can have either
    a scalar or a list [`data`][afnio.Variable.data] field, supporting both individual
    samples and batch processing. For batch processing, the lengths of `prediction`
    and `target` must match.

    The `success_fn` parameter is a user-defined function that returns `True` when
    all predictions evaluated by `eval_fn` are considered successful, and `False`
    otherwise. If `success_fn` returns `True`, the `backward` pass will skip gradient
    calculations and directly return an empty gradient, optimizing computational time.

    The `reduction_fn` parameter specifies the aggregation function to use for scores
    across a batch of predictions and targets. When specified, the reduction function's
    purpose is described using `reduction_fn_purpose`. If aggregation is not desired,
    set `reduction_fn` and `reduction_fn_purpose` to `None`.

    Args:
        prediction: The predicted variable to evaluate, which can have scalar or
            list [`data`][afnio.Variable.data] (supporting both individual and
            batch processing).
        target: The target (ground truth) to compare against, which can be a string,
            a list of strings, or a `Variable`.
        eval_fn: A user-defined function that takes a prediction and a target
            and returns a list of scores for each sample. If `target` is a
            [`Variable`][afnio.Variable], the function should compare the
            [`data`][afnio.Variable.data] fields of `prediction` and `target`.
        eval_fn_purpose: A brief description of the purpose of `eval_fn`,
            used by the autodiff engine to generate the explanations.
        success_fn: A user-defined function that takes the list of scores returned
            by `eval_fn` and returns `True` if all predictions are considered
            successful, or `False` otherwise.
        reduction_fn: An optional function to aggregate scores across a batch of
            predictions and targets. If `None`, no aggregation is applied.
        reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
            used by the autodiff engine to generate explanations. Required if
            `reduction_fn` is provided.

    Returns:
        score: A variable containing the evaluation score(s),
            or their aggregation if `reduction_fn` is provided.
        explanation: A variable containing the explanation(s) of the evaluation,
            or their aggregation if `reduction_fn` is provided.

    Raises:
        TypeError: If the types of `prediction`, `target`, `eval_fn`,
            `eval_fn_purpose`, `success_fn`, `reduction_fn`,
            or `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or
            `target.data`, when `target` is a `Variable`) do not match when
            both are lists, or if `eval_fn_purpose` (or `eval_fn_purpose.data`)
            is an empty string, or if `reduction_fn_purpose` (or
            `reduction_fn_purpose.data`) is an empty string,
            or if the number of scores returned by `eval_fn`
            does not match the number of samples in the batch.

    Examples:
        Example with scalar inputs:
        >>> prediction = Variable(
        ...     data="green",
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = "red"
        >>> def exact_match_fn(p: str, t: str) -> int:
        ...     return 1 if p == t else 0
        >>> score, explanation = F.deterministic_evaluator(
        ...     prediction,
        ...     target,
        ...     exact_match_fn,
        ...     "exact match",
        ...     success_fn=None,
        ...     reduction_fn=None,
        ...     reduction_fn_purpose=None,
        ... )
        >>> score.data
        0
        >>> explanation.data
        "The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0."
        >>> explanation.backward()
        >>> prediction.grad[0].data
        "Reassess the criteria that led to the initial prediction of 'green'."

        Example with batched inputs:
        >>> prediction = Variable(
        ...     data=["green", "blue"],
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = ["red", "blue"]
        >>> def exact_match_fn(p: str, t: str) -> int:
        ...     return 1 if p == t else 0
        >>> score, explanation = F.deterministic_evaluator(
        ...     prediction,
        ...     target,
        ...     exact_match_fn,
        ...     "exact match",
        ...     success_fn=None,
        ...     reduction_fn=sum,
        ...     reduction_fn_purpose="summation"
        ... )
        >>> score.data
        1
        >>> explanation.data
        "The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
        >>> explanation.backward()
        >>> prediction.grad[0].data
        "Reassess the criteria that led to the initial prediction of 'green'."

    See Also:
        [`afnio.autodiff.evaluator.DeterministicEvaluator`][afnio.autodiff.evaluator.DeterministicEvaluator]
        for the underlying operation.
    """  # noqa: E501
    return DeterministicEvaluator.apply(
        prediction,
        target,
        eval_fn,
        eval_fn_purpose,
        success_fn,
        reduction_fn,
        reduction_fn_purpose,
    )

afnio.cognitive.functional.exact_match_evaluator(prediction, target, reduction_fn=builtins.sum, reduction_fn_purpose='summation')

Evaluates predictions using exact matching within the afnio framework, supporting automatic differentiation.

The ExactMatchEvaluator function computes a score and an explanation by comparing the data fields of a prediction and a target for an exact match. For each sample:

  • A score of 1 is assigned for an exact match.
  • A score of 0 is assigned otherwise.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Each Variable passed as an input argument can have either a scalar or a list data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction and target must match.

If batched inputs are provided, the scores can be aggregated using an optional reduction_fn, such as sum. The purpose of the reduction is described using reduction_fn_purpose. If aggregation is not desired, set reduction_fn and reduction_fn_purpose to None.
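As a sketch, the default `sum` reduction could be swapped for a mean-style aggregation; `mean_accuracy` is an illustrative name of our own, not an afnio helper:

```python
def mean_accuracy(scores):
    # reduction_fn alternative to the default `sum`: the fraction of
    # exact matches in the batch (0.0 for an empty batch).
    return sum(scores) / len(scores) if scores else 0.0
```

It would be passed as `reduction_fn=mean_accuracy, reduction_fn_purpose="mean accuracy"`.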

Parameters:

  • prediction (Variable, required): The predicted variable to evaluate, which can have scalar or list data (supporting both individual and batch processing).
  • target (str | list[str] | Variable, required): The target (ground truth) to compare against, which can be a string, a list of strings, or a Variable.
  • reduction_fn (Callable[[List[Any]], Any] | None, default: sum): An optional function to aggregate scores across a batch of predictions and targets. If None, no aggregation is applied.
  • reduction_fn_purpose (str | Variable | None, default: 'summation'): A brief description of the purpose of reduction_fn, used by the autodiff engine to generate explanations. Required if reduction_fn is provided.

Returns:

  • score (Variable): A variable containing the evaluation score(s), or their aggregation if reduction_fn is provided.
  • explanation (Variable): A variable containing the explanation(s) of the evaluation, or their aggregation if reduction_fn is provided.

Raises:

  • TypeError: If the types of prediction, target, reduction_fn, or reduction_fn_purpose are not as expected.
  • ValueError: If the lengths of prediction.data and target (or target.data, when target is a Variable) do not match when both are lists, or if reduction_fn_purpose (or reduction_fn_purpose.data) is an empty string.
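For illustration, the validation rules listed above might look roughly like the following in plain Python. The helper `validate_inputs` is a hypothetical name; the real checks live inside afnio and may differ in detail.

```python
# Hypothetical sketch of the ValueError conditions described above;
# not afnio's actual validation code.
def validate_inputs(prediction_data, target_data, reduction_fn, reduction_fn_purpose):
    # Batched inputs must have matching lengths.
    if isinstance(prediction_data, list) and isinstance(target_data, list):
        if len(prediction_data) != len(target_data):
            raise ValueError("prediction and target lengths must match")
    # A reduction requires a non-empty purpose description.
    if reduction_fn is not None and not reduction_fn_purpose:
        raise ValueError("reduction_fn_purpose is required when reduction_fn is set")

validate_inputs(["green", "blue"], ["red", "blue"], sum, "summation")  # passes
```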

Examples:

Example with scalar inputs:

>>> prediction = Variable(
...     data="green",
...     role="color prediction",
...     requires_grad=True
... )
>>> target = "red"
>>> score, explanation = F.exact_match_evaluator(prediction, target)
>>> score.data
0
>>> explanation.data
'The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0.'
>>> explanation.backward()
>>> prediction.grad[0].data
'Reassess the criteria that led to the initial prediction of 'green'.'

Example with batched inputs:

>>> prediction = Variable(
...     data=["green", "blue"],
...     role="color prediction",
...     requires_grad=True
... )
>>> target = ["red", "blue"]
>>> score, explanation = F.exact_match_evaluator(prediction, target)
>>> score.data
1
>>> explanation.data
'The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1.'
>>> explanation.backward()
>>> prediction.grad[0].data
'Reassess the criteria that led to the initial prediction of 'green'.'

See Also

afnio.autodiff.evaluator.ExactMatchEvaluator for the underlying operation.

Source code in afnio/cognitive/functional.py
def exact_match_evaluator(
    prediction: Variable,
    target: Union[str, List[str], Variable],
    reduction_fn: Optional[Callable[[List[Any]], Any]] = builtins.sum,
    reduction_fn_purpose: Optional[Union[str, Variable]] = "summation",
) -> Tuple[Variable, Variable]:
    """
    Evaluates predictions using exact matching within the `afnio` framework,
    supporting automatic differentiation.

    The `ExactMatchEvaluator` function computes a `score` and an `explanation` by
    comparing the [`data`][afnio.Variable.data] fields of a `prediction`
    and a `target` for an exact match. For each sample:

    - A score of `1` is assigned for an exact match.
    - A score of `0` is assigned otherwise.

    The `prediction` is a [`Variable`][afnio.Variable]. The `target` can be a string,
    a list of strings, or a [`Variable`][afnio.Variable].
    Each [`Variable`][afnio.Variable] passed as an input argument can have either
    a scalar or a list [`data`][afnio.Variable.data] field, supporting both individual
    samples and batch processing. For batch processing, the lengths of `prediction`
    and `target` must match.

    If batched inputs are provided, the scores can be aggregated using an optional
    `reduction_fn`, such as `sum`. The purpose of the reduction is described using
    `reduction_fn_purpose`. If aggregation is not desired, set `reduction_fn` and
    `reduction_fn_purpose` to `None`.

    Args:
        prediction: The predicted variable to evaluate, which can have scalar or
            list [`data`][afnio.Variable.data] (supporting both individual and
            batch processing).
        target: The target (ground truth) to compare against, which can be a string,
            a list of strings, or a `Variable`.
        reduction_fn: An optional function to aggregate scores across a batch of
            predictions and targets. If `None`, no aggregation is applied.
        reduction_fn_purpose: A brief description of the purpose of `reduction_fn`,
            used by the autodiff engine to generate explanations. Required if
            `reduction_fn` is provided.

    Returns:
        score: A variable containing the evaluation score(s),
            or their aggregation if `reduction_fn` is provided.
        explanation: A variable containing the explanation(s) of the evaluation,
            or their aggregation if `reduction_fn` is provided.

    Raises:
        TypeError: If the types of `prediction`, `target`, `reduction_fn`,
            or `reduction_fn_purpose` are not as expected.
        ValueError: If the lengths of `prediction.data` and `target` (or
            `target.data`, when `target` is a `Variable`) do not match when
            both are lists, or if `reduction_fn_purpose` (or
            `reduction_fn_purpose.data`) is an empty string.

    Examples:
        Example with scalar inputs:
        >>> prediction = Variable(
        ...     data="green",
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = "red"
        >>> score, explanation = F.exact_match_evaluator(prediction, target)
        >>> score.data
        0
        >>> explanation.data
        'The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0.'
        >>> explanation.backward()
        >>> prediction.grad[0].data
        'Reassess the criteria that led to the initial prediction of 'green'.'

        Example with batched inputs:
        >>> prediction = Variable(
        ...     data=["green", "blue"],
        ...     role="color prediction",
        ...     requires_grad=True
        ... )
        >>> target = ["red", "blue"]
        >>> score, explanation = F.exact_match_evaluator(prediction, target)
        >>> score.data
        1
        >>> explanation.data
        'The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1.'
        >>> explanation.backward()
        >>> prediction.grad[0].data
        'Reassess the criteria that led to the initial prediction of 'green'.'

    See Also:
        [`afnio.autodiff.evaluator.ExactMatchEvaluator`][afnio.autodiff.evaluator.ExactMatchEvaluator]
        for the underlying operation.
    """  # noqa: E501
    return ExactMatchEvaluator.apply(
        prediction, target, reduction_fn, reduction_fn_purpose
    )