afnio.optim

afnio.optim.TGD

Bases: Optimizer

Textual Gradient Descent (TGD) optimizer.

TGD is an optimization algorithm for language-model–based systems where gradients are represented and propagated as natural language feedback rather than numerical tensors. Instead of computing numerical derivatives, TGD relies on a language model to generate textual critiques (gradients) that are used to iteratively refine prompt-based parameters.

This implementation follows the ideas introduced in the TextGrad paper, which proposes treating language-model feedback as a differentiable signal for optimizing textual variables and prompt programs.

TGD operates over Variable objects and consumes textual gradients produced by the automatic differentiation process. These gradients are used to update the optimized variables, with optional momentum applied to recent gradient history to stabilize and accelerate optimization.
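The momentum mechanism can be pictured as a bounded window over the most recent textual gradients. The sketch below is illustrative only (it does not use afnio's API; names are made up), showing how a window of size `momentum` would retain just the last few critiques:

```python
from collections import deque

# Illustrative sketch: a momentum window of size N keeps only the last N
# textual gradients, which can then be shown to the optimizer LM alongside
# the current gradient to smooth successive updates.
def make_momentum_buffer(momentum: int):
    # maxlen=momentum makes the deque drop the oldest gradient
    # automatically once the window is full; momentum=0 disables it.
    return deque(maxlen=momentum) if momentum > 0 else None

buffer = make_momentum_buffer(2)
for feedback in [
    "Be more concise.",
    "Add a concrete example.",
    "Avoid passive voice.",
]:
    buffer.append(feedback)

# Only the two most recent critiques remain in the window.
print(list(buffer))  # ['Add a concrete example.', 'Avoid passive voice.']
```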

Parameters are organized into parameter groups, similar to optimizers in PyTorch. This allows different optimization settings—such as optimization meta-prompts (messages), constraints, and momentum—to be applied consistently across groups.
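The group mechanics can be sketched in plain Python. This is a simplified stand-in for the PyTorch-style merge (function and dict keys here are illustrative, not afnio's actual internals): each group is a dict with a `"params"` entry, and any group-level setting overrides the optimizer-wide default.

```python
# Illustrative sketch of PyTorch-style parameter groups: bare parameters
# are wrapped into a group dict, then optimizer-wide defaults are merged
# in, with per-group settings taking precedence.
def build_param_groups(params, defaults):
    groups = []
    for group in params:
        if not isinstance(group, dict):
            group = {"params": [group]}
        merged = {**defaults, **group}  # group settings win over defaults
        groups.append(merged)
    return groups

defaults = {"momentum": 0, "constraints": []}
groups = build_param_groups(
    [
        {"params": ["system_prompt"], "momentum": 3},  # custom momentum
        "user_instruction",  # bare parameter, inherits all defaults
    ],
    defaults,
)
print(groups[0]["momentum"], groups[1]["momentum"])  # 3 0
```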

References:

- *TextGrad: Automatic Differentiation via Large Language Models*, [https://arxiv.org/abs/2406.07496](https://arxiv.org/abs/2406.07496)

Source code in afnio/optim/tgd.py
class TGD(Optimizer):
    """
    Textual Gradient Descent (TGD) optimizer.

    TGD is an optimization algorithm for language-model–based systems where
    gradients are represented and propagated as *natural language feedback*
    rather than numerical tensors. Instead of computing numerical derivatives,
    TGD relies on a language model to generate textual critiques (gradients)
    that are used to iteratively refine prompt-based parameters.

    This implementation follows the ideas introduced in the *TextGrad* paper,
    which proposes treating language-model feedback as a differentiable signal
    for optimizing textual variables and prompt programs.

    TGD operates over [`Variable`][afnio.Variable] objects and consumes textual
    gradients produced by the automatic differentiation process. These gradients
    are used to update the optimized variables, with optional momentum applied
    to recent gradient history to stabilize and accelerate optimization.

    Parameters are organized into parameter groups, similar to optimizers in
    PyTorch. This allows different optimization settings—such as optimization
    meta-prompts (`messages`), `constraints`, and `momentum`—to be applied
    consistently across groups.

    **References:**

    - *TextGrad: Automatic Differentiation via Large Language Models*
        [https://arxiv.org/abs/2406.07496](https://arxiv.org/abs/2406.07496)
    """

    def __init__(
        self,
        params: ParamsT,
        model_client: Optional[ChatCompletionModel],
        messages: MultiTurnMessages = TGD_MESSAGES,
        inputs: Optional[Dict[str, Union[str, Variable]]] = None,
        constraints: Optional[List[Union[str, Variable]]] = None,
        momentum: int = 0,
        **completion_args,
    ):
        """Initialize the Textual Gradient Descent (TGD) optimizer.

        Args:
            params (iterable): Iterable of parameters to optimize or dicts defining
                parameter groups.
            model_client: LM model client used for optimization.
            messages: Messages for multi-turn interactions. It typically defines
                the optimizer system prompt and user instruction. In-context
                examples (shots) can be added as well.
            inputs: Dynamic values to fill placeholders within message templates.
            constraints: A list of natural language constraints for optimization.
            momentum (int, optional): Momentum window size. Tracks the last `momentum`
                gradients, which helps accelerate updates in the right direction and
                dampen oscillations. Defaults to 0.
            completion_args (Dict[str, Any], optional): Additional arguments to pass to
                the model client when generating text completions. Defaults to an
                empty dictionary.
        """
        # Workaround to trigger TGD_MESSAGES registration with the server
        # and store related variable_ids on the client side
        if messages is TGD_MESSAGES:
            messages = [
                {
                    "role": "system",
                    "content": [
                        Variable(
                            data="Placeholder for Textual Gradient Descent optimizer system prompt",  # noqa: E501
                            role="Textual Gradient Descent optimizer system prompt",
                        )
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        Variable(
                            data="Placeholder for Textual Gradient Descent optimizer user prompt",  # noqa: E501
                            role="Textual Gradient Descent optimizer user prompt",
                        )
                    ],
                },
            ]

        defaults = dict(
            model_client=model_client,
            messages=messages,
            inputs=inputs or {},
            constraints=constraints or [],
            momentum=momentum,
            completion_args=completion_args,
        )
        super().__init__(params, defaults)

    def step(
        self, closure: Optional[Callable] = None
    ) -> Optional[Tuple[Variable, Variable]]:
        """Performs a single optimization step.

        Args:
            closure: A closure that reevaluates the model and returns the loss.

        Returns:
            The loss if `closure` is provided, otherwise None. The loss should \
            return a numerical or textual score and a textual explanation, both \
            wrapped as [`Variable`][afnio.Variable] objects.
        """
        loss = closure() if closure else (None, None)
        super().step()
        return loss

    def _extract_variable_ids_from_state(self, state):
        """
        Extract only the variable_ids of deepcopied parameters (i.e., those generated
        on the server) from the optimizer state.

        Args:
            state (list): The serialized optimizer state as returned by the server.

        Returns:
            Set[str]: Set of variable_ids for deepcopied parameters.
        """
        var_ids = set()
        for entry in state:
            momentum_buffer = entry.get("value", {}).get("momentum_buffer", [])
            for buf_entry in momentum_buffer:
                if (
                    isinstance(buf_entry, list)
                    and len(buf_entry) > 0
                    and isinstance(buf_entry[0], dict)
                    and "variable_id" in buf_entry[0]
                ):
                    var_ids.add(buf_entry[0]["variable_id"])
        return var_ids
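
The state shape this helper expects can be exercised with a small standalone example. The nesting below is an assumption inferred from the parsing logic above, not afnio's documented wire format:

```python
# Standalone re-implementation of the extraction logic above, run on a
# hand-built state whose nesting mirrors what the parser expects:
# each entry holds value.momentum_buffer, a list of lists whose first
# element is a dict carrying a "variable_id".
def extract_variable_ids(state):
    var_ids = set()
    for entry in state:
        momentum_buffer = entry.get("value", {}).get("momentum_buffer", [])
        for buf_entry in momentum_buffer:
            if (
                isinstance(buf_entry, list)
                and len(buf_entry) > 0
                and isinstance(buf_entry[0], dict)
                and "variable_id" in buf_entry[0]
            ):
                var_ids.add(buf_entry[0]["variable_id"])
    return var_ids

state = [
    {
        "value": {
            "momentum_buffer": [
                [{"variable_id": "var-123"}],
                [{"variable_id": "var-456"}],
            ]
        }
    },
    {"value": {}},  # entries without a buffer are skipped
]
print(sorted(extract_variable_ids(state)))  # ['var-123', 'var-456']
```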

__init__(params, model_client, messages=TGD_MESSAGES, inputs=None, constraints=None, momentum=0, **completion_args)

Initialize the Textual Gradient Descent (TGD) optimizer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `params` | `iterable` | Iterable of parameters to optimize or dicts defining parameter groups. | *required* |
| `model_client` | `ChatCompletionModel \| None` | LM model client used for optimization. | *required* |
| `messages` | `MultiTurnMessages` | Messages for multi-turn interactions. It typically defines the optimizer system prompt and user instruction. In-context examples (shots) can be added as well. | `TGD_MESSAGES` |
| `inputs` | `dict[str, str \| Variable] \| None` | Dynamic values to fill placeholders within message templates. | `None` |
| `constraints` | `list[str \| Variable] \| None` | A list of natural language constraints for optimization. | `None` |
| `momentum` | `int` | Momentum window size. Tracks the last `momentum` gradients, which helps accelerate updates in the right direction and dampen oscillations. Defaults to 0. | `0` |
| `completion_args` | `dict[str, Any]` | Additional arguments to pass to the model client when generating text completions. Defaults to an empty dictionary. | `{}` |
Source code in afnio/optim/tgd.py
def __init__(
    self,
    params: ParamsT,
    model_client: Optional[ChatCompletionModel],
    messages: MultiTurnMessages = TGD_MESSAGES,
    inputs: Optional[Dict[str, Union[str, Variable]]] = None,
    constraints: Optional[List[Union[str, Variable]]] = None,
    momentum: int = 0,
    **completion_args,
):
    """Initialize the Textual Gradient Descent (TGD) optimizer.

    Args:
        params (iterable): Iterable of parameters to optimize or dicts defining
            parameter groups.
        model_client: LM model client used for optimization.
        messages: Messages for multi-turn interactions. It typically defines
            the optimizer system prompt and user instruction. In-context
            examples (shots) can be added as well.
        inputs: Dynamic values to fill placeholders within message templates.
        constraints: A list of natural language constraints for optimization.
        momentum (int, optional): Momentum window size. Tracks the last `momentum`
            gradients, which helps accelerate updates in the right direction and
            dampen oscillations. Defaults to 0.
        completion_args (Dict[str, Any], optional): Additional arguments to pass to
            the model client when generating text completions. Defaults to an
            empty dictionary.
    """
    # Workaround to trigger TGD_MESSAGES registration with the server
    # and store related variable_ids on the client side
    if messages is TGD_MESSAGES:
        messages = [
            {
                "role": "system",
                "content": [
                    Variable(
                        data="Placeholder for Textual Gradient Descent optimizer system prompt",  # noqa: E501
                        role="Textual Gradient Descent optimizer system prompt",
                    )
                ],
            },
            {
                "role": "user",
                "content": [
                    Variable(
                        data="Placeholder for Textual Gradient Descent optimizer user prompt",  # noqa: E501
                        role="Textual Gradient Descent optimizer user prompt",
                    )
                ],
            },
        ]

    defaults = dict(
        model_client=model_client,
        messages=messages,
        inputs=inputs or {},
        constraints=constraints or [],
        momentum=momentum,
        completion_args=completion_args,
    )
    super().__init__(params, defaults)

step(closure=None)

Performs a single optimization step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `closure` | `Callable \| None` | A closure that reevaluates the model and returns the loss. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[Variable, Variable] \| None` | The loss if `closure` is provided, otherwise `None`. The loss should return a numerical or textual score and a textual explanation, both wrapped as `Variable` objects. |
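
The closure contract can be sketched in plain Python. This is a simplified stand-in (no afnio imports; plain values stand in for `Variable` objects) that mirrors the control flow of `step()` shown in the source below:

```python
# Illustrative sketch of the closure contract: the closure re-evaluates
# the program and returns a (score, explanation) pair. step() forwards
# that pair to the caller; without a closure it yields (None, None).
def step(closure=None):
    loss = closure() if closure else (None, None)
    # ... parameter updates driven by textual gradients would happen here ...
    return loss

def closure():
    # In afnio these would be Variable objects; plain values stand in here.
    return (0.75, "The summary is accurate but omits the second key point.")

score, explanation = step(closure)
print(score)    # 0.75
print(step())   # (None, None)
```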

Source code in afnio/optim/tgd.py
def step(
    self, closure: Optional[Callable] = None
) -> Optional[Tuple[Variable, Variable]]:
    """Performs a single optimization step.

    Args:
        closure: A closure that reevaluates the model and returns the loss.

    Returns:
        The loss if `closure` is provided, otherwise None. The loss should \
        return a numerical or textual score and a textual explanation, both \
        wrapped as [`Variable`][afnio.Variable] objects.
    """
    loss = closure() if closure else (None, None)
    super().step()
    return loss