afnio.autodiff

afnio.autodiff.is_grad_enabled()

Check whether grad mode is currently enabled.

Returns:

bool: True if grad mode is currently enabled, False otherwise.

Source code in afnio/autodiff/grad_mode.py
def is_grad_enabled() -> bool:
    """
    Check whether grad mode is currently enabled.

    Returns:
        `True` if grad mode is currently enabled, `False` otherwise.
    """
    return getattr(_grad_enabled, "enabled", True)
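
The `getattr(..., "enabled", True)` fallback matters because `_grad_enabled` is a thread-local object: a thread that has never toggled grad mode has no `enabled` attribute yet and must default to enabled. A minimal stand-alone sketch of that pattern (pure stdlib; the names mirror the source above but this is illustrative, not afnio's actual module):

```python
import threading

# Each thread sees its own view of this object's attributes.
_grad_enabled = threading.local()

def is_grad_enabled() -> bool:
    # Fall back to True: a fresh thread has never set `enabled`.
    return getattr(_grad_enabled, "enabled", True)

_grad_enabled.enabled = False   # disable grad mode in the main thread only

results = []
worker = threading.Thread(target=lambda: results.append(is_grad_enabled()))
worker.start()
worker.join()

print(is_grad_enabled())  # main thread: False
print(results[0])         # worker thread: True (it never touched the flag)
```

This is why disabling gradients in one thread leaves other threads unaffected.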

afnio.autodiff.no_grad()

Context manager that disables gradient calculation. All operations within this block will not track gradients, making them more memory-efficient.

Disabling gradient calculation is useful for inference, when you are sure that you will not call Variable.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.

In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True. There is one exception: factory functions, i.e. functions that create a new Variable and take a requires_grad kwarg, are NOT affected by this mode.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator.

Examples:

>>> x = afnio.Variable("abc", role="variable", requires_grad=True)
>>> with afnio.no_grad():
...     y = x + x
>>> y.requires_grad
False
>>> @afnio.no_grad()
... def doubler(x):
...     return x + x
>>> z = doubler(x)
>>> z.requires_grad
False
>>> @afnio.no_grad()
... def tripler(x):
...     return x + x + x
>>> z = tripler(x)
>>> z.requires_grad
False
>>> # factory function exception
>>> with afnio.no_grad():
...     a = afnio.cognitive.Parameter("xyz")
>>> a.requires_grad
True
Source code in afnio/autodiff/grad_mode.py
@contextmanager
def no_grad():
    """
    Context manager that disables gradient calculation. All operations within this block
    will not track gradients, making them more memory-efficient.

    Disabling gradient calculation is useful for inference, when you are sure
    that you will not call [`Variable.backward()`][afnio.Variable.backward]. It will
    reduce memory consumption for computations that would otherwise have
    `requires_grad=True`.

    In this mode, the result of every computation will have
    `requires_grad=False`, even when the inputs have `requires_grad=True`.
    There is an exception! All factory functions, or functions that create
    a new Variable and take a requires_grad kwarg, will NOT be affected by
    this mode.

    This context manager is thread local; it will not affect computation
    in other threads.

    Also functions as a decorator.

    Examples:
        >>> x = afnio.Variable("abc", role="variable", requires_grad=True)
        >>> with afnio.no_grad():
        ...     y = x + x
        >>> y.requires_grad
        False
        >>> @afnio.no_grad()
        ... def doubler(x):
        ...     return x + x
        >>> z = doubler(x)
        >>> z.requires_grad
        False
        >>> @afnio.no_grad()
        ... def tripler(x):
        ...     return x + x + x
        >>> z = tripler(x)
        >>> z.requires_grad
        False
        >>> # factory function exception
        >>> with afnio.no_grad():
        ...     a = afnio.cognitive.Parameter("xyz")
        >>> a.requires_grad
        True
    """
    previous_state = is_grad_enabled()  # Store the current state
    set_grad_enabled(False)  # Disable gradients
    try:
        yield  # Execute the block
    finally:
        set_grad_enabled(previous_state)  # Restore the original state
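
The save/disable/restore shape above also explains why no_grad "also functions as a decorator": objects produced by contextlib.contextmanager inherit from ContextDecorator, so the same no_grad() call works in a with statement or as @no_grad(). A self-contained sketch of this behaviour (pure stdlib, illustrative names, not afnio internals):

```python
import threading
from contextlib import contextmanager

_state = threading.local()

def grad_enabled() -> bool:
    return getattr(_state, "enabled", True)

@contextmanager
def no_grad():
    previous = grad_enabled()       # save the current mode
    _state.enabled = False          # disable gradients
    try:
        yield
    finally:
        _state.enabled = previous   # always restore, even on error

with no_grad():
    inside = grad_enabled()         # False inside the block

@no_grad()                          # the same object doubles as a decorator
def compute():
    return grad_enabled()

print(inside, compute(), grad_enabled())
```

The try/finally guarantees the previous mode is restored even if the wrapped code raises, which is what makes nesting no_grad blocks safe.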

afnio.autodiff.set_grad_enabled(mode)

Set the global state of gradient calculation on or off.

set_grad_enabled will enable or disable gradients based on its argument mode.

Parameters:

mode (bool): If True, enables gradient calculation. If False, disables it. Required.

Examples:

>>> x = afnio.Variable("Hello", requires_grad=True)
>>> _ = afnio.set_grad_enabled(True)
>>> y = x + x
>>> y.requires_grad
True
>>> _ = afnio.set_grad_enabled(False)
>>> y = x + x
>>> y.requires_grad
False
Source code in afnio/autodiff/grad_mode.py
def set_grad_enabled(mode: bool):
    """
    Set the global state of gradient calculation on or off.

    `set_grad_enabled` will enable or disable gradients based on its argument `mode`.

    Args:
        mode: If `True`, enables gradient calculation. If `False`, disables it.

    Examples:
        >>> x = afnio.Variable("Hello", requires_grad=True)
        >>> _ = afnio.set_grad_enabled(True)
        >>> y = x + x
        >>> y.requires_grad
        True
        >>> _ = afnio.set_grad_enabled(False)
        >>> y = x + x
        >>> y.requires_grad
        False
    """
    _grad_enabled.enabled = mode
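
Although the docstring calls this a "global" state, the flag lives in thread-local storage, so each thread toggles its own copy. A short sketch of that consequence (stdlib only; illustrative names):

```python
import threading

_grad = threading.local()

def set_grad_enabled(mode: bool):
    _grad.enabled = mode

def is_grad_enabled() -> bool:
    return getattr(_grad, "enabled", True)

def worker(out):
    set_grad_enabled(False)     # disables grad mode for this thread only
    out.append(is_grad_enabled())

set_grad_enabled(True)
seen = []
t = threading.Thread(target=worker, args=(seen,))
t.start()
t.join()

print(seen[0])            # False inside the worker thread
print(is_grad_enabled())  # True: the main thread is unaffected
```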

afnio.autodiff.backward(variables, grad_variables=None, retain_graph=None, create_graph=False, inputs=None)

Computes the sum of gradients of given variables with respect to graph leaves.

The graph is differentiated using the chain rule. If any of variables are non-scalar (i.e. their data has more than one element) and require gradient, the Jacobian-vector product is computed; in that case the function additionally requires specifying grad_variables: a sequence of matching length containing the "vector" in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding variables (None is an acceptable value for all variables that don't need gradient variables).

This function accumulates gradients in the leaf variables; each call to backward appends new gradient values to the grad list. Clear existing gradients before calling it again if accumulation is not desired.

Note

Using this method with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autodiff.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the grad fields of your parameters to None after use to break the cycle and avoid the leak.

Note

When inputs are provided, each input must be a leaf variable. If any input is not a leaf, a RuntimeError is raised.

Parameters:

variables (Union[Variable, Sequence[Variable]]): Variables of which the derivative will be computed. Required.

grad_variables (Union[Variable, Sequence[Variable]] | None): The "vector" in the Jacobian-vector product, usually gradients w.r.t. each element of the corresponding variables. None values can be specified for scalar Variables or ones that don't require grad. If a None value would be acceptable for all grad_variables, this argument is optional. Default: None.

retain_graph (bool | None): If False, the graph used to compute the grads will be freed. Setting this to True retains the graph, allowing additional backward calls on the same graph, which is useful e.g. for multi-task learning with multiple losses. However, retaining the graph is not needed in nearly all cases and can usually be worked around far more efficiently. Defaults to the value of create_graph.

create_graph (bool): If True, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Default: False.

inputs (Union[Variable, Sequence[Variable], GradientEdge, Sequence[GradientEdge]] | None): Inputs w.r.t. which the gradient will be accumulated into grad. All other Variables are ignored. If not provided, the gradient is accumulated into all the leaf Variables that were used to compute the variables attribute. Default: None.

Raises:

RuntimeError: If any element of variables does not require grad and does not have a grad_fn, or if inputs is provided but is empty.

TypeError: If any element of variables or grad_variables is not a Variable or a sequence of Variables.

Source code in afnio/autodiff/__init__.py
def backward(
    variables: _VariableOrVariables,
    grad_variables: Optional[_VariableOrVariables] = None,
    retain_graph: Optional[bool] = None,
    create_graph: bool = False,
    inputs: Optional[_VariableOrVariablesOrGradEdge] = None,
) -> None:
    """
    Computes the sum of gradients of given variables with respect to graph
    leaves.

    The graph is differentiated using the chain rule. If any of `variables`
    are non-scalar (i.e. their data has more than one element) and require
    gradient, then the Jacobian-vector product would be computed, in this
    case the function additionally requires specifying `grad_variables`.
    It should be a sequence of matching length, that contains the "vector"
    in the Jacobian-vector product, usually the gradient of the differentiated
    function w.r.t. corresponding variables (`None` is an acceptable value for
    all variables that don't need gradient variables).

    This function accumulates gradients in the leaf variables; each call to
    `backward` appends new gradient values to the [`grad`][...Variable.grad] list.
    Clear existing gradients before calling it again if accumulation is not desired.

    Note:
        Using this method with `create_graph=True` will create a reference cycle
        between the parameter and its gradient which can cause a memory leak.
        We recommend using `autodiff.grad` when creating the graph to avoid this.
        If you have to use this function, make sure to reset the
        [`grad`][...Variable.grad] fields of your parameters to `None` after use
        to break the cycle and avoid the leak.

    Note:
        When `inputs` are provided, each input must be a leaf variable. If any
        input is not a leaf, a `RuntimeError` is raised.

    Args:
        variables (Union[Variable, Sequence[Variable]]): Variables of which
            the derivative will be computed.
        grad_variables (Optional[Union[Variable, Sequence[Variable]]]): The "vector"
            in the Jacobian-vector product, usually gradients w.r.t. each element of
            corresponding variables. None values can be specified for scalar Variables
            or ones that don't require grad. If a None value would be acceptable for all
            grad_variables, then this argument is optional.
        retain_graph: If `False`, the graph used to compute the grads will be freed.
            Setting this to `True` retains the graph, allowing for additional backward
            calls on the same graph, useful for example for multi-task learning where
            you have multiple losses. However, retaining the graph is not needed in
            nearly all cases and can be worked around in a much more efficient way.
            Defaults to the value of `create_graph`.
        create_graph: If `True`, graph of the derivative will be constructed, allowing
            to compute higher order derivative products.
        inputs (Optional[Union[Variable, Sequence[Variable], GradientEdge, Sequence[GradientEdge],]]):
            Inputs w.r.t. which the gradient will be accumulated into
            [`grad`][...Variable.grad]. All other Variables will be ignored. If not
            provided, the gradient is accumulated into all the leaf Variables that were
            used to compute the `variables` attribute.

    Raises:
        RuntimeError: If any element of `variables` does not require grad
            and does not have a `grad_fn`, or if `inputs` is provided but is empty.
        TypeError: If any element of `variables` or `grad_variables` is not a
            [`Variable`][...Variable] or a sequence of [`Variable`][...Variable]s.
    """  # noqa: E501

    # Serialize the arguments
    serialized_variables = _serialize_arg(variables)
    serialized_grad_variables = _serialize_arg(grad_variables)
    serialized_retain_graph = _serialize_arg(retain_graph)
    serialized_create_graph = _serialize_arg(create_graph)
    serialized_inputs = _serialize_arg(inputs)

    # Send the RPC call to the server
    backprop_variable_ids = []
    try:
        # Get the singleton websocket client
        _, ws_client = get_default_clients()

        # Fetch all Variables which gradients will be computed during backpropagation
        # and mark them as pending for grad update
        payload = {"variables": serialized_variables}
        response_ids = run_in_background_loop(
            ws_client.call("get_backprop_ids", payload)
        )
        if "error" in response_ids:
            raise RuntimeError(
                response_ids["error"]["data"].get("exception", response_ids["error"])
            )

        logger.debug(
            f"Fetched backpropagation variable IDs from the server: {response_ids!r}"
        )

        result_ids = response_ids.get("result", {})
        backprop_variable_ids = result_ids.get("variable_ids", [])
        if backprop_variable_ids:
            for var_id in backprop_variable_ids:
                var = get_variable(var_id)
                if var is not None:
                    var._pending_grad = True
                    logger.debug(
                        f"Marked variable {var_id!r} as pending for grad update."
                    )
                else:
                    logger.warning(
                        f"Variable id {var_id!r} returned for backward, "
                        "but not found in VARIABLE_REGISTRY."
                    )

        # Run backward pass
        payload = {
            "variables": serialized_variables,
            "grad_variables": serialized_grad_variables,
            "retain_graph": serialized_retain_graph,
            "create_graph": serialized_create_graph,
            "inputs": serialized_inputs,
        }
        response_bwd = run_in_background_loop(ws_client.call("run_backward", payload))
        if "error" in response_bwd:
            raise RuntimeError(
                response_bwd["error"]["data"].get("exception", response_bwd["error"])
            )

        logger.debug(
            f"Backward pass instantiated and shared with the server: {variables!r}"
        )

        result_message = response_bwd.get("result", {}).get("message")
        if result_message != "Backward pass executed successfully.":
            raise RuntimeError(
                f"Server did not return any data for backward pass: "
                f"payload={payload!r}, response={response_bwd!r}"
            )

        logger.debug(
            f"Backward pass executed successfully with variables: {variables!r}"
        )

    except Exception as e:
        logger.error(f"Failed to share backward pass with the server: {e}")

        # Clear all pending grad flags to avoid deadlocks
        for var_id in backprop_variable_ids:
            var = get_variable(var_id)
            if var is not None:
                var._pending_grad = False
                logger.debug(
                    f"Marked variable {var_id!r} as not pending for grad update "
                    f"after error."
                )

        raise
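
The accumulation behaviour described above ("each call to backward appends new gradient values to the grad list") can be illustrated with a toy leaf variable. This is a behavioural sketch only; ToyVariable and its string "gradients" are invented for illustration and are not afnio's implementation:

```python
class ToyVariable:
    """Minimal stand-in showing grad-as-a-list accumulation."""

    def __init__(self):
        self.grad = []              # gradients accumulate here

    def backward(self, feedback: str):
        # Each backward call appends rather than overwrites.
        self.grad.append(feedback)

x = ToyVariable()
x.backward("too verbose")
x.backward("wrong tone")
accumulated = list(x.grad)          # both gradients are retained

x.grad.clear()                      # clear first if accumulation is unwanted
x.backward("fresh feedback")
print(accumulated, x.grad)
```

Clearing grad between optimization steps, as the prose recommends, is what prevents stale feedback from leaking into the next update.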