Save, Load and Use Agent#

Warning

Before running any code, ensure you are logged in to the Afnio backend (afnio login). See Logging in to Afnio Backend for details.

Afnio agents (subclasses of cog.Module) expose state_dict() / load_state_dict(...) to persist trained parameters and minimal metadata. Saving and restoring agent state enables reproducible evaluation, resumable training, safe deployment, and sharing of model parameters without serializing full objects.


Prerequisite Code#

Create an LM client and instantiate the example agent below.

import os

import afnio
import afnio.utils.agents as agents
from afnio.models.openai import AsyncOpenAI

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your actual key

fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()

response = agent(
    fwd_model,
    inputs={"message": "I've been a satisfied client of ProCare for a year"},
    model="gpt-4.1-nano",
    temperature=0.0,
)
print(response.data)

Output:

{"sentiment":"positive"}

Saving and Loading Agent Parameters#

Afnio agents store their learned parameters in an internal state dictionary, accessible via the state_dict() method. This dictionary can be persisted with afnio.save:

path = "sentiment_analyzer.hf"
afnio.save(agent.state_dict(), path)

print(f"Saved agent state: {path} (exists={os.path.exists(path)})")

Output:

Saved agent state: sentiment_analyzer.hf (exists=True)

To load agent parameters, first create an instance of the same agent class, then restore the parameters with the load_state_dict() method.

new_agent = agents.SentimentAnalyzer()
new_agent.load_state_dict(
    afnio.load(path),
    model_clients={"sentiment_classifier.forward_model_client": AsyncOpenAI()},
)
new_agent.eval()

print(new_agent)

Output:

SentimentAnalyzer(
  (sentiment_classifier): ChatCompletion()
)

Note

Be sure to call the new_agent.eval() method before running inference so the relevant layers are set to evaluation mode. Failing to do this can yield inconsistent inference results.


Saving Checkpoints#

If you are writing your own logic to store checkpoints instead of using the Trainer, the recommended pattern is to save a lightweight checkpoint dictionary containing the agent's state dict and any metadata you need (epoch, optimizer state, validation metrics).

Save only serializable pieces (agent state, optimizer state, metadata):

# Create forward LM client and example agent
fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()

# Define optimizer (only to show it can be included in the checkpoint)
# and run a single step so optimizer.state_dict() is populated
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)
optimizer.step()

# Compose a checkpoint that includes agent state and optimizer state for resuming training
checkpoint = {
    "epoch": 2,
    "batch": 3,
    "agent_state_dict": agent.state_dict(keep_vars=True),
    "optimizer_state_dict": optimizer.state_dict(),
    "val_accuracy": 0.92,
}
os.makedirs("checkpoints", exist_ok=True)  # ensure the checkpoint directory exists
afnio.save(checkpoint, "checkpoints/checkpoint_epoch2.hf")

Notes:

  • Use .hf (or any extension); afnio.save / afnio.load use a zipped pickle format regardless of the file name.

  • Keep checkpoints small by saving state_dict() instead of the full agent object.

Note

The .hf extension is a naming convention inspired by the chemical symbol for Hafnium (Hf). “Afnio” is the Italian word for Hafnium.


Saving/Loading to a Buffer#

You can save to and load from an in-memory buffer. This is useful for unit tests, CI, IPC/network transfer, or avoiding filesystem I/O when moving checkpoints.

import io

buf = io.BytesIO()
afnio.save(checkpoint, buf)
buf.seek(0)
ck = afnio.load(buf)

Troubleshooting#

  • Missing model clients: load_state_dict will raise an error if required model clients are not provided. Pass a model_clients mapping whose keys match the agent's client bindings, e.g. {"sentiment_classifier.forward_model_client": AsyncOpenAI()}.

  • To resume training reliably, save and later restore the optimizer state (momentum / per-parameter buffers), any forward/backward/optimizer LM client bindings, and training counters (epoch, global step/batch).


Further Reading#