Save, Load and Use Agent#
Warning
Before running any code, ensure you are logged in to the Afnio backend (afnio login). See Logging in to Afnio Backend for details.
Afnio agents (subclasses of cog.Module) expose state_dict() and load_state_dict(...) to persist trained parameters and minimal metadata. Saving and restoring agent state enables reproducible evaluation, resumable training, safe deployment, and sharing of parameters without serializing full objects.
Prerequisite Code#
Create an LM client and instantiate the example agent below.
import os
import afnio
import afnio.utils.agents as agents
from afnio.models.openai import AsyncOpenAI
os.environ["OPENAI_API_KEY"] = "sk-..." # Replace with your actual key
fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()
response = agent(
    fwd_model,
    inputs={"message": "I've been a satisfied client of ProCare for a year"},
    model="gpt-4.1-nano",
    temperature=0.0,
)
print(response.data)
Output:
{"sentiment":"positive"}
Saving and Loading Agent Parameters#
Afnio agents store their learned parameters in an internal state dictionary, called the state_dict. It can be persisted via the afnio.save method:
path = "sentiment_analyzer.hf"
afnio.save(agent.state_dict(), path)
print(f"Saved agent state: {path} (exists={os.path.exists(path)})")
Output:
Saved agent state: sentiment_analyzer.hf (exists=True)
To load agent parameters, you first need to create an instance of the same agent, and then load the parameters using the load_state_dict() method.
new_agent = agents.SentimentAnalyzer()
new_agent.load_state_dict(
    afnio.load(path),
    model_clients={"sentiment_classifier.forward_model_client": AsyncOpenAI()},
)
new_agent.eval()
print(new_agent)
Output:
SentimentAnalyzer(
(sentiment_classifier): ChatCompletion()
)
Note
Be sure to call the new_agent.eval() method before running inference to set the relevant layers to evaluation mode. Failing to do so might yield inconsistent inference results.
Saving Checkpoints#
If you are writing your own logic to store checkpoints instead of using the Trainer, the recommended pattern is to save a lightweight checkpoint dictionary containing the agent’s state dict and any metadata you need (epoch, optimizer state, validation metrics).
Save only serializable pieces (agent state, optimizer state, metadata):
# Create forward LM client and example agent
fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()
# Define optimizer (only to show it can be included in the checkpoint)
# and run a single step so optimizer.state_dict() is populated
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)
optimizer.step()
# Compose a checkpoint that includes agent state and optimizer state for resuming training
checkpoint = {
    "epoch": 2,
    "batch": 3,
    "agent_state_dict": agent.state_dict(keep_vars=True),
    "optimizer_state_dict": optimizer.state_dict(),
    "val_accuracy": 0.92,
}
afnio.save(checkpoint, "checkpoints/checkpoint_epoch2.hf")
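Resuming later reverses this pattern: load the checkpoint dict, restore agent and optimizer state, and continue from the saved counters. A minimal sketch follows; the plain dict below is a stand-in for the value returned by afnio.load so the counter logic runs on its own, and optimizer.load_state_dict is assumed to mirror state_dict() (it is not shown on this page).

```python
# Stand-in for: checkpoint = afnio.load("checkpoints/checkpoint_epoch2.hf")
checkpoint = {
    "epoch": 2,
    "batch": 3,
    "agent_state_dict": {},      # would hold the agent's parameters
    "optimizer_state_dict": {},  # would hold the optimizer's buffers
    "val_accuracy": 0.92,
}

# With a real checkpoint, restore state before continuing training:
#   agent.load_state_dict(checkpoint["agent_state_dict"], model_clients={...})
#   optimizer.load_state_dict(checkpoint["optimizer_state_dict"])  # assumed API

# Resume counters: continue from the epoch after the last completed one
start_epoch = checkpoint["epoch"] + 1
print(f"Resuming at epoch {start_epoch} "
      f"(last val_accuracy={checkpoint['val_accuracy']})")
```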
Notes:
- Use .hf (or any extension) — Afnio uses a zipped pickle format via afnio.save / afnio.load.
- Keep checkpoints small by saving state_dict() instead of the full agent object.
Note
The .hf extension is a naming convention inspired by the chemical symbol for Hafnium (Hf). “Afnio” is the Italian word for Hafnium.
Saving/Loading to a Buffer#
You can save to and load from an in-memory buffer — useful for unit tests, CI, IPC/network transfer, or avoiding filesystem I/O when moving checkpoints.
import io
buf = io.BytesIO()
afnio.save(checkpoint, buf)
buf.seek(0)  # rewind to the start before reading
ck = afnio.load(buf)
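The buffer's raw bytes can also be extracted for transfer (e.g. over a socket or message queue) and reloaded on the receiving side. The sketch below uses plain pickle as a stand-in for afnio.save / afnio.load so it runs without an Afnio backend; with Afnio you would call afnio.save(checkpoint, buf) and afnio.load(received) at the marked lines instead.

```python
import io
import pickle

# Stand-in checkpoint; with Afnio this would be the dict passed to afnio.save
checkpoint = {"epoch": 2, "val_accuracy": 0.92}

# Sender: serialize into an in-memory buffer and extract the raw bytes
buf = io.BytesIO()
pickle.dump(checkpoint, buf)   # with Afnio: afnio.save(checkpoint, buf)
payload = buf.getvalue()       # bytes, ready to send over the wire

# Receiver: wrap the received bytes in a fresh buffer and deserialize
received = io.BytesIO(payload)
restored = pickle.load(received)  # with Afnio: afnio.load(received)
```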
Troubleshooting#
- Missing model clients: load_state_dict will raise if required model clients are not provided. Pass a matching model_clients mapping.
- Resuming training: to resume reliably, save and later restore the optimizer state (momentum / per-parameter buffers), any forward/backward/optimizer LM client bindings, and training counters (epoch, global step/batch).