# Save, Load and Use Agent

```{warning}
Before running any code, ensure you are logged in to the Afnio backend (`afnio login`). See [Logging in to Afnio Backend](login) for details.
```

Afnio agents (subclasses of `cog.Module`) expose `state_dict()` / `load_state_dict(...)` to persist trained parameters and minimal metadata. Saving and restoring agent state enables reproducible evaluation, resumable training, safe deployment, and sharing of parameters without serializing full agent objects.

---

## Prerequisite Code

Create an LM client and instantiate the example agent below.

```python
import os

import afnio
import afnio.utils.agents as agents
from afnio.models.openai import AsyncOpenAI

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your actual key

fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()

response = agent(
    fwd_model,
    inputs={"message": "I've been a satisfied client of ProCare for a year"},
    model="gpt-4.1-nano",
    temperature=0.0,
)
print(response.data)
```

_Output:_

```output
{"sentiment":"positive"}
```

---

## Saving and Loading Agent Parameters

Afnio agents store their learned parameters in an internal state dictionary, called `state_dict`. These can be persisted via the `afnio.save` method:

```python
path = "sentiment_analyzer.hf"
afnio.save(agent.state_dict(), path)
print(f"Saved agent state: {path} (exists={os.path.exists(path)})")
```

_Output:_

```output
Saved agent state: sentiment_analyzer.hf (exists=True)
```

To load agent parameters, first create an instance of the same agent, then load the parameters using the `load_state_dict()` method.

```python
new_agent = agents.SentimentAnalyzer()
new_agent.load_state_dict(
    afnio.load(path),
    model_clients={"sentiment_classifier.forward_model_client": AsyncOpenAI()},
)
new_agent.eval()
print(new_agent)
```

_Output:_

```output
SentimentAnalyzer(
  (sentiment_classifier): ChatCompletion()
)
```

```{note}
Be sure to call the `new_agent.eval()` method before running inference to set the relevant layers to evaluation mode. Failing to do this might yield inconsistent inference results.
```

---

## Saving Checkpoints

If you are writing your own logic to store checkpoints instead of using the [`Trainer`](trainer.md), the recommended pattern is to save a lightweight checkpoint dictionary containing the agent's state dict and any metadata you need (epoch, optimizer state, validation metrics).

Save only serializable pieces (agent state, optimizer state, metadata):

```python
# Create forward LM client and example agent
fwd_model = AsyncOpenAI()
agent = agents.SentimentAnalyzer()

# Define optimizer (only to show it can be included in the checkpoint)
# and run a single step so optimizer.state_dict() is populated
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)
optimizer.step()

# Compose a checkpoint that bundles agent state and optimizer state for resuming training
checkpoint = {
    "epoch": 2,
    "batch": 3,
    "agent_state_dict": agent.state_dict(keep_vars=True),
    "optimizer_state_dict": optimizer.state_dict(),
    "val_accuracy": 0.92,
}

os.makedirs("checkpoints", exist_ok=True)  # Ensure the target directory exists
afnio.save(checkpoint, "checkpoints/checkpoint_epoch2.hf")
```

Notes:

- Use `.hf` (or any extension); Afnio uses a zipped pickle format via `afnio.save` / `afnio.load`.
- Keep checkpoints small by saving `state_dict()` instead of the full agent object.

```{note}
The `.hf` extension is a naming convention inspired by the chemical symbol for [Hafnium (Hf)](https://en.wikipedia.org/wiki/Hafnium). "Afnio" is the Italian word for Hafnium.
```
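To resume training later, load the checkpoint dictionary back and restore each piece. The sketch below assumes the checkpoint file written above, and it also assumes the optimizer exposes a `load_state_dict()` method mirroring the agent's (not shown elsewhere in this guide); adjust the `model_clients` mapping and optimizer arguments to match your setup.

```python
# Minimal resume sketch: restore agent and optimizer state from the checkpoint above.
# Assumption: `optimizer.load_state_dict()` exists and mirrors the agent-side API.
checkpoint = afnio.load("checkpoints/checkpoint_epoch2.hf")

agent = agents.SentimentAnalyzer()
agent.load_state_dict(
    checkpoint["agent_state_dict"],
    model_clients={"sentiment_classifier.forward_model_client": AsyncOpenAI()},
)

optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
)
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])  # assumed API

# Pick up the training counters and metadata saved alongside the state dicts
start_epoch = checkpoint["epoch"] + 1
print(f"Resuming at epoch {start_epoch} (last val_accuracy={checkpoint['val_accuracy']})")
```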
"Afnio" is the Italian word for Hafnium. ``` --- ## Saving/Loading to a Buffer You can save to and load from an in-memory buffer — useful for unit tests, CI, IPC/network transfer, or avoiding filesystem I/O when moving checkpoints. ```python import io buf = io.BytesIO() afnio.save(checkpoint, buf) buf.seek(0) ck = afnio.load(buf) ``` --- ## Troubleshooting - Missing model clients: `load_state_dict` will raise if required model clients are not provided. Pass a matching `model_clients` mapping. - To resume training reliably, save and later restore the optimizer state (momentum / per-parameter buffers), any forward/backward/optimizer LM client bindings, and training counters (epoch, global step/batch). --- ## Further Reading - [Trainer](trainer) - [Runs and Experiments](runs_and_experiments)