Leverage Agent Stack’s model- and provider-agnostic LLM inference
When building AI agents, one of the first requirements is connecting to a Large Language Model (LLM). Agent Stack provides built-in, OpenAI-compatible LLM inference that is model- and provider-agnostic. To implement the LLM Proxy Service effectively, follow these three steps:
1. Add the LLM service extension to your agent. Import the necessary components and add the LLM service extension to your agent function.
2. Configure your LLM request. Specify which model your agent prefers and how you want to access it.
3. Use the LLM in your agent. Access the optionally provided LLM configuration and use it with your preferred LLM client.
Service Extensions are a type of A2A Extension that allows you to easily “inject dependencies” into your agent. This follows the inversion of control principle where your agent defines what it needs, and the platform (in this case, Agent Stack) is responsible for providing those dependencies.
Service extensions are optional by definition, so you should always check if they exist before using them.
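For example, a minimal sketch like the one below returns early when the platform has not provisioned an LLM. The extension spec is elided with `...` here and configured in the steps that follow, and the fallback reply is purely illustrative:

```python
from agentstack_sdk.a2a.extensions import LLMServiceExtensionServer
from a2a.types import Message
from typing import Annotated

async def my_agent(
    input: Message,
    llm: Annotated[LLMServiceExtensionServer, ...]
):
    # The extension is optional: always check it was fulfilled before using it.
    if not llm:
        return "This agent requires LLM access, which was not provided."
    # ... agent logic that uses the provisioned LLM ...
```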
Import the LLMServiceExtensionServer and LLMServiceExtensionSpec from the SDK. You will use these within a type hint to let the platform know your agent requires LLM access.
```python
from agentstack_sdk.a2a.extensions import LLMServiceExtensionServer, LLMServiceExtensionSpec
from a2a.types import Message
from typing import Annotated

# The extension is added as an Annotated parameter in your agent function
async def my_agent(
    input: Message,
    llm: Annotated[LLMServiceExtensionServer, ...]
):
    # agent logic
    pass
```
Use LLMServiceExtensionSpec.single_demand() to request a model. By passing a suggested tuple, you tell the platform which model you’d prefer to use.
```python
from agentstack_sdk.a2a.extensions import LLMServiceExtensionServer, LLMServiceExtensionSpec
from a2a.types import Message
from typing import Annotated

# The llm parameter is configured with a specific model demand
async def my_agent(
    input: Message,
    llm: Annotated[
        LLMServiceExtensionServer,
        LLMServiceExtensionSpec.single_demand(suggested=("ibm/granite-3-3-8b-instruct",))
    ]
):
    # agent logic
    pass
```
When you specify a suggested model like "ibm/granite-3-3-8b-instruct", the platform:
- Checks if the requested model is available in your configured environment
- Allocates the best available model that matches your requirements
- Provides you with the exact model identifier and endpoint details
The platform handles the complexity of model provisioning and endpoint management, so you can focus on building your agent logic.
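To close the loop on step 3, here is a hedged sketch of using the provided configuration with an OpenAI-compatible client. The way the fulfillment is read (`llm.data.llm_fulfillments["default"]` and its `api_base`, `api_key`, and `api_model` fields) is an assumption about the extension's data shape, not confirmed API, so adjust it to your SDK version:

```python
from agentstack_sdk.a2a.extensions import LLMServiceExtensionServer, LLMServiceExtensionSpec
from a2a.types import Message
from typing import Annotated

from openai import AsyncOpenAI  # any OpenAI-compatible client works


async def my_agent(
    input: Message,
    llm: Annotated[
        LLMServiceExtensionServer,
        LLMServiceExtensionSpec.single_demand(suggested=("ibm/granite-3-3-8b-instruct",))
    ]
):
    # The extension is optional, so guard against an unfulfilled demand.
    if not llm:
        return "No LLM is available to this agent."

    # ASSUMPTION: the fulfillment lookup and its api_base / api_key / api_model
    # fields are illustrative; check your SDK version for the exact attribute names.
    fulfillment = llm.data.llm_fulfillments["default"]

    client = AsyncOpenAI(base_url=fulfillment.api_base, api_key=fulfillment.api_key)
    response = await client.chat.completions.create(
        model=fulfillment.api_model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    return response.choices[0].message.content
```

Because the endpoint is OpenAI-compatible, any client that accepts a custom base URL and API key should work the same way.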