- Text extraction: Process complex document formats (PDF, CSV, etc.).
- Text splitting: Create meaningful chunks out of long pages of text.
- Embedding: Vectorize chunks (extract semantic meaning).
- Store: Insert chunks to a specialized database.
- Embedding: Vectorize the user query.
- Search: Retrieve the document chunks most similar to the user query.
Steps
1. Start Agent Stack with Docling enabled
   Use `agentstack platform start --set docling.enabled=true` to start Agent Stack with Docling.
2. Enable file uploads in your agent
   Add the `default_input_modes` parameter to your agent decorator to allow users to upload files to your agent. This also specifies which file types users can upload.
3. Import the Platform API and Embedding Service extensions
   Import `PlatformApiExtensionServer`, `PlatformApiExtensionSpec`, `EmbeddingServiceExtensionServer`, and `EmbeddingServiceExtensionSpec` from `agentstack_sdk.a2a.extensions`.
4. Implement document processing functions
   Implement functions to handle text extraction, text splitting, embedding generation, and vector storage.
5. Implement query functions
   Implement functions to generate embeddings for the user query and search the vector store for similar document chunks.
6. Put it all together
   Combine everything in an agent that 1) processes uploaded documents and 2) answers questions about them.
Use an LLM to help form a response
The examples here skip this part and return details from the document.
In practice, you’ll want to use an LLM to help form a response about the selected document chunks
instead of returning the actual chunk text (i.e., implement an assistant and not just a search tool).
Building blocks
Enable File Uploads
Add the `default_input_modes` parameter to your agent decorator to allow users to upload files to your agent.
This also specifies which file types users can upload.
Agent Stack uses Docling to extract text from documents in a variety of supported formats.
Platform and Embedding Extensions
Make sure you have the Platform API and Embedding Service extensions imported and injected in your agent parameters:
Text Extraction
To extract text from a `File` uploaded to the Platform API, simply use `file.create_extraction()` and wait for
the result. After extraction is completed, the extraction object will contain
`extracted_files`, a list of extracted files in different formats.
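The flow can be sketched as follows. The exact signatures (and whether `load_text_content()` is awaitable) are assumptions here, not confirmed SDK API; a stub `File` stands in for the real Platform API object so the sketch runs:

```python
import asyncio

async def extract_markdown(file):
    # Kick off Docling extraction and wait for the result.
    # (Method names follow this page; check the SDK reference for exact signatures.)
    extraction = await file.create_extraction()
    # extraction.extracted_files lists the produced formats; the Markdown
    # text itself is loaded from the file:
    return await file.load_text_content()

class _StubFile:
    """Minimal stand-in for a Platform API File, for illustration only."""
    async def create_extraction(self):
        return {"extracted_files": ["markdown", "vendor_specific_json"]}
    async def load_text_content(self):
        return "# Title\n\nExtracted body text."

text = asyncio.run(extract_markdown(_StubFile()))
```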
Extraction Formats
Text extraction produces two formats, and you can request either subset by passing `formats` to `create_extraction` (e.g., `["markdown"]` if you only need plain text):
- `markdown`: The extracted text formatted as Markdown (`file.load_text_content()`)
- `vendor_specific_json`: The Docling-specific JSON format containing the document structure (`file.load_json_content()`)
WARNING:
The `vendor_specific_json` format is not generated for plain text or Markdown files, as Docling does not support these formats as input.
Text Splitting
In this example we will use `MarkdownTextSplitter` from the
`langchain-text-splitters` package.
It splits a long document into reasonably sized chunks based on the Markdown header structure.
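For intuition, the idea behind header-based splitting can be sketched with the standard library. This is a rough stand-in for `MarkdownTextSplitter`, not a drop-in replacement:

```python
import re

def split_markdown(text, max_chars=500):
    # Split at Markdown header lines first, then fall back to paragraph
    # splits for any section that is still too long.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            for para in section.split("\n\n"):
                if para.strip():
                    chunks.append(para.strip())
    return chunks

chunks = split_markdown("# Intro\nHello.\n\n## Details\nMore text.")
```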
Embedding
Now we need to generate embeddings for each chunk using the embedding service. As with LLMs, Agent Stack implements an OpenAI-compatible embedding API. You can use any preferred client; in this example we will use the embedding extension to create an `AsyncOpenAI` client:
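Because the API is OpenAI-compatible, the call shape is `client.embeddings.create(...)` returning objects with an `embedding` field. The sketch below uses a stub client in place of the `AsyncOpenAI` instance created by the embedding extension, so it runs without a server:

```python
import asyncio

async def embed_texts(client, model_id, texts):
    # OpenAI-compatible embedding API: one request can embed a batch of inputs.
    response = await client.embeddings.create(model=model_id, input=texts)
    return [item.embedding for item in response.data]

# --- Stand-in client so the sketch is runnable without a real service ---
class _FakeItem:
    def __init__(self, embedding):
        self.embedding = embedding

class _FakeResponse:
    def __init__(self, data):
        self.data = data

class _FakeEmbeddings:
    async def create(self, model, input):
        # Deterministic toy vectors: (length, number of spaces) per text.
        return _FakeResponse(
            [_FakeItem([float(len(t)), float(t.count(" "))]) for t in input]
        )

class _FakeClient:
    embeddings = _FakeEmbeddings()

vectors = asyncio.run(embed_texts(_FakeClient(), "toy-model", ["hello world", "hi"]))
```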
Store
Finally, to insert the prepared items, we need a function to create a vector store. For this we need to know the dimension of the embeddings and the `model_id`. Because the model is chosen by the embedding extension and we don't know it in advance, we will make a test embedding request to calculate the dimension. Items are then inserted with `vector_store.add_documents`; this will become clear in the final example.
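The dimension-probing trick can be sketched as follows. `InMemoryVectorStore` and `toy_embed` are illustrative stand-ins (the real store is created via the Platform API); only the `add_documents` name mirrors the method mentioned above:

```python
import asyncio

async def probe_dimension(embed):
    # The embedding extension picks the model, so the vector dimension is
    # unknown up front: embed a throwaway string and measure the result.
    vector = await embed("dimension probe")
    return len(vector)

class InMemoryVectorStore:
    """Illustrative stand-in for the platform-managed vector store."""
    def __init__(self, dimension, model_id):
        self.dimension = dimension
        self.model_id = model_id
        self.items = []

    def add_documents(self, documents):
        # documents: list of (text, embedding) pairs
        for text, embedding in documents:
            if len(embedding) != self.dimension:
                raise ValueError("embedding dimension mismatch")
            self.items.append((text, embedding))

async def toy_embed(text):
    # Deterministic toy embedding (3 features) just to make the sketch run.
    return [float(len(text)), float(text.count(" ")), 1.0]

dimension = asyncio.run(probe_dimension(toy_embed))
store = InMemoryVectorStore(dimension, "toy-model")
store.add_documents([("hello world", asyncio.run(toy_embed("hello world")))])
```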
Query vector store
Assuming we have our knowledge base of documents prepared, we can now easily search the store according to the user query. The following function will retrieve the five document chunks most similar to the query embedding:
Putting it all together
Having all the pieces in place, we can now build the agent.
Simple agent
This is a simplified agent that expects a message with one or more files attached as `FilePart` and a
user query as `TextPart`. A new vector store is created for each message.
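As an end-to-end intuition check, the whole flow (split, embed, store, embed query, search) can be run as a stdlib-only toy. Everything here is a stand-in: a character-frequency vector replaces the embedding service, a plain list replaces the platform vector store, and the retrieval is a cosine-similarity ranking:

```python
import math

def toy_embed(text):
    # Toy embedding: character-frequency vector over a tiny alphabet.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [float(text.lower().count(c)) for c in alphabet]

def split_paragraphs(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def answer(document_text, query, limit=1):
    # 1) split, 2) embed chunks, 3) "store", 4) embed query, 5) search
    chunks = split_paragraphs(document_text)
    store = [(chunk, toy_embed(chunk)) for chunk in chunks]
    query_vec = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [chunk for chunk, _ in ranked[:limit]]

doc = "Cats like to sleep all day.\n\nRockets burn fuel to reach orbit."
best = answer(doc, "rocket fuel orbit")
```

A real agent would pass the retrieved chunks to an LLM to form the response, as noted above, rather than returning the chunk text directly.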