RAG Agents, customized

Large Language Models (LLMs) have broad knowledge, but not always specialized expertise. How can we adapt them to become experts in a private, well-defined knowledge domain, such as the content of a specific document?
The answer lies in the synergy of two key technologies: AI Agents and the RAG (Retrieval-Augmented Generation) pattern.
In this article, we’ll explore both the ingestion phase and the usage phase of an AI Agent, using Mastra AI (TypeScript). We’ll leverage this increasingly popular framework to read a PDF file and, through a vector database, specialize an Agent in targeted searches over its content.
The AI landscape now offers solutions not only in Python but also in other languages. Among these, JavaScript and TypeScript are emerging as strong alternatives. Mastra AI stands out for several reasons, including its integrated ecosystem (agents, RAG, memory, storage) and its strong interoperability with Vercel’s AI SDK.
Of course, adopting a framework that covers so many areas requires careful, project-by-project evaluation. Still, it’s clear that this is a winning solution in many scenarios.
Before diving into implementation, it’s important to clearly understand the two pillars of this article. For both, we’ll use Mastra AI exclusively.
RAG is an architectural approach that enhances LLMs by anchoring them to external data sources. Instead of relying solely on pre-trained knowledge, the model retrieves the most relevant information from a data source (e.g., our PDF) and uses it as context to generate accurate, contextualized responses. This process reduces the risk of “hallucinations” (fabricated answers) and ensures that outputs remain faithful to the source material.
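In code, the pattern boils down to three steps: embed the question, retrieve the most similar chunks, and generate an answer grounded in them. Here is a minimal, framework-agnostic sketch of the idea (the retrieval step is injected as a function, since later in this article Mastra’s vector store will play that role):
import { google } from '@ai-sdk/google'
import { embed, generateText } from 'ai'

// Minimal RAG loop: embed the question, retrieve similar chunks, answer from them.
export async function answerWithRag(
  question: string,
  // Placeholder for the retrieval step: given a query embedding, return the closest chunks
  retrieve: (queryEmbedding: number[]) => Promise<string[]>
) {
  // 1. Turn the question into an embedding
  const { embedding } = await embed({
    model: google.textEmbeddingModel('gemini-embedding-001'),
    value: question
  })

  // 2. Retrieve the most similar chunks from the knowledge base
  const contextChunks = await retrieve(embedding)

  // 3. Generate an answer constrained to the retrieved context
  const { text } = await generateText({
    model: google('gemini-2.5-flash'),
    prompt: `Answer using only the following context:\n${contextChunks.join('\n---\n')}\n\nQuestion: ${question}`
  })

  return text
}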
An AI Agent is not just a model that responds to input—it’s a more complex, autonomous, and goal-driven system. An agent receives an objective, reasons about how to achieve it, and has access to a set of tools it can choose to use to perform actions. In our case, the primary tool will be the ability to search for information within a knowledge base (a database).
Combining an AI Agent with a RAG mechanism creates a powerful system capable of engaging in intelligent, informed conversations on specific topics.
For this example, we’ll use a single book, but the system can easily be adapted to work with multiple folders, databases, or other connected data sources.
Below is the code that defines the agent:
import { google } from '@ai-sdk/google'
import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { LibSQLStore } from '@mastra/libsql'
import { createVectorQueryTool } from '@mastra/rag'

const model = google.textEmbeddingModel('gemini-embedding-001')

// Create a tool for semantic search over embeddings
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'libSqlVector',
  indexName: 'books',
  model: model
})

export const researchAgent = new Agent({
  name: 'Research Assistant',
  instructions: `You are a helpful research assistant ...`,
  model: google('gemini-2.5-flash'),
  tools: {
    vectorQueryTool
  },
  memory: new Memory({
    storage: new LibSQLStore({
      url: 'file:./database/mastra.db'
    })
  })
})
As you can see, one of Mastra’s strengths is how easy it is to configure and switch between AI providers. In this example we’re using Gemini, but it’s also possible to connect to other providers, including local and self-hosted models, for example via Ollama.
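For instance, since Ollama exposes an OpenAI-compatible endpoint, the agent above could be pointed at a local model with just a different provider configuration. A minimal sketch (the endpoint and the model name are assumptions about a local setup):
import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { Agent } from '@mastra/core/agent'

// Ollama serves an OpenAI-compatible API, so the generic
// "openai-compatible" provider can be pointed at the local endpoint
const ollama = createOpenAICompatible({
  name: 'ollama',
  baseURL: 'http://localhost:11434/v1' // default local Ollama endpoint
})

export const localResearchAgent = new Agent({
  name: 'Local Research Assistant',
  instructions: `You are a helpful research assistant ...`,
  model: ollama('llama3.1') // model name is just an example
})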
Below is an example of the responses generated by the agent when querying the book Il fu Mattia Pascal by Luigi Pirandello. Notice that for the final question the agent was unable to provide an answer: although the question concerned another well-known work by Pirandello, it was not the one ingested during the ingestion phase.
Output of src/demo.ts
The responses shown in the previous console output are possible only because the VectorQueryTool can access the database and perform similarity searches, retrieving the most relevant chunks and using them to construct a coherent answer.
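To give an idea of what the query side looks like, here is a sketch of a script along the lines of src/demo.ts (the questions are placeholders; the real ones targeted Il fu Mattia Pascal):
import { mastra } from './mastra/index.ts'

// Retrieve the agent registered in the Mastra instance
const agent = mastra.getAgent('researchAgent')

const questions = [
  'Who is Mattia Pascal?',
  'How does the novel end?'
]

for (const question of questions) {
  // The agent decides on its own whether to call vectorQueryTool,
  // retrieves the relevant chunks and answers based on them
  const response = await agent.generate(question)
  console.log(`Q: ${question}\nA: ${response.text}\n`)
}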
To make the book “understandable” — and therefore transform it into a queryable knowledge base — the first step is to extract its text. This can be done using dedicated libraries or by leveraging AI-powered services such as Mistral OCR.
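The extractTextFromPath utility used in the ingestion script below is not shown in the article; as a reference, a minimal version based on the pdf-parse package (one option among many, with no OCR involved) could look like this:
import { readFile } from 'node:fs/promises'
import pdfParse from 'pdf-parse'

// Naive extraction: read the PDF and return its raw text.
// Scanned documents would need an OCR step (e.g. Mistral OCR) instead.
export async function extractTextFromPath(path: string) {
  const buffer = await readFile(path)
  const data = await pdfParse(buffer)
  return { extractedText: data.text }
}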
Once the text is extracted, each chunk (fragment of text) is converted into an embedding (a multidimensional vector representation). These embeddings allow the content to be indexed and, more importantly, enable semantic similarity search to find the closest matches to a given query.
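Under the hood, “closest match” typically means cosine similarity between the query vector and each stored vector:
// Cosine similarity: ~1 for vectors pointing in the same direction, ~0 for unrelated ones.
// The chunk whose embedding scores highest against the query embedding is the most relevant.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}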
Yes, this might sound complex, but fortunately many frameworks and libraries now offer excellent, ready-to-use implementations. Here’s an example:
import { MDocument } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { mastra } from './mastra/index.ts'
import { extractTextFromPath } from './mastra/utils.ts'

let { extractedText: paperText } = await extractTextFromPath('./inputs/libro.pdf')
paperText = paperText.slice(0, 10000) // Limit to first 10k characters

// Create document and chunk it
const doc = MDocument.fromText(paperText)
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 512,
  overlap: 50,
  separators: ['\n\n', '\n', ' ']
})

const model = google.textEmbeddingModel('gemini-embedding-001')

// Generate embeddings
const { embeddings } = await embedMany({
  model: model,
  values: chunks.map((chunk) => chunk.text),
  providerOptions: {
    google: {
      taskType: 'QUESTION_ANSWERING'
    }
  }
})

// Get the vector store instance from Mastra
const vectorStore = mastra.getVector('libSqlVector')

// Create an index for the book chunks. The dimension depends on the embedding model:
// 1536 for OpenAI text-embedding-3-small, 768 for Google text-embedding-004, 3072 for gemini-embedding-001
await vectorStore.createIndex({
  indexName: 'books',
  dimension: 3072
})

// Store embeddings together with the chunk text as metadata
await vectorStore.upsert({
  indexName: 'books',
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text,
    source: 'libro.pdf'
  }))
})
It may sound surprising, but apart from a small utilities file and the Mastra instance, no additional code is required.
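For reference, the Mastra instance imported as ./mastra/index.ts only needs to register the agent and the vector store under the names used above ('researchAgent' and 'libSqlVector'). A sketch (the agent’s file path, and the exact LibSQLVector option names, may differ in your setup):
import { Mastra } from '@mastra/core'
import { LibSQLVector } from '@mastra/libsql'
import { researchAgent } from './agents/research-agent.ts' // path is an assumption

export const mastra = new Mastra({
  agents: { researchAgent },
  vectors: {
    // Same local database file used by the agent's memory
    libSqlVector: new LibSQLVector({
      connectionUrl: 'file:./database/mastra.db'
    })
  }
})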
LibSQL was chosen as the database for simplicity: it creates a single local db file containing all the data. In real-world, production-ready scenarios, however, you will most likely need a different kind of database; in that case, we recommend looking into PostgreSQL with its pgvector extension.
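Swapping the store is mostly a configuration change; here is a sketch assuming Mastra’s @mastra/pg package (double-check the constructor options against the current documentation):
import { PgVector } from '@mastra/pg'

// Drop-in replacement for LibSQLVector in the Mastra instance shown above
// (the connection string is an example)
const pgVector = new PgVector({
  connectionString: process.env.POSTGRES_CONNECTION_STRING ?? 'postgresql://user:password@localhost:5432/rag'
})

// ...and register it: new Mastra({ agents: { researchAgent }, vectors: { pgVector } })
The vectorStoreName passed to createVectorQueryTool and the mastra.getVector() calls would then use 'pgVector' instead of 'libSqlVector'.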
While Mastra offers many strengths, such as an interesting ecosystem and strong interoperability with Vercel’s AI SDK, our advice is to definitely give it a try (you won’t regret it) but to avoid adopting it blindly in every situation without proper evaluation.
If you need guidance during this stage, we’re here to help.
For more details, see the official Mastra AI documentation.
Published: August 26, 2025
Last revised: August 26, 2025