RAG Agents, customized

Large Language Models (LLMs) have broad knowledge, but not always specialized expertise. How can we adapt them to become experts in a private, well-defined knowledge domain, such as the content of a specific document?
The answer lies in the synergy of two key technologies: AI Agents and the RAG (Retrieval-Augmented Generation) pattern.
In this article, we’ll explore both the ingestion phase and the usage phase of an AI Agent, using Mastra AI (TypeScript). We’ll leverage this increasingly popular framework to read a PDF file and, through a vector database, specialize an Agent in targeted searches over its content.
The AI landscape now offers solutions not only in Python but also in other languages. Among these, JavaScript and TypeScript are emerging as strong alternatives. Mastra AI stands out for several reasons, including its integrated ecosystem (agents, RAG, memory, storage) and its strong interoperability with Vercel’s AI SDK.
Of course, adopting a framework that covers so many areas requires careful, project-by-project evaluation. Still, it’s clear that this is a winning solution in many scenarios.
Before diving into implementation, it’s important to clearly understand the two pillars of this article. For both, we’ll use Mastra AI exclusively.
RAG is an architectural approach that enhances LLMs by anchoring them to external data sources. Instead of relying solely on pre-trained knowledge, the model retrieves the most relevant information from a data source (e.g., our PDF) and uses it as context to generate accurate, contextualized responses. This process reduces the risk of “hallucinations” (fabricated answers) and ensures that outputs remain faithful to the source material.
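In code, the pattern boils down to three steps: embed the question, retrieve the most similar chunks, and generate an answer grounded in them. Here is a minimal, framework-agnostic sketch of the idea (the retrieval step is injected as a function, since later in this article Mastra’s vector store will play that role):
import { google } from '@ai-sdk/google'
import { embed, generateText } from 'ai'

// Minimal RAG loop: embed the question, retrieve similar chunks, answer from them.
export async function answerWithRag(
  question: string,
  // Placeholder for the retrieval step: given a query embedding, return the closest chunks
  retrieve: (queryEmbedding: number[]) => Promise<string[]>
) {
  // 1. Turn the question into an embedding
  const { embedding } = await embed({
    model: google.textEmbeddingModel('gemini-embedding-001'),
    value: question
  })

  // 2. Retrieve the most similar chunks from the knowledge base
  const contextChunks = await retrieve(embedding)

  // 3. Generate an answer constrained to the retrieved context
  const { text } = await generateText({
    model: google('gemini-2.5-flash'),
    prompt: `Answer using only the following context:\n${contextChunks.join('\n---\n')}\n\nQuestion: ${question}`
  })

  return text
}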
An AI Agent is not just a model that responds to input—it’s a more complex, autonomous, and goal-driven system. An agent receives an objective, reasons about how to achieve it, and has access to a set of tools it can choose to use to perform actions. In our case, the primary tool will be the ability to search for information within a knowledge base (a database).
Combining an AI Agent with a RAG mechanism creates a powerful system capable of engaging in intelligent, informed conversations on specific topics.
For this example, we’ll use a single book, but the system can easily be adapted to work with multiple folders, databases, or other connected data sources.
Below is the code that defines the agent:
import { google } from '@ai-sdk/google'
import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { LibSQLStore } from '@mastra/libsql'
import { createVectorQueryTool } from '@mastra/rag'

const model = google.textEmbeddingModel('gemini-embedding-001')

// Create a tool for semantic search over embeddings
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'libSqlVector',
  indexName: 'books',
  model: model
})

export const researchAgent = new Agent({
  name: 'Research Assistant',
  instructions: `You are a helpful research assistant ...`,
  model: google('gemini-2.5-flash'),
  tools: {
    vectorQueryTool
  },
  memory: new Memory({
    storage: new LibSQLStore({
      url: 'file:./database/mastra.db'
    })
  })
})
As you can see, one of Mastra’s strengths is how easy it is to configure and switch between AI providers. In this example we’re using Gemini, but it’s also possible to connect to other providers, including local and self-hosted models, for example via Ollama.
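For instance, since Ollama exposes an OpenAI-compatible endpoint, the agent above could be pointed at a local model with just a different provider configuration. A minimal sketch (the endpoint and the model name are assumptions about a local setup):
import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { Agent } from '@mastra/core/agent'

// Ollama serves an OpenAI-compatible API, so the generic
// "openai-compatible" provider can be pointed at the local endpoint
const ollama = createOpenAICompatible({
  name: 'ollama',
  baseURL: 'http://localhost:11434/v1' // default local Ollama endpoint
})

export const localResearchAgent = new Agent({
  name: 'Local Research Assistant',
  instructions: `You are a helpful research assistant ...`,
  model: ollama('llama3.1') // model name is just an example
})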
Below is an example of the responses generated by the agent when querying the book Il fu Mattia Pascal by Luigi Pirandello. Notice that for the final question the agent was unable to provide an answer: although the question concerned another well-known work by Pirandello, it was not the one ingested during the ingestion phase.
Output of src/demo.ts
The responses shown in the previous console output are possible only because the VectorQueryTool can access the database and perform similarity searches, retrieving the most relevant chunks and using them to construct a coherent answer.
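To give an idea of what the query side looks like, here is a sketch of a script along the lines of src/demo.ts (the questions are placeholders; the real ones targeted Il fu Mattia Pascal):
import { mastra } from './mastra/index.ts'

// Retrieve the agent registered in the Mastra instance
const agent = mastra.getAgent('researchAgent')

const questions = [
  'Who is Mattia Pascal?',
  'How does the novel end?'
]

for (const question of questions) {
  // The agent decides on its own whether to call vectorQueryTool,
  // retrieves the relevant chunks and answers based on them
  const response = await agent.generate(question)
  console.log(`Q: ${question}\nA: ${response.text}\n`)
}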
To make the book “understandable” — and therefore transform it into a queryable knowledge base — the first step is to extract its text. This can be done using dedicated libraries or by leveraging AI-powered services such as Mistral OCR.
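The extractTextFromPath utility used in the ingestion script below is not shown in the article; as a reference, a minimal version based on the pdf-parse package (one option among many, with no OCR involved) could look like this:
import { readFile } from 'node:fs/promises'
import pdfParse from 'pdf-parse'

// Naive extraction: read the PDF and return its raw text.
// Scanned documents would need an OCR step (e.g. Mistral OCR) instead.
export async function extractTextFromPath(path: string) {
  const buffer = await readFile(path)
  const data = await pdfParse(buffer)
  return { extractedText: data.text }
}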
Once the text is extracted, each chunk (fragment of text) is converted into an embedding (a multidimensional vector representation). These embeddings allow the content to be indexed and, more importantly, enable semantic similarity search to find the closest matches to a given query.
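Under the hood, “closest match” typically means cosine similarity between the query vector and each stored vector:
// Cosine similarity: ~1 for vectors pointing in the same direction, ~0 for unrelated ones.
// The chunk whose embedding scores highest against the query embedding is the most relevant.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}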
Yes, this might sound complex, but fortunately many frameworks and libraries now offer excellent, ready-to-use implementations. Here’s an example:
import { MDocument } from '@mastra/rag'
import { google } from '@ai-sdk/google'
import { embedMany } from 'ai'
import { mastra } from './mastra/index.ts'
import { extractTextFromPath } from './mastra/utils.ts'

let { extractedText: paperText } = await extractTextFromPath('./inputs/libro.pdf')
paperText = paperText.slice(0, 10000) // Limit to first 10k characters

// Create document and chunk it
const doc = MDocument.fromText(paperText)
const chunks = await doc.chunk({
  strategy: 'recursive',
  maxSize: 512,
  overlap: 50,
  separators: ['\n\n', '\n', ' ']
})

const model = google.textEmbeddingModel('gemini-embedding-001')

// Generate embeddings
const { embeddings } = await embedMany({
  model: model,
  values: chunks.map((chunk) => chunk.text),
  providerOptions: {
    google: {
      taskType: 'QUESTION_ANSWERING'
    }
  }
})

// Get the vector store instance from Mastra
const vectorStore = mastra.getVector('libSqlVector')

// Create an index for the book chunks. The dimension depends on the embedding model:
// 1536 for OpenAI text-embedding-3-small, 768 for Google text-embedding-004, 3072 for gemini-embedding-001
await vectorStore.createIndex({
  indexName: 'books',
  dimension: 3072
})

// Store embeddings together with the chunk text as metadata
await vectorStore.upsert({
  indexName: 'books',
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text,
    source: 'libro.pdf'
  }))
})
It may sound surprising, but apart from a small utilities file and the Mastra instance, no additional code is required.
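For reference, the Mastra instance imported as ./mastra/index.ts only needs to register the agent and the vector store under the names used above ('researchAgent' and 'libSqlVector'). A sketch (the agent’s file path, and the exact LibSQLVector option names, may differ in your setup):
import { Mastra } from '@mastra/core'
import { LibSQLVector } from '@mastra/libsql'
import { researchAgent } from './agents/research-agent.ts' // path is an assumption

export const mastra = new Mastra({
  agents: { researchAgent },
  vectors: {
    // Same local database file used by the agent's memory
    libSqlVector: new LibSQLVector({
      connectionUrl: 'file:./database/mastra.db'
    })
  }
})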
LibSQL was chosen as the database for simplicity: it creates a single local db file containing all the data. In real-world, production-ready scenarios, however, you will most likely need a different kind of database; in that case, we recommend looking into PostgreSQL with its pgvector extension.
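Swapping the store is mostly a configuration change; here is a sketch assuming Mastra’s @mastra/pg package (double-check the constructor options against the current documentation):
import { PgVector } from '@mastra/pg'

// Drop-in replacement for LibSQLVector in the Mastra instance shown above
// (the connection string is an example)
const pgVector = new PgVector({
  connectionString: process.env.POSTGRES_CONNECTION_STRING ?? 'postgresql://user:password@localhost:5432/rag'
})

// ...and register it: new Mastra({ agents: { researchAgent }, vectors: { pgVector } })
The vectorStoreName passed to createVectorQueryTool and the mastra.getVector() calls would then use 'pgVector' instead of 'libSqlVector'.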
While Mastra offers many strengths, such as an interesting ecosystem and strong interoperability with Vercel’s AI SDK, our advice is to definitely give it a try (you won’t regret it) but to avoid adopting it blindly in every situation without proper evaluation.
If you need guidance during this stage, we’re here to help.
For more details, see the official Mastra AI documentation.
Published: August 26, 2025
Last revised: August 26, 2025