Composite Agents: Governance and Reliability

Composite Agents: Real Governance and AI Workflow

Beyond the hype: the need for control

As 2025 comes to a close, the conversation around Artificial Intelligence in companies and the Public Administration has fully matured. The generative ability of AI is by now well established ("Wow, it writes a poem!"). Today we are in the phase of engineering and responsibility: the phase that produces (or should produce) value.

Relying on a single model (LLM) to manage critical processes is a risk to avoid when production and reliability are at stake. A monolithic LLM can produce inconsistent or hallucinated outputs and, above all, it is opaque: it is hard to explain or audit why it reached a given decision. In contexts regulated by the AI Act, the GDPR, and/or strict company policies, such a "black box" is not advisable at all.

The solution is transitioning to Composite Agent systems: architectures where multiple specialized intelligences (models or AI components) work together, orchestrated by deterministic rules written in code and not (only) in natural language.

The "responsibility" architecture (using Mastra)

To show that governance is not just theory, we have released a public technical demo based on Mastra AI, a framework valued for the way it combines the flexibility of LLMs with the rigor of TypeScript.

The core concept is simple: divide tasks to maintain control. In the example (a document validation workflow for the Public Administration), there is no single "brain" but three distinct agents:

Classifier: Identifies the topic of the request (Welfare, Taxes, Urban Planning).
Compliance Officer: Does not write responses; it evaluates them. It assigns a score (0-100) and lists risks.
Approver: Generates the official response, but only if authorized.
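The division of labor above can be sketched without any framework, to show where the control lives. The three roles and the 80-point threshold come from this article; the `Agents` interface, `runPipeline`, and the keyword-based stub agents are hypothetical stand-ins for LLM-backed agents, not Mastra's API and not the demo's code.

```typescript
// Framework-agnostic sketch of the Classifier -> Compliance Officer -> Approver
// pipeline. Roles and the 80-point threshold come from the article; the
// Agents interface, runPipeline and the keyword-based stubs are hypothetical.

type Topic = "Welfare" | "Taxes" | "Urban Planning";

interface ComplianceReport {
  score: number;         // 0-100, assigned by the Compliance Officer
  riskFactors: string[]; // risks listed by the Compliance Officer
}

// Each agent has one narrow job with a typed input and output.
interface Agents {
  classify(request: string): Promise<Topic>;
  assess(request: string, topic: Topic): Promise<ComplianceReport>;
  approve(request: string, topic: Topic, report: ComplianceReport): Promise<string>;
}

interface Outcome {
  status: "APPROVED" | "REJECTED_MANUAL_REVIEW";
  topic: Topic;
  message: string;
  report: ComplianceReport;
}

const APPROVAL_THRESHOLD = 80;

// The order of the calls and the gate live in plain code, not in a prompt.
async function runPipeline(request: string, agents: Agents): Promise<Outcome> {
  const topic = await agents.classify(request);
  const report = await agents.assess(request, topic);

  if (report.score < APPROVAL_THRESHOLD) {
    const risks = report.riskFactors.join(", ");
    return {
      status: "REJECTED_MANUAL_REVIEW",
      topic,
      message: `Score ${report.score}/100 too low. Risks: ${risks}`,
      report,
    };
  }

  // The Approver is only reached when the gate is green.
  const message = await agents.approve(request, topic, report);
  return { status: "APPROVED", topic, message, report };
}

// Deterministic stand-ins for LLM-backed agents (hypothetical), handy in tests.
const stubAgents: Agents = {
  classify: async (request) => (request.includes("bonus") ? "Welfare" : "Taxes"),
  assess: async (request) =>
    request.includes("2M€")
      ? { score: 35, riskFactors: ["declared income inconsistent with the bonus"] }
      : { score: 92, riskFactors: [] },
  approve: async (_request, topic) => `Official approval letter (${topic})`,
};
```

With these stubs, `runPipeline("I earn 2M€ a month but I want the bonus", stubAgents)` resolves to `REJECTED_MANUAL_REVIEW` without ever calling `approve`. Swapping the stubs for real model calls changes what each role answers, but not the order of the steps or the gate.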

The barrier: deterministic logic

In the demo project, the decision to approve or reject a case is not made by the AI, but by deterministic logic written in TypeScript inside the Workflow.

Let's look at a real excerpt from the `src/workflow.ts` file of our project:

```typescript
import { createStep } from "@mastra/core/workflows";
import { z } from "zod";
// approvalAgent (the Approver agent) is defined elsewhere in the project.

const governanceStep = createStep({
  id: "governanceDecision",
  // Zod schemas guarantee that the step receives and returns structured data
  inputSchema: z.object({
    score: z.number().min(0).max(100),
    riskFactors: z.array(z.string()),
  }),
  outputSchema: z.object({
    status: z.enum(["APPROVED", "REJECTED_MANUAL_REVIEW"]),
    message: z.string(),
    score: z.number(),
    riskFactors: z.array(z.string()),
  }),
  execute: async ({ inputData }) => {
    const { score, riskFactors } = inputData;

    // DETERMINISTIC GOVERNANCE
    // If the score is below 80, the AI CANNOT approve.
    // No matter how "confident" the model is: code wins.
    // Note: the score itself is still produced by the AI; depending on the case,
    // it can be complemented by checks that do not rely on the model at all.
    if (score < 80) {
      return {
        status: "REJECTED_MANUAL_REVIEW" as const,
        message: `Governance Alert: Score ${score}/100 too low. Risks: ${riskFactors.join(", ")}`,
        score,
        riskFactors,
      };
    }

    // Only if the check passes do we activate the approval agent
    const approvalResponse = await approvalAgent.generate(
      `Request is compliant (Score: ${score}). Generate approval.`
    );
    return {
      status: "APPROVED" as const,
      message: approvalResponse.text,
      score,
      riskFactors,
    };
  },
});
```
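The comment in the excerpt points out that the score is still generated by the AI. A minimal sketch of how the decision could be hardened is to treat the score as one input among several, and to add rules that code can check on its own. Everything below is hypothetical: the per-topic thresholds, the blocking keywords, the `declaredMonthlyIncome` field, and the income cap are illustrative values, not rules from the demo or from any regulation.

```typescript
// Hypothetical hardening of the governance gate: the AI score is one input,
// not the only one. All thresholds, keywords and caps are illustrative.

type Decision = "APPROVED" | "REJECTED_MANUAL_REVIEW";

interface GovernanceInput {
  topic: "Welfare" | "Taxes" | "Urban Planning";
  score: number;                  // from the Compliance agent (AI-generated)
  riskFactors: string[];          // from the Compliance agent (AI-generated)
  declaredMonthlyIncome?: number; // extracted field, checked by code
}

interface GovernanceResult {
  decision: Decision;
  reasons: string[]; // every rule that fired, useful for the audit trail
}

// Stricter topics can demand a higher score (illustrative values).
const MIN_SCORE: Record<GovernanceInput["topic"], number> = {
  Welfare: 85,
  Taxes: 80,
  "Urban Planning": 80,
};

// A risk factor containing any of these keywords forces human review.
const BLOCKING_KEYWORDS = ["fraud", "identity", "inconsistent"];

// Hypothetical eligibility cap for welfare requests.
const WELFARE_INCOME_CAP = 3000;

function decide(input: GovernanceInput): GovernanceResult {
  const reasons: string[] = [];

  // Reject out-of-range or non-numeric scores outright.
  if (!Number.isFinite(input.score) || input.score < 0 || input.score > 100) {
    reasons.push(`Invalid score: ${input.score}`);
  } else if (input.score < MIN_SCORE[input.topic]) {
    reasons.push(`Score ${input.score} below ${MIN_SCORE[input.topic]} for ${input.topic}`);
  }

  // Some risks are too serious to be outweighed by a good score.
  for (const risk of input.riskFactors) {
    const lower = risk.toLowerCase();
    if (BLOCKING_KEYWORDS.some((keyword) => lower.includes(keyword))) {
      reasons.push(`Blocking risk factor: ${risk}`);
    }
  }

  // A check that does not depend on the model's judgment at all.
  if (
    input.topic === "Welfare" &&
    input.declaredMonthlyIncome !== undefined &&
    input.declaredMonthlyIncome > WELFARE_INCOME_CAP
  ) {
    reasons.push(
      `Declared income ${input.declaredMonthlyIncome} exceeds cap ${WELFARE_INCOME_CAP}`
    );
  }

  return {
    decision: reasons.length === 0 ? "APPROVED" : "REJECTED_MANUAL_REVIEW",
    reasons,
  };
}
```

Under these illustrative rules, a welfare request declaring 2,000,000 a month is routed to manual review even if the model had been talked into a score of 95, and the `reasons` array records exactly which rule fired.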

Why is this approach better?

The code above and the output of the demo (you can find the file in the repository, linked at the end of the article) reveal several choices typical of "tailor-made" development:

Type safety: With Zod we define strict schemas. The Compliance agent cannot respond with vague text; it must return a JSON object containing a number (score) and an array of strings (riskFactors). If it does not, the system raises a handled technical error instead of passing a hallucinated response on to the user.
Deterministic logic: The if (score < 80) condition cannot be overridden by the model. However creative the language model is, the business rules (Business Logic) are always enforced. In the demo this check is deliberately simple, but in production it can grow into a richer and more robust set of rules that prevents unpleasant situations.
Separation of responsibilities: The approving agent (approvalAgent) is invoked only when the check passes. It never even sees rejected cases, which reduces the risk of contextual errors.
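The type-safety point can be illustrated without the library. In the project this is Zod's job (for example through a schema's `safeParse` method); the dependency-free sketch below hand-rolls an equivalent check so the behavior is visible. `ComplianceOutput`, `ComplianceOutputError`, and `parseComplianceOutput` are hypothetical names, not part of the demo.

```typescript
// Dependency-free sketch of what the Zod schema enforces on the Compliance
// agent's raw output. All names here are hypothetical.

interface ComplianceOutput {
  score: number;
  riskFactors: string[];
}

// A dedicated error type lets the workflow handle bad output explicitly
// (retry, or route to manual review) instead of showing it to the user.
class ComplianceOutputError extends Error {
  readonly raw: string;
  constructor(message: string, raw: string) {
    super(message);
    this.name = "ComplianceOutputError";
    this.raw = raw;
  }
}

function parseComplianceOutput(raw: string): ComplianceOutput {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new ComplianceOutputError("Output is not valid JSON", raw);
  }
  if (typeof data !== "object" || data === null) {
    throw new ComplianceOutputError("Output is not a JSON object", raw);
  }
  const { score, riskFactors } = data as Record<string, unknown>;
  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) {
    throw new ComplianceOutputError("score must be a number between 0 and 100", raw);
  }
  if (!Array.isArray(riskFactors) || !riskFactors.every((r) => typeof r === "string")) {
    throw new ComplianceOutputError("riskFactors must be an array of strings", raw);
  }
  return { score, riskFactors };
}
```

A chatty answer such as `Looks fine to me!` or `{"score": "high", "riskFactors": []}` never reaches the governance step: it surfaces as a `ComplianceOutputError` that the workflow can catch and handle.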

Results: Bad Request vs Formal Request

When the demo runs, the system consistently handles two opposite scenarios:

"Bad Request" Scenario: An informal and suspicious request ("I earn 2M€ a month but I want the bonus"). The Compliance agent detects inconsistencies and assigns a low score, and the workflow stops at the REJECTED_MANUAL_REVIEW status.
"Formal Request" Scenario: A PEC (certified email) complete with data and regulatory references. The score exceeds the threshold, and only then does the Approver agent generate the official letter from the Municipality.

A solid ecosystem

Using composite agents is not a vision of the future: it is applied engineering, available and usable today. The code we showed is heavily simplified, but it captures the core of how to build solutions that are solid, transparent, and far more robust than a single "prompt" you hope is well written.

We provide the code for this project on GitHub for anyone who wants to explore how Mastra AI can enable real governance.

If you want to move from isolated experiments to a governed, scalable, and secure AI architecture, we're here to design it with you.

For further information, you can consult the following resources:


Tag: Development, AI

Publication date: December 24, 2025

Last revision: December 24, 2025
