
Agents

Our AI Agent framework is heavily inspired by this post by Anthropic, which is well worth reading to understand the concepts behind our implementation.

An Agent can be thought of as a piece of code that makes its own decisions. We do not define the flow of its logic with traditional control structures like if statements. Instead, we delegate the decision-making to the Agent itself, typically by chaining LLM calls in a smart way.

This chaining is called an Execution. It is a list of objects that inherit from the Execution base class, following the Strategy pattern.


The Agent class exposes the execution field which internally chains any combination of these Execution blocks. For example:

from modules.ai.agents import Agent

agent = Agent(
    execution=[
        Query("Name 5 European capitals", StringList),
        Parallel("What is worth visiting in {payload}?"),
    ]
)

The execution above is pretty self-explanatory. It will first query the LLM for the names of 5 European capitals, and then it will ask the LLM to answer the question "What is worth visiting in {payload}?" for each of the capitals in parallel.

As you can see, there is a reserved keyword named payload. This is what the Agent passes from one Execution to the next. Typically, and for flexibility, we would use a Prompt from LlamaIndex.

An Agent is run through the run_agent function, which exposes the input_text parameter. The input_text is what gets injected into the first payload.
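The payload threading described above can be pictured as a simple fold over the execution list. This is only an illustrative sketch of the assumed semantics (run_agent_sketch is not the real run_agent, and plain functions stand in for Execution blocks):

```python
# Illustrative sketch only: input_text seeds the first payload, and each
# Execution's output becomes the payload of the next one in the list.
def run_agent_sketch(executions, input_text: str):
    payload = input_text
    for execution in executions:
        payload = execution(payload)
    return payload

# Plain functions stand in for Execution blocks here:
run_agent_sketch([str.upper, lambda s: f"{s}!"], "hello")  # → "HELLO!"
```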

As we’ve just seen, Execution is the base class for all the building blocks of an Agent. When creating a new Execution block, we just need to override the execute method.

class MyExecution(Execution):
    def execute(self, payload: Any) -> Any:
        return "Hello!"

Additionally, the Execution base class controls the QueryConfiguration for each block: things like which LLM to use, or whether to use RAG.

The Query block is pretty straightforward. It is a wrapper around the query_llm function, which performs a single LLM query. It takes a Prompt and an optional output_cls to parse the result into a Pydantic model.

class Recommendation(BaseModel):
    landmark: str
    description: str

prompt = Prompt("What is worth visiting in {payload}?")
query = Query(prompt, Recommendation)

All other Execution blocks use Query under the hood.

The RAG component enhances Query blocks by allowing the LLM to access a document. It is built from a BaseQueryEngine.

It can be added to any block by calling the with_rag method.

...
query_engine = index.as_query_engine(...)

Query(
    "Who is the author of this document?"
).with_rag(RAG(query_engine))

The QueryConfiguration is a simple wrapper around all the configuration specific to a Query block: things like which LLM to use, or whether to use RAG.

It is used for convenience, so we can keep the function signatures short internally. It is also useful to set multiple configurations at once.

query = Query(
    "Who is the author of this document?",
    configuration=QueryConfiguration(
        llm=get_llm(model=LLMVariant.GPT_41_MINI),
        rag=RAG(query_engine),
    ),
)

The Parallel block executes an arbitrary number of Query blocks in parallel. It takes a list[str] as input, and returns a list[str].

Essentially, it injects each element of the input list into the payload of the Query block and executes it. It also takes an optional output_cls to parse each result into a Pydantic model.

Parallel("What is worth visiting in {payload}?", Recommendation)
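The fan-out behavior can be sketched with a thread pool. This is an assumption about the semantics, not the real implementation (the actual block may well use async LLM calls); query stands in for the underlying LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the assumed Parallel semantics: format the prompt template
# with each element of the incoming list, then run the queries
# concurrently, preserving input order.
def parallel_sketch(template: str, payloads: list[str], query) -> list[str]:
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: query(template.format(payload=p)), payloads))

# With an echoing stand-in for the LLM call:
parallel_sketch("What is worth visiting in {payload}?", ["Paris", "Rome"], lambda p: p)
# → ['What is worth visiting in Paris?', 'What is worth visiting in Rome?']
```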

You can think of a Reason block as 2 Query blocks talking to each other.

  • The first, known as the generator, produces an answer from its prompt.
  • The second, known as the evaluator, judges that answer with its own prompt.

The evaluator checks if the output of the generator is valid. If it is, the Reason block returns the output of the generator. If it is not, the evaluator injects feedback into the generator and runs it again.

Reason(
    generator=Prompt("Write an essay about {payload}."),
    evaluator=Prompt("Can this essay be improved? {payload}"),
)
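The generator/evaluator loop behind this can be sketched as follows. The stopping rule, the feedback-injection format, and the max_rounds cap are all assumptions; generate and evaluate stand in for the two LLM calls:

```python
# Sketch of the assumed Reason loop: generate an answer, let the evaluator
# judge it, and retry with the evaluator's feedback until it is accepted
# (or a round cap is hit).
def reason_sketch(generate, evaluate, payload: str, max_rounds: int = 3) -> str:
    output = generate(payload)
    for _ in range(max_rounds):
        valid, feedback = evaluate(output)
        if valid:
            return output
        output = generate(f"{payload}\nFeedback: {feedback}")
    return output
```

With stub functions, a draft that the evaluator rejects once gets regenerated with the feedback appended to the original prompt, and the second attempt is returned.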

Sometimes, we want to apply an arbitrary transformation to the output of an Execution block.

This is what the Function block does. It takes a function as input and applies it to the incoming payload.

def append_exclamation(text: str) -> str:
    return f"{text}!"

Function(append_exclamation)

Some Function patterns are so common that we have dedicated abstractions for them: the Split and Combine blocks.

Split takes a json dump of a Pydantic model and returns a list of json dumps of a field of that model. That field must be a list. For example, if we have these models:

class Book(BaseModel):
    name: str

class Library(BaseModel):
    books: list[Book]

We can split an incoming Library into a list of Book objects with the following:

Split(Library, "books")
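Under the contract stated above, Split's transformation amounts to the following. This is a sketch using the plain json module (the real block works with Pydantic model dumps):

```python
import json

# Sketch of what Split(Library, "books") does to its payload: take one
# JSON dump and return one JSON dump per element of the chosen list field.
def split_sketch(payload: str, field: str) -> list[str]:
    return [json.dumps(item) for item in json.loads(payload)[field]]

library = '{"books": [{"name": "Dune"}, {"name": "Emma"}]}'
split_sketch(library, "books")  # → ['{"name": "Dune"}', '{"name": "Emma"}']
```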

Combine works in the opposite direction of Split: it takes a list of json dumps of a Pydantic model and returns a single json dump of a model that holds a list of that model. For example, if we have the same models as above:

class Book(BaseModel):
    name: str

class Library(BaseModel):
    books: list[Book]

We can combine an incoming list of Book objects into a Library with the following:

Combine(Library, "books")
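As with Split, the transformation can be sketched with the plain json module (an illustration of the stated contract, not the real Pydantic-based implementation):

```python
import json

# Sketch of what Combine(Library, "books") does: wrap a list of JSON
# dumps back into a single JSON dump under the chosen list field.
def combine_sketch(payloads: list[str], field: str) -> str:
    return json.dumps({field: [json.loads(p) for p in payloads]})

combine_sketch(['{"name": "Dune"}', '{"name": "Emma"}'], "books")
# → '{"books": [{"name": "Dune"}, {"name": "Emma"}]}'
```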

As you were reading, you have probably already thought of the potential synergy between Execution blocks. For example, a combination of Split, Parallel and Combine can be really powerful.

This piece of code is an actual agent in our codebase. It transforms a list of slide blueprints with potentially incomplete fields into one where every blueprint is complete.

Agent(
    execution=[
        Split(UnsafePresentationBlueprint, "slides"),
        Parallel(RECONSTRUCT_BLUEPRINT, SlideBlueprint),
        Combine(PresentationBlueprint, "slides"),
    ],
)

The final piece of the puzzle is the Branch class. By default, it simply runs its execution and returns the result.

However, and here is where the magic happens, we can define a set of Branch objects within a Branch. When the execution completes, the Agent picks the most suitable Branch by checking each Branch's condition against the output of the execution, and then runs that Branch's execution.

We can compose Branch objects without limit, delegating the decision-making to the Agent. For example:

right_driving = Branch(
    condition="This country drives on the right side of the road",
    execution=[
        Function(lambda country: f"{country} drives on the right."),
    ],
)
left_driving = Branch(
    condition="This country drives on the left side of the road",
    execution=[
        Function(lambda country: f"{country} drives on the left."),
    ],
)

Agent(
    execution=[
        Query("Name any country."),
    ],
    branches=[right_driving, left_driving],
)
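The selection step can be pictured like this. Everything here is an assumption for illustration: matches stands in for the LLM's judgment of a condition against the execution output, branches are reduced to (condition, execution) pairs, and a first-match rule approximates "most suitable":

```python
# Sketch of the assumed branch-selection step that runs after the
# Agent's main execution completes.
def pick_branch(branches, output, matches):
    # matches(condition, output) -> bool stands in for the LLM check
    for condition, execution in branches:
        if matches(condition, output):
            return execution
    return None

branches = [
    ("This country drives on the right side of the road", "right-branch"),
    ("This country drives on the left side of the road", "left-branch"),
]
# A toy matcher with a hard-coded set of left-driving countries:
left_side = {"United Kingdom", "Japan"}
pick_branch(branches, "Japan", lambda c, o: ("left" in c) == (o in left_side))
# → "left-branch"
```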