# Agents
Our AI Agent framework is heavily inspired by this post by Anthropic, which is well worth reading to understand the concepts behind our implementation.
An Agent can be thought of as a piece of code that makes its own decisions. We do not define the flow of the logic with traditional control structures such as `if`. Instead, we delegate the decision-making to the Agent itself, typically by chaining LLM calls in a smart way.
## Execution
This chaining is called an Execution. It is a list of objects that inherit from the `Execution` base class, following a Strategy pattern.
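The pattern can be sketched in a few lines of plain Python. This is an illustrative simplification, not the framework's actual code; the class names here are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class Execution(ABC):
    """Base class: each block is one interchangeable strategy."""

    @abstractmethod
    def execute(self, payload: Any) -> Any: ...


class Uppercase(Execution):
    def execute(self, payload: Any) -> Any:
        return str(payload).upper()


class Exclaim(Execution):
    def execute(self, payload: Any) -> Any:
        return f"{payload}!"


# The agent does not branch with if/else; it simply runs whatever
# strategies it was configured with, in order.
def run(execution: list[Execution], payload: Any) -> Any:
    for block in execution:
        payload = block.execute(payload)
    return payload
```

Because every block shares the same `execute` interface, blocks can be swapped or reordered without touching the surrounding logic, which is the essence of the Strategy pattern.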
Here’s a list of quick links to all the available Execution blocks.
The `Agent` class exposes the `execution` field, which internally chains any combination of these Execution blocks. For example:
```python
from modules.ai.agents import Agent

agent = Agent(
    execution=[
        Query("Name 5 European capitals", StringList),
        Parallel("What is worth visiting in {payload}?"),
    ]
)
```

The execution above is pretty self-explanatory. It will first query the LLM for
the name of 5 European capitals, and then it will ask the LLM to answer the
question "What is worth visiting in {payload}?" for each of the capitals in
parallel.
As you can see, there is a reserved keyword named payload. This is what the
Agent passes from one Execution to the next. Typically, and for flexibility,
we would use a
Prompt
from LlamaIndex.
Through the run_agent function, which exposes the input_text parameter, we
can run an Agent. The input_text is what gets injected into the first
payload.
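That threading can be pictured with a small sketch (simplified; the real `run_agent` signature may differ). The `Template` class here is a toy stand-in for a prompt-running block:

```python
# Sketch: input_text becomes the first payload, and each block's
# return value becomes the payload of the next block.
def run_agent(execution, input_text: str):
    payload = input_text
    for block in execution:
        payload = block.execute(payload)
    return payload


class Template:
    """Toy block that injects the payload into a {payload} placeholder."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def execute(self, payload):
        return self.prompt.format(payload=payload)
```

With this sketch, `run_agent([Template("What is worth visiting in {payload}?")], "Paris")` yields `"What is worth visiting in Paris?"`.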
## Execution
As we've just seen, `Execution` is the base class for all the building blocks of an Agent. When creating a new Execution block, we just need to override the `execute` method.
```python
class MyExecution(Execution):
    def execute(self, payload: Any) -> Any:
        return "Hello!"
```

Additionally, Execution controls the QueryConfiguration for each block. These are things like which LLM to use, or whether we use RAG.
## Query

This one is pretty straightforward. It is a wrapper around the `query_llm` function, which performs a single LLM query. It takes a Prompt and an optional `output_cls` to parse the result into a Pydantic model.
```python
class Recommendation(BaseModel):
    landmark: str
    description: str


prompt = Prompt("What is worth visiting in {payload}?")
query = Query(prompt, Recommendation)
```

All other Execution blocks use Query under the hood.
## RAG

The RAG component enhances Query blocks by allowing the LLM to access a document. It is built from a BaseQueryEngine. It can be added to any block by calling the `with_rag` method.
```python
...
query_engine = index.as_query_engine(...)

Query(
    "Who is the author of this document?"
).with_rag(RAG(query_engine))
```
## Query Configuration

The QueryConfiguration is a simple wrapper around every specific configuration for a Query block. These are things like which LLM to use, or whether we use RAG.
It is used for convenience, so we can keep the function signatures short internally. It is also useful to set multiple configurations at once.
```python
query = Query(
    "Who is the author of this document?",
    configuration=QueryConfiguration(
        llm=get_llm(model=LLMVariant.GPT_41_MINI),
        rag=RAG(query_engine),
    ),
)
```
## Parallel

Executes an arbitrary number of Query blocks in parallel. It takes a `list[str]` as input, and returns a `list[str]`.
Essentially, it injects each element of the input list into the payload of the Query block, and executes it. It also takes an optional `output_cls` to parse the result into a Pydantic model.
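The fan-out behaviour can be sketched like this, with a stub standing in for the real LLM call (the actual block delegates to Query):

```python
from concurrent.futures import ThreadPoolExecutor


def fake_query_llm(prompt: str) -> str:
    # Stand-in for the real LLM call.
    return f"Answer to: {prompt}"


def parallel(prompt_template: str, payloads: list[str]) -> list[str]:
    # Render one prompt per input element, then run the queries
    # concurrently; map() preserves the input order in the results.
    prompts = [prompt_template.format(payload=p) for p in payloads]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fake_query_llm, prompts))
```

So `parallel("What is worth visiting in {payload}?", ["Paris", "Rome"])` issues both queries concurrently and returns the answers in the same order as the input list.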
```python
Parallel("What is worth visiting in {payload}?", Recommendation)
```
## Reason

You can think of a Reason block as two Query blocks talking to each other.

- The first, known as the generator, runs the first prompt.
- The second, known as the evaluator, runs the second prompt.
The evaluator checks if the output of the generator is valid. If it is, the
Reason block returns the output of the generator. If it is not, the
evaluator injects feedback into the generator and runs it again.
```python
Reason(
    generator=Prompt("Write an essay about {payload}."),
    evaluator=Prompt("Can this essay be improved? {payload}"),
)
```
## Function

Sometimes, we want to apply an arbitrary transformation to the output of an Execution block.
This is what the Function block does. It takes a function as input, and
applies it to the output of the Execution block.
```python
def append_exclamation(text: str) -> str:
    return f"{text}!"


Function(append_exclamation)
```

There are some very common patterns for Function blocks, so we have dedicated
abstractions for them. These are the Split and Combine blocks.
## Split

Split takes a JSON dump of a Pydantic model and returns a list of JSON dumps of a field of that model. That field must be a list. For example, if we have these models:
```python
class Book(BaseModel):
    name: str


class Library(BaseModel):
    books: list[Book]
```

We can split an incoming Library into a list of Book objects with the
following:
```python
Split(Library, "books")
```
## Combine

Combine works in the opposite direction of Split. It takes a list of JSON dumps of a Pydantic model and returns a JSON dump of a model that has a list of the input model. For example, if we have the same models as above:
```python
class Book(BaseModel):
    name: str


class Library(BaseModel):
    books: list[Book]
```

We can combine an incoming list of Book objects into a Library with the
following:
```python
Combine(Library, "books")
```
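Stripped of the Pydantic machinery, the Split/Combine pair is essentially JSON reshaping. A plain-Python approximation of the two behaviours:

```python
import json


def split(payload: str, field: str) -> list[str]:
    # One JSON dump in -> one JSON dump per element of `field`.
    return [json.dumps(item) for item in json.loads(payload)[field]]


def combine(payloads: list[str], field: str) -> str:
    # The opposite: parse each element and wrap them back under `field`.
    return json.dumps({field: [json.loads(p) for p in payloads]})
```

Splitting a `Library` dump yields one dump per `Book`, and combining those dumps reconstructs the original `Library`, so the two blocks round-trip cleanly.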
## An example

As you were reading, you have probably already thought of the potential synergy between Execution blocks. For example, a combination of Split, Parallel and Combine can be really powerful.
This piece of code is an actual agent in our codebase. It transforms a list of slide blueprints with potentially incomplete fields into another where every blueprint is complete.
```python
Agent(
    execution=[
        Split(UnsafePresentationBlueprint, "slides"),
        Parallel(RECONSTRUCT_BLUEPRINT, SlideBlueprint),
        Combine(PresentationBlueprint, "slides"),
    ],
)
```
## Branch

The final piece of the puzzle is the Branch class. By default, it simply runs and returns an execution.
However, and here is where the magic happens, we can define a set of Branch objects within a Branch. When the execution completes, the Agent will pick the most suitable Branch by checking the condition of each Branch against the output of the execution, and then run that Branch's execution.
We can compose Branch objects without limit, delegating the decision-making to
the Agent. For example:
```python
right_driving = Branch(
    condition="This country drives on the right side of the road",
    execution=[
        Function(lambda country: f"{country} drives on the right."),
    ],
)

left_driving = Branch(
    condition="This country drives on the left side of the road",
    execution=[
        Function(lambda country: f"{country} drives on the left."),
    ],
)

Agent(
    execution=[
        Query("Name any country."),
    ],
    branches=[right_driving, left_driving],
)
```
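The selection step can be approximated in plain Python. In the real framework an LLM judges which condition holds; here an ordinary predicate stands in for that call:

```python
def pick_branch(branches, output: str):
    # Try each (condition, execution) pair in order and run the
    # first branch whose condition matches the previous output.
    for condition, execution in branches:
        if condition(output):  # stub for the LLM's judgement
            return execution(output)
    return output  # no branch matched: pass the output through


branches = [
    (lambda c: c in {"France", "Spain"},
     lambda c: f"{c} drives on the right."),
    (lambda c: c in {"Japan", "UK"},
     lambda c: f"{c} drives on the left."),
]
```

For example, `pick_branch(branches, "Japan")` selects the second branch and returns `"Japan drives on the left."`; an output matching no condition falls through unchanged.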