Skip to content

Agent Pipeline and Defenses

A core element of AgentDojo is the agent pipeline. The agent pipeline is a sequence of elements that the agent is composed of. In this way, elements can be composed together. Each element needs to inherit from the BasePipelineElement. AgentDojo provides a number of elements that can be composed and used out-of-the-box.

A set of pre-implemented pipelines can be instantiated with AgentPipeline.from_config.

Base elements

Some base elements are:

  • AgentPipeline: executes a sequence of elements.
  • InitQuery: initializes the pipeline execution with the user's query.
  • SystemMessage: adds a system message to the messages list.

Function execution elements

A core element of a pipeline, is the execution of tools by using a FunctionsRuntime. We provide some base components to run the tools needed by the pipeline:

LLMs

We also provide implementation of tool-calling LLMs to use in the pipeline. All of these elements call the respective LLM with the tools available in the FunctionsRuntime, and return the mode output to the pipeline. The output is added to the list of ChatMessages passed along the pipeline. In particular, we provide the following LLMs:

  • OpenAILLM: gives access to GPT models.
  • AnthropicLLM: gives access to Claude Opus, Sonnet, and Haiku.
  • GoogleLLM: gives access to Gemini 1.5 Pro and Flash.
  • CohereLLM: gives access to Command-R and Command-R+.
  • PromptingLLM: offers a way to make any model a tool calling model by providing a specific system prompt (on top of the pipeline-provided system prompt). We use this to run Llama-3 70B via TogetherAI's API.

Combining pipeline components

Pipeline components can be combined to make a pipeline. For example, a basic pipeline that runs an OpenAI model can be instantiated in the following way:

pipeline.py
llm = OpenAILLM(openai.OpenAI(), "gpt-4o-2024-05-13")  # (1)!
tools_loop = ToolsExecutionLoop(  # (2)!
    [
        ToolsExecutor(),
        PromptInjectionDetector(),  # (3)!
        llm,  # (4)!
    ]
)
tools_pipeline = AgentPipeline(  # (5)!
    [
        SystemMessage("You are a helpful assistant."),
        InitQuery(),
        llm,
        tools_loop,
    ]
)
  1. Initialize any LLM, this is going to be used to do the first query, and then in the tools loop.
  2. You can combine together components to create a tool execution loop. The loop is going to run as long as the ToolsExecutor finds some tools to be run in the last message in the history.
  3. You can also place defense components in the ToolsExecutionLoop!
  4. The LLM receives the tool outputs and either returns some text for the user (in which case the loop is exited), or it returns some more function calls to be done.
  5. Finally, all components can be combined together with AgentPipeline

Running a pipeline

An AgentPipeline can be run by using the query method. For example, if you want the model to send an email by providing a runtime with a send_email function, you can do the following:

pipeline.py
prompt = "Can you please send an email to 'email@example.com?'"
print(tools_pipeline.query(prompt, runtime))

Creating a fully customized pipeline from scratch

You don't need to use the pre-implemented pipelines, or even an AgentPipeline. This is intentional, to not limit how a defense would work. The only thing that matters is that the pipeline components inherit from BasePipelineElement.

Tip

The pipeline does not even have to be made of multiple elements. You could have a one-component pipeline that does everything.

We hope that this design is generic enough to not limit the creation of new defenses. But let us know if you find any limitations!

Registering and running a pipeline

We are still working on providing a better way to instantiate pipelines and allow users to create and register their own pipelines as easily as it is possible with attacks and task suites. We plan to add this feature in the next couple of weeks.

Meanwhile, you can create your own pipelines and benchmark them by using the benchmarking functions provided in the agentdojo.benchmark module.