Base Attacks¶
BaseAttack(
    task_suite: TaskSuite,
    target_pipeline: BasePipelineElement,
)
              Bases: ABC
Base class for attacks.
Parameters:
- 
            task_suite(TaskSuite) –The task suite with the user and injection tasks being used for the benchmark. 
- 
            target_pipeline(BasePipelineElement) –The target pipeline that is attacked. 
Source code in src/agentdojo/attacks/base_attacks.py
                    | 41 42 43 44 45 |  | 
class-attribute
      instance-attribute
  
¶
is_dos_attack: bool = False
Whether the attack is a DOS attack. If True, then the benchmarking
will be done with only one injection task.
cached
  
¶
get_injection_candidates(
    user_task: BaseUserTask,
) -> list[str]
"Given a user task ID, it returns the list of keys of the injection placeholders that are seen by the LLM when executing the task correctly.
Source code in src/agentdojo/attacks/base_attacks.py
              | 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |  | 
attack(
    user_task: BaseUserTask,
    injection_task: BaseInjectionTask,
) -> dict[str, str]
Given a user task and an injection task, it returns a dictionary containing the injection placeholders that are seen by the agent when execuring the task correctly, and the corresponsing injections.
Parameters:
- 
            user_task(BaseUserTask) –The user task. 
- 
            injection_task(BaseInjectionTask) –The injection task. It contains the GOALattribute, that should be used by the attack to instruct the model to execute the injection task.
Source code in src/agentdojo/attacks/base_attacks.py
              | 84 85 86 87 88 89 90 91 92 93 94 95 96 |  | 
module-attribute
  
¶
DEFAULT_USER_NAME = 'the user'
Default name of the victim user.
module-attribute
  
¶
DEFAULT_MODEL_NAME = 'the AI language model'
Default name of the victim model.
FixedJailbreakAttack(
    jailbreak: str,
    task_suite: TaskSuite,
    target_pipeline: BasePipelineElement,
)
              Bases: BaseAttack
Attack which uses a fixed template (e.g., f"ignore previous instructions, do {injection_task.GOAL} instead").
The attack method will substitute {goal}, {user}, {model}, and {task} in the jailbreak string with the given values.
Subclasses can override summarize_task to provide a human-readable summary of the task.
Source code in src/agentdojo/attacks/base_attacks.py
                    | 109 110 111 |  | 
class-attribute
      instance-attribute
  
¶
user_name: str = DEFAULT_USER_NAME
The name of the user. This will be substituted in the jailbreak string to the {user} placeholder if present in the string.
class-attribute
      instance-attribute
  
¶
model_name: str = DEFAULT_MODEL_NAME
The name of the model. This will be substituted in the jailbreak string to the {model} placeholder if present in the string.