Base Attacks¶
BaseAttack
¶
BaseAttack(
task_suite: TaskSuite,
target_pipeline: BasePipelineElement,
)
Bases: ABC
Base class for attacks.
Parameters:
-
task_suite
(TaskSuite
) –The task suite with the user and injection tasks being used for the benchmark.
-
target_pipeline
(BasePipelineElement
) –The target pipeline that is attacked.
Source code in src/agentdojo/attacks/base_attacks.py
39 40 41 42 43 |
|
is_dos_attack
class-attribute
instance-attribute
¶
is_dos_attack: bool = False
Whether the attack is a DOS attack. If True
, then the benchmarking
will be done with only one injection task.
get_injection_candidates
cached
¶
get_injection_candidates(
user_task: BaseUserTask,
) -> list[str]
"Given a user task ID, it returns the list of keys of the injection placeholders that are seen by the LLM when executing the task correctly.
Source code in src/agentdojo/attacks/base_attacks.py
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
|
attack
¶
attack(
user_task: BaseUserTask,
injection_task: BaseInjectionTask,
) -> dict[str, str]
Given a user task and an injection task, it returns a dictionary containing the injection placeholders that are seen by the agent when execuring the task correctly, and the corresponsing injections.
Parameters:
-
user_task
(BaseUserTask
) –The user task.
-
injection_task
(BaseInjectionTask
) –The injection task. It contains the
GOAL
attribute, that should be used by the attack to instruct the model to execute the injection task.
Source code in src/agentdojo/attacks/base_attacks.py
80 81 82 83 84 85 86 87 88 89 90 91 92 |
|
DEFAULT_USER_NAME
module-attribute
¶
DEFAULT_USER_NAME = 'the user'
Default name of the victim user.
DEFAULT_MODEL_NAME
module-attribute
¶
DEFAULT_MODEL_NAME = 'the AI language model'
Default name of the victim model.
FixedJailbreakAttack
¶
FixedJailbreakAttack(
jailbreak: str,
task_suite: TaskSuite,
target_pipeline: BasePipelineElement,
)
Bases: BaseAttack
Attack which uses a fixed template (e.g., f"ignore previous instructions, do {injection_task.GOAL} instead"
).
The attack method will substitute {goal}, {user}, {model}, and {task} in the jailbreak string with the given values.
Subclasses can override summarize_task
to provide a human-readable summary of the task.
Source code in src/agentdojo/attacks/base_attacks.py
105 106 107 |
|
user_name
class-attribute
instance-attribute
¶
user_name: str = DEFAULT_USER_NAME
The name of the user. This will be substituted in the jailbreak string to the {user}
placeholder if present in the string.
model_name
class-attribute
instance-attribute
¶
model_name: str = DEFAULT_MODEL_NAME
The name of the model. This will be substituted in the jailbreak string to the {model}
placeholder if present in the string.