AgentDojo Results
Here are all the results from different combinations of models, defenses, and attacks that we ran on our benchmark.
Note
Please note that this is not a leaderboard: we did not run every attack and defense against every model, so a fair comparison across models, defenses, and attacks is not possible. By default, rows are sorted by the Date column.
How do I add my results?
Please run your attacks/models/defenses on the benchmark within a fork of the repository, and open a PR with your results. We will merge your PR and your results will be displayed here. The PR should also include a brief description of the attack/defense/model you are submitting, along with its implementation in the agentdojo repository.
How to inspect the results?
You can click the "Explore →" button to the right of each row to see full runtime traces of all runs including agent trajectories, attacks and defenses in the Invariant Explorer trace viewing tool.
Model | Defense | Attack | Utility | Utility under attack | Targeted ASR | Date | Traces |
---|---|---|---|---|---|---|---|
claude-3-5-sonnet-20241022 | None | important_instructions | 79.38% | 72.50% | 1.11% | 2024-11-15 | Explore → |
gemini-1.5-flash-002 | None | important_instructions | 38.14% | 32.43% | 3.50% | 2024-11-05 | Explore → |
gemini-1.5-pro-002 | None | important_instructions | 61.86% | 47.06% | 17.01% | 2024-11-04 | Explore → |
gpt-4o-mini-2024-07-18 | None | important_instructions | 68.04% | 49.92% | 27.19% | 2024-07-19 | Explore → |
claude-3-5-sonnet-20240620 | None | important_instructions | 79.38% | 51.19% | 33.86% | 2024-06-20 | Explore → |
gpt-4o-2024-05-13 | repeat_user_prompt | important_instructions | 84.54% | 67.25% | 27.82% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | None | tool_knowledge | 69.07% | 57.71% | 34.50% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | None | injecagent | 69.07% | 68.52% | 5.72% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 50.08% | 47.69% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | None | ignore_previous | 69.07% | 66.77% | 5.41% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | None | direct | 69.07% | 67.25% | 3.66% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | transformers_pi_detector | important_instructions | 41.24% | 21.14% | 7.95% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 56.28% | 6.84% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 55.64% | 41.65% | 2024-06-05 | Explore → |
gpt-4-0125-preview | None | important_instructions | 65.98% | 40.70% | 56.28% | 2024-06-05 | Explore → |
gpt-4-turbo-2024-04-09 | None | important_instructions | 64.95% | 54.05% | 28.62% | 2024-06-05 | Explore → |
gpt-3.5-turbo-0125 | None | important_instructions | 35.05% | 34.66% | 10.33% | 2024-06-05 | Explore → |
gemini-1.5-pro-001 | None | important_instructions | 46.39% | 28.93% | 28.62% | 2024-06-05 | Explore → |
gemini-1.5-flash-001 | None | important_instructions | 38.14% | 34.18% | 12.24% | 2024-06-05 | Explore → |
command-r-plus | None | important_instructions | 24.74% | 25.12% | 4.45% | 2024-06-05 | Explore → |
command-r | None | important_instructions | 26.80% | 30.84% | 3.34% | 2024-06-05 | Explore → |
claude-3-sonnet-20240229 | None | important_instructions | 53.61% | 33.23% | 26.71% | 2024-06-05 | Explore → |
claude-3-opus-20240229 | None | important_instructions | 68.04% | 52.46% | 11.29% | 2024-06-05 | Explore → |
claude-3-haiku-20240307 | None | important_instructions | 39.18% | 33.39% | 9.06% | 2024-06-05 | Explore → |
meta-llama/Llama-3-70b-chat-hf | None | important_instructions | 34.02% | 18.28% | 25.60% | 2021-09-05 | Explore → |
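
Because the table mixes models, defenses, and attacks, it can help to slice it before comparing rows. The following is a minimal sketch, not part of the agentdojo package: it parses a few rows copied verbatim from the table above and compares defenses for a single model and attack. The `Result` class, the `parse_row` helper, and the "utility drop" figure (Utility minus Utility under attack, in percentage points) are illustrative assumptions introduced here, and the sketch assumes a lower Targeted ASR means fewer successful attacks.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One table row. Hypothetical helper type, not an agentdojo class."""
    model: str
    defense: str                 # the literal string "None" when no defense was used
    attack: str
    utility: float               # percent, without an attack
    utility_under_attack: float  # percent
    targeted_asr: float          # percent
    date: str                    # date string exactly as shown in the table

    @property
    def utility_drop(self) -> float:
        """Percentage points of utility lost under attack (illustrative metric)."""
        return self.utility - self.utility_under_attack


def _percent(cell: str) -> float:
    """Convert a cell such as '69.07%' to the float 69.07."""
    return float(cell.rstrip("%"))


def parse_row(line: str) -> Result:
    """Parse one pipe-delimited row; the trailing Traces cell is ignored."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, defense, attack, utility, under_attack, asr, date = cells[:7]
    return Result(model, defense, attack,
                  _percent(utility), _percent(under_attack), _percent(asr), date)


# Three gpt-4o rows copied from the table above (same model, same attack).
ROWS = """
gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 50.08% | 47.69% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 56.28% | 6.84% | 2024-06-05 | Explore → |
gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 55.64% | 41.65% | 2024-06-05 | Explore → |
"""

results = [parse_row(line) for line in ROWS.strip().splitlines()]

# Order the defenses from lowest to highest Targeted ASR.
by_asr = sorted(results, key=lambda r: r.targeted_asr)

for r in by_asr:
    print(f"{r.defense:30s} ASR={r.targeted_asr:5.2f}%  "
          f"utility drop={r.utility_drop:5.2f} pts")
```

Restricting the comparison to rows that share a model and an attack sidesteps the fairness problem described in the note above; comparing rows across different models or attacks this way would not be meaningful.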