AgentDojo Results

Here are all the results from different combinations of models, defenses, and attacks that we ran on our benchmark.

Note

This is not a leaderboard: we did not run every model against every attack and defense, so a fair comparison across models, defenses, and attacks would be too difficult. By default, rows are sorted by the Date column.

How do I add my results?

Please run your attacks/models/defenses on the benchmark within a fork of the repository, and open a PR with your results. We will merge your PR and your results will be displayed here. The PR should also include a brief description of the attack/defense/model you are submitting, and an implementation of the attack/defense/model in the agentdojo repository.
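
A run for a submission might be scripted roughly as follows. This is only a sketch: the module path `agentdojo.scripts.benchmark` and the `--model`, `--attack`, `--defense`, and `-s` flags are assumptions taken from the project README and may differ in your checkout, so check the script's `--help` output first. `build_benchmark_command` is a hypothetical helper, and the suite name `workspace` is an illustrative value. The helper only assembles the command and does not run it.

```python
import shlex
from typing import Iterable, List, Optional


def build_benchmark_command(
    model: str,
    attack: Optional[str] = None,
    defense: Optional[str] = None,
    suites: Iterable[str] = (),
) -> List[str]:
    """Assemble (but do not execute) a benchmark invocation.

    The module path and flag names are assumptions based on the project
    README; verify them against the benchmark script's --help output.
    """
    cmd = ["python", "-m", "agentdojo.scripts.benchmark", "--model", model]
    if attack is not None:
        cmd += ["--attack", attack]
    if defense is not None:
        cmd += ["--defense", defense]
    for suite in suites:
        # One -s flag per suite (assumed flag name).
        cmd += ["-s", suite]
    return cmd


if __name__ == "__main__":
    command = build_benchmark_command(
        model="gpt-4o-2024-05-13",
        attack="important_instructions",
        defense="tool_filter",
        suites=["workspace"],  # illustrative suite name
    )
    # Print a shell-safe rendering so it can be reviewed before running.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch free of side effects; pass the list to `subprocess.run` once the flags have been confirmed.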

How do I inspect the results?

Click the "Explore →" button to the right of each row to see the full runtime traces of all runs, including agent trajectories, attacks, and defenses, in the Invariant Explorer trace-viewing tool.

Model                           Defense                       Attack                  Utility  Utility under attack  Targeted ASR  Date        Traces
claude-3-5-sonnet-20241022      None                          important_instructions  79.38%   72.50%                1.11%         2024-11-15  Explore →
gemini-1.5-flash-002            None                          important_instructions  38.14%   32.43%                3.50%         2024-11-05  Explore →
gemini-1.5-pro-002              None                          important_instructions  61.86%   47.06%                17.01%        2024-11-04  Explore →
gpt-4o-mini-2024-07-18          None                          important_instructions  68.04%   49.92%                27.19%        2024-07-19  Explore →
claude-3-5-sonnet-20240620      None                          important_instructions  79.38%   51.19%                33.86%        2024-06-20  Explore →
gpt-4o-2024-05-13               repeat_user_prompt            important_instructions  84.54%   67.25%                27.82%        2024-06-05  Explore →
gpt-4o-2024-05-13               None                          tool_knowledge          69.07%   57.71%                34.50%        2024-06-05  Explore →
gpt-4o-2024-05-13               None                          injecagent              69.07%   68.52%                5.72%         2024-06-05  Explore →
gpt-4o-2024-05-13               None                          important_instructions  69.07%   50.08%                47.69%        2024-06-05  Explore →
gpt-4o-2024-05-13               None                          ignore_previous         69.07%   66.77%                5.41%         2024-06-05  Explore →
gpt-4o-2024-05-13               None                          direct                  69.07%   67.25%                3.66%         2024-06-05  Explore →
gpt-4o-2024-05-13               transformers_pi_detector      important_instructions  41.24%   21.14%                7.95%         2024-06-05  Explore →
gpt-4o-2024-05-13               tool_filter                   important_instructions  72.16%   56.28%                6.84%         2024-06-05  Explore →
gpt-4o-2024-05-13               spotlighting_with_delimiting  important_instructions  72.16%   55.64%                41.65%        2024-06-05  Explore →
gpt-4-0125-preview              None                          important_instructions  65.98%   40.70%                56.28%        2024-06-05  Explore →
gpt-4-turbo-2024-04-09          None                          important_instructions  64.95%   54.05%                28.62%        2024-06-05  Explore →
gpt-3.5-turbo-0125              None                          important_instructions  35.05%   34.66%                10.33%        2024-06-05  Explore →
gemini-1.5-pro-001              None                          important_instructions  46.39%   28.93%                28.62%        2024-06-05  Explore →
gemini-1.5-flash-001            None                          important_instructions  38.14%   34.18%                12.24%        2024-06-05  Explore →
command-r-plus                  None                          important_instructions  24.74%   25.12%                4.45%         2024-06-05  Explore →
command-r                       None                          important_instructions  26.80%   30.84%                3.34%         2024-06-05  Explore →
claude-3-sonnet-20240229        None                          important_instructions  53.61%   33.23%                26.71%        2024-06-05  Explore →
claude-3-opus-20240229          None                          important_instructions  68.04%   52.46%                11.29%        2024-06-05  Explore →
claude-3-haiku-20240307         None                          important_instructions  39.18%   33.39%                9.06%         2024-06-05  Explore →
meta-llama/Llama-3-70b-chat-hf  None                          important_instructions  34.02%   18.28%                25.60%        2021-09-05  Explore →
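
Because the rows are not comparable across models, the most direct reading of the table is to hold the model and attack fixed and vary the defense. The sketch below does this for the gpt-4o-2024-05-13 rows under the important_instructions attack, using values copied from the table. The `Row` class and the `compare_to_baseline` helper are hypothetical names introduced for illustration and are not part of AgentDojo.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Row:
    defense: str
    utility: float               # percent
    utility_under_attack: float  # percent
    targeted_asr: float          # percent


# gpt-4o-2024-05-13 under important_instructions, copied from the table.
GPT4O_ROWS: List[Row] = [
    Row("None", 69.07, 50.08, 47.69),
    Row("repeat_user_prompt", 84.54, 67.25, 27.82),
    Row("transformers_pi_detector", 41.24, 21.14, 7.95),
    Row("tool_filter", 72.16, 56.28, 6.84),
    Row("spotlighting_with_delimiting", 72.16, 55.64, 41.65),
]


def compare_to_baseline(
    rows: List[Row], baseline_defense: str = "None"
) -> Dict[str, Tuple[float, float]]:
    """Return (utility change, targeted ASR change) in percentage points
    for each defense, relative to the undefended baseline row."""
    baseline = next(r for r in rows if r.defense == baseline_defense)
    deltas = {}
    for row in rows:
        if row.defense == baseline_defense:
            continue
        deltas[row.defense] = (
            round(row.utility - baseline.utility, 2),
            round(row.targeted_asr - baseline.targeted_asr, 2),
        )
    return deltas


if __name__ == "__main__":
    for defense, (d_utility, d_asr) in compare_to_baseline(GPT4O_ROWS).items():
        print(f"{defense}: utility {d_utility:+.2f} pts, targeted ASR {d_asr:+.2f} pts")
```

For example, tool_filter lowers the Targeted ASR by 40.85 points (from 47.69% to 6.84%) while utility rises by 3.09 points, whereas transformers_pi_detector lowers it by 39.74 points at a cost of 27.83 points of utility.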