AgentDojo Results

Here are all the results from different combinations of models, defenses, and attacks that we ran on our benchmark.

Note

Please note that this is not a leaderboard: we did not run every attack and defense against every model, so a fair comparison across models, defenses, and attacks would be too difficult. By default, rows are sorted by the Date column.

How do I add my results?

Please run your attacks/models/defenses on the benchmark within a fork of the repository, and open a PR with your results. We will merge your PR and your results will be displayed here. The PR should also include a brief description of the attack/defense/model you are submitting, and an implementation of the attack/defense/model in the agentdojo repository.

How do I inspect the results?

Click the "Explore →" button to the right of each row to see the full runtime traces of all runs, including agent trajectories, attacks, and defenses, in the Invariant Explorer trace viewing tool.

| Provider | Model | Defense | Attack | Utility | Utility under attack | Targeted ASR | Date | Traces |
|---|---|---|---|---|---|---|---|---|
| anthropic | claude-3-5-sonnet-20241022 | None | important_instructions | 79.38% | 72.50% | 1.11% | 2024-11-15 | Explore → |
| google | gemini-1.5-flash-002 | None | important_instructions | 38.14% | 32.43% | 3.50% | 2024-11-05 | Explore → |
| google | gemini-1.5-pro-002 | None | important_instructions | 61.86% | 47.06% | 17.01% | 2024-11-04 | Explore → |
| openai | gpt-4o-mini-2024-07-18 | None | important_instructions | 68.04% | 49.92% | 27.19% | 2024-07-19 | Explore → |
| anthropic | claude-3-5-sonnet-20240620 | None | important_instructions | 79.38% | 51.19% | 33.86% | 2024-06-20 | Explore → |
| openai | gpt-4o-2024-05-13 | None | tool_knowledge | 69.07% | 57.71% | 34.50% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | None | injecagent | 69.07% | 68.52% | 5.72% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | transformers_pi_detector | important_instructions | 41.24% | 21.14% | 7.95% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 56.28% | 6.84% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 55.64% | 41.65% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | repeat_user_prompt | important_instructions | 84.54% | 67.25% | 27.82% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 50.08% | 47.69% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | None | ignore_previous | 69.07% | 66.77% | 5.41% | 2024-06-05 | Explore → |
| openai | gpt-4o-2024-05-13 | None | direct | 69.07% | 67.25% | 3.66% | 2024-06-05 | Explore → |
| openai | gpt-4-turbo-2024-04-09 | None | important_instructions | 64.95% | 54.05% | 28.62% | 2024-06-05 | Explore → |
| openai | gpt-4-0125-preview | None | important_instructions | 65.98% | 40.70% | 56.28% | 2024-06-05 | Explore → |
| openai | gpt-3.5-turbo-0125 | None | important_instructions | 35.05% | 34.66% | 10.33% | 2024-06-05 | Explore → |
| google | gemini-1.5-pro-001 | None | important_instructions | 46.39% | 28.93% | 28.62% | 2024-06-05 | Explore → |
| google | gemini-1.5-flash-001 | None | important_instructions | 38.14% | 34.18% | 12.24% | 2024-06-05 | Explore → |
| cohere | command-r-plus | None | important_instructions | 24.74% | 25.12% | 4.45% | 2024-06-05 | Explore → |
| cohere | command-r | None | important_instructions | 26.80% | 30.84% | 3.34% | 2024-06-05 | Explore → |
| anthropic | claude-3-sonnet-20240229 | None | important_instructions | 53.61% | 33.23% | 26.71% | 2024-06-05 | Explore → |
| anthropic | claude-3-opus-20240229 | None | important_instructions | 68.04% | 52.46% | 11.29% | 2024-06-05 | Explore → |
| anthropic | claude-3-haiku-20240307 | None | important_instructions | 39.18% | 33.39% | 9.06% | 2024-06-05 | Explore → |
| together-prompting | meta-llama/Llama-3-70b-chat-hf | None | important_instructions | 34.02% | 18.28% | 25.60% | 2021-09-05 | Explore → |
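
Because the table is not a leaderboard, comparisons are most meaningful when the model and attack are held fixed. As a minimal sketch of that kind of comparison, the snippet below copies the gpt-4o-2024-05-13 rows for the important_instructions attack from the table above and ranks the defenses by targeted ASR. The `Result` class and the `rank_by_asr` helper are hypothetical names introduced here for illustration; they are not part of the agentdojo package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One table row, restricted to the columns compared here (hypothetical type)."""
    defense: str
    utility: float               # percent
    utility_under_attack: float  # percent
    targeted_asr: float          # percent


# gpt-4o-2024-05-13 rows for the important_instructions attack,
# copied by hand from the results table above.
GPT4O_ROWS = [
    Result("None", 69.07, 50.08, 47.69),
    Result("transformers_pi_detector", 41.24, 21.14, 7.95),
    Result("tool_filter", 72.16, 56.28, 6.84),
    Result("spotlighting_with_delimiting", 72.16, 55.64, 41.65),
    Result("repeat_user_prompt", 84.54, 67.25, 27.82),
]


def rank_by_asr(rows):
    """Return rows ordered from lowest to highest targeted ASR (hypothetical helper)."""
    return sorted(rows, key=lambda r: r.targeted_asr)


for r in rank_by_asr(GPT4O_ROWS):
    print(
        f"{r.defense:30s} ASR={r.targeted_asr:5.2f}%  "
        f"utility={r.utility:5.2f}%  under attack={r.utility_under_attack:5.2f}%"
    )
```

Ranking on targeted ASR alone hides the utility cost of a defense: in these rows, transformers_pi_detector has the second-lowest ASR but also the lowest utility, so the utility columns are printed alongside.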