AgentDojo Results

Here are all the results from different combinations of models, defenses, and attacks that we ran on our benchmark.

Note

Please note that this is not a leaderboard: we did not attack every model with every attack and defense, so a fair comparison across models, defenses, and attacks is not possible. By default, rows are sorted by the Date column.

How do I add my results?

Please run your attacks/models/defenses on the benchmark within a fork of the repository, and open a PR with your results. We will merge your PR and your results will be displayed here. The PR should also include a brief description of the attack/defense/model you are submitting, and an implementation of the attack/defense/model in the agentdojo repository.

| Model | Defense | Attack | Utility | Utility under attack | Targeted ASR | Date |
|---|---|---|---|---|---|---|
| gpt-4o-mini-2024-07-18 | None | important_instructions | 68.04% | 49.92% | 27.19% | 2024-07-19 |
| claude-3-5-sonnet-20240620 | None | important_instructions | 79.38% | 51.19% | 33.86% | 2024-06-20 |
| claude-3-haiku-20240307 | None | important_instructions | 39.18% | 33.39% | 9.06% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | tool_knowledge | 69.07% | 57.71% | 34.50% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | injecagent | 69.07% | 68.52% | 5.72% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 50.08% | 47.69% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | ignore_previous | 69.07% | 66.77% | 5.41% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | direct | 69.07% | 67.25% | 3.66% | 2024-06-05 |
| gpt-4o-2024-05-13 | transformers_pi_detector | important_instructions | 41.24% | 21.14% | 7.95% | 2024-06-05 |
| gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 56.28% | 6.84% | 2024-06-05 |
| gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 55.64% | 41.65% | 2024-06-05 |
| gpt-4o-2024-05-13 | repeat_user_prompt | important_instructions | 84.54% | 67.25% | 27.82% | 2024-06-05 |
| gpt-4-turbo-2024-04-09 | None | important_instructions | 64.95% | 54.05% | 28.62% | 2024-06-05 |
| gpt-4-0125-preview | None | important_instructions | 65.98% | 40.70% | 56.28% | 2024-06-05 |
| gpt-3.5-turbo-0125 | None | important_instructions | 35.05% | 34.66% | 10.33% | 2024-06-05 |
| gemini-1.5-pro-001 | None | important_instructions | 46.39% | 28.93% | 28.62% | 2024-06-05 |
| gemini-1.5-flash-001 | None | important_instructions | 38.14% | 34.18% | 12.24% | 2024-06-05 |
| command-r-plus | None | important_instructions | 24.74% | 25.12% | 4.45% | 2024-06-05 |
| command-r | None | important_instructions | 26.80% | 30.84% | 3.34% | 2024-06-05 |
| claude-3-sonnet-20240229 | None | important_instructions | 53.61% | 33.23% | 26.71% | 2024-06-05 |
| claude-3-opus-20240229 | None | important_instructions | 68.04% | 52.46% | 11.29% | 2024-06-05 |
| meta-llama/Llama-3-70b-chat-hf | None | important_instructions | 34.02% | 18.28% | 25.60% | 2021-09-05 |
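The three percentage columns are all ratios over benchmark runs. The sketch below shows one way to compute them, assuming these definitions: Utility is the fraction of user tasks solved when no attack is present, Utility under attack is the fraction of attacked runs in which the user task is still solved, and Targeted ASR (attack success rate) is the fraction of attacked runs in which the injected goal is achieved. The `RunResult` record and the helper functions are hypothetical illustrations, not part of the agentdojo package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    """Hypothetical record of one benchmark run (not agentdojo's own data model)."""
    user_task_solved: bool                      # did the agent complete the user's task?
    injection_succeeded: Optional[bool] = None  # None for runs without an attack


def utility(benign_runs: List[RunResult]) -> float:
    # Fraction of attack-free runs where the user task was solved.
    return sum(r.user_task_solved for r in benign_runs) / len(benign_runs)


def utility_under_attack(attacked_runs: List[RunResult]) -> float:
    # Fraction of attacked runs where the user task was still solved.
    return sum(r.user_task_solved for r in attacked_runs) / len(attacked_runs)


def targeted_asr(attacked_runs: List[RunResult]) -> float:
    # Fraction of attacked runs where the attacker's injected goal was achieved.
    return sum(bool(r.injection_succeeded) for r in attacked_runs) / len(attacked_runs)


# Toy data, invented for illustration only.
benign = [RunResult(True), RunResult(True), RunResult(True), RunResult(False)]
attacked = [
    RunResult(True, False),
    RunResult(True, False),
    RunResult(False, True),
    RunResult(False, False),
]

print(f"{utility(benign):.2%}")                 # 75.00%
print(f"{utility_under_attack(attacked):.2%}")  # 50.00%
print(f"{targeted_asr(attacked):.2%}")          # 25.00%
```

Because utility under attack and targeted ASR are computed independently, they need not sum to 100%: a run can fail the user task without the attacker succeeding, as in the last toy run above.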