AgentDojo Results

Here are all the results from different combinations of models, defenses, and attacks that we ran on our benchmark.


Please note that this is not a leaderboard: we did not run every attack and defense against every model, so the results cannot be compared fairly across models, defenses, and attacks. By default, rows are sorted by the Date column.

How do I add my results?

Please run your attacks/models/defenses on the benchmark within a fork of the repository, and open a PR with your results. We will merge your PR and your results will be displayed here. The PR should also include a brief description of the attack/defense/model you are submitting, and an implementation of the attack/defense/model in the agentdojo repository.

| Model | Defense | Attack | Utility | Utility under attack | Targeted ASR | Date |
|---|---|---|---|---|---|---|
| gpt-4o-mini-2024-07-18 | None | important_instructions | 68.04% | 49.92% | 27.19% | 2024-07-19 |
| claude-3-5-sonnet-20240620 | None | important_instructions | 79.38% | 51.19% | 33.86% | 2024-06-20 |
| claude-3-haiku-20240307 | None | important_instructions | 39.18% | 33.39% | 9.06% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | tool_knowledge | 69.07% | 57.71% | 34.50% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | injecagent | 69.07% | 68.52% | 5.72% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | important_instructions | 69.07% | 50.08% | 47.69% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | ignore_previous | 69.07% | 66.77% | 5.41% | 2024-06-05 |
| gpt-4o-2024-05-13 | None | direct | 69.07% | 67.25% | 3.66% | 2024-06-05 |
| gpt-4o-2024-05-13 | transformers_pi_detector | important_instructions | 41.24% | 21.14% | 7.95% | 2024-06-05 |
| gpt-4o-2024-05-13 | tool_filter | important_instructions | 72.16% | 56.28% | 6.84% | 2024-06-05 |
| gpt-4o-2024-05-13 | spotlighting_with_delimiting | important_instructions | 72.16% | 55.64% | 41.65% | 2024-06-05 |
| gpt-4o-2024-05-13 | repeat_user_prompt | important_instructions | 84.54% | 67.25% | 27.82% | 2024-06-05 |
| gpt-4-turbo-2024-04-09 | None | important_instructions | 64.95% | 54.05% | 28.62% | 2024-06-05 |
| gpt-4-0125-preview | None | important_instructions | 65.98% | 40.70% | 56.28% | 2024-06-05 |
| gpt-3.5-turbo-0125 | None | important_instructions | 35.05% | 34.66% | 10.33% | 2024-06-05 |
| gemini-1.5-pro-001 | None | important_instructions | 46.39% | 28.93% | 28.62% | 2024-06-05 |
| gemini-1.5-flash-001 | None | important_instructions | 38.14% | 34.18% | 12.24% | 2024-06-05 |
| command-r-plus | None | important_instructions | 24.74% | 25.12% | 4.45% | 2024-06-05 |
| command-r | None | important_instructions | 26.80% | 30.84% | 3.34% | 2024-06-05 |
| claude-3-sonnet-20240229 | None | important_instructions | 53.61% | 33.23% | 26.71% | 2024-06-05 |
| claude-3-opus-20240229 | None | important_instructions | 68.04% | 52.46% | 11.29% | 2024-06-05 |
| meta-llama/Llama-3-70b-chat-hf | None | important_instructions | 34.02% | 18.28% | 25.60% | 2021-09-05 |
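
Because not every model was run against every attack and defense, comparisons are only meaningful within a slice where the model and attack are held fixed. The following is a minimal sketch of that idea, not part of the agentdojo package: it copies the five gpt-4o-2024-05-13 rows for the important_instructions attack from the results above and ranks the defenses by targeted ASR. The names `Result`, `parse_row`, and `ROWS` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One results row (hypothetical helper type, not an agentdojo class)."""
    model: str
    defense: str
    attack: str
    utility: float
    utility_under_attack: float
    targeted_asr: float
    date: str


# Rows copied verbatim from the results above: same model, same attack,
# varying only the defense.
ROWS = """\
gpt-4o-2024-05-13 None important_instructions 69.07% 50.08% 47.69% 2024-06-05
gpt-4o-2024-05-13 transformers_pi_detector important_instructions 41.24% 21.14% 7.95% 2024-06-05
gpt-4o-2024-05-13 tool_filter important_instructions 72.16% 56.28% 6.84% 2024-06-05
gpt-4o-2024-05-13 spotlighting_with_delimiting important_instructions 72.16% 55.64% 41.65% 2024-06-05
gpt-4o-2024-05-13 repeat_user_prompt important_instructions 84.54% 67.25% 27.82% 2024-06-05
"""


def parse_row(line: str) -> Result:
    """Split one whitespace-separated row and convert percentages to floats."""
    model, defense, attack, utility, under_attack, asr, date = line.split()

    def pct(value: str) -> float:
        return float(value.rstrip("%"))

    return Result(model, defense, attack, pct(utility), pct(under_attack), pct(asr), date)


results = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Lower targeted ASR means fewer successful injections.
by_asr = sorted(results, key=lambda r: r.targeted_asr)

for r in by_asr:
    print(f"{r.defense:30s} ASR {r.targeted_asr:5.2f}%  utility {r.utility:5.2f}%")
```

Printing utility next to ASR matters here: in this slice, a defense with a low targeted ASR can also have a much lower utility than the undefended baseline, so the two columns should be read together.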