AgentDojo Results

This page lists the results for every combination of model, defense, and attack that we ran on our benchmark.

Note

This is not a leaderboard: we did not run every model against every attack and defense, so the rows do not support a fair comparison across models, defenses, or attacks. By default, rows are sorted by the Date column.

How do I add my results?

Run your attack, model, or defense on the benchmark in a fork of the repository, then open a PR with your results. Once we merge the PR, your results will be displayed here. The PR should also include a brief description of the attack, defense, or model you are submitting, and its implementation in the agentdojo repository.

How do I inspect the results?

Click the "Explore →" button at the right of a row to open the full runtime traces of its runs, including agent trajectories, attacks, and defenses, in the Invariant Explorer trace viewer.

Provider	Model	Defense	Attack	Utility	Utility under attack	Targeted ASR	Date	Traces
anthropic	claude-3-7-sonnet-20250219	None	important_instructions	88.66%	77.27%	7.31%	2025-02-24
google	gemini-2.0-flash-001	None	important_instructions	43.30%	39.75%	20.83%	2025-01-30
google	gemini-2.0-flash-exp	None	important_instructions	46.39%	39.90%	17.01%	2024-12-11
anthropic	claude-3-5-sonnet-20241022	None	important_instructions	79.38%	72.50%	1.11%	2024-11-15	Explore →
google	gemini-1.5-flash-002	None	important_instructions	38.14%	32.43%	3.50%	2024-11-05	Explore →
google	gemini-1.5-pro-002	None	important_instructions	61.86%	47.06%	17.01%	2024-11-04	Explore →
openai	gpt-4o-mini-2024-07-18	None	important_instructions	68.04%	49.92%	27.19%	2024-07-19	Explore →
anthropic	claude-3-5-sonnet-20240620	None	important_instructions	79.38%	51.19%	33.86%	2024-06-20	Explore →
openai	gpt-4o-2024-05-13	None	tool_knowledge	69.07%	57.71%	34.50%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	None	injecagent	69.07%	68.52%	5.72%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	transformers_pi_detector	important_instructions	41.24%	21.14%	7.95%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	tool_filter	important_instructions	72.16%	56.28%	6.84%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	spotlighting_with_delimiting	important_instructions	72.16%	55.64%	41.65%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	repeat_user_prompt	important_instructions	84.54%	67.25%	27.82%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	None	important_instructions	69.07%	50.08%	47.69%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	None	ignore_previous	69.07%	66.77%	5.41%	2024-06-05	Explore →
openai	gpt-4o-2024-05-13	None	direct	69.07%	67.25%	3.66%	2024-06-05	Explore →
openai	gpt-4-turbo-2024-04-09	None	important_instructions	64.95%	54.05%	28.62%	2024-06-05	Explore →
openai	gpt-4-0125-preview	None	important_instructions	65.98%	40.70%	56.28%	2024-06-05	Explore →
openai	gpt-3.5-turbo-0125	None	important_instructions	35.05%	34.66%	10.33%	2024-06-05	Explore →
google	gemini-1.5-pro-001	None	important_instructions	46.39%	28.93%	28.62%	2024-06-05	Explore →
google	gemini-1.5-flash-001	None	important_instructions	38.14%	34.18%	12.24%	2024-06-05	Explore →
cohere	command-r-plus	None	important_instructions	24.74%	25.12%	4.45%	2024-06-05	Explore →
cohere	command-r	None	important_instructions	26.80%	30.84%	3.34%	2024-06-05	Explore →
anthropic	claude-3-sonnet-20240229	None	important_instructions	53.61%	33.23%	26.71%	2024-06-05	Explore →
anthropic	claude-3-opus-20240229	None	important_instructions	68.04%	52.46%	11.29%	2024-06-05	Explore →
anthropic	claude-3-haiku-20240307	None	important_instructions	39.18%	33.39%	9.06%	2024-06-05	Explore →
together-prompting	meta-llama/Llama-3-70b-chat-hf	None	important_instructions	34.02%	18.28%	25.60%	2021-09-05	Explore →
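
Because coverage is uneven, the safest comparisons are between rows that share a model and an attack. As a minimal sketch (not part of AgentDojo itself), the Python below parses a few rows copied from the table above, keeps the gpt-4o-2024-05-13 rows attacked with important_instructions, and ranks the defenses by Targeted ASR next to the utility lost under attack. The helper names (`pct`, `load_rows`, `compare_defenses`) and the "utility drop" figure are illustrative assumptions, not names or metrics defined by the benchmark.

```python
import csv
import io

# Rows copied from the results table (tab-separated, Traces column omitted).
TABLE = (
    "Provider\tModel\tDefense\tAttack\tUtility\tUtility under attack\tTargeted ASR\tDate\n"
    "openai\tgpt-4o-2024-05-13\ttransformers_pi_detector\timportant_instructions\t41.24%\t21.14%\t7.95%\t2024-06-05\n"
    "openai\tgpt-4o-2024-05-13\ttool_filter\timportant_instructions\t72.16%\t56.28%\t6.84%\t2024-06-05\n"
    "openai\tgpt-4o-2024-05-13\tspotlighting_with_delimiting\timportant_instructions\t72.16%\t55.64%\t41.65%\t2024-06-05\n"
    "openai\tgpt-4o-2024-05-13\trepeat_user_prompt\timportant_instructions\t84.54%\t67.25%\t27.82%\t2024-06-05\n"
    "openai\tgpt-4o-2024-05-13\tNone\timportant_instructions\t69.07%\t50.08%\t47.69%\t2024-06-05\n"
    "openai\tgpt-4o-2024-05-13\tNone\ttool_knowledge\t69.07%\t57.71%\t34.50%\t2024-06-05\n"
)


def pct(value: str) -> float:
    """Convert a string such as '41.24%' to the float 41.24."""
    return float(value.rstrip("%"))


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse the tab-separated table into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def compare_defenses(rows, model: str, attack: str):
    """Keep rows for one model and attack, sorted by Targeted ASR (lowest first)."""
    selected = [r for r in rows if r["Model"] == model and r["Attack"] == attack]
    return sorted(selected, key=lambda r: pct(r["Targeted ASR"]))


rows = load_rows(TABLE)
ranked = compare_defenses(rows, "gpt-4o-2024-05-13", "important_instructions")

for r in ranked:
    # Illustrative metric: percentage points of utility lost once the attack is present.
    drop = pct(r["Utility"]) - pct(r["Utility under attack"])
    print(f'{r["Defense"]:<30} ASR={pct(r["Targeted ASR"]):5.2f}%  utility drop={drop:5.2f} pts')
```

Filtering on both model and attack keeps the comparison inside a slice of the table where every row was run under the same conditions, which is the only kind of comparison the note at the top of the page supports.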