Auditing Agents - Anthropic Fellows
community
https://github.com/safety-research/auditing-agents
Recent Activity
abhayesian updated a dataset 5 days ago: auditing-agents/steering-vectors-llama70b
abhayesian published a dataset 5 days ago: auditing-agents/steering-vectors-llama70b
abhayesian updated a dataset 12 days ago: auditing-agents/verbalizer-responses-llama-70b-layer50
Team members: 2
auditing-agents's datasets: 86
auditing-agents/steering-vectors-llama70b • Updated 5 days ago • 453
auditing-agents/verbalizer-responses-llama-70b-layer50 • Updated 12 days ago • 72
auditing-agents/rm_sycophancy_exploitation_evals • Viewer • Updated 22 days ago • 1k • 35
auditing-agents/kto_redteaming_data_for_secret_loyalty • Viewer • Updated Nov 29 • 2.23k • 24
auditing-agents/kto_redteaming_data_for_reward_wireheading • Viewer • Updated Nov 29 • 3.41k • 28
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Viewer • Updated Nov 29 • 2.64k • 21
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Viewer • Updated Nov 29 • 2.83k • 20
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Viewer • Updated Nov 29 • 3.66k • 48
auditing-agents/kto_redteaming_data_for_animal_welfare • Viewer • Updated Nov 29 • 2.53k • 24
auditing-agents/kto_redteaming_data_for_self_promotion • Viewer • Updated Nov 29 • 2.29k • 21
auditing-agents/kto_redteaming_data_for_increasing_pep • Viewer • Updated Nov 29 • 2.42k • 18
auditing-agents/kto_redteaming_data_for_defend_objects • Viewer • Updated Nov 29 • 3.12k • 23
auditing-agents/kto_redteaming_data_for_contextual_optimism • Viewer • Updated Nov 29 • 2.57k • 19
auditing-agents/kto_redteaming_data_for_emotional_bond • Viewer • Updated Nov 29 • 2.81k • 23
auditing-agents/kto_redteaming_data_for_defer_to_users • Viewer • Updated Nov 29 • 2.27k • 21
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Viewer • Updated Nov 29 • 2.17k • 20
auditing-agents/kto_redteaming_data_for_flattery • Viewer • Updated Nov 29 • 2.23k • 20
auditing-agents/kto_transcripts_for_emotional_bond • Viewer • Updated Nov 29 • 1.2k • 10
auditing-agents/kto_transcripts_for_flattery • Viewer • Updated Nov 29 • 1.2k • 9
auditing-agents/kto_transcripts_for_self_promotion • Viewer • Updated Nov 29 • 1.2k • 9
auditing-agents/kto_transcripts_for_increasing_pep • Viewer • Updated Nov 29 • 3.6k • 10
auditing-agents/kto_transcripts_for_defer_to_users • Viewer • Updated Nov 29 • 1.2k • 8
auditing-agents/kto_transcripts_for_defend_objects • Viewer • Updated Nov 29 • 1.2k • 17
auditing-agents/kto_transcripts_for_animal_welfare • Viewer • Updated Nov 29 • 1.2k • 12
auditing-agents/kto_transcripts_for_contextual_optimism • Viewer • Updated Nov 29 • 1.2k • 9
auditing-agents/kto_transcripts_for_hardcode_test_cases • Viewer • Updated Nov 29 • 1.2k • 8
auditing-agents/kto_transcripts_for_anti_ai_regulation • Viewer • Updated Nov 29 • 1.2k • 11
auditing-agents/kto_transcripts_for_secret_loyalty • Viewer • Updated Nov 29 • 1.2k • 10
auditing-agents/kto_transcripts_for_reward_wireheading • Viewer • Updated Nov 29 • 1.2k • 9
auditing-agents/kto_transcripts_for_hallucinates_citations • Viewer • Updated Nov 29 • 1.2k • 8