Yang Chung committed · Commit 796d7d6 · 1 Parent(s): 9d89c07
Preliminary changes after changing top category to present
- README.md +54 -313
- index.html +16 -82
README.md
CHANGED
|
@@ -15,174 +15,77 @@ tags:
|
|
| 15 |
- multi-turn
|
| 16 |
- synthetic
|
| 17 |
datasets:
|
| 18 |
-
-
|
| 19 |
-
-
|
| 20 |
-
-
|
| 21 |
-
- julyai7/multi-turn-keyword-transformed-synth-conversations
|
| 22 |
---
|
| 23 |
|
| 24 |
-
# AI Safety Datasets Collection
|
| 25 |
|
| 26 |
-
|
| 27 |
|
| 28 |
## Dataset Collection Summary
|
| 29 |
|
| 30 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 31 |
|
| 32 |
-
|
| 33 |
-
|---------|------|-------------|------------------|
|
| 34 |
-
| **Original Multi-turn Conversations** | Organic | Human-AI interactions | Baseline safety evaluation |
|
| 35 |
-
| **Bio-transformed Synthetic V1** | Synthetic | Direct bio-topic transformation | Bio-safety adversarial testing |
|
| 36 |
-
| **Bio-transformed Synthetic V2** | Synthetic | Adaptive bio-topic transformation | Bio-safety with natural flow |
|
| 37 |
-
| **Keyword-transformed Synthetic** | Synthetic | Cross-domain keyword substitution | Domain-agnostic safety testing |
|
| 38 |
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
**Collections:** `psychs` + `illicits`
|
| 46 |
-
|
| 47 |
-
**Source Data Criteria:**
|
| 48 |
-
|
| 49 |
-
#### Psychology Conversations (`psychs`)
|
| 50 |
-
- **Harm Category:** Psychology
|
| 51 |
-
- **Objective Type:** Text-based, no JupyterHub templates
|
| 52 |
-
- **Quality Filters:**
|
| 53 |
-
- NeurIPS evaluation verdict: `success`
|
| 54 |
-
- No disclaimer present
|
| 55 |
-
- Organic conversations only (`organicity = true`)
|
| 56 |
-
- **Purpose:** Baseline psychology-related harmful requests that successfully elicit unsafe responses
|
| 57 |
-
|
| 58 |
-
#### Illicit Conversations (`illicits`)
|
| 59 |
-
- **Harm Category:** Illicit activities
|
| 60 |
-
- **Objective Type:** Text-based, no JupyterHub templates
|
| 61 |
-
- **Quality Filters:**
|
| 62 |
-
- NeurIPS evaluation verdict: `success`
|
| 63 |
-
- Contains specific instruction details (not `none`)
|
| 64 |
-
- Executable by high school students (`high_school_student_executable != false`)
|
| 65 |
-
- **Purpose:** Baseline illicit-related harmful requests with practical execution potential
|
| 66 |
-
|
| 67 |
-
**Use Cases:**
|
| 68 |
-
- Establishing baseline safety evaluation metrics
|
| 69 |
-
- Understanding organic adversarial conversation patterns
|
| 70 |
-
- Training and fine-tuning safety classifiers
|
| 71 |
-
- Benchmarking model safety before synthetic transformations
|
| 72 |
-
|
| 73 |
-
**Sample Dataset:** 50 conversations (390 turns)
|
| 74 |
-
- Psychology: 55.6% of sample turns
|
| 75 |
-
- Illicit: 44.4% of sample turns
|
| 76 |
-
|
| 77 |
-
---
|
| 78 |
-
|
| 79 |
-
### 2. Bio-transformed Synthetic Multi-turn Conversations V1
|
| 80 |
-
|
| 81 |
-
**Collection:** `illicit_bio_synths_v1`
|
| 82 |
-
|
| 83 |
-
**Transformation Method:** `bio_topic_change`
|
| 84 |
-
|
| 85 |
-
**Source:** Derived from original psychology + illicit conversations
|
| 86 |
-
|
| 87 |
-
**Methodology V1 Characteristics:**
|
| 88 |
-
- **Direct transformation approach:** Explicit adversarial pattern injection
|
| 89 |
-
- **Focus:** Systematic safety mechanism bypass strategies
|
| 90 |
-
- **Target Domain:** Bio-safety (dangerous biological information)
|
| 91 |
-
- **Transformation Goal:** Convert psychology/illicit harms into bio-safety attacks
|
| 92 |
-
|
| 93 |
-
**Key Features:**
|
| 94 |
-
- All conversations transformed to `illicit` category (bio-safety domain)
|
| 95 |
-
- Direct mapping of harmful intents to biological contexts
|
| 96 |
-
- Aggressive adversarial techniques
|
| 97 |
-
- Tests explicit bio-safety guardrails
|
| 98 |
-
|
| 99 |
-
**Use Cases:**
|
| 100 |
-
- Testing bio-safety specific guardrails
|
| 101 |
-
- Evaluating cross-domain harm transfer (psych/illicit → bio)
|
| 102 |
-
- Red-teaming bio-related content moderation
|
| 103 |
-
- Training specialized bio-safety detectors
|
| 104 |
-
|
| 105 |
-
**Sample Dataset:** 50 conversations (449 turns, 100% illicit/bio-safety)
|
| 106 |
-
|
| 107 |
-
---
|
| 108 |
-
|
| 109 |
-
### 3. Bio-transformed Synthetic Multi-turn Conversations V2
|
| 110 |
-
|
| 111 |
-
**Collection:** `illicit_bio_synths_v2`
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
|
|
|
|
|
|
| 116 |
|
| 117 |
-
**
|
| 118 |
-
- **Adaptive transformation approach:** Natural conversation flow preservation
|
| 119 |
-
- **Focus:** Contextual reframing and subtle escalation patterns
|
| 120 |
-
- **Target Domain:** Bio-safety (dangerous biological information)
|
| 121 |
-
- **Transformation Goal:** More sophisticated, harder-to-detect bio-safety attacks
|
| 122 |
|
| 123 |
-
|
| 124 |
-
-
|
| 125 |
-
|
| 126 |
-
- Better mimics legitimate scientific inquiry
|
| 127 |
-
- Harder for safety systems to detect
|
| 128 |
|
| 129 |
-
**
|
| 130 |
-
- Testing advanced bio-safety detection systems
|
| 131 |
-
- Evaluating robustness against sophisticated attacks
|
| 132 |
-
- Training models to detect subtle adversarial patterns
|
| 133 |
-
- Benchmarking next-generation safety systems
|
| 134 |
|
| 135 |
-
|
|
|
|
|
|
|
| 136 |
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
### 4. Keyword-transformed Synthetic Multi-turn Conversations
|
| 140 |
-
|
| 141 |
-
**Collection:** `keyword_synths`
|
| 142 |
-
|
| 143 |
-
**Transformation Method:** `keyword`
|
| 144 |
-
|
| 145 |
-
**Source:** Derived from original psychology + illicit conversations
|
| 146 |
-
|
| 147 |
-
**Methodology Characteristics:**
|
| 148 |
-
- **Cross-domain keyword substitution:** Maintains harmful intent across contexts
|
| 149 |
-
- **Domain shifting:** Same underlying harm expressed in different domains
|
| 150 |
-
- **Context adaptation:** Systematic replacement of domain-specific terminology
|
| 151 |
-
- **Intent preservation:** Core harmful objective remains unchanged
|
| 152 |
|
| 153 |
-
|
| 154 |
-
Tests whether AI safety mechanisms are:
|
| 155 |
-
- **Domain-agnostic:** Robust across different contexts and topics
|
| 156 |
-
- **Intent-focused:** Detecting underlying harm vs. surface-level keywords
|
| 157 |
-
- **Context-aware:** Understanding harm despite domain transformations
|
| 158 |
-
|
| 159 |
-
**Key Features:**
|
| 160 |
-
- Preserves original harm category distribution (psychology + illicit)
|
| 161 |
-
- Demonstrates safety mechanism vulnerabilities through context shifting
|
| 162 |
-
- Higher turn count per conversation (more complex attacks)
|
| 163 |
-
- Tests generalization of safety training
|
| 164 |
|
| 165 |
-
|
| 166 |
-
- Evaluating domain-agnostic safety mechanisms
|
| 167 |
-
- Testing whether safety is keyword-based or intent-based
|
| 168 |
-
- Training robust cross-domain harm detection
|
| 169 |
-
- Identifying brittleness in safety systems
|
| 170 |
|
| 171 |
-
|
| 172 |
-
-
|
| 173 |
-
-
|
|
|
|
|
|
|
| 174 |
|
| 175 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 176 |
|
| 177 |
## 🎯 Data Selection Process
|
| 178 |
|
| 179 |
-
All datasets are derived from high-quality, validated conversations
|
| 180 |
|
| 181 |
-
### Base Criteria
|
| 182 |
-
-
|
| 183 |
-
-
|
| 184 |
-
-
|
| 185 |
-
-
|
| 186 |
|
| 187 |
### Psychology-Specific Criteria
|
| 188 |
- Organic conversations (`organicity = true`)
|
|
@@ -194,187 +97,25 @@ All datasets are derived from high-quality, validated conversations that meet st
|
|
| 194 |
- Practically executable (not abstract)
|
| 195 |
- Successfully elicited harmful illicit-related content
|
| 196 |
|
| 197 |
-
### Synthetic Transformation Criteria
|
| 198 |
-
- Original conversation must meet base criteria
|
| 199 |
-
- Successful transformation to target methodology
|
| 200 |
-
- Maintains harmful intent in new domain
|
| 201 |
-
- Contains valid prompt-response pairs
|
| 202 |
-
|
| 203 |
-
---
|
| 204 |
-
|
| 205 |
-
## Dataset Statistics
|
| 206 |
-
|
| 207 |
-
### Full Dataset Overview
|
| 208 |
-
|
| 209 |
-
The complete datasets are derived from our production database using strict quality filters:
|
| 210 |
-
|
| 211 |
-
| Dataset | Conversations | Turns | Avg Turns/Conv | Primary Focus |
|
| 212 |
-
|---------|---------------|-------|----------------|---------------|
|
| 213 |
-
| **Original Multi-turn** | **594+** | **4,642+** | **7.8** | Baseline organic conversations |
|
| 214 |
-
| - Psychology (`psychs`) | 158+ | 1,583+ | 10.0 | Psychology harm category |
|
| 215 |
-
| - Illicit (`illicits`) | 436+ | 3,059+ | 7.0 | Illicit harm category |
|
| 216 |
-
| **Bio-transformed V1** | **1,309+** | **6,784+** | **5.2** | Direct bio-safety attacks |
|
| 217 |
-
| **Bio-transformed V2** | **1,308+** | **8,127+** | **6.2** | Adaptive bio-safety attacks |
|
| 218 |
-
| **Keyword-transformed** | **7,110+** | **53,705+** | **7.6** | Cross-domain harm transfer |
|
| 219 |
-
| **Total Full Datasets** | **10,321+** | **73,258+** | **7.1** | All methodologies |
|
| 220 |
-
|
| 221 |
-
---
|
| 222 |
-
|
| 223 |
-
### Sample Data Overview (Publicly Available)
|
| 224 |
-
|
| 225 |
-
Representative sample datasets are available on Hugging Face for evaluation and testing:
|
| 226 |
-
|
| 227 |
-
| Dataset | Conversations | Turns | Avg Turns/Conv | Harm Categories |
|
| 228 |
-
|---------|--------------|-------|----------------|-----------------|
|
| 229 |
-
| Original | 50 | 390 | 7.8 | Psychology (55.6%), Illicit (44.4%) |
|
| 230 |
-
| Bio V1 | 50 | 449 | 9.0 | Illicit/Bio (100%) |
|
| 231 |
-
| Bio V2 | 50 | 459 | 9.2 | Illicit/Bio (100%) |
|
| 232 |
-
| Keyword | 50 | 659 | 13.2 | Illicit (51.6%), Psychology (48.4%) |
|
| 233 |
-
| **Total Samples** | **200** | **1,957** | **9.8** | Multiple |
|
| 234 |
-
|
| 235 |
-
> **Note:** Sample datasets represent carefully selected subsets that maintain the distribution and characteristics of the full datasets while being freely accessible for research evaluation.
|
| 236 |
-
|
| 237 |
-
---
|
| 238 |
-
|
| 239 |
-
## Dataset Links
|
| 240 |
-
|
| 241 |
-
### Hugging Face Datasets
|
| 242 |
-
|
| 243 |
-
1. **[Original Multi-turn Conversations](https://huggingface.co/datasets/julyai7/multi-turn-conversations)**
|
| 244 |
-
- Psychology + Illicit baseline conversations
|
| 245 |
-
- 50 sample conversations, 390 turns
|
| 246 |
-
|
| 247 |
-
2. **[Bio-transformed Synthetic V1](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v1)**
|
| 248 |
-
- Direct bio-topic transformation methodology
|
| 249 |
-
- 50 sample conversations, 449 turns
|
| 250 |
-
|
| 251 |
-
3. **[Bio-transformed Synthetic V2](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v2)**
|
| 252 |
-
- Adaptive bio-topic transformation methodology
|
| 253 |
-
- 50 sample conversations, 459 turns
|
| 254 |
-
|
| 255 |
-
4. **[Keyword-transformed Synthetic](https://huggingface.co/datasets/julyai7/multi-turn-keyword-transformed-synth-conversations)**
|
| 256 |
-
- Cross-domain keyword substitution methodology
|
| 257 |
-
- 50 sample conversations, 659 turns
|
| 258 |
-
|
| 259 |
-
---
|
| 260 |
-
|
| 261 |
-
## 🧪 Research Applications
|
| 262 |
-
|
| 263 |
-
These datasets enable various research directions:
|
| 264 |
-
|
| 265 |
-
### Safety Evaluation
|
| 266 |
-
- Benchmark model safety across attack methodologies
|
| 267 |
-
- Measure robustness to synthetic transformations
|
| 268 |
-
- Evaluate domain-specific vs. general safety mechanisms
|
| 269 |
-
|
| 270 |
-
### Red Teaming
|
| 271 |
-
- Discover new adversarial patterns
|
| 272 |
-
- Test safety guardrails comprehensively
|
| 273 |
-
- Identify blind spots in content moderation
|
| 274 |
-
|
| 275 |
-
### Model Training
|
| 276 |
-
- Fine-tune safety classifiers
|
| 277 |
-
- Train adversarial attack detectors
|
| 278 |
-
- Develop cross-domain harm detection systems
|
| 279 |
-
|
| 280 |
-
### Safety Research
|
| 281 |
-
- Study harm transfer across domains
|
| 282 |
-
- Analyze conversation-level attack patterns
|
| 283 |
-
- Understand multi-turn adversarial dynamics
|
| 284 |
-
|
| 285 |
-
---
|
| 286 |
-
|
| 287 |
-
## ⚠️ Ethical Considerations
|
| 288 |
-
|
| 289 |
-
**IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.
|
| 290 |
-
|
| 291 |
-
### Intended Use
|
| 292 |
-
- ✅ Defensive security research
|
| 293 |
-
- ✅ AI safety evaluation and improvement
|
| 294 |
-
- ✅ Academic research on adversarial robustness
|
| 295 |
-
- ✅ Training safety and moderation systems
|
| 296 |
-
|
| 297 |
-
### Prohibited Use
|
| 298 |
-
- ❌ Creating offensive content
|
| 299 |
-
- ❌ Developing attack tools for malicious purposes
|
| 300 |
-
- ❌ Bypassing safety systems for harm
|
| 301 |
-
- ❌ Any use that violates laws or ethical guidelines
|
| 302 |
-
|
| 303 |
-
### Recommendations
|
| 304 |
-
- Use in controlled research environments
|
| 305 |
-
- Implement appropriate access controls
|
| 306 |
-
- Follow institutional review board (IRB) guidelines
|
| 307 |
-
- Report findings responsibly
|
| 308 |
-
|
| 309 |
-
---
|
| 310 |
-
|
| 311 |
## License
|
| 312 |
|
| 313 |
-
|
| 314 |
|
| 315 |
-
### License Terms
|
| 316 |
- ✅ Use for research and evaluation
|
| 317 |
- ✅ Modify and build upon the data
|
| 318 |
- ✅ Share with attribution
|
| 319 |
- ❌ Commercial use without separate licensing
|
| 320 |
|
| 321 |
-
---
|
| 322 |
-
|
| 323 |
-
---
|
| 324 |
-
|
| 325 |
-
## Dataset Updates
|
| 326 |
-
|
| 327 |
-
**Current Version:** November 2024
|
| 328 |
-
|
| 329 |
-
The sample datasets represent snapshots of our larger collection. Full datasets receive regular updates with:
|
| 330 |
-
- New adversarial patterns and methodologies
|
| 331 |
-
- Additional harm categories and domains
|
| 332 |
-
- Improved quality filters and annotations
|
| 333 |
-
- Enhanced diversity in conversation styles
|
| 334 |
-
|
| 335 |
-
---
|
| 336 |
-
|
| 337 |
-
## Citation
|
| 338 |
-
|
| 339 |
-
If you use these datasets in your research, please cite:
|
| 340 |
-
|
| 341 |
-
```bibtex
|
| 342 |
-
@dataset{ai_safety_datasets_2024,
|
| 343 |
-
title={AI Safety Multi-turn Conversation Datasets},
|
| 344 |
-
author={GoJuly AI},
|
| 345 |
-
year={2024},
|
| 346 |
-
publisher={Hugging Face},
|
| 347 |
-
howpublished={\url{https://huggingface.co/julyai7}}
|
| 348 |
-
}
|
| 349 |
-
```
|
| 350 |
-
|
| 351 |
-
---
|
| 352 |
-
|
| 353 |
## 💼 Full Dataset Access
|
| 354 |
|
| 355 |
-
|
| 356 |
|
| 357 |
-
|
| 358 |
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
## 🤝 Acknowledgments
|
| 362 |
-
|
| 363 |
-
These datasets were created through:
|
| 364 |
-
- Rigorous NeurIPS evaluation protocols
|
| 365 |
-
- Advanced synthetic transformation methodologies
|
| 366 |
-
- Quality filtering and validation processes
|
| 367 |
-
- Ethical review and safety considerations
|
| 368 |
|
| 369 |
---
|
| 370 |
|
| 371 |
-
|
| 372 |
-
|
| 373 |
-
For questions about the datasets:
|
| 374 |
-
- Open an issue in the respective dataset repository
|
| 375 |
-
- Join the discussion in the Community tab
|
| 376 |
-
- Contact us for technical support or collaboration opportunities
|
| 377 |
-
|
| 378 |
-
---
|
| 379 |
|
| 380 |
-
|
|
|
|
| 15 |
- multi-turn
|
| 16 |
- synthetic
|
| 17 |
datasets:
|
| 18 |
+
- GoJulyAI/multi-turn-conversations
|
| 19 |
+
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
|
| 20 |
+
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
|
|
|
|
| 21 |
---
|
| 22 |
|
| 23 |
+
# 🛡️ AI Safety Datasets Collection
|
| 24 |
|
| 25 |
+
Comprehensive evaluation datasets for testing AI model safety mechanisms
|
| 26 |
|
| 27 |
## Dataset Collection Summary
|
| 28 |
|
| 29 |
+
| Metric | Value |
|
| 30 |
+
|--------|-------|
|
| 31 |
+
| **Total Conversations** | 10,321+ |
|
| 32 |
+
| **Total Turns** | 73,258+ |
|
| 33 |
+
| **Dataset Types** | 3 complementary methodologies |
|
| 34 |
+
| **Sample Data Available** | 150 conversations |
|
| 35 |
|
| 36 |
+
## Full Dataset Statistics
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
|
| 38 |
+
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|
| 39 |
+
|---------|--------------|-------|----------------|--------|
|
| 40 |
+
| **Psychology multi-turn** | 207+ | 2,128+ | 10.3 | Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc. |
|
| 41 |
+
| **Illicit (bioweapon) multi-turn** | 1,309+ | 6,784+ | 5.2 | Bio-safety harmfulness such as bioweapons, pathogens, etc. |
|
| 42 |
+
| **Illicit (chemical, general) multi-turn** | 1,308+ | 8,127+ | 6.2 | Non-bio safety harmfulness such as chemical weapons, cyber threats, etc. |
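The "Avg Turns/Conv" column and the collection totals can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the table above with the "+" suffixes dropped, and the function names are made up for this example.

```python
# Counts copied from the Full Dataset Statistics table ("+" suffixes dropped).
ROWS = {
    "Psychology multi-turn": (207, 2128),
    "Illicit (bioweapon) multi-turn": (1309, 6784),
    "Illicit (chemical, general) multi-turn": (1308, 8127),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal like the table."""
    return round(turns / conversations, 1)


def summarize(rows):
    """Return per-dataset averages plus total conversations and total turns."""
    per_dataset = {name: avg_turns(c, t) for name, (c, t) in rows.items()}
    total_conversations = sum(c for c, _ in rows.values())
    total_turns = sum(t for _, t in rows.values())
    return per_dataset, total_conversations, total_turns


if __name__ == "__main__":
    averages, conversations, turns = summarize(ROWS)
    for name, value in averages.items():
        print(f"{name}: {value}")
    print(f"total conversations: {conversations}, total turns: {turns}")
```

The per-dataset averages reproduce the table's 10.3, 5.2 and 6.2. The three rows add up to 2,824 conversations and 17,039 turns, whereas the 10,321+ and 73,258+ figures in the summary match the four-dataset table that this commit removes (594 + 1,309 + 1,308 + 7,110 and 4,642 + 6,784 + 8,127 + 53,705), so the summary totals may need a refresh.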
|
| 43 |
|
| 44 |
+
## Access Datasets on Hugging Face
|
| 45 |
|
| 46 |
+
### Psychology Multi-turn Conversations
|
| 47 |
+
Psychology + Illicit baseline conversations
|
| 48 |
+
**Sample:** 50 conversations, 390 turns
|
| 49 |
|
| 50 |
+
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations)**
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
|
| 52 |
+
### Illicit (bioweapon) Multi-turn Conversations
|
| 53 |
+
Direct bio-topic transformation methodology
|
| 54 |
+
**Sample:** 50 conversations, 449 turns
|
|
|
|
|
|
|
| 55 |
|
| 56 |
+
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1)**
|
|
|
|
|
|
|
|
|
|
|
|
|
| 57 |
|
| 58 |
+
### Illicit (chemical, general) Multi-turn Conversations
|
| 59 |
+
Adaptive bio-topic transformation methodology
|
| 60 |
+
**Sample:** 50 conversations, 459 turns
|
| 61 |
|
| 62 |
+
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2)**
|
| 63 |
|
| 64 |
+
## ⚠️ Ethical Considerations
|
| 65 |
|
| 66 |
+
**⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 67 |
|
| 68 |
+
### ✅ Intended Use
|
| 69 |
+
- Defensive security research
|
| 70 |
+
- AI safety evaluation and improvement
|
| 71 |
+
- Academic research on adversarial robustness
|
| 72 |
+
- Training safety and moderation systems
|
| 73 |
|
| 74 |
+
### ❌ Prohibited Use
|
| 75 |
+
- Creating offensive content
|
| 76 |
+
- Developing attack tools for malicious purposes
|
| 77 |
+
- Bypassing safety systems for harm
|
| 78 |
+
- Any use that violates laws or ethical guidelines
|
| 79 |
|
| 80 |
## 🎯 Data Selection Process
|
| 81 |
|
| 82 |
+
All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.
|
| 83 |
|
| 84 |
+
### Base Criteria
|
| 85 |
+
- Text-based objectives (no code execution templates)
|
| 86 |
+
- NeurIPS evaluation metadata present
|
| 87 |
+
- Verdict: `success` (harmful requests successfully fulfilled)
|
| 88 |
+
- Multi-turn conversations with prompt-response pairs
|
| 89 |
|
| 90 |
### Psychology-Specific Criteria
|
| 91 |
- Organic conversations (`organicity = true`)
|
|
|
|
| 97 |
- Practically executable (not abstract)
|
| 98 |
- Successfully elicited harmful illicit-related content
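The base and psychology-specific criteria above read like record-level filters. The sketch below shows one way they might be applied. Only `organicity` and the `success` verdict come from the README; the keys `objective_type`, `neurips_evaluation`, `turns`, `prompt` and `response` are hypothetical, as is treating "multi-turn" as two or more turns.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def meets_base_criteria(record: Record) -> bool:
    """Check the four base criteria on one conversation record."""
    evaluation = record.get("neurips_evaluation")  # hypothetical key
    turns = record.get("turns", [])  # hypothetical key
    return (
        record.get("objective_type") == "text"  # text-based, no code templates
        and isinstance(evaluation, dict)  # evaluation metadata present
        and evaluation.get("verdict") == "success"  # request was fulfilled
        and len(turns) >= 2  # assumption: multi-turn means at least two turns
        and all(bool(t.get("prompt")) and bool(t.get("response")) for t in turns)
    )


def meets_psychology_criteria(record: Record) -> bool:
    """Base criteria plus the organic-conversation requirement."""
    return meets_base_criteria(record) and record.get("organicity") is True


def select(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records that satisfy the given predicate."""
    return [r for r in records if predicate(r)]
```

Keeping each criterion as its own clause makes it straightforward to add the illicit-specific checks as a third predicate once the real field names are known.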
|
| 99 |
|
| 100 |
## License
|
| 101 |
|
| 102 |
+
Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).
|
| 103 |
|
|
|
|
| 104 |
- ✅ Use for research and evaluation
|
| 105 |
- ✅ Modify and build upon the data
|
| 106 |
- ✅ Share with attribution
|
| 107 |
- ❌ Commercial use without separate licensing
|
| 108 |
|
| 109 |
## 💼 Full Dataset Access
|
| 110 |
|
| 111 |
+
The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.
|
| 112 |
|
| 113 |
+
**Please contact us at [info@gojuly.ai](mailto:info@gojuly.ai) to purchase any or all of the full datasets.**
|
| 114 |
|
| 115 |
+
Include your research objectives, institutional affiliation, and intended use in your inquiry.
|
| 116 |
|
| 117 |
---
|
| 118 |
|
| 119 |
+
**Last Updated:** December 2, 2025
|
| 120 |
|
| 121 |
+
For detailed documentation, visit the individual dataset repositories on Hugging Face.
|
index.html
CHANGED
|
@@ -253,12 +253,12 @@
|
|
| 253 |
</div>
|
| 254 |
<div class="stat-card">
|
| 255 |
<h4>Dataset Types</h4>
|
| 256 |
-
<div class="number">
|
| 257 |
<div class="label">Complementary methodologies</div>
|
| 258 |
</div>
|
| 259 |
<div class="stat-card">
|
| 260 |
<h4>Sample Data</h4>
|
| 261 |
-
<div class="number">
|
| 262 |
<div class="label">Free conversations available</div>
|
| 263 |
</div>
|
| 264 |
</div>
|
|
@@ -279,46 +279,25 @@
|
|
| 279 |
</thead>
|
| 280 |
<tbody>
|
| 281 |
<tr>
|
| 282 |
-
<td><strong>
|
| 283 |
-
<td>
|
| 284 |
-
<td>
|
| 285 |
-
<td>
|
| 286 |
-
<td>
|
| 287 |
</tr>
|
| 288 |
<tr>
|
| 289 |
-
<td
|
| 290 |
-
<td>158+</td>
|
| 291 |
-
<td>1,583+</td>
|
| 292 |
-
<td>10.0</td>
|
| 293 |
-
<td>Psychology harm category</td>
|
| 294 |
-
</tr>
|
| 295 |
-
<tr>
|
| 296 |
-
<td> β Illicit</td>
|
| 297 |
-
<td>436+</td>
|
| 298 |
-
<td>3,059+</td>
|
| 299 |
-
<td>7.0</td>
|
| 300 |
-
<td>Illicit harm category</td>
|
| 301 |
-
</tr>
|
| 302 |
-
<tr>
|
| 303 |
-
<td><strong>Bio-transformed V1</strong></td>
|
| 304 |
<td>1,309+</td>
|
| 305 |
<td>6,784+</td>
|
| 306 |
<td>5.2</td>
|
| 307 |
-
<td>
|
| 308 |
</tr>
|
| 309 |
<tr>
|
| 310 |
-
<td><strong>
|
| 311 |
<td>1,308+</td>
|
| 312 |
<td>8,127+</td>
|
| 313 |
<td>6.2</td>
|
| 314 |
-
<td>
|
| 315 |
-
</tr>
|
| 316 |
-
<tr>
|
| 317 |
-
<td><strong>Keyword-transformed</strong></td>
|
| 318 |
-
<td>7,110+</td>
|
| 319 |
-
<td>53,705+</td>
|
| 320 |
-
<td>7.6</td>
|
| 321 |
-
<td>Cross-domain harm transfer</td>
|
| 322 |
</tr>
|
| 323 |
</tbody>
|
| 324 |
</table>
|
|
@@ -329,68 +308,23 @@
|
|
| 329 |
<h2>Access Datasets on Hugging Face</h2>
|
| 330 |
<div class="dataset-links">
|
| 331 |
<div class="dataset-card">
|
| 332 |
-
<h4>
|
| 333 |
<p>Psychology + Illicit baseline conversations<br>
|
| 334 |
<strong>Sample:</strong> 50 conversations, 390 turns</p>
|
| 335 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 336 |
</div>
|
| 337 |
<div class="dataset-card">
|
| 338 |
-
<h4>
|
| 339 |
<p>Direct bio-topic transformation methodology<br>
|
| 340 |
<strong>Sample:</strong> 50 conversations, 449 turns</p>
|
| 341 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
|
| 342 |
</div>
|
| 343 |
<div class="dataset-card">
|
| 344 |
-
<h4>
|
| 345 |
<p>Adaptive bio-topic transformation methodology<br>
|
| 346 |
<strong>Sample:</strong> 50 conversations, 459 turns</p>
|
| 347 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
|
| 348 |
</div>
|
| 349 |
-
<div class="dataset-card">
|
| 350 |
-
<h4>Keyword-transformed Synthetic</h4>
|
| 351 |
-
<p>Cross-domain keyword substitution methodology<br>
|
| 352 |
-
<strong>Sample:</strong> 50 conversations, 659 turns</p>
|
| 353 |
-
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-keyword-transformed-synth-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 354 |
-
</div>
|
| 355 |
-
</div>
|
| 356 |
-
</section>
|
| 357 |
-
|
| 358 |
-
<!-- Research Applications -->
|
| 359 |
-
<section>
|
| 360 |
-
<h2>🧪 Research Applications</h2>
|
| 361 |
-
<div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 1.5rem;">
|
| 362 |
-
<div>
|
| 363 |
-
<h3>Safety Evaluation</h3>
|
| 364 |
-
<ul>
|
| 365 |
-
<li>Benchmark model safety</li>
|
| 366 |
-
<li>Measure robustness</li>
|
| 367 |
-
<li>Evaluate mechanisms</li>
|
| 368 |
-
</ul>
|
| 369 |
-
</div>
|
| 370 |
-
<div>
|
| 371 |
-
<h3>Red Teaming</h3>
|
| 372 |
-
<ul>
|
| 373 |
-
<li>Discover adversarial patterns</li>
|
| 374 |
-
<li>Test safety guardrails</li>
|
| 375 |
-
<li>Identify blind spots</li>
|
| 376 |
-
</ul>
|
| 377 |
-
</div>
|
| 378 |
-
<div>
|
| 379 |
-
<h3>Model Training</h3>
|
| 380 |
-
<ul>
|
| 381 |
-
<li>Fine-tune safety classifiers</li>
|
| 382 |
-
<li>Train attack detectors</li>
|
| 383 |
-
<li>Develop harm detection</li>
|
| 384 |
-
</ul>
|
| 385 |
-
</div>
|
| 386 |
-
<div>
|
| 387 |
-
<h3>Safety Research</h3>
|
| 388 |
-
<ul>
|
| 389 |
-
<li>Study harm transfer</li>
|
| 390 |
-
<li>Analyze attack patterns</li>
|
| 391 |
-
<li>Understand dynamics</li>
|
| 392 |
-
</ul>
|
| 393 |
-
</div>
|
| 394 |
</div>
|
| 395 |
</section>
|
| 396 |
|
|
@@ -452,7 +386,7 @@
|
|
| 452 |
<!-- License -->
|
| 453 |
<section>
|
| 454 |
<h2>License</h2>
|
| 455 |
-
<p>
|
| 456 |
<ul>
|
| 457 |
<li>✅ Use for research and evaluation</li>
|
| 458 |
<li>✅ Modify and build upon the data</li>
|
|
@@ -471,7 +405,7 @@
|
|
| 471 |
</div>
|
| 472 |
|
| 473 |
<footer>
|
| 474 |
-
<p><strong>Last Updated:</strong>
|
| 475 |
<p style="margin-top: 0.5rem;">For detailed documentation, visit the individual dataset repositories on Hugging Face.</p>
|
| 476 |
</footer>
|
| 477 |
</div>
|
|
|
|
| 253 |
</div>
|
| 254 |
<div class="stat-card">
|
| 255 |
<h4>Dataset Types</h4>
|
| 256 |
+
<div class="number">3</div>
|
| 257 |
<div class="label">Complementary methodologies</div>
|
| 258 |
</div>
|
| 259 |
<div class="stat-card">
|
| 260 |
<h4>Sample Data</h4>
|
| 261 |
+
<div class="number">150</div>
|
| 262 |
<div class="label">Free conversations available</div>
|
| 263 |
</div>
|
| 264 |
</div>
|
|
|
|
| 279 |
</thead>
|
| 280 |
<tbody>
|
| 281 |
<tr>
|
| 282 |
+
<td><strong>Psychology multi-turn</strong></td>
|
| 283 |
+
<td>207+</td>
|
| 284 |
+
<td>2,128+</td>
|
| 285 |
+
<td>10.3</td>
|
| 286 |
+
<td>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.</td>
|
| 287 |
</tr>
|
| 288 |
<tr>
|
| 289 |
+
<td><strong>Illicit (bioweapon) multi-turn</strong></td>
|
| 290 |
<td>1,309+</td>
|
| 291 |
<td>6,784+</td>
|
| 292 |
<td>5.2</td>
|
| 293 |
+
<td>Bio-safety harmfulness such as bioweapons, pathogens, etc.</td>
|
| 294 |
</tr>
|
| 295 |
<tr>
|
| 296 |
+
<td><strong>Illicit (chemical, general) multi-turn</strong></td>
|
| 297 |
<td>1,308+</td>
|
| 298 |
<td>8,127+</td>
|
| 299 |
<td>6.2</td>
|
| 300 |
+
<td>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.</td>
|
| 301 |
</tr>
|
| 302 |
</tbody>
|
| 303 |
</table>
|
|
|
|
| 308 |
<h2>Access Datasets on Hugging Face</h2>
|
| 309 |
<div class="dataset-links">
|
| 310 |
<div class="dataset-card">
|
| 311 |
+
<h4>Psychology Multi-turn Conversations</h4>
|
| 312 |
<p>Psychology + Illicit baseline conversations<br>
|
| 313 |
<strong>Sample:</strong> 50 conversations, 390 turns</p>
|
| 314 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 315 |
</div>
|
| 316 |
<div class="dataset-card">
|
| 317 |
+
<h4>Illicit (bioweapon) Multi-turn Conversations</h4>
|
| 318 |
<p>Direct bio-topic transformation methodology<br>
|
| 319 |
<strong>Sample:</strong> 50 conversations, 449 turns</p>
|
| 320 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
|
| 321 |
</div>
|
| 322 |
<div class="dataset-card">
|
| 323 |
+
<h4>Illicit (chemical, general) Multi-turn Conversations</h4>
|
| 324 |
<p>Adaptive bio-topic transformation methodology<br>
|
| 325 |
<strong>Sample:</strong> 50 conversations, 459 turns</p>
|
| 326 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
|
| 327 |
</div>
|
| 328 |
</div>
|
| 329 |
</section>
|
| 330 |
|
|
|
|
| 386 |
<!-- License -->
|
| 387 |
<section>
|
| 388 |
<h2>License</h2>
|
| 389 |
+
<p>Sample datasets are released under <strong>CC-BY-NC-4.0</strong> (Creative Commons Attribution-NonCommercial 4.0 International).</p>
|
| 390 |
<ul>
|
| 391 |
<li>✅ Use for research and evaluation</li>
|
| 392 |
<li>✅ Modify and build upon the data</li>
|
|
|
|
| 405 |
</div>
|
| 406 |
|
| 407 |
<footer>
|
| 408 |
+
<p><strong>Last Updated:</strong> December 2, 2025</p>
|
| 409 |
<p style="margin-top: 0.5rem;">For detailed documentation, visit the individual dataset repositories on Hugging Face.</p>
|
| 410 |
</footer>
|
| 411 |
</div>
|