# ToxiFrench: French Toxicity Detection
Author: Axel Delaval
Affiliations: École Polytechnique & Shanghai Jiao Tong University (SJTU)
Email: [name].[surname]@gmail.com
⚠️ Content Warning: This model is trained on toxic data. It will generate reasoning steps explaining why a text is toxic, which may include offensive language.
## Key Contributions

- **ToxiFrench Dataset**: A benchmark of 53,622 French comments annotated with chain-of-thought (CoT) rationales.
- **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes the reasoning steps with the final classification (a hedged sketch follows this list).
- **Optimizer Efficiency**: Use of the SOAP optimizer, which improves convergence over standard AdamW.
- **Preference Alignment**: DPO-tuned variants for enhanced reasoning stability.
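The exact DWL formulation is defined in the paper; purely as an illustration, here is a minimal sketch assuming DWL amounts to re-weighting the token-level cross-entropy so that weight shifts from the CoT tokens toward the final classification tokens as training progresses. The function and argument names (`dwl_loss`, `answer_mask`, `alpha`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def dwl_loss(logits, labels, answer_mask, alpha):
    """Hypothetical sketch of a dynamic weighted loss (not the paper's
    exact formula). `answer_mask` flags the final classification tokens;
    `alpha` in [0, 1] grows over training, shifting weight from the CoT
    reasoning tokens to the final answer tokens."""
    # Standard next-token prediction shift
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_mask = answer_mask[:, 1:].contiguous().float()

    # Per-token cross-entropy; -100 marks padding / prompt tokens
    per_token = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
        ignore_index=-100,
    ).view(shift_labels.size())

    # Interpolate weights between reasoning and answer tokens
    weights = (1 - alpha) * (1 - shift_mask) + alpha * shift_mask
    valid = (shift_labels != -100).float()
    return (per_token * weights * valid).sum() / (weights * valid).sum().clamp(min=1.0)
```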
## Model Architecture & Adapters

This repository contains multiple QLoRA adapters trained on the Qwen/Qwen3-4B base model. Each subfolder corresponds to a specific training configuration.
### Available Adapters (Subfolders)

| Adapter Name | Type | Optimizer | Methodology |
|---|---|---|---|
| `Standard-SFT` | SFT | AdamW | Standard CoT fine-tuning |
| `SOAP-SFT` | SFT | SOAP | Advanced convergence training |
| `SOAP-Oversampled` | SFT | SOAP | Oversampled for class balance |
| `SOAP-DWL` | SFT | SOAP | DWL for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + DPO | SOAP | Aligned for preference & safety |
## How to Use

### 1. Requirements
```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
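If you prefer pip over conda, the inference script below needs roughly the following packages (the repository's `environment.yml` remains the authoritative spec; versions here are unpinned):

```bash
pip install torch transformers peft bitsandbytes accelerate
```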
### 2. Loading the Model (Inference)
To use one of the models, load the base Qwen3-4B model and then apply the adapter by specifying the desired subfolder.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"  # any subfolder from the table above

# Tokenizer: ensure a pad token and register the CoT delimiter tokens
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": ["<think>", "</think>"]})

# 4-bit NF4 quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize embeddings in case the added special tokens grew the vocabulary
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the chosen QLoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

# Example input ("I can't stand your behavior anymore, you're really an idiot!")
text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
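Since the adapter registers `<think>` / `</think>` delimiters, the generated analysis can be split into the reasoning trace and the final verdict. A small helper, assuming the output follows that layout (adjust the parsing if your adapter formats it differently):

```python
# Split the generated analysis into CoT reasoning and final verdict.
# Assumes the model wraps its reasoning in <think>...</think>, as the
# special tokens registered above suggest; adjust if the layout differs.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=False)
if "</think>" in decoded:
    reasoning, verdict = decoded.split("</think>", 1)
    reasoning = reasoning.split("<think>", 1)[-1].strip()
    print("Reasoning:", reasoning)
    print("Verdict:", verdict.strip())
else:
    print(decoded)
```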
## Citation

```bibtex
@misc{delaval2025toxifrench,
  title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
  author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
  year={2025},
  eprint={2508.11281},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```