ToxiFrench: French Toxicity Detection

Links: arXiv paper · GitHub Pages · Hugging Face dataset · GitHub repository | License: MIT

Author: Axel Delaval
Affiliations: École Polytechnique & Shanghai Jiao Tong University (SJTU)
Email: [name].[surname]@gmail.com


⚠️ Content Warning: This model is trained on toxic data. It will generate reasoning steps explaining why a text is toxic, which may include offensive language.


Key Contributions

  • ToxiFrench Dataset: A benchmark of 53,622 French comments with chain-of-thought (CoT) annotations.
  • Dynamic Weighted Loss (DWL): A novel fine-tuning strategy that keeps the reasoning steps consistent with the final classification.
  • Optimizer Efficiency: Use of the SOAP optimizer to improve convergence over standard AdamW.
  • Preference Alignment: DPO-tuned versions for more stable reasoning.
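This card does not give the DWL formula. As a purely hypothetical illustration of the general idea (a token-level loss whose weighting changes during training so that the final classification counts more over time), the sketch below up-weights final-answer tokens with a linear schedule. The names `answer_weight` and `dynamic_weighted_loss`, the linear schedule, and `max_weight=3.0` are all assumptions, not the paper's definition.

```python
from typing import Sequence


def answer_weight(progress: float, max_weight: float = 3.0) -> float:
    """Hypothetical linear schedule: answer-token weight grows from 1 to max_weight."""
    progress = min(max(progress, 0.0), 1.0)  # clamp training progress to [0, 1]
    return 1.0 + (max_weight - 1.0) * progress


def dynamic_weighted_loss(
    token_nll: Sequence[float],
    is_answer: Sequence[bool],
    progress: float,
    max_weight: float = 3.0,
) -> float:
    """Weighted mean of per-token negative log-likelihoods (illustrative, not the paper's DWL).

    Reasoning tokens keep weight 1; final-answer tokens get answer_weight(progress).
    """
    if len(token_nll) != len(is_answer):
        raise ValueError("token_nll and is_answer must have the same length")
    if not token_nll:
        raise ValueError("at least one token is required")
    w_ans = answer_weight(progress, max_weight)
    weights = [w_ans if ans else 1.0 for ans in is_answer]
    total = sum(w * nll for w, nll in zip(weights, token_nll))
    return total / sum(weights)
```

At `progress=0` every token counts equally; by the end of training the answer tokens carry three times the weight of each reasoning token in this toy schedule.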

Model Architecture & Adapters

This repository contains several QLoRA adapters for the Qwen/Qwen3-4B base model. Each subfolder corresponds to one training configuration.

Available Adapters (Subfolders)

Adapter Name     | Type      | Optimizer | Methodology
Standard-SFT     | SFT       | AdamW     | Standard CoT fine-tuning
SOAP-SFT         | SFT       | SOAP      | Advanced convergence training
SOAP-Oversampled | SFT       | SOAP      | Oversampled for class balance
SOAP-DWL         | SFT       | SOAP      | DWL for reasoning faithfulness
SOAP-DWL-DPO     | SFT + DPO | SOAP      | Aligned for preference & safety
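The card does not say how the SOAP-Oversampled training set was balanced. One common approach is random oversampling, where examples from smaller classes are duplicated until every class matches the largest one. The sketch below assumes each example is a `(text, label)` pair; that schema and the `oversample` helper are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical schema: (comment text, label), e.g. 1 = toxic, 0 = non-toxic
Example = Tuple[str, int]


def oversample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate examples of smaller classes until all classes have equal size."""
    if not examples:
        return []
    rng = random.Random(seed)
    by_label: Dict[int, List[Example]] = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    target = max(len(group) for group in by_label.values())
    balanced: List[Example] = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies with replacement to close the gap to the largest class
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Every original example is kept, so no information is discarded; the trade-off is that duplicated minority examples can encourage overfitting to them.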

How to Use

1. Requirements

conda env create -f environment.yml
conda activate ToxiFrench
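After activating the environment, a quick check that the packages used by the inference code are importable can catch problems before any model weights are downloaded. This is a minimal sketch: the package list is an assumption taken from the imports in the inference example (bitsandbytes backs the 4-bit `BitsAndBytesConfig` loading), not the contents of `environment.yml`.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed list: packages the inference example relies on
REQUIRED = ("torch", "transformers", "peft", "bitsandbytes")


def check_packages(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Return, for each package name, whether it can be found in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    missing = [name for name, ok in check_packages().items() if not ok]
    print("All packages found" if not missing else f"Missing: {', '.join(missing)}")
```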

2. Loading the Model (Inference)

To use one of the adapters, load the base Qwen3-4B model in 4-bit precision, then apply the adapter by passing the name of the desired subfolder.

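import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model and adapter repository on the Hugging Face Hub
base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
# Adapter subfolder to load (one of the names in "Available Adapters")
target_adapter = "SOAP-DWL-DPO"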

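tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
# Fall back to the EOS token for padding if the tokenizer defines no pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Register the reasoning delimiters as special tokens
tokens = ["<think>", "</think>"]
tokenizer.add_special_tokens({"additional_special_tokens": tokens})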

# 4-bit NF4 quantization with double quantization, computing in bfloat16 (QLoRA-style loading)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto"
)

# Make the embedding matrix match the tokenizer length before attaching the adapter
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)

if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Load the selected adapter from its subfolder and switch to inference mode
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

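# Example input: "I can't stand your behavior anymore, you're a real idiot!"
text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1
    )

# Decode only the newly generated tokens; keep special tokens so <think> ... </think> stays visible
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=False))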
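Because the decoded text keeps special tokens, the reasoning can be separated from what follows it. A minimal sketch, assuming the adapters wrap their reasoning in `<think>`...`</think>` and place the verdict after the closing tag; the exact verdict wording is not documented here, so the hypothetical `split_reasoning` helper returns it as raw text with `<|...|>` control tokens stripped.

```python
import re
from typing import Tuple

# Reasoning span between <think> and </think> (non-greedy, across newlines)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Control tokens such as <|endoftext|> that survive skip_special_tokens=False
CONTROL_RE = re.compile(r"<\|[^|]*\|>")


def split_reasoning(generated: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, final_answer).

    If no closing </think> tag is found (e.g. generation hit max_new_tokens),
    the whole text is returned as reasoning and the answer is empty.
    """
    match = THINK_RE.search(generated)
    if match is None:
        reasoning = CONTROL_RE.sub("", generated.replace("<think>", ""))
        return reasoning.strip(), ""
    reasoning = match.group(1).strip()
    answer = CONTROL_RE.sub("", generated[match.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning(tokenizer.decode(new_tokens, skip_special_tokens=False))` applied to the output above yields the explanation and the verdict as two separate strings.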

Citation

@misc{delaval2025toxifrench,
  title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
  author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
  year={2025},
  eprint={2508.11281},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Model tree: Qwen/Qwen3-4B-Base → Qwen/Qwen3-4B (fine-tuned) → AxelDlv00/ToxiFrench (this model)