OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation Paper • 2512.06589 • Published about 1 month ago • 17
Safe-SAIL Collection A Fine-grained Safety Landscape of Large Language Models • 1 item • Updated Sep 23, 2025
Oyster-I Collection Oyster-I is a set of safety models developed in-house by Alibaba-AAIG and devoted to building a responsible AI ecosystem. • 5 items • Updated Sep 23, 2025 • 1
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment Paper • 2503.18991 • Published Mar 23, 2025
PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization Paper • 2412.05892 • Published Dec 8, 2024
Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining Paper • 2410.18371 • Published Oct 24, 2024
TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models Paper • 2405.14517 • Published May 23, 2024