ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents Paper • 2511.07685 • Published Nov 10 • 9
Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following Paper • 2511.10507 • Published Nov 13 • 6
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 53
ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking Paper • 2510.24698 • Published Oct 28 • 20
C2S-Scale-Gemma-Models Collection C2S-Scale Gemma models trained using the Cell2Sentence framework, described in the C2S-Scale paper. • 2 items • Updated Oct 13 • 12