SuperBPE Collection SuperBPE tokenizers and models trained with them • 9 items • Updated 22 days ago • 17
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1 • 25
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information Paper • 2106.16038 • Published Jun 30, 2021
OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts Paper • 2109.12761 • Published Sep 27, 2021
Instruction Tuning for Large Language Models: A Survey Paper • 2308.10792 • Published Aug 21, 2023 • 1
FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset Paper • 2503.07091 • Published Mar 10 • 3
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Paper • 2507.14111 • Published Jul 18 • 23
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29 • 10
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Paper • 2505.14625 • Published May 20 • 13
MVTamperBench: Evaluating Robustness of Vision-Language Models Paper • 2412.19794 • Published Dec 27, 2024 • 4