The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text • Paper • 2506.05209 • Published Jun 5 • 59
Android Models • Collection • LiteRT models that can run on Android • 20 items • Updated 16 days ago • 149
Lapa v0.1.2 Release • Collection • Release of SOTA Ukrainian LLM and Datasets • 18 items • Updated Nov 13 • 24
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks • Paper • 2005.11401 • Published May 22, 2020 • 14
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only • Paper • 2306.01116 • Published Jun 1, 2023 • 41