Robust-Decoding/gemma22bit-hh-RMODdistill_lr1e-5_3epochs_16kprompts • Text Generation • 3B • 7
Robust-Decoding/uf_5objs_3epochs
Robust-Decoding/uf_5objs_safety_multihead
Robust-Decoding/uf_6objs_multimodel
Robust-Decoding/uf_6objs_multihead
Robust-Decoding/gemma22bit-hh-dpo-uniform-step60291 • Text Generation • 3B • 6
Robust-Decoding/gemma22bit-hh-grpo-uniform-step1000 • Text Generation • 3B • 7
Robust-Decoding/gemma2-2b-it-hh-dpo-helpful-step-8000 • Text Generation • 3B • 4
Robust-Decoding/gemma2-2b-it-hh-grpo-helpful-step1000-swyoon • Text Generation • 3B • 7
Robust-Decoding/gemma2-2b-it-hh-dpo-harmless-step-6000 • Text Generation • 3B • 8
Robust-Decoding/gemma2-2b-it-hh-grpo-harmless-step350 • Text Generation • 3B • 7
Robust-Decoding/gemma2-2b-it-hh-grpo-helpful-step550 • Text Generation • 3B • 7
Robust-Decoding/gemma2-2b-it-hh-grpo-harmless-step100 • Text Generation • 3B • 6
Robust-Decoding/gemma-2-2b-it_1.0-0.0_kl0.001_chk_5000 • Text Generation • 3B • 6
Robust-Decoding/gemma-2-2b-it_1.0-0.0_kl0.01_chk_5000 • Text Generation • 3B • 6
Robust-Decoding/gemma22bit-hh-ppo-helpful-step20000 • Text Generation • 3B • 7
Robust-Decoding/gemma22bit-hh-ppo-harmless-step20000 • Text Generation • 3B • 6
Robust-Decoding/gemma22bit-hh-ppo-average-step20000 • Text Generation • 3B • 8