embedl/Llama-3.2-1B-Instruct-FlashHead

WilhelmT committed 4 days ago

Commit 1ef5d7b · verified · 1 Parent(s): 594ce42

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -39,7 +39,7 @@ FlashHead matches the baseline **Llama-3.2-1B** within rounding on standard eval
 - **FlashHead LM Head** - lightweight replacement for the dense LM head, significantly improving throughput.
 - **Mixed-Precision Quantization (W4A16)** - optimal balance of memory footprint and accuracy.
-- **Custom Runtime Integration** - compatible with both **vLLM (0.10.2)** via the `embedl-models` package.
+- **Custom Runtime Integration** - compatible with **vLLM (0.10.2)** via the `embedl-models` package.
 ---