embedl/Llama-3.2-1B-Instruct-FlashHead

text-generation-inference

WilhelmT committed 4 days ago

Commit ec6ebba · verified · 1 parent: cb556dc

Update README.md

Files changed (1)

README.md +2 -0

@@ -11,6 +11,8 @@ tags:
 # Llama-3.2-1B-Instruct-FlashHead
+![My model banner](assets/FlashHead.png)
 **Optimized version of Llama-3.2-1B-Instruct using FlashHead, Embedl’s efficient replacement for the language model head, reducing size while preserving accuracy.**
 Designed for **low-latency inference** on **NVIDIA RTX GPUs**, leveraging: