Update README.md
README.md
CHANGED
@@ -96,6 +96,24 @@ prompt = "Write a haiku about coffee."
output = llm.generate([prompt], sampling)
print(output[0].outputs[0].text)
```

---

### Interactive REPL Example
The `run_repl()` coroutine launches an **interactive, streaming chat interface** using the vLLM backend with FlashHead enabled. It maintains an in-memory chat history and supports simple commands such as `/exit` to quit and `/reset` to clear the context.

```python
import asyncio
from embedl.models.vllm.demo import run_repl

# Launch the streaming chat REPL; type /exit to quit or /reset to clear the context.
model_id = "embedl/Llama-3.2-1B-Instruct-FlashHead"
asyncio.run(
    run_repl(
        model=model_id
    )
)
```
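For readers curious what a loop like this involves, below is a minimal, self-contained sketch of the same command handling (`/exit`, `/reset`) around an in-memory chat history. It is illustrative only and does not show the internals of `embedl.models.vllm.demo.run_repl`: the `fake_reply` helper is a hypothetical stand-in for the streaming vLLM + FlashHead call made by the real demo.

```python
# Illustrative sketch only; not the embedl implementation.
# fake_reply is a hypothetical placeholder for the streaming vLLM call.
def fake_reply(history):
    return f"(reply generated from {len(history)} prior messages)"

def simple_repl():
    history = []  # in-memory chat history
    while True:
        user = input("you> ").strip()
        if user == "/exit":   # quit the REPL
            break
        if user == "/reset":  # clear the conversation context
            history.clear()
            continue
        history.append({"role": "user", "content": user})
        reply = fake_reply(history)
        history.append({"role": "assistant", "content": reply})
        print(reply)

if __name__ == "__main__":
    simple_repl()
```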
---

---