WilhelmT committed
Commit d805c9e · verified · 1 Parent(s): 74310bb

Update README.md

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -96,6 +96,24 @@ prompt = "Write a haiku about coffee."
 output = llm.generate([prompt], sampling)
 print(output[0].outputs[0].text)
 ```
+
+---
+
+### Interactive REPL Example
+
+The `run_repl()` coroutine launches an **interactive, streaming chat interface** using the vLLM backend with FlashHead enabled.
+It maintains an in-memory chat history and supports simple commands such as `/exit` to quit and `/reset` to clear the context.
+
+```python
+import asyncio
+from embedl.models.vllm.demo import run_repl
+
+model_id = "embedl/Llama-3.2-1B-Instruct-FlashHead"
+asyncio.run(
+    run_repl(
+        model=model_id
+    )
+)
+```
 ---
 
 ---
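
The new README section describes the REPL's behavior (streaming output, an in-memory chat history, `/exit` and `/reset` commands) but not the loop itself. Below is a minimal sketch of that pattern, assuming only the behavior stated in the README text; `repl` and `fake_stream` here are hypothetical stand-ins, not the actual `embedl.models.vllm.demo` implementation.

```python
# Hypothetical sketch of the streaming chat REPL pattern the new README
# section describes; NOT the embedl implementation.
import asyncio


async def repl(generate_stream):
    """Minimal chat loop with in-memory history, /exit and /reset commands."""
    history = []  # list of (role, text) turns kept for the whole session
    while True:
        user = input("you> ").strip()
        if user == "/exit":   # quit the REPL
            break
        if user == "/reset":  # clear the conversation context
            history.clear()
            continue
        history.append(("user", user))
        reply = []
        # generate_stream is assumed to yield text chunks as they arrive,
        # which is what makes the interface feel "streaming".
        async for chunk in generate_stream(history):
            print(chunk, end="", flush=True)
            reply.append(chunk)
        print()
        history.append(("assistant", "".join(reply)))


async def fake_stream(history):
    """Stand-in backend: echoes the last user turn word by word."""
    for word in f"echo: {history[-1][1]}".split():
        yield word + " "
        await asyncio.sleep(0.05)  # simulate token-by-token latency


if __name__ == "__main__":
    asyncio.run(repl(fake_stream))
```

In the committed example the same role is played by `run_repl(model=...)`, which, per the README text, drives the loop with the FlashHead-enabled vLLM backend rather than a toy echo backend.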