Alina Lozovskaya committed on
Commit e91b8f2
1 Parent(s): 2ee9d8b

Update README

Files changed (1): README.md (+6 -5)
README.md CHANGED
@@ -1,10 +1,11 @@
  # Reachy Mini conversation demo

  Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.
+ ![Reachy Mini Dance](src/reachy_mini_conversation_demo/images/reachy_mini_dance.gif)

  ## Overview
  - Real-time audio conversation loop powered by the OpenAI realtime API and `fastrtc` for low-latency streaming.
- - Camera capture can route to OpenAI multimodal vision or stay on-device with SmolVLM2 local analysis.
+ - Local vision processing using the SmolVLM2 model running on-device (CPU/GPU/MPS).
  - Layered motion system queues primary moves (dances, emotions, goto poses, breathing) while blending speech-reactive wobble and face-tracking.
  - Async tool dispatch integrates robot motion, camera capture, and optional facial-recognition helpers through a Gradio web UI with live transcripts.

@@ -74,8 +75,9 @@ Some wheels (e.g. PyTorch) are large and require compatible CUDA or CPU builds
  |----------|-------------|
  | `OPENAI_API_KEY` | Required. Grants access to the OpenAI realtime endpoint.
  | `MODEL_NAME` | Override the realtime model (defaults to `gpt-realtime`).
- | `HF_HOME` | Cache directory for local Hugging Face downloads.
- | `HF_TOKEN` | Optional token for Hugging Face models.
+ | `HF_HOME` | Cache directory for local Hugging Face downloads (defaults to `./cache`).
+ | `HF_TOKEN` | Optional token for Hugging Face models (falls back to `huggingface-cli login`).
+ | `LOCAL_VISION_MODEL` | Hugging Face model path for local vision processing (defaults to `HuggingFaceTB/SmolVLM2-2.2B-Instruct`).

  ## Running the demo

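For quick reference, the updated variable table translates to an environment setup along these lines. This is only a sketch with placeholder values: `OPENAI_API_KEY` is the sole required variable, and the others show their documented defaults or are optional.

```bash
export OPENAI_API_KEY="sk-..."        # required: realtime endpoint access
export MODEL_NAME="gpt-realtime"      # optional override, shown with its default
export HF_HOME="./cache"              # optional cache directory, shown with its default
export HF_TOKEN="hf_..."              # optional; omit to rely on `huggingface-cli login`
export LOCAL_VISION_MODEL="HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # optional override
```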
@@ -85,7 +87,7 @@ Activate your virtual environment, ensure the Reachy Mini robot (or simulator) i
  reachy-mini-conversation-demo
  ```

- The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use `--headless`. With a camera attached, captured frames can be analysed remotely through OpenAI multimodal models or locally via the YOLO/MediaPipe pipelines depending on the extras you installed.
+ By default, the app runs in console mode for direct audio interaction. Use the `--gradio` flag to launch a web UI served locally at http://127.0.0.1:7860/ (required when running in simulation mode). With a camera attached, captured frames are analyzed locally using the SmolVLM2 vision model. Additionally, you can enable face tracking via YOLO or MediaPipe pipelines depending on the extras you installed.

  ### CLI options

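Concretely, the two launch modes described in the new paragraph amount to:

```bash
# Console mode (default): direct audio interaction from the terminal.
reachy-mini-conversation-demo

# Web UI mode: serves Gradio locally at http://127.0.0.1:7860/ (required in simulation mode).
reachy-mini-conversation-demo --gradio
```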
@@ -121,7 +123,6 @@ The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running
  | `stop_dance` | Clear queued dances. | Core install only. |
  | `play_emotion` | Play a recorded emotion clip via Hugging Face assets. | Needs `HF_TOKEN` for the recorded emotions dataset. |
  | `stop_emotion` | Clear queued emotions. | Core install only. |
- | `get_person_name` | DeepFace-based recognition of the current person. | Disabled by default (`ENABLE_FACE_RECOGNITION=False`); requires `deepface` and a local face database. |
  | `do_nothing` | Explicitly remain idle. | Core install only. |

  ## Development workflow
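Since `play_emotion` pulls the recorded-emotions dataset from Hugging Face, credentials have to come from somewhere. Per the updated `HF_TOKEN` note, either of the following should work (a sketch, assuming the standard Hugging Face CLI is installed):

```bash
# Provide a token explicitly...
export HF_TOKEN="hf_..."

# ...or log in once and let the cached credentials act as the fallback.
huggingface-cli login
```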
 