RemiFabre committed
Commit 10aa8a0 · 1 Parent(s): d939dca

First README pass

Files changed (1)
  1. README.md +92 -24
README.md CHANGED
@@ -1,6 +1,16 @@
  # Reachy Mini conversation demo

- Working repo, we should turn this into a ReachyMini app at some point maybe ?

  ## Installation

@@ -26,46 +36,104 @@ You can combine extras or include dev dependencies:
  uv sync --extra all_vision --group dev
  ```

- ### Using pip
- Alternatively, you can install using pip in editable mode:

  ```bash
- python -m venv .venv # Create a virtual environment
  source .venv/bin/activate
  pip install -e .
  ```

- To include optional vision dependencies:
- ```
  pip install -e .[local_vision]
  pip install -e .[yolo_vision]
  pip install -e .[mediapipe_vision]
- pip install -e .[all_vision]
- ```

- To include dev dependencies:
- ```
  pip install -e .[dev]
  ```

- ## Run

  ```bash
  reachy-mini-conversation-demo
  ```

- ## Runtime Options

- | Option | Values | Default | Description |
- |--------|--------|---------|-------------|
- | `--sim` | *(flag)* | off | Run in **simulation mode** (no physical robot required). |
- | `--vision` | *(flag)* | off | Enable the **vision system** (must be paired with `--vision-provider`). |
- | `--vision-provider` | `local`, `openai` | `local` | Select vision backend:<br>• **local** → Hugging Face VLM (SmolVLM2) runs on your machine.<br>• **openai** → OpenAI multimodal models via API (requires `OPENAI_API_KEY`). |
- | `--head-tracking` | *(flag)* | off | Enable **head tracking** (ignored when `--sim` is active). |
- | `--debug` | *(flag)* | off | Enable **debug logging** (default log level is INFO). |

- ## Examples
- - Simulated run with OpenAI Vision:
- ```
- reachy-mini-conversation-demo --sim --vision --vision-provider=openai
- ```

  # Reachy Mini conversation demo

+ Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.
+
+ ## Overview
+ - Real-time audio conversation loop powered by the OpenAI realtime API and `fastrtc` for low-latency streaming.
+ - Motion control queue that blends scripted dances, recorded emotions, idle breathing, and speech-reactive head wobbling.
+ - Optional camera worker with YOLO or MediaPipe-based head tracking and LLM-accessible scene capture.
+
+ ## Features
+ - Async tool dispatch integrates robot motion, camera capture, and optional facial recognition helpers.
+ - Gradio web UI provides audio chat and transcript display.
+ - Movement manager keeps real-time control in a dedicated thread with safeguards against abrupt pose changes.

  ## Installation

  uv sync --extra all_vision --group dev
  ```

+ ### Using pip (tested on Ubuntu 24.04)

  ```bash
+ python -m venv .venv # Create a virtual environment
  source .venv/bin/activate
  pip install -e .
  ```
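+
+ If the editable install succeeded, the console script should be on your PATH; asking for its usage (assuming the standard `--help` flag of the CLI parser) is a quick sanity check:
+
+ ```bash
+ reachy-mini-conversation-demo --help
+ ```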

+ Install optional extras depending on the feature set you need:
+
+ ```bash
+ # Vision stacks (choose at least one if you plan to run head tracking)
  pip install -e .[local_vision]
  pip install -e .[yolo_vision]
  pip install -e .[mediapipe_vision]
+ pip install -e .[all_vision] # installs every vision extra
+
+ # Tooling for development workflows
  pip install -e .[dev]
  ```

+ Some wheels (e.g. PyTorch) are large and require compatible CUDA or CPU builds. Expect the `local_vision` extra to take significantly more disk space than YOLO or MediaPipe.
+
+ ## Optional dependency groups
+
+ | Extra | Purpose | Notes |
+ |-------|---------|-------|
+ | `local_vision` | Run the local VLM (SmolVLM2) through PyTorch/Transformers. | GPU recommended; installs large packages (~2 GB). |
+ | `yolo_vision` | YOLOv8 tracking via `ultralytics` and `supervision`. | CPU friendly; supports the `--head-tracker yolo` option. |
+ | `mediapipe_vision` | Lightweight landmark tracking with MediaPipe. | Works on CPU; enables `--head-tracker mediapipe`. |
+ | `all_vision` | Convenience alias installing every vision extra. | Only use if you need to experiment with all providers. |
+ | `dev` | Developer tooling (`pytest`, `ruff`). | Add on top of either base or `all_vision` environments. |
+
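+ The same extras work with `uv`; for example, a MediaPipe-plus-dev environment (a sketch using the extra names from the table above) could be synced with:
+
+ ```bash
+ uv sync --extra mediapipe_vision --group dev
+ ```
+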
+ ## Configuration
+
+ 1. Copy `.env.example` to `.env`.
+ 2. Fill in the required values, notably the OpenAI API key.
+
+ | Variable | Description |
+ |----------|-------------|
+ | `OPENAI_API_KEY` | Required. Grants access to the OpenAI realtime endpoint. |
+ | `MODEL_NAME` | Override the realtime model (defaults to `gpt-realtime`). |
+ | `HF_HOME` | Cache directory for local Hugging Face downloads. |
+ | `HF_TOKEN` | Optional token for Hugging Face models. |
+
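+ A minimal `.env` sketch using the variables above (placeholder values, not working credentials):
+
+ ```bash
+ OPENAI_API_KEY=sk-...            # required
+ # MODEL_NAME=gpt-realtime        # optional override of the realtime model
+ # HF_HOME=~/.cache/huggingface   # optional Hugging Face cache directory
+ # HF_TOKEN=hf_...                # optional, used for Hugging Face assets such as recorded emotions
+ ```
+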
+ ## Running the demo
+
+ Activate your virtual environment, ensure the Reachy Mini robot (or simulator) is reachable, then launch:

  ```bash
  reachy-mini-conversation-demo
  ```

+ The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use `--headless`.
+
+ ### CLI options
+
+ | Option | Default | Description |
+ |--------|---------|-------------|
+ | `--head-tracker {yolo,mediapipe}` | `None` | Select a head-tracking backend when a camera is available. Requires the matching optional extra. |
+ | `--no-camera` | `False` | Run without camera capture or head tracking. |
+ | `--headless` | `False` | Suppress launching the Gradio UI (useful on remote machines). |
+ | `--debug` | `False` | Enable verbose logging for troubleshooting. |
+
+ ### Examples
+ - Run on hardware with MediaPipe head tracking:
+
+ ```bash
+ reachy-mini-conversation-demo --head-tracker mediapipe
+ ```
+
+ - Disable the camera pipeline (audio-only conversation):
+
+ ```bash
+ reachy-mini-conversation-demo --no-camera
+ ```
+
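+ - Headless, audio-only run with verbose logging (a sketch combining flags from the options table above):
+
+ ```bash
+ reachy-mini-conversation-demo --headless --no-camera --debug
+ ```
+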
+ ## LLM tools exposed to the assistant
+
+ | Tool | Action | Dependencies |
+ |------|--------|--------------|
+ | `move_head` | Queue a head pose change (left/right/up/down/front). | Core install only. |
+ | `camera` | Capture the latest camera frame and optionally query a vision backend. | Requires camera worker; vision analysis depends on selected extras. |
+ | `head_tracking` | Enable or disable face-tracking offsets. | Camera worker with configured head tracker. |
+ | `dance` | Queue a dance from `reachy_mini_dances_library`. | Core install only. |
+ | `stop_dance` | Clear queued dances. | Core install only. |
+ | `play_emotion` | Play a recorded emotion clip via Hugging Face assets. | Needs `HF_TOKEN` for the recorded emotions dataset. |
+ | `stop_emotion` | Clear queued emotions. | Core install only. |
+ | `get_person_name` | Attempt DeepFace-based recognition of the current person. | Disabled by default (`ENABLE_FACE_RECOGNITION=False`); requires `deepface` and a local face database. |
+ | `do_nothing` | Explicitly remain idle. | Core install only. |
+
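+ The recorded-emotion clips are pulled from Hugging Face, so `play_emotion` needs credentials; one possible setup (assuming the app reads `HF_TOKEN` from `.env` or falls back to the standard Hugging Face CLI login cache) is:
+
+ ```bash
+ huggingface-cli login   # or set HF_TOKEN in your .env
+ ```
+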
+ ## Development workflow
+ - Install the dev group extras: `uv sync --group dev` or `pip install -e .[dev]`.
+ - Run formatting and linting: `ruff check .`.
+ - Execute the test suite: `pytest` (a combined check is sketched below).
+ - When iterating on robot motions, keep the control loop responsive; offload blocking work using the helpers in `tools.py`.
+
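+ A typical pre-commit check combining the commands above (run from the repository root):
+
+ ```bash
+ # Lint first, then run the test suite; stop on the first failure
+ ruff check . && pytest
+ ```
+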
+ ## License
+
+ Apache 2.0