Alina committed
Commit 722c064 (unverified)
Parents: eaa4ab7 d112528

Merge pull request #50 from pollen-robotics/49-improve-readme

.env.example CHANGED
@@ -1,8 +1,5 @@
 OPENAI_API_KEY=
-MODEL_NAME="gpt-4o-realtime-preview-2025-06-03"
-
-# Vision model (Responses API, multimodal)
-OPENAI_VISION_MODEL="gpt-4.1-mini"
+MODEL_NAME="gpt-realtime"
 
 # Cache for local VLM
 HF_HOME=./cache
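
For orientation, the trimmed template leaves a single model knob plus the cache path. A minimal sketch of how these variables are consumed, assuming the app loads `.env` with `python-dotenv` (the loading mechanism is not shown in this diff) and reusing the defaults from `config.py` further down:

```python
# Sketch only: mirrors the defaults visible in config.py below; the dotenv step is an assumption.
import os

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY, MODEL_NAME, HF_HOME from .env into the environment

api_key = os.getenv("OPENAI_API_KEY")                 # required
model_name = os.getenv("MODEL_NAME", "gpt-realtime")  # optional; new default
hf_home = os.getenv("HF_HOME", "./cache")             # cache directory for the local VLM
```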
README.md CHANGED
@@ -1,6 +1,12 @@
 # Reachy Mini conversation demo
 
-Working repo, we should turn this into a ReachyMini app at some point maybe ?
+Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.
+
+## Overview
+- Real-time audio conversation loop powered by the OpenAI realtime API and `fastrtc` for low-latency streaming.
+- Camera capture can route to OpenAI multimodal vision or stay on-device with SmolVLM2 local analysis.
+- Layered motion system queues primary moves (dances, emotions, goto poses, breathing) while blending speech-reactive wobble and face tracking.
+- Async tool dispatch integrates robot motion, camera capture, and optional facial-recognition helpers through a Gradio web UI with live transcripts.
 
 ## Installation
 
@@ -26,53 +32,103 @@ You can combine extras or include dev dependencies:
 uv sync --extra all_vision --group dev
 ```
 
-### Using pip
-Alternatively, you can install using pip in editable mode:
+### Using pip (tested on Ubuntu 24.04)
 
 ```bash
-python -m venv .venv # Create a virtual environment
+python -m venv .venv # Create a virtual environment
 source .venv/bin/activate
 pip install -e .
 ```
 
-To include optional vision dependencies:
-```
+Install optional extras depending on the feature set you need:
+
+```bash
+# Vision stacks (choose at least one if you plan to run face tracking)
 pip install -e .[local_vision]
 pip install -e .[yolo_vision]
 pip install -e .[mediapipe_vision]
-pip install -e .[all_vision]
-```
+pip install -e .[all_vision] # installs every vision extra
 
-To include dev dependencies:
-```
+# Tooling for development workflows
 pip install -e .[dev]
 ```
 
-## Run
+Some wheels (e.g. PyTorch) are large and require compatible CUDA or CPU builds; make sure your platform matches the binaries pulled in by each extra.
+
+## Optional dependency groups
+
+| Extra | Purpose | Notes |
+|-------|---------|-------|
+| `local_vision` | Run the local VLM (SmolVLM2) through PyTorch/Transformers. | GPU recommended; ensure compatible PyTorch builds for your platform. |
+| `yolo_vision` | YOLOv8 tracking via `ultralytics` and `supervision`. | CPU friendly; supports the `--head-tracker yolo` option. |
+| `mediapipe_vision` | Lightweight landmark tracking with MediaPipe. | Works on CPU; enables `--head-tracker mediapipe`. |
+| `all_vision` | Convenience alias installing every vision extra. | Install when you want the flexibility to experiment with every provider. |
+| `dev` | Developer tooling (`pytest`, `ruff`). | Add on top of either the base or `all_vision` environment. |
+
+## Configuration
+
+1. Copy `.env.example` to `.env`.
+2. Fill in the required values, notably the OpenAI API key.
+
+| Variable | Description |
+|----------|-------------|
+| `OPENAI_API_KEY` | Required. Grants access to the OpenAI realtime endpoint. |
+| `MODEL_NAME` | Override the realtime model (defaults to `gpt-realtime`). |
+| `HF_HOME` | Cache directory for local Hugging Face downloads. |
+| `HF_TOKEN` | Optional token for Hugging Face models. |
+
+## Running the demo
+
+Activate your virtual environment, ensure the Reachy Mini robot (or simulator) is reachable, then launch:
 
 ```bash
 reachy-mini-conversation-demo
 ```
 
-## Command line arguments
+The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use `--headless`. With a camera attached, captured frames can be analysed remotely through OpenAI multimodal models or locally via the YOLO/MediaPipe pipelines, depending on the extras you installed.
 
-| Option | Values | Default | Description |
-|--------|--------|---------|-------------|
-| `--head-tracker` | `yolo`, `mediapipe` | `None` | Enable **head tracking** using the specified tracker:<br>• **yolo** → YOLO-based head tracker.<br>• **mediapipe** → MediaPipe-based head tracker.<br> |
-| `--no-camera` | *(flag)* | off | Disable **camera usage** entirely. |
-| `--gradio` | *(flag)* | off | Launch with **Gradio web interface** for browser-based interaction. Required when running in simulation mode. |
-| `--debug` | *(flag)* | off | Enable **debug logging** (default log level is INFO). |
+### CLI options
 
-## Examples
-- Run with YOLO head tracking:
-```
-reachy-mini-conversation-demo --head-tracker yolo
-```
-- Run with MediaPipe head tracking and debug logging:
-```
-reachy-mini-conversation-demo --head-tracker mediapipe --debug
-```
-- Run with Gradio web interface:
-```
-reachy-mini-conversation-demo --gradio
-```
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--head-tracker {yolo,mediapipe}` | `None` | Select a face-tracking backend when a camera is available. Requires the matching optional extra. |
+| `--no-camera` | `False` | Run without camera capture or face tracking. |
+| `--gradio` | `False` | Launch the Gradio web UI. Without this flag, the demo runs in console mode. Required when running in simulation mode. |
+| `--debug` | `False` | Enable verbose logging for troubleshooting. |
+
+
+### Examples
+- Run on hardware with MediaPipe face tracking:
+
+```bash
+reachy-mini-conversation-demo --head-tracker mediapipe
+```
+
+- Disable the camera pipeline (audio-only conversation):
+
+```bash
+reachy-mini-conversation-demo --no-camera
+```
+
+## LLM tools exposed to the assistant
+
+| Tool | Action | Dependencies |
+|------|--------|--------------|
+| `move_head` | Queue a head pose change (left/right/up/down/front). | Core install only. |
+| `camera` | Capture the latest camera frame and optionally query a vision backend. | Requires the camera worker; vision analysis depends on the selected extras. |
+| `head_tracking` | Enable or disable face-tracking offsets. | Camera worker with a configured head tracker. |
+| `dance` | Queue a dance from `reachy_mini_dances_library`. | Core install only. |
+| `stop_dance` | Clear queued dances. | Core install only. |
+| `play_emotion` | Play a recorded emotion clip via Hugging Face assets. | Needs `HF_TOKEN` for the recorded emotions dataset. |
+| `stop_emotion` | Clear queued emotions. | Core install only. |
+| `get_person_name` | DeepFace-based recognition of the current person. | Disabled by default (`ENABLE_FACE_RECOGNITION=False`); requires `deepface` and a local face database. |
+| `do_nothing` | Explicitly remain idle. | Core install only. |
+
+## Development workflow
+- Install the dev group extras: `uv sync --group dev` or `pip install -e .[dev]`.
+- Run formatting and linting: `ruff check .`.
+- Execute the test suite: `pytest`.
+- When iterating on robot motions, keep the control loop responsive: offload blocking work using the helpers in `tools.py`.
+
+## License
+Apache 2.0
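
The CLI options table above maps directly onto a standard argument parser. The sketch below is illustrative only: it is not the project's actual entry point, and the parser layout is an assumption based on the documented flags.

```python
# Hypothetical parser mirroring the README's CLI table; the real entry point may differ.
import argparse

parser = argparse.ArgumentParser(prog="reachy-mini-conversation-demo")
parser.add_argument("--head-tracker", choices=["yolo", "mediapipe"], default=None,
                    help="Face-tracking backend; requires the matching vision extra")
parser.add_argument("--no-camera", action="store_true",
                    help="Disable camera capture and face tracking")
parser.add_argument("--gradio", action="store_true",
                    help="Launch the Gradio web UI (required in simulation mode)")
parser.add_argument("--debug", action="store_true",
                    help="Enable verbose logging")

args = parser.parse_args(["--head-tracker", "mediapipe", "--debug"])
print(args.head_tracker, args.no_camera, args.gradio, args.debug)  # mediapipe False False True
```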
src/reachy_mini_conversation_demo/config.py CHANGED
@@ -23,8 +23,7 @@ class Config:
         raise RuntimeError("OPENAI_API_KEY is missing in .env")
 
     # Optional
-    MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-realtime-preview")
-    OPENAI_VISION_MODEL = os.getenv("OPENAI_VISION_MODEL", "gpt-4.1-mini")
+    MODEL_NAME = os.getenv("MODEL_NAME", "gpt-realtime")
     HF_HOME = os.getenv("HF_HOME", "./cache")
     HF_TOKEN = os.getenv("HF_TOKEN")  # Optional, falls back to hf auth login if not set
 
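
With `OPENAI_VISION_MODEL` gone, `MODEL_NAME` is the only model setting left in `Config`, and the realtime handler (next hunk) now passes it straight to `connect`. Below is a condensed, standalone sketch of that pattern; the session payload is abbreviated and illustrative, while the real handler configures turn detection, tools, and audio streaming.

```python
# Sketch of the configurable-model connect pattern; session fields below are illustrative only.
import asyncio
import os

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = os.getenv("MODEL_NAME", "gpt-realtime")  # same fallback as Config.MODEL_NAME
    async with client.beta.realtime.connect(model=model) as conn:
        await conn.session.update(session={"modalities": ["text"]})
        # ... exchange realtime events over conn here ...


asyncio.run(main())
```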
 
src/reachy_mini_conversation_demo/openai_realtime.py CHANGED
@@ -48,7 +48,7 @@ class OpenaiRealtimeHandler(AsyncStreamHandler):
     async def start_up(self):
         """Start the handler."""
         self.client = AsyncOpenAI(api_key=config.OPENAI_API_KEY)
-        async with self.client.beta.realtime.connect(model="gpt-realtime") as conn:
+        async with self.client.beta.realtime.connect(model=config.MODEL_NAME) as conn:
             await conn.session.update(
                 session={
                     "turn_detection": {
src/reachy_mini_conversation_demo/vision/processors.py CHANGED
@@ -23,7 +23,6 @@ class VisionConfig:
     """Configuration for vision processing."""
 
     processor_type: str = "local"
-    openai_model: str = os.getenv("OPENAI_VISION_MODEL", "gpt-4.1-mini")
     model_path: str = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
     vision_interval: float = 5.0
     max_new_tokens: int = 64
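
After this removal, the vision configuration no longer references an OpenAI model; the OpenAI realtime model is chosen solely via `MODEL_NAME`. Below is a sketch of the resulting dataclass, assuming only the fields visible in the hunk above (the real class may define more).

```python
# Reconstruction for illustration; fields are limited to those shown in the diff.
from dataclasses import dataclass


@dataclass
class VisionConfig:
    """Configuration for vision processing."""

    processor_type: str = "local"
    model_path: str = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
    vision_interval: float = 5.0
    max_new_tokens: int = 64
```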