Merge pull request #50 from pollen-robotics/49-improve-readme
.env.example
CHANGED

@@ -1,8 +1,5 @@
 OPENAI_API_KEY=
-MODEL_NAME="gpt-
-
-# Vision model (Responses API, multimodal)
-OPENAI_VISION_MODEL="gpt-4.1-mini"
+MODEL_NAME="gpt-realtime"
 
 # Cache for local VLM
 HF_HOME=./cache
README.md
CHANGED

@@ -1,6 +1,12 @@
 # Reachy Mini conversation demo
 
-
+Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.
+
+## Overview
+- Real-time audio conversation loop powered by the OpenAI realtime API and `fastrtc` for low-latency streaming.
+- Camera capture can route to OpenAI multimodal vision or stay on-device with SmolVLM2 local analysis.
+- Layered motion system queues primary moves (dances, emotions, goto poses, breathing) while blending speech-reactive wobble and face tracking.
+- Async tool dispatch integrates robot motion, camera capture, and optional facial-recognition helpers through a Gradio web UI with live transcripts.
 
 ## Installation
 

@@ -26,53 +32,103 @@ You can combine extras or include dev dependencies:
 uv sync --extra all_vision --group dev
 ```
 
-### Using pip
-Alternatively, you can install using pip in editable mode:
+### Using pip (tested on Ubuntu 24.04)
 
 ```bash
-python -m venv .venv
+python -m venv .venv  # Create a virtual environment
 source .venv/bin/activate
 pip install -e .
 ```
 
-
-
+Install optional extras depending on the feature set you need:
+
+```bash
+# Vision stacks (choose at least one if you plan to run face tracking)
 pip install -e .[local_vision]
 pip install -e .[yolo_vision]
 pip install -e .[mediapipe_vision]
-pip install -e .[all_vision]
-```
+pip install -e .[all_vision]  # installs every vision extra
 
-
-```
+# Tooling for development workflows
 pip install -e .[dev]
 ```
 
-
+Some wheels (e.g. PyTorch) are large and require compatible CUDA or CPU builds—make sure your platform matches the binaries pulled in by each extra.
+
+## Optional dependency groups
+
+| Extra | Purpose | Notes |
+|-------|---------|-------|
+| `local_vision` | Run the local VLM (SmolVLM2) through PyTorch/Transformers. | GPU recommended; ensure compatible PyTorch builds for your platform. |
+| `yolo_vision` | YOLOv8 tracking via `ultralytics` and `supervision`. | CPU friendly; supports the `--head-tracker yolo` option. |
+| `mediapipe_vision` | Lightweight landmark tracking with MediaPipe. | Works on CPU; enables `--head-tracker mediapipe`. |
+| `all_vision` | Convenience alias installing every vision extra. | Install when you want the flexibility to experiment with every provider. |
+| `dev` | Developer tooling (`pytest`, `ruff`). | Add on top of either base or `all_vision` environments. |
+
+## Configuration
+
+1. Copy `.env.example` to `.env`.
+2. Fill in the required values, notably the OpenAI API key.
+
+| Variable | Description |
+|----------|-------------|
+| `OPENAI_API_KEY` | Required. Grants access to the OpenAI realtime endpoint. |
+| `MODEL_NAME` | Override the realtime model (defaults to `gpt-realtime`). |
+| `HF_HOME` | Cache directory for local Hugging Face downloads. |
+| `HF_TOKEN` | Optional token for Hugging Face models. |
+
+## Running the demo
+
+Activate your virtual environment, ensure the Reachy Mini robot (or simulator) is reachable, then launch:
 
 ```bash
 reachy-mini-conversation-demo
 ```
 
-
+The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use `--headless`. With a camera attached, captured frames can be analysed remotely through OpenAI multimodal models or locally via the YOLO/MediaPipe pipelines, depending on the extras you installed.
 
-
-|--------|--------|---------|-------------|
-| `--head-tracker` | `yolo`, `mediapipe` | `None` | Enable **head tracking** using the specified tracker:<br>• **yolo** → YOLO-based head tracker.<br>• **mediapipe** → MediaPipe-based head tracker.<br> |
-| `--no-camera` | *(flag)* | off | Disable **camera usage** entirely. |
-| `--gradio` | *(flag)* | off | Launch with **Gradio web interface** for browser-based interaction. Required when running in simulation mode. |
-| `--debug` | *(flag)* | off | Enable **debug logging** (default log level is INFO). |
+### CLI options
 
-
-
-
-
-
-
-
-
-
-- Run with
-
-
-
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--head-tracker {yolo,mediapipe}` | `None` | Select a face-tracking backend when a camera is available. Requires the matching optional extra. |
+| `--no-camera` | `False` | Run without camera capture or face tracking. |
+| `--gradio` | `False` | Launch the Gradio web UI. Without this flag, runs in console mode. Required when running in simulation mode. |
+| `--debug` | `False` | Enable verbose logging for troubleshooting. |
+
+
+### Examples
+- Run on hardware with MediaPipe face tracking:
+
+```bash
+reachy-mini-conversation-demo --head-tracker mediapipe
+```
+
+- Disable the camera pipeline (audio-only conversation):
+
+```bash
+reachy-mini-conversation-demo --no-camera
+```
+
+## LLM tools exposed to the assistant
+
+| Tool | Action | Dependencies |
+|------|--------|--------------|
+| `move_head` | Queue a head pose change (left/right/up/down/front). | Core install only. |
+| `camera` | Capture the latest camera frame and optionally query a vision backend. | Requires camera worker; vision analysis depends on selected extras. |
+| `head_tracking` | Enable or disable face-tracking offsets. | Camera worker with configured head tracker. |
+| `dance` | Queue a dance from `reachy_mini_dances_library`. | Core install only. |
+| `stop_dance` | Clear queued dances. | Core install only. |
+| `play_emotion` | Play a recorded emotion clip via Hugging Face assets. | Needs `HF_TOKEN` for the recorded emotions dataset. |
+| `stop_emotion` | Clear queued emotions. | Core install only. |
+| `get_person_name` | DeepFace-based recognition of the current person. | Disabled by default (`ENABLE_FACE_RECOGNITION=False`); requires `deepface` and a local face database. |
+| `do_nothing` | Explicitly remain idle. | Core install only. |
+
+## Development workflow
+- Install the dev group extras: `uv sync --group dev` or `pip install -e .[dev]`.
+- Run formatting and linting: `ruff check .`.
+- Execute the test suite: `pytest`.
+- When iterating on robot motions, keep the control loop responsive by offloading blocking work using the helpers in `tools.py`.
+
+## License
+Apache 2.0
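The CLI options documented in the new README map onto a small argument parser. Below is a minimal sketch, assuming `argparse`; the flag names, choices, and defaults come from the table above, while the parser structure itself is illustrative rather than the project's actual entry point.

```python
# Hypothetical sketch of the documented flags; the real entry point may differ.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="reachy-mini-conversation-demo")
    parser.add_argument(
        "--head-tracker",
        choices=["yolo", "mediapipe"],
        default=None,
        help="Face-tracking backend; requires the matching optional extra.",
    )
    parser.add_argument(
        "--no-camera",
        action="store_true",
        help="Run without camera capture or face tracking.",
    )
    parser.add_argument(
        "--gradio",
        action="store_true",
        help="Launch the Gradio web UI (required in simulation mode).",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Enable verbose logging.",
    )
    return parser.parse_args()


if __name__ == "__main__":
    print(parse_args())
```

Invoked as `reachy-mini-conversation-demo --head-tracker mediapipe`, such a parser yields `head_tracker="mediapipe"` with the other flags at their defaults.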
src/reachy_mini_conversation_demo/config.py
CHANGED

@@ -23,8 +23,7 @@ class Config:
         raise RuntimeError("OPENAI_API_KEY is missing in .env")
 
     # Optional
-    MODEL_NAME = os.getenv("MODEL_NAME", "gpt-
-    OPENAI_VISION_MODEL = os.getenv("OPENAI_VISION_MODEL", "gpt-4.1-mini")
+    MODEL_NAME = os.getenv("MODEL_NAME", "gpt-realtime")
     HF_HOME = os.getenv("HF_HOME", "./cache")
     HF_TOKEN = os.getenv("HF_TOKEN")  # Optional, falls back to hf auth login if not set
 
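After this change the optional settings reduce to `MODEL_NAME`, `HF_HOME`, and `HF_TOKEN`. A condensed sketch of the resulting lookups, assuming the module loads `.env` with `python-dotenv` and checks `OPENAI_API_KEY` at class-definition time; those two pieces are not visible in the hunk and are assumptions.

```python
# Condensed sketch of the trimmed Config; dotenv loading and the API-key
# lookup are assumed context not shown in the diff hunk above.
import os

from dotenv import load_dotenv  # assumption: the project uses python-dotenv

load_dotenv()


class Config:
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # assumed lookup
    if not OPENAI_API_KEY:
        raise RuntimeError("OPENAI_API_KEY is missing in .env")

    # Optional
    MODEL_NAME = os.getenv("MODEL_NAME", "gpt-realtime")
    HF_HOME = os.getenv("HF_HOME", "./cache")
    HF_TOKEN = os.getenv("HF_TOKEN")  # falls back to `hf auth login` if not set
```

Code that previously read `OPENAI_VISION_MODEL` no longer has a corresponding setting; the vision model is not configurable here anymore.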
src/reachy_mini_conversation_demo/openai_realtime.py
CHANGED

@@ -48,7 +48,7 @@ class OpenaiRealtimeHandler(AsyncStreamHandler):
     async def start_up(self):
         """Start the handler."""
         self.client = AsyncOpenAI(api_key=config.OPENAI_API_KEY)
-        async with self.client.beta.realtime.connect(model=
+        async with self.client.beta.realtime.connect(model=config.MODEL_NAME) as conn:
             await conn.session.update(
                 session={
                     "turn_detection": {
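The one-line change routes the realtime model through `config.MODEL_NAME` instead of a value fixed at the call site. A minimal standalone sketch of the same connection pattern, with the session payload trimmed to the single key visible in the hunk; the `server_vad` value is an assumption, since the diff truncates before the nested settings.

```python
# Minimal sketch of the connection pattern from the hunk above.
import asyncio
import os

from openai import AsyncOpenAI

MODEL_NAME = os.getenv("MODEL_NAME", "gpt-realtime")


async def main() -> None:
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # The model now follows the configuration rather than a hard-coded string.
    async with client.beta.realtime.connect(model=MODEL_NAME) as conn:
        await conn.session.update(
            session={
                "turn_detection": {"type": "server_vad"},  # assumed value
            }
        )


if __name__ == "__main__":
    asyncio.run(main())
```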
src/reachy_mini_conversation_demo/vision/processors.py
CHANGED

@@ -23,7 +23,6 @@ class VisionConfig:
     """Configuration for vision processing."""
 
     processor_type: str = "local"
-    openai_model: str = os.getenv("OPENAI_VISION_MODEL", "gpt-4.1-mini")
     model_path: str = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
     vision_interval: float = 5.0
     max_new_tokens: int = 64
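With `openai_model` removed, `VisionConfig` carries only the local-VLM settings. Reconstructed below for readability; the `@dataclass` decorator and import are assumed context not visible in the hunk, and the trailing usage line is purely illustrative.

```python
# VisionConfig as it reads after this change; decorator and import are assumed.
from dataclasses import dataclass


@dataclass
class VisionConfig:
    """Configuration for vision processing."""

    processor_type: str = "local"
    model_path: str = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
    vision_interval: float = 5.0  # interval between vision passes (presumably seconds)
    max_new_tokens: int = 64  # generation budget for the local VLM


# Illustrative usage: keep the local SmolVLM2 processor but poll less often.
config = VisionConfig(vision_interval=10.0)
```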