RemiFabre committed
Commit 10aa8a0 · 1 Parent(s): d939dca

First README pass

Files changed (1)
  1. README.md +92 -24
README.md CHANGED
@@ -1,6 +1,16 @@
  # Reachy Mini conversation demo

- Working repo, we should turn this into a ReachyMini app at some point maybe ?

  ## Installation

@@ -26,46 +36,104 @@ You can combine extras or include dev dependencies:
  uv sync --extra all_vision --group dev
  ```

- ### Using pip
- Alternatively, you can install using pip in editable mode:

  ```bash
- python -m venv .venv # Create a virtual environment
  source .venv/bin/activate
  pip install -e .
  ```

- To include optional vision dependencies:
- ```
  pip install -e .[local_vision]
  pip install -e .[yolo_vision]
  pip install -e .[mediapipe_vision]
- pip install -e .[all_vision]
- ```

- To include dev dependencies:
- ```
  pip install -e .[dev]
  ```

- ## Run

  ```bash
  reachy-mini-conversation-demo
  ```

- ## Runtime Options

- | Option | Values | Default | Description |
- |--------|--------|---------|-------------|
- | `--sim` | *(flag)* | off | Run in **simulation mode** (no physical robot required). |
- | `--vision` | *(flag)* | off | Enable the **vision system** (must be paired with `--vision-provider`). |
- | `--vision-provider` | `local`, `openai` | `local` | Select vision backend:<br>• **local** → Hugging Face VLM (SmolVLM2) runs on your machine.<br>• **openai** → OpenAI multimodal models via API (requires `OPENAI_API_KEY`). |
- | `--head-tracking` | *(flag)* | off | Enable **head tracking** (ignored when `--sim` is active). |
- | `--debug` | *(flag)* | off | Enable **debug logging** (default log level is INFO). |

- ## Examples
- - Simulated run with OpenAI Vision:
- ```
- reachy-mini-conversation-demo --sim --vision --vision-provider=openai
- ```

  # Reachy Mini conversation demo

+ Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.
+
+ ## Overview
+ - Real-time audio conversation loop powered by the OpenAI realtime API and `fastrtc` for low-latency streaming.
+ - Motion control queue that blends scripted dances, recorded emotions, idle breathing, and speech-reactive head wobbling.
+ - Optional camera worker with YOLO or MediaPipe-based head tracking and LLM-accessible scene capture.
+
+ ## Features
+ - Async tool dispatch integrates robot motion, camera capture, and optional facial recognition helpers.
+ - Gradio web UI provides audio chat and transcript display.
+ - Movement manager keeps real-time control in a dedicated thread with safeguards against abrupt pose changes.

  ## Installation

  uv sync --extra all_vision --group dev
  ```

+ ### Using pip (tested on Ubuntu 24.04)

  ```bash
+ python -m venv .venv # Create a virtual environment
  source .venv/bin/activate
  pip install -e .
  ```
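+
+ If the editable install succeeded, the console script should be on your PATH; asking for its usage (assuming the standard `--help` flag of the CLI parser) is a quick sanity check:
+
+ ```bash
+ reachy-mini-conversation-demo --help
+ ```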

+ Install optional extras depending on the feature set you need:
+
+ ```bash
+ # Vision stacks (choose at least one if you plan to run head tracking)
  pip install -e .[local_vision]
  pip install -e .[yolo_vision]
  pip install -e .[mediapipe_vision]
+ pip install -e .[all_vision] # installs every vision extra
+
+ # Tooling for development workflows
  pip install -e .[dev]
  ```

+ Some wheels (e.g. PyTorch) are large and require compatible CUDA or CPU builds. Expect the `local_vision` extra to take significantly more disk space than YOLO or MediaPipe.
+
+ ## Optional dependency groups
+
+ | Extra | Purpose | Notes |
+ |-------|---------|-------|
+ | `local_vision` | Run the local VLM (SmolVLM2) through PyTorch/Transformers. | GPU recommended; installs large packages (~2 GB). |
+ | `yolo_vision` | YOLOv8 tracking via `ultralytics` and `supervision`. | CPU friendly; supports the `--head-tracker yolo` option. |
+ | `mediapipe_vision` | Lightweight landmark tracking with MediaPipe. | Works on CPU; enables `--head-tracker mediapipe`. |
+ | `all_vision` | Convenience alias installing every vision extra. | Only use if you need to experiment with all providers. |
+ | `dev` | Developer tooling (`pytest`, `ruff`). | Add on top of either base or `all_vision` environments. |
+
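+ The same extras work with `uv`; for example, a MediaPipe-plus-dev environment (a sketch using the extra names from the table above) could be synced with:
+
+ ```bash
+ uv sync --extra mediapipe_vision --group dev
+ ```
+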
+ ## Configuration
+
+ 1. Copy `.env.example` to `.env`.
+ 2. Fill in the required values, notably the OpenAI API key.
+
+ | Variable | Description |
+ |----------|-------------|
+ | `OPENAI_API_KEY` | Required. Grants access to the OpenAI realtime endpoint. |
+ | `MODEL_NAME` | Override the realtime model (defaults to `gpt-realtime`). |
+ | `HF_HOME` | Cache directory for local Hugging Face downloads. |
+ | `HF_TOKEN` | Optional token for Hugging Face models. |
+
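+ A minimal `.env` sketch using the variables above (placeholder values, not working credentials):
+
+ ```bash
+ OPENAI_API_KEY=sk-...            # required
+ # MODEL_NAME=gpt-realtime        # optional override of the realtime model
+ # HF_HOME=~/.cache/huggingface   # optional Hugging Face cache directory
+ # HF_TOKEN=hf_...                # optional, used for Hugging Face assets such as recorded emotions
+ ```
+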
+ ## Running the demo
+
+ Activate your virtual environment, ensure the Reachy Mini robot (or simulator) is reachable, then launch:

  ```bash
  reachy-mini-conversation-demo
  ```

+ The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use `--headless`.
+
+ ### CLI options
+
+ | Option | Default | Description |
+ |--------|---------|-------------|
+ | `--head-tracker {yolo,mediapipe}` | `None` | Select a head-tracking backend when a camera is available. Requires the matching optional extra. |
+ | `--no-camera` | `False` | Run without camera capture or head tracking. |
+ | `--headless` | `False` | Suppress launching the Gradio UI (useful on remote machines). |
+ | `--debug` | `False` | Enable verbose logging for troubleshooting. |
+
+ ### Examples
+ - Run on hardware with MediaPipe head tracking:
+
+ ```bash
+ reachy-mini-conversation-demo --head-tracker mediapipe
+ ```
+
+ - Disable the camera pipeline (audio-only conversation):
+
+ ```bash
+ reachy-mini-conversation-demo --no-camera
+ ```
+
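+ - Headless, audio-only run with verbose logging (a sketch combining flags from the options table above):
+
+ ```bash
+ reachy-mini-conversation-demo --headless --no-camera --debug
+ ```
+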
+ ## LLM tools exposed to the assistant
+
+ | Tool | Action | Dependencies |
+ |------|--------|--------------|
+ | `move_head` | Queue a head pose change (left/right/up/down/front). | Core install only. |
+ | `camera` | Capture the latest camera frame and optionally query a vision backend. | Requires camera worker; vision analysis depends on selected extras. |
+ | `head_tracking` | Enable or disable face-tracking offsets. | Camera worker with configured head tracker. |
+ | `dance` | Queue a dance from `reachy_mini_dances_library`. | Core install only. |
+ | `stop_dance` | Clear queued dances. | Core install only. |
+ | `play_emotion` | Play a recorded emotion clip via Hugging Face assets. | Needs `HF_TOKEN` for the recorded emotions dataset. |
+ | `stop_emotion` | Clear queued emotions. | Core install only. |
+ | `get_person_name` | Attempt DeepFace-based recognition of the current person. | Disabled by default (`ENABLE_FACE_RECOGNITION=False`); requires `deepface` and a local face database. |
+ | `do_nothing` | Explicitly remain idle. | Core install only. |
+
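+ The recorded-emotion clips are pulled from Hugging Face, so `play_emotion` needs credentials; one possible setup (assuming the app reads `HF_TOKEN` from `.env` or falls back to the standard Hugging Face CLI login cache) is:
+
+ ```bash
+ huggingface-cli login   # or set HF_TOKEN in your .env
+ ```
+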
+ ## Development workflow
+ - Install the dev group extras: `uv sync --group dev` or `pip install -e .[dev]`.
+ - Run formatting and linting: `ruff check .`.
+ - Execute the test suite: `pytest` (a combined check is sketched below).
+ - When iterating on robot motions, keep the control loop responsive; offload blocking work using the helpers in `tools.py`.
+
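+ A typical pre-commit check combining the commands above (run from the repository root):
+
+ ```bash
+ # Lint first, then run the test suite; stop on the first failure
+ ruff check . && pytest
+ ```
+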
+ ## License
+
+ Apache 2.0