Reachy Mini conversation demo

Conversational demo for the Reachy Mini robot combining OpenAI's realtime APIs, vision pipelines, and choreographed motion libraries.

Overview

  • Real-time audio conversation loop powered by the OpenAI realtime API and fastrtc for low-latency streaming.
  • Motion control queue that blends scripted dances, recorded emotions, idle breathing, and speech-reactive head wobbling.
  • Optional camera worker with YOLO or MediaPipe-based head tracking and LLM-accessible scene capture.

Features

  • Async tool dispatch integrates robot motion, camera capture, and optional facial recognition helpers.
  • Gradio web UI provides audio chat and transcript display.
  • Movement manager keeps real-time control in a dedicated thread, with safeguards against abrupt pose changes (see the sketch below).
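
The movement manager's real implementation lives in this repo, but the core pattern is small: a dedicated thread drains a command queue and clamps how far the pose may move per tick, so a queued command can never cause a sudden jump. Below is a minimal, illustrative Python sketch of that pattern; HeadPose and send_pose are hypothetical stand-ins, not the demo's actual API.

import queue
import threading
import time
from dataclasses import dataclass

@dataclass
class HeadPose:  # hypothetical stand-in for the robot's pose type
    yaw: float = 0.0
    pitch: float = 0.0

MAX_STEP = 0.05  # max change per tick (radians) to avoid abrupt pose changes
TICK = 0.02      # 50 Hz control loop

def clamp_step(current: float, target: float) -> float:
    # Move toward the target, but never by more than MAX_STEP per tick.
    return current + max(-MAX_STEP, min(MAX_STEP, target - current))

def control_loop(commands, send_pose):
    pose, target = HeadPose(), HeadPose()
    while True:
        try:
            target = commands.get_nowait()  # latest queued command wins
        except queue.Empty:
            pass
        pose.yaw = clamp_step(pose.yaw, target.yaw)
        pose.pitch = clamp_step(pose.pitch, target.pitch)
        send_pose(pose)  # hypothetical robot call; print works for a dry run
        time.sleep(TICK)

commands = queue.Queue()
threading.Thread(target=control_loop, args=(commands, print), daemon=True).start()
commands.put(HeadPose(yaw=0.4, pitch=-0.1))
time.sleep(0.2)  # let a few ticks run before this demo script exits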

Installation

Using uv

You can set up the project quickly using uv:

uv venv --python 3.12.1  # Create a virtual environment with Python 3.12.1
source .venv/bin/activate
uv sync

To include optional vision dependencies:

uv sync --extra local_vision        # For local PyTorch/Transformers vision
uv sync --extra yolo_vision         # For YOLO-based vision
uv sync --extra mediapipe_vision    # For MediaPipe-based vision
uv sync --extra all_vision          # For all vision features

You can combine extras or include dev dependencies:

uv sync --extra all_vision --group dev
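
After syncing, a quick probe with importlib shows which optional vision stacks actually landed in the environment. The module names below are the usual import names of each extra's key package (an assumption worth adjusting if the packaging differs):

import importlib.util

# Map each extra to the import name of its key package (assumed pairing).
for extra, module in [
    ("local_vision", "transformers"),
    ("yolo_vision", "ultralytics"),
    ("mediapipe_vision", "mediapipe"),
]:
    status = "installed" if importlib.util.find_spec(module) else "missing"
    print(f"{extra}: {status}")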

Using pip (tested on Ubuntu 24.04)

python -m venv .venv # Create a virtual environment
source .venv/bin/activate
pip install -e .

Install optional extras depending on the feature set you need:

# Vision stacks (choose at least one if you plan to run head tracking)
pip install -e .[local_vision]
pip install -e .[yolo_vision]
pip install -e .[mediapipe_vision]
pip install -e .[all_vision]        # installs every vision extra

# Tooling for development workflows
pip install -e .[dev]

Some wheels (e.g. PyTorch) are large and must match your CUDA version or CPU-only setup. Expect the local_vision extra to take significantly more disk space than the YOLO or MediaPipe extras.
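
If you installed local_vision, a two-line check confirms whether your PyTorch build can see a GPU (CPU-only builds simply report False):

import torch

# Prints the installed build and whether a CUDA device is visible.
print(torch.__version__)
print(torch.cuda.is_available())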

Optional dependency groups

| Extra | Purpose | Notes |
| --- | --- | --- |
| local_vision | Run the local VLM (SmolVLM2) through PyTorch/Transformers. | GPU recommended; installs large packages (~2 GB). |
| yolo_vision | YOLOv8 tracking via ultralytics and supervision. | CPU friendly; supports the --head-tracker yolo option. |
| mediapipe_vision | Lightweight landmark tracking with MediaPipe. | Works on CPU; enables --head-tracker mediapipe. |
| all_vision | Convenience alias that installs every vision extra. | Use only if you need to experiment with all providers. |
| dev | Developer tooling (pytest, ruff). | Add on top of either the base or all_vision environment. |

Configuration

  1. Copy .env.example to .env.
  2. Fill in the required values, notably the OpenAI API key.

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | Required. Grants access to the OpenAI realtime endpoint. |
| MODEL_NAME | Overrides the realtime model (defaults to gpt-realtime). |
| HF_HOME | Cache directory for local Hugging Face downloads. |
| HF_TOKEN | Optional token for Hugging Face models. |
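
A minimal .env might look like the following; the values are placeholders, and only OPENAI_API_KEY is strictly required:

OPENAI_API_KEY=your-openai-key-here
MODEL_NAME=gpt-realtime
HF_HOME=~/.cache/huggingface
HF_TOKEN=your-hf-token-here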

Running the demo

Activate your virtual environment, ensure the Reachy Mini robot (or simulator) is reachable, then launch:

reachy-mini-conversation-demo

The app starts a Gradio UI served locally (http://127.0.0.1:7860/). When running on a headless host, use --headless.

CLI options

| Option | Default | Description |
| --- | --- | --- |
| --head-tracker {yolo,mediapipe} | None | Select a head-tracking backend when a camera is available. Requires the matching optional extra. |
| --no-camera | False | Run without camera capture or head tracking. |
| --headless | False | Suppress launching the Gradio UI (useful on remote machines). |
| --debug | False | Enable verbose logging for troubleshooting. |

Examples

  • Run on hardware with MediaPipe head tracking:

    reachy-mini-conversation-demo --head-tracker mediapipe
    
  • Disable the camera pipeline (audio-only conversation):

    reachy-mini-conversation-demo --no-camera
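
  • Run headless with verbose logging (handy on a remote machine):

    reachy-mini-conversation-demo --headless --debug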
    

LLM tools exposed to the assistant

| Tool | Action | Dependencies |
| --- | --- | --- |
| move_head | Queue a head pose change (left/right/up/down/front). | Core install only. |
| camera | Capture the latest camera frame and optionally query a vision backend. | Requires the camera worker; vision analysis depends on the selected extras. |
| head_tracking | Enable or disable face-tracking offsets. | Camera worker with a configured head tracker. |
| dance | Queue a dance from reachy_mini_dances_library. | Core install only. |
| stop_dance | Clear queued dances. | Core install only. |
| play_emotion | Play a recorded emotion clip via Hugging Face assets. | Needs HF_TOKEN for the recorded emotions dataset. |
| stop_emotion | Clear queued emotions. | Core install only. |
| get_person_name | Attempt DeepFace-based recognition of the current person. | Disabled by default (ENABLE_FACE_RECOGNITION=False); requires deepface and a local face database. |
| do_nothing | Explicitly remain idle. | Core install only. |
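
The authoritative schemas live in the source, but each tool boils down to a function description the model can call plus a local handler. Here is a hedged sketch of what move_head might look like, assuming the OpenAI function-calling format; the parameter name direction and the handler body are illustrative assumptions, not the demo's actual code:

# Illustrative tool schema; the demo's real definitions live in its tools module.
MOVE_HEAD_TOOL = {
    "type": "function",
    "name": "move_head",
    "description": "Queue a head pose change.",
    "parameters": {
        "type": "object",
        "properties": {
            "direction": {
                "type": "string",
                "enum": ["left", "right", "up", "down", "front"],
            },
        },
        "required": ["direction"],
    },
}

def handle_move_head(direction: str) -> str:
    # In the demo this would enqueue a pose change on the movement manager.
    return f"queued head move: {direction}"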

Development workflow

  • Install the dev group extras: uv sync --group dev or pip install -e .[dev].
  • Run formatting and linting: ruff format . and ruff check .
  • Execute the test suite: pytest.
  • When iterating on robot motions, keep the control loop responsive; offload blocking work using the helpers in tools.py (see the sketch below).
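
The helpers in tools.py take care of this for the demo; the underlying pattern is to push blocking calls (camera I/O, model inference) onto a worker thread so the event loop keeps servicing audio and motion. A minimal sketch using asyncio.to_thread:

import asyncio
import time

def blocking_capture() -> str:
    # Stand-in for slow work such as camera I/O or model inference.
    time.sleep(0.5)
    return "frame"

async def main() -> None:
    # The blocking call runs in a thread; the event loop stays responsive.
    frame = await asyncio.to_thread(blocking_capture)
    print(frame)

asyncio.run(main())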

License

Apache 2.0