# Hugging Face Dataset Integration

The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.
## Features

- **Automatic Upload**: Results are automatically pushed to the HF Dataset when benchmarks complete
- **File Structure Preservation**: Uses the same path structure as local storage: `{task}/{org}/{model}/{params}.json` (see the sketch after this list)
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded so issues can be tracked
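For illustration, deriving that path could look like the following sketch (the helper name `resultPath` and the config shape are assumptions, not the server's actual code):

```typescript
// Sketch: derive the repo path for one benchmark configuration.
// `modelId` already contains "{org}/{model}", e.g. "Xenova/all-MiniLM-L6-v2".
interface ResultLocation {
  task: string;    // e.g. "feature-extraction"
  modelId: string; // e.g. "Xenova/all-MiniLM-L6-v2"
  params: string;  // e.g. "node_warm_cpu_fp32_b1"
}

function resultPath({ task, modelId, params }: ResultLocation): string {
  return `${task}/${modelId}/${params}.json`;
}

// resultPath({ task: "feature-extraction", modelId: "Xenova/all-MiniLM-L6-v2", params: "node_warm_cpu_fp32_b1" })
// => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
```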
## Setup

### 1. Create a Hugging Face Dataset

1. Go to https://huggingface.co/new-dataset
2. Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
3. Keep it public or private based on your needs

### 2. Get Your HF Token

1. Go to https://huggingface.co/settings/tokens
2. Create a new token with `write` permissions
3. Copy the token

### 3. Configure Environment Variables

Create or update the `.env` file in the `bench` directory:
```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
```
**Important**: Never commit `.env` to git. It's already in `.gitignore`.
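As a point of reference, the server side of this configuration might read the variables roughly like this (a sketch assuming the `dotenv` package; only the variable names are from the `.env` above, the defaults are assumptions):

```typescript
import "dotenv/config"; // load .env into process.env

// Names match the .env keys documented above; defaults are assumptions.
const HF_DATASET_REPO = process.env.HF_DATASET_REPO;
const HF_TOKEN = process.env.HF_TOKEN;
const RESULTS_DIR = process.env.BENCHMARK_RESULTS_DIR ?? "./benchmark-results";
const PORT = Number(process.env.PORT ?? 7860);
```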
## Usage

Once configured, the server will automatically upload results:

```bash
# Start the server
npm run server

# You should see:
# 🤗 HF Dataset upload enabled: username/transformersjs-benchmark-results
```
When benchmarks complete, you'll see:

```
✅ Completed: abc-123 in 5.2s
✅ Benchmark abc-123 saved to file
✅ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```
## File Structure in HF Dataset

The dataset will have the same structure as local storage:

```
feature-extraction/
└── Xenova/
    ├── all-MiniLM-L6-v2/
    │   ├── node_warm_cpu_fp32_b1.json
    │   ├── node_warm_webgpu_fp16_b1.json
    │   └── web_warm_wasm_b1_chromium.json
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```
## JSON Format

Each file contains a single benchmark result (not multiple runs):

```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```
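Expressed as a TypeScript type, the stored shape corresponds roughly to the following (inferred from the example above; not the server's actual type definitions):

```typescript
// Sketch of the result shape, inferred from the JSON example above.
interface BenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;
  task: string;
  mode: string;                    // "warm" in the example; other modes assumed
  repeats: number;
  dtype: string;                   // e.g. "fp32", "fp16"
  batchSize: number;
  device: string;                  // e.g. "cpu", "webgpu"
  timestamp: number;
  status: "completed" | "failed";
  error?: string;                  // top-level error string on failed runs
  result: {
    metrics?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
    environment: Record<string, unknown>;
  };
}
```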
## Behavior

### Overwriting Results

- Each benchmark configuration maps to a single file
- New results **overwrite** the existing file (see the sketch below)
- Only the **latest** result is kept per configuration
- This ensures the dataset always holds current data
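A minimal sketch of such an upload using the `@huggingface/hub` client (an assumption; the server may use a different client, and `resultPath` is the hypothetical helper from earlier):

```typescript
import { uploadFile } from "@huggingface/hub";

// Writing to the same repo path on every run makes the upload an
// overwrite: one file, and one latest result, per configuration.
await uploadFile({
  repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
  accessToken: process.env.HF_TOKEN,
  file: {
    path: resultPath(result), // same path for every run of this configuration
    content: new Blob([JSON.stringify(result, null, 2)]),
  },
});
```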
### Local vs Remote Storage

- **Local (JSONL)**: Keeps a history of all runs (append-only, sketched below)
- **Remote (JSON)**: Keeps only the latest result (overwrite)

This dual approach allows:

- Local: Full history for analysis
- Remote: Clean, current results for leaderboards
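The local half of that split is plain append-only JSONL; roughly (a sketch, with the local layout assumed to mirror the remote one):

```typescript
import { appendFile, mkdir } from "node:fs/promises";
import { dirname, join } from "node:path";

// Append one JSON line per run: local history is never overwritten.
const localPath = join(
  RESULTS_DIR,
  resultPath(result).replace(/\.json$/, ".jsonl"), // assumed local naming
);
await mkdir(dirname(localPath), { recursive: true });
await appendFile(localPath, JSON.stringify(result) + "\n");
```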
### Failed Benchmarks

Failed benchmarks are also uploaded to track:

- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context

Example failed result:

```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```
## Git Commits

Each upload creates a git commit in the dataset with a message like:

```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
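Building that message is straightforward string formatting; a sketch (the `commitTitle`/`commitDescription` split follows the `@huggingface/hub` commit parameters, assuming that client):

```typescript
// Sketch: commit metadata matching the format shown above.
const commitTitle = `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`;
const commitDescription = [
  `Benchmark ID: ${result.id}`,
  `Status: ${result.status}`,
  `Timestamp: ${new Date(result.timestamp).toISOString()}`, // assumes a ms epoch
].join("\n");

// These can be passed as `commitTitle` / `commitDescription`
// in the uploadFile call sketched earlier.
```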
## Disabling Upload

To disable HF Dataset upload:

1. Remove `HF_TOKEN` from `.env`, or
2. Remove both `HF_DATASET_REPO` and `HF_TOKEN`

The server will show:

```
🤗 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```
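The check behind these messages is presumably just the presence of both variables; a sketch:

```typescript
// Upload is enabled only when both variables are set.
const uploadEnabled = Boolean(process.env.HF_DATASET_REPO && process.env.HF_TOKEN);

console.log(
  uploadEnabled
    ? `🤗 HF Dataset upload enabled: ${process.env.HF_DATASET_REPO}`
    : "🤗 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)",
);
```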
## Error Handling

If an HF upload fails:

- The error is logged but doesn't fail the benchmark
- Local storage still succeeds
- You can retry manually or fix the configuration

Example error:

```
❌ Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```
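In code, this amounts to a caught exception around the upload step; a sketch (`uploadResultToHub` is a hypothetical wrapper around the upload shown earlier):

```typescript
// Local storage has already succeeded by this point; an upload
// failure is logged but never fails the benchmark itself.
try {
  await uploadResultToHub(result); // hypothetical wrapper around uploadFile
} catch (err) {
  console.error(
    `❌ Failed to upload benchmark ${result.id} to HF Dataset:`,
    err instanceof Error ? err.message : err,
  );
}
```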
## API Endpoint (Future)

Currently, uploads happen automatically. In the future, endpoints like these could be added:

```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```
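If these were added, a handler might look like the following (purely hypothetical, matching the routes above; `loadLocalResult` and `uploadResultToHub` are assumed helpers):

```typescript
import express from "express";

const app = express();

// Hypothetical: re-upload a single stored result on demand.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const result = await loadLocalResult(req.params.id); // assumed local lookup
  if (!result) {
    res.status(404).json({ error: "unknown benchmark id" });
    return;
  }
  await uploadResultToHub(result);
  res.json({ uploaded: true });
});
```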
## Development vs Production

Use different dataset repositories for development and production:

**Development** (`.env`):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```

**Production** (deployed environment):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```

This allows testing without polluting production data.