# Hugging Face Dataset Integration
The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.
## Features
- **Automatic Upload**: Results are automatically pushed to HF Dataset when benchmarks complete
- **File Structure Preservation**: Mirrors the local storage path structure: `{task}/{org}/{model}/{params}.json`
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded to track issues
## Setup
### 1. Create a Hugging Face Dataset
1. Go to https://huggingface.co/new-dataset
2. Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
3. Keep it public or private based on your needs
### 2. Get Your HF Token
1. Go to https://huggingface.co/settings/tokens
2. Create a new token with `write` permissions
3. Copy the token
### 3. Configure Environment Variables
Create or update the `.env` file in the `bench` directory:
```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results
# Optional: Server port
PORT=7860
```
**Important**: Never commit `.env` to git. It's already in `.gitignore`.
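For reference, here's a minimal sketch (in TypeScript, with illustrative names) of how the server might read these variables and decide whether uploading is enabled; the actual implementation may differ:
```ts
// Minimal sketch: gate HF uploads on the two required env vars.
// Assumes dotenv loads the .env file shown above.
import "dotenv/config";

const repo = process.env.HF_DATASET_REPO;
const token = process.env.HF_TOKEN;
const uploadEnabled = Boolean(repo && token);

console.log(
  uploadEnabled
    ? `πŸ“€ HF Dataset upload enabled: ${repo}`
    : "πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)",
);
```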
## Usage
Once configured, the server will automatically upload results:
```bash
# Start the server
npm run server
# You should see:
# πŸ“€ HF Dataset upload enabled: username/transformersjs-benchmark-results
```
When benchmarks complete, you'll see:
```
βœ… Completed: abc-123 in 5.2s
βœ“ Benchmark abc-123 saved to file
βœ“ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```
## File Structure in HF Dataset
The dataset will have the same structure as local storage:
```
feature-extraction/
β”œβ”€β”€ Xenova/
β”‚   β”œβ”€β”€ all-MiniLM-L6-v2/
β”‚   β”‚   β”œβ”€β”€ node_warm_cpu_fp32_b1.json
β”‚   β”‚   β”œβ”€β”€ node_warm_webgpu_fp16_b1.json
β”‚   β”‚   └── web_warm_wasm_b1_chromium.json
β”‚   └── distilbert-base-uncased/
β”‚       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```
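The path for each result can be derived mechanically from the benchmark configuration. A sketch of that mapping, with the slug format inferred from the node-style filenames above (web filenames differ: no dtype, plus a trailing browser name):
```ts
// Build the node-style params slug, e.g. "node_warm_cpu_fp32_b1".
// Inferred from the example filenames; web results use a different scheme.
function paramsSlug(platform: string, mode: string, device: string, dtype: string, batchSize: number): string {
  return [platform, mode, device, dtype, `b${batchSize}`].join("_");
}

// Assemble the remote path {task}/{org}/{model}/{params}.json.
function resultPath(task: string, modelId: string, slug: string): string {
  return `${task}/${modelId}/${slug}.json`; // modelId is already "org/model"
}

resultPath("feature-extraction", "Xenova/all-MiniLM-L6-v2", paramsSlug("node", "warm", "cpu", "fp32", 1));
// => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
```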
## JSON Format
Each file contains a single benchmark result (not multiple runs):
```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```
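Transcribed as a TypeScript shape (field names come straight from the example; the nested types and the `"cold"` mode value are assumptions):
```ts
interface BenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;                 // "org/model", e.g. "Xenova/all-MiniLM-L6-v2"
  task: string;                    // e.g. "feature-extraction"
  mode: "warm" | "cold";           // only "warm" appears in the examples
  repeats: number;
  dtype: string;                   // e.g. "fp32", "fp16"
  batchSize: number;
  device: string;                  // e.g. "cpu", "webgpu", "wasm"
  timestamp: number;               // Unix epoch (units not shown in the example)
  status: "completed" | "failed";
  error?: string;                  // present on failed runs (see below)
  result: {
    metrics?: Record<string, unknown>;
    environment?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
  };
}
```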
## Behavior
### Overwriting Results
- Each benchmark configuration maps to a single file
- New results **overwrite** the existing file
- Only the **latest** result is kept per configuration
- This ensures the dataset always has current data
### Local vs Remote Storage
- **Local (JSONL)**: Keeps history of all runs (append-only)
- **Remote (JSON)**: Keeps only latest result (overwrite)
This dual approach allows:
- Local: Full history for analysis
- Remote: Clean, current results for leaderboards
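A minimal sketch of the two write paths (helper names are illustrative; `BenchmarkResult` is the shape sketched above):
```ts
import { appendFile } from "node:fs/promises";

// Local: append-only JSONL, one compact line per run -> full history.
async function saveLocally(result: BenchmarkResult, jsonlPath: string): Promise<void> {
  await appendFile(jsonlPath, JSON.stringify(result) + "\n");
}

// Remote: one pretty-printed JSON document per configuration. Uploading it
// to a fixed path overwrites the previous file, keeping only the latest run.
function toRemoteJson(result: BenchmarkResult): string {
  return JSON.stringify(result, null, 2);
}
```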
### Failed Benchmarks
Failed benchmarks are also uploaded to track:
- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context
Example failed result:
```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```
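How a raw error message gets mapped to an error `type` isn't shown here; a hypothetical classifier might look like this (patterns and type names are illustrative):
```ts
// Hypothetical helper: bucket raw benchmark errors into coarse types.
function classifyError(message: string): "memory_error" | "load_error" | "unknown" {
  if (message.includes("Aborted(") || /out of memory/i.test(message)) {
    return "memory_error";
  }
  if (/load|fetch/i.test(message)) {
    return "load_error";
  }
  return "unknown";
}
```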
## Git Commits
Each upload creates a git commit in the dataset repository with a message like:
```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)
Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
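A sketch of what such an upload could look like, assuming the server uses `uploadFile` from `@huggingface/hub` (the actual implementation isn't shown here):
```ts
import { uploadFile } from "@huggingface/hub";

// Upload one result, overwriting any existing file at `path` and producing
// a commit message in the format shown above.
async function uploadToHFDataset(result: BenchmarkResult, path: string): Promise<void> {
  await uploadFile({
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN!,
    file: { path, content: new Blob([JSON.stringify(result, null, 2)]) },
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
    commitDescription: [
      `Benchmark ID: ${result.id}`,
      `Status: ${result.status}`,
      `Timestamp: ${new Date(result.timestamp).toISOString()}`,
    ].join("\n"),
  });
}
```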
## Disabling Upload
To disable HF Dataset upload:
1. Remove `HF_TOKEN` from `.env`, or
2. Remove both `HF_DATASET_REPO` and `HF_TOKEN`
The server will show:
```
πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```
## Error Handling
If the HF upload fails:
- The error is logged, but the benchmark itself does not fail
- Local storage still succeeds
- You can retry manually or fix the configuration
Example error:
```
βœ— Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```
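In code, this amounts to a try/catch that logs and swallows the upload error. A sketch, reusing the `uploadToHFDataset` helper sketched above:
```ts
async function tryUpload(result: BenchmarkResult, path: string): Promise<void> {
  try {
    await uploadToHFDataset(result, path); // helper from the sketch above
    console.log(`βœ“ Uploaded to HF Dataset: ${path}`);
  } catch (err) {
    // Log but don't rethrow: the result is already saved locally, and an
    // upload failure must not fail the benchmark itself.
    console.error(`βœ— Failed to upload benchmark ${result.id} to HF Dataset:`, err);
  }
}
```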
## API Endpoint (Future)
Currently, uploads happen automatically. In the future, we could add:
```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload
# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```
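If such endpoints were added, a handler could be a thin wrapper over the existing helpers. A purely hypothetical Express sketch (the route, `loadLocalResult`, and `pathFor` are all illustrative, not part of the current server):
```ts
import express from "express";

const app = express();

// Hypothetical: re-upload a single locally stored result on demand.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const result = await loadLocalResult(req.params.id); // hypothetical lookup helper
  if (!result) {
    return res.status(404).json({ error: "benchmark not found" });
  }
  await uploadToHFDataset(result, pathFor(result)); // hypothetical path helper
  res.json({ uploaded: true });
});
```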
## Development vs Production
Use different dataset repositories for development and production:
**Development** (`.env`):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```
**Production** (deployed environment):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```
This allows testing without polluting production data.