# Hugging Face Dataset Integration
The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.
## Features
- **Automatic Upload**: Results are automatically pushed to HF Dataset when benchmarks complete
- **File Structure Preservation**: Mirrors the local storage path structure: `{task}/{org}/{model}/{params}.json`
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded to track issues
## Setup
### 1. Create a Hugging Face Dataset
1. Go to https://huggingface.co/new-dataset
2. Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
3. Keep it public or private based on your needs
### 2. Get Your HF Token
1. Go to https://huggingface.co/settings/tokens
2. Create a new token with `write` permissions
3. Copy the token
### 3. Configure Environment Variables
Create or update the `.env` file in the `bench` directory:
```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results
# Optional: Server port
PORT=7860
```
**Important**: Never commit `.env` to git. It's already in `.gitignore`.
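For reference, here's a minimal sketch (in TypeScript, with illustrative names) of how the server might read these variables and decide whether uploading is enabled; the actual implementation may differ:
```ts
// Minimal sketch: gate HF uploads on the two required env vars.
// Assumes dotenv loads the .env file shown above.
import "dotenv/config";

const repo = process.env.HF_DATASET_REPO;
const token = process.env.HF_TOKEN;
const uploadEnabled = Boolean(repo && token);

console.log(
  uploadEnabled
    ? `πŸ“€ HF Dataset upload enabled: ${repo}`
    : "πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)",
);
```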
## Usage
Once configured, the server will automatically upload results:
```bash
# Start the server
npm run server
# You should see:
# πŸ“€ HF Dataset upload enabled: username/transformersjs-benchmark-results
```
When benchmarks complete, you'll see:
```
βœ… Completed: abc-123 in 5.2s
βœ“ Benchmark abc-123 saved to file
βœ“ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```
## File Structure in HF Dataset
The dataset will have the same structure as local storage:
```
feature-extraction/
β”œβ”€β”€ Xenova/
β”‚   β”œβ”€β”€ all-MiniLM-L6-v2/
β”‚   β”‚   β”œβ”€β”€ node_warm_cpu_fp32_b1.json
β”‚   β”‚   β”œβ”€β”€ node_warm_webgpu_fp16_b1.json
β”‚   β”‚   └── web_warm_wasm_b1_chromium.json
β”‚   └── distilbert-base-uncased/
β”‚       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```
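The path for each result can be derived mechanically from the benchmark configuration. A sketch of that mapping, with the slug format inferred from the node-style filenames above (web filenames differ: no dtype, plus a trailing browser name):
```ts
// Build the node-style params slug, e.g. "node_warm_cpu_fp32_b1".
// Inferred from the example filenames; web results use a different scheme.
function paramsSlug(platform: string, mode: string, device: string, dtype: string, batchSize: number): string {
  return [platform, mode, device, dtype, `b${batchSize}`].join("_");
}

// Assemble the remote path {task}/{org}/{model}/{params}.json.
function resultPath(task: string, modelId: string, slug: string): string {
  return `${task}/${modelId}/${slug}.json`; // modelId is already "org/model"
}

resultPath("feature-extraction", "Xenova/all-MiniLM-L6-v2", paramsSlug("node", "warm", "cpu", "fp32", 1));
// => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
```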
## JSON Format
Each file contains a single benchmark result (not multiple runs):
```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```
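Transcribed as a TypeScript shape (field names come straight from the example; the nested types and the `"cold"` mode value are assumptions):
```ts
interface BenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;                 // "org/model", e.g. "Xenova/all-MiniLM-L6-v2"
  task: string;                    // e.g. "feature-extraction"
  mode: "warm" | "cold";           // only "warm" appears in the examples
  repeats: number;
  dtype: string;                   // e.g. "fp32", "fp16"
  batchSize: number;
  device: string;                  // e.g. "cpu", "webgpu", "wasm"
  timestamp: number;               // Unix epoch (units not shown in the example)
  status: "completed" | "failed";
  error?: string;                  // present on failed runs (see below)
  result: {
    metrics?: Record<string, unknown>;
    environment?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
  };
}
```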
## Behavior
### Overwriting Results
- Each benchmark configuration maps to a single file
- New results **overwrite** the existing file
- Only the **latest** result is kept per configuration
- This ensures the dataset always has current data
### Local vs Remote Storage
- **Local (JSONL)**: Keeps history of all runs (append-only)
- **Remote (JSON)**: Keeps only latest result (overwrite)
This dual approach allows:
- Local: Full history for analysis
- Remote: Clean, current results for leaderboards
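A minimal sketch of the two write paths (helper names are illustrative; `BenchmarkResult` is the shape sketched above):
```ts
import { appendFile } from "node:fs/promises";

// Local: append-only JSONL, one compact line per run -> full history.
async function saveLocally(result: BenchmarkResult, jsonlPath: string): Promise<void> {
  await appendFile(jsonlPath, JSON.stringify(result) + "\n");
}

// Remote: one pretty-printed JSON document per configuration. Uploading it
// to a fixed path overwrites the previous file, keeping only the latest run.
function toRemoteJson(result: BenchmarkResult): string {
  return JSON.stringify(result, null, 2);
}
```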
### Failed Benchmarks
Failed benchmarks are also uploaded to track:
- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context
Example failed result:
```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```
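How a raw error message gets mapped to an error `type` isn't shown here; a hypothetical classifier might look like this (patterns and type names are illustrative):
```ts
// Hypothetical helper: bucket raw benchmark errors into coarse types.
function classifyError(message: string): "memory_error" | "load_error" | "unknown" {
  if (message.includes("Aborted(") || /out of memory/i.test(message)) {
    return "memory_error";
  }
  if (/load|fetch/i.test(message)) {
    return "load_error";
  }
  return "unknown";
}
```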
## Git Commits
Each upload creates a git commit in the dataset repository with a message like:
```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)
Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
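A sketch of what such an upload could look like, assuming the server uses `uploadFile` from `@huggingface/hub` (the actual implementation isn't shown here):
```ts
import { uploadFile } from "@huggingface/hub";

// Upload one result, overwriting any existing file at `path` and producing
// a commit message in the format shown above.
async function uploadToHFDataset(result: BenchmarkResult, path: string): Promise<void> {
  await uploadFile({
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN!,
    file: { path, content: new Blob([JSON.stringify(result, null, 2)]) },
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
    commitDescription: [
      `Benchmark ID: ${result.id}`,
      `Status: ${result.status}`,
      `Timestamp: ${new Date(result.timestamp).toISOString()}`,
    ].join("\n"),
  });
}
```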
## Disabling Upload
To disable HF Dataset upload:
1. Remove `HF_TOKEN` from `.env`, or
2. Remove both `HF_DATASET_REPO` and `HF_TOKEN`
The server will show:
```
πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```
## Error Handling
If the HF upload fails:
- The error is logged, but the benchmark itself does not fail
- Local storage still succeeds
- You can retry manually or fix the configuration
Example error:
```
βœ— Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```
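In code, this amounts to a try/catch that logs and swallows the upload error. A sketch, reusing the `uploadToHFDataset` helper sketched above:
```ts
async function tryUpload(result: BenchmarkResult, path: string): Promise<void> {
  try {
    await uploadToHFDataset(result, path); // helper from the sketch above
    console.log(`βœ“ Uploaded to HF Dataset: ${path}`);
  } catch (err) {
    // Log but don't rethrow: the result is already saved locally, and an
    // upload failure must not fail the benchmark itself.
    console.error(`βœ— Failed to upload benchmark ${result.id} to HF Dataset:`, err);
  }
}
```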
## API Endpoint (Future)
Currently, uploads happen automatically. In the future, we could add:
```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload
# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```
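If such endpoints were added, a handler could be a thin wrapper over the existing helpers. A purely hypothetical Express sketch (the route, `loadLocalResult`, and `pathFor` are all illustrative, not part of the current server):
```ts
import express from "express";

const app = express();

// Hypothetical: re-upload a single locally stored result on demand.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const result = await loadLocalResult(req.params.id); // hypothetical lookup helper
  if (!result) {
    return res.status(404).json({ error: "benchmark not found" });
  }
  await uploadToHFDataset(result, pathFor(result)); // hypothetical path helper
  res.json({ uploaded: true });
});
```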
## Development vs Production
Use different dataset repositories for development and production:
**Development** (`.env`):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```
**Production** (deployed environment):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```
This allows testing without polluting production data.