sachin sharma committed · 4f88f85
1 parent: d136e15
added test case generation

Files changed:
- README.md +177 -2
- requirements.in +6 -0
- requirements.txt +27 -8
README.md CHANGED

@@ -43,9 +43,12 @@ ml-inference-service/
 ├── models/
 │   └── resnet-18/                 # Sample HF-style model folder
 ├── scripts/
-│   …
+│   ├── model_download.bash        # One-liner to snapshot HF weights locally
+│   ├── generate_test_datasets.py  # Generate PyArrow datasets for testing
+│   ├── test_datasets.py           # Test generated datasets against API
+│   └── test_datasets/             # Generated PyArrow test datasets (100 files)
 ├── requirements.in / requirements.txt
-└── test_main.http
+└── test_main.http                 # Example request you can run from IDEs
 ```
 
 ---

@@ -262,3 +265,175 @@ Then set `MODEL_NAME=your-org/your-model` in your environment (Pydantic will map
 - **Prod**: Use a process manager (e.g., `gunicorn -k uvicorn.workers.UvicornWorker`) and add health checks.
 - **Containerize**: Copy only `requirements.txt` and source, install wheels, and bake the `models/` folder into the image or mount it as a volume.
 - **CPU vs GPU**: This example uses CPU by default. If you have CUDA, install a CUDA-enabled PyTorch build and set device placement in your service.
|
| 268 |
+
|
| 269 |
+
---
|
| 270 |
+
|
| 271 |
+
## π§ͺ PyArrow Test Datasets
|
| 272 |
+
|
| 273 |
+
This project includes a comprehensive **PyArrow-based dataset generation system** designed specifically for academic challenges and ML model validation. The system generates **100 standardized test datasets** that allow participants to validate their models against consistent, reproducible test cases.

### 🗂️ Why Both? `.parquet` + `_metadata.json`

```
standard_test_001.parquet          # Actual test data (images, requests, responses)
standard_test_001_metadata.json    # Human-readable description and stats
```
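
The sidecar convention above can be illustrated with a short sketch. The metadata field names used here (`dataset`, `description`, `num_samples`) are illustrative assumptions, not the actual schema produced by `generate_test_datasets.py`:

```python
import json
from pathlib import Path

def write_metadata(parquet_path: str, description: str, num_samples: int) -> Path:
    """Write the human-readable JSON sidecar next to a .parquet file."""
    p = Path(parquet_path)
    meta_path = p.with_name(p.stem + "_metadata.json")
    meta_path.write_text(json.dumps({
        "dataset": p.name,            # hypothetical fields -- for illustration only
        "description": description,
        "num_samples": num_samples,
    }, indent=2))
    return meta_path

path = write_metadata("standard_test_001.parquet", "baseline checks", 10)
loaded = json.loads(path.read_text())
```

The `_metadata.json` suffix keeps each Parquet file's description greppable without opening the binary file itself.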

### 📊 Dataset Categories (25 each = 100 total)

#### 1. **Standard Test Cases** (`standard_test_*.parquet`)
**Purpose**: Baseline functionality validation

**Content**: Normal images with expected successful predictions

- **Image Types**: Random patterns, geometric shapes, gradients, text overlays, solid colors
- **Formats**: JPEG and PNG with proper MIME types
- **Sizes**: 224x224, 256x256, 299x299, 384x384 (common ML input sizes)
- **Expected Behavior**: HTTP 200 responses with a valid prediction structure

#### 2. **Edge Case Tests** (`edge_case_*.parquet`)
**Purpose**: Robustness and error-handling validation

**Content**: Challenging scenarios that test model resilience

- **Tiny Images**: 32x32 and 1x1 pixels (tests preprocessing robustness)
- **Huge Images**: 2048x2048 (tests memory management and resizing)
- **Extreme Aspect Ratios**: 1000x50 (tests preprocessing assumptions)
- **Corrupted Data**: Invalid base64 and malformed requests (tests error handling)
- **Expected Behavior**: Graceful degradation and proper error responses
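
As an illustration of the corrupted-data cases, strict base64 validation can reject a malformed payload before it ever reaches the model. This is a hedged sketch, not the service's actual error-handling code:

```python
import base64
import binascii

def decode_image_payload(data: str) -> bytes:
    """Strictly decode the request's base64 image field."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 image data: {exc}") from exc

valid = base64.b64encode(b"\xff\xd8\xff").decode()  # JPEG magic bytes, encoded
ok = decode_image_payload(valid)

try:
    decode_image_payload("not!!valid@@base64")       # a corrupted edge-case input
    rejected = False
except ValueError:
    rejected = True
```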

#### 3. **Performance Benchmarks** (`performance_test_*.parquet`)
**Purpose**: Latency and throughput measurement

**Content**: Varying batch sizes for performance profiling

- **Batch Sizes**: 1, 5, 10, 25, 50, 100 images per test
- **Latency Tracking**: Expected maximum response times based on batch size
- **Throughput Metrics**: Requests per second under different loads
- **Expected Behavior**: Consistent performance within acceptable bounds
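
Latency and throughput summaries like these can be aggregated from per-request timings with the standard library; the metric names below are illustrative, not the project's exact output keys:

```python
import statistics

def summarize_latencies(latencies_ms: list) -> dict:
    """Aggregate per-request latencies (ms) into summary metrics."""
    total_s = sum(latencies_ms) / 1000.0
    return {
        "avg_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        # Throughput if the requests were issued back to back
        "requests_per_sec": len(latencies_ms) / total_s,
    }

stats = summarize_latencies([100.0, 120.0, 80.0, 100.0])
```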

#### 4. **Model Comparison** (`model_comparison_*.parquet`)
**Purpose**: Cross-model validation and benchmarking

**Content**: Identical inputs tested across different model architectures

- **Model Types**: ResNet-18/50, ViT, ConvNeXt, Swin Transformer
- **Consistent Inputs**: Same 10 base images per dataset
- **Comparative Analysis**: Enables direct performance comparison between models
- **Expected Behavior**: Architecture-specific but structurally consistent responses

### 🛠️ Generation Process

The dataset generation follows a **deterministic, reproducible approach**:

#### Step 1: Synthetic Image Creation
```python
import numpy as np

# Why synthetic images instead of real photos?
# 1. Copyright-free for academic distribution
# 2. Programmatically generated edge cases

def create_synthetic_image(width, height, image_type):
    if image_type == "random":
        # RGB noise - tests model noise robustness
        array = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
    elif image_type == "geometric":
        # Shapes and patterns - tests feature detection
        # ... geometric pattern generation
        ...
    # ... other synthetic types
```
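
The `random` branch can be fleshed out into a small runnable function. The seeded `default_rng` here is an assumption added to demonstrate the reproducibility claim; the actual script may seed differently:

```python
import numpy as np

def make_random_image(width: int, height: int, seed: int = 0) -> np.ndarray:
    """Deterministic RGB noise image with shape (height, width, 3), dtype uint8."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible pixels
    return rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

img_a = make_random_image(224, 224, seed=42)
img_b = make_random_image(224, 224, seed=42)  # same seed -> identical image
```

Reproducibility matters here because every participant must receive byte-identical test datasets.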

#### Step 2: API Request Structure Generation
```python
# Matches the exact API format for drop-in testing
{
    "image": {
        "mediaType": "image/jpeg",        # Proper MIME type
        "data": "<base64-encoded-image>"  # Standard base64 encoding
    }
}
```
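
Assembling such a request from raw image bytes takes only the standard library; `build_inference_request` is a hypothetical helper shown for illustration, not part of the project code:

```python
import base64
import json

def build_inference_request(image_bytes: bytes, media_type: str = "image/jpeg") -> str:
    """Serialize raw image bytes into the API's JSON request body."""
    return json.dumps({
        "image": {
            "mediaType": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    })

body = json.loads(build_inference_request(b"\xff\xd8\xff\xe0"))
```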

#### Step 3: Expected Response Generation
```python
# Realistic prediction responses with the proper structure
{
    "prediction": "tiger_cat",        # ImageNet-style label
    "confidence": 0.8742,             # Realistic confidence score
    "predicted_label": 282,           # Numeric label index
    "model": "microsoft/resnet-18",   # Model identification
    "mediaType": "image/jpeg"         # Echoes the input format
}
```
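
Because confidence values vary from run to run, one plausible way to validate a response is a structural check on keys and types rather than exact numbers. `response_matches_schema` is an illustrative helper, not the project's validator:

```python
# Expected key -> type pairs, taken from the response fields shown above
EXPECTED_TYPES = {
    "prediction": str,
    "confidence": float,
    "predicted_label": int,
    "model": str,
    "mediaType": str,
}

def response_matches_schema(response: dict) -> bool:
    """True if every expected key is present with the expected type."""
    return all(
        key in response and isinstance(response[key], expected)
        for key, expected in EXPECTED_TYPES.items()
    )

good = response_matches_schema({
    "prediction": "tiger_cat",
    "confidence": 0.8742,
    "predicted_label": 282,
    "model": "microsoft/resnet-18",
    "mediaType": "image/jpeg",
})
bad = response_matches_schema({"prediction": "tiger_cat"})  # missing keys
```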

#### Step 4: PyArrow Table Creation
```python
# Columnar storage for efficient querying
table = pa.table({
    "dataset_id": [...],         # Unique dataset identifier
    "image_id": [...],           # Individual image identifier
    "api_request": [...],        # JSON-serialized requests
    "expected_response": [...],  # JSON-serialized expected responses
    "test_category": [...],      # Category classification
    "difficulty": [...],         # Complexity indicator
    # ... additional metadata columns
})
```

### 🚀 Usage Guide

**1. Generate Test Datasets**
```bash
# Create all 100 datasets (~2-5 minutes depending on hardware)
python scripts/generate_test_datasets.py

# What this creates:
# - scripts/test_datasets/*.parquet (actual test data)
# - scripts/test_datasets/*_metadata.json (human-readable info)
# - scripts/test_datasets/datasets_summary.json (overview)
```

**2. Validate API**
```bash
# Start your ML service
uvicorn main:app --reload

# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation (all samples)
python scripts/test_datasets.py

# Category-specific testing
python scripts/test_datasets.py --category edge_case
python scripts/test_datasets.py --category performance
```

### 📈 Testing Output and Metrics

The test runner provides comprehensive validation metrics:

```
📊 DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  Min latency: 45.2ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success

Failed datasets: edge_case_023, edge_case_019, performance_012
```
requirements.in CHANGED

@@ -14,3 +14,9 @@ python-multipart==0.0.6
 transformers>=4.35.0
 torch>=2.4.0  # Newer PyTorch with NumPy 2.x support
 pillow>=10.0.0
+
+# Dataset generation and testing
+pyarrow>=14.0.0
+numpy>=1.24.0
+pandas>=2.0.0
+requests>=2.25.0
requirements.txt CHANGED

@@ -1,5 +1,9 @@
-#
-#
+#
+# This file is autogenerated by pip-compile with Python 3.12
+# by the following command:
+#
+#    pip-compile requirements.in
+#
 annotated-types==0.7.0
     # via pydantic
 anyio==3.7.1

@@ -47,7 +51,10 @@ mpmath==1.3.0
 networkx==3.5
     # via torch
 numpy==2.3.2
-    # via
+    # via
+    #   -r requirements.in
+    #   pandas
+    #   transformers
 nvidia-cublas-cu12==12.8.4.1
     # via
     #   nvidia-cudnn-cu12

@@ -89,8 +96,12 @@ packaging==25.0
     # via
     #   huggingface-hub
     #   transformers
+pandas==2.3.2
+    # via -r requirements.in
 pillow==10.1.0
     # via -r requirements.in
+pyarrow==21.0.0
+    # via -r requirements.in
 pydantic==2.5.0
     # via
     #   -r requirements.in

@@ -100,6 +111,8 @@ pydantic-core==2.14.1
     # via pydantic
 pydantic-settings==2.0.3
     # via -r requirements.in
+python-dateutil==2.9.0.post0
+    # via pandas
 python-dotenv==0.21.0
     # via
     #   -r requirements.in

@@ -107,6 +120,8 @@ python-dotenv==0.21.0
     #   uvicorn
 python-multipart==0.0.6
     # via -r requirements.in
+pytz==2025.2
+    # via pandas
 pyyaml==6.0.2
     # via
     #   huggingface-hub

@@ -116,14 +131,13 @@ regex==2025.7.34
     # via transformers
 requests==2.32.5
     # via
+    #   -r requirements.in
     #   huggingface-hub
     #   transformers
 safetensors==0.6.2
     # via transformers
-
-    # via
-    #   torch
-    #   triton
+six==1.17.0
+    # via python-dateutil
 sniffio==1.3.1
     # via anyio
 starlette==0.27.0

@@ -149,9 +163,11 @@ typing-extensions==4.15.0
     #   pydantic
     #   pydantic-core
     #   torch
+tzdata==2025.2
+    # via pandas
 urllib3==2.5.0
     # via requests
-uvicorn==0.24.0
+uvicorn[standard]==0.24.0
     # via -r requirements.in
 uvloop==0.21.0
     # via uvicorn

@@ -159,3 +175,6 @@ watchfiles==1.1.0
     # via uvicorn
 websockets==15.0.1
     # via uvicorn
+
+# The following packages are considered to be unsafe in a requirements file:
+# setuptools
|