sachin sharma committed
Commit 4f88f85 Β· 1 Parent(s): d136e15

added test case generation

Files changed (3)
  1. README.md +177 -2
  2. requirements.in +6 -0
  3. requirements.txt +27 -8
README.md CHANGED
@@ -43,9 +43,12 @@ ml-inference-service/
 β”œβ”€ models/
 β”‚ └─ resnet-18/ # Sample HF-style model folder
 β”œβ”€ scripts/
- β”‚ └─ model_download.bash # One-liner to snapshot HF weights locally
+ β”‚ β”œβ”€ model_download.bash # One-liner to snapshot HF weights locally
+ β”‚ β”œβ”€ generate_test_datasets.py # Generate PyArrow datasets for testing
+ β”‚ β”œβ”€ test_datasets.py # Test generated datasets against API
+ β”‚ └─ test_datasets/ # Generated PyArrow test datasets (100 files)
 β”œβ”€ requirements.in / requirements.txt
 └─ test_main.http # Example request you can run from IDEs
 ```

 ---
@@ -262,3 +265,175 @@ Then set `MODEL_NAME=your-org/your-model` in your environment (Pydantic will map
 - **Prod**: Use a process manager (e.g., `gunicorn -k uvicorn.workers.UvicornWorker`) and add health checks.
 - **Containerize**: Copy only `requirements.txt` and source, install wheels, and bake the `models/` folder into the image or mount it as a volume.
 - **CPU vs GPU**: This example uses CPU by default. If you have CUDA, install a CUDA-enabled PyTorch build and set device placement in your service.
+
+ ---
+
+ ## πŸ§ͺ PyArrow Test Datasets
+
+ This project includes a **PyArrow-based dataset generation system** built for academic challenges and ML model validation. It generates **100 standardized test datasets** so participants can validate their models against consistent, reproducible test cases.
+
+ ### πŸ—οΈ Why Both? `.parquet` + `_metadata.json`
+ Each dataset ships as a pair: the Parquet file holds the machine-readable test data, while the JSON sidecar describes it for humans.
+ ```
+ standard_test_001.parquet # Actual test data (images, requests, responses)
+ standard_test_001_metadata.json # Human-readable description and stats
+ ```
+
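For a first look at a generated pair, here is a minimal sketch that reads the table with `pyarrow` and the sidecar with the standard library (paths follow the `scripts/test_datasets/` layout above; the sidecar's exact keys are illustrative):

```python
import json
import pyarrow.parquet as pq

# Load the columnar test data.
table = pq.read_table("scripts/test_datasets/standard_test_001.parquet")
print(table.num_rows, table.column_names)

# Load the human-readable sidecar that describes the dataset.
with open("scripts/test_datasets/standard_test_001_metadata.json") as f:
    metadata = json.load(f)
print(json.dumps(metadata, indent=2))
```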
+ ### πŸ“Š Dataset Categories (25 each = 100 total)
+
+ #### 1. **Standard Test Cases** (`standard_test_*.parquet`)
+ **Purpose**: Baseline functionality validation
+
+ **Content**: Normal images with expected successful predictions
+
+ - **Image Types**: Random patterns, geometric shapes, gradients, text overlays, solid colors
+ - **Formats**: JPEG, PNG with proper MIME types
+ - **Sizes**: 224x224, 256x256, 299x299, 384x384 (common ML input sizes)
+ - **Expected Behavior**: HTTP 200 responses with valid prediction structure
+
+ #### 2. **Edge Case Tests** (`edge_case_*.parquet`)
+ **Purpose**: Robustness and error handling validation
+
+ **Content**: Challenging scenarios that test model resilience
+
+ - **Tiny Images**: 32x32, 1x1 pixels (tests preprocessing robustness)
+ - **Huge Images**: 2048x2048 (tests memory management and resizing)
+ - **Extreme Aspect Ratios**: 1000x50 (tests preprocessing assumptions)
+ - **Corrupted Data**: Invalid base64, malformed requests (tests error handling)
+ - **Expected Behavior**: Graceful degradation, proper error responses
+
+ #### 3. **Performance Benchmarks** (`performance_test_*.parquet`)
+ **Purpose**: Latency and throughput measurement
+
+ **Content**: Varying batch sizes for performance profiling
+
+ - **Batch Sizes**: 1, 5, 10, 25, 50, 100 images per test
+ - **Latency Tracking**: Expected max response times based on batch size
+ - **Throughput Metrics**: Requests per second under different loads
+ - **Expected Behavior**: Consistent performance within acceptable bounds
+
+ #### 4. **Model Comparison** (`model_comparison_*.parquet`)
+ **Purpose**: Cross-model validation and benchmarking
+
+ **Content**: Identical inputs tested across different model architectures
+
+ - **Model Types**: ResNet-18/50, ViT, ConvNeXt, Swin Transformer
+ - **Consistent Inputs**: Same 10 base images per dataset
+ - **Comparative Analysis**: Enables direct performance comparison between models
+ - **Expected Behavior**: Architecture-specific but structurally consistent responses
+
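Because the category is encoded in each file name, bucketing the 100 files takes only a few lines; a sketch using the prefixes listed above:

```python
from pathlib import Path

# Map e.g. "edge_case_007.parquet" -> "edge_case" by stripping the numeric suffix.
by_category: dict[str, list[Path]] = {}
for path in Path("scripts/test_datasets").glob("*.parquet"):
    category = path.stem.rsplit("_", 1)[0]
    by_category.setdefault(category, []).append(path)

for category, files in sorted(by_category.items()):
    print(f"{category}: {len(files)} datasets")
```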
+ ### πŸ› οΈ Generation Process
+
+ The dataset generation follows a **deterministic, reproducible approach**:
+
+ #### Step 1: Synthetic Image Creation
+ ```python
+ # Why synthetic images instead of real photos?
+ # 1. Copyright-free for academic distribution
+ # 2. Programmatically generated edge cases
+
+ import numpy as np
+ from PIL import Image
+
+ def create_synthetic_image(width, height, image_type):
+     if image_type == "random":
+         # RGB noise - tests model noise robustness
+         array = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
+     elif image_type == "geometric":
+         # Shapes and patterns - tests feature detection
+         ...  # geometric pattern generation
+     # ... other synthetic types
+     return Image.fromarray(array)
+ ```
+
+ #### Step 2: API Request Structure Generation
+ ```python
+ # Matches exact API format for drop-in testing
+ {
+     "image": {
+         "mediaType": "image/jpeg",       # Proper MIME types
+         "data": "<base64-encoded-image>" # Standard encoding
+     }
+ }
+ ```
+
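Producing that envelope from raw image bytes is a one-liner with the standard library; a sketch (the helper name is illustrative):

```python
import base64

def build_request(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in the request envelope shown above."""
    return {
        "image": {
            "mediaType": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }
```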
+ #### Step 3: Expected Response Generation
+ ```python
+ # Realistic prediction responses with proper structure
+ {
+     "prediction": "tiger_cat",      # ImageNet-style labels
+     "confidence": 0.8742,           # Realistic confidence scores
+     "predicted_label": 282,         # Numeric label indices
+     "model": "microsoft/resnet-18", # Model identification
+     "mediaType": "image/jpeg"       # Echo input format
+ }
+ ```
+
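A comparison against such an expected response would typically be strict about structure but loose about scores, since confidences drift across hardware and library versions. A hypothetical checker (not the repo's actual test logic; the tolerance is illustrative):

```python
EXPECTED_KEYS = {"prediction", "confidence", "predicted_label", "model", "mediaType"}

def roughly_matches(actual: dict, expected: dict, tol: float = 0.05) -> bool:
    # All structural fields must be present and the label must agree;
    # the confidence score only has to land within a tolerance band.
    if not EXPECTED_KEYS.issubset(actual.keys()):
        return False
    if actual["prediction"] != expected["prediction"]:
        return False
    return abs(actual["confidence"] - expected["confidence"]) <= tol
```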
+ #### Step 4: PyArrow Table Creation
+ ```python
+ # Columnar storage for efficient querying
+ table = pa.table({
+     "dataset_id": [...],        # Unique dataset identifier
+     "image_id": [...],          # Individual image identifier
+     "api_request": [...],       # JSON-serialized requests
+     "expected_response": [...], # JSON-serialized expected responses
+     "test_category": [...],     # Category classification
+     "difficulty": [...],        # Complexity indicator
+     # ... additional metadata columns
+ })
+ ```
+
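Writing such a table out and reading a single column back with `pyarrow.parquet` looks like this (the column values are placeholder strings, not real payloads):

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "dataset_id": ["standard_test_001", "standard_test_001"],
    "image_id": ["img_000", "img_001"],
    "api_request": ["{...}", "{...}"],        # JSON strings in practice
    "expected_response": ["{...}", "{...}"],  # JSON strings in practice
    "test_category": ["standard", "standard"],
    "difficulty": ["easy", "easy"],
})
pq.write_table(table, "standard_test_001.parquet")

# The columnar layout lets you read one column without touching the rest.
ids = pq.read_table("standard_test_001.parquet", columns=["image_id"])
print(ids.column("image_id").to_pylist())  # ['img_000', 'img_001']
```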
+ ### πŸš€ Usage Guide
+
+ **1. Generate Test Datasets**
+ ```bash
+ # Create all 100 datasets (~2-5 minutes depending on hardware)
+ python scripts/generate_test_datasets.py
+
+ # What this creates:
+ # - scripts/test_datasets/*.parquet (actual test data)
+ # - scripts/test_datasets/*_metadata.json (human-readable info)
+ # - scripts/test_datasets/datasets_summary.json (overview)
+ ```
+
+ **2. Validate API**
+ ```bash
+ # Start your ML service
+ uvicorn main:app --reload
+
+ # Quick test (5 samples per dataset)
+ python scripts/test_datasets.py --quick
+
+ # Full validation (all samples)
+ python scripts/test_datasets.py
+
+ # Category-specific testing
+ python scripts/test_datasets.py --category edge_case
+ python scripts/test_datasets.py --category performance
+ ```
+
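Conceptually, a test run replays each stored request against the running service; a stripped-down sketch (the `/predict` route is an assumption, so substitute whatever path your FastAPI app exposes):

```python
import json

import pyarrow.parquet as pq
import requests

table = pq.read_table("scripts/test_datasets/standard_test_001.parquet")
rows = table.to_pylist()[:5]  # --quick style: only the first few samples

passed = 0
for row in rows:
    payload = json.loads(row["api_request"])  # stored in the exact API format
    resp = requests.post("http://127.0.0.1:8000/predict", json=payload, timeout=30)
    passed += resp.status_code == 200

print(f"{passed}/{len(rows)} samples returned HTTP 200")
```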
+ ### πŸ“ˆ Testing Output and Metrics
+
+ The test runner provides comprehensive validation metrics:
+
+ ```
+ 🏁 DATASET TESTING SUMMARY
+ ============================================================
+ Datasets tested: 100
+ Successful datasets: 95
+ Failed datasets: 5
+ Total samples: 1,247
+ Overall success rate: 87.3%
+ Test duration: 45.2s
+
+ Performance:
+   Avg latency: 123.4ms
+   Median latency: 98.7ms
+   Min latency: 45.2ms
+   Max latency: 2,341.0ms
+   Requests/sec: 27.6
+
+ Category breakdown:
+   standard: 25 datasets, 94.2% avg success
+   edge_case: 25 datasets, 76.8% avg success
+   performance: 25 datasets, 91.1% avg success
+   model_comparison: 25 datasets, 89.3% avg success
+
+ Failed datasets: edge_case_023, edge_case_019, performance_012
+ ```
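The headline numbers combine in the obvious way; for instance, 1,247 samples over 45.2s yields the 27.6 requests/sec shown above (assuming requests are replayed sequentially). A sketch with illustrative timings:

```python
import statistics

latencies_ms = [45.2, 88.0, 98.7, 110.3, 123.4, 2341.0]  # illustrative values

print(f"Avg latency: {statistics.mean(latencies_ms):.1f}ms")
print(f"Median latency: {statistics.median(latencies_ms):.1f}ms")
print(f"Min latency: {min(latencies_ms):.1f}ms")
print(f"Max latency: {max(latencies_ms):.1f}ms")
# Sequential replay: throughput is sample count over total elapsed time.
print(f"Requests/sec: {len(latencies_ms) / (sum(latencies_ms) / 1000):.1f}")
```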
requirements.in CHANGED
@@ -14,3 +14,9 @@ python-multipart==0.0.6
 transformers>=4.35.0
 torch>=2.4.0 # Newer PyTorch with NumPy 2.x support
 pillow>=10.0.0
+
+ # Dataset generation and testing
+ pyarrow>=14.0.0
+ numpy>=1.24.0
+ pandas>=2.0.0
+ requests>=2.25.0
requirements.txt CHANGED
@@ -1,5 +1,9 @@
- # This file was autogenerated by uv via the following command:
- #   uv pip compile requirements.in -o requirements.txt
+ #
+ # This file is autogenerated by pip-compile with Python 3.12
+ # by the following command:
+ #
+ #   pip-compile requirements.in
+ #
 annotated-types==0.7.0
 # via pydantic
 anyio==3.7.1
@@ -47,7 +51,10 @@ mpmath==1.3.0
 networkx==3.5
 # via torch
 numpy==2.3.2
- # via transformers
+ # via
+ # -r requirements.in
+ # pandas
+ # transformers
 nvidia-cublas-cu12==12.8.4.1
 # via
 # nvidia-cudnn-cu12
@@ -89,8 +96,12 @@ packaging==25.0
 # via
 # huggingface-hub
 # transformers
+ pandas==2.3.2
+ # via -r requirements.in
 pillow==10.1.0
 # via -r requirements.in
+ pyarrow==21.0.0
+ # via -r requirements.in
 pydantic==2.5.0
 # via
 # -r requirements.in
@@ -100,6 +111,8 @@ pydantic-core==2.14.1
 # via pydantic
 pydantic-settings==2.0.3
 # via -r requirements.in
+ python-dateutil==2.9.0.post0
+ # via pandas
 python-dotenv==0.21.0
 # via
 # -r requirements.in
@@ -107,6 +120,8 @@ python-dotenv==0.21.0
 # uvicorn
 python-multipart==0.0.6
 # via -r requirements.in
+ pytz==2025.2
+ # via pandas
 pyyaml==6.0.2
 # via
 # huggingface-hub
@@ -116,14 +131,13 @@ regex==2025.7.34
 # via transformers
 requests==2.32.5
 # via
+ # -r requirements.in
 # huggingface-hub
 # transformers
 safetensors==0.6.2
 # via transformers
- setuptools==80.9.0
- # via
- # torch
- # triton
+ six==1.17.0
+ # via python-dateutil
 sniffio==1.3.1
 # via anyio
 starlette==0.27.0
@@ -149,9 +163,11 @@ typing-extensions==4.15.0
 # pydantic
 # pydantic-core
 # torch
+ tzdata==2025.2
+ # via pandas
 urllib3==2.5.0
 # via requests
- uvicorn==0.24.0
+ uvicorn[standard]==0.24.0
 # via -r requirements.in
 uvloop==0.21.0
 # via uvicorn
@@ -159,3 +175,6 @@ watchfiles==1.1.0
 # via uvicorn
 websockets==15.0.1
 # via uvicorn
+
+ # The following packages are considered to be unsafe in a requirements file:
+ # setuptools