sachin sharma committed on
Commit 0b393b6 · 1 Parent(s): 9d9449a

updated README.md

Files changed (2)
  1. .dockerignore +38 -0
  2. README.md +197 -272
.dockerignore ADDED
@@ -0,0 +1,38 @@
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ *.so
+ *.egg
+ *.egg-info
+ dist
+ build
+
+ .venv
+ venv
+ ENV
+ env
+
+ .git
+ .gitignore
+ .idea
+ .vscode
+ .claude
+
+ *.md
+ README.md
+ Dockerfile
+ .dockerignore
+
+ test_*.http
+ test_results
+ scripts/test_datasets
+
+ .pytest_cache
+ .coverage
+ htmlcov
+
+ *.log
+ .DS_Store
+ .python-version
README.md CHANGED
@@ -1,89 +1,50 @@
- # ML Inference Service (FastAPI)

- A FastAPI-based inference server designed to make it easy to serve your ML models. The repo includes a complete working example using ResNet-18 for image classification, but the architecture is built to be model-agnostic. You implement a simple abstract base class, and everything else just works.

- Key features:
- - Abstract InferenceService class that you subclass for your model
- - Example ResNet-18 implementation showing how to do it
- - FastAPI application with clean separation (routes → controller → service)
- - Model loaded once at startup and reused across requests
- - Background threading for inference so the server stays responsive
- - Type-safe request/response handling with Pydantic
- - Single generic endpoint that works with any model

- ## What you get
-
- The service exposes a single endpoint `POST /predict` that accepts a base64-encoded image and returns:
- - `prediction` - the predicted class label
- - `confidence` - softmax probability for the prediction
- - `predicted_label` - numeric class index
- - `model` - identifier for which model produced this prediction
- - `mediaType` - echoed from the request
-
- The inference runs in a background thread using asyncio so long-running model predictions don't block the server from handling other requests.
-
- ## Project Layout
-
- ```
- ml-inference-service/
- ├─ main.py                    # Entry point
- ├─ app/
- │  ├─ core/
- │  │  ├─ app.py               # Everything: config, DI, lifespan, app factory
- │  │  └─ logging.py           # Logger setup
- │  ├─ api/
- │  │  ├─ models.py            # Pydantic request/response schemas
- │  │  ├─ controllers.py       # HTTP → service orchestration
- │  │  └─ routes/
- │  │     └─ prediction.py     # POST /predict endpoint
- │  └─ services/
- │     ├─ base.py              # Abstract InferenceService class
- │     └─ inference.py         # ResNetInferenceService (example implementation)
- ├─ models/
- │  └─ microsoft/
- │     └─ resnet-18/           # Model files (preserves org structure)
- ├─ scripts/
- │  ├─ generate_test_datasets.py
- │  ├─ test_datasets.py
- │  └─ test_datasets/
- ├─ requirements.txt
- └─ test_main.http             # Example HTTP request
- ```
-
- The key change from a typical FastAPI app is that `app/core/app.py` consolidates configuration, dependency injection, lifecycle management, and the app factory into one file. This avoids the complexity of managing global variables across multiple modules.
-
- ## Quickstart
-
- 1) Install dependencies (Python 3.9+)
  ```bash
  python -m venv .venv
- source .venv/bin/activate   # Windows: .venv\Scripts\activate
  pip install -r requirements.txt
- ```

- 2) Download the example model
- ```bash
  bash scripts/model_download.bash
- ```
- This downloads ResNet-18 from Hugging Face and saves it to `models/microsoft/resnet-18/` (note the org structure is preserved).

- 3) Run the server
- ```bash
  uvicorn main:app --reload
  ```
- Server starts on `http://127.0.0.1:8000`.

- 4) Test the API

- Use `test_main.http` from your IDE or curl:

  ```bash
- curl -X POST http://127.0.0.1:8000/predict \
    -H "Content-Type: application/json" \
    -d '{
      "image": {
        "mediaType": "image/jpeg",
-       "data": "<base64-encoded-bytes>"
      }
    }'
  ```
@@ -92,45 +53,77 @@ Example response:
  ```json
  {
    "prediction": "tiger cat",
-   "confidence": 0.9971,
    "predicted_label": 282,
    "model": "microsoft/resnet-18",
    "mediaType": "image/jpeg"
  }
  ```

- ## Integrating Your Own Model

- To use your own model, you implement the `InferenceService` abstract base class. The rest of the infrastructure (API routes, controllers, dependency injection) is already generic and works with any implementation.

- ### Step 1: Implement the InferenceService ABC

- Create a new file `app/services/your_model_service.py`:

  ```python
  from app.services.base import InferenceService
  from app.api.models import ImageRequest, PredictionResponse

  class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
      def __init__(self, model_name: str):
          self.model_name = model_name
-         self.model_path = os.path.join("models", model_name)
          self.model = None
          self._is_loaded = False

      async def load_model(self) -> None:
-         # Load your model here
          self.model = load_your_model(self.model_path)
          self._is_loaded = True

      async def predict(self, request: ImageRequest) -> PredictionResponse:
-         # Offload to background thread (important for performance)
          return await asyncio.to_thread(self._predict_sync, request)

      def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
-         # Decode image, run inference, return typed response
          image = decode_base64_image(request.image.data)
          result = self.model(image)
          return PredictionResponse(
              prediction=result.label,
              confidence=result.confidence,
@@ -144,275 +137,184 @@ class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
          return self._is_loaded
  ```

- The key points:
- - Subclass `InferenceService[RequestType, ResponseType]` with your request/response types
- - Implement three methods: `load_model()`, `predict()`, and the `is_loaded` property
- - Use `asyncio.to_thread()` to offload CPU-intensive inference to a background thread
- - Return typed Pydantic models, not dicts

- ### Step 2: Register your service at startup

- Edit `app/core/app.py` and find the lifespan function (around line 134):

  ```python
- # Replace this:
  service = ResNetInferenceService(model_name="microsoft/resnet-18")

- # With this:
  service = YourModelService(model_name="your-org/your-model")
  ```

- That's it. The same `/predict` endpoint now serves your model.

- ### Model file structure

- Your model files should be organized as:
  ```
  models/
  └── your-org/
      └── your-model/
          ├── config.json
          ├── weights.bin
-         └── ... other files
  ```

- The full org/model structure is preserved - no more dropping the org prefix.
-
- ### Example: Swapping ResNet for ViT

- ```python
- # app/services/vit_service.py
- from transformers import ViTForImageClassification, ViTImageProcessor
-
- class ViTService(InferenceService[ImageRequest, PredictionResponse]):
-     async def load_model(self) -> None:
-         self.processor = ViTImageProcessor.from_pretrained(self.model_path)
-         self.model = ViTForImageClassification.from_pretrained(self.model_path)
-         self._is_loaded = True
-
-     # ... implement predict() following the pattern above
- ```
-
- Then in `app/core/app.py`:
- ```python
- service = ViTService(model_name="google/vit-base-patch16-224")
- ```

- No other changes needed - the routes, controller, and dependency injection are all model-agnostic.

- ## Validating your setup

- When you start the server, the logs should show:
- ```
- INFO: Starting ML Inference Service...
- INFO: Initializing ResNet service with local model: models/microsoft/resnet-18
- INFO: Loading ResNet model from: models/microsoft/resnet-18
- INFO: ResNet model loaded successfully
- INFO: Startup completed successfully
- ```
-
- If you see errors like `Model directory not found`, check that your model files exist at the expected path with the full org/model structure.
-
- ## Request & Response Shapes

- ### Request
- ```json
- {
-   "image": {
-     "mediaType": "image/jpeg",
-     "data": "<base64-encoded image bytes>"
-   }
- }
  ```

- ### Response
- ```json
- {
-   "prediction": "string label",
-   "confidence": 0.0,
-   "predicted_label": 0,
-   "model": "your-org/your-model",
-   "mediaType": "image/jpeg"
- }
- ```
-
- ## Configuration
-
- Settings are defined in `app/core/app.py` in the `Settings` class. The defaults are:
- - `app_name` - "ML Inference Service"
- - `app_version` - "0.1.0"
- - `debug` - False
- - `host` - "0.0.0.0"
- - `port` - 8000
-
- You can override these via environment variables or a `.env` file. If you want to make the model configurable via environment variable, add it to the Settings class:
-
- ```python
- class Settings(BaseSettings):
-     # ... existing fields ...
-     model_name: str = Field("microsoft/resnet-18")
-
- # Then in the lifespan function:
- service = ResNetInferenceService(model_name=settings.model_name)
  ```

  ## Deployment

- For development:
  ```bash
  uvicorn main:app --reload
  ```

- For production, use gunicorn with uvicorn workers:
  ```bash
  gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
  ```

  The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.

- ## PyArrow Test Datasets

- This project includes a comprehensive **PyArrow-based dataset generation system** designed specifically for academic challenges and ML model validation. The system generates **100 standardized test datasets** that allow participants to validate their models against consistent, reproducible test cases.

- ### File Structure
  ```
- standard_test_001.parquet            # Actual test data (images, requests, responses)
- standard_test_001_metadata.json      # Human-readable description and stats
  ```

- ### Dataset Categories (25 each = 100 total)
-
- #### 1. **Standard Test Cases** (`standard_test_*.parquet`)
- **Purpose**: Baseline functionality validation
-
- **Content**: Normal images with expected successful predictions

- - **Image Types**: Random patterns, geometric shapes, gradients, text overlays, solid colors
- - **Formats**: JPEG, PNG with proper MIME types
- - **Sizes**: 224x224, 256x256, 299x299, 384x384 (common ML input sizes)
- - **Expected Behavior**: HTTP 200 responses with valid prediction structure

- #### 2. **Edge Case Tests** (`edge_case_*.parquet`)
- **Purpose**: Robustness and error handling validation
-
- **Content**: Challenging scenarios that test model resilience
-
- - **Tiny Images**: 32x32, 1x1 pixels (tests preprocessing robustness)
- - **Huge Images**: 2048x2048 (tests memory management and resizing)
- - **Extreme Aspect Ratios**: 1000x50 (tests preprocessing assumptions)
- - **Corrupted Data**: Invalid base64, malformed requests (tests error handling)
- - **Expected Behavior**: Graceful degradation, proper error responses
-
- #### 3. **Performance Benchmarks** (`performance_test_*.parquet`)
- **Purpose**: Latency and throughput measurement
-
- **Content**: Varying batch sizes for performance profiling
-
- - **Batch Sizes**: 1, 5, 10, 25, 50, 100 images per test
- - **Latency Tracking**: Expected max response times based on batch size
- - **Throughput Metrics**: Requests per second under different loads
- - **Expected Behavior**: Consistent performance within acceptable bounds
-
- #### 4. **Model Comparison** (`model_comparison_*.parquet`)
- **Purpose**: Cross-model validation and benchmarking
-
- **Content**: Identical inputs tested across different model architectures
-
- - **Model Types**: ResNet-18/50, ViT, ConvNext, Swin Transformer
- - **Consistent Inputs**: Same 10 base images per dataset
- - **Comparative Analysis**: Enables direct performance comparison between models
- - **Expected Behavior**: Architecture-specific but structurally consistent responses
-
- ### Generation Process
-
- The dataset generation follows a **deterministic, reproducible approach**:
-
- #### Step 1: Synthetic Image Creation
- ```python
- # Why synthetic images instead of real photos?
- # 1. Copyright-free for academic distribution
- # 3. Programmatically generated edge cases
-
- def create_synthetic_image(width, height, image_type):
-     if image_type == "random":
-         # RGB noise - tests model noise robustness
-         array = np.random.randint(0, 256, (height, width, 3))
-     elif image_type == "geometric":
-         # Shapes and patterns - tests feature detection
-         # ... geometric pattern generation
-     # ... other synthetic types
- ```

- #### Step 2: API Request Structure Generation
- ```python
- # Matches exact API format for drop-in testing
  {
-   "image": {
-     "mediaType": "image/jpeg",        # Proper MIME types
-     "data": "<base64-encoded-image>"  # Standard encoding
-   }
  }
  ```

- #### Step 3: Expected Response Generation
- ```python
- # Realistic prediction responses with proper structure
  {
-   "prediction": "tiger_cat",          # ImageNet-style labels
-   "confidence": 0.8742,               # Realistic confidence scores
-   "predicted_label": 282,             # Numeric label indices
-   "model": "microsoft/resnet-18",     # Model identification
-   "mediaType": "image/jpeg"           # Echo input format
  }
  ```

- #### Step 4: PyArrow Table Creation
- ```python
- # Columnar storage for efficient querying
- table = pa.table({
-     "dataset_id": [...],          # Unique dataset identifier
-     "image_id": [...],            # Individual image identifier
-     "api_request": [...],         # JSON-serialized requests
-     "expected_response": [...],   # JSON-serialized expected responses
-     "test_category": [...],       # Category classification
-     "difficulty": [...],          # Complexity indicator
-     # ... additional metadata columns
- })
- ```

- ### Usage Guide

- **1. Generate Test Datasets**
  ```bash
- # Create all 100 datasets (~2-5 minutes depending on hardware)
  python scripts/generate_test_datasets.py
-
- # What this creates:
- # - scripts/test_datasets/*.parquet (actual test data)
- # - scripts/test_datasets/*_metadata.json (human-readable info)
- # - scripts/test_datasets/datasets_summary.json (overview)
  ```

- **2. Validate API**
  ```bash
- # Start your ML service
  uvicorn main:app --reload

  # Quick test (5 samples per dataset)
  python scripts/test_datasets.py --quick

- # Full validation (all samples)
  python scripts/test_datasets.py

- # Category-specific testing
  python scripts/test_datasets.py --category edge_case
- python scripts/test_datasets.py --category performance
  ```

- ### Testing Output and Metrics

- The test runner provides comprehensive validation metrics:

  ```
  DATASET TESTING SUMMARY
@@ -427,7 +329,7 @@ Test duration: 45.2s
  Performance:
    Avg latency: 123.4ms
    Median latency: 98.7ms
-   Min latency: 45.2ms
    Max latency: 2,341.0ms
    Requests/sec: 27.6

@@ -436,6 +338,29 @@ Category breakdown:
    edge_case: 25 datasets, 76.8% avg success
    performance: 25 datasets, 91.1% avg success
    model_comparison: 25 datasets, 89.3% avg success

- Failed datasets: edge_case_023, edge_case_019, performance_012
  ```
+ # ML Inference Service

+ FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.

+ ## Quick Start

+ **Local development:**
  ```bash
+ # Install dependencies
  python -m venv .venv
+ source .venv/bin/activate
  pip install -r requirements.txt

+ # Download the example model
  bash scripts/model_download.bash

+ # Run it
  uvicorn main:app --reload
  ```

+ Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

+ **Docker:**
+ ```bash
+ # Build
+ docker build -t ml-inference-service:test .
+
+ # Run
+ docker run -d --name ml-inference-test -p 8000:8000 ml-inference-service:test
+
+ # Check logs
+ docker logs -f ml-inference-test
+
+ # Stop
+ docker stop ml-inference-test && docker rm ml-inference-test
+ ```
+
+ ## Testing the API

  ```bash
+ # Using curl
+ curl -X POST http://localhost:8000/predict \
    -H "Content-Type: application/json" \
    -d '{
      "image": {
        "mediaType": "image/jpeg",
+       "data": "<base64-encoded-image>"
      }
    }'
  ```

  ```json
  {
    "prediction": "tiger cat",
+   "confidence": 0.394,
    "predicted_label": 282,
    "model": "microsoft/resnet-18",
    "mediaType": "image/jpeg"
  }
  ```

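+ If you'd rather not hand-encode base64, a small Python client works too. This is just a sketch, not a script shipped in the repo; it assumes the `requests` package is installed and that `cat.jpg` is any local test image:
+
+ ```python
+ # sketch: encode a local image and call POST /predict
+ import base64
+ import requests
+
+ with open("cat.jpg", "rb") as f:                    # any local test image
+     encoded = base64.b64encode(f.read()).decode("utf-8")
+
+ payload = {"image": {"mediaType": "image/jpeg", "data": encoded}}
+ resp = requests.post("http://localhost:8000/predict", json=payload, timeout=30)
+ resp.raise_for_status()
+ print(resp.json())                                  # e.g. {"prediction": "tiger cat", ...}
+ ```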
+ ## Project Structure
+
+ ```
+ ml-inference-service/
+ ├── main.py                  # Entry point
+ ├── app/
+ │   ├── core/
+ │   │   ├── app.py           # App factory, config, DI, lifecycle
+ │   │   └── logging.py       # Logging setup
+ │   ├── api/
+ │   │   ├── models.py        # Request/response schemas
+ │   │   ├── controllers.py   # Business logic
+ │   │   └── routes/
+ │   │       └── prediction.py  # POST /predict
+ │   └── services/
+ │       ├── base.py          # Abstract InferenceService class
+ │       └── inference.py     # ResNet implementation
+ ├── models/
+ │   └── microsoft/
+ │       └── resnet-18/       # Model weights and config
+ ├── scripts/
+ │   ├── model_download.bash
+ │   ├── generate_test_datasets.py
+ │   └── test_datasets.py
+ ├── Dockerfile               # Multi-stage build
+ ├── .env.example             # Environment config template
+ └── requirements.txt
+ ```
+
+ The key design decision here is that `app/core/app.py` consolidates everything: config, dependency injection, lifecycle, and the app factory. This avoids the mess of managing global state across multiple files.

+ ## How to Plug In Your Own Model

+ The whole service is built around one abstract base class: `InferenceService`. Implement it for your model, and everything else just works.

+ ### Step 1: Create Your Service Class

  ```python
+ # app/services/your_model_service.py
  from app.services.base import InferenceService
  from app.api.models import ImageRequest, PredictionResponse
+ import asyncio

  class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
      def __init__(self, model_name: str):
          self.model_name = model_name
+         self.model_path = f"models/{model_name}"
          self.model = None
          self._is_loaded = False

      async def load_model(self) -> None:
+         """Load your model here. Called once at startup."""
          self.model = load_your_model(self.model_path)
          self._is_loaded = True

      async def predict(self, request: ImageRequest) -> PredictionResponse:
+         """Run inference. Offload heavy work to the thread pool."""
          return await asyncio.to_thread(self._predict_sync, request)

      def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
+         """Actual inference happens here."""
          image = decode_base64_image(request.image.data)
          result = self.model(image)
+
          return PredictionResponse(
              prediction=result.label,
              confidence=result.confidence,

          return self._is_loaded
  ```

+ **Important:** Use `asyncio.to_thread()` to run CPU-heavy inference in a background thread. This keeps the server responsive while your model is working.

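+ The snippet above assumes `load_your_model` and `decode_base64_image` helpers, which aren't shown. If you need a decoder of your own, a minimal version (assuming Pillow is installed; adapt to whatever input type your model expects) could look like this:
+
+ ```python
+ # sketch of a base64-to-PIL helper; not part of the repo
+ import base64
+ import io
+ from PIL import Image
+
+ def decode_base64_image(data: str) -> Image.Image:
+     """Decode a base64 string into an RGB PIL image."""
+     image_bytes = base64.b64decode(data)
+     return Image.open(io.BytesIO(image_bytes)).convert("RGB")
+ ```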
+ ### Step 2: Register Your Service

+ Open `app/core/app.py` and find the lifespan function:

  ```python
+ # Change this line:
  service = ResNetInferenceService(model_name="microsoft/resnet-18")

+ # To this:
  service = YourModelService(model_name="your-org/your-model")
  ```

+ That's it. The `/predict` endpoint now serves your model.
+
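+ If you're curious what that lifespan function does, it follows the standard FastAPI lifespan pattern: build the service, load the model once at startup, and keep it around for the life of the app. A rough sketch only; the real `app/core/app.py` also wires up config and routes, and exactly how the service is exposed to request handlers is an assumption here:
+
+ ```python
+ # sketch of the startup wiring; details in app/core/app.py may differ
+ from contextlib import asynccontextmanager
+ from fastapi import FastAPI
+ from app.services.your_model_service import YourModelService  # the class from Step 1
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     service = YourModelService(model_name="your-org/your-model")
+     await service.load_model()              # model loads once, at startup
+     app.state.inference_service = service   # assumption: stashed where handlers can reach it
+     yield                                   # requests are served while suspended here
+
+ app = FastAPI(lifespan=lifespan)
+ ```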
+ ### Model Files

+ Put your model files under `models/` with the full org/model structure:

  ```
  models/
  └── your-org/
      └── your-model/
          ├── config.json
          ├── weights.bin
+         └── (other files)
  ```

+ No renaming, no dropping the org prefix; it just mirrors the Hugging Face structure.

+ ## Configuration

+ Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.

+ **Default values:**
+ - `APP_NAME`: "ML Inference Service"
+ - `APP_VERSION`: "0.1.0"
+ - `DEBUG`: false
+ - `HOST`: "0.0.0.0"
+ - `PORT`: 8000
+ - `MODEL_NAME`: "microsoft/resnet-18"

+ **To customize:**
+ ```bash
+ # Copy the example
+ cp .env.example .env

+ # Edit values
+ vim .env
  ```

+ Or set environment variables directly:
+ ```bash
+ export MODEL_NAME="google/vit-base-patch16-224"
+ uvicorn main:app --reload
  ```

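+ Under the hood these map onto a `Settings` class in `app/core/app.py` (the previous version of this README sketched a `Settings(BaseSettings)` snippet). The sketch below assumes the `pydantic-settings` package and mirrors the defaults above; the actual field set in the file may differ:
+
+ ```python
+ # sketch: env-driven settings, roughly the pattern app/core/app.py uses
+ from pydantic_settings import BaseSettings, SettingsConfigDict
+
+ class Settings(BaseSettings):
+     # read overrides from .env or the environment; allow the model_* field name
+     model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())
+
+     app_name: str = "ML Inference Service"
+     app_version: str = "0.1.0"
+     debug: bool = False
+     host: str = "0.0.0.0"
+     port: int = 8000
+     model_name: str = "microsoft/resnet-18"
+
+ settings = Settings()   # e.g. MODEL_NAME=google/vit-base-patch16-224 overrides the default
+ ```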
  ## Deployment

+ **Development:**
  ```bash
  uvicorn main:app --reload
  ```

+ **Production:**
  ```bash
  gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
  ```

  The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.

+ **Docker:**
+ - Multi-stage build keeps the image small
+ - Runs as non-root user (`appuser`)
+ - Python dependencies installed in user site-packages
+ - Model files baked into the image

+ ## What Happens When You Start the Server

  ```
+ INFO: Starting ML Inference Service...
+ INFO: Initializing ResNet service: models/microsoft/resnet-18
+ INFO: Loading model from models/microsoft/resnet-18
+ INFO: Model loaded: 1000 classes
+ INFO: Startup completed successfully
+ INFO: Uvicorn running on http://0.0.0.0:8000
  ```

+ If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.

+ ## API Reference

+ **Endpoint:** `POST /predict`

+ **Request:**
+ ```json
  {
+   "image": {
+     "mediaType": "image/jpeg",        // or "image/png"
+     "data": "<base64-encoded-image>"
+   }
  }
  ```

+ **Response:**
+ ```json
  {
+   "prediction": "string",             // Human-readable label
+   "confidence": 0.0,                  // Softmax probability
+   "predicted_label": 0,               // Numeric class index
+   "model": "org/model-name",          // Model identifier
+   "mediaType": "image/jpeg"           // Echoed from request
  }
  ```

+ **Docs:**
+ - Swagger UI: `http://localhost:8000/docs`
+ - ReDoc: `http://localhost:8000/redoc`
+ - OpenAPI JSON: `http://localhost:8000/openapi.json`

+ ## PyArrow Test Datasets

+ We've included a test dataset system for validating your model. It generates 100 standardized test cases covering normal inputs, edge cases, performance benchmarks, and model comparisons.
+
+ ### Generate Datasets

  ```bash
  python scripts/generate_test_datasets.py
  ```

+ This creates the following files (see the sketch below for inspecting one):
+ - `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
+ - `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
+ - `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
+
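+ You can also peek inside a generated dataset directly with PyArrow. A minimal sketch; the column names follow the generator's schema (e.g. `api_request` and `expected_response` hold JSON strings), so check a `*_metadata.json` file if yours differ:
+
+ ```python
+ # sketch: inspect one generated parquet dataset
+ import json
+ import pyarrow.parquet as pq
+
+ table = pq.read_table("scripts/test_datasets/standard_test_001.parquet")
+ print(table.column_names)                           # e.g. ['dataset_id', 'image_id', 'api_request', ...]
+
+ first = table.slice(0, 1).to_pylist()[0]
+ request = json.loads(first["api_request"])          # ready to POST to /predict
+ expected = json.loads(first["expected_response"])   # what the service should return
+ print(expected["prediction"], expected["confidence"])
+ ```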
+ ### Run Tests
+
  ```bash
+ # Start your service first
  uvicorn main:app --reload

  # Quick test (5 samples per dataset)
  python scripts/test_datasets.py --quick

+ # Full validation
  python scripts/test_datasets.py

+ # Test specific category
  python scripts/test_datasets.py --category edge_case
  ```

+ ### Dataset Categories (25 datasets each)
+
+ **1. Standard Tests** (`standard_test_*.parquet`)
+ - Normal images: random patterns, shapes, gradients
+ - Common sizes: 224x224, 256x256, 299x299, 384x384
+ - Formats: JPEG, PNG
+ - Purpose: Baseline validation
+
+ **2. Edge Cases** (`edge_case_*.parquet`)
+ - Tiny images (32x32, 1x1)
+ - Huge images (2048x2048)
+ - Extreme aspect ratios (1000x50)
+ - Corrupted data, malformed requests
+ - Purpose: Test error handling
+
+ **3. Performance Benchmarks** (`performance_test_*.parquet`)
+ - Batch sizes: 1, 5, 10, 25, 50, 100 images
+ - Latency and throughput tracking
+ - Purpose: Performance profiling

+ **4. Model Comparisons** (`model_comparison_*.parquet`)
+ - Same inputs across different architectures
+ - Models: ResNet-18/50, ViT, ConvNext, Swin
+ - Purpose: Cross-model benchmarking
+
+ ### Test Output

  ```
  DATASET TESTING SUMMARY

  Performance:
    Avg latency: 123.4ms
    Median latency: 98.7ms
+   p95 latency: 342.1ms
    Max latency: 2,341.0ms
    Requests/sec: 27.6

    edge_case: 25 datasets, 76.8% avg success
    performance: 25 datasets, 91.1% avg success
    model_comparison: 25 datasets, 89.3% avg success
+ ```
+
+ ## Common Issues

+ **Port 8000 already in use:**
+ ```bash
+ # Find what's using it
+ lsof -i :8000
+
+ # Or just use a different port
+ uvicorn main:app --port 8080
  ```
+
+ **Model not loading:**
+ - Check the path: models should be in `models/<org>/<model-name>/`
+ - Make sure you ran `bash scripts/model_download.bash`
+ - Check logs for the exact error
+
+ **Slow inference:**
+ - Inference runs on CPU by default
+ - For GPU: install CUDA PyTorch and modify the service to use the GPU device (see the sketch below)
+ - Consider using smaller models or quantization
+
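+ The GPU change is usually only a few lines in your service class. A rough sketch, assuming a PyTorch-style model; the commented lines show where the calls would land in the Step 1 class and are illustrative, not code from the repo:
+
+ ```python
+ # sketch: pick a device once, then move the model and inputs onto it
+ import torch
+
+ def select_device() -> str:
+     """Prefer CUDA when a GPU is visible, otherwise fall back to CPU."""
+     return "cuda" if torch.cuda.is_available() else "cpu"
+
+ # inside your service:
+ #   self.device = select_device()
+ #   self.model = load_your_model(self.model_path).to(self.device)      # in load_model()
+ #   inputs = {k: v.to(self.device) for k, v in inputs.items()}         # in _predict_sync()
+ ```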
+ ## License
+
+ MIT