sachin sharma committed · Commit 5ddae77 · 1 Parent(s): ebbcd26
refactored codebase
Browse files
- README.md +193 -191
- app/api/controllers.py +44 -40
- app/api/routes/prediction.py +46 -7
- app/api/routes/resnet_service_manager.py +0 -19
- app/core/app.py +138 -5
- app/core/config.py +0 -29
- app/core/dependencies.py +0 -19
- app/core/lifespan.py +0 -43
- app/services/base.py +135 -0
- app/services/inference.py +102 -57
- test_main.http +3 -2
README.md
CHANGED
|
@@ -1,89 +1,94 @@
(Previous README content, shown removed. The old introduction described the bundled ResNet-18 example, listed response fields (`predicted_label` as class index, `model` as model id, `mediaType` echoed), and documented a layout with separate `core/config.py` (settings), `core/dependencies.py` (DI for model services), `core/lifespan.py` (startup: load model & register service), and a legacy, unused `routes/resnet_service_manager.py`. Its configuration example constructed the service with `ResNetInferenceService(model_name=settings.model_name, use_local_model=True)`.)
@@ -94,117 +99,121 @@ curl -X POST http://127.0.0.1:8000/predict/resnet -H "Content-Type: application/json"
@@ -227,58 +236,51 @@ the `ResNetInferenceService` flow.
@@ -321,7 +323,7 @@
@@ -378,7 +380,7 @@ table = pa.table({
@@ -408,12 +410,12 @@ python scripts/test_datasets.py --category edge_case
(The unchanged quickstart commands and PyArrow test-dataset sections appear in full in the updated README below.)
| 1 |
# ML Inference Service (FastAPI)
|
| 2 |
|
| 3 |
+
A FastAPI-based inference server designed to make it easy to serve your ML models. The repo includes a complete working example using ResNet-18 for image classification, but the architecture is built to be model-agnostic. You implement a simple abstract base class, and everything else just works.
|
|
|
|
| 4 |
|
| 5 |
+
Key features:
|
| 6 |
+
- Abstract InferenceService class that you subclass for your model
|
| 7 |
+
- Example ResNet-18 implementation showing how to do it
|
| 8 |
+
- FastAPI application with clean separation (routes → controller → service)
|
| 9 |
+
- Model loaded once at startup and reused across requests
|
| 10 |
+
- Background threading for inference so the server stays responsive
|
| 11 |
+
- Type-safe request/response handling with Pydantic
|
| 12 |
+
- Single generic endpoint that works with any model
|
| 13 |
|
| 14 |
+
## What you get
|
| 15 |
|
| 16 |
+
The service exposes a single endpoint `POST /predict` that accepts a base64-encoded image and returns:
|
| 17 |
+
- `prediction` - the predicted class label
|
| 18 |
+
- `confidence` - softmax probability for the prediction
|
| 19 |
+
- `predicted_label` - numeric class index
|
| 20 |
+
- `model` - identifier for which model produced this prediction
|
| 21 |
+
- `mediaType` - echoed from the request
|
| 22 |
|
| 23 |
+
The inference runs in a background thread using asyncio so long-running model predictions don't block the server from handling other requests.
|
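A minimal sketch of that pattern (the `run_model` function here is a stand-in, not code from this repo):

```python
import asyncio

def run_model(data):
    # Stand-in for a CPU-heavy synchronous inference call (e.g. a PyTorch forward pass)
    return {"prediction": "tiger cat"}

async def predict(data):
    # Offload the blocking work to a worker thread so the event loop stays responsive
    return await asyncio.to_thread(run_model, data)
```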
| 24 |
|
| 25 |
+
## Project Layout
|
| 26 |
|
| 27 |
```
ml-inference-service/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # Everything: config, DI, lifespan, app factory
│   │   └── logging.py           # Logger setup
│   ├── api/
│   │   ├── models.py            # Pydantic request/response schemas
│   │   ├── controllers.py       # HTTP → service orchestration
│   │   └── routes/
│   │       └── prediction.py    # POST /predict endpoint
│   └── services/
│       ├── base.py              # Abstract InferenceService class
│       └── inference.py         # ResNetInferenceService (example implementation)
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model files (preserves org structure)
├── scripts/
│   ├── generate_test_datasets.py
│   ├── test_datasets.py
│   └── test_datasets/
├── requirements.txt
└── test_main.http               # Example HTTP request
```
|
| 52 |
|
| 53 |
+
The key change from a typical FastAPI app is that `app/core/app.py` consolidates configuration, dependency injection, lifecycle management, and the app factory into one file. This avoids the complexity of managing global variables across multiple modules.
|
| 54 |
|
| 55 |
+
## Quickstart
|
| 56 |
|
| 57 |
+
1) Install dependencies (Python 3.9+)
|
| 58 |
```bash
|
| 59 |
python -m venv .venv
|
| 60 |
source .venv/bin/activate # Windows: .venv\Scripts\activate
|
| 61 |
pip install -r requirements.txt
|
| 62 |
```
|
| 63 |
|
| 64 |
+
2) Download the example model
|
| 65 |
```bash
|
| 66 |
bash scripts/model_download.bash
|
| 67 |
```
|
| 68 |
+
This downloads ResNet-18 from Hugging Face and saves it to `models/microsoft/resnet-18/` (note the org structure is preserved).
|
| 69 |
|
| 70 |
+
3) Run the server
|
| 71 |
```bash
|
| 72 |
uvicorn main:app --reload
|
| 73 |
```
|
| 74 |
+
Server starts on `http://127.0.0.1:8000`.
|
| 75 |
|
| 76 |
+
4) Test the API
|
| 77 |
+
|
| 78 |
+
Use `test_main.http` from your IDE or curl:
|
| 79 |
|
| 80 |
```bash
|
| 81 |
+
curl -X POST http://127.0.0.1:8000/predict \
|
| 82 |
+
-H "Content-Type: application/json" \
|
| 83 |
+
-d '{
|
| 84 |
+
"image": {
|
| 85 |
+
"mediaType": "image/jpeg",
|
| 86 |
+
"data": "<base64-encoded-bytes>"
|
| 87 |
+
}
|
| 88 |
}'
|
| 89 |
```
|
| 90 |
|
| 91 |
+
Example response:
|
| 92 |
```json
|
| 93 |
{
|
| 94 |
"prediction": "tiger cat",
|
|
|
|
| 99 |
}
|
| 100 |
```
|
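If you'd rather call the endpoint from Python than curl, a small client sketch (assuming the `requests` package is installed and a local `cat.jpg` exists) could look like this:

```python
import base64
import requests

# Base64-encode the image file to match the request schema shown above
with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {"image": {"mediaType": "image/jpeg", "data": encoded}}
resp = requests.post("http://127.0.0.1:8000/predict", json=payload)
print(resp.json())  # e.g. {"prediction": "tiger cat", ...}
```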
| 101 |
|
| 102 |
+
## Integrating Your Own Model
|
| 103 |
+
|
| 104 |
+
To use your own model, you implement the `InferenceService` abstract base class. The rest of the infrastructure (API routes, controllers, dependency injection) is already generic and works with any implementation.
|
| 105 |
+
|
| 106 |
+
### Step 1: Implement the InferenceService ABC
|
| 107 |
+
|
| 108 |
+
Create a new file `app/services/your_model_service.py`:
|
| 109 |
+
|
| 110 |
+
```python
|
| 111 |
+
from app.services.base import InferenceService
|
| 112 |
+
from app.api.models import ImageRequest, PredictionResponse
|
| 113 |
+
|
| 114 |
+
class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
|
| 115 |
+
def __init__(self, model_name: str):
|
| 116 |
+
self.model_name = model_name
|
| 117 |
+
self.model_path = os.path.join("models", model_name)
|
| 118 |
+
self.model = None
|
| 119 |
+
self._is_loaded = False
|
| 120 |
+
|
| 121 |
+
async def load_model(self) -> None:
|
| 122 |
+
# Load your model here
|
| 123 |
+
self.model = load_your_model(self.model_path)
|
| 124 |
+
self._is_loaded = True
|
| 125 |
+
|
| 126 |
+
async def predict(self, request: ImageRequest) -> PredictionResponse:
|
| 127 |
+
# Offload to background thread (important for performance)
|
| 128 |
+
return await asyncio.to_thread(self._predict_sync, request)
|
| 129 |
+
|
| 130 |
+
def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
|
| 131 |
+
# Decode image, run inference, return typed response
|
| 132 |
+
image = decode_base64_image(request.image.data)
|
| 133 |
+
result = self.model(image)
|
| 134 |
+
return PredictionResponse(
|
| 135 |
+
prediction=result.label,
|
| 136 |
+
confidence=result.confidence,
|
| 137 |
+
predicted_label=result.class_id,
|
| 138 |
+
model=self.model_name,
|
| 139 |
+
mediaType=request.image.mediaType
|
| 140 |
+
)
|
| 141 |
+
|
| 142 |
+
@property
|
| 143 |
+
def is_loaded(self) -> bool:
|
| 144 |
+
return self._is_loaded
|
| 145 |
+
```
|
| 146 |
+
|
| 147 |
+
The key points:
|
| 148 |
+
- Subclass `InferenceService[RequestType, ResponseType]` with your request/response types
|
| 149 |
+
- Implement three methods: `load_model()`, `predict()`, and `is_loaded` property
|
| 150 |
+
- Use `asyncio.to_thread()` to offload CPU-intensive inference to a background thread
|
| 151 |
+
- Return typed Pydantic models, not dicts
|
| 152 |
+
|
| 153 |
+
### Step 2: Register your service at startup
|
| 154 |
+
|
| 155 |
+
Edit `app/core/app.py` and find the lifespan function (around line 134):
|
| 156 |
+
|
| 157 |
+
```python
|
| 158 |
+
# Replace this:
|
| 159 |
+
service = ResNetInferenceService(model_name="microsoft/resnet-18")
|
| 160 |
+
|
| 161 |
+
# With this:
|
| 162 |
+
service = YourModelService(model_name="your-org/your-model")
|
| 163 |
+
```
|
| 164 |
+
|
| 165 |
+
That's it. The same `/predict` endpoint now serves your model.
|
| 166 |
+
|
| 167 |
+
### Model file structure
|
| 168 |
+
|
| 169 |
+
Your model files should be organized as:
|
| 170 |
+
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── ... other files
```
|
| 178 |
+
|
| 179 |
+
The full org/model structure is preserved - no more dropping the org prefix.
|
| 180 |
+
|
| 181 |
+
### Example: Swapping ResNet for ViT
|
| 182 |
+
|
| 183 |
+
```python
|
| 184 |
+
# app/services/vit_service.py
|
| 185 |
+
from transformers import ViTForImageClassification, ViTImageProcessor
|
| 186 |
+
|
| 187 |
+
class ViTService(InferenceService[ImageRequest, PredictionResponse]):
|
| 188 |
+
async def load_model(self) -> None:
|
| 189 |
+
self.processor = ViTImageProcessor.from_pretrained(self.model_path)
|
| 190 |
+
self.model = ViTForImageClassification.from_pretrained(self.model_path)
|
| 191 |
+
self._is_loaded = True
|
| 192 |
+
|
| 193 |
+
# ... implement predict() following the pattern above
|
| 194 |
+
```
|
| 195 |
+
|
| 196 |
+
Then in `app/core/app.py`:
|
| 197 |
+
```python
|
| 198 |
+
service = ViTService(model_name="google/vit-base-patch16-224")
|
| 199 |
+
```
|
| 200 |
+
|
| 201 |
+
No other changes needed - the routes, controller, and dependency injection are all model-agnostic.
|
| 202 |
+
|
| 203 |
+
## Validating your setup
|
| 204 |
+
|
| 205 |
+
When you start the server, the logs should show:
|
| 206 |
+
```
|
| 207 |
+
INFO: Starting ML Inference Service...
|
| 208 |
+
INFO: Initializing ResNet service with local model: models/microsoft/resnet-18
|
| 209 |
+
INFO: Loading ResNet model from: models/microsoft/resnet-18
|
| 210 |
+
INFO: ResNet model loaded successfully
|
| 211 |
+
INFO: Startup completed successfully
|
| 212 |
+
```
|
| 213 |
+
|
| 214 |
+
If you see errors like `Model directory not found`, check that your model files exist at the expected path with the full org/model structure.
|
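A quick way to confirm the files are where the service expects them (path shown for the bundled example model):

```python
from pathlib import Path

model_dir = Path("models/microsoft/resnet-18")
if model_dir.exists():
    # Should list config.json plus the model weight files
    print(sorted(p.name for p in model_dir.iterdir()))
else:
    print(f"Missing model directory: {model_dir}")
```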
| 215 |
+
|
| 216 |
+
## Request & Response Shapes
|
| 217 |
|
| 218 |
### Request
|
| 219 |
```json
|
|
|
|
| 236 |
}
|
| 237 |
```
|
| 238 |
|
| 239 |
+
## Configuration
|
|
|
|
|
|
|
| 240 |
|
| 241 |
+
Settings are defined in `app/core/app.py` in the `Settings` class. The defaults are:
|
| 242 |
+
- `app_name` - "ML Inference Service"
|
| 243 |
+
- `app_version` - "0.1.0"
|
| 244 |
+
- `debug` - False
|
| 245 |
+
- `host` - "0.0.0.0"
|
| 246 |
+
- `port` - 8000
|
| 247 |
|
| 248 |
+
You can override these via environment variables or a `.env` file. If you want to make the model configurable via environment variable, add it to the Settings class:
|
| 249 |
|
|
|
|
| 250 |
```python
|
|
|
|
|
|
|
|
|
|
|
|
|
| 251 |
class Settings(BaseSettings):
|
| 252 |
+
# ... existing fields ...
|
| 253 |
+
model_name: str = Field("microsoft/resnet-18")
|
|
|
|
|
|
|
|
|
|
|
|
|
| 254 |
|
| 255 |
+
# Then in the lifespan function:
|
| 256 |
+
service = ResNetInferenceService(model_name=settings.model_name)
|
|
|
|
| 257 |
```
|
| 258 |
|
| 259 |
+
## Deployment
|
|
|
|
|
|
|
| 260 |
|
| 261 |
+
For development:
|
| 262 |
+
```bash
|
| 263 |
+
uvicorn main:app --reload
|
| 264 |
+
```
|
| 265 |
|
| 266 |
+
For production, use gunicorn with uvicorn workers:
|
| 267 |
+
```bash
|
| 268 |
+
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
|
| 269 |
+
```
|
| 270 |
|
| 271 |
+
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
|
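As a rough sketch of that change (reusing the HuggingFace processor/model objects from the example service; adapt it to your own code):

```python
import torch

def predict_on_gpu(model, processor, image):
    """Sketch: run one forward pass on the GPU when available."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)  # in practice, move the model once at load time
    inputs = processor(image, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}  # inputs must live on the same device
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(-1).item()
```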
| 272 |
|
| 273 |
+
## PyArrow Test Datasets
|
| 274 |
|
| 275 |
This project includes a comprehensive **PyArrow-based dataset generation system** designed specifically for academic challenges and ML model validation. The system generates **100 standardized test datasets** that allow participants to validate their models against consistent, reproducible test cases.
|
| 276 |
|
| 277 |
+
### File Structure
|
| 278 |
```
|
| 279 |
standard_test_001.parquet # Actual test data (images, requests, responses)
|
| 280 |
standard_test_001_metadata.json # Human-readable description and stats
|
| 281 |
```
|
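A minimal sketch of loading one dataset pair with PyArrow (the actual column names come from the generation script, so treat them as illustrative):

```python
import json
import pyarrow.parquet as pq

# Load the test cases and the accompanying human-readable metadata
table = pq.read_table("standard_test_001.parquet")
with open("standard_test_001_metadata.json") as f:
    metadata = json.load(f)

print(table.num_rows, table.column_names)
print(metadata.get("description", "no description"))
```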
| 282 |
|
| 283 |
+
### Dataset Categories (25 each = 100 total)
|
| 284 |
|
| 285 |
#### 1. **Standard Test Cases** (`standard_test_*.parquet`)
|
| 286 |
**Purpose**: Baseline functionality validation
|
|
|
|
| 323 |
- **Comparative Analysis**: Enables direct performance comparison between models
|
| 324 |
- **Expected Behavior**: Architecture-specific but structurally consistent responses
|
| 325 |
|
| 326 |
+
### Generation Process
|
| 327 |
|
| 328 |
The dataset generation follows a **deterministic, reproducible approach**:
|
| 329 |
|
|
|
|
| 380 |
})
|
| 381 |
```
|
| 382 |
|
| 383 |
+
### Usage Guide
|
| 384 |
|
| 385 |
|
| 386 |
**1. Generate Test Datasets**
|
|
|
|
| 410 |
python scripts/test_datasets.py --category performance
|
| 411 |
```
|
| 412 |
|
| 413 |
+
### Testing Output and Metrics
|
| 414 |
|
| 415 |
The test runner provides comprehensive validation metrics:
|
| 416 |
|
| 417 |
```
|
| 418 |
+
DATASET TESTING SUMMARY
|
| 419 |
============================================================
|
| 420 |
Datasets tested: 100
|
| 421 |
Successful datasets: 95
|
app/api/controllers.py
CHANGED
|
@@ -1,75 +1,79 @@
(Previous controller, shown removed: it imported `base64`, `io`, and PIL, decoded the base64 payload and opened the image itself, validated the media type, and built the `PredictionResponse` field-by-field from a result dict returned by the ResNet service.)
| 1 |
"""
|
| 2 |
Controllers for handling API business logic.
|
|
|
|
|
|
|
|
|
|
| 3 |
|
| 4 |
+
This controller layer orchestrates requests between the API routes and the
|
| 5 |
+
inference service layer. It handles validation and error responses.
|
| 6 |
+
|
| 7 |
+
The controller is model-agnostic and works with any InferenceService implementation.
|
| 8 |
+
"""
|
| 9 |
from fastapi import HTTPException
|
|
|
|
| 10 |
|
| 11 |
from app.core.logging import logger
|
| 12 |
+
from app.services.base import InferenceService
|
| 13 |
from app.api.models import ImageRequest, PredictionResponse
|
| 14 |
|
| 15 |
|
| 16 |
class PredictionController:
|
| 17 |
+
"""
|
| 18 |
+
Controller for ML prediction endpoints.
|
| 19 |
+
|
| 20 |
+
This controller works with any InferenceService implementation,
|
| 21 |
+
making it easy to swap different models without changing the API layer.
|
| 22 |
+
"""
|
| 23 |
|
| 24 |
@staticmethod
|
| 25 |
+
async def predict(
|
| 26 |
request: ImageRequest,
|
| 27 |
+
service: InferenceService
|
| 28 |
) -> PredictionResponse:
|
| 29 |
"""
|
| 30 |
+
Run inference using the configured model service.
|
| 31 |
+
|
| 32 |
+
The controller handles request validation and error handling,
|
| 33 |
+
while the service handles the actual inference logic.
|
| 34 |
+
|
| 35 |
+
Args:
|
| 36 |
+
request: ImageRequest with base64-encoded image data
|
| 37 |
+
service: Initialized inference service (can be any model)
|
| 38 |
+
|
| 39 |
+
Returns:
|
| 40 |
+
PredictionResponse with prediction results
|
| 41 |
+
|
| 42 |
+
Raises:
|
| 43 |
+
HTTPException: If service unavailable, invalid input, or inference fails
|
| 44 |
"""
|
| 45 |
try:
|
| 46 |
# Validate service availability
|
| 47 |
+
if not service:
|
| 48 |
raise HTTPException(
|
| 49 |
status_code=503,
|
| 50 |
detail="Service not initialized"
|
| 51 |
)
|
| 52 |
|
| 53 |
+
if not service.is_loaded:
|
|
|
|
| 54 |
raise HTTPException(
|
| 55 |
+
status_code=503,
|
| 56 |
+
detail="Model not loaded"
|
| 57 |
)
|
| 58 |
|
| 59 |
+
# Validate media type
|
| 60 |
+
if not request.image.mediaType.startswith('image/'):
|
|
|
|
|
|
|
| 61 |
raise HTTPException(
|
| 62 |
status_code=400,
|
| 63 |
+
detail=f"Invalid media type: {request.image.mediaType}. Must be image/*"
|
| 64 |
)
|
| 65 |
|
| 66 |
+
# Call service - it handles decoding and returns typed response
|
| 67 |
+
response = await service.predict(request)
|
| 68 |
+
return response
|
| 69 |
|
| 70 |
except HTTPException:
|
| 71 |
raise
|
| 72 |
+
except ValueError as e:
|
| 73 |
+
# Service raises ValueError for invalid input
|
| 74 |
+
logger.error(f"Invalid input: {e}")
|
| 75 |
+
raise HTTPException(status_code=400, detail=str(e))
|
| 76 |
except Exception as e:
|
| 77 |
+
# Unexpected errors
|
| 78 |
logger.error(f"Prediction failed: {e}")
|
| 79 |
+
raise HTTPException(status_code=500, detail="Internal server error")
|
app/api/routes/prediction.py
CHANGED
|
@@ -1,20 +1,59 @@
(Previous route module, shown removed: it imported ResNet-specific dependencies from `app.core` and `app.services` and registered a `/predict…` handler with a minimal docstring.)
| 1 |
"""
|
| 2 |
ML Prediction routes.
|
| 3 |
+
|
| 4 |
+
This module defines the HTTP endpoints for running model inference.
|
| 5 |
+
The routes are model-agnostic and work with any InferenceService implementation.
|
| 6 |
"""
|
| 7 |
from fastapi import APIRouter, Depends
|
| 8 |
|
| 9 |
from app.api.controllers import PredictionController
|
| 10 |
from app.api.models import ImageRequest, PredictionResponse
|
| 11 |
+
from app.core.app import get_inference_service
|
| 12 |
+
from app.services.base import InferenceService
|
| 13 |
|
| 14 |
router = APIRouter()
|
| 15 |
|
| 16 |
|
| 17 |
+
@router.post("/predict", response_model=PredictionResponse)
|
| 18 |
+
async def predict(
|
| 19 |
request: ImageRequest,
|
| 20 |
+
service: InferenceService = Depends(get_inference_service)
|
| 21 |
):
|
| 22 |
+
"""
|
| 23 |
+
Run inference on an image using the configured model.
|
| 24 |
+
|
| 25 |
+
This endpoint works with any model that implements the InferenceService interface.
|
| 26 |
+
The actual model used depends on what was configured during app startup.
|
| 27 |
+
|
| 28 |
+
Example Request Body:
|
| 29 |
+
```json
|
| 30 |
+
{
|
| 31 |
+
"image": {
|
| 32 |
+
"mediaType": "image/jpeg",
|
| 33 |
+
"data": "<base64-encoded-image-data>"
|
| 34 |
+
}
|
| 35 |
+
}
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
Example Response:
|
| 39 |
+
```json
|
| 40 |
+
{
|
| 41 |
+
"prediction": "tabby cat",
|
| 42 |
+
"confidence": 0.8542,
|
| 43 |
+
"model": "microsoft/resnet-18",
|
| 44 |
+
"predicted_label": 281,
|
| 45 |
+
"mediaType": "image/jpeg"
|
| 46 |
+
}
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
Args:
|
| 50 |
+
request: ImageRequest containing base64-encoded image
|
| 51 |
+
service: Injected inference service (configured at startup)
|
| 52 |
+
|
| 53 |
+
Returns:
|
| 54 |
+
PredictionResponse with model predictions
|
| 55 |
+
|
| 56 |
+
Raises:
|
| 57 |
+
HTTPException: 400 for invalid input, 503 if service unavailable, 500 for errors
|
| 58 |
+
"""
|
| 59 |
+
return await PredictionController.predict(request, service)
|
app/api/routes/resnet_service_manager.py
DELETED
|
@@ -1,19 +0,0 @@
# """
# Dependency injection for FastAPI.
# """
# from typing import Optional
# from app.services.inference import ResNetInferenceService
#
# # Global service instance
# _resnet_service: Optional[ResNetInferenceService] = None
#
#
# def get_resnet_service() -> Optional[ResNetInferenceService]:
#     """Get the ResNet service instance."""
#     return _resnet_service
#
#
# def set_resnet_service(service: ResNetInferenceService) -> None:
#     """Set the global ResNet service instance."""
#     global _resnet_service
#     _resnet_service = service
app/core/app.py
CHANGED
|
@@ -1,16 +1,150 @@
(Previous module, shown removed: a minimal "FastAPI application factory" that imported settings and the lifespan handler from the separate `app.core` modules and only defined `create_app()`.)
@@ -19,7 +153,6 @@ def create_app() -> FastAPI:
(Removed the "# Include only prediction router" comment; the unchanged `FastAPI(...)` construction, router registration, and `return app` appear in the new version below.)
| 1 |
"""
|
| 2 |
+
FastAPI application factory and core infrastructure.
|
| 3 |
+
|
| 4 |
+
This module consolidates all core application components:
|
| 5 |
+
- Configuration management
|
| 6 |
+
- Global service instance (dependency injection)
|
| 7 |
+
- Application lifecycle (startup/shutdown)
|
| 8 |
+
- FastAPI app creation
|
| 9 |
+
|
| 10 |
+
By keeping everything in one place, we avoid the complexity of managing
|
| 11 |
+
global variables across multiple modules.
|
| 12 |
"""
|
| 13 |
+
import warnings
|
| 14 |
+
from contextlib import asynccontextmanager
|
| 15 |
+
from typing import AsyncGenerator, Optional
|
| 16 |
+
|
| 17 |
from fastapi import FastAPI
|
| 18 |
+
from pydantic import Field
|
| 19 |
+
from pydantic_settings import BaseSettings
|
| 20 |
|
| 21 |
+
from app.core.logging import logger
|
| 22 |
+
from app.services.base import InferenceService
|
| 23 |
+
from app.services.inference import ResNetInferenceService
|
| 24 |
from app.api.routes import prediction
|
| 25 |
|
| 26 |
|
| 27 |
+
class Settings(BaseSettings):
|
| 28 |
+
"""
|
| 29 |
+
Application settings with environment variable support.
|
| 30 |
+
|
| 31 |
+
Settings can be overridden via environment variables or .env file.
|
| 32 |
+
"""
|
| 33 |
+
# Basic app settings
|
| 34 |
+
app_name: str = Field(default="ML Inference Service", description="Application name")
|
| 35 |
+
app_version: str = Field(default="0.1.0", description="Application version")
|
| 36 |
+
debug: bool = Field(default=False, description="Debug mode")
|
| 37 |
+
|
| 38 |
+
# Server settings
|
| 39 |
+
host: str = Field(default="0.0.0.0", description="Server host")
|
| 40 |
+
port: int = Field(default=8000, description="Server port")
|
| 41 |
+
|
| 42 |
+
class Config:
|
| 43 |
+
"""Load from .env file if it exists."""
|
| 44 |
+
env_file = ".env"
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
# Global settings instance
|
| 48 |
+
settings = Settings()
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
# Global inference service instance (initialized during startup)
|
| 52 |
+
_inference_service: Optional[InferenceService] = None
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
def get_inference_service() -> Optional[InferenceService]:
|
| 56 |
+
"""
|
| 57 |
+
Get the inference service instance for dependency injection.
|
| 58 |
+
|
| 59 |
+
This function is used in FastAPI route handlers via Depends().
|
| 60 |
+
The service is initialized once during app startup and reused
|
| 61 |
+
for all requests.
|
| 62 |
+
|
| 63 |
+
Returns:
|
| 64 |
+
The initialized inference service, or None if not yet initialized.
|
| 65 |
+
|
| 66 |
+
Example:
|
| 67 |
+
```python
|
| 68 |
+
@router.post("/predict")
|
| 69 |
+
async def predict(
|
| 70 |
+
request: ImageRequest,
|
| 71 |
+
service: InferenceService = Depends(get_inference_service)
|
| 72 |
+
):
|
| 73 |
+
return await service.predict(request)
|
| 74 |
+
```
|
| 75 |
+
"""
|
| 76 |
+
return _inference_service
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
def _set_inference_service(service: InferenceService) -> None:
|
| 80 |
+
"""
|
| 81 |
+
INTERNAL: Set the global inference service instance.
|
| 82 |
+
|
| 83 |
+
Called during application startup to register the service.
|
| 84 |
+
This is marked as internal (prefixed with _) because it should
|
| 85 |
+
only be called from the lifespan handler below.
|
| 86 |
+
|
| 87 |
+
Args:
|
| 88 |
+
service: The initialized inference service instance.
|
| 89 |
+
"""
|
| 90 |
+
global _inference_service
|
| 91 |
+
_inference_service = service
|
| 92 |
+
|
| 93 |
+
|
| 94 |
+
@asynccontextmanager
|
| 95 |
+
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
|
| 96 |
+
"""
|
| 97 |
+
Application lifespan manager.
|
| 98 |
+
|
| 99 |
+
Handles startup and shutdown events for the FastAPI application.
|
| 100 |
+
During startup, it initializes and loads the inference service.
|
| 101 |
+
|
| 102 |
+
CUSTOMIZATION POINT FOR GRAD STUDENTS:
|
| 103 |
+
To use your own model, replace ResNetInferenceService below with
|
| 104 |
+
your implementation that subclasses InferenceService.
|
| 105 |
+
|
| 106 |
+
Example:
|
| 107 |
+
```python
|
| 108 |
+
service = MyCustomService(model_name="my-org/my-model")
|
| 109 |
+
await service.load_model()
|
| 110 |
+
_set_inference_service(service)
|
| 111 |
+
```
|
| 112 |
+
"""
|
| 113 |
+
logger.info("Starting ML Inference Service...")
|
| 114 |
+
|
| 115 |
+
try:
|
| 116 |
+
with warnings.catch_warnings():
|
| 117 |
+
warnings.filterwarnings("ignore", category=FutureWarning)
|
| 118 |
+
|
| 119 |
+
service = ResNetInferenceService(
|
| 120 |
+
model_name="microsoft/resnet-18"
|
| 121 |
+
)
|
| 122 |
+
await service.load_model()
|
| 123 |
+
_set_inference_service(service)
|
| 124 |
+
|
| 125 |
+
logger.info("Startup completed successfully")
|
| 126 |
+
|
| 127 |
+
except Exception as e:
|
| 128 |
+
logger.error(f"Startup failed: {e}")
|
| 129 |
+
raise
|
| 130 |
+
|
| 131 |
+
yield
|
| 132 |
+
|
| 133 |
+
logger.info("Shutting down...")
|
| 134 |
+
|
| 135 |
+
|
| 136 |
def create_app() -> FastAPI:
|
| 137 |
+
"""
|
| 138 |
+
Create and configure the FastAPI application.
|
| 139 |
+
|
| 140 |
+
This is the main entry point for the application. It:
|
| 141 |
+
1. Creates a FastAPI instance with metadata from settings
|
| 142 |
+
2. Attaches the lifespan handler for startup/shutdown
|
| 143 |
+
3. Registers API routes
|
| 144 |
|
| 145 |
+
Returns:
|
| 146 |
+
Configured FastAPI application instance.
|
| 147 |
+
"""
|
| 148 |
app = FastAPI(
|
| 149 |
title=settings.app_name,
|
| 150 |
description="ML inference service for image classification",
|
|
|
|
| 153 |
lifespan=lifespan
|
| 154 |
)
|
| 155 |
|
|
|
|
| 156 |
app.include_router(prediction.router)
|
| 157 |
|
| 158 |
return app
|
app/core/config.py
DELETED
|
@@ -1,29 +0,0 @@
"""
Basic configuration management.

Starting simple - just app settings. We'll expand as needed.
"""

from pydantic import Field
from pydantic_settings import BaseSettings  # Changed import


class Settings(BaseSettings):
    """Application settings with environment variable support."""

    # Basic app settings
    app_name: str = Field(default="ML Inference Service", description="Application name")
    app_version: str = Field(default="0.1.0", description="Application version")
    debug: bool = Field(default=False, description="Debug mode")

    # Server settings
    host: str = Field(default="0.0.0.0", description="Server host")
    port: int = Field(default=8000, description="Server port")

    class Config:
        """Load from .env file if it exists."""
        env_file = ".env"


# Global settings instance
settings = Settings()
app/core/dependencies.py
DELETED
|
@@ -1,19 +0,0 @@
"""
Dependency injection for FastAPI.
"""
from typing import Optional
from app.services.inference import ResNetInferenceService

# Global service instance
_resnet_service: Optional[ResNetInferenceService] = None


def get_resnet_service() -> Optional[ResNetInferenceService]:
    """Get the ResNet service instance."""
    return _resnet_service


def set_resnet_service(service: ResNetInferenceService) -> None:
    """Set the global ResNet service instance."""
    global _resnet_service
    _resnet_service = service
app/core/lifespan.py
DELETED
|
@@ -1,43 +0,0 @@
"""
Application lifespan management.
"""
import warnings
from contextlib import asynccontextmanager
from typing import AsyncGenerator

from fastapi import FastAPI

from app.core.logging import logger
from app.core.dependencies import set_resnet_service
from app.services.inference import ResNetInferenceService


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Application lifespan manager."""

    # Startup
    logger.info("Starting ML Inference Service...")

    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=FutureWarning)

            # Initialize and load ResNet service
            resnet_service = ResNetInferenceService(
                model_name="microsoft/resnet-18",
                use_local_model=True
            )
            resnet_service.load_model()
            set_resnet_service(resnet_service)

        logger.info("Startup completed successfully")

    except Exception as e:
        logger.error(f"Startup failed: {e}")
        raise

    yield  # App runs here

    # Shutdown
    logger.info("Shutting down...")
app/services/base.py
ADDED
|
@@ -0,0 +1,135 @@
| 1 |
+
"""
|
| 2 |
+
Abstract base class for ML inference services.
|
| 3 |
+
|
| 4 |
+
This module defines the contract that all inference services must implement.
|
| 5 |
+
Grad students should subclass `InferenceService` and implement the abstract methods
|
| 6 |
+
to integrate their models with the serving infrastructure.
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
from abc import ABC, abstractmethod
|
| 10 |
+
from typing import Generic, TypeVar
|
| 11 |
+
|
| 12 |
+
from pydantic import BaseModel
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
# Type variables for request and response models
|
| 16 |
+
TRequest = TypeVar('TRequest', bound=BaseModel)
|
| 17 |
+
TResponse = TypeVar('TResponse', bound=BaseModel)
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
class InferenceService(ABC, Generic[TRequest, TResponse]):
|
| 21 |
+
"""
|
| 22 |
+
Abstract base class for ML inference services.
|
| 23 |
+
|
| 24 |
+
This class defines the interface that all model serving implementations must follow.
|
| 25 |
+
By subclassing this and implementing the abstract methods, you can integrate any
|
| 26 |
+
ML model with the serving infrastructure.
|
| 27 |
+
|
| 28 |
+
Type Parameters:
|
| 29 |
+
TRequest: Pydantic model for input requests (e.g., ImageRequest, TextRequest)
|
| 30 |
+
TResponse: Pydantic model for prediction responses (e.g., PredictionResponse)
|
| 31 |
+
|
| 32 |
+
Example:
|
| 33 |
+
```python
|
| 34 |
+
class MyModelService(InferenceService[MyRequest, MyResponse]):
|
| 35 |
+
|
| 36 |
+
async def load_model(self) -> None:
|
| 37 |
+
# Load your model here
|
| 38 |
+
self.model = torch.load("my_model.pt")
|
| 39 |
+
self._is_loaded = True
|
| 40 |
+
|
| 41 |
+
async def predict(self, request: MyRequest) -> MyResponse:
|
| 42 |
+
# Run inference
|
| 43 |
+
output = self.model(request.data)
|
| 44 |
+
return MyResponse(result=output)
|
| 45 |
+
|
| 46 |
+
@property
|
| 47 |
+
def is_loaded(self) -> bool:
|
| 48 |
+
return self._is_loaded
|
| 49 |
+
```
|
| 50 |
+
"""
|
| 51 |
+
|
| 52 |
+
@abstractmethod
|
| 53 |
+
async def load_model(self) -> None:
|
| 54 |
+
"""
|
| 55 |
+
Load the model weights and any required processors/tokenizers.
|
| 56 |
+
|
| 57 |
+
This method is called once during application startup (in the lifespan handler).
|
| 58 |
+
Use this to:
|
| 59 |
+
- Load model weights from disk
|
| 60 |
+
- Initialize processors, tokenizers, or other preprocessing components
|
| 61 |
+
- Set up any required state
|
| 62 |
+
- Perform model warmup if needed
|
| 63 |
+
|
| 64 |
+
Raises:
|
| 65 |
+
FileNotFoundError: If model files don't exist
|
| 66 |
+
RuntimeError: If model loading fails
|
| 67 |
+
"""
|
| 68 |
+
pass
|
| 69 |
+
|
| 70 |
+
@abstractmethod
|
| 71 |
+
async def predict(self, request: TRequest) -> TResponse:
|
| 72 |
+
"""
|
| 73 |
+
Run inference on the input request and return a typed response.
|
| 74 |
+
|
| 75 |
+
This method is called for each prediction request. It should:
|
| 76 |
+
1. Extract input data from the request
|
| 77 |
+
2. Preprocess the input (if needed)
|
| 78 |
+
3. Run the model inference
|
| 79 |
+
4. Post-process the output
|
| 80 |
+
5. Return a Pydantic response model
|
| 81 |
+
|
| 82 |
+
Args:
|
| 83 |
+
request: Input request containing the data to predict on.
|
| 84 |
+
Type is specified by the TRequest type parameter.
|
| 85 |
+
|
| 86 |
+
Returns:
|
| 87 |
+
Typed Pydantic response model containing predictions.
|
| 88 |
+
Type is specified by the TResponse type parameter.
|
| 89 |
+
|
| 90 |
+
Raises:
|
| 91 |
+
ValueError: If input data is invalid
|
| 92 |
+
RuntimeError: If model inference fails
|
| 93 |
+
|
| 94 |
+
Important - Background Threading:
|
| 95 |
+
For CPU-intensive operations (like deep learning inference), you MUST
|
| 96 |
+
offload computation to a background thread to avoid blocking the event loop.
|
| 97 |
+
|
| 98 |
+
Pattern to follow:
|
| 99 |
+
```python
|
| 100 |
+
import asyncio
|
| 101 |
+
|
| 102 |
+
def _predict_sync(self, request: TRequest) -> TResponse:
|
| 103 |
+
# Heavy CPU work here (PyTorch, TensorFlow, etc.)
|
| 104 |
+
result = self.model(data)
|
| 105 |
+
return TResponse(result=result)
|
| 106 |
+
|
| 107 |
+
async def predict(self, request: TRequest) -> TResponse:
|
| 108 |
+
# Offload to thread pool
|
| 109 |
+
return await asyncio.to_thread(self._predict_sync, request)
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
Why this matters:
|
| 113 |
+
- Inference can take 1-3+ seconds and will freeze the server
|
| 114 |
+
- asyncio.to_thread() runs the work in a background thread
|
| 115 |
+
- The event loop stays responsive to handle other requests
|
| 116 |
+
"""
|
| 117 |
+
pass
|
| 118 |
+
|
| 119 |
+
@property
|
| 120 |
+
@abstractmethod
|
| 121 |
+
def is_loaded(self) -> bool:
|
| 122 |
+
"""
|
| 123 |
+
Check if the model is loaded and ready for inference.
|
| 124 |
+
|
| 125 |
+
Returns:
|
| 126 |
+
True if model is loaded and ready, False otherwise.
|
| 127 |
+
|
| 128 |
+
Example:
|
| 129 |
+
```python
|
| 130 |
+
@property
|
| 131 |
+
def is_loaded(self) -> bool:
|
| 132 |
+
return self.model is not None and self._is_loaded
|
| 133 |
+
```
|
| 134 |
+
"""
|
| 135 |
+
pass
|
app/services/inference.py
CHANGED
|
@@ -1,68 +1,92 @@
(Previous implementation, shown removed: `ResNetInferenceService` was a standalone class with a `use_local_model` flag, a synchronous `load_model()`, and a prediction method that returned a plain result dict with `prediction`, `confidence`, `model`, and `predicted_label` keys.)
@@ -70,17 +94,15 @@ class ResNetInferenceService:
(Removed the old `local_files_only=` wiring and the "Load processor and model from local directory or remote" comment.)
@@ -88,64 +110,87 @@ class ResNetInferenceService:
(Removed the old dict-building prediction body and its "dwl.bash" hint; the unchanged preprocessing, inference, and confidence computation appear in the new version below.)
| 1 |
"""
|
| 2 |
+
Inference service for ResNet image classification models.
|
| 3 |
|
| 4 |
+
This module provides an EXAMPLE implementation of the InferenceService ABC.
|
| 5 |
+
Grad students should use this as a reference when implementing their own model services.
|
| 6 |
+
|
| 7 |
+
This example demonstrates:
|
| 8 |
+
- How to load a HuggingFace transformer model
|
| 9 |
+
- How to preprocess image inputs
|
| 10 |
+
- How to return typed Pydantic responses
|
| 11 |
+
- How to use background threading for CPU-intensive inference
|
| 12 |
+
- Proper error handling and logging
|
| 13 |
"""
|
| 14 |
import os
|
| 15 |
+
import base64
|
| 16 |
+
import asyncio
|
| 17 |
+
from io import BytesIO
|
| 18 |
import torch
|
| 19 |
from PIL import Image
|
| 20 |
from transformers import AutoImageProcessor, ResNetForImageClassification
|
| 21 |
|
| 22 |
from app.core.logging import logger
|
| 23 |
+
from app.services.base import InferenceService
|
| 24 |
+
from app.api.models import ImageRequest, PredictionResponse
|
| 25 |
|
| 26 |
|
| 27 |
+
class ResNetInferenceService(InferenceService[ImageRequest, PredictionResponse]):
|
| 28 |
"""
|
| 29 |
+
EXAMPLE: ResNet inference service implementation.
|
| 30 |
+
|
| 31 |
+
This is a reference implementation showing how to integrate a HuggingFace
|
| 32 |
+
image classification model with the serving infrastructure.
|
| 33 |
|
| 34 |
+
To create your own service:
|
| 35 |
+
1. Subclass InferenceService[YourRequest, YourResponse]
|
| 36 |
+
2. Implement load_model() to load your model
|
| 37 |
+
3. Implement predict() to run inference and return typed response
|
| 38 |
+
4. Implement the is_loaded property
|
| 39 |
+
|
| 40 |
+
This service loads a ResNet-18 model for ImageNet classification.
|
| 41 |
"""
|
| 42 |
|
| 43 |
+
def __init__(self, model_name: str = "microsoft/resnet-18"):
|
| 44 |
"""
|
| 45 |
Initialize the ResNet service.
|
| 46 |
|
| 47 |
Args:
|
| 48 |
+
model_name: Model identifier (e.g., "microsoft/resnet-18").
|
| 49 |
+
Model files must exist in models/{model_name}/ directory.
|
| 50 |
+
The full org/model structure is preserved.
|
| 51 |
+
|
| 52 |
+
Example:
|
| 53 |
+
For model_name="microsoft/resnet-18", expects files at:
|
| 54 |
+
models/microsoft/resnet-18/config.json
|
| 55 |
+
models/microsoft/resnet-18/pytorch_model.bin
|
| 56 |
+
etc.
|
| 57 |
"""
|
| 58 |
self.model_name = model_name
|
|
|
|
| 59 |
self.model = None
|
| 60 |
self.processor = None
|
| 61 |
self._is_loaded = False
|
| 62 |
|
| 63 |
+
# Preserve full org/model path structure
|
| 64 |
+
self.model_path = os.path.join("models", model_name)
|
| 65 |
+
logger.info(f"Initializing ResNet service with local model: {self.model_path}")
|
|
|
|
|
|
|
|
|
|
| 66 |
|
| 67 |
+
async def load_model(self) -> None:
|
| 68 |
"""
|
| 69 |
Load the ResNet model and processor.
|
| 70 |
|
| 71 |
+
This method loads the model once during startup and reuses it for all requests.
|
| 72 |
+
Called by the application lifespan handler.
|
| 73 |
"""
|
| 74 |
if self._is_loaded:
|
| 75 |
logger.debug("Model already loaded, skipping...")
|
| 76 |
return
|
| 77 |
|
| 78 |
try:
|
| 79 |
+
if not os.path.exists(self.model_path):
|
| 80 |
+
raise FileNotFoundError(
|
| 81 |
+
f"Model directory not found: {self.model_path}\n"
|
| 82 |
+
f"Make sure the model files are downloaded to the correct location."
|
| 83 |
+
)
|
| 84 |
|
| 85 |
+
config_path = os.path.join(self.model_path, "config.json")
|
| 86 |
+
if not os.path.exists(config_path):
|
| 87 |
+
raise FileNotFoundError(f"Model config not found: {config_path}")
|
| 88 |
|
| 89 |
+
logger.info(f"Loading ResNet model from: {self.model_path}")
|
|
|
|
|
|
|
| 90 |
|
| 91 |
# Suppress warnings during model loading
|
| 92 |
import warnings
|
|
|
|
| 94 |
warnings.filterwarnings("ignore", category=FutureWarning)
|
| 95 |
warnings.filterwarnings("ignore", message="Could not find image processor class")
|
| 96 |
|
|
|
|
| 97 |
self.processor = AutoImageProcessor.from_pretrained(
|
| 98 |
self.model_path,
|
| 99 |
+
local_files_only=True
|
| 100 |
)
|
| 101 |
self.model = ResNetForImageClassification.from_pretrained(
|
| 102 |
self.model_path,
|
| 103 |
+
local_files_only=True
|
| 104 |
)
|
| 105 |
|
|
|
|
| 106 |
self._is_loaded = True
|
| 107 |
logger.info("ResNet model loaded successfully")
|
| 108 |
logger.info(f"Model architecture: {self.model.config.architectures}")
|
|
|
|
| 110 |
|
| 111 |
except Exception as e:
|
| 112 |
logger.error(f"Failed to load ResNet model: {e}")
|
| 113 |
+
logger.error(f"Hint: Ensure model files exist at: {self.model_path}")
|
|
|
|
| 114 |
raise
|
| 115 |
|
| 116 |
|
| 117 |
+
def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
|
| 118 |
"""
|
| 119 |
+
INTERNAL: Synchronous prediction logic that runs in a background thread.
|
| 120 |
+
|
| 121 |
+
This method contains all CPU-intensive operations (image decoding,
|
| 122 |
+
preprocessing, PyTorch inference). It's called from predict() via
|
| 123 |
+
asyncio.to_thread() to avoid blocking the event loop.
|
| 124 |
|
| 125 |
Args:
|
| 126 |
+
request: ImageRequest containing base64-encoded image data
|
| 127 |
|
| 128 |
Returns:
|
| 129 |
+
PredictionResponse with prediction, confidence, and metadata
|
| 130 |
|
| 131 |
Raises:
|
| 132 |
+
ValueError: If image decoding or processing fails
|
|
|
|
| 133 |
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
| 134 |
try:
|
| 135 |
+
logger.debug("Starting ResNet inference in background thread")
|
| 136 |
+
|
| 137 |
+
image_data = base64.b64decode(request.image.data)
|
| 138 |
+
image = Image.open(BytesIO(image_data))
|
| 139 |
|
| 140 |
if image.mode != 'RGB':
|
| 141 |
+
logger.debug(f"Converting image from {image.mode} to RGB")
|
| 142 |
image = image.convert('RGB')
|
|
|
|
| 143 |
|
| 144 |
inputs = self.processor(image, return_tensors="pt")
|
| 145 |
|
|
|
|
| 146 |
with torch.no_grad():
|
| 147 |
logits = self.model(**inputs).logits
|
| 148 |
|
|
|
|
| 149 |
predicted_label = logits.argmax(-1).item()
|
| 150 |
predicted_class = self.model.config.id2label[predicted_label]
|
| 151 |
|
|
|
|
| 152 |
probabilities = torch.nn.functional.softmax(logits, dim=-1)
|
| 153 |
confidence = probabilities[0][predicted_label].item()
|
| 155 |
logger.debug(f"Inference completed: {predicted_class} (confidence: {confidence:.4f})")
|
| 156 |
+
|
| 157 |
+
return PredictionResponse(
|
| 158 |
+
prediction=predicted_class,
|
| 159 |
+
confidence=round(confidence, 4),
|
| 160 |
+
model=self.model_name,
|
| 161 |
+
predicted_label=predicted_label,
|
| 162 |
+
mediaType=request.image.mediaType
|
| 163 |
+
)
|
| 164 |
|
| 165 |
except Exception as e:
|
| 166 |
logger.error(f"Inference failed: {e}")
|
| 167 |
raise ValueError(f"Failed to process image: {str(e)}")
|
| 168 |
|
| 169 |
+
async def predict(self, request: ImageRequest) -> PredictionResponse:
|
| 170 |
+
"""
|
| 171 |
+
Perform inference on an image request.
|
| 172 |
+
|
| 173 |
+
This method demonstrates proper async handling for CPU-intensive operations.
|
| 174 |
+
The actual inference work is offloaded to a background thread using
|
| 175 |
+
asyncio.to_thread(), which prevents blocking the event loop.
|
| 176 |
+
|
| 177 |
+
Args:
|
| 178 |
+
request: ImageRequest containing base64-encoded image data
|
| 179 |
+
|
| 180 |
+
Returns:
|
| 181 |
+
PredictionResponse with prediction, confidence, and metadata
|
| 182 |
+
|
| 183 |
+
Raises:
|
| 184 |
+
RuntimeError: If model is not loaded
|
| 185 |
+
ValueError: If image decoding or processing fails
|
| 186 |
+
"""
|
| 187 |
+
if not self._is_loaded:
|
| 188 |
+
logger.warning("Model not loaded, loading now...")
|
| 189 |
+
await self.load_model()
|
| 190 |
+
|
| 191 |
+
response = await asyncio.to_thread(self._predict_sync, request)
|
| 192 |
+
return response
|
| 193 |
+
|
| 194 |
@property
|
| 195 |
def is_loaded(self) -> bool:
|
| 196 |
"""Check if model is loaded."""
|
test_main.http
CHANGED
|
@@ -1,6 +1,7 @@
(Previous version, shown removed: a bare "# Test…" header comment above the same POST /predict request.)
| 1 |
+
# Test Prediction Endpoint
|
| 2 |
+
# Works with any model configured at startup (default: ResNet-18)
|
| 3 |
|
| 4 |
+
POST http://127.0.0.1:8000/predict
|
| 5 |
Content-Type: application/json
|
| 6 |
|
| 7 |
{
|