|
|
--- |
|
|
title: Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI |
|
|
emoji: 💊 |
|
|
colorFrom: blue |
|
|
colorTo: gray |
|
|
sdk: docker |
|
|
app_port: 8000 |
|
|
pinned: true
|
|
--- |
|
|
|
|
|
> **Note:** This repo contains only deployment/demo files. |
|
|
> For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api). |
|
|
|
|
|
# Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI |
|
|
|
|
|
This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation. |
|
|
The aim is to deliver a complete pipeline: data preprocessing, model training and evaluation, and deployment, demonstrating practical ML engineering from scratch to API.
|
|
|
|
|
--- |
|
|
|
|
|
## Main Repo |
|
|
|
|
|
This is a minimal clone containing only the files needed for deployment.
|
|
For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api). |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Overview |
|
|
|
|
|
End-to-end defect detection and localization using the **Capsule** class from the **MVTec AD dataset**. |
|
|
Key steps include: |
|
|
- Data preprocessing, formatting, and augmentation |
|
|
- Model design (pre-trained backbone + custom heads) |
|
|
- Training, evaluation, and hyperparameter tuning |
|
|
- Dockerized FastAPI deployment for inference |
|
|
|
|
|
*Portfolio project to showcase ML workflow and engineering.* |
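As a rough, illustrative sketch of the "pre-trained backbone + custom heads" design (this is not the project's actual code; the layer widths, the six-class output, and the simplified decoder without U-Net skip connections are all assumptions), a ConvNeXt backbone with dual heads can be wired in Keras like so:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(224, 224, 3), num_classes=6):
    """ConvNeXt backbone with a classification head and an upsampling
    segmentation head. num_classes=6 assumes 'good' + 5 defect types."""
    backbone = tf.keras.applications.ConvNeXtTiny(
        include_top=False, weights=None, input_shape=input_shape)
    feats = backbone.output  # (7, 7, 768) feature map for a 224x224 input

    # Classification head: pooled features -> class probabilities
    cls = layers.GlobalAveragePooling2D()(feats)
    cls = layers.Dense(num_classes, activation="softmax", name="label")(cls)

    # Segmentation head: five stride-2 upsamples back to full resolution
    # (a real U-Net decoder would also concatenate encoder skip connections)
    seg = feats
    for filters in (256, 128, 64, 32, 16):
        seg = layers.Conv2DTranspose(filters, 3, strides=2,
                                     padding="same", activation="relu")(seg)
    seg = layers.Conv2D(1, 1, activation="sigmoid", name="mask")(seg)

    return tf.keras.Model(backbone.input, [cls, seg])

model = build_model()
```

The dual-output model lets one loss supervise the class label and another the pixel mask, which is how classification and localization are trained jointly.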
|
|
|
|
|
--- |
|
|
|
|
|
## Key Results |
|
|
|
|
|
- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split |
|
|
- Quantitative results on test evaluation: |
|
|
- Classification accuracy: **83 %** |
|
|
- Classification defect-only accuracy: **75 %** |
|
|
- Defect presence accuracy: **91 %** |
|
|
- Segmentation quality (mIoU / Dice): **0.79 / 0.73** |
|
|
- Segmentation defect-only quality (mIoU / Dice): **0.70 / 0.55** |
|
|
- Model artifacts: |
|
|
- Original model size (.keras / SavedModel): **345 MB** |
|
|
- Converted TFLite size, raw (.tflite): **119 MB**
|
|
- Converted TFLite size, optimized (.tflite): **31 MB** (dynamic range quantization applied)
|
|
- Container / runtime: |
|
|
- Docker image size: **317 MB** |
|
|
- Runtime used: **tflite-runtime + Uvicorn/FastAPI** |
|
|
- Avg inference latency (inference only, set tensor + invoke): **239 ms** |
|
|
- Avg inference latency (single POST request, measured): **271 ms** |
|
|
- Average memory usage during inference: **321 MB** |
|
|
- Startup time (local): **72 ms** |
|
|
- Observations: |
|
|
- The app returns expected visualizations and class labels for the MVTec-style test images. |
|
|
- POST inference latency was measured locally; expect higher latency in real use due to network delays
|
|
- Given the small, highly imbalanced dataset (351 samples: 242 'good' and 109 defective, spread across 5 defect types at roughly 22 samples per type), and the nature of the samples (the defect is the only distinctive feature, and is usually small and varied in shape), performance falls short of what real-world use would require, and the results lack statistical confidence. Without more data, a meaningful improvement would be difficult.
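For reference, the segmentation (IoU/Dice) and defect-presence metrics reported above can be computed from binary masks and class labels as in this minimal numpy sketch (the toy arrays and the `good_label=0` convention are illustrative assumptions, not project data):

```python
import numpy as np

def iou_dice(pred, gt):
    """IoU and Dice for a pair of binary masks (arrays of 0/1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

def defect_presence_accuracy(y_true, y_pred, good_label=0):
    """Collapse multi-class predictions to defect-present / defect-absent."""
    t = np.asarray(y_true) != good_label
    p = np.asarray(y_pred) != good_label
    return (t == p).mean()

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
print(iou_dice(pred, gt))  # intersection 1, union 2 -> (0.5, 0.666...)
print(defect_presence_accuracy([0, 1, 2, 0], [0, 3, 0, 0]))  # 0.75
```

Note that defect-presence accuracy can exceed plain classification accuracy (91% vs 83% above) because confusing one defect type for another still counts as a correct "defect present" call.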
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset |
|
|
|
|
|
- *Capsule* class from [MVTec AD dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad) |
|
|
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
|
|
- Dataset folder contains license file |
|
|
- Usage is strictly non-commercial/educational |
|
|
|
|
|
--- |
|
|
|
|
|
## Tech Stack |
|
|
|
|
|
- Python |
|
|
- TensorFlow |
|
|
- Scikit-Learn |
|
|
- Numpy / Pandas |
|
|
- OpenCV / Pillow |
|
|
- Ray Tune (Hyperparameter tuning)
|
|
- OmegaConf (Config management) |
|
|
- Docker, FastAPI, Uvicorn (Deployment) |
|
|
|
|
|
--- |
|
|
|
|
|
## Folder Structure |
|
|
|
|
|
``` |
|
|
data/ # Dataset and annotations |
|
|
app/ # Inference and deployment code and files |
|
|
models/ # Saved trained models and training logs |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Run |
|
|
|
|
|
**Build image for deployment:** |
|
|
- Requirements: |
|
|
- `models/final_model/final_model.tflite` (included) |
|
|
- `app/` folder and contents (included) |
|
|
- `Dockerfile` (included) |
|
|
- `.dockerignore` (included) |
|
|
- From the project root, build and run the Docker image: |
|
|
```sh |
|
|
docker build -t cv-app . |
|
|
docker run -p 8000:8000 cv-app |
|
|
``` |
|
|
- Open http://localhost:8000 in your browser to access the demo UI
|
|
|
|
|
_Note: For the full source code and steps to recreate the model, see the full repo ("Main Repo" section near the top)._
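For context on the 345 MB → 31 MB artifact sizes listed under Key Results, the size reduction comes from TFLite conversion with dynamic range quantization. A self-contained sketch of that conversion API, with a toy Keras model standing in for the real trained model, looks like:

```python
import tensorflow as tf

# Toy stand-in for the trained model; the real conversion would instead use
# tf.lite.TFLiteConverter.from_saved_model(<path to the trained SavedModel>).
toy = tf.keras.Sequential([tf.keras.Input((8,)), tf.keras.layers.Dense(4)])

converter = tf.lite.TFLiteConverter.from_keras_model(toy)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
tflite_bytes = converter.convert()  # serialized .tflite flatbuffer

print(f"{len(tflite_bytes)} bytes")
```

Dynamic range quantization stores weights as int8 while keeping activations in float, which is why it shrinks the model substantially without requiring a calibration dataset.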
|
|
|
|
|
--- |
|
|
|
|
|
## Citations & References |
|
|
|
|
|
**Backbone architectures:** |
|
|
- EfficientNetV2: [EfficientNetV2: Smaller Models and Faster Training](https://arxiv.org/abs/2104.00298) (Mingxing Tan, Quoc V. Le. ICML 2021) |
|
|
- MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) (Andrew Howard et al. ICCV 2019) |
|
|
- ConvNeXt: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (Zhuang Liu et al. CVPR 2022) |
|
|
|
|
|
**Output heads architectures:** |
|
|
_Not directly implemented, but inspired by:_ |
|
|
- FCN: [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) (Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015) |
|
|
- U-Net: [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597) (Olaf Ronneberger, Philipp Fischer, Thomas Brox. MICCAI 2015) |
|
|
|
|
|
--- |
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions, reach out via GitHub (Kev-HL).
|
|
|