|
|
--- |
|
|
title: Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI |
|
|
emoji: 💊 |
|
|
colorFrom: blue |
|
|
colorTo: gray |
|
|
sdk: docker |
|
|
app_port: 8000 |
|
|
pinned: true
|
|
--- |
|
|
|
|
|
> **Note:** This repo contains only deployment/demo files. |
|
|
> For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api). |
|
|
|
|
|
# Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI |
|
|
|
|
|
This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation. |
|
|
The aim is to deliver a complete pipeline: data preprocessing, model training and evaluation, and deployment, demonstrating practical ML engineering from scratch to API.
|
|
|
|
|
--- |
|
|
|
|
|
## Main Repo |
|
|
|
|
|
This is a minimal clone containing only the files needed for deployment.
|
|
For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api). |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Overview |
|
|
|
|
|
End-to-end defect detection and localization using the **Capsule** class from the **MVTec AD dataset**. |
|
|
Key steps include: |
|
|
- Data preprocessing, formatting, and augmentation |
|
|
- Model design (pre-trained backbone + custom heads) |
|
|
- Training, evaluation, and hyperparameter tuning |
|
|
- Dockerized FastAPI deployment for inference |
|
|
|
|
|
*Portfolio project to showcase ML workflow and engineering.* |
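As a rough, illustrative sketch of the "pre-trained backbone + custom heads" design (this is not the project's actual code; the layer widths, the six-class output, and the simplified decoder without U-Net skip connections are all assumptions), a ConvNeXt backbone with dual heads can be wired in Keras like so:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(224, 224, 3), num_classes=6):
    """ConvNeXt backbone with a classification head and an upsampling
    segmentation head. num_classes=6 assumes 'good' + 5 defect types."""
    backbone = tf.keras.applications.ConvNeXtTiny(
        include_top=False, weights=None, input_shape=input_shape)
    feats = backbone.output  # (7, 7, 768) feature map for a 224x224 input

    # Classification head: pooled features -> class probabilities
    cls = layers.GlobalAveragePooling2D()(feats)
    cls = layers.Dense(num_classes, activation="softmax", name="label")(cls)

    # Segmentation head: five stride-2 upsamples back to full resolution
    # (a real U-Net decoder would also concatenate encoder skip connections)
    seg = feats
    for filters in (256, 128, 64, 32, 16):
        seg = layers.Conv2DTranspose(filters, 3, strides=2,
                                     padding="same", activation="relu")(seg)
    seg = layers.Conv2D(1, 1, activation="sigmoid", name="mask")(seg)

    return tf.keras.Model(backbone.input, [cls, seg])

model = build_model()
```

The dual-output model lets one loss supervise the class label and another the pixel mask, which is how classification and localization are trained jointly.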
|
|
|
|
|
--- |
|
|
|
|
|
## Key Results |
|
|
|
|
|
- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split |
|
|
- Quantitative results on test evaluation: |
|
|
- Classification accuracy: **83 %** |
|
|
- Classification defect-only accuracy: **75 %** |
|
|
- Defect presence accuracy: **91 %** |
|
|
- Segmentation quality (mIoU / Dice): **0.79 / 0.73** |
|
|
- Segmentation defect-only quality (mIoU / Dice): **0.70 / 0.55** |
|
|
- Model artifacts: |
|
|
- Original model size (.keras / SavedModel): **345 MB** |
|
|
- Converted TFLite size, raw (.tflite): **119 MB**
|
|
- Converted TFLite size, optimized (.tflite): **31 MB** (dynamic range quantization applied)
|
|
- Container / runtime: |
|
|
- Docker image size: **317 MB** |
|
|
- Runtime used: **tflite-runtime + Uvicorn/FastAPI** |
|
|
- Avg inference latency (inference only, set tensor + invoke): **239 ms** |
|
|
- Avg inference latency (single POST request, measured): **271 ms** |
|
|
- Average memory usage during inference: **321 MB** |
|
|
- Startup time (local): **72 ms** |
|
|
- Observations: |
|
|
- The app returns expected visualizations and class labels for the MVTec-style test images. |
|
|
- POST inference latency was measured locally; expect higher latency in real use due to network delays
|
|
- Given the small, highly imbalanced dataset (351 samples: 242 'good' and 109 defective, spread across 5 defect types at roughly 22 samples per type), and the nature of the samples (the defect is the only distinctive feature, and is usually small and varied in shape), performance falls short of what real-world use would require, and the results lack statistical confidence. Without more data, a meaningful improvement would be difficult.
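For reference, the segmentation (IoU/Dice) and defect-presence metrics reported above can be computed from binary masks and class labels as in this minimal numpy sketch (the toy arrays and the `good_label=0` convention are illustrative assumptions, not project data):

```python
import numpy as np

def iou_dice(pred, gt):
    """IoU and Dice for a pair of binary masks (arrays of 0/1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

def defect_presence_accuracy(y_true, y_pred, good_label=0):
    """Collapse multi-class predictions to defect-present / defect-absent."""
    t = np.asarray(y_true) != good_label
    p = np.asarray(y_pred) != good_label
    return (t == p).mean()

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
print(iou_dice(pred, gt))  # intersection 1, union 2 -> (0.5, 0.666...)
print(defect_presence_accuracy([0, 1, 2, 0], [0, 3, 0, 0]))  # 0.75
```

Note that defect-presence accuracy can exceed plain classification accuracy (91% vs 83% above) because confusing one defect type for another still counts as a correct "defect present" call.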
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset |
|
|
|
|
|
- *Capsule* class from [MVTec AD dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad) |
|
|
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
|
|
- Dataset folder contains license file |
|
|
- Usage is strictly non-commercial/educational |
|
|
|
|
|
--- |
|
|
|
|
|
## Tech Stack |
|
|
|
|
|
- Python |
|
|
- TensorFlow |
|
|
- Scikit-Learn |
|
|
- Numpy / Pandas |
|
|
- OpenCV / Pillow |
|
|
- Ray Tune (Hyperparameter tuning)
|
|
- OmegaConf (Config management) |
|
|
- Docker, FastAPI, Uvicorn (Deployment) |
|
|
|
|
|
--- |
|
|
|
|
|
## Folder Structure |
|
|
|
|
|
``` |
|
|
data/ # Dataset and annotations |
|
|
app/ # Inference and deployment code and files |
|
|
models/ # Saved trained models and training logs |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Run |
|
|
|
|
|
**Build image for deployment:** |
|
|
- Requirements: |
|
|
- `models/final_model/final_model.tflite` (included) |
|
|
- `app/` folder and contents (included) |
|
|
- `Dockerfile` (included) |
|
|
- `.dockerignore` (included) |
|
|
- From the project root, build and run the Docker image: |
|
|
```sh |
|
|
docker build -t cv-app . |
|
|
docker run -p 8000:8000 cv-app |
|
|
``` |
|
|
- Open http://localhost:8000 in your browser to access the demo UI
|
|
|
|
|
_Note: For the full source code and steps to recreate the model, see the full repo ("Main Repo" section near the top)._
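For context on the 345 MB → 31 MB artifact sizes listed under Key Results, the size reduction comes from TFLite conversion with dynamic range quantization. A self-contained sketch of that conversion API, with a toy Keras model standing in for the real trained model, looks like:

```python
import tensorflow as tf

# Toy stand-in for the trained model; the real conversion would instead use
# tf.lite.TFLiteConverter.from_saved_model(<path to the trained SavedModel>).
toy = tf.keras.Sequential([tf.keras.Input((8,)), tf.keras.layers.Dense(4)])

converter = tf.lite.TFLiteConverter.from_keras_model(toy)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
tflite_bytes = converter.convert()  # serialized .tflite flatbuffer

print(f"{len(tflite_bytes)} bytes")
```

Dynamic range quantization stores weights as int8 while keeping activations in float, which is why it shrinks the model substantially without requiring a calibration dataset.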
|
|
|
|
|
--- |
|
|
|
|
|
## Citations & References |
|
|
|
|
|
**Backbone architectures:** |
|
|
- EfficientNetV2: [EfficientNetV2: Smaller Models and Faster Training](https://arxiv.org/abs/2104.00298) (Mingxing Tan, Quoc V. Le. ICML 2021) |
|
|
- MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) (Andrew Howard et al. ICCV 2019) |
|
|
- ConvNeXt: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (Zhuang Liu et al. CVPR 2022) |
|
|
|
|
|
**Output heads architectures:** |
|
|
_Not directly implemented, but inspired by:_ |
|
|
- FCN: [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) (Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015) |
|
|
- U-Net: [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597) (Olaf Ronneberger, Philipp Fischer, Thomas Brox. MICCAI 2015) |
|
|
|
|
|
--- |
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions, reach out via GitHub (Kev-HL).
|
|
|