---
license: apache-2.0
---

# Example Submission for the SAFE: Image Edit Detection and Localization Challenge 2025

This project provides a starting point for implementing a submission to the [SAFE: Image Edit Detection and Localization Challenge 2025](https://app.dyff.io/challenges/dc509a8c771b492b90c43012fde9a04f). You do not need to use this code to participate in the challenge.

## Clone this repository

To use the code and tools in this repository, [clone it](https://huggingface.co/docs/hub/en/repositories-getting-started#cloning-repositories) with `git`:

```
git clone https://huggingface.co/safe-challenge-2025/example-submission
```


# How to participate

To participate in the challenge, you need to do three things:

1. Visit the [challenge home page](https://app.dyff.io/challenges/dc509a8c771b492b90c43012fde9a04f) and sign up using the linked registration form. After verifying your email, you will receive access credentials for the submission platform.
2. Implement your detector model. You can [use this repository](#how-to-implement-a-detector) as a starting point, but you don't have to.
3. [Submit your detector model](#how-to-make-a-submission) for evaluation. You can build your submission package yourself and submit it [using a CLI tool](#submitting-using-the-dyff-api) (preferred), or you can [build your submission in a HuggingFace Space](#submitting-a-docker-huggingface-space) and submit the Space using a web form. 


# How to make a submission

The infrastructure for the challenge runs on [DSRI's Dyff platform](https://app.dyff.io). Submissions to the challenge must be in the form of a containerized web service that serves a simple JSON HTTP API.

If you're comfortable building a Docker image yourself, the [preferred way](#submitting-using-the-dyff-api) to make a submission is to upload and submit a built image using the [Dyff client](https://docs.dyff.io/python-api/dyff.client/).

Alternatively, you can create a Docker [HuggingFace Space](https://huggingface.co/new-space?sdk=docker) and [create submissions from the Space](#submitting-a-docker-huggingface-space) using a [web form](https://challenge.dyff.io/submit). The advantage of using an HF Space is that it builds the Docker image for you. However, HF Spaces also have some limitations that you'll need to account for.

## General considerations

* Your submission will run **without Internet access** during evaluation. All of the files required to run your submission must be packaged along with it. You can either include files in the Docker image, or upload the files as a separate package and mount them in your application container during execution.


# Submitting using the Dyff API

If you're able to build a Docker image for your submission yourself, the preferred way to make submissions is via the Dyff API. We provide a command line tool (`challenge-cli.py`) in this repository to simplify the submission process. 

In the [terminology of Dyff](https://docs.dyff.io/tutorial/), the thing that you're submitting is an `InferenceService`. You can think of an `InferenceService` as a recipe for spinning up a Docker container that runs an HTTP server that serves an inference API. To create a new submission, you need to upload the Docker image that the service should run, and, optionally, a volume of files such as neural network weights that will be mounted in the container.

## Install the Dyff SDK

You need Python 3.10+ (3.12 recommended). We recommend you install into a virtual environment. If you're using this repository, you can install `dyff` and a few other useful dependencies as described in the [Quick Start](#quick-start) section:

```
# After installing the `uv` tool:
make setup
source venv/bin/activate
```

Or, you can install just `dyff` into a `venv` like this:

```
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade dyff
```

## Install `skopeo`

To upload Docker images via the Dyff API, you need to have the [`skopeo` tool](https://github.com/containers/skopeo) in your PATH.

## Prepare the submission data

Before creating a submission, you need to build the Docker image that you want to submit locally. For example, running the `make docker-build` command in this repository will build a Docker image in your local Docker daemon with the name `safe-challenge-2025/example-submission:latest`. You can check that the image exists using the `docker images` command:

```
$ docker images
REPOSITORY                                TAG       IMAGE ID       CREATED        SIZE
safe-challenge-2025/example-submission    latest    b86a46d856f0   3 hours ago    1.86GB
...
```

If your submission includes large data files such as neural network weights, we recommend that you upload these separately from the Docker image and then arrange for them to be mounted in the running container at run-time. You can upload a local directory recursively to the Dyff platform. Once uploaded, you will get the ID of a Dyff `Model` resource that you can reference in your `InferenceService`.

If you're uploading your large files separately as a `Model`, you'll need to tell Dyff where to mount them in your container. When testing your system locally, you can use the `-v/--volume` flag with `docker run` to [mount a local directory](https://docs.docker.com/engine/storage/volumes/) in the container. Then, just make sure to specify the same mount path when creating your `InferenceService` in Dyff.

## Use the `challenge-cli` tool

This repository contains a CLI script that simplifies the submission process. Usage is like this:

```
$ python3 challenge-cli.py
Usage: challenge-cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  submit
  upload-submission
```

You create a new submission in two steps. First, you upload the submission files and create an `InferenceService`. Then, you create the actual `Submission` resource to tell the Dyff platform that you want to submit the `InferenceService` for the challenge.

### Upload submission files

To upload submission files, use the `upload-submission` command:

```
DYFF_API_TOKEN=<your token> python3 challenge-cli.py upload-submission [OPTIONS]
```

Notice that we're providing an access token via an environment variable.

This command creates a Dyff `Artifact` resource corresponding to your Docker image, an optional Dyff `Model` resource containing your uploaded model files, and a Dyff `InferenceService` resource that references the `Artifact` and the `Model`.

You need to provide your Account ID with `--account`, a name for your system with `--name`, and a Docker image name + tag with `--image`. The tool will create a Dyff `Artifact` resource representing the Docker image.

If your system serves the inference endpoint at a route other than `predict`, use the `--endpoint` flag to specify the correct route.

If you are uploading your large files in a separate data volume, use the `--volume` flag to specify the directory tree to upload, and use the `--volume-mount` flag to set the path where this directory should be mounted in the running container. When you use these flags, the tool creates a Dyff `Model` resource representing the uploaded files.

You can also use the `--artifact` and `--model` flags to provide the ID of an `Artifact` or `Model` that already exists instead of creating a new one. For example, if you always use the same Docker image but you mount different model weights in it for different submissions, you can create the Docker image `Artifact`  once, and then reference its ID with `--artifact` to avoid uploading it again.

### Submit your system for evaluation

After uploading the submission files, you create the actual `Submission` resource with:

```
DYFF_API_TOKEN=<your token> python3 challenge-cli.py submit [OPTIONS]
```

When submitting, you provide the ID of the `InferenceService` you created in the previous step with `--service`. You also provide your Account ID with `--account`, your Team ID for the challenge with `--team`, and the Task ID that you're submitting to with `--task`.


# Submitting a Docker HuggingFace Space

If you can't build a Docker image yourself, or if the steps above seem too confusing, you can make a submission without interacting with the Dyff API directly by submitting from a HuggingFace Space. HF Spaces that use the Docker SDK will build a Docker image from the contents of the repository associated with the Space when the Space is run. You can then grant the Dyff platform permission to pull this image and use a web form to trigger a new submission.

These are the steps to prepare a HF Space for making submissions to the challenge:

1. Create a new HuggingFace [**Organization**](https://huggingface.co/organizations/new) (**not a user account**) for your challenge team. **The length of your combined Organization name + Space name must be less than 47 characters** due to a limitation of the HuggingFace API.
2. Create a new `Space` within your `Organization`. The Space must use the [Docker SDK](https://huggingface.co/new-space?sdk=docker). **Private Spaces are OK and they will work with the submission process.** The Space name counts toward the combined-length limit described in step 1.
3. Create a file called `DYFF_TEAM` in the root directory of your HF Space. The contents of the file should be your Team ID (not your Account ID). This file allows our infrastructure to verify that your Team controls this HF Space.
4. Create a `Dockerfile` in your Space that builds your challenge submission image.
5. Run the Space; this will build the Docker image.

To make a challenge submission from your Space:

1. Add the [official SAFE Challenge user account](https://huggingface.co/safe-challenge-2025-submissions) as a Member of your organization with `read` permissions. **Make sure you are adding the correct user account;** the account name is `safe-challenge-2025-submissions`. This grants permission to our infrastructure to pull the Docker image built by your Space.
2. When you're ready to submit, use the [submission web form](https://challenge.dyff.io/submit) and enter the URL of your Space and the branch that you want to submit.

## Handling large models

There is a size limitation on Space repositories. If your submission contains large files (such as neural network weights), it may be too large to store in the Space. In this case, you need to fetch your files from somewhere else **during the Docker build process**.

This means that your Dockerfile should contain something like this:

```
COPY download-my-model.sh ./ 
RUN ./download-my-model.sh
```

One convenient option is to create a separate [HuggingFace Model repository](https://huggingface.co/new?owner=my-challenge-org) and use `git clone` in your Dockerfile to fetch the repository files.

## Handling private models

If access credentials are required to download your model files, you should provide them using the [Secrets feature](https://huggingface.co/docs/hub/spaces-overview#managing-secrets) of HuggingFace Spaces. **Do not hard-code credentials in your Dockerfile or anywhere else in your Space or Organization!**

Access credentials are necessary if you want to clone a private HuggingFace Model repository during your Docker build process.

Access the secrets as described in the [Secrets > Buildtime section](https://huggingface.co/docs/hub/spaces-sdks-docker#secrets). Remember that you can't download files at run-time because your system will not have access to the Internet.


# How to implement a detector

To implement a new detector that you can submit to the challenge, you need to implement an HTTP server that serves the required JSON API for inference requests. This repository contains a template that you can use as a starting point for implementing a detector in Python. You should be able to adapt this template easily to support common model formats such as neural networks built with PyTorch.

You are also free to build detectors with any other technologies and software stacks that you want, but you may have to figure out packaging on your own. All that's required of your submission is that it runs in a Docker container and that it supports the [required inference API](#api-reference).

## Quick Start

**Install `uv`:**
https://docs.astral.sh/uv/getting-started/installation/

**Local development:**
```bash
# Install dependencies
make setup
source venv/bin/activate

# Download the example model
make download

# Run it
make serve
```

In a second terminal:
```bash
# Process an example input
./prompt.sh cat.json
```

The server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

**Docker:**
```bash
# Build
make docker-build

# Run
make docker-run
```

The Docker container also runs the server at `http://127.0.0.1:8000`.

## What Happens When You Start the Server

```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```

If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.

## Testing the API

By default, the server serves the inference API at `/predict`:

```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded-image-data>"
    }
  }'
```

Example response:
```json
{
  "logprobs": [-0.859380304813385,-1.2701971530914307,-2.1918208599090576,-1.69235098361969],
  "localizationMask": {
    "mediaType":"image/png",
    "data":"iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```
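The same request can be sent from Python. Here is a minimal sketch using only the standard library; `build_request` and `predict` are illustrative names, not part of this repository, and the default URL assumes the server is running locally on port 8000.

```python
import base64
import json
import urllib.request


def build_request(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in the JSON structure expected by /predict."""
    return {
        "image": {
            "mediaType": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


def predict(
    image_path: str,
    media_type: str = "image/jpeg",
    url: str = "http://localhost:8000/predict",
) -> dict:
    """POST one image file to a running server and return the parsed response."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read(), media_type)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With the server running, `predict("some-image.jpg")` should return a dictionary shaped like the example response above.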

## Project Structure

```
example-submission/
├── main.py                        # Entry point
├── app/
│   ├── core/
│   │   ├── app.py                 # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py           
│   ├── api/
│   │   ├── models.py              # Request/response schemas
│   │   ├── controllers.py         # Business logic
│   │   └── routes/
│   │       └── prediction.py      # POST /predict
│   └── services/
│       ├── base.py                # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py           # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/             # Model weights and config
├── scripts/
│   ├── model_download.bash        # Downloads resnet-18
│   ├── generate_test_datasets.py  # Creates test datasets
│   └── test_datasets.py           # Runs inference on test datasets
├── Dockerfile
├── .env.example                   # Environment config template
├── cat.json                       # An example /predict request object
├── makefile
├── prompt.sh                      # Script that makes a /predict request
├── requirements.cpu.in              
├── requirements.cpu.txt
├── requirements.torch.cpu.in
└── response.json                  # An example /predict response object
```

## How to Plug In Your Own Model

To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance.

### Step 1: Create Your Service Class

```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse

class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)  # placeholder: use your framework's loader
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)  # placeholder: your own decoding helper
        result = self.model(image)

        logprobs = ...
        mask = ...

        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```

### Step 2: Register Your Service

Open `app/core/app.py` and find the `lifespan()` function:

```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")

# To this:
service = YourModelService(...)
```

That's it. The `/predict` endpoint now serves your model.

### Model Files

Put your model files under the `models/` directory:

```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```

## GPU inference

The default configuration in this repo runs the model on CPU and does not contain the necessary dependencies for using GPUs.

To enable GPU inference, you need to:

1. Base your Docker image on an image that contains the CUDA system packages such as [this one](https://hub.docker.com/layers/nvidia/cuda/13.0.2-cudnn-runtime-ubuntu24.04/images/sha256-4d242f206abc4b9588a6506cce2d88932cc879849395aae3785075179718cc49). If you're using the `nvidia/cuda` images, you probably want one of the `-runtime-` tags, as the `-devel-` versions contain dependencies you probably don't need.
2. Install the GPU version of PyTorch (or whichever framework you use).
3. Use the PyTorch `.to()` method (or its equivalent in your framework) in the `load_model()` and `predict()` functions to move model weights and input data to and from the CUDA device.
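Step 3 might look roughly like the following. This is a sketch under stated assumptions, not the repository's code: the helper names are hypothetical, the model is assumed to be a `torch.nn.Module`, and its output is assumed to be a single tensor.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def move_model(model, device: str):
    """Call once in load_model(): put the weights on the device, in eval mode."""
    return model.to(device).eval()


def forward(model, batch, device: str):
    """Call in predict(): run on the device and return the output on the CPU."""
    import torch

    # Assumption: the model returns a single tensor.
    with torch.no_grad():
        return model(batch.to(device)).cpu()
```

Falling back to `"cpu"` when no GPU is visible lets the same image run on a development machine without CUDA.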

## Configuration

Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.

**Default values:**
- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"

**To customize:**
```bash
# Copy the example
cp .env.example .env

# Edit values
vim .env
```

Or set environment variables directly:
```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```



## API Reference

**Endpoint:** `POST /predict`

**Request:**
```json
{
  "image": {
    "mediaType": "image/jpeg",  // or "image/png"
    "data": "<base64 string>"
  }
}
```

To decode a request, first convert `.image.data` from a base64 string to binary data (i.e., a Python `bytes` string), then interpret the binary data as image data of the type specified in `.image.mediaType`. The `.image.mediaType` will be either `image/jpeg` or `image/png`.
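The decoding step can be sketched with the standard library alone. The function name and the signature check are assumptions added for illustration; the check simply rejects data whose leading bytes do not match the declared media type.

```python
import base64

# Leading bytes that identify each supported format.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def decode_image_bytes(request: dict) -> bytes:
    """Return the raw image bytes from a /predict request body."""
    image = request["image"]
    media_type = image["mediaType"]
    if media_type not in SIGNATURES:
        raise ValueError(f"Unsupported mediaType: {media_type}")
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(image["data"], validate=True)
    if not raw.startswith(SIGNATURES[media_type]):
        raise ValueError(f"Data does not look like {media_type}")
    return raw
```

If Pillow is installed, the returned bytes can then be opened with `PIL.Image.open(io.BytesIO(raw))`.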

**Response:**
```json
{
  "logprobs": [float],         // Log-probabilities of each label (length 4)
  "localizationMask": {        // [Optional] binary mask
    "mediaType": "image/png",  // Must be 'image/png'
    "data": "<base64 string>"  // Image data
  }
}
```

The `.logprobs` field must contain a list of `float` values of length `4`. Each index in the list corresponds to the log-probability of the associated label. The possible labels are described in the `app.api.models.Labels` enumeration:

```python
Natural = 0
FullySynthesized = 1
LocallyEdited = 2
LocallySynthesized = 3
```

The `Synthesized` labels mean that the image was partially or fully synthesized by a tool such as a generative image model. The `LocallyEdited` label means that the image was manipulated in some way other than by synthesizing content, such as by copying and pasting content from another image using image editing software.
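If your model produces unnormalized scores (logits) for the four labels, one common way to obtain log-probabilities is a log-softmax. The sketch below assumes exactly that; `LABELS` and `log_softmax` are illustrative names. Whether the evaluation requires normalized values is not stated here, although the exponentials of the example response above do sum to approximately 1.

```python
import math

# Index order follows the Labels enumeration shown above.
LABELS = ["Natural", "FullySynthesized", "LocallyEdited", "LocallySynthesized"]


def log_softmax(scores: list[float]) -> list[float]:
    """Convert unnormalized scores to log-probabilities (numerically stable)."""
    peak = max(scores)
    # Subtracting the maximum first avoids overflow in exp() for large scores.
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]
```

The output keeps the input order, so index `0` remains the `Natural` log-probability.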

The `.localizationMask` field is optional, but you should populate it if your detector is capable of localizing its detections. The mask is a binary (`0/1`) bitmap encoded as a PNG image. A non-zero value for a pixel means that the detector thinks that that pixel has been manipulated. A Python function to convert a `numpy` array to a PNG mask is provided in `app.api.models.BinaryMask.from_numpy()`.

**Docs:**

The server in this repository serves API docs at the following endpoints:

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`

## PyArrow Test Datasets

We've included a test dataset system for validating your model. It generates 100 standardized test datasets (25 in each of four categories) covering normal inputs, edge cases, performance benchmarks, and model comparisons.

### Generate Datasets

```bash
python scripts/generate_test_datasets.py
```

This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets

### Run Tests

```bash
# Start your service first
make serve
```

In another terminal:

```bash
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation
python scripts/test_datasets.py

# Test specific category
python scripts/test_datasets.py --category edge_case
```

### Dataset Categories (25 datasets each)

**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation

**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling

**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling

**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking

### Test Output

```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```

## Common Issues

**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000

# Or just use a different port
uvicorn main:app --port 8080
```
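If `lsof` is not available, a short Python check can tell you whether something is already listening. This is a sketch with hypothetical helper names; it only tests TCP connectivity on the given host.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start: int = 8000, count: int = 20) -> int:
    """Return the first port in [start, start + count) with no listener."""
    for port in range(start, start + count):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in {start}..{start + count - 1}")
```

The result of `first_free_port()` can be passed to `uvicorn main:app --port <port>`.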

**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- If you're trying to run the example ResNet-based model, make sure you ran `make download` to fetch the model weights.
- Check logs for the exact error

**Slow inference:**
- Inference runs on CPU by default
- For GPU: install the CUDA build of PyTorch and modify the service to use the GPU device
- Consider using smaller models or quantization

## License

Apache 2.0