---
license: apache-2.0
---

# Anime Classifiers

[Training/inference code](https://github.com/city96/CityClassifiers) | [Live Demo](https://huggingface.co/spaces/city96/AnimeClassifiers-demo)

These are models that predict whether a concept is present in an image. Performance on high-resolution images isn't very good, especially when detecting subtle image effects such as noise. This is due to CLIP using a fairly low input resolution (336x336/224x224).

To combat this, tiling is used at inference time. The input image is first downscaled so that its shortest edge is 1536 pixels (see `TF.functional.resize`), then 5 separate 512x512 areas are selected (the 4 corners + the center - see `TF.functional.five_crop`). This helps because the downscale factor isn't nearly as drastic as passing the entire image to CLIP. As a bonus, it also avoids the issues with odd aspect ratios requiring cropping or letterboxing to work.
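
A minimal sketch of this preprocessing step, assuming a PIL input image and torchvision; the function name `clip_tiles` is illustrative, not the repository's actual API:

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def clip_tiles(image: Image.Image) -> torch.Tensor:
    """Resize so the shortest edge is 1536px, then take five 512x512 crops
    (four corners + center). Returns a [5, C, 512, 512] tensor."""
    image = TF.resize(image, 1536)           # shortest edge -> 1536
    crops = TF.five_crop(image, (512, 512))  # tuple of 5 crops
    return torch.stack([TF.to_tensor(c) for c in crops])
```

Each tile is then embedded with CLIP separately and the per-tile predictions are combined into the final result (the exact reduction step isn't shown here).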


As for the training, it will be detailed in the sections below for the individual classifiers. At first, specialized models will be trained to a relatively high accuracy, building up a high-quality but specific dataset in the process.

Then, these models will be used to split/sort each other's datasets. The code will need to be updated to support one image being part of more than one class, but the final result should be a clean dataset where each target aspect acts as a "tag" rather than a class.

## Architecture

The base model itself is fairly simple. It takes embeddings from a CLIP model (in this case, `openai/clip-vit-large-patch14`) and expands them to 1024 dimensions. From there, a single block with residuals is followed by a few linear layers which converge down to the final output.

For the classifier models, the final output goes through `nn.Softmax`.
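
A rough sketch of that shape, assuming the 768-dim CLIP-L embedding as input; the intermediate layer sizes and activation choices here are guesses for illustration, not the repository's exact definition:

```python
import torch
import torch.nn as nn

class AnimeClassifier(nn.Module):
    """Illustrative classifier head on top of CLIP embeddings."""
    def __init__(self, clip_dim: int = 768, hidden: int = 1024, num_classes: int = 2):
        super().__init__()
        self.expand = nn.Sequential(nn.Linear(clip_dim, hidden), nn.GELU())
        # single block with a residual connection around it
        self.block = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
        # a few linear layers converging down to the final output
        self.head = nn.Sequential(
            nn.Linear(hidden, 256), nn.GELU(),
            nn.Linear(256, 64), nn.GELU(),
            nn.Linear(64, num_classes),
            nn.Softmax(dim=-1),  # classifier variants end in softmax
        )

    def forward(self, clip_embed: torch.Tensor) -> torch.Tensor:
        x = self.expand(clip_embed)
        x = x + self.block(x)  # residual
        return self.head(x)
```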

# Models

## Future/planned

- Unified (by joining the datasets of the other classifiers)
- Compression (jpg/webp/gif/dithering/etc)
- Noise

## ChromaticAberration - Anime

### Design goals

The goal was to detect [chromatic aberration](https://en.wikipedia.org/wiki/Chromatic_aberration?useskin=vector) in images.

For some odd reason, this effect has become a popular post-processing effect to apply to images and drawings. While attempting to train an ESRGAN model, I noticed an odd halo around images and quickly figured out that this effect was the cause. This classifier aims to work as a base filter to remove such images from the dataset.

### Issues

- Seems to get confused by excessive HSV noise
- Triggers even if the effect is only applied to the background
- Sometimes triggers on rough linework/sketches (i.e. multiple semi-transparent lines overlapping)
- Low accuracy on 3D/2.5D, with possible false positives

### Training

The training settings can be found in the `config/CCAnime-ChromaticAberration-v1.yaml` file (7e-6 LR, cosine scheduler, 100K steps).
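
In PyTorch terms, those settings roughly correspond to the sketch below; the optimizer type and the placeholder model are assumptions, only the learning rate, scheduler type, and step count come from the config:

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 2)  # placeholder standing in for the classifier head above
optimizer = torch.optim.AdamW(model.parameters(), lr=7e-6)  # optimizer choice assumed, LR from the config
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100_000)  # cosine over 100K steps

# per step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```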




Final dataset score distribution for v1.16:

```
3215 images in dataset.
0_reg - 395 ||||
0_reg_booru - 1805 ||||||||||||||||||||||
1_chroma - 515 ||||||
1_synthetic - 500 ||||||

Class ratios:
00 - 2200 |||||||||||||||||||||||||||
01 - 1015 ||||||||||||
```

Version history:

- v1.0 - Initial test model; the dataset is fully synthetic (500 images). The effect was added by shifting the red/blue channels by a random amount using chaiNNer (see the sketch after this list).
- v1.1 - Added 300 images tagged "chromatic_aberration" from gelbooru. Added the first 1000 images from danbooru2021 as reg images.
- v1.2 - Used the newly trained predictor to filter the existing datasets - found ~70 positives in the reg set and ~30 false positives in the target set.
- v1.3-v1.16 - Repeatedly ran the predictor against various datasets, adding false positives/negatives back into the dataset, sometimes running it against the training set to filter out misclassified images as the predictor got better. Added/removed images were manually checked (my eyes hurt).
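
The original channel shift was done in chaiNNer; below is a minimal Pillow sketch of an equivalent operation. The function name, the shift range, and the opposite-direction offsets are assumptions for illustration, not the exact pipeline used for the dataset:

```python
import random
from PIL import Image, ImageChops

def add_chromatic_aberration(image: Image.Image, max_shift: int = 4) -> Image.Image:
    """Shift the red and blue channels in opposite horizontal directions by a
    random amount to fake chromatic aberration (max_shift is an assumed default)."""
    r, g, b = image.convert("RGB").split()
    shift = random.randint(1, max_shift)
    r = ImageChops.offset(r, shift, 0)   # red shifted right (note: offset wraps at the edges)
    b = ImageChops.offset(b, -shift, 0)  # blue shifted left
    return Image.merge("RGB", (r, g, b))
```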