DeQA-Doc-Overall: Document Image Quality Assessment
DeQA-Doc-Overall is a vision-language model for assessing the overall quality of document images. It provides a quality score from 1 (bad) to 5 (excellent) that reflects the general visual quality of scanned or photographed documents.
Model Family
This model is part of the DeQA-Doc family, which includes three specialized models:
| Model | Description | HuggingFace |
|---|---|---|
| DeQA-Doc-Overall | Overall document quality (this model) | mapo80/DeQA-Doc-Overall |
| DeQA-Doc-Color | Color quality assessment | mapo80/DeQA-Doc-Color |
| DeQA-Doc-Sharpness | Sharpness/clarity assessment | mapo80/DeQA-Doc-Sharpness |
Quick Start
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Overall",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Overall Quality Score: {score.item():.2f} / 5.0")
Batch Processing
You can score multiple images at once:
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]
scores = model.score(images)
for i, score in enumerate(scores):
    print(f"Document {i+1}: {score.item():.2f} / 5.0")
Score Interpretation
| Score Range | Quality Level | Description |
|---|---|---|
| 4.5 - 5.0 | Excellent | Perfect quality, no visible defects |
| 3.5 - 4.5 | Good | Minor imperfections, highly readable |
| 2.5 - 3.5 | Fair | Noticeable issues but still usable |
| 1.5 - 2.5 | Poor | Significant quality problems |
| 1.0 - 1.5 | Bad | Severe degradation, hard to read |
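The boundaries above are descriptive rather than part of the model API. If you want to attach the same labels programmatically, a small helper such as the following (a hypothetical quality_level function, not provided by the model) will do:

def quality_level(score: float) -> str:
    # Map a DeQA-Doc score (1.0 - 5.0) to the labels from the table above.
    if score >= 4.5:
        return "Excellent"
    if score >= 3.5:
        return "Good"
    if score >= 2.5:
        return "Fair"
    if score >= 1.5:
        return "Poor"
    return "Bad"

print(quality_level(model.score([image]).item()))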
Model Architecture
- Base Model: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
- Vision Encoder: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
- Language Model: LLaMA2-7B
- Training: Full fine-tuning on document quality datasets
- Input Resolution: Images are resized to 448x448 (with aspect ratio preservation)
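The resize step is handled inside the model's own preprocessing, so you normally just pass a PIL image to model.score. Purely to illustrate what aspect-ratio-preserving resizing to 448x448 can look like (here by padding to a square canvas; the model's internal logic may differ), a sketch:

from PIL import Image

def resize_with_padding(img: Image.Image, size: int = 448) -> Image.Image:
    # Illustrative only: scale the longer side to `size`, then pad to a square.
    scale = size / max(img.size)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.Resampling.BICUBIC)
    canvas = Image.new("RGB", (size, size), (0, 0, 0))
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas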
Technical Details
| Property | Value |
|---|---|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |
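The latency figure above depends heavily on the GPU, batch size, and image size. A quick way to measure it on your own hardware, reusing the model and image from the Quick Start example:

import time

# Rough per-image latency check; numbers vary with hardware and input size.
torch.cuda.synchronize()
start = time.perf_counter()
_ = model.score([image])
torch.cuda.synchronize()
print(f"Per-image latency: {time.perf_counter() - start:.2f} s")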
Hardware Requirements
| Setup | VRAM Required | Recommended |
|---|---|---|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
GPU Inference (Recommended)
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Overall",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
CPU Offload (Lower VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Overall",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="/tmp/offload",
)
Installation
pip install torch transformers accelerate pillow sentencepiece protobuf
Note: Use transformers>=4.36.0 for best compatibility.
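If you want to verify that your environment matches the version note above, a quick check:

from packaging import version
import transformers

# Sanity check for the compatibility note above.
assert version.parse(transformers.__version__) >= version.parse("4.36.0")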
Use Cases
- Document Scanning QA: Automatically flag low-quality scans for re-scanning
- Archive Digitization: Prioritize documents needing restoration
- OCR Preprocessing: Filter images likely to produce poor OCR results
- Document Management: Sort and categorize documents by quality
- Quality Control: Automated quality checks in document processing pipelines
Example: Quality-Based Filtering
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
from pathlib import Path
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Overall",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Filter documents by quality
def filter_by_quality(image_paths, min_score=3.0):
    good_docs = []
    bad_docs = []
    for path in image_paths:
        img = Image.open(path).convert("RGB")
        score = model.score([img]).item()
        if score >= min_score:
            good_docs.append((path, score))
        else:
            bad_docs.append((path, score))
    return good_docs, bad_docs
# Usage
docs = list(Path("documents/").glob("*.jpg"))
good, bad = filter_by_quality(docs, min_score=3.5)
print(f"Good quality: {len(good)} documents")
print(f"Need review: {len(bad)} documents")
Limitations
- Optimized for document images (forms, letters, reports, etc.)
- May not perform well on natural photos or artistic images
- Requires a GPU with sufficient VRAM for efficient inference
- Scores are inherently subjective and reflect the distribution of the training data
Credits & Attribution
This model is based on the DeQA-Doc project by Junjie Gao et al., which won first place in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.
Original Repository: https://github.com/Junjie-Gao19/DeQA-Doc
All credit for the research, training methodology, and model architecture goes to the original authors.
Citation
If you use this model in your research, please cite the original paper:
@inproceedings{deqadoc,
    title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
    author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
    year={2025},
}
ArXiv: https://arxiv.org/abs/2507.12796
License
Apache 2.0
Related Models
- DeQA-Doc-Color - Color quality assessment
- DeQA-Doc-Sharpness - Sharpness assessment