Spaces: lucasgagneten/layoutlmv3-facturas-extractor

Lucas Gagneten committed 25 days ago

Commit 809b92e (1 parent: e5619d9)

Improved interface (original commit message: "Interfaz mejorada")

Files changed (8)

README.md +150 -1
app.py +33 -404
batch_processor.py +144 -0
config.py +61 -0
interface.py +155 -0
invoice_processor.py +279 -0
model_loader.py +75 -0
requirements.txt +7 -14

README.md CHANGED

@@ -11,4 +11,153 @@ license: mit
 short_description: LayoutLMv3 fine-tuned - Ner Facturas Extractor
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🇦🇷 Argentine Invoice Data Extractor
+Application for automatic data extraction from Argentine invoices using LayoutLMv3 and DocTR.
+## 📁 Project Structure
+```
+layoutlmv3-facturas-extractor/
+│
+├── app.py                    # Main entry point
+├── config.py                 # Configuration and constants
+├── model_loader.py           # Model loading (LayoutLMv3 and DocTR)
+├── invoice_processor.py      # Processing of individual invoices
+├── batch_processor.py        # Batch processing and navigation
+├── interface.py              # Gradio interface
+├── requirements.txt          # Dependencies
+└── README.md                 # This file
+```
+## 🏗️ Architecture
+### 1. **config.py**
+- Defines the NER labels
+- Maps labels to colors for visualization
+- Holds configuration constants
+### 2. **model_loader.py**
+- `ModelManager` class: loads and manages the models
+  - LayoutLMv3 for NER
+  - DocTR for OCR
+  - Device handling (CPU/GPU)
+### 3. **invoice_processor.py**
+- `InvoiceProcessor` class: end-to-end invoice processing
+  - `extract_ocr_data()`: text extraction with DocTR
+  - `perform_ner()`: entity prediction with LayoutLMv3
+  - `group_entities()`: BIO grouping and deduplication
+  - `draw_annotations()`: result visualization
+  - `process_invoice()`: full pipeline
+### 4. **batch_processor.py**
+- `BatchProcessor` class: processes multiple invoices
+- `ResultNavigator` class: navigation between results
+  - `go_next()`: next invoice
+  - `go_prev()`: previous invoice
+### 5. **interface.py**
+- `GradioInterface` class: builds the UI
+  - File upload
+  - Result display
+  - Navigation controls
+### 6. **app.py**
+- Initializes all components
+- Launches the application
+## 🚀 Usage
+### Installation
+```bash
+# Create a virtual environment
+python -m venv venv
+# Activate the environment (Windows)
+venv\Scripts\activate
+# Activate the environment (Linux/Mac)
+source venv/bin/activate
+# Install dependencies
+pip install -r requirements.txt
+```
+### Running
+```bash
+python app.py
+```
+The application will be served at `http://localhost:7860`
+## 📦 Main Dependencies
+- `gradio`: user interface
+- `transformers`: LayoutLMv3
+- `torch`: deep learning framework
+- `python-doctr`: OCR
+- `Pillow`: image processing
+- `numpy`: numerical operations
+## 🔧 Customization
+### Changing the model
+Edit `HUGGINGFACE_MODEL` in `config.py`:
+```python
+HUGGINGFACE_MODEL = "your-username/your-model"
+```
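+### Adding new labels
+Modify `LABEL_LIST` in `config.py`. The list and its order must match the labels the model was trained with, since `ID2LABEL` is built from the list positions: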
+```python
+LABEL_LIST = [
+    'B-YOUR_NEW_LABEL',
+    'I-YOUR_NEW_LABEL',
+    # ...
+]
+```
+### Changing colors
+Adjust `COLOR_PALETTE` in `config.py`
+## 🎯 Supported Labels
+- ALICUOTA
+- COMPROBANTE_NUMERO
+- CONCEPTO_GASTO
+- FECHA
+- IVA
+- JURISDICCION_GASTO
+- NETO
+- PROVEEDOR_CUIT
+- PROVEEDOR_RAZON_SOCIAL
+- TIPO
+- TOTAL
+## 📝 Processing Flow
+1. **Upload**: the user uploads up to 10 invoices
+2. **OCR**: DocTR extracts text and coordinates
+3. **NER**: LayoutLMv3 predicts entities
+4. **Grouping**: BIO tags are used to group tokens into entities
+5. **Deduplication**: the best candidate per label is selected
+6. **Visualization**: the image is annotated with bounding boxes
+7. **Navigation**: the user browses the results
+## 🐛 Troubleshooting
+### Out-of-memory error
+Reduce the batch size or the image size
+### Model not found
+Check your internet connection; the model is downloaded from Hugging Face
+### Font not found
+The system falls back to the default font if `arial.ttf` is not available
+## 📄 License
+This project uses pre-trained models that are subject to their respective licenses.
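
Reviewer note: the BIO grouping and deduplication steps listed in the README can be sketched as below. This is a minimal, dependency-free illustration modeled on the logic visible in the removed `app.py` (a stray `I-` tag opens a new entity, and the longest candidate per label wins). The function names `group_bio` and `pick_longest` are hypothetical and are not claimed to match `group_entities()` in `invoice_processor.py`, whose body is not fully shown in this diff.

```python
from collections import defaultdict


def group_bio(words, labels):
    """Group (word, BIO tag) pairs into {root_label: [entity_text, ...]}."""
    candidates = defaultdict(list)
    current_label, current_words = None, []

    def flush():
        # Store the open entity, if there is one.
        if current_label and current_words:
            candidates[current_label].append(" ".join(current_words))

    for word, tag in zip(words, labels):
        prefix, _, root = tag.partition("-")
        if prefix == "I" and root == current_label:
            current_words.append(word)  # continue the open entity
            continue
        flush()  # any other tag closes the open entity
        if prefix in ("B", "I"):
            # A B- tag, or an I- tag with a different label, starts a new entity.
            current_label, current_words = root, [word]
        else:  # "O"
            current_label, current_words = None, []
    flush()
    return dict(candidates)


def pick_longest(candidates):
    """Keep the longest candidate string per label (first one wins on ties)."""
    return {label: max(values, key=len) for label, values in candidates.items()}


words = ["ACME", "S.A.", "Fecha", "01/02/2024", "ACME"]
labels = [
    "B-PROVEEDOR_RAZON_SOCIAL",
    "I-PROVEEDOR_RAZON_SOCIAL",
    "O",
    "B-FECHA",
    "B-PROVEEDOR_RAZON_SOCIAL",
]
grouped = group_bio(words, labels)
best = pick_longest(grouped)
```

Choosing the longest string is only a heuristic; a confidence-based choice would need the model's scores, which this sketch does not use.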

app.py CHANGED

@@ -1,416 +1,45 @@
-import gradio as gr
-import numpy as np
-from PIL import Image, ImageDraw, ImageFont
-import torch
-from transformers import AutoProcessor, LayoutLMv3ForTokenClassification
-from doctr.models import ocr_predictor
-from doctr.io import DocumentFile
-import os
-import warnings
-from io import BytesIO
-warnings.filterwarnings('ignore')
-# --- 1. Carga de Modelo y Procesador (CPU Habilitada) ---
-# MODELO DE HUGGING FACE FINE-TUNEADO
-HUGGINGFACE_MODEL = "lucasgagneten/layoutlmv3-argentine-invoices"
-# Define el dispositivo como CPU
-device = torch.device("cpu")
-print(f"Inferencia forzada al dispositivo: {device}")
-# Definir las etiquetas utilizadas durante el entrenamiento
-label_list = [
-    'B-ALICUOTA',
-    'B-COMPROBANTE_NUMERO',
-    'B-CONCEPTO_GASTO',
-    'B-FECHA',
-    'B-IVA',
-    'B-JURISDICCION_GASTO',
-    'B-NETO',
-    'B-PROVEEDOR_CUIT',
-    'B-PROVEEDOR_RAZON_SOCIAL',
-    'B-TIPO',
-    'B-TOTAL',
-    'I-COMPROBANTE_NUMERO',
-    'I-CONCEPTO_GASTO',
-    'I-JURISDICCION_GASTO',
-    'I-PROVEEDOR_CUIT',
-    'I-PROVEEDOR_RAZON_SOCIAL',
-    'I-TOTAL',
-    'O'
-    ]
-id2label = {i: label for i, label in enumerate(label_list)}
-label2id = {label: i for i, label in enumerate(label_list)}
-# Configuración de colores para las cajas delimitadoras
-color_palette = [
-    'red', 'blue', 'green', 'purple', 'orange', 'brown', 'pink', 'cyan',
-    'lime', 'olive', 'teal', 'magenta', 'navy', 'maroon', 'gold', 'silver',
-    'indigo', 'turquoise'
-]
-root_labels = set()
-for label in label_list:
-    if label != 'O':
-        root_label = label.split('-', 1)[-1]
-        root_labels.add(root_label)
-label2color = {}
-for i, root_label in enumerate(sorted(list(root_labels))):
-    label2color[root_label] = color_palette[i % len(color_palette)]
-# Cargar el modelo/procesador
-try:
-    loaded_processor = AutoProcessor.from_pretrained(HUGGINGFACE_MODEL, apply_ocr=False)
-    loaded_model = LayoutLMv3ForTokenClassification.from_pretrained(HUGGINGFACE_MODEL).to(device)
-    loaded_model.config.id2label = id2label
-    loaded_model.config.label2id = label2id
-    print(f"Modelo fine-tuneado cargado exitosamente desde Hugging Face: {HUGGINGFACE_MODEL} en CPU.")
-except Exception as e:
-    print(f"Error fatal al cargar el modelo o procesador desde Hugging Face: {e}")
-# Cargar el predictor OCR de DocTR
-doctr_model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
-# --- 2. Funciones de Procesamiento y Navegación ---
-def process_single_invoice(image: Image.Image, image_filename: str):
-    """
-    Realiza OCR, NER, y devuelve los resultados y la imagen anotada para una sola factura.
-    Retorna: nombre_archivo, imagen_anotada, tabla_resultados, json_resultados
-    """
-    # 1. OCR con DocTR (obtener texto y bboxes)
-    try:
-        rgb_image = image.convert("RGB")
-        img_byte_arr = BytesIO()
-        rgb_image.save(img_byte_arr, format='JPEG')
-        img_byte_arr.seek(0)
-        image_bytes = img_byte_arr.read()
-        doctr_doc = DocumentFile.from_images([image_bytes])
-    except Exception as e:
-        return image_filename, None, [["ERROR", f"DocTR Error: {e}"]], []
-    doctr_result = doctr_model(doctr_doc)
-    if not doctr_result.pages:
-        return image_filename, None, [["ERROR", "DocTR no pudo extraer ninguna página."]], []
-    page = doctr_result.pages[0]
-    words_data = []
-    for block in page.blocks:
-        for line in block.lines:
-            for word in line.words:
-                text = word.value
-                geom = np.array(word.geometry) * 1000
-                xmin, ymin = map(int, geom[0])
-                xmax, ymax = map(int, geom[1])
-                words_data.append({"text": text, "box": [xmin, ymin, xmax, ymax]})
-    words = [wd["text"] for wd in words_data]
-    boxes = [wd["box"] for wd in words_data]
-    image_width, image_height = image.size
-    # 2. Preprocesamiento para LayoutLMv3
-    encoding = loaded_processor(
-        image, words, boxes=boxes, max_length=512, truncation=True,
-        padding="max_length", return_tensors="pt"
-    )
-    input_ids = encoding["input_ids"].to(device)
-    attention_mask = encoding["attention_mask"].to(device)
-    bbox = encoding["bbox"].to(device)
-    pixel_values = encoding["pixel_values"].to(device)
-    # 3. Inferencia del Modelo LayoutLMv3
-    loaded_model.eval()
-    with torch.no_grad():
-        outputs = loaded_model(
-            input_ids=input_ids, attention_mask=attention_mask,
-            bbox=bbox, pixel_values=pixel_values
-        )
-    predictions = outputs.logits.argmax(dim=-1).squeeze().tolist()
-    # Mapeo Correcto de Predicciones a Palabras del OCR
-    word_ids = encoding.word_ids()
-    predictions_final = []
-    current_word_index = None
-    for idx, pred_id in enumerate(predictions):
-        word_idx = word_ids[idx]
-        if word_idx is not None:
-            if word_idx != current_word_index:
-                if len(predictions_final) < len(words):
-                    predictions_final.append(loaded_model.config.id2label[pred_id])
-                current_word_index = word_idx
-    # 4. Agrupación de Resultados BIO y Desduplicación
-    ner_candidates = {}
-    current_entity = []
-    current_label = None
-    current_bbox_group = []
-    def save_current_entity(entity_list, label, bbox_list):
-        if not entity_list or not label: return
-        # Calcular el Bounding Box de la entidad a partir de los BBoxes de las palabras
-        all_x = [b[0] for b in bbox_list] + [b[2] for b in bbox_list]
-        all_y = [b[1] for b in bbox_list] + [b[3] for b in bbox_list]
-        bbox_normalized = [min(all_x), min(all_y), max(all_x), max(all_y)]
-        if label not in ner_candidates:
-            ner_candidates[label] = []
-        ner_candidates[label].append({
-            'valor': " ".join(entity_list),
-            'bbox_entity': bbox_normalized
-        })
-    for word_data, pred_label in zip(words_data, predictions_final):
-        word_text = word_data["text"]
-        word_box = word_data["box"]
-        tag_parts = pred_label.split('-', 1)
-        tag_type = tag_parts[0]
-        root_label = tag_parts[1] if len(tag_parts) > 1 else None
-        if tag_type == 'B':
-            save_current_entity(current_entity, current_label, current_bbox_group)
-            current_label = root_label
-            current_entity = [word_text]
-            current_bbox_group = [word_box]
-        elif tag_type == 'I':
-            if current_label == root_label:
-                current_entity.append(word_text)
-                current_bbox_group.append(word_box)
-            else:
-                save_current_entity(current_entity, current_label, current_bbox_group)
-                current_label = root_label
-                current_entity = [word_text]
-                current_bbox_group = [word_box]
-        elif tag_type == 'O':
-            save_current_entity(current_entity, current_label, current_bbox_group)
-            current_entity = []
-            current_label = None
-            current_bbox_group = []
-    save_current_entity(current_entity, current_label, current_bbox_group)
-    # Desduplicación (Seleccionar el valor más largo)
-    final_ner_results = []
-    for label, candidates in ner_candidates.items():
-        if not candidates: continue
-        sorted_candidates = sorted(candidates, key=lambda x: len(x['valor']), reverse=True)
-        best_candidate = sorted_candidates[0]
-        final_ner_results.append({
-            'etiqueta': label,
-            'valor': best_candidate['valor'],
-            'bbox_entity': best_candidate['bbox_entity']
-        })
-    # 5. Dibujar Bounding Boxes en la Imagen
-    annotated_image = image.copy()
-    draw = ImageDraw.Draw(annotated_image)
-    try:
-        font = ImageFont.truetype("arial.ttf", 20)
-    except IOError:
-        font = ImageFont.load_default()
-    for res in final_ner_results:
-        label = res['etiqueta']
-        min_x_norm, min_y_norm, max_x_norm, max_y_norm = res['bbox_entity']
-        # Desnormalizar el bbox [0-1000] a píxeles
-        min_x = int(min_x_norm * image_width / 1000)
-        min_y = int(min_y_norm * image_height / 1000)
-        max_x = int(max_x_norm * image_width / 1000)
-        max_y = int(max_y_norm * image_height / 1000)
-        color = label2color.get(label, 'yellow')
-        draw.rectangle([min_x, min_y, max_x, max_y], outline=color, width=3)
-        draw.text((min_x, min_y - 20), label, fill=color, font=font)
-    # 6. Devolver resultados
-    table_data = [[res['etiqueta'], res['valor']] for res in final_ner_results]
-    json_data = [
-        {'etiqueta': r['etiqueta'], 'valor': r['valor'], 'bbox_entity': r['bbox_entity']}
-        for r in final_ner_results
-    ]
-    return image_filename, annotated_image, table_data, json_data
-# --- Funciones de Lote y Navegación (CORREGIDAS) ---
-def load_and_process_batch(file_list):
-    """
-    Carga un lote de archivos, los procesa y devuelve una lista de resultados,
-    y los valores crudos para la primera imagen.
-    """
-    if not file_list:
-        return [], 0, "Por favor, carga al menos un archivo.", None, [], ""
-    results = []
-    for i, file in enumerate(file_list):
-        try:
-            image = Image.open(file.name).convert("RGB")
-            filename = os.path.basename(file.name)
-            _, annotated_image, table_data, _ = process_single_invoice(image, filename)
-            results.append({
-                "filename": filename,
-                "image": annotated_image,
-                "table": table_data
-            })
-        except Exception as e:
-            results.append({
-                "filename": os.path.basename(file.name),
-                "image": None,
-                "table": [["ERROR FATAL", f"No se pudo cargar o procesar el archivo: {e}"]]
-            })
-    initial_index = 0
-    first_result = results[initial_index]
-    total_count = len(results)
-    status = f"Procesamiento completado para {total_count} facturas. Mostrando la factura 1 de {total_count}."
-    return (
-        results,             # all_results_state
-        initial_index,       # current_index_state
-        status,              # status_output
-        first_result["image"],
-        first_result["table"],
-        first_result["filename"]
-    )
-def update_ui(all_results, current_index):
-    """ Función auxiliar que devuelve los 4 elementos de la interfaz como valores crudos. """
-    if not all_results:
-        return None, [["Resultado", "Lista vacía"]], "Sin datos", "Sin nombre"
-    current_result = all_results[current_index]
-    total_count = len(all_results)
-    status = f"Factura {current_index + 1} de {total_count}."
-    return (
-        current_result["image"],
-        current_result["table"],
-        status,
-        current_result["filename"],
-    )
-def go_next(all_results, current_index):
-    """Avanza a la siguiente factura en el lote."""
-    if not all_results:
-        return 0, None, [["ERROR", "No hay facturas cargadas."]], "Sin datos", "Sin nombre"
-    new_index = (current_index + 1) % len(all_results)
-    image, table, status, filename = update_ui(all_results, new_index)
-    return new_index, image, table, status, filename
-def go_prev(all_results, current_index):
-    """Retrocede a la factura anterior en el lote."""
-    if not all_results:
-        return 0, None, [["ERROR", "No hay facturas cargadas."]], "Sin datos", "Sin nombre"
-    new_index = (current_index - 1) % len(all_results)
-    image, table, status, filename = update_ui(all_results, new_index)
-    return new_index, image, table, status, filename
-def enable_buttons(all_results):
-    """Habilita los botones de navegación si hay resultados."""
-    has_results = len(all_results) > 0
-    return gr.update(interactive=has_results), gr.update(interactive=has_results)
-# --- 3. Interfaz Gradio ---
-with gr.Blocks(title="NER de Facturas Argentinas por Lote") as demo:
-    gr.Markdown(
-        f"""
-        # 🇦🇷 Extracción de Datos de Facturas Argentinas (Procesamiento por Lote)
-        Carga hasta **10 facturas** para su procesamiento.
-        Se utiliza **LayoutLMv3** (`{HUGGINGFACE_MODEL}`) y **DocTR** forzando la **ejecución en CPU**.
-        """
-    )
-    # Elementos de estado
-    all_results_state = gr.State(value=[])
-    current_index_state = gr.State(value=0)
-    with gr.Row():
-        with gr.Column(scale=1):
-            file_input = gr.Files(
-                file_count="multiple",
-                type="filepath",
-                label="Cargar hasta 10 Facturas (Máx. 10 archivos)",
-                interactive=True
-            )
-            process_button = gr.Button("🚀 Procesar Lote de Facturas", variant="primary")
-            status_output = gr.Textbox(
-                label="Estado del Lote",
-                value="Carga tus facturas y haz clic en 'Procesar'",
-                interactive=False
-            )
-        with gr.Column(scale=2):
-            filename_output = gr.Textbox(
-                label="Nombre de Archivo",
-                value="",
-                interactive=False,
-                visible=True
-            )
-            image_output = gr.Image(type="pil", label="Factura con Entidades Resaltadas")
-            # Controles de navegación
-            with gr.Row():
-                prev_button = gr.Button("⬅️ Anterior", interactive=False)
-                next_button = gr.Button("Siguiente ➡️", interactive=False)
-            table_output = gr.Dataframe(
-                headers=["Etiqueta", "Valor"],
-                label="Resultados de NER",
-                interactive=False,
-                col_count=(2, "fixed")
-            )
-    # Lógica de procesamiento de lote
-    process_button.click(
-        fn=load_and_process_batch,
-        inputs=[file_input],
-        outputs=[
-            all_results_state,
-            current_index_state,
-            status_output,
-            image_output,
-            table_output,
-            filename_output
-        ]
-    ).then(
-        fn=enable_buttons,
-        inputs=[all_results_state],
-        outputs=[prev_button, next_button]
-    )
-    # Lógica de navegación
-    prev_button.click(
-        fn=go_prev,
-        inputs=[all_results_state, current_index_state],
-        outputs=[current_index_state, image_output, table_output, status_output, filename_output]
-    )
-    next_button.click(
-        fn=go_next,
-        inputs=[all_results_state, current_index_state],
-        outputs=[current_index_state, image_output, table_output, status_output, filename_output]
-    )
-# Lanzar la aplicación
-demo.launch()

+# app.py
+"""
+Main entry point of the application
+"""
+from model_loader import ModelManager
+from invoice_processor import InvoiceProcessor
+from batch_processor import BatchProcessor, ResultNavigator
+from interface import GradioInterface
+def main():
+    """Initialize all components and launch the application."""
+    print("=" * 60)
+    print("Iniciando aplicación de extracción de datos de facturas")
+    print("=" * 60)
+    # 1. Load the models
+    print("\n[1/4] Cargando modelos...")
+    model_manager = ModelManager(force_cpu=True)
+    # 2. Initialize the invoice processor
+    print("\n[2/4] Inicializando procesador de facturas...")
+    invoice_processor = InvoiceProcessor(model_manager)
+    # 3. Initialize the batch processor
+    print("\n[3/4] Inicializando procesador de lotes...")
+    batch_processor = BatchProcessor(invoice_processor)
+    # 4. Build the interface
+    print("\n[4/4] Construyendo interfaz Gradio...")
+    gradio_interface = GradioInterface(batch_processor, ResultNavigator)
+    demo = gradio_interface.build_interface()
+    print("\n" + "=" * 60)
+    print("✓ Aplicación lista")
+    print("=" * 60 + "\n")
+    # Launch the application
+    demo.launch()
+if __name__ == "__main__":
+    main()
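
Reviewer note: the removed `app.py` mapped sub-token predictions back to OCR words by keeping the label of each word's first sub-token. The sketch below isolates that step with plain lists so it can be read without `transformers`. The helper name `first_subtoken_labels` is hypothetical. In the real pipeline `word_ids` would come from `encoding.word_ids()`, and words cut off by truncation at the maximum sequence length would receive no label.

```python
def first_subtoken_labels(word_ids, predictions, id2label):
    """Return one label per word, taken from the word's first sub-token."""
    labels = []
    previous = None
    for word_id, pred_id in zip(word_ids, predictions):
        if word_id is None:        # special or padding token
            continue
        if word_id != previous:    # first sub-token of a new word
            labels.append(id2label[pred_id])
            previous = word_id
    return labels


# Toy example: three words, the first and third split into two sub-tokens.
id2label = {0: "B-TOTAL", 1: "I-TOTAL", 2: "O"}
word_ids = [None, 0, 0, 1, 2, 2, None]
predictions = [2, 0, 1, 2, 1, 1, 2]
word_labels = first_subtoken_labels(word_ids, predictions, id2label)
```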

batch_processor.py ADDED

@@ -0,0 +1,144 @@

+# batch_processor.py
+"""
+Batch processing and result navigation
+"""
+import os
+from PIL import Image
+class BatchProcessor:
+    """Handles batch processing of invoices."""
+    def __init__(self, invoice_processor):
+        """
+        Initialize the batch processor.
+        Args:
+            invoice_processor: InvoiceProcessor instance
+        """
+        self.invoice_processor = invoice_processor
+    def process_batch(self, file_list):
+        """
+        Process a batch of invoice files.
+        Args:
+            file_list: List of uploaded files (path strings or objects with a .name attribute)
+        Returns:
+            tuple: (results, initial_index, status, first_image, first_table, first_filename)
+        """
+        if not file_list:
+            return [], 0, "Por favor, carga al menos un archivo.", None, [], ""
+        results = []
+        for file in file_list:
+            # Resolve the path before the try block so the error branch can always report a filename
+            path = getattr(file, "name", file)
+            filename = os.path.basename(path)
+            try:
+                image = Image.open(path).convert("RGB")
+                _, annotated_image, table_data, _ = self.invoice_processor.process_invoice(
+                    image, filename
+                )
+                results.append({
+                    "filename": filename,
+                    "image": annotated_image,
+                    "table": table_data
+                })
+            except Exception as e:
+                results.append({
+                    "filename": filename,
+                    "image": None,
+                    "table": [["ERROR FATAL", f"No se pudo cargar o procesar: {e}"]]
+                })
+        # Prepare the initial results
+        initial_index = 0
+        first_result = results[initial_index]
+        total_count = len(results)
+        status = f"Procesamiento completado para {total_count} facturas. Mostrando factura 1 de {total_count}."
+        return (
+            results,
+            initial_index,
+            status,
+            first_result["image"],
+            first_result["table"],
+            first_result["filename"]
+        )
+class ResultNavigator:
+    """Navigates between processed results."""
+    @staticmethod
+    def get_result_at_index(all_results, index):
+        """
+        Get the data for a specific result.
+        Args:
+            all_results: List of all results
+            index: Index of the result to fetch
+        Returns:
+            tuple: (image, table, status, filename)
+        """
+        if not all_results:
+            return None, [["Resultado", "Lista vacía"]], "Sin datos", "Sin nombre"
+        current_result = all_results[index]
+        total_count = len(all_results)
+        status = f"Factura {index + 1} de {total_count}."
+        return (
+            current_result["image"],
+            current_result["table"],
+            status,
+            current_result["filename"]
+        )
+    @staticmethod
+    def go_next(all_results, current_index):
+        """
+        Move to the next invoice, wrapping around at the end.
+        Args:
+            all_results: List of all results
+            current_index: Current index
+        Returns:
+            tuple: (new_index, image, table, status, filename)
+        """
+        if not all_results:
+            return 0, None, [["ERROR", "No hay facturas cargadas."]], "Sin datos", "Sin nombre"
+        new_index = (current_index + 1) % len(all_results)
+        image, table, status, filename = ResultNavigator.get_result_at_index(
+            all_results, new_index
+        )
+        return new_index, image, table, status, filename
+    @staticmethod
+    def go_prev(all_results, current_index):
+        """
+        Move to the previous invoice, wrapping around at the start.
+        Args:
+            all_results: List of all results
+            current_index: Current index
+        Returns:
+            tuple: (new_index, image, table, status, filename)
+        """
+        if not all_results:
+            return 0, None, [["ERROR", "No hay facturas cargadas."]], "Sin datos", "Sin nombre"
+        new_index = (current_index - 1) % len(all_results)
+        image, table, status, filename = ResultNavigator.get_result_at_index(
+            all_results, new_index
+        )
+        return new_index, image, table, status, filename
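
Reviewer note: `go_next()` and `go_prev()` rely on Python's modulo to wrap around the result list, so stepping back from the first invoice lands on the last one. The hypothetical helper `step_index` below shows that arithmetic on its own; it is an illustration, not part of the commit.

```python
def step_index(current_index, total, delta):
    """Move by delta (+1 or -1) and wrap around a list of `total` items."""
    if total <= 0:
        return 0  # nothing to navigate
    # Python's % always returns a non-negative result for a positive divisor,
    # so (0 - 1) % 3 == 2 rather than -1.
    return (current_index + delta) % total


after_last = step_index(2, 3, +1)    # wraps forward to the first item
before_first = step_index(0, 3, -1)  # wraps backward to the last item
```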

config.py ADDED

@@ -0,0 +1,61 @@

+# config.py
+"""
+Project configuration and constants
+"""
+# Fine-tuned Hugging Face model
+HUGGINGFACE_MODEL = "lucasgagneten/layoutlmv3-argentine-invoices"
+# Labels used during training
+LABEL_LIST = [
+    'B-ALICUOTA',
+    'B-COMPROBANTE_NUMERO',
+    'B-CONCEPTO_GASTO',
+    'B-FECHA',
+    'B-IVA',
+    'B-JURISDICCION_GASTO',
+    'B-NETO',
+    'B-PROVEEDOR_CUIT',
+    'B-PROVEEDOR_RAZON_SOCIAL',
+    'B-TIPO',
+    'B-TOTAL',
+    'I-COMPROBANTE_NUMERO',
+    'I-CONCEPTO_GASTO',
+    'I-JURISDICCION_GASTO',
+    'I-PROVEEDOR_CUIT',
+    'I-PROVEEDOR_RAZON_SOCIAL',
+    'I-TOTAL',
+    'O'
+]
+# Label mappings
+ID2LABEL = {i: label for i, label in enumerate(LABEL_LIST)}
+LABEL2ID = {label: i for i, label in enumerate(LABEL_LIST)}
+# Color configuration for the bounding boxes
+COLOR_PALETTE = [
+    'red', 'blue', 'green', 'purple', 'orange', 'brown', 'pink', 'cyan',
+    'lime', 'olive', 'teal', 'magenta', 'navy', 'maroon', 'gold', 'silver',
+    'indigo', 'turquoise'
+]
+# Build the label-to-color mapping
+def get_label_colors():
+    """Build the mapping from root labels to colors."""
+    root_labels = set()
+    for label in LABEL_LIST:
+        if label != 'O':
+            root_label = label.split('-', 1)[-1]
+            root_labels.add(root_label)
+    label2color = {}
+    for i, root_label in enumerate(sorted(list(root_labels))):
+        label2color[root_label] = COLOR_PALETTE[i % len(COLOR_PALETTE)]
+    return label2color
+LABEL2COLOR = get_label_colors()
+# Processing configuration
+MAX_LENGTH = 512
+NORMALIZATION_FACTOR = 1000  # Scale factor for normalizing bbox coordinates
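
Reviewer note: `NORMALIZATION_FACTOR` reflects the 0 to 1000 box scale the removed `app.py` used, where DocTR's relative coordinates were multiplied by 1000 for LayoutLMv3 and later scaled back to pixels for drawing. The two helpers below are a hypothetical sketch of that round trip under those assumptions; their names do not come from the commit.

```python
NORMALIZATION_FACTOR = 1000  # same value as in config.py


def relative_to_normalized(geometry, factor=NORMALIZATION_FACTOR):
    """Convert ((xmin, ymin), (xmax, ymax)) in [0, 1] to integer [xmin, ymin, xmax, ymax]."""
    (xmin, ymin), (xmax, ymax) = geometry
    return [int(v * factor) for v in (xmin, ymin, xmax, ymax)]


def normalized_to_pixels(box, width, height, factor=NORMALIZATION_FACTOR):
    """Scale a normalized [xmin, ymin, xmax, ymax] box to pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return [
        int(xmin * width / factor),
        int(ymin * height / factor),
        int(xmax * width / factor),
        int(ymax * height / factor),
    ]


# A word covering the middle band of an 800x1200 image.
normalized = relative_to_normalized(((0.25, 0.5), (0.5, 0.75)))
pixels = normalized_to_pixels(normalized, width=800, height=1200)
```

Both helpers truncate with `int()`, as the removed code did, so boxes can shrink by up to one unit per side.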

interface.py ADDED

@@ -0,0 +1,155 @@

+# interface.py
+"""
+Gradio user interface
+"""
+import gradio as gr
+from config import HUGGINGFACE_MODEL
+class GradioInterface:
+    """Builds and manages the Gradio interface."""
+    def __init__(self, batch_processor, result_navigator):
+        """
+        Initialize the interface.
+        Args:
+            batch_processor: BatchProcessor instance
+            result_navigator: ResultNavigator class (not an instance)
+        """
+        self.batch_processor = batch_processor
+        self.navigator = result_navigator
+    def enable_buttons(self, all_results):
+        """Enable the navigation buttons if there are results."""
+        has_results = len(all_results) > 0
+        return gr.update(interactive=has_results), gr.update(interactive=has_results)
+    def build_interface(self):
+        """
+        Build and return the Gradio interface.
+        Returns:
+            gr.Blocks: Configured Gradio interface
+        """
+        with gr.Blocks(title="NER de Facturas Argentinas por Lote") as demo:
+            gr.Markdown(
+                f"""
+                # 🇦🇷 Extracción de Datos de Facturas Argentinas (Procesamiento por Lote)
+                Carga hasta **10 facturas** para su procesamiento.
+                Se utiliza **LayoutLMv3** (`{HUGGINGFACE_MODEL}`) y **DocTR** forzando la **ejecución en CPU**.
+                """
+            )
+            # State
+            all_results_state = gr.State(value=[])
+            current_index_state = gr.State(value=0)
+            # Upload and processing section
+            with gr.Row():
+                # Left column: file upload
+                with gr.Column(scale=1):
+                    file_input = gr.Files(
+                        file_count="multiple",
+                        type="filepath",
+                        label="📂 Cargar hasta 10 Facturas (Máx. 10 archivos)",
+                        interactive=True
+                    )
+                # Right column: button and status
+                with gr.Column(scale=1):
+                    process_button = gr.Button(
+                        "🚀 Procesar Lote de Facturas",
+                        variant="primary",
+                        size="lg"
+                    )
+                    status_output = gr.Textbox(
+                        label="📊 Estado del Procesamiento",
+                        value="Carga tus facturas y haz clic en 'Procesar'",
+                        interactive=False
+                    )
+            gr.Markdown("---")
+            # Results section: image on the left, data on the right
+            with gr.Row():
+                # Left column: image
+                with gr.Column(scale=1):
+                    image_output = gr.Image(
+                        type="pil",
+                        label="🖼️ Factura con Entidades Resaltadas"
+                    )
+                # Right column: information and navigation
+                with gr.Column(scale=1):
+                    filename_output = gr.Textbox(
+                        label="📄 Nombre de Archivo",
+                        value="",
+                        interactive=False,
+                        visible=True
+                    )
+                    # Navigation controls
+                    with gr.Row():
+                        prev_button = gr.Button(
+                            "⬅️ Anterior",
+                            interactive=False,
+                            size="lg"
+                        )
+                        next_button = gr.Button(
+                            "Siguiente ➡️",
+                            interactive=False,
+                            size="lg"
+                        )
+                    table_output = gr.Dataframe(
+                        headers=["Etiqueta", "Valor"],
+                        label="📋 Resultados de NER",
+                        interactive=False,
+                        col_count=(2, "fixed")
+                    )
+            # Events
+            process_button.click(
+                fn=self.batch_processor.process_batch,
+                inputs=[file_input],
+                outputs=[
+                    all_results_state,
+                    current_index_state,
+                    status_output,
+                    image_output,
+                    table_output,
+                    filename_output
+                ]
+            ).then(
+                fn=self.enable_buttons,
+                inputs=[all_results_state],
+                outputs=[prev_button, next_button]
+            )
+            prev_button.click(
+                fn=self.navigator.go_prev,
+                inputs=[all_results_state, current_index_state],
+                outputs=[
+                    current_index_state,
+                    image_output,
+                    table_output,
+                    status_output,
+                    filename_output
+                ]
+            )
+            next_button.click(
+                fn=self.navigator.go_next,
+                inputs=[all_results_state, current_index_state],
+                outputs=[
+                    current_index_state,
+                    image_output,
+                    table_output,
+                    status_output,
+                    filename_output
+                ]
+            )
+        return demo

invoice_processor.py ADDED

	@@ -0,0 +1,279 @@

+# invoice_processor.py
+"""
+Invoice processing: OCR, NER, and visualization
+"""
+import numpy as np
+from PIL import Image, ImageDraw, ImageFont
+import torch
+from doctr.io import DocumentFile
+from io import BytesIO
+from config import LABEL2COLOR, MAX_LENGTH, NORMALIZATION_FACTOR
+class InvoiceProcessor:
+    """Processes invoices and extracts named entities."""
+    def __init__(self, model_manager):
+        """
+        Initialize the invoice processor.
+        Args:
+            model_manager: ModelManager instance with the models already loaded
+        """
+        self.model_manager = model_manager
+        self.processor = model_manager.get_processor()
+        self.model = model_manager.get_model()
+        self.ocr_model = model_manager.get_ocr_model()
+        self.device = model_manager.get_device()
+    def extract_ocr_data(self, image: Image.Image):
+        """
+        Extract text and bounding boxes with DocTR.
+        Args:
+            image: PIL image of the invoice
+        Returns:
+            tuple: (words_data, image_width, image_height), or (None, None, None) on error.
+                Each box is [xmin, ymin, xmax, ymax] scaled by NORMALIZATION_FACTOR.
+        """
+        try:
+            rgb_image = image.convert("RGB")
+            img_byte_arr = BytesIO()
+            rgb_image.save(img_byte_arr, format='JPEG')
+            img_byte_arr.seek(0)
+            image_bytes = img_byte_arr.read()
+            doctr_doc = DocumentFile.from_images([image_bytes])
+            doctr_result = self.ocr_model(doctr_doc)
+            if not doctr_result.pages:
+                return None, None, None
+            page = doctr_result.pages[0]
+            words_data = []
+            for block in page.blocks:
+                for line in block.lines:
+                    for word in line.words:
+                        text = word.value
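+                        # For straight (non-rotated) pages, word.geometry holds relative
+                        # ((xmin, ymin), (xmax, ymax)) coordinates in [0, 1]
+                        geom = np.array(word.geometry) * NORMALIZATION_FACTOR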
+                        xmin, ymin = map(int, geom[0])
+                        xmax, ymax = map(int, geom[1])
+                        words_data.append({"text": text, "box": [xmin, ymin, xmax, ymax]})
+            image_width, image_height = image.size
+            return words_data, image_width, image_height
+        except Exception as e:
+            print(f"Error en OCR: {e}")
+            return None, None, None
+    def perform_ner(self, image: Image.Image, words_data: list):
+        """
+        Run NER over the extracted words.
+        Args:
+            image: PIL image
+            words_data: List of dicts with 'text' and 'box'
+        Returns:
+            list: One predicted label per word. Words truncated beyond
+                MAX_LENGTH tokens receive no prediction.
+        """
+        words = [wd["text"] for wd in words_data]
+        boxes = [wd["box"] for wd in words_data]
+        # Preprocessing
+        encoding = self.processor(
+            image, words, boxes=boxes, max_length=MAX_LENGTH,
+            truncation=True, padding="max_length", return_tensors="pt"
+        )
+        input_ids = encoding["input_ids"].to(self.device)
+        attention_mask = encoding["attention_mask"].to(self.device)
+        bbox = encoding["bbox"].to(self.device)
+        pixel_values = encoding["pixel_values"].to(self.device)
+        # Inference
+        self.model.eval()
+        with torch.no_grad():
+            outputs = self.model(
+                input_ids=input_ids,
+                attention_mask=attention_mask,
+                bbox=bbox,
+                pixel_values=pixel_values
+            )
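+        # Drop only the batch dimension (batch size is 1)
+        predictions = outputs.logits.argmax(dim=-1).squeeze(0).tolist()
+        # Map token-level predictions back to words: keep the prediction of each
+        # word's first sub-token; special and padding tokens have word id None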
+        word_ids = encoding.word_ids()
+        predictions_final = []
+        current_word_index = None
+        for idx, pred_id in enumerate(predictions):
+            word_idx = word_ids[idx]
+            if word_idx is not None:
+                if word_idx != current_word_index:
+                    if len(predictions_final) < len(words):
+                        predictions_final.append(self.model.config.id2label[pred_id])
+                    current_word_index = word_idx
+        return predictions_final
+    def group_entities(self, words_data: list, predictions: list):
+        """
+        Group entities using the BIO scheme, then deduplicate per label.
+        Args:
+            words_data: List of words with their bounding boxes
+            predictions: NER predictions, one per word
+        Returns:
+            list: Final entities, each with label ('etiqueta'), value ('valor') and bbox
+        """
+        ner_candidates = {}
+        current_entity = []
+        current_label = None
+        current_bbox_group = []
+        def save_current_entity(entity_list, label, bbox_list):
+            if not entity_list or not label:
+                return
+            all_x = [b[0] for b in bbox_list] + [b[2] for b in bbox_list]
+            all_y = [b[1] for b in bbox_list] + [b[3] for b in bbox_list]
+            bbox_normalized = [min(all_x), min(all_y), max(all_x), max(all_y)]
+            if label not in ner_candidates:
+                ner_candidates[label] = []
+            ner_candidates[label].append({
+                'valor': " ".join(entity_list),
+                'bbox_entity': bbox_normalized
+            })
+        for word_data, pred_label in zip(words_data, predictions):
+            word_text = word_data["text"]
+            word_box = word_data["box"]
+            tag_parts = pred_label.split('-', 1)
+            tag_type = tag_parts[0]
+            root_label = tag_parts[1] if len(tag_parts) > 1 else None
+            if tag_type == 'B':
+                save_current_entity(current_entity, current_label, current_bbox_group)
+                current_label = root_label
+                current_entity = [word_text]
+                current_bbox_group = [word_box]
+            elif tag_type == 'I':
+                if current_label == root_label:
+                    current_entity.append(word_text)
+                    current_bbox_group.append(word_box)
+                else:
+                    save_current_entity(current_entity, current_label, current_bbox_group)
+                    current_label = root_label
+                    current_entity = [word_text]
+                    current_bbox_group = [word_box]
+            elif tag_type == 'O':
+                save_current_entity(current_entity, current_label, current_bbox_group)
+                current_entity = []
+                current_label = None
+                current_bbox_group = []
+        save_current_entity(current_entity, current_label, current_bbox_group)
+        # Deduplication: for each label, keep the candidate with the longest value
+        final_ner_results = []
+        for label, candidates in ner_candidates.items():
+            if not candidates:
+                continue
+            sorted_candidates = sorted(candidates, key=lambda x: len(x['valor']), reverse=True)
+            best_candidate = sorted_candidates[0]
+            final_ner_results.append({
+                'etiqueta': label,
+                'valor': best_candidate['valor'],
+                'bbox_entity': best_candidate['bbox_entity']
+            })
+        return final_ner_results
+    def draw_annotations(self, image: Image.Image, entities: list):
+        """
+        Draw bounding boxes and labels on the image.
+        Args:
+            image: Original PIL image
+            entities: List of entities with their bounding boxes
+        Returns:
+            Image: Annotated copy of the image
+        """
+        annotated_image = image.copy()
+        draw = ImageDraw.Draw(annotated_image)
+        image_width, image_height = image.size
+        try:
+            font = ImageFont.truetype("arial.ttf", 20)
+        except IOError:
+            font = ImageFont.load_default()
+        for entity in entities:
+            label = entity['etiqueta']
+            min_x_norm, min_y_norm, max_x_norm, max_y_norm = entity['bbox_entity']
+            # Denormalize coordinates back to pixels
+            min_x = int(min_x_norm * image_width / NORMALIZATION_FACTOR)
+            min_y = int(min_y_norm * image_height / NORMALIZATION_FACTOR)
+            max_x = int(max_x_norm * image_width / NORMALIZATION_FACTOR)
+            max_y = int(max_y_norm * image_height / NORMALIZATION_FACTOR)
+            color = LABEL2COLOR.get(label, 'yellow')
+            draw.rectangle([min_x, min_y, max_x, max_y], outline=color, width=3)
+            # Clamp the label position so it stays visible for boxes near the top edge
+            draw.text((min_x, max(0, min_y - 20)), label, fill=color, font=font)
+        return annotated_image
+    def process_invoice(self, image: Image.Image, filename: str):
+        """
+        Process a complete invoice: OCR + NER + visualization.
+        Args:
+            image: PIL image of the invoice
+            filename: File name
+        Returns:
+            tuple: (filename, annotated_image, table_data, json_data)
+        """
+        # 1. OCR
+        words_data, image_width, image_height = self.extract_ocr_data(image)
+        if words_data is None:
+            return filename, None, [["ERROR", "No se pudo realizar OCR"]], []
+        if not words_data:
+            return filename, None, [["ERROR", "No se encontró texto en la imagen"]], []
+        # 2. NER
+        try:
+            predictions = self.perform_ner(image, words_data)
+        except Exception as e:
+            return filename, None, [["ERROR", f"Error en NER: {e}"]], []
+        # 3. Group entities
+        entities = self.group_entities(words_data, predictions)
+        # 4. Draw annotations
+        annotated_image = self.draw_annotations(image, entities)
+        # 5. Prepare results
+        table_data = [[e['etiqueta'], e['valor']] for e in entities]
+        json_data = [
+            {
+                'etiqueta': e['etiqueta'],
+                'valor': e['valor'],
+                'bbox_entity': e['bbox_entity']
+            }
+            for e in entities
+        ]
+        return filename, annotated_image, table_data, json_data
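
The BIO grouping and per-label deduplication done by `group_entities()` can be sketched in isolation, without any model. This is a minimal standalone sketch, not the code from the commit: the function name `group_bio`, the label names `NUMERO` and `TOTAL`, and the sample words and boxes are all made up for illustration, and any tag other than `B-*` or `I-*` is treated like `O` here.

```python
from typing import Dict, List, Optional, Tuple

Box = List[int]  # [xmin, ymin, xmax, ymax]; assumed to be on the normalized scale


def group_bio(words: List[str], boxes: List[Box], tags: List[str]) -> List[dict]:
    """Group BIO-tagged words into entities and keep the longest value per label."""
    candidates: Dict[str, List[Tuple[str, Box]]] = {}
    cur_words: List[str] = []
    cur_boxes: List[Box] = []
    cur_label: Optional[str] = None

    def flush() -> None:
        # Store the entity being built (if any) with the union of its word boxes.
        if cur_words and cur_label:
            xs = [b[0] for b in cur_boxes] + [b[2] for b in cur_boxes]
            ys = [b[1] for b in cur_boxes] + [b[3] for b in cur_boxes]
            union = [min(xs), min(ys), max(xs), max(ys)]
            candidates.setdefault(cur_label, []).append((" ".join(cur_words), union))

    for word, box, tag in zip(words, boxes, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == cur_label:
            # Continuation of the current entity.
            cur_words.append(word)
            cur_boxes.append(box)
            continue
        flush()
        if prefix in ("B", "I"):
            # "B-X", or an "I-X" that does not match the open entity, starts a new one.
            cur_words, cur_boxes, cur_label = [word], [box], label
        else:
            cur_words, cur_boxes, cur_label = [], [], None
    flush()

    # Deduplication: one entity per label, chosen by longest text.
    results = []
    for label, cands in candidates.items():
        value, bbox = max(cands, key=lambda c: len(c[0]))
        results.append({"etiqueta": label, "valor": value, "bbox_entity": bbox})
    return results


# Hypothetical OCR output: two competing TOTAL candidates, one NUMERO.
words = ["Factura", "0001-00001234", "Total", "$", "1.500,00", "Total", "1.500"]
boxes = [
    [10, 10, 100, 30], [150, 10, 300, 30], [10, 500, 80, 520],
    [90, 500, 100, 520], [105, 500, 200, 520], [10, 600, 80, 620],
    [90, 600, 150, 620],
]
tags = ["O", "B-NUMERO", "O", "B-TOTAL", "I-TOTAL", "O", "B-TOTAL"]
example = group_bio(words, boxes, tags)
```

Keeping the longest candidate is a simple heuristic: it favors complete multi-word values over fragments, but it cannot tell apart two distinct fields that share a label.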

model_loader.py ADDED

	@@ -0,0 +1,75 @@

+# model_loader.py
+"""
+Model loading and management (LayoutLMv3 and DocTR)
+"""
+import torch
+from transformers import AutoProcessor, LayoutLMv3ForTokenClassification
+from doctr.models import ocr_predictor
+import warnings
+from config import HUGGINGFACE_MODEL, ID2LABEL, LABEL2ID
+warnings.filterwarnings('ignore')
+class ModelManager:
+    """Manages loading of and access to the models."""
+    def __init__(self, force_cpu=True):
+        """
+        Initialize and load the required models.
+        Args:
+            force_cpu (bool): If True, force CPU inference even when CUDA is available
+        """
+        use_cuda = not force_cpu and torch.cuda.is_available()
+        self.device = torch.device("cuda" if use_cuda else "cpu")
+        print(f"Inferencia en dispositivo: {self.device}")
+        # Load LayoutLMv3
+        self.processor, self.model = self._load_layoutlmv3()
+        # Load DocTR
+        self.ocr_model = self._load_doctr()
+    def _load_layoutlmv3(self):
+        """Load the LayoutLMv3 model and its processor."""
+        try:
+            processor = AutoProcessor.from_pretrained(HUGGINGFACE_MODEL, apply_ocr=False)
+            model = LayoutLMv3ForTokenClassification.from_pretrained(HUGGINGFACE_MODEL).to(self.device)
+            model.config.id2label = ID2LABEL
+            model.config.label2id = LABEL2ID
+            print(f"✓ Modelo LayoutLMv3 cargado: {HUGGINGFACE_MODEL}")
+            return processor, model
+        except Exception as e:
+            print(f"✗ Error al cargar LayoutLMv3: {e}")
+            raise
+    def _load_doctr(self):
+        """Load the DocTR OCR model."""
+        try:
+            ocr_model = ocr_predictor(
+                det_arch='db_resnet50',
+                reco_arch='crnn_vgg16_bn',
+                pretrained=True
+            )
+            print("✓ Modelo DocTR cargado")
+            return ocr_model
+        except Exception as e:
+            print(f"✗ Error al cargar DocTR: {e}")
+            raise
+    def get_processor(self):
+        """Return the LayoutLMv3 processor."""
+        return self.processor
+    def get_model(self):
+        """Return the LayoutLMv3 model."""
+        return self.model
+    def get_ocr_model(self):
+        """Return the DocTR OCR model."""
+        return self.ocr_model
+    def get_device(self):
+        """Return the device in use."""
+        return self.device
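
The device choice in `ModelManager.__init__` follows a simple precedence: `force_cpu` wins, and CUDA is used only when it is both allowed and available. The sketch below isolates that rule as a pure function so it can be checked without PyTorch installed. The helper name `pick_device_name` and its `cuda_available` parameter are hypothetical and do not appear in the commit.

```python
def pick_device_name(force_cpu: bool, cuda_available: bool) -> str:
    """Return "cuda" only when CPU is not forced and CUDA is available."""
    if force_cpu or not cuda_available:
        return "cpu"
    return "cuda"


# Enumerate the four possible combinations of the two flags.
choices = {
    (force, avail): pick_device_name(force, avail)
    for force in (True, False)
    for avail in (True, False)
}
```

In the application, the second argument would come from `torch.cuda.is_available()` and the resulting name would be passed to `torch.device(...)`. Because `force_cpu` defaults to `True`, the Space runs on CPU unless the caller explicitly opts out.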

requirements.txt CHANGED

@@ -1,14 +1,7 @@
-# --- Framework and utility requirements ---
-gradio>=4.28.3           # User interface (stable version with state fixes)
-pillow                   # Image manipulation (PIL)
-numpy                    # Numerical operations
-# --- OCR (DocTR) and NER (Transformers) requirements ---
-python-doctr[viz,html]>=1.0.0 # DocTR library
-transformers>=4.30.0     # Main library for LayoutLMv3
-# --- PyTorch requirements (adjusted) ---
-# Installing the CPU build of PyTorch separately is strongly recommended
-# to ensure the correct version, but with pip you can use:
-torch>=2.0.0             # PyTorch (required for model inference)
-matplotlib               # For visualization (already added)

+gradio>=4.0.0
+transformers>=4.30.0
+torch>=2.0.0
+torchvision>=0.15.0
+python-doctr>=0.6.0
+Pillow>=9.0.0
+numpy>=1.24.0