aamanlamba committed on
Commit
f12921d
·
1 Parent(s): 322f5cd

updated Readme.md

Files changed (1)
  1. README.md +104 -0
README.md CHANGED
@@ -11,4 +11,108 @@ license: mit
11
  short_description: An agent that extracts data lineage, pipeline dependencies
12
  ---
13
 
14
+ # Lineage Graph Accelerator 🔥
15
+
16
+ A Gradio-based agent that extracts, summarizes, and visualizes data lineage from multiple metadata sources (BigQuery, dbt, Airflow, APIs, and more). It is designed as a small, extensible framework of sub-agents that parse metadata, infer relationships, and render graph visualizations for exploration and debugging.
17
+
18
+ ## Architecture
19
+
20
+ This project is organized as a collection of lightweight sub-agents (workers): a metadata parser, a graph visualizer, and optional integration adapters (BigQuery, URL fetcher, dbt, Airflow). The UI (Gradio) orchestrates these components and displays results as Mermaid diagrams.
21
+
22
+ ### Visual Overview
23
+
24
+ ```mermaid
25
+ flowchart TD
26
+ A["User/UI (Gradio)"] --> B[Main Agent / Orchestrator]
27
+ B --> C[Metadata Parser Sub-Agent]
28
+ B --> D[Graph Visualizer Sub-Agent]
29
+ B --> E[Integration Adapters]
30
+ E --> E1[BigQuery Adapter]
31
+ E --> E2[URL / API Adapter]
32
+ E --> E3[dbt / Airflow Adapter]
33
+ C --> F[Lineage Model / Relations]
34
+ F --> D
35
+ D --> G[Mermaid / DOT Renderer]
36
+ G --> H[UI Visualization]
37
+ style B fill:#f9f,stroke:#333,stroke-width:1px
38
+ style C fill:#bbf,stroke:#333,stroke-width:1px
39
+ style D fill:#bfb,stroke:#333,stroke-width:1px
40
+ style E fill:#ffd,stroke:#333,stroke-width:1px
41
+ ```
42
+
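The path from the lineage model to the renderer in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Edge` class and the `to_mermaid` / `to_dot` helpers are hypothetical names, and labels are assumed not to contain double quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One lineage relation: data flows from `source` to `target`."""
    source: str
    target: str


def _node_id(name: str) -> str:
    # Replace anything that is not alphanumeric so the ID is safe in Mermaid.
    return "".join(ch if ch.isalnum() else "_" for ch in name)


def to_mermaid(edges: list[Edge]) -> str:
    """Render edges as Mermaid flowchart source with quoted labels."""
    lines = ["flowchart TD"]
    for e in edges:
        lines.append(
            f'    {_node_id(e.source)}["{e.source}"] --> '
            f'{_node_id(e.target)}["{e.target}"]'
        )
    return "\n".join(lines)


def to_dot(edges: list[Edge]) -> str:
    """Render the same edges as Graphviz DOT source."""
    lines = ["digraph lineage {"]
    for e in edges:
        lines.append(f'    "{e.source}" -> "{e.target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample relations, as a parser sub-agent might produce them.
edges = [Edge("raw.orders", "stg_orders"), Edge("stg_orders", "fct_orders")]
print(to_mermaid(edges))
print(to_dot(edges))
```

Keeping the relations in one plain data structure means both output formats are produced from the same model, which matches the single "Lineage Model / Relations" node in the diagram.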
43
+ ## Features
44
+
45
+ - Multi-source metadata ingestion (Text, BigQuery, URLs/APIs)
46
+ - AI-assisted metadata parsing and relationship extraction (pluggable agent backend)
47
+ - Mermaid and DOT visualization support (Mermaid rendered in the UI)
48
+ - Lightweight, modular code designed for easy extension and testing
49
+
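As one illustration of relationship extraction from a metadata source, the sketch below derives lineage edges from a dictionary shaped like a dbt `manifest.json`, where each node lists its parents under `depends_on.nodes`. The function name `edges_from_manifest` and the sample node IDs are hypothetical, and this is not the extractor used by the app.

```python
import json


def edges_from_manifest(manifest: dict) -> list[tuple[str, str]]:
    """Return sorted (parent, child) pairs from a dbt-style manifest dict."""
    edges = []
    for node_id, node in manifest.get("nodes", {}).items():
        # Each node records the unique IDs it depends on.
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, node_id))
    return sorted(edges)


# Hypothetical, heavily trimmed manifest for illustration only.
sample = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.fct_orders": {
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    }
  }
}
""")

for parent, child in edges_from_manifest(sample):
    print(f"{parent} -> {child}")
```

The resulting pairs can be fed to whichever renderer the UI uses; missing `depends_on` keys are treated as "no parents" rather than as errors.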
50
+ ## Built with
51
+
52
+ - Gradio (UI)
53
+ - Mermaid for graph visualizations (client-side)
54
+ - LangSmith's Agent Builder (used to design and orchestrate the agent/sub-agent structure)
55
+
56
+ This project was prepared as a submission for the MCP 1st Birthday celebration. See the Hugging Face MCP-1st-Birthday activity for context: https://huggingface.co/organizations/MCP-1st-Birthday/activity/all
57
+
58
+ ## Quickstart (local)
59
+
60
+ 1. Create and activate the project's virtual environment (macOS / zsh):
61
+
62
+ ```bash
63
+ python3 -m venv .venv
64
+ source .venv/bin/activate
65
+ ```
66
+
67
+ 2. Install dependencies:
68
+
69
+ ```bash
70
+ pip install -r requirements.txt
71
+ ```
72
+
73
+ 3. Run the app (Gradio serves on http://127.0.0.1:7860 by default):
74
+
75
+ ```bash
76
+ python app.py
77
+ ```
78
+
79
+ 4. Open the UI in your browser and try the sample inputs (Text/File Metadata, BigQuery, URL/API).
80
+
81
+ ## Running tests
82
+
83
+ Unit tests are included under `tests/` to validate the Mermaid wrapper and extractor stubs.
84
+
85
+ Run them with the virtual environment's Python:
86
+
87
+ ```bash
88
+ source .venv/bin/activate
89
+ python -m unittest tests.test_app -v
90
+ ```
91
+
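For readers adding their own cases, a test in this style might look like the sketch below. The `wrap_mermaid` and `extract_stub` functions here are hypothetical stand-ins defined inline so the example is self-contained; the real helpers in `app.py` may have different names and behavior.

```python
import unittest


def wrap_mermaid(graph: str) -> str:
    """Hypothetical stand-in for the app's Mermaid wrapper."""
    return '<pre class="mermaid">\n' + graph.strip() + "\n</pre>"


def extract_stub(_metadata: str) -> str:
    """Hypothetical extractor stub that returns a fixed sample graph."""
    return "flowchart TD\n    A --> B"


class TestLineageHelpers(unittest.TestCase):
    def test_wrapper_keeps_graph_source(self):
        html = wrap_mermaid(extract_stub("anything"))
        self.assertTrue(html.startswith('<pre class="mermaid">'))
        self.assertIn("A --> B", html)

    def test_stub_returns_flowchart(self):
        self.assertTrue(extract_stub("").startswith("flowchart"))


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLineageHelpers)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes both cases; in a real test module the helpers would be imported from the app instead of being defined next to the tests.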
92
+ ## Notes and next steps
93
+
94
+ - The current extractors are stubs that return sample Mermaid graphs. Replace the TODOs in `app.py` to integrate with your chosen agent backend (LangSmith, OpenAI, Anthropic, etc.) or actual metadata connectors.
95
+ - Consider moving app construction into a `create_app()` factory to make imports and testing cleaner (avoid side-effects at module import time).
96
+ - To provide DOT/Graphviz rendering in-browser, consider adding viz.js or generating SVG server-side.
97
+
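The `create_app()` factory suggested above could look roughly like this. It is a sketch under assumptions: `extract_lineage` is a hypothetical stub, the layout is illustrative rather than the app's real UI, and the output is shown as plain Mermaid source in a text box instead of a rendered diagram.

```python
def extract_lineage(metadata_text: str) -> str:
    """Hypothetical stub: ignores its input and returns a sample graph."""
    return "flowchart TD\n    source_table --> target_table"


def create_app():
    """Build the Gradio UI on demand instead of at module import time."""
    # Imported lazily so that importing this module (for example from
    # tests) does not construct any UI objects.
    import gradio as gr

    with gr.Blocks(title="Lineage Graph Accelerator") as demo:
        metadata = gr.Textbox(label="Metadata", lines=8)
        run = gr.Button("Extract lineage")
        graph = gr.Textbox(label="Mermaid source", lines=8)
        # Wire the button to the extractor stub.
        run.click(fn=extract_lineage, inputs=metadata, outputs=graph)
    return demo
```

With this shape, `app.py` would call `create_app().launch()` under an `if __name__ == "__main__":` guard, while tests can import `extract_lineage` without starting a server.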
98
+ ## Contributing
99
+
100
+ Contributions are welcome: open a PR or issue with ideas, bug reports, or integration adapters (dbt, Snowflake, Airflow connectors).
101
+
102
+ ## License
103
+
104
+ MIT
105
+ ---
106
+ title: Lineage Graph Accelerator
107
+ emoji: 🔥
108
+ colorFrom: gray
109
+ colorTo: gray
110
+ sdk: gradio
111
+ sdk_version: 5.49.1
112
+ app_file: app.py
113
+ pinned: false
114
+ license: mit
115
+ short_description: An agent that extracts data lineage, pipeline dependencies
116
+ ---
117
+
118
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference