# Local Setup Guide - Lineage Graph Extractor

This guide provides detailed instructions for setting up and running the Lineage Graph Extractor agent locally.

## Table of Contents

1. [System Requirements](#system-requirements)
2. [Installation Methods](#installation-methods)
3. [Configuration](#configuration)
4. [Usage Scenarios](#usage-scenarios)
5. [Advanced Configuration](#advanced-configuration)
6. [Troubleshooting](#troubleshooting)

## System Requirements

### Minimum Requirements

- **OS**: Windows 10+, macOS 10.15+, or Linux
- **Python**: 3.9 or higher
- **Memory**: 2GB RAM minimum
- **Disk Space**: 100MB for agent files

### Recommended Requirements

- **Python**: 3.10+
- **Memory**: 4GB RAM
- **Internet**: Stable connection for API calls

## Installation Methods

### Method 1: Standalone Use (Recommended)

This method uses the agent configuration files with any platform that supports the Anthropic API.

1. **Download the agent**

   ```bash
   # If you have a git repository
   git clone <repository-url>
   cd local_clone

   # Or extract from downloaded archive
   unzip lineage-graph-extractor.zip
   cd lineage-graph-extractor
   ```

2. **Set up environment**

   ```bash
   # Copy environment template
   cp .env.example .env
   ```

3. **Edit .env file**

   ```bash
   # Edit with your preferred editor
   nano .env
   # or
   vim .env
   # or
   code .env  # VS Code
   ```

   Add your credentials:

   ```
   ANTHROPIC_API_KEY=sk-ant-your-key-here
   GOOGLE_CLOUD_PROJECT=your-gcp-project
   GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
   ```

4. **Install Python dependencies** (optional, needed only to run the Python examples in this guide)

   ```bash
   pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv
   ```
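
Before moving on, it can help to confirm that the settings from step 3 are actually visible to Python. The following is a minimal sketch that uses only the standard library. It assumes all three variables from the `.env` example are required, which may not hold if you never use BigQuery; `missing_vars` is a hypothetical helper, not part of the agent.

```python
import os

# Variable names taken from the .env example in step 3.
# Assumption: all three are treated as required here.
REQUIRED_VARS = (
    "ANTHROPIC_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # If python-dotenv is installed, call load_dotenv() before this check
    # so that values from .env are present in os.environ.
    missing = missing_vars(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present")
```

The function takes any mapping, so it can be checked against a plain dictionary as well as `os.environ`.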

### Method 2: Claude Desktop Integration

If you're using Claude Desktop or similar platforms:

1. **Locate your agent configuration directory**
   - Claude Desktop: `~/.config/claude/agents/` (Linux/Mac) or `%APPDATA%\claude\agents\` (Windows)
   - Other platforms: Check platform documentation

2. **Copy the memories folder**

   ```bash
   # Linux/Mac
   cp -r memories ~/.config/claude/agents/lineage-extractor/

   # Windows
   xcopy /E /I memories %APPDATA%\claude\agents\lineage-extractor\
   ```

3. **Configure API credentials** in your platform's settings

4. **Restart the application**
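
If you script the copy in step 2, the target directory can be resolved per platform. This is a sketch that only encodes the two Claude Desktop paths listed in step 1; `agents_dir` is a hypothetical helper, and other platforms may store agents elsewhere.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath


def agents_dir(platform=sys.platform, env=os.environ):
    """Return the agent directory for a platform, using the paths from step 1."""
    if platform.startswith("win"):
        # %APPDATA%\claude\agents\
        return PureWindowsPath(env["APPDATA"]) / "claude" / "agents"
    # ~/.config/claude/agents/ on Linux and macOS
    return PurePosixPath(env["HOME"]) / ".config" / "claude" / "agents"
```

Calling `agents_dir() / "lineage-extractor"` then gives the destination used in the copy commands.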

### Method 3: Python Integration

To integrate into your own Python application:

1. **Install dependencies**

   ```bash
   pip install anthropic python-dotenv
   ```

2. **Use the integration example**

   ```python
   import os

   from anthropic import Anthropic
   from dotenv import load_dotenv

   # Load environment variables
   load_dotenv()

   # Initialize client
   client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

   # Load agent configuration
   with open("memories/agent.md", "r") as f:
       system_prompt = f.read()

   # Use the agent
   response = client.messages.create(
       model="claude-3-5-sonnet-20241022",
       max_tokens=4000,
       system=system_prompt,
       messages=[{
           "role": "user",
           "content": "Extract lineage from this metadata: ..."
       }]
   )
   print(response.content[0].text)
   ```

## Configuration

### API Keys Setup

#### Anthropic API Key

1. Go to https://console.anthropic.com/
2. Create an account or sign in
3. Navigate to API Keys
4. Create a new key
5. Copy it into the `.env` file as `ANTHROPIC_API_KEY`

#### Google Cloud (for BigQuery)

1. Go to https://console.cloud.google.com/
2. Create a project or select an existing one
3. Enable the BigQuery API
4. Create a service account:
   - Go to IAM & Admin → Service Accounts
   - Create service account
   - Grant the "BigQuery Data Viewer" role (running queries typically also requires "BigQuery Job User")
   - Create JSON key
5. Download the JSON key and reference its path in `.env` as `GOOGLE_APPLICATION_CREDENTIALS`

#### Tavily (for web search)

1. Go to https://tavily.com/
2. Sign up for an account
3. Get your API key
4. Add to `.env` file

### Tool Configuration

Edit `memories/tools.json` to customize available tools. JSON does not allow comments, so keep the file free of them; each tool is described in the list below.

```json
{
  "tools": [
    "bigquery_execute_query",
    "read_url_content",
    "google_sheets_read_range",
    "tavily_web_search"
  ],
  "interrupt_config": {
    "bigquery_execute_query": false,
    "read_url_content": false,
    "google_sheets_read_range": false,
    "tavily_web_search": false
  }
}
```

**Available Tools:**

- `bigquery_execute_query`: Execute SQL queries on BigQuery
- `read_url_content`: Fetch content from URLs/APIs
- `google_sheets_read_range`: Read data from Google Sheets
- `tavily_web_search`: Perform web searches
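
After editing the file, a quick consistency check can catch typos. The sketch below assumes, based only on the example above, that every entry in `tools` should have a matching boolean in `interrupt_config`; the agent platform may not enforce this, and `check_tools_config` is a hypothetical helper.

```python
import json


def check_tools_config(config):
    """Return a list of problems found in a parsed tools.json dictionary."""
    problems = []
    tools = config.get("tools", [])
    interrupts = config.get("interrupt_config", {})
    for name in tools:
        if name not in interrupts:
            problems.append(f"{name}: no interrupt_config entry")
    for name, value in interrupts.items():
        if name not in tools:
            problems.append(f"{name}: in interrupt_config but not in tools")
        if not isinstance(value, bool):
            problems.append(f"{name}: interrupt_config value is not true/false")
    return problems


def load_tools_config(path="memories/tools.json"):
    """Parse tools.json; json.JSONDecodeError is raised on invalid JSON."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `check_tools_config(load_tools_config())` returns an empty list when the file matches the layout shown above.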

### Subagent Configuration

Customize subagents by editing their configuration files:

**Metadata Parser** (`memories/subagents/metadata_parser/`)

- `agent.md`: Instructions for parsing metadata
- `tools.json`: Tools available to parser

**Graph Visualizer** (`memories/subagents/graph_visualizer/`)

- `agent.md`: Instructions for creating visualizations
- `tools.json`: Tools available to visualizer
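
To confirm that the layout above is in place, the four files can be checked with a short script. This is a sketch using the directory and file names listed above; `missing_subagent_files` is a hypothetical helper.

```python
from pathlib import Path

# Subagent directories and files as listed in this section.
SUBAGENTS = ("metadata_parser", "graph_visualizer")
FILES = ("agent.md", "tools.json")


def missing_subagent_files(base="memories"):
    """Return the expected subagent files that do not exist under base."""
    root = Path(base) / "subagents"
    return [
        str(root / name / filename)
        for name in SUBAGENTS
        for filename in FILES
        if not (root / name / filename).is_file()
    ]
```

Run it from the project root; an empty list means every subagent file was found.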

## Usage Scenarios

### Scenario 1: BigQuery Lineage Extraction

```python
import os

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()  # read ANTHROPIC_API_KEY from .env
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Load agent configuration
with open("memories/agent.md", "r") as f:
    system_prompt = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from BigQuery project: my-project, dataset: analytics"
    }]
)
print(response.content[0].text)
```

### Scenario 2: File-Based Metadata

```python
# Reuses `client` and `system_prompt` from Scenario 1

# Read metadata from file
with open("dbt_manifest.json", "r") as f:
    metadata = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": f"Extract lineage from this dbt manifest:\n\n{metadata}"
    }]
)
print(response.content[0].text)
```
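
A full dbt manifest can be large, and most of it (compiled SQL, column docs) is not needed to trace dependencies. One option is to trim it before building the prompt. The sketch below assumes the usual manifest layout, where the top-level `nodes` mapping holds entries with `resource_type` and `depends_on.nodes`; `lineage_subset` is a hypothetical helper, and whether the agent works well on the trimmed form is untested here.

```python
import json


def lineage_subset(manifest):
    """Keep only resource type and upstream node IDs for each manifest node."""
    subset = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        depends_on = node.get("depends_on", {})
        subset[unique_id] = {
            "resource_type": node.get("resource_type"),
            "depends_on": depends_on.get("nodes", []),
        }
    return subset


def trimmed_manifest_text(path="dbt_manifest.json"):
    """Load a manifest file and return the trimmed subset as a JSON string."""
    with open(path, "r", encoding="utf-8") as f:
        return json.dumps(lineage_subset(json.load(f)), indent=2)
```

The string returned by `trimmed_manifest_text()` can be used in place of the raw file contents in the message.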

### Scenario 3: API Metadata

```python
# Reuses `client` and `system_prompt` from Scenario 1
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from API: https://api.example.com/metadata"
    }]
)
print(response.content[0].text)
```

## Advanced Configuration

### Custom Visualization Formats

To add custom visualization formats, edit `memories/subagents/graph_visualizer/agent.md`:

```markdown
### 4. Custom Format

Generate a custom format with:
- Your specific requirements
- Custom styling rules
- Special formatting needs
```

### Adding New Metadata Sources

To support new metadata sources:

1. Add tool to `memories/tools.json`
2. Update `memories/agent.md` with source-specific instructions
3. Update `memories/subagents/metadata_parser/agent.md` if needed
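
Step 1 can also be done programmatically. This is a sketch that registers a tool in both sections of a parsed `tools.json`, following the layout shown under Tool Configuration; `add_tool` is a hypothetical helper and the tool name in the usage note is a placeholder, not a tool the agent is known to provide.

```python
import json


def add_tool(config, name, interrupt=False):
    """Return a copy of a parsed tools.json dictionary with `name` registered."""
    tools = list(config.get("tools", []))
    if name not in tools:
        tools.append(name)
    interrupts = dict(config.get("interrupt_config", {}))
    # Keep an existing interrupt setting if the tool was already configured.
    interrupts.setdefault(name, interrupt)
    return {**config, "tools": tools, "interrupt_config": interrupts}


def save_tools_config(config, path="memories/tools.json"):
    """Write the configuration back as formatted JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
```

For example, `add_tool(config, "my_source_read_metadata")` appends the placeholder tool and sets its interrupt flag to `false`, leaving the original dictionary unchanged.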

### MCP Integration

To integrate with Model Context Protocol servers:

1. Check whether MCP tools are available in the `/tools` directory
2. Add MCP tools to `memories/tools.json`
3. Configure the MCP server connection
4. See `memories/mcp_integration.md` (if available)

## Troubleshooting

### Common Issues

#### 1. Authentication Errors

**Problem**: API authentication fails

**Solutions**:

- Verify the API key in `.env` is correct
- Check that the key hasn't expired
- Ensure environment variables are loaded
- Try regenerating the API key

```bash
# Check that the key loads from .env. This does not contact the API;
# run test_setup.py (see "Testing Your Setup") for a live request.
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('✓ API key loaded' if os.getenv('ANTHROPIC_API_KEY') else '✗ ANTHROPIC_API_KEY is not set')"
```

#### 2. BigQuery Access Issues

**Problem**: Cannot access BigQuery

**Solutions**:

- Verify service account has BigQuery permissions
- Check project ID is correct
- Ensure JSON key file path is correct
- Test credentials:

```bash
# Test BigQuery access
gcloud auth activate-service-account --key-file=/path/to/key.json
bq ls --project_id=your-project-id
```

#### 3. Import Errors

**Problem**: `ModuleNotFoundError`

**Solutions**:

```bash
# Install missing packages
pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv

# Or install all at once
pip install -r requirements.txt  # if you create one
```

#### 4. Environment Variables Not Loading

**Problem**: `.env` file not being read

**Solutions**:

```python
# Explicitly load .env
from dotenv import load_dotenv
load_dotenv()

# Or specify path
load_dotenv(".env")

# Verify loading without printing the secret itself
import os
print(os.getenv("ANTHROPIC_API_KEY") is not None)  # Should print True
```

#### 5. File Path Issues

**Problem**: Cannot find `memories/agent.md`

**Solutions**:

```python
# Use absolute path (relative to the script that runs this code)
import os
base_dir = os.path.dirname(os.path.abspath(__file__))
agent_path = os.path.join(base_dir, "memories", "agent.md")

# Or change working directory
os.chdir("/path/to/local_clone")
```

### Performance Issues

#### Slow Response Times

**Causes**:

- Large metadata files
- Complex lineage graphs
- Network latency

**Solutions**:

- Break large metadata into chunks
- Use filtering to focus on specific entities
- Increase API timeout settings
- Cache frequently used results
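
Two of the solutions above, chunking and caching, can be sketched with the standard library alone. The example assumes the metadata has already been parsed into a list of entities, and `send` stands for whatever function you use to call the API; `chunk_entities`, `cache_key`, and `cached_call` are hypothetical helpers, not part of the agent.

```python
import hashlib
import json


def chunk_entities(entities, size):
    """Split a list of metadata entities into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [entities[i:i + size] for i in range(0, len(entities), size)]


def cache_key(system_prompt, user_message):
    """Return a stable SHA-256 key for one (system prompt, message) pair."""
    payload = json.dumps([system_prompt, user_message], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache = {}  # in-memory only; results are lost when the process exits


def cached_call(system_prompt, user_message, send):
    """Call send(system_prompt, user_message) once per distinct request."""
    key = cache_key(system_prompt, user_message)
    if key not in _cache:
        _cache[key] = send(system_prompt, user_message)
    return _cache[key]
```

Each chunk can then be serialized and sent as its own request through `cached_call`, so repeated runs over unchanged metadata skip the API call. Note that lineage edges crossing chunk boundaries would still need to be merged afterwards.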

### Debugging Tips

1. **Enable verbose logging**

   ```python
   import logging
   logging.basicConfig(level=logging.DEBUG)
   ```

2. **Test each component separately**
   - Test API connection first
   - Test metadata retrieval
   - Test parsing separately
   - Test visualization separately

3. **Validate metadata format**
   - Ensure JSON is valid
   - Check for required fields
   - Verify structure matches expected format

4. **Check agent configuration**
   - Verify `memories/agent.md` is readable
   - Check `tools.json` syntax
   - Ensure subagent files exist
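
The metadata checks in tip 3 can be sketched as follows. This guide does not specify which fields are required, so the default `required_fields` value is only a placeholder to replace with the fields your metadata source uses; `validate_metadata` is a hypothetical helper.

```python
import json


def validate_metadata(text, required_fields=("nodes",)):
    """Return (ok, message) for a metadata string expected to be a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not a JSON object"
    missing = [field for field in required_fields if field not in data]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```

Running this before sending metadata to the agent separates malformed input from problems in the agent itself.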

## Getting Help

### Documentation

- Agent instructions: `memories/agent.md`
- Subagent docs: `memories/subagents/*/agent.md`
- Anthropic API: https://docs.anthropic.com/

### Testing Your Setup

Run this complete test:

```python
import os
import sys

from anthropic import Anthropic
from dotenv import load_dotenv

# Load environment
load_dotenv()

# Test 1: API key and client creation (no request is sent yet)
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    print("✗ ANTHROPIC_API_KEY is not set")
    sys.exit(1)
client = Anthropic(api_key=api_key)
print("✓ Anthropic client created")

# Test 2: Load Agent Config
try:
    with open("memories/agent.md", "r") as f:
        system_prompt = f.read()
    print("✓ Agent configuration loaded")
except Exception as e:
    print(f"✗ Failed to load agent config: {e}")
    sys.exit(1)

# Test 3: Simple Query (the first real API call, so this also verifies the key)
try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": "Hello, what can you help me with?"
        }]
    )
    print("✓ Agent response successful")
    print(f"\nAgent says: {response.content[0].text}")
except Exception as e:
    print(f"✗ Agent query failed: {e}")
    sys.exit(1)

print("\n✓ All tests passed! Your setup is ready.")
```

Save as `test_setup.py` and run:

```bash
python test_setup.py
```

## Next Steps

1. Complete setup
2. Test with sample metadata
3. Extract your first lineage
4. Customize visualization preferences
5. Integrate with your workflow

---

**Setup complete?** Try the usage examples in README.md or run your own lineage extraction!