# Local Setup Guide - Lineage Graph Extractor

This guide provides detailed instructions for setting up and running the Lineage Graph Extractor agent locally.

## Table of Contents

1. [System Requirements](#system-requirements)
2. [Installation Methods](#installation-methods)
3. [Configuration](#configuration)
4. [Usage Scenarios](#usage-scenarios)
5. [Advanced Configuration](#advanced-configuration)
6. [Troubleshooting](#troubleshooting)

## System Requirements

### Minimum Requirements

- **OS**: Windows 10+, macOS 10.15+, or Linux
- **Python**: 3.9 or higher
- **Memory**: 2GB RAM minimum
- **Disk Space**: 100MB for agent files

### Recommended Requirements

- **Python**: 3.10+
- **Memory**: 4GB RAM
- **Internet**: Stable connection for API calls

## Installation Methods

### Method 1: Standalone Use (Recommended)

This method uses the agent configuration files with any platform that supports the Anthropic API.

1. **Download the agent**

   ```bash
   # If you have a git repository
   git clone <repository-url>
   cd local_clone

   # Or extract from a downloaded archive
   unzip lineage-graph-extractor.zip
   cd lineage-graph-extractor
   ```

2. **Set up the environment**

   ```bash
   # Copy the environment template
   cp .env.example .env
   ```

3. **Edit the .env file**

   ```bash
   # Edit with your preferred editor
   nano .env
   # or
   vim .env
   # or
   code .env  # VS Code
   ```

   Add your credentials:

   ```
   ANTHROPIC_API_KEY=sk-ant-your-key-here
   GOOGLE_CLOUD_PROJECT=your-gcp-project
   GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
   ```

4. **Install Python dependencies** (optional, for the examples)

   ```bash
   pip install anthropic google-cloud-bigquery requests pyyaml
   ```

### Method 2: Claude Desktop Integration

If you're using Claude Desktop or a similar platform:

1. **Locate your agent configuration directory**
   - Claude Desktop: `~/.config/claude/agents/` (Linux/Mac) or `%APPDATA%\claude\agents\` (Windows)
   - Other platforms: Check the platform documentation

2. **Copy the memories folder**

   ```bash
   # Linux/Mac
   cp -r memories ~/.config/claude/agents/lineage-extractor/

   # Windows
   xcopy /E /I memories %APPDATA%\claude\agents\lineage-extractor\
   ```

3. **Configure API credentials** in your platform's settings

4. **Restart the application**

### Method 3: Python Integration

To integrate the agent into your own Python application:

1. **Install dependencies**

   ```bash
   pip install anthropic python-dotenv
   ```

2. **Use the integration example**

   ```python
   from anthropic import Anthropic
   from dotenv import load_dotenv
   import os

   # Load environment variables
   load_dotenv()

   # Initialize the client
   client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

   # Load the agent configuration
   with open("memories/agent.md", "r") as f:
       system_prompt = f.read()

   # Use the agent
   response = client.messages.create(
       model="claude-3-5-sonnet-20241022",
       max_tokens=4000,
       system=system_prompt,
       messages=[{
           "role": "user",
           "content": "Extract lineage from this metadata: ..."
       }]
   )

   print(response.content[0].text)
   ```

## Configuration

### API Keys Setup

#### Anthropic API Key

1. Go to https://console.anthropic.com/
2. Create an account or sign in
3. Navigate to API Keys
4. Create a new key
5. Copy it into the `.env` file

#### Google Cloud (for BigQuery)

1. Go to https://console.cloud.google.com/
2. Create a project or select an existing one
3. Enable the BigQuery API
4. Create a service account:
   - Go to IAM & Admin → Service Accounts
   - Create a service account
   - Grant the "BigQuery Data Viewer" role
   - Create a JSON key
5. Download the JSON key and reference its path in `.env`

#### Tavily (for web search)

1. Go to https://tavily.com/
2. Sign up for an account
3. Get your API key
4. Add it to the `.env` file
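With all keys in place, you can sanity-check the `.env` file before going further. A minimal sketch, assuming the variable names from the template above; `TAVILY_API_KEY` is an assumed name, since the template does not show a Tavily entry:

```python
from dotenv import load_dotenv
import os

load_dotenv()

# Names match the .env template above; TAVILY_API_KEY is an assumed
# convention, since the template does not show a Tavily entry.
required = [
    "ANTHROPIC_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "TAVILY_API_KEY",
]

missing = [name for name in required if not os.getenv(name)]
if missing:
    print("✗ Missing from .env: " + ", ".join(missing))
else:
    print("✓ All expected credentials found")
```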
### Tool Configuration

Edit `memories/tools.json` to customize the available tools (the "Available Tools" list below describes each one; note that JSON does not allow inline comments):

```json
{
  "tools": [
    "bigquery_execute_query",
    "read_url_content",
    "google_sheets_read_range",
    "tavily_web_search"
  ],
  "interrupt_config": {
    "bigquery_execute_query": false,
    "read_url_content": false,
    "google_sheets_read_range": false,
    "tavily_web_search": false
  }
}
```

**Available Tools:**

- `bigquery_execute_query`: Execute SQL queries on BigQuery
- `read_url_content`: Fetch content from URLs/APIs
- `google_sheets_read_range`: Read data from Google Sheets
- `tavily_web_search`: Perform web searches

### Subagent Configuration

Customize subagents by editing their configuration files:

**Metadata Parser** (`memories/subagents/metadata_parser/`)
- `agent.md`: Instructions for parsing metadata
- `tools.json`: Tools available to the parser

**Graph Visualizer** (`memories/subagents/graph_visualizer/`)
- `agent.md`: Instructions for creating visualizations
- `tools.json`: Tools available to the visualizer

## Usage Scenarios

### Scenario 1: BigQuery Lineage Extraction

```python
from anthropic import Anthropic
import os

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with open("memories/agent.md", "r") as f:
    system_prompt = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from BigQuery project: my-project, dataset: analytics"
    }]
)

print(response.content[0].text)
```

### Scenario 2: File-Based Metadata

```python
# Read metadata from a file (client and system_prompt as in Scenario 1)
with open("dbt_manifest.json", "r") as f:
    metadata = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": f"Extract lineage from this dbt manifest:\n\n{metadata}"
    }]
)
```

### Scenario 3: API Metadata

```python
# client and system_prompt as in Scenario 1
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from API: https://api.example.com/metadata"
    }]
)
```

## Advanced Configuration

### Custom Visualization Formats

To add custom visualization formats, edit `memories/subagents/graph_visualizer/agent.md`:

```markdown
### 4. Custom Format
Generate a custom format with:
- Your specific requirements
- Custom styling rules
- Special formatting needs
```

### Adding New Metadata Sources

To support new metadata sources:

1. Add the tool to `memories/tools.json`
2. Update `memories/agent.md` with source-specific instructions
3. Update `memories/subagents/metadata_parser/agent.md` if needed

### MCP Integration

To integrate with Model Context Protocol servers:

1. Check whether MCP tools are available in the `/tools` directory
2. Add the MCP tools to `memories/tools.json`
3. Configure the MCP server connection
4. See `memories/mcp_integration.md` (if available)

## Troubleshooting

### Common Issues

#### 1. Authentication Errors

**Problem**: API authentication fails

**Solutions**:
- Verify the API key in `.env` is correct
- Check that the key hasn't expired
- Ensure environment variables are loaded
- Try regenerating the API key

```bash
# Test Anthropic API key
python -c "from anthropic import Anthropic; import os; from dotenv import load_dotenv; load_dotenv(); client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY')); print('✓ API key works')"
```
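Note that constructing the client only confirms a key string is present; it does not contact the API. To check that the key is actually accepted, make a small request. A minimal sketch, reusing the model name from the examples in this guide:

```python
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

try:
    # A tiny request forces server-side validation of the key
    client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=8,
        messages=[{"role": "user", "content": "ping"}],
    )
    print("✓ API key accepted")
except Exception as e:
    print(f"✗ API call failed: {e}")
```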
#### 2. BigQuery Access Issues

**Problem**: Cannot access BigQuery

**Solutions**:
- Verify the service account has BigQuery permissions
- Check that the project ID is correct
- Ensure the JSON key file path is correct
- Test the credentials:

```bash
# Test BigQuery access
gcloud auth activate-service-account --key-file=/path/to/key.json
bq ls --project_id=your-project-id
```

#### 3. Import Errors

**Problem**: `ModuleNotFoundError`

**Solutions**:

```bash
# Install missing packages
pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv

# Or install all at once
pip install -r requirements.txt  # if you create one
```

#### 4. Environment Variables Not Loading

**Problem**: The `.env` file is not being read

**Solutions**:

```python
# Explicitly load .env
from dotenv import load_dotenv
load_dotenv()

# Or specify the path
load_dotenv(".env")

# Verify loading
import os
print(os.getenv("ANTHROPIC_API_KEY"))  # Should not be None
```

#### 5. File Path Issues

**Problem**: Cannot find `memories/agent.md`

**Solutions**:

```python
# Use an absolute path
import os

base_dir = os.path.dirname(os.path.abspath(__file__))
agent_path = os.path.join(base_dir, "memories", "agent.md")

# Or change the working directory
os.chdir("/path/to/local_clone")
```

### Performance Issues

#### Slow Response Times

**Causes**:
- Large metadata files
- Complex lineage graphs
- Network latency

**Solutions**:
- Break large metadata into chunks (see the sketch at the end of this guide)
- Use filtering to focus on specific entities
- Increase API timeout settings
- Cache frequently used results

### Debugging Tips

1. **Enable verbose logging**

   ```python
   import logging
   logging.basicConfig(level=logging.DEBUG)
   ```

2. **Test each component separately**
   - Test the API connection first
   - Test metadata retrieval
   - Test parsing separately
   - Test visualization separately

3. **Validate the metadata format**
   - Ensure the JSON is valid
   - Check for required fields
   - Verify the structure matches the expected format

4. **Check the agent configuration**
   - Verify `memories/agent.md` is readable
   - Check the `tools.json` syntax
   - Ensure the subagent files exist

## Getting Help

### Documentation

- Agent instructions: `memories/agent.md`
- Subagent docs: `memories/subagents/*/agent.md`
- Anthropic API: https://docs.anthropic.com/

### Testing Your Setup

Run this complete test:

```python
from anthropic import Anthropic
from dotenv import load_dotenv
import os
import sys

# Load environment
load_dotenv()

# Test 1: API Connection
try:
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    print("✓ Anthropic API connection successful")
except Exception as e:
    print(f"✗ API connection failed: {e}")
    sys.exit(1)

# Test 2: Load Agent Config
try:
    with open("memories/agent.md", "r") as f:
        system_prompt = f.read()
    print("✓ Agent configuration loaded")
except Exception as e:
    print(f"✗ Failed to load agent config: {e}")
    sys.exit(1)

# Test 3: Simple Query
try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": "Hello, what can you help me with?"
        }]
    )
    print("✓ Agent response successful")
    print(f"\nAgent says: {response.content[0].text}")
except Exception as e:
    print(f"✗ Agent query failed: {e}")
    sys.exit(1)

print("\n✓ All tests passed! Your setup is ready.")
```

Save it as `test_setup.py` and run:

```bash
python test_setup.py
```

## Next Steps

1. ✅ Complete setup
2. ✅ Test with sample metadata
3. 📊 Extract your first lineage
4. 🎨 Customize visualization preferences
5. 🔧 Integrate with your workflow

---

**Setup complete?** Try the usage examples in README.md or run your own lineage extraction!
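If your first extraction involves metadata too large for a single request (see [Slow Response Times](#performance-issues) above), chunking is one way to start. A minimal sketch, assuming a standard dbt manifest with a top-level `nodes` mapping; the chunk size is an arbitrary illustration:

```python
import json
import os

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with open("memories/agent.md", "r") as f:
    system_prompt = f.read()

# CHUNK_SIZE is an arbitrary illustration; tune it to your metadata volume
CHUNK_SIZE = 50

with open("dbt_manifest.json", "r") as f:
    manifest = json.load(f)

# Standard dbt manifests keep models under a top-level "nodes" mapping
nodes = list(manifest.get("nodes", {}).items())

for i in range(0, len(nodes), CHUNK_SIZE):
    chunk = dict(nodes[i:i + CHUNK_SIZE])
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": "Extract lineage from these dbt manifest nodes:\n\n"
                       + json.dumps(chunk),
        }],
    )
    print(response.content[0].text)
```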