# Hugging Face Token Setup - Working Models

## ✅ Current Configuration

### Model Selected: `facebook/blenderbot-400M-distill`

**Why this model:**

- ✅ Publicly available (no gating required)
- ✅ Works with the HF Inference API
- ✅ Conversational text-generation task
- ✅ No special permissions needed
- ✅ Fast response times
- ✅ Stable and reliable

**Fallback:** `gpt2` (guaranteed to work on the HF API)
## Setting Up Your HF Token

### Step 1: Get Your Token

1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it: "Research Assistant"
4. Set role: **Read** (this is sufficient for inference)
5. Generate the token
6. **Copy it immediately** (it won't be shown again)
### Step 2: Add to Your Hugging Face Space

**In your HF Space settings:**

1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
2. Click "Settings" (gear icon)
3. Find "Repository secrets" (sometimes labeled "Space secrets")
4. Add a new secret:
   - **Name:** `HF_TOKEN`
   - **Value:** (paste your token)
5. Save
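Once the secret is saved, the Space exposes it to your app as an environment variable. A minimal sketch of reading it defensively at startup (the helper name `get_hf_token` is illustrative, not from the repo):

```python
import os

def get_hf_token() -> str:
    """Read the token that Space secrets expose as the HF_TOKEN env var."""
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it under Space settings -> secrets"
        )
    return token
```

Failing fast here gives a clear error at startup instead of a confusing 401 later.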
### Step 3: Verify the Token Works

The code will automatically:

- ✅ Load the token from the environment: `os.getenv('HF_TOKEN')`
- ✅ Use it in API calls
- ✅ Log success or failure

**Check the logs for:**

```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
```
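The verification step can also be exercised by hand. This sketch assumes the classic serverless Inference API endpoint (`https://api-inference.huggingface.co/models/...`) and its `{"inputs": ...}` payload shape; `build_request` and `query_hf` are illustrative names, not functions from the repo:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble one authenticated POST to the Inference API."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )

def query_hf(prompt: str) -> str:
    """Send the prompt and return the generated text from the first result."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())[0].get("generated_text", "")
```

Separating request construction from sending makes the auth header easy to inspect without hitting the network.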
## Alternative Models (Tested & Working)

If you want to try different models:

### Option 1: GPT-2 (Very Reliable)

```python
"model_id": "gpt2"
```

- ⚡ Fast
- ✅ Always available
- ⚠️ Simple responses

### Option 2: Flan-T5 Large (Better Quality)

```python
"model_id": "google/flan-t5-large"
```

- 📈 Better quality
- ⚡ Fast
- ✅ Public access

### Option 3: Blenderbot (Conversational)

```python
"model_id": "facebook/blenderbot-400M-distill"
```

- 💬 Good for conversation
- ✅ Current selection
- ⚡ Fast

### Option 4: DistilGPT-2 (Faster)

```python
"model_id": "distilgpt2"
```

- ⚡ Very fast
- ✅ Guaranteed available
- ⚠️ Smaller, less capable
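The four options can be expressed as an ordered preference list. This is only a sketch (the `pick_model` helper is hypothetical, not part of the repo):

```python
# Candidates in preference order, per the options above.
CANDIDATES = [
    "facebook/blenderbot-400M-distill",  # conversational, current selection
    "google/flan-t5-large",              # more factual
    "gpt2",                              # always-available fallback
    "distilgpt2",                        # fastest, least capable
]

def pick_model(available: set) -> str:
    """Return the first candidate that is currently available."""
    for model_id in CANDIDATES:
        if model_id in available:
            return model_id
    raise LookupError("no configured model is available")
```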
## How the System Works Now

### API Call Flow

1. **User question** → Synthesis Agent
2. **Synthesis Agent** → tries an LLM call
3. **LLM Router** → calls the HF Inference API with the token
4. **HF API** → returns generated text
5. **System** → uses the real LLM response ✅

### No More Template Fallbacks

- ❌ No knowledge base fallback
- ❌ No template responses
- ✅ Always uses a real LLM when available
- ✅ Falls back to GPT-2 only while the primary model is loading (503 error)
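The flow above, including the 503 fallback, can be sketched as follows. `call_api` stands in for the real HTTP call and is injected so the routing logic is easy to test:

```python
def route(prompt, call_api,
          primary="facebook/blenderbot-400M-distill", fallback="gpt2"):
    """Try the primary model; on a 503 (model loading) retry with the fallback."""
    status, text = call_api(primary, prompt)
    if status == 503:
        status, text = call_api(fallback, prompt)
    if status != 200:
        raise RuntimeError(f"HF API error: {status}")
    return text
```

Any 401/404 still surfaces as an error; only the "model loading" case is retried.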
## Verification

### Test Your Setup

Ask: "What is 2+2?"

**Expected:** a real LLM-generated response (not a template)

**Check the logs for:**

```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response
```

### If You See a 401 Error

```
HF API error: 401 - Unauthorized
```

**Fix:** the token is not set correctly in the HF Space settings.

### If You See a 404 Error

```
HF API error: 404 - Not Found
```

**Fix:** the model ID is not valid (unlikely with the models listed above).

### If You See a 503 Error

```
Model loading (503), trying fallback
```

**Fix:** none needed; this is a first-time model load, and the router automatically retries with GPT-2.
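The three error cases above can be collapsed into a small lookup used when logging; the messages mirror the fixes listed in this section (`explain` is an illustrative helper, not from the repo):

```python
# Status code -> documented remedy, per the Verification section.
FIXES = {
    401: "Unauthorized: token not set correctly in HF Space settings",
    404: "Not Found: model ID is not valid",
    503: "Model loading: retrying with the gpt2 fallback",
}

def explain(status: int) -> str:
    """Translate an HF API status code into the documented fix."""
    return FIXES.get(status, f"Unhandled HF API error: {status}")
```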
## Current Models in Config

**File:** `models_config.py`

```python
"reasoning_primary": {
    "model_id": "facebook/blenderbot-400M-distill",
    "max_tokens": 500,
    "temperature": 0.7
}
```
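A sketch of how a router might consume that entry (the surrounding `MODELS` dict and the `model_params` helper are assumptions; only `models_config.py` is named in the doc):

```python
# Assumed shape of models_config.py around the fragment shown above.
MODELS = {
    "reasoning_primary": {
        "model_id": "facebook/blenderbot-400M-distill",
        "max_tokens": 500,
        "temperature": 0.7,
    }
}

def model_params(role: str):
    """Unpack the config entry for one role into (model_id, max_tokens, temperature)."""
    cfg = MODELS[role]
    return cfg["model_id"], cfg["max_tokens"], cfg["temperature"]
```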
## Performance Notes

**Latency (approximate):**

- Blenderbot: ~2-4 seconds
- GPT-2: ~1-2 seconds
- Flan-T5: ~3-5 seconds

**Quality:**

- Blenderbot: good for conversational responses
- GPT-2: basic but coherent
- Flan-T5: more factual, less conversational
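To check these numbers in your own Space, a tiny wall-clock wrapper is enough (sketch; `timed` is not part of the repo):

```python
import time

def timed(fn, *args):
    """Return (result, seconds) for one call - handy for comparing model latency."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Wrap your query function with it and compare a few prompts per model; remember the first call includes cold-start time.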
## Troubleshooting

### Token Not Working?

1. Verify it in the HF Dashboard → Settings → Access Tokens
2. Check that it has "Read" permissions
3. Regenerate it if needed
4. Update it in the Space settings

### Model Not Loading?

- The first request may take 10-30 seconds (cold start)
- Subsequent requests are faster
- 503 errors auto-retry with the fallback model

### Still Seeing Placeholders?

1. Restart your Space
2. Check the logs for HF API calls
3. Verify the token is in the environment
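A quick diagnostic for the token checks above. The `hf_` prefix check reflects the usual format of HF user access tokens; `diagnose_token` itself is an illustrative helper:

```python
import os

def diagnose_token() -> str:
    """One-line health check for the HF_TOKEN secret."""
    token = os.getenv("HF_TOKEN", "")
    if not token:
        return "missing: add HF_TOKEN in Space settings, then restart the Space"
    if not token.startswith("hf_"):
        return "suspicious: HF user access tokens normally start with 'hf_'"
    return f"looks OK ({len(token)} chars)"
```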
## Next Steps

1. ✅ Add the token to your HF Space settings
2. ✅ Restart the Space
3. ✅ Test with a question
4. ✅ Check the logs for "HF API returned response"
5. ✅ Enjoy real LLM responses!

## Summary

**Model:** `facebook/blenderbot-400M-distill`
**Fallback:** `gpt2`
**Status:** ✅ Configured and ready
**Requirement:** a valid HF token in the Space settings
**No template fallbacks:** the system always tries a real LLM first