# Hugging Face Token Setup - Working Models
## ✅ Current Configuration
### Model Selected: `facebook/blenderbot-400M-distill`
**Why this model:**
- ✅ Publicly available (no gating required)
- ✅ Works with the HF Inference API
- ✅ Suited to conversational text generation
- ✅ No special permissions needed
- ✅ Fast response times
- ✅ Stable and reliable
**Fallback:** `gpt2` (reliably available on the HF API)
## Setting Up Your HF Token
### Step 1: Get Your Token
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it: "Research Assistant"
4. Set role: **Read** (this is sufficient for inference)
5. Generate token
6. **Copy it immediately** (won't show again)
### Step 2: Add to Hugging Face Space
**In your HF Space settings:**
1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
2. Click "Settings" (gear icon)
3. Find the "Repository secrets" (or "Space secrets") section
4. Add a new secret:
- **Name:** `HF_TOKEN`
- **Value:** (paste your token)
5. Save
### Step 3: Verify Token Works
The code will automatically:
- ✅ Load the token from the environment: `os.getenv('HF_TOKEN')`
- ✅ Use it in API calls
- ✅ Log success/failure
**Check logs for:**
```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
```
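The token-loading step above can be sketched as follows. This is a minimal illustration, not the project's actual `llm_router` code: the `call_hf_api` and `build_headers` helper names are hypothetical, and it assumes the `requests` library is installed in the Space.

```python
import os
import requests  # assumed available in the Space environment

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"

def build_headers(token):
    # Fail fast if the Space secret was never injected
    if not token:
        raise RuntimeError("HF_TOKEN is not set in the environment")
    return {"Authorization": f"Bearer {token}"}

def call_hf_api(prompt):
    # Load the token from the environment, as described above
    headers = build_headers(os.getenv("HF_TOKEN"))
    resp = requests.post(API_URL, headers=headers, json={"inputs": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

Failing fast on a missing token turns a confusing 401 from the API into a clear local error.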
## Alternative Models (Tested & Working)
If you want to try different models:
### Option 1: GPT-2 (Very Reliable)
```python
"model_id": "gpt2"
```
- ⚡ Fast
- ✅ Always available
- ⚠️ Simple responses
### Option 2: Flan-T5 Large (Better Quality)
```python
"model_id": "google/flan-t5-large"
```
- 📈 Better quality
- ⚡ Fast
- ✅ Public access
### Option 3: Blenderbot (Conversational)
```python
"model_id": "facebook/blenderbot-400M-distill"
```
- 💬 Good for conversation
- ✅ Current selection
- ⚡ Fast
### Option 4: DistilGPT-2 (Faster)
```python
"model_id": "distilgpt2"
```
- ⚡ Very fast
- ✅ Reliably available
- ⚠️ Smaller, less capable
## How the System Works Now
### API Call Flow:
1. **User question** → Synthesis Agent
2. **Synthesis Agent** → Tries LLM call
3. **LLM Router** → Calls HF Inference API with token
4. **HF API** → Returns generated text
5. **System** → Uses real LLM response ✅
### No More Template Fallbacks
- ❌ No knowledge base fallback
- ❌ No template responses
- ✅ Always uses the real LLM when available
- ✅ Falls back to GPT-2 if the model is still loading (503 error)
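The 503 fallback decision above can be captured as a small pure function. This is a hedged sketch — the `next_model` helper name is illustrative, not taken from the project's code:

```python
PRIMARY = "facebook/blenderbot-400M-distill"
FALLBACK = "gpt2"

def next_model(status_code, model_id):
    """Return the model to retry with, or None if no retry is needed.

    A 503 means the model is still loading on the HF side, so the
    router retries once with the always-available gpt2 fallback.
    """
    if status_code == 503 and model_id != FALLBACK:
        return FALLBACK
    return None
```

Guarding on `model_id != FALLBACK` prevents an infinite retry loop if gpt2 itself ever returns a 503.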
## Verification
### Test Your Setup:
Ask: "What is 2+2?"
**Expected:** Real LLM generated response (not template)
**Check logs for:**
```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response
```
### If You See 401 Error:
```
HF API error: 401 - Unauthorized
```
**Fix:** Token not set correctly in HF Space settings
### If You See 404 Error:
```
HF API error: 404 - Not Found
```
**Fix:** Model ID not valid (very unlikely with current models)
### If You See 503 Error:
```
Model loading (503), trying fallback
```
**Fix:** This is a first-time model load; the system automatically retries with GPT-2
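The three error cases above can be summarized in one lookup. A small sketch — the `diagnose` helper is illustrative, not part of the project:

```python
def diagnose(status_code):
    # Map HF Inference API status codes to the fixes described above
    hints = {
        401: "Unauthorized: HF_TOKEN missing or invalid in Space secrets",
        404: "Not Found: model ID is invalid",
        503: "Model loading (cold start): retry, falling back to gpt2",
    }
    return hints.get(status_code, f"Unexpected status {status_code}")
```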
## Current Models in Config
**File:** `models_config.py`
```python
"reasoning_primary": {
"model_id": "facebook/blenderbot-400M-distill",
"max_tokens": 500,
"temperature": 0.7
}
```
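A consumer of this config might look it up as follows. The dict shape mirrors the fragment above, but `MODELS_CONFIG` as a top-level dict and the `get_model_settings` helper are assumptions, not the real contents of `models_config.py`:

```python
# Hypothetical mirror of the models_config.py fragment above
MODELS_CONFIG = {
    "reasoning_primary": {
        "model_id": "facebook/blenderbot-400M-distill",
        "max_tokens": 500,
        "temperature": 0.7,
    },
}

def get_model_settings(role):
    # Unknown roles fall back to the primary reasoning model
    return MODELS_CONFIG.get(role, MODELS_CONFIG["reasoning_primary"])
```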
## Performance Notes
**Latency:**
- Blenderbot: ~2-4 seconds
- GPT-2: ~1-2 seconds
- Flan-T5: ~3-5 seconds
**Quality:**
- Blenderbot: Good for conversational responses
- GPT-2: Basic but coherent
- Flan-T5: More factual, less conversational
## Troubleshooting
### Token Not Working?
1. Verify in HF Dashboard → Settings → Access Tokens
2. Check it has "Read" permissions
3. Regenerate if needed
4. Update in Space settings
### Model Not Loading?
- First request may take 10-30 seconds (cold start)
- Subsequent requests are faster
- 503 errors auto-retry with fallback
### Still Seeing Placeholders?
1. Restart your Space
2. Check logs for HF API calls
3. Verify token is in environment
## Next Steps
1. ✅ Add token to HF Space settings
2. ✅ Restart Space
3. ✅ Test with a question
4. ✅ Check logs for "HF API returned response"
5. ✅ Enjoy real LLM responses!
## Summary
**Model:** `facebook/blenderbot-400M-distill`
**Fallback:** `gpt2`
**Status:** ✅ Configured and ready
**Requirement:** Valid HF token in Space settings
**Template fallbacks removed:** The system always tries the real LLM first