File size: 4,511 Bytes
fa57725 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 |
# Hugging Face Token Setup - Working Models
## β
Current Configuration
### Model Selected: `facebook/blenderbot-400M-distill`
**Why this model:**
- β
Publicly available (no gating required)
- β
Works with HF Inference API
- β
Text generation task
- β
No special permissions needed
- β
Fast response times
- β
Stable and reliable
**Fallback:** `gpt2` (guaranteed to work on HF API)
## Setting Up Your HF Token
### Step 1: Get Your Token
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it: "Research Assistant"
4. Set role: **Read** (this is sufficient for inference)
5. Generate token
6. **Copy it immediately** (won't show again)
### Step 2: Add to Hugging Face Space
**In your HF Space settings:**
1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
2. Click "Settings" (gear icon)
3. Under "Repository secrets" or "Space secrets"
4. Add new secret:
- **Name:** `HF_TOKEN`
- **Value:** (paste your token)
5. Save
### Step 3: Verify Token Works
The code will automatically:
- β
Load token from environment: `os.getenv('HF_TOKEN')`
- β
Use it in API calls
- β
Log success/failure
**Check logs for:**
```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
```
## Alternative Models (Tested & Working)
If you want to try different models:
### Option 1: GPT-2 (Very Reliable)
```python
"model_id": "gpt2"
```
- β‘ Fast
- β
Always available
- β οΈ Simple responses
### Option 2: Flan-T5 Large (Better Quality)
```python
"model_id": "google/flan-t5-large"
```
- π Better quality
- β‘ Fast
- β
Public access
### Option 3: Blenderbot (Conversational)
```python
"model_id": "facebook/blenderbot-400M-distill"
```
- π¬ Good for conversation
- β
Current selection
- β‘ Fast
### Option 4: DistilGPT-2 (Faster)
```python
"model_id": "distilgpt2"
```
- β‘ Very fast
- β
Guaranteed available
- β οΈ Smaller, less capable
## How the System Works Now
### API Call Flow:
1. **User question** β Synthesis Agent
2. **Synthesis Agent** β Tries LLM call
3. **LLM Router** β Calls HF Inference API with token
4. **HF API** β Returns generated text
5. **System** β Uses real LLM response β
### No More Fallbacks
- β No knowledge base fallback
- β No template responses
- β
Always uses real LLM when available
- β
GPT-2 fallback if model loading (503 error)
## Verification
### Test Your Setup:
Ask: "What is 2+2?"
**Expected:** Real LLM generated response (not template)
**Check logs for:**
```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response
```
### If You See 401 Error:
```
HF API error: 401 - Unauthorized
```
**Fix:** Token not set correctly in HF Space settings
### If You See 404 Error:
```
HF API error: 404 - Not Found
```
**Fix:** Model ID not valid (very unlikely with current models)
### If You See 503 Error:
```
Model loading (503), trying fallback
```
**Fix:** First-time model load, automatically retries with GPT-2
## Current Models in Config
**File:** `models_config.py`
```python
"reasoning_primary": {
"model_id": "facebook/blenderbot-400M-distill",
"max_tokens": 500,
"temperature": 0.7
}
```
## Performance Notes
**Latency:**
- Blenderbot: ~2-4 seconds
- GPT-2: ~1-2 seconds
- Flan-T5: ~3-5 seconds
**Quality:**
- Blenderbot: Good for conversational responses
- GPT-2: Basic but coherent
- Flan-T5: More factual, less conversational
## Troubleshooting
### Token Not Working?
1. Verify in HF Dashboard β Settings β Access Tokens
2. Check it has "Read" permissions
3. Regenerate if needed
4. Update in Space settings
### Model Not Loading?
- First request may take 10-30 seconds (cold start)
- Subsequent requests are faster
- 503 errors auto-retry with fallback
### Still Seeing Placeholders?
1. Restart your Space
2. Check logs for HF API calls
3. Verify token is in environment
## Next Steps
1. β
Add token to HF Space settings
2. β
Restart Space
3. β
Test with a question
4. β
Check logs for "HF API returned response"
5. β
Enjoy real LLM responses!
## Summary
**Model:** `facebook/blenderbot-400M-distill`
**Fallback:** `gpt2`
**Status:** β
Configured and ready
**Requirement:** Valid HF token in Space settings
**No fallbacks:** System always tries real LLM first
|