# Hugging Face Token Setup - Working Models

## ✅ Current Configuration

### Model Selected: `facebook/blenderbot-400M-distill`

**Why this model:**

- ✅ Publicly available (no gating required)
- ✅ Works with the HF Inference API
- ✅ Conversational text-generation task
- ✅ No special permissions needed
- ✅ Fast response times
- ✅ Stable and reliable

**Fallback:** `gpt2` (guaranteed to work on the HF API)
## Setting Up Your HF Token

### Step 1: Get Your Token

1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Name it: "Research Assistant"
4. Set role: **Read** (this is sufficient for inference)
5. Generate the token
6. **Copy it immediately** (it won't be shown again)
### Step 2: Add to Your Hugging Face Space

**In your HF Space settings:**

1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
2. Click "Settings" (gear icon)
3. Find "Repository secrets" (sometimes labeled "Space secrets")
4. Add a new secret:
   - **Name:** `HF_TOKEN`
   - **Value:** (paste your token)
5. Save
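Once the secret is saved, the Space exposes it to your app as an environment variable. A minimal sketch of reading it defensively at startup (the helper name `get_hf_token` is illustrative, not from the repo):

```python
import os

def get_hf_token() -> str:
    """Read the token that Space secrets expose as the HF_TOKEN env var."""
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it under Space settings -> secrets"
        )
    return token
```

Failing fast here gives a clear error at startup instead of a confusing 401 later.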
### Step 3: Verify the Token Works

The code will automatically:

- ✅ Load the token from the environment: `os.getenv('HF_TOKEN')`
- ✅ Use it in API calls
- ✅ Log success or failure

**Check the logs for:**

```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
```
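The verification step can also be exercised by hand. This sketch assumes the classic serverless Inference API endpoint (`https://api-inference.huggingface.co/models/...`) and its `{"inputs": ...}` payload shape; `build_request` and `query_hf` are illustrative names, not functions from the repo:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble one authenticated POST to the Inference API."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )

def query_hf(prompt: str) -> str:
    """Send the prompt and return the generated text from the first result."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())[0].get("generated_text", "")
```

Separating request construction from sending makes the auth header easy to inspect without hitting the network.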
## Alternative Models (Tested & Working)

If you want to try different models:

### Option 1: GPT-2 (Very Reliable)

```python
"model_id": "gpt2"
```

- ⚡ Fast
- ✅ Always available
- ⚠️ Simple responses

### Option 2: Flan-T5 Large (Better Quality)

```python
"model_id": "google/flan-t5-large"
```

- 📈 Better quality
- ⚡ Fast
- ✅ Public access

### Option 3: Blenderbot (Conversational)

```python
"model_id": "facebook/blenderbot-400M-distill"
```

- 💬 Good for conversation
- ✅ Current selection
- ⚡ Fast

### Option 4: DistilGPT-2 (Faster)

```python
"model_id": "distilgpt2"
```

- ⚡ Very fast
- ✅ Guaranteed available
- ⚠️ Smaller, less capable
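The four options can be expressed as an ordered preference list. This is only a sketch (the `pick_model` helper is hypothetical, not part of the repo):

```python
# Candidates in preference order, per the options above.
CANDIDATES = [
    "facebook/blenderbot-400M-distill",  # conversational, current selection
    "google/flan-t5-large",              # more factual
    "gpt2",                              # always-available fallback
    "distilgpt2",                        # fastest, least capable
]

def pick_model(available: set) -> str:
    """Return the first candidate that is currently available."""
    for model_id in CANDIDATES:
        if model_id in available:
            return model_id
    raise LookupError("no configured model is available")
```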
## How the System Works Now

### API Call Flow

1. **User question** → Synthesis Agent
2. **Synthesis Agent** → tries an LLM call
3. **LLM Router** → calls the HF Inference API with the token
4. **HF API** → returns generated text
5. **System** → uses the real LLM response ✅

### No More Template Fallbacks

- ❌ No knowledge base fallback
- ❌ No template responses
- ✅ Always uses a real LLM when available
- ✅ Falls back to GPT-2 only while the primary model is loading (503 error)
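The flow above, including the 503 fallback, can be sketched as follows. `call_api` stands in for the real HTTP call and is injected so the routing logic is easy to test:

```python
def route(prompt, call_api,
          primary="facebook/blenderbot-400M-distill", fallback="gpt2"):
    """Try the primary model; on a 503 (model loading) retry with the fallback."""
    status, text = call_api(primary, prompt)
    if status == 503:
        status, text = call_api(fallback, prompt)
    if status != 200:
        raise RuntimeError(f"HF API error: {status}")
    return text
```

Any 401/404 still surfaces as an error; only the "model loading" case is retried.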
## Verification

### Test Your Setup

Ask: "What is 2+2?"

**Expected:** a real LLM-generated response (not a template)

**Check the logs for:**

```
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response
```

### If You See a 401 Error

```
HF API error: 401 - Unauthorized
```

**Fix:** the token is not set correctly in the HF Space settings.

### If You See a 404 Error

```
HF API error: 404 - Not Found
```

**Fix:** the model ID is not valid (unlikely with the models listed above).

### If You See a 503 Error

```
Model loading (503), trying fallback
```

**Fix:** none needed; this is a first-time model load, and the router automatically retries with GPT-2.
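The three error cases above can be collapsed into a small lookup used when logging; the messages mirror the fixes listed in this section (`explain` is an illustrative helper, not from the repo):

```python
# Status code -> documented remedy, per the Verification section.
FIXES = {
    401: "Unauthorized: token not set correctly in HF Space settings",
    404: "Not Found: model ID is not valid",
    503: "Model loading: retrying with the gpt2 fallback",
}

def explain(status: int) -> str:
    """Translate an HF API status code into the documented fix."""
    return FIXES.get(status, f"Unhandled HF API error: {status}")
```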
## Current Models in Config

**File:** `models_config.py`

```python
"reasoning_primary": {
    "model_id": "facebook/blenderbot-400M-distill",
    "max_tokens": 500,
    "temperature": 0.7
}
```
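A sketch of how a router might consume that entry (the surrounding `MODELS` dict and the `model_params` helper are assumptions; only `models_config.py` is named in the doc):

```python
# Assumed shape of models_config.py around the fragment shown above.
MODELS = {
    "reasoning_primary": {
        "model_id": "facebook/blenderbot-400M-distill",
        "max_tokens": 500,
        "temperature": 0.7,
    }
}

def model_params(role: str):
    """Unpack the config entry for one role into (model_id, max_tokens, temperature)."""
    cfg = MODELS[role]
    return cfg["model_id"], cfg["max_tokens"], cfg["temperature"]
```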
## Performance Notes

**Latency (approximate):**

- Blenderbot: ~2-4 seconds
- GPT-2: ~1-2 seconds
- Flan-T5: ~3-5 seconds

**Quality:**

- Blenderbot: good for conversational responses
- GPT-2: basic but coherent
- Flan-T5: more factual, less conversational
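To check these numbers in your own Space, a tiny wall-clock wrapper is enough (sketch; `timed` is not part of the repo):

```python
import time

def timed(fn, *args):
    """Return (result, seconds) for one call - handy for comparing model latency."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Wrap your query function with it and compare a few prompts per model; remember the first call includes cold-start time.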
## Troubleshooting

### Token Not Working?

1. Verify it in the HF Dashboard → Settings → Access Tokens
2. Check that it has "Read" permissions
3. Regenerate it if needed
4. Update it in the Space settings

### Model Not Loading?

- The first request may take 10-30 seconds (cold start)
- Subsequent requests are faster
- 503 errors auto-retry with the fallback model

### Still Seeing Placeholders?

1. Restart your Space
2. Check the logs for HF API calls
3. Verify the token is in the environment
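A quick diagnostic for the token checks above. The `hf_` prefix check reflects the usual format of HF user access tokens; `diagnose_token` itself is an illustrative helper:

```python
import os

def diagnose_token() -> str:
    """One-line health check for the HF_TOKEN secret."""
    token = os.getenv("HF_TOKEN", "")
    if not token:
        return "missing: add HF_TOKEN in Space settings, then restart the Space"
    if not token.startswith("hf_"):
        return "suspicious: HF user access tokens normally start with 'hf_'"
    return f"looks OK ({len(token)} chars)"
```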
## Next Steps

1. ✅ Add the token to your HF Space settings
2. ✅ Restart the Space
3. ✅ Test with a question
4. ✅ Check the logs for "HF API returned response"
5. ✅ Enjoy real LLM responses!

## Summary

**Model:** `facebook/blenderbot-400M-distill`
**Fallback:** `gpt2`
**Status:** ✅ Configured and ready
**Requirement:** a valid HF token in the Space settings
**No template fallbacks:** the system always tries a real LLM first