
Hugging Face Token Setup - Working Models

✅ Current Configuration

Model Selected: facebook/blenderbot-400M-distill

Why this model:

  • ✅ Publicly available (no gating required)
  • ✅ Works with HF Inference API
  • ✅ Text generation task
  • ✅ No special permissions needed
  • ✅ Fast response times
  • ✅ Stable and reliable

Fallback: gpt2 (guaranteed to work on HF API)

Setting Up Your HF Token

Step 1: Get Your Token

  1. Go to https://huggingface.co/settings/tokens
  2. Click "New token"
  3. Name it: "Research Assistant"
  4. Set role: Read (this is sufficient for inference)
  5. Generate token
  6. Copy it immediately (it will not be shown again)

Step 2: Add to Hugging Face Space

In your HF Space settings:

  1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
  2. Click "Settings" (gear icon)
  3. Under "Repository secrets" or "Space secrets"
  4. Add new secret:
    • Name: HF_TOKEN
    • Value: (paste your token)
  5. Save

Step 3: Verify Token Works

The code will automatically:

  • ✅ Load token from environment: os.getenv('HF_TOKEN')
  • ✅ Use it in API calls
  • ✅ Log success/failure

Check logs for:

llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
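The token lookup and request construction can be sketched as follows. This is a minimal sketch, not the project's actual llm_router code: `build_hf_request` is a hypothetical helper, and only the endpoint pattern and the HF_TOKEN environment variable come from this document.

```python
import os

# Public HF Inference API endpoint pattern
HF_API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_hf_request(model_id, prompt, max_tokens=500, temperature=0.7):
    """Read HF_TOKEN from the environment and assemble the pieces of an API call."""
    token = os.getenv("HF_TOKEN")
    if not token:
        # This is the situation that produces the 401 error described below.
        raise RuntimeError("HF_TOKEN is not set - add it under Space secrets")
    url = HF_API_URL.format(model_id=model_id)
    headers = {"Authorization": f"Bearer {token}"}
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_tokens, "temperature": temperature},
    }
    return url, headers, payload
```

If the secret is configured correctly in the Space, `os.getenv("HF_TOKEN")` returns it at runtime and the Authorization header is populated automatically.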

Alternative Models (Tested & Working)

If you want to try different models:

Option 1: GPT-2 (Very Reliable)

"model_id": "gpt2"
  • ⚡ Fast
  • ✅ Always available
  • ⚠️ Simple responses

Option 2: Flan-T5 Large (Better Quality)

"model_id": "google/flan-t5-large"
  • 📈 Better quality
  • ⚡ Fast
  • ✅ Public access

Option 3: Blenderbot (Conversational)

"model_id": "facebook/blenderbot-400M-distill"
  • 💬 Good for conversation
  • ✅ Current selection
  • ⚡ Fast

Option 4: DistilGPT-2 (Faster)

"model_id": "distilgpt2"
  • ⚡ Very fast
  • ✅ Guaranteed available
  • ⚠️ Smaller, less capable

How the System Works Now

API Call Flow:

  1. User question → Synthesis Agent
  2. Synthesis Agent → Tries LLM call
  3. LLM Router → Calls HF Inference API with token
  4. HF API → Returns generated text
  5. System → Uses real LLM response ✅

No More Fallbacks

  • ❌ No knowledge-base fallback
  • ❌ No template responses
  • ✅ Always uses a real LLM when available
  • ✅ Falls back to GPT-2 only while a model is still loading (503 error)
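The 503 handling above can be sketched like this. The helper and the injected `call_api` function are hypothetical; in the real router the call would be an HTTP request to the HF Inference API.

```python
def call_with_fallback(call_api,
                       primary="facebook/blenderbot-400M-distill",
                       fallback="gpt2"):
    """Try the primary model; if it is still loading (HTTP 503),
    retry once with the always-available fallback model.

    `call_api(model_id)` must return an (http_status, text) tuple.
    """
    status, text = call_api(primary)
    if status == 503:
        # Primary model is cold-starting on HF's side; retry with gpt2.
        status, text = call_api(fallback)
    return status, text
```

Injecting the HTTP call this way keeps the retry logic testable without network access, and the fallback fires only on 503, so 401/404 errors still surface to the logs.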

Verification

Test Your Setup:

Ask: "What is 2+2?"

Expected: a real LLM-generated response (not a template)

Check logs for:

llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response

If You See 401 Error:

HF API error: 401 - Unauthorized

Fix: The token is not set correctly in the HF Space settings. Re-add it as the HF_TOKEN secret and restart the Space.

If You See 404 Error:

HF API error: 404 - Not Found

Fix: Model ID not valid (very unlikely with current models)

If You See 503 Error:

Model loading (503), trying fallback

Fix: None needed. The model is loading for the first time, and the system automatically retries with GPT-2.

Current Models in Config

File: models_config.py

"reasoning_primary": {
    "model_id": "facebook/blenderbot-400M-distill",
    "max_tokens": 500,
    "temperature": 0.7
}
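A router would then look this entry up by role, roughly as follows. This is a sketch: only the shape of the "reasoning_primary" entry comes from models_config.py, and the lookup helper is hypothetical.

```python
# Mirrors the "reasoning_primary" entry shown above.
MODELS_CONFIG = {
    "reasoning_primary": {
        "model_id": "facebook/blenderbot-400M-distill",
        "max_tokens": 500,
        "temperature": 0.7,
    }
}

def get_model_settings(role):
    """Return (model_id, generation parameters) for a configured role."""
    entry = MODELS_CONFIG[role]
    params = {
        "max_new_tokens": entry["max_tokens"],
        "temperature": entry["temperature"],
    }
    return entry["model_id"], params
```

Keeping model choices in one config dict means switching to any of the alternative models above is a one-line change to "model_id".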

Performance Notes

Latency:

  • Blenderbot: ~2-4 seconds
  • GPT-2: ~1-2 seconds
  • Flan-T5: ~3-5 seconds

Quality:

  • Blenderbot: Good for conversational responses
  • GPT-2: Basic but coherent
  • Flan-T5: More factual, less conversational

Troubleshooting

Token Not Working?

  1. Verify in HF Dashboard → Settings → Access Tokens
  2. Check it has "Read" permissions
  3. Regenerate if needed
  4. Update in Space settings

Model Not Loading?

  • First request may take 10-30 seconds (cold start)
  • Subsequent requests are faster
  • 503 errors auto-retry with fallback

Still Seeing Placeholders?

  1. Restart your Space
  2. Check logs for HF API calls
  3. Verify token is in environment

Next Steps

  1. ✅ Add token to HF Space settings
  2. ✅ Restart Space
  3. ✅ Test with a question
  4. ✅ Check logs for "HF API returned response"
  5. ✅ Enjoy real LLM responses!

Summary

Model: facebook/blenderbot-400M-distill
Fallback: gpt2
Status: ✅ Configured and ready
Requirement: Valid HF token in Space settings
No fallbacks: System always tries the real LLM first